
The Basics of Debugging Python Problems

By Admin | July 21, 2025 | Data Science


Image: Debugging Python Problems (by Author | Canva)

 

Ever run a Python script and instantly wished you hadn’t pressed Enter?

Debugging in data science is not just an action; it's a survival skill, particularly when dealing with messy datasets or building prediction models that actual people rely on.

In this article, we'll explore the basics of debugging, specifically in your data science workflows, using a real-life dataset from a DoorDash delivery project, and, most importantly, how to debug like a pro.

 

DoorDash Delivery Duration Prediction: What Are We Dealing With?

 
Image: Debugging Python Problems in Delivery Duration Prediction
 
In this data project, DoorDash asked its data science candidates to predict the delivery duration. Let's first take a look at the dataset info.
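Here is the code, a minimal sketch that assumes the CSV sits next to the notebook (the path is a placeholder):

import pandas as pd

# Load the historical delivery data (path is a placeholder)
historical_data = pd.read_csv("historical_data.csv")

# Column names, dtypes, and non-null counts at a glance
historical_data.info()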

 

Here is the output:

 
Image: Debugging Python Problems in Predicting Delivery Duration
 

It seems they didn't provide the delivery duration, so we should calculate it here. It's simple, but no worries if you are a beginner. Let's see how it can be calculated.

import pandas as pd
from datetime import datetime

# Assuming historical_data is your DataFrame
historical_data["created_at"] = pd.to_datetime(historical_data['created_at'])
historical_data["actual_delivery_time"] = pd.to_datetime(historical_data['actual_delivery_time'])
historical_data["actual_total_delivery_duration"] = (historical_data["actual_delivery_time"] - historical_data["created_at"]).dt.total_seconds()
historical_data.head()

 

Right here is the output’s head; you’ll be able to see the actual_total_delivery_duration.

 
Image: Output of Debugging Python Problems in Delivery Duration Prediction
 

Good, now we can begin! But before that, here is the data definition language for this dataset.

 

Columns in historical_data.csv

Time features:

  • market_id: A city/region in which DoorDash operates, e.g., Los Angeles, given in the data as an ID.
  • created_at: Timestamp in UTC when the order was submitted by the consumer to DoorDash. (Note: this timestamp is in UTC, but if you need it, the actual timezone of the region was US/Pacific.)
  • actual_delivery_time: Timestamp in UTC when the order was delivered to the consumer.

Store features:

  • store_id: An ID representing the restaurant the order was submitted for.
  • store_primary_category: Cuisine category of the restaurant, e.g., Italian, Asian.
  • order_protocol: A store can receive orders from DoorDash through many modes. This field represents an ID denoting the protocol.

Order features:

  • total_items: Total number of items in the order.
  • subtotal: Total value of the order submitted (in cents).
  • num_distinct_items: Number of distinct items included in the order.
  • min_item_price: Price of the item with the lowest price in the order (in cents).
  • max_item_price: Price of the item with the highest price in the order (in cents).

Market features:

DoorDash being a marketplace, we have information on the state of the marketplace when the order is placed, which can be used to estimate delivery time. The following features are values at the time of created_at (order submission time):

  • total_onshift_dashers: Number of available dashers who are within 10 miles of the store at the time of order creation.
  • total_busy_dashers: Subset of total_onshift_dashers who are currently working on an order.
  • total_outstanding_orders: Number of orders within 10 miles of this order that are currently being processed.

Predictions from other models:

We have predictions from other models for various stages of the delivery process that we can use:

  • estimated_order_place_duration: Estimated time for the restaurant to receive the order from DoorDash (in seconds).
  • estimated_store_to_consumer_driving_duration: Estimated travel time between the store and the consumer (in seconds).

Nice, so let’s get began!

 

Common Python Errors in Data Science Projects

 
Image: Common Python Errors in Data Science Projects
 

In this section, we'll cover common debugging mistakes in a data science project, starting with reading the dataset and going through to the most critical part: modeling.

 

Reading the Dataset: FileNotFoundError, Dtype Warnings, and Fixes

 

Case 1: File Not Found (a Classic)

In data science, your first bug often greets you at read_csv, and not with a hello. Let's debug that exact moment together, line by line. Here is the code:

import pandas as pd

try:
    df = pd.read_csv('Strata Questions/historical_data.csv')
    df.head(3)
except FileNotFoundError as e:
    import os
    print("File not found. Here's where Python is looking:")
    print("Working directory:", os.getcwd())
    print("Available files:", os.listdir())
    raise e

 

Here is the output.

Image: Debugging Python Errors in Data Science Projects
 

You don’t simply elevate an error—you interrogate it. This reveals the place the code thinks it’s and what it sees round it. In case your file’s not on the listing, now you already know. No guessing. Simply information.

Replace the path with the full one, and voilà!
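For example, a sketch with a placeholder absolute path (yours will differ):

df = pd.read_csv("/full/path/to/Strata Questions/historical_data.csv")
print(df.head(3))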

Image: Debugging Python Errors in File Not Found
 

Case 2: Dtype Misinterpretation, Python's Quietly Wrong Guess

You load the dataset, but something's off. The bug hides inside your types.

# Assuming df is your loaded DataFrame
try:
    print("Column Types:\n", df.dtypes)
except Exception as e:
    print("Error reading dtypes:", e)

 

Here is the output.

Image: Debugging Python Errors in Dtype Misinterpretation
 

Case 3: Date Parsing, the Silent Saboteur

We discovered that we should calculate the delivery duration first, and we did it with this method.

try:
    # This code was shown earlier to calculate the delivery duration
    df["created_at"] = pd.to_datetime(df['created_at'])
    df["actual_delivery_time"] = pd.to_datetime(df['actual_delivery_time'])
    df["actual_total_delivery_duration"] = (df["actual_delivery_time"] - df["created_at"]).dt.total_seconds()
    print("Successfully calculated delivery duration and checked dtypes.")
    print("Relevant dtypes:\n", df[['created_at', 'actual_delivery_time', 'actual_total_delivery_duration']].dtypes)
except Exception as e:
    print("Error during date processing:", e)

 

Here is the output.

Image: Debugging Python Errors in Date Parsing
 

Nice and professional! Now we avoid those red errors, which will lift our mood; I know seeing them can dampen your motivation.

 

Handling Missing Data: KeyErrors, NaNs, and Logical Pitfalls

 
Some bugs don’t crash your code. They only provide the mistaken outcomes, silently, till you surprise why your mannequin is trash.

This part digs into lacking information—not simply easy methods to clear it, however easy methods to debug it correctly.

 

Case 1: KeyError, You Thought That Column Existed

Here is our code.

try:
    print(df['store_rating'])
except KeyError as e:
    print("Column not found:", e)
    print("Here are the available columns:\n", df.columns.tolist())

 

Here is the output.

Image: KeyError in Debugging Python Problems
 

The code didn't break because of logic; it broke because of an assumption. That's precisely where debugging lives. Always list your columns before accessing them blindly.
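A defensive pattern worth adopting before any column access; a sketch, where store_rating is just the hypothetical column from above:

if "store_rating" in df.columns:
    print(df["store_rating"])
else:
    print("Skipping: 'store_rating' not found. Available columns:", df.columns.tolist())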

 

Case 2: NaN Count, Missing Values You Didn't Expect

You think everything's clean. But real-world data always hides gaps. Let's check for them.

try:
    null_counts = df.isnull().sum()
    print("Nulls per column:\n", null_counts[null_counts > 0])
except Exception as e:
    print("Failed to check nulls:", e)

 

Here is the output.

Image: NaN Count in Debugging Python Problems
 

This exposes the silent troublemakers. Maybe store_primary_category is missing in thousands of rows. Maybe timestamps failed conversion and are now NaT.

You wouldn't have known unless you checked. Debugging is confirming every assumption.

 

Case 3: Logical Pitfalls, Missing Data That Isn't Actually Missing

Let's say you try to filter orders where the subtotal is greater than 1,000,000, expecting hundreds of rows. But this gives you zero:

try:
    filtered = df[df['subtotal'] > 1000000]
    print("Rows with subtotal > 1,000,000:", filtered.shape[0])
except Exception as e:
    print("Filtering error:", e)

 

That’s not a code error—it’s a logic error. You anticipated high-value orders, however possibly none exist above that threshold. Debug it with a variety test:

print("Subtotal vary:", df['subtotal'].min(), "to", df['subtotal'].max())

 

Here is the output.

Image: Logical Pitfalls in Debugging Python Problems
 

Case 4: isna() Showing Zero Doesn't Mean It's Clean

Even when isna().sum() shows zero, there might be dirty data, like whitespace or 'None' as a string. Run a more aggressive check:

try:
    fake_nulls = df[df['store_primary_category'].isin(['', ' ', 'None', None])]
    print("Rows with fake missing categories:", fake_nulls.shape[0])
except Exception as e:
    print("Fake missing value check failed:", e)

 

This catches hidden trash that isnull() misses.
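If any show up, one way to normalize them into real NaNs, so isnull() sees them too (a sketch, not part of the original walkthrough):

import numpy as np

# Turn fake nulls (empty strings, whitespace, the string 'None') into real NaN
df["store_primary_category"] = df["store_primary_category"].replace(["", " ", "None"], np.nan)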

 
Image: Handling Missing Data in Debugging Python Problems
 

Feature Engineering Glitches: TypeErrors, Date Parsing, and More

Feature engineering looks fun at first, until your new column breaks every model or throws a TypeError mid-pipeline. Here's how to debug that phase like someone who's been burned before.

 

Case 1: You Think You Can Divide, But You Can't

Let's create a new feature. If an error occurs, our try-except block will catch it.

try:
    df['value_per_item'] = df['subtotal'] / df['total_items']
    print("value_per_item created successfully")
except Exception as e:
    print("Error occurred:", e)

 

Here is the output.

Image: Feature Engineering Glitches in Debugging Python Problems
 

No errors? Good. But let's look closer.

print(df[['subtotal', 'total_items', 'value_per_item']].sample(3))

 

Here is the output.

Image: Feature Engineering Glitches in Debugging Python Problems
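One extra check is worth running here: if total_items is ever zero, the division quietly produces inf instead of raising an error. This check is an addition to the walkthrough, not part of the original code:

import numpy as np

# Count ratio values that came out inf or NaN after the division
print("Non-finite value_per_item rows:", (~np.isfinite(df["value_per_item"])).sum())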
 

Case 2: Date Parsing Gone Wrong

Now, converting your dtypes is important, but what if you assume everything was done correctly, yet problems persist?

# This is the standard approach, but it can fail silently on mixed types
df["created_at"] = pd.to_datetime(df["created_at"])
df["actual_delivery_time"] = pd.to_datetime(df["actual_delivery_time"])

 

You may assume it’s okay, but when your column has combined sorts, it may fail silently or break your pipeline. That’s why, as an alternative of instantly making transformations, it is higher to make use of a strong perform.

def parse_date_debug(df, col):
    try:
        parsed = pd.to_datetime(df[col])
        print(f"[SUCCESS] '{col}' parsed successfully.")
        return parsed
    except Exception as e:
        print(f"[ERROR] Failed to parse '{col}':", e)
        # Find non-date-like values to debug
        non_datetimes = df[pd.to_datetime(df[col], errors="coerce").isna()][col].unique()
        print("Sample values causing the issue:", non_datetimes[:5])
        raise

df["created_at"] = parse_date_debug(df, "created_at")
df["actual_delivery_time"] = parse_date_debug(df, "actual_delivery_time")

 

Here is the output.

Image: Wrong Date Parsing in Debugging Python Problems
 

This helps you trace faulty rows when datetime parsing crashes.

 

Case 3: Naive Division That Can Mislead

This won't throw an error in our DataFrame, since the columns are already numeric. But here's the problem: some datasets sneak in object types, even when they look like numbers. That leads to:

  • Misleading ratios
  • Wrong model behavior
  • No warnings

df["busy_dashers_ratio"] = df["total_busy_dashers"] / df["total_onshift_dashers"]

 

Let’s validate sorts earlier than computing, even when the operation received’t throw an error.

import numpy as np

def create_ratio_debug(df, num_col, denom_col, new_col):
    num_type = df[num_col].dtype
    denom_type = df[denom_col].dtype

    if not np.issubdtype(num_type, np.number) or not np.issubdtype(denom_type, np.number):
        print(f"[TYPE WARNING] '{num_col}' or '{denom_col}' is not numeric.")
        print(f"{num_col}: {num_type}, {denom_col}: {denom_type}")
        df[new_col] = np.nan
        return df

    if (df[denom_col] == 0).any():
        print(f"[DIVISION WARNING] '{denom_col}' contains zeros.")

    df[new_col] = df[num_col] / df[denom_col]
    return df

df = create_ratio_debug(df, "total_busy_dashers", "total_onshift_dashers", "busy_dashers_ratio")

 

Here is the output.

Image: Naive Division Misleading in Debugging Python Problems
 

This gives visibility into potential division-by-zero issues and prevents silent bugs.

 

Modeling Mistakes: Shape Mismatches and Evaluation Confusion

 

Case 1: NaN Values in Features Cause the Model to Crash

Let's say we want to build a linear regression model. LinearRegression() doesn't support NaN values natively. If any row in X has a missing value, the model refuses to train.

Here is the code, which deliberately creates a shape mismatch to trigger an error:

from sklearn.linear_model import LinearRegression

X_train = df[["estimated_order_place_duration", "estimated_store_to_consumer_driving_duration"]].iloc[:-10]
y_train = df["actual_total_delivery_duration"].iloc[:-5]
model = LinearRegression()
model.fit(X_train, y_train)

 

Here is the output.

Image: Modeling Mistakes in Debugging Python Problems
 

Let’s debug this difficulty. First, we test for NaNs.

print(X_train.isna().sum())

 

Here is the output.

Image: NaN Values in Debugging Python Problems
 

Good, let’s test the opposite variable too.

print(y_train.isna().sum())

 

Here is the output.

Image: NaN Values in Debugging Python Problems
 

The mismatch and the NaN values both have to be resolved. Here is the code to fix it.

from sklearn.linear_model import LinearRegression

# Re-align X and y to have the same length
X = df[["estimated_order_place_duration", "estimated_store_to_consumer_driving_duration"]]
y = df["actual_total_delivery_duration"]

# Step 1: Drop rows with NaN in features (X)
valid_X = X.dropna()

# Step 2: Align y to match the remaining indices of X
y_aligned = y.loc[valid_X.index]

# Step 3: Find indices where y is not NaN
valid_idx = y_aligned.dropna().index

# Step 4: Create final clean datasets
X_clean = valid_X.loc[valid_idx]
y_clean = y_aligned.loc[valid_idx]

model = LinearRegression()
model.fit(X_clean, y_clean)
print("✅ Model trained successfully!")

 

And voilà! Here is the output.

Image: Dataset of Debugging Python Problems
 

Case 2: Object Columns (Dates) Crash the Model

Let's say you try to train a model using a timestamp like actual_delivery_time.

But, oh no, it's still an object or datetime type, and you accidentally mix it with numeric columns. Linear regression doesn't like that one bit.

from sklearn.linear_model import LinearRegression

X = df[["actual_delivery_time", "estimated_order_place_duration"]]
y = df["actual_total_delivery_duration"]

model = LinearRegression()
model.fit(X, y)

 

Here is the error:

Image: Debugging Python Problems in Object Columns
 

You are combining two incompatible data types in the X matrix:

  • One column (actual_delivery_time) is datetime64.
  • The other (estimated_order_place_duration) is int64.

Scikit-learn expects all features to be numeric. It can't handle mixed types like datetime and int. Let's solve it by converting the datetime column to a numeric representation (a Unix timestamp).

# Ensure datetime columns are parsed correctly, coercing errors to NaT
df["actual_delivery_time"] = pd.to_datetime(df["actual_delivery_time"], errors="coerce")
df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")

# Recalculate the duration in case of new NaNs
df["actual_total_delivery_duration"] = (df["actual_delivery_time"] - df["created_at"]).dt.total_seconds()

# Convert datetime to a numeric feature (Unix timestamp in seconds)
df["delivery_time_timestamp"] = df["actual_delivery_time"].astype("int64") // 10**9

 

Good. Now that the dtypes are numeric, let's apply the ML model.

from sklearn.linear_model import LinearRegression

# Use the new numeric timestamp feature
X = df[["delivery_time_timestamp", "estimated_order_place_duration"]]
y = df["actual_total_delivery_duration"]

# Drop any remaining NaNs from our feature set and target
X_clean = X.dropna()
y_clean = y.loc[X_clean.index].dropna()
X_clean = X_clean.loc[y_clean.index]

model = LinearRegression()
model.fit(X_clean, y_clean)
print("✅ Model trained successfully!")

 

Here is the output.

Image: Debugging Python Problems in Object Columns
 

Great job!
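The section heading also mentioned evaluation confusion, and the most common form is scoring a model on the same rows it was trained on. A minimal sketch of the safer habit, using standard scikit-learn utilities (the 80/20 split and random_state are arbitrary choices, not from the original walkthrough):

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Hold out rows the model never saw, then score on those
X_tr, X_te, y_tr, y_te = train_test_split(X_clean, y_clean, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_tr, y_tr)
preds = model.predict(X_te)
print("Holdout MAE (seconds):", mean_absolute_error(y_te, preds))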

 

Final Thoughts: Debug Smarter, Not Harder

 
Model crashes don't always stem from complex bugs; sometimes, it's just a stray NaN or an unconverted date column sneaking into your data pipeline.

Rather than wrestling with cryptic stack traces or tossing try-except blocks like darts in the dark, dig into your DataFrame early. Peek at .info(), check .isna().sum(), and don't shy away from .dtypes. These simple steps uncover hidden landmines before you even hit fit().
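Those three checks fit in one small helper you can run before any modeling step (quick_sanity_check is a hypothetical name, not from the article):

def quick_sanity_check(df):
    # Dtypes, memory usage, and non-null counts in one view
    df.info()
    # Columns that still contain missing values
    nulls = df.isna().sum()
    print("Columns with nulls:\n", nulls[nulls > 0])
    # Non-numeric columns that would crash scikit-learn
    print("Non-numeric columns:", df.select_dtypes(exclude="number").columns.tolist())

quick_sanity_check(df)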

I’ve proven you that even one missed object sort or a sneaky lacking worth can sabotage a mannequin. However with a sharper eye, cleaner prep, and intentional characteristic extraction, you’ll shift from debugging reactively to constructing intelligently.
 
 

Nate Rosidi is a data scientist and in product strategy. He's also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.


