Why Is My Code So Slow? A Guide to Py-Spy Python Profiling


Some of the most frustrating issues to debug in data science code aren’t syntax errors or logic errors. Rather, they come from code that does exactly what it’s supposed to do, but takes its sweet time doing it.

Functional but inefficient code can be a massive bottleneck in a data science workflow. In this article, I’ll provide a brief introduction and walk-through of py-spy, a powerful tool designed to profile your Python code. It can pinpoint exactly where your program is spending the most time so inefficiencies can be identified and corrected.

Example Problem

Let’s set up a simple research question to write some code for:

“For all flights going between US states and territories, which departing airport has the longest flights on average?”

Below is a simple Python script to answer this research question, using data retrieved from the Bureau of Transportation Statistics (BTS). The dataset consists of data from every flight within US states and territories between January and June of 2025, with information on the origin and destination airports. It is roughly 3.5 million rows.

It calculates the Haversine distance, the shortest distance between two points on a sphere, for each flight. Then, it groups the results by departing airport to find the average distance and reports the top 5.
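For reference, the formula that the haversine() function in the script evaluates can be written as below. The symbols are a notational choice made here, not taken from the article: φ is latitude and λ is longitude (both in radians), and R is the Earth's radius, which the script sets to 6371 km.

```latex
d = 2R \arcsin\left( \sqrt{ \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) } \right)
```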

import pandas as pd
import math
import time


def haversine(lat_1, lon_1, lat_2, lon_2):
    """Calculate the Haversine distance (in km) between two latitude and longitude points"""
    lat_1_rad = math.radians(lat_1)
    lon_1_rad = math.radians(lon_1)
    lat_2_rad = math.radians(lat_2)
    lon_2_rad = math.radians(lon_2)

    delta_lat = lat_2_rad - lat_1_rad
    delta_lon = lon_2_rad - lon_1_rad

    R = 6371  # Radius of the earth in km

    return 2*R*math.asin(math.sqrt(math.sin(delta_lat/2)**2 + math.cos(lat_1_rad)*math.cos(lat_2_rad)*(math.sin(delta_lon/2))**2))


if __name__ == '__main__':
    # Load flight data into a dataframe
    flight_data_file = r"./data/2025_flight_data.csv"
    flights_df = pd.read_csv(flight_data_file)

    # Start timer to see how long the analysis takes
    start = time.time()

    # Calculate the haversine distance between each flight's start and end airport
    haversine_dists = []
    for i, row in flights_df.iterrows():
        haversine_dists.append(haversine(lat_1=row["LATITUDE_ORIGIN"],
                                         lon_1=row["LONGITUDE_ORIGIN"],
                                         lat_2=row["LATITUDE_DEST"],
                                         lon_2=row["LONGITUDE_DEST"]))

    flights_df["Distance"] = haversine_dists

    # Get result by grouping by origin airport, taking the average flight distance and printing the top 5
    result = (
        flights_df
        .groupby('DISPLAY_AIRPORT_NAME_ORIGIN').agg(avg_dist=('Distance', 'mean'))
        .sort_values('avg_dist', ascending=False)
    )

    print(result.head(5))

    # End timer and print analysis time
    end = time.time()
    print(f"Took {end - start} s")

Running this code gives the following output:

                                        avg_dist
DISPLAY_AIRPORT_NAME_ORIGIN                     
Pago Pago International              4202.493567
Guam International                   3142.363005
Luis Munoz Marin International       2386.141780
Ted Stevens Anchorage International  2246.530036
Daniel K Inouye International        2211.857407
Took 169.8935534954071 s

These results make sense, as the airports listed are in American Samoa, Guam, Puerto Rico, Alaska, and Hawaii, respectively. These are all locations outside of the contiguous United States where one would expect long average flight distances.

The problem here isn’t the results, which are valid, but the execution time: almost three minutes! While three minutes might be tolerable for a one-off run, it becomes a productivity killer during development. Imagine this as part of a longer data pipeline. Every time a parameter is tweaked, a bug is fixed, or a cell is re-run, you’re forced to sit idle while the program runs. That friction breaks your flow and turns a quick analysis into an all-afternoon affair.

Now let’s see how py-spy can help us diagnose exactly which lines are taking so long.

What Is Py-Spy?

To understand what py-spy is doing and the benefits of using it, it helps to compare py-spy to the built-in Python profiler cProfile.

cProfile: This is a tracing profiler, working like a stopwatch on each function call. The time between each function call and return is measured and reported. While highly accurate, this adds significant overhead, as the profiler has to constantly pause and record data, which can slow down the script considerably.
py-spy: This is a sampling profiler, working like a high-speed camera looking at the whole program at once. py-spy sits entirely outside the running Python script and takes high-frequency snapshots of the program’s state. It looks at the entire call stack to see exactly what line of code is being run and what function called it, all the way up to the top level.
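To make the tracing approach concrete, here is a minimal sketch of profiling a single call with the standard library's cProfile and pstats modules. The slow_sum() function and the profile_call() helper are hypothetical names made up for this illustration; they are not part of the flight script.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop so the profiler has something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    # Write the five most expensive entries (by cumulative time) into a string.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(slow_sum, 100_000)
print(report)
```

Every function call made while the profiler is enabled is timed individually, which is where the overhead of a tracing profiler comes from.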

Running Py-spy

In order to run py-spy on a Python script, the py-spy package must be installed in the Python environment.

pip install py-spy

Once py-spy is installed, our script can be profiled by running the following command in the terminal:

py-spy record -o profile.svg -r 100 -- python main.py

Here is what each part of this command is doing:

py-spy: Calls the tool.
record: This tells py-spy to use its “record” mode, which continuously monitors the program while it runs and saves the data.
-o profile.svg: This specifies the output file, telling py-spy to write the results to a file called profile.svg. By default, py-spy writes an SVG flame graph.
-r 100: This specifies the sampling rate, setting it to 100 samples per second. This means that py-spy will check what the program is doing 100 times per second.
--: This separates the py-spy command from the Python script command. It tells py-spy that everything following this flag is the command to run, not arguments for py-spy itself.
python main.py: This is the command to run the Python script to be profiled with py-spy, in this case running main.py.

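Note: py-spy reads memory from another process, which the operating system may restrict for security reasons. On Linux, sudo privileges are usually required when attaching to an already running process, and on macOS py-spy always has to be run as root.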

After this command finishes running, an output file profile.svg will appear, which will allow us to dig deeper into which parts of the code are taking the longest.
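If the profiler needs to be launched from Python rather than typed into a terminal (for example, as one step of a larger pipeline), the same command can be assembled as an argument list. This is only a sketch: build_record_command() and run_profile() are hypothetical helpers, main.py is a placeholder script name, and the flags are the same -o, -r and -- described in this section.

```python
import shutil
import subprocess
import sys


def build_record_command(script, output="profile.svg", rate=100):
    """Assemble the argv list for `py-spy record` as used in this article."""
    return [
        "py-spy", "record",
        "-o", output,      # output file
        "-r", str(rate),   # samples per second
        "--",              # everything after this is the program to profile
        sys.executable, script,
    ]


def run_profile(script):
    """Run py-spy if it is on PATH; return its exit code, or None if it is missing."""
    if shutil.which("py-spy") is None:
        return None
    return subprocess.run(build_record_command(script), check=False).returncode


command = build_record_command("main.py")
print(" ".join(command))
```

Using sys.executable instead of a bare python keeps the profiled script on the same interpreter (and virtual environment) as the launcher.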

Py-spy Output

Opening the output profile.svg reveals the visualization that py-spy has created of how much time our program spent in different parts of the code. This is known as an icicle graph (or a flame graph if the y-axis is inverted) and is interpreted as follows:

Bars: Each colored bar represents a particular function that was called during the execution of the program.
X-axis (Population): The horizontal axis represents the collection of all samples taken during profiling. They are grouped so that the width of a particular bar represents the proportion of the total samples in which the program was in the function represented by that bar. Note: This is not a timeline; the ordering does not represent when the function was called, only the total amount of time spent.
Y-axis (Stack Depth): The vertical axis represents the call stack. The top bar labeled “all” represents the entire program, and the bars beneath it represent functions called from “all”. This continues down recursively, with each bar broken down into the functions that were called during its execution. The very bottom bar shows the function that was actually running on the CPU when the sample was taken.
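The relationship between samples and bar widths can be sketched in a few lines. The samples below are invented for illustration (they are not real py-spy output), and bar_widths() is a hypothetical helper; the point is only that a bar's width is the share of samples in which that function appeared somewhere on the call stack.

```python
from collections import Counter

# Hypothetical samples: each tuple is one call stack, outermost frame first.
samples = [
    ("main", "iterrows"),
    ("main", "iterrows"),
    ("main", "iterrows"),
    ("main", "haversine"),
    ("main", "iterrows"),
    ("main", "groupby"),
    ("main", "haversine"),
    ("main", "iterrows"),
]


def bar_widths(stacks):
    """Return the fraction of samples in which each function appears on the stack."""
    counts = Counter()
    for stack in stacks:
        # Count a function once per sample, even if it appears twice (recursion).
        for name in set(stack):
            counts[name] += 1
    total = len(stacks)
    return {name: count / total for name, count in counts.items()}


widths = bar_widths(samples)
for name, width in sorted(widths.items(), key=lambda item: -item[1]):
    print(f"{name:<10} {width:.1%}")
```

Because the samples are grouped rather than kept in order, the result says how much time went into each function, not when it ran.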

Interacting with the Graph

While the image above is static, the actual .svg file generated by py-spy is fully interactive. When you open it in a web browser, you can:

Search (Ctrl+F): Highlight specific functions to see where they appear in the stack.
Zoom: Click on any bar to zoom in on that specific function and its children, allowing you to isolate complex parts of the call stack.
Hover: Hovering over any bar displays the specific function name, file path, line number, and the exact percentage of time it consumed.

The most important rule for reading the icicle graph is simply: the wider the bar, the more frequent the function. If a function bar spans 50% of the graph’s width, it means that the program was executing that function, or something it called, in 50% of the samples, which approximates 50% of the total runtime.

Diagnosis

From the icicle graph above, we can see that the bar representing the Pandas iterrows() function is noticeably wide. Hovering over that bar when viewing the profile.svg file reveals that the true proportion for this function was 68.36%. So over two-thirds of the runtime was spent in the iterrows() function. Intuitively this bottleneck makes sense, as iterrows() creates a Pandas Series object for every single row in the loop, causing massive overhead. This gives a clear target for optimizing the runtime of the script.
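The per-row overhead is easy to observe on a small synthetic example. The sketch below assumes pandas and NumPy are installed; the single column name "x" and the frame size are made up for illustration, and the timings it prints will vary by machine.

```python
import time

import numpy as np
import pandas as pd

# Small synthetic frame; the column name "x" is made up for this illustration.
df = pd.DataFrame({"x": np.arange(20_000, dtype="float64")})

# Row by row: iterrows() builds a new pandas Series for every row.
start = time.perf_counter()
row_types = set()
loop_total = 0.0
for _, row in df.iterrows():
    row_types.add(type(row))
    loop_total += row["x"] * 2
loop_seconds = time.perf_counter() - start

# Vectorized: one expression operates on the whole column at once.
start = time.perf_counter()
vector_total = float((df["x"] * 2).sum())
vector_seconds = time.perf_counter() - start

print(f"iterrows:   {loop_seconds:.4f} s, row types seen: {row_types}")
print(f"vectorized: {vector_seconds:.4f} s")
```

Both approaches compute the same total; only the amount of Python-level work per row differs.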

Optimizing The Script

The clearest path to optimizing this script, based on what was learned from py-spy, is to stop using iterrows() to loop over every row to calculate the Haversine distance. Instead, it should be replaced with a vectorized calculation using NumPy that does the calculation for every row with just one function call. So the changes to be made are:

Rewrite the haversine() function to use vectorized, efficient C-level NumPy operations that allow whole arrays to be passed in rather than one set of coordinates at a time.
Replace the iterrows() loop with a single call to this newly vectorized haversine() function.

import pandas as pd
import numpy as np
import time


def haversine(lat_1, lon_1, lat_2, lon_2):
    """Calculate the Haversine distance (in km) between arrays of latitude and longitude points"""
    lat_1_rad = np.radians(lat_1)
    lon_1_rad = np.radians(lon_1)
    lat_2_rad = np.radians(lat_2)
    lon_2_rad = np.radians(lon_2)

    delta_lat = lat_2_rad - lat_1_rad
    delta_lon = lon_2_rad - lon_1_rad

    R = 6371  # Radius of the earth in km

    # np.arcsin is used because the np.asin alias only exists in NumPy 2.0 and later
    return 2*R*np.arcsin(np.sqrt(np.sin(delta_lat/2)**2 + np.cos(lat_1_rad)*np.cos(lat_2_rad)*(np.sin(delta_lon/2))**2))


if __name__ == '__main__':
    # Load flight data into a dataframe
    flight_data_file = r"./data/2025_flight_data.csv"
    flights_df = pd.read_csv(flight_data_file)

    # Start timer to see how long the analysis takes
    start = time.time()

    # Calculate the haversine distance between each flight's start and end airport
    flights_df["Distance"] = haversine(lat_1=flights_df["LATITUDE_ORIGIN"],
                                       lon_1=flights_df["LONGITUDE_ORIGIN"],
                                       lat_2=flights_df["LATITUDE_DEST"],
                                       lon_2=flights_df["LONGITUDE_DEST"])

    # Get result by grouping by origin airport, taking the average flight distance and printing the top 5
    result = (
        flights_df
        .groupby('DISPLAY_AIRPORT_NAME_ORIGIN').agg(avg_dist=('Distance', 'mean'))
        .sort_values('avg_dist', ascending=False)
    )

    print(result.head(5))

    # End timer and print analysis time
    end = time.time()
    print(f"Took {end - start} s")

Running this code gives the following output:

                                        avg_dist
DISPLAY_AIRPORT_NAME_ORIGIN                     
Pago Pago International              4202.493567
Guam International                   3142.363005
Luis Munoz Marin International       2386.141780
Ted Stevens Anchorage International  2246.530036
Daniel K Inouye International        2211.857407
Took 0.5649983882904053 s

These results are identical to the results from before the code was optimized, but instead of taking nearly three minutes to process, it took just over half a second!
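Beyond comparing the top five rows by eye, one way to gain confidence that a rewrite like this has not changed behavior is to run both versions on a handful of points and compare the results numerically. The sketch below assumes NumPy is installed; the function names haversine_scalar() and haversine_vectorized() and the test coordinates are chosen only for illustration.

```python
import math

import numpy as np

R = 6371  # Radius of the earth in km, as in the article's script


def haversine_scalar(lat_1, lon_1, lat_2, lon_2):
    """One pair of points at a time, using the math module."""
    lat_1, lon_1, lat_2, lon_2 = map(math.radians, (lat_1, lon_1, lat_2, lon_2))
    a = (math.sin((lat_2 - lat_1) / 2) ** 2
         + math.cos(lat_1) * math.cos(lat_2) * math.sin((lon_2 - lon_1) / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))


def haversine_vectorized(lat_1, lon_1, lat_2, lon_2):
    """Whole arrays at once, using NumPy."""
    lat_1, lon_1, lat_2, lon_2 = map(np.radians, (lat_1, lon_1, lat_2, lon_2))
    a = (np.sin((lat_2 - lat_1) / 2) ** 2
         + np.cos(lat_1) * np.cos(lat_2) * np.sin((lon_2 - lon_1) / 2) ** 2)
    return 2 * R * np.arcsin(np.sqrt(a))


# Arbitrary test coordinates in degrees, chosen only for illustration.
lat_1 = np.array([0.0, 40.0, -14.3, 61.2])
lon_1 = np.array([0.0, -74.0, -170.7, -150.0])
lat_2 = np.array([0.0, 34.0, 21.3, 47.4])
lon_2 = np.array([180.0, -118.0, -157.9, -122.3])

scalar = [haversine_scalar(*coords) for coords in zip(lat_1, lon_1, lat_2, lon_2)]
vectorized = haversine_vectorized(lat_1, lon_1, lat_2, lon_2)

print(np.allclose(scalar, vectorized))
```

The first pair of points lies half a great circle apart on the equator, so its distance should equal π·R, which gives a quick independent check of the formula itself.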

Looking Ahead

If you are reading this from the future (late 2026 or beyond), check whether you are running Python 3.15 or newer. Python 3.15 is expected to introduce a native sampling profiler in the standard library, offering similar functionality to py-spy without requiring external installation. For anyone on Python 3.14 or older, py-spy remains the gold standard.

This article explored a tool for tackling a common frustration in data science: a script that functions as intended, but is inefficiently written and takes a long time to run. An example script was provided to learn which US departure airports have the longest average flight distance according to the Haversine distance. This script worked as expected, but took almost three minutes to run.

Using the py-spy Python profiler, we were able to learn that the cause of the inefficiency was the use of the iterrows() function. By replacing iterrows() with a more efficient vectorized calculation of the Haversine distance, the runtime was reduced from almost three minutes to just over half a second.

See my GitHub repository for the code from this article, including the preprocessing of the raw data from BTS.

Thanks for reading!

Data Sources

Data from the Bureau of Transportation Statistics (BTS) is a work of the U.S. Federal Government and is in the public domain under 17 U.S.C. § 105. It is free to use, share, and adapt without copyright restriction.