Thursday, July 2, 2026
newsaiworld
Pandas Isn’t Going Anywhere: Why It’s Still My Go-To for Data Wrangling

by Admin
May 17, 2026
in Artificial Intelligence
When I started learning data science in 2020, Pandas was one of the most popular tools. Although newer tools focus on improving Pandas’ weaknesses in handling very large datasets, I still use Pandas for many data cleaning, processing, and analysis tasks. Yes, Pandas gives me a hard time when working with billions of rows, but it’s more than enough for anything below that.

I see Pandas being used not only for EDA or in notebooks but also in production systems.

In this article, I’ll go over some data cleaning and processing operations to demonstrate how capable Pandas is.

Let’s start with the dataset, which contains stock keeping units (SKUs) and the search API responses for these SKUs.

import pandas as pd

search_results = pd.read_csv("search_results.csv")

search_results.head()

The search result is a list of dictionaries and looks like this:

search_results.loc[0, "search_result"]

"[{'my_id': 'HBCV00007F5Y2B', 'distance': 1.0, 'entity': {}}, 
{'my_id': 'HBCV00007UPQBM', 'distance': 1.0, 'entity': {}}, 
{'my_id': 'HBCV00008I29IH', 'distance': 1.0, 'entity': {}}, 
{'my_id': 'HBCV00006U3ZYB', 'distance': 0.8961254358291626, 'entity': {}}, 
{'my_id': 'HBCV0000AFA4H6', 'distance': 0.8702399730682373, 'entity': {}}, 
{'my_id': 'HBCV00009CDGD4', 'distance': 0.86175537109375, 'entity': {}}, 
{'my_id': 'HBCV000046336T', 'distance': 0.8594968318939209, 'entity': {}}, 
{'my_id': 'HBCV00009QDZRT', 'distance': 0.8572311997413635, 'entity': {}}, 
{'my_id': 'HBCV00008E11P3', 'distance': 0.8553324937820435, 'entity': {}}, 
{'my_id': 'HBV00000C4IY6', 'distance': 0.8539167642593384, 'entity': {}}] 
... and 5 entities remaining"

As we see in the output, it’s not in a proper list-of-dictionaries format because of the last part (“... and 5 entities remaining”). Also, it’s stored as a single string.

In order to make better use of it, we need to convert it to a proper list of dictionaries. The following line of code removes the last part by splitting the string at “...” and taking the first split.

search_results.loc[0, "search_result"].split("...")[0].strip()

However, the output is still a single string. We can use Python’s built-in ast module to convert it to a list:

import ast

res = ast.literal_eval(search_results.loc[0, "search_result"].split("...")[0].strip())

res

[{'my_id': 'HBCV00007F5Y2B', 'distance': 1.0, 'entity': {}},
 {'my_id': 'HBCV00007UPQBM', 'distance': 1.0, 'entity': {}},
 {'my_id': 'HBCV00008I29IH', 'distance': 1.0, 'entity': {}},
 {'my_id': 'HBCV00006U3ZYB', 'distance': 0.8961254358291626, 'entity': {}},
 {'my_id': 'HBCV0000AFA4H6', 'distance': 0.8702399730682373, 'entity': {}},
 {'my_id': 'HBCV00009CDGD4', 'distance': 0.86175537109375, 'entity': {}},
 {'my_id': 'HBCV000046336T', 'distance': 0.8594968318939209, 'entity': {}},
 {'my_id': 'HBCV00009QDZRT', 'distance': 0.8572311997413635, 'entity': {}},
 {'my_id': 'HBCV00008E11P3', 'distance': 0.8553324937820435, 'entity': {}},
 {'my_id': 'HBV00000C4IY6', 'distance': 0.8539167642593384, 'entity': {}}]
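The two steps above (trim the trailing note, then parse) can be wrapped in a small helper. This is a minimal sketch: the function name `parse_search_result` and the shortened sample string are made up for illustration, and it assumes the text "..." never appears inside the list itself.

```python
import ast


def parse_search_result(raw: str) -> list:
    """Drop the trailing '... and N entities remaining' note and parse the rest."""
    # Keep only the text before the first "..." marker.
    cleaned = raw.split("...")[0].strip()
    # literal_eval only evaluates Python literals (lists, dicts, strings,
    # numbers), so it does not execute arbitrary code the way eval would.
    return ast.literal_eval(cleaned)


# Shortened, made-up sample in the same shape as the API response.
raw = (
    "[{'my_id': 'A1', 'distance': 1.0, 'entity': {}}, "
    "{'my_id': 'B2', 'distance': 0.9, 'entity': {}}] "
    "... and 5 entities remaining"
)

parsed = parse_search_result(raw)
print(type(parsed), len(parsed))
print(parsed[0]["my_id"])
```

Because `ast.literal_eval` raises an error on malformed input, a helper like this also surfaces rows whose string was truncated in the middle of the list.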

We now have the search results as a proper list of dictionaries. This was only for a single row. We need to apply the same operation to all SKUs (i.e. the entire column).

One option is to go over all the rows in a for loop and perform the same operation. However, this isn’t the best option. We should prefer vectorized operations when we can. A vectorized operation basically means executing the code on all rows at once.

On a single row, I used splitting to get rid of the last part of the string, but it didn’t work in a vectorized operation. A more robust option seems to be using a regex.

search_results.loc[:, 'search_result'] = search_results['search_result'].str.replace(r"\.\.\..*", "", regex=True).str.strip()

This code matches “...” and everything that comes after it and replaces the match with an empty string. The dots are escaped because an unescaped dot in a regex matches any character. In other words, the code removes the “... and 5 entities remaining” part.
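Here is a small self-contained sketch of the column-wide cleanup on made-up strings. The dots in the pattern are escaped so that they match three literal periods; an unescaped dot matches any character. As far as I know, `Series.str.split` treats a multi-character pattern as a regex by default, which would explain why a plain split on "..." misbehaves on a whole column. `Series.str.partition` is shown as a literal-separator alternative.

```python
import pandas as pd

# Made-up strings; the last row has no trailing note at all.
s = pd.Series([
    "[{'my_id': 'A1', 'distance': 1.0, 'entity': {}}] ... and 5 entities remaining",
    "[{'my_id': 'B2', 'distance': 0.9, 'entity': {}}] ... and 12 entities remaining",
    "[{'my_id': 'C3', 'distance': 0.8, 'entity': {}}]",
])

# r"\.\.\." matches three literal periods; ".*" consumes the rest of the line.
cleaned = s.str.replace(r"\.\.\..*", "", regex=True).str.strip()

# Alternative without a regex: partition splits at the first literal "..."
# and column 0 holds everything before it (the whole string if not found).
cleaned_alt = s.str.partition("...")[0].str.strip()

print(cleaned.tolist())
```

Note that `.*` does not cross newlines, so this relies on the trailing note sitting on a single line at the end of the string.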

All the rows in the search_result column now follow a proper list-of-dictionaries format.

search_results.loc[10, "search_result"]

"[{'my_id': 'HBCV00007F5Y2B', 'distance': 1.0, 'entity': {}},
 {'my_id': 'HBCV00007UPQBM', 'distance': 1.0, 'entity': {}},
 {'my_id': 'HBCV00008I29IH', 'distance': 1.0, 'entity': {}},
 {'my_id': 'HBCV00006U3ZYB', 'distance': 0.8961254358291626, 'entity': {}},
 {'my_id': 'HBCV0000AFA4H6', 'distance': 0.8702399730682373, 'entity': {}},
 {'my_id': 'HBCV00009CDGD4', 'distance': 0.86175537109375, 'entity': {}},
 {'my_id': 'HBCV000046336T', 'distance': 0.8594968318939209, 'entity': {}},
 {'my_id': 'HBCV00009QDZRT', 'distance': 0.8572311997413635, 'entity': {}},
 {'my_id': 'HBCV00008E11P3', 'distance': 0.8553324937820435, 'entity': {}},
 {'my_id': 'HBV00000C4IY6', 'distance': 0.8539167642593384, 'entity': {}}]"

They’re still stored as strings but I can easily convert them to lists using the ast module, which I’ll do in the next step.

What I’m interested in is the SKUs returned in the search results. I’ll create a new column by extracting the SKUs from the dictionaries. I can access them using the “my_id” key of each dictionary.

There are 3 parts to this operation:

  • Convert the search result string to a list using the literal_eval function
  • Extract the SKU from the my_id key of each dictionary
  • Do this in a list comprehension to get the SKUs from all the dictionaries in the list

We can do all these operations by applying a lambda function to all rows as follows:

search_results.loc[:, "result_skus"] = (
    search_results["search_result"].apply(lambda x: [item['my_id'] for item in ast.literal_eval(x)])
)

search_results.head()
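The three parts can be tried end to end on a toy dataframe. This is only a sketch: the SKU values are invented, and the named function `extract_skus` stands in for the lambda so the steps are easier to read.

```python
import ast

import pandas as pd

# Toy stand-in for the cleaned search_results dataframe.
df = pd.DataFrame({
    "sku": ["S1", "S2"],
    "search_result": [
        "[{'my_id': 'A1', 'distance': 1.0, 'entity': {}}, "
        "{'my_id': 'B2', 'distance': 0.9, 'entity': {}}]",
        "[{'my_id': 'C3', 'distance': 1.0, 'entity': {}}]",
    ],
})


def extract_skus(raw: str) -> list:
    # 1) parse the string into a list of dicts,
    # 2) read the my_id key of each dict,
    # 3) collect the values with a list comprehension.
    return [item["my_id"] for item in ast.literal_eval(raw)]


df["result_skus"] = df["search_result"].apply(extract_skus)
print(df[["sku", "result_skus"]])
```

Like the lambda version, this runs Python code once per row, so it is not vectorized; parsing arbitrary strings generally has no vectorized equivalent in Pandas.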

Each row in the result_skus column contains a list of 10 SKUs. Let’s say I need to have these 10 SKUs in different rows. For each row in the sku column, there will be 10 rows created from the list in the result_skus column. There’s a very simple way of doing this in Pandas, which is the explode function.

data = search_results[["sku", "result_skus"]].explode("result_skus", ignore_index=True)

data.head()

We created a new dataframe with the sku and result_skus columns. The drawing below demonstrates what the explode function does:
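In code, the same reshaping on a toy dataframe (invented values, two SKUs instead of ten per row) looks roughly like this:

```python
import pandas as pd

# One row per sku, with a list of result SKUs in each cell.
df = pd.DataFrame({
    "sku": ["S1", "S2"],
    "result_skus": [["A1", "B2"], ["C3"]],
})

# explode creates one row per list element and repeats the other columns;
# ignore_index=True renumbers the rows 0..n-1 instead of repeating the
# original index labels.
long_df = df.explode("result_skus", ignore_index=True)
print(long_df)
```

Without `ignore_index=True`, both rows that came from S1 would keep the index label 0, which can be surprising in later label-based lookups.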

Consider the opposite. We have a dataframe as shown above but want to have all the results for an sku in a single row.

We can use the groupby function to group the rows by sku and then apply the list function to the result_skus column:

new_data = data.groupby("sku", as_index=False)["result_skus"].apply(list)

new_data.head()

This will get us back to the previous step:
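A toy round trip shows the shape of the result. This sketch uses `agg(list)` followed by `reset_index()` instead of `apply(list)` with `as_index=False`; it is a swapped-in variant that should produce the same two columns, and the values are invented.

```python
import pandas as pd

# Long format: one row per (sku, result) pair.
long_df = pd.DataFrame({
    "sku": ["S1", "S1", "S2"],
    "result_skus": ["A1", "B2", "C3"],
})

# Collect the results of each sku back into a single list per row.
wide = (
    long_df.groupby("sku")["result_skus"]
    .agg(list)
    .reset_index()
)
print(wide)
```

groupby sorts the group keys by default, so the row order can differ from the original dataframe; passing `sort=False` keeps the groups in order of first appearance.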

Using the explode function, we created a dataframe with a separate row for each SKU in the result_skus column. What if we need to have them separated into different columns instead of rows?

One option is to apply the pd.Series function to the result_skus column and concatenate the resulting columns to the original dataframe.

new_cols = new_data["result_skus"].apply(pd.Series)

new_data = pd.concat([new_data, new_cols], axis=1)

new_data.head()

Columns 0 to 9 contain the 10 SKUs from the result_skus column. This code, which uses the apply function, is not a vectorized operation.

We have another option, which is vectorized and much faster.

new_cols = pd.DataFrame(new_data["result_skus"].tolist())

new_data = pd.concat([new_data, new_cols], axis=1)

This code will give us the same dataframe as above but much faster.
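The two approaches can be compared on a toy dataframe to confirm they produce the same columns. The values are invented, and passing `index=wide.index` is an addition of this sketch: `pd.concat` with `axis=1` aligns on the index, so the new columns should carry the same index as the original dataframe in case it is not the default 0..n-1.

```python
import pandas as pd

wide = pd.DataFrame({
    "sku": ["S1", "S2"],
    "result_skus": [["A1", "B2"], ["C3", "D4"]],
})

# Option 1: build one Series per row (a Python-level call for every row).
via_apply = wide["result_skus"].apply(pd.Series)

# Option 2: hand the whole list of lists to the DataFrame constructor once.
via_list = pd.DataFrame(wide["result_skus"].tolist(), index=wide.index)

# Both give columns 0 and 1 with the same values.
print(via_apply.values.tolist() == via_list.values.tolist())

result = pd.concat([wide, via_list], axis=1)
print(result)
```

If the lists have different lengths, both options pad the shorter rows with missing values.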

I demonstrated a typical data cleaning and processing task that a data scientist or analyst might encounter in their job. I’ve been in the field for over 5 years and Pandas has always been enough to do what I need, except when working with very large datasets (e.g. billions of rows).

The tools that are a better fit for such large datasets have syntax similar to Pandas. For example, PySpark is kind of a combination of Pandas and SQL. Polars is very similar to Pandas in terms of syntax. Thus, learning and practicing Pandas is still a highly valuable skill for anyone working in the data science and AI domain.

Thank you for reading.

© 2024 Newsaiworld.com. All rights reserved.