
5 Lightweight Alternatives to Pandas You Should Try

Admin by Admin
December 14, 2025
in Data Science


Image by Author

 

# Introduction

 
Developers use pandas for data manipulation, but it can be slow, especially with large datasets. Because of this, many are looking for faster and lighter alternatives. These options keep the core features needed for analysis while focusing on speed, lower memory use, and simplicity. In this article, we look at five lightweight alternatives to pandas you can try.

 

# 1. DuckDB

 
DuckDB is like SQLite for analytics. You can run SQL queries directly on comma-separated values (CSV) files. It’s useful if you know SQL or work with machine learning pipelines. Install it with:

pip install duckdb

 

We’ll use the Titanic dataset and run a simple SQL query on it like this:

import duckdb

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv"

# Run the SQL query directly on the CSV
result = duckdb.query(f"""
    SELECT sex, age, survived
    FROM read_csv_auto('{url}')
    WHERE age > 18
""").to_df()

print(result.head())

 

Output:


      sex   age  survived
0    male  22.0         0
1  female  38.0         1
2  female  26.0         1
3  female  35.0         1
4    male  35.0         0

 

DuckDB runs the SQL query directly on the CSV file and then converts the output into a DataFrame. You get SQL speed with Python flexibility.

 

# 2. Polars

 
Polars is one of the most popular data libraries available today. It’s implemented in Rust and is exceptionally fast with minimal memory requirements. The syntax is also very clean. Let’s install it using pip:

pip install polars

 

Now, let’s use the Titanic dataset to cover a simple example:

import polars as pl

# Load dataset
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv"
df = pl.read_csv(url)

# Keep passengers older than 40 and select three columns
result = df.filter(pl.col("age") > 40).select(["sex", "age", "survived"])
print(result)

 

Output:


shape: (150, 3)
┌────────┬──────┬──────────┐
│ sex    ┆ age  ┆ survived │
│ ---    ┆ ---  ┆ ---      │
│ str    ┆ f64  ┆ i64      │
╞════════╪══════╪══════════╡
│ male   ┆ 54.0 ┆ 0        │
│ female ┆ 58.0 ┆ 1        │
│ female ┆ 55.0 ┆ 1        │
│ male   ┆ 66.0 ┆ 0        │
│ male   ┆ 42.0 ┆ 0        │
│ …      ┆ …    ┆ …        │
│ female ┆ 48.0 ┆ 1        │
│ female ┆ 42.0 ┆ 1        │
│ female ┆ 47.0 ┆ 1        │
│ male   ┆ 47.0 ┆ 0        │
│ female ┆ 56.0 ┆ 1        │
└────────┴──────┴──────────┘

 

Polars reads the CSV, filters rows based on an age condition, and selects a subset of the columns.

 

# 3. PyArrow

 
PyArrow is a lightweight library for columnar data. Tools like Polars use Apache Arrow for speed and memory efficiency. It’s not a full replacement for pandas but is excellent for reading files and preprocessing. Install it with:

pip install pyarrow

 

For our example, let’s use the Iris dataset in CSV form as follows:

import pyarrow.csv as csv
import pyarrow.compute as pc
import urllib.request

# Download the Iris CSV
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
local_file = "iris.csv"
urllib.request.urlretrieve(url, local_file)

# Read with PyArrow
table = csv.read_csv(local_file)

# Filter rows where sepal_length is greater than 5.0
filtered = table.filter(pc.greater(table['sepal_length'], 5.0))

print(filtered.slice(0, 5))

 

Output:


pyarrow.Table
sepal_length: double
sepal_width: double
petal_length: double
petal_width: double
species: string
----
sepal_length: [[5.1,5.4,5.4,5.8,5.7]]
sepal_width: [[3.5,3.9,3.7,4,4.4]]
petal_length: [[1.4,1.7,1.5,1.2,1.5]]
petal_width: [[0.2,0.4,0.2,0.2,0.4]]
species: [["setosa","setosa","setosa","setosa","setosa"]]

 

PyArrow reads the CSV and converts it into a columnar format. Each column’s name and type are listed in a clear schema. This setup makes it fast to inspect and filter large datasets.

 

# 4. Modin

 
Modin is for anyone who wants faster performance without learning a new library. It uses the same pandas API but runs operations in parallel. You don’t need to change your existing code; just update the import. Everything else works like normal pandas. Install it with pip:

pip install "modin[all]"

 

For better understanding, let’s try a small example using the same Titanic dataset as follows:

import modin.pandas as pd

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv"

# Load the dataset
df = pd.read_csv(url)

# Filter the dataset
adults = df[df["age"] > 18]

# Select only a few columns to display
adults_small = adults[["survived", "sex", "age", "class"]]

# Display the result
print(adults_small.head())

 

Output:


   survived     sex   age  class
0         0    male  22.0  Third
1         1  female  38.0  First
2         1  female  26.0  Third
3         1  female  35.0  First
4         0    male  35.0  Third

 

Modin spreads work across CPU cores, which means you can often get better performance without having to do anything extra.

 

# 5. Dask

 
How do you handle big data without adding RAM? Dask is a great choice when you have files that are larger than your computer’s random access memory (RAM). It uses lazy evaluation, so it doesn’t load the entire dataset into memory. This helps you process millions of rows smoothly. Install it with:

pip install "dask[complete]"

 

To try it out, we can use the Chicago Crime dataset, as follows:

import dask.dataframe as dd
import urllib.request

url = "https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD"
local_file = "chicago_crime.csv"
urllib.request.urlretrieve(url, local_file)

# Read CSV with Dask (lazy evaluation)
df = dd.read_csv(local_file, dtype=str)  # all columns as string

# Filter crimes classified as 'THEFT'
thefts = df[df['Primary Type'] == 'THEFT']

# Select a few relevant columns
thefts_small = thefts[["ID", "Date", "Primary Type", "Description", "District"]]

print(thefts_small.head())

 

Output:


          ID                    Date  Primary Type     Description  District
5   13204489  09/06/2023 11:00:00 AM         THEFT       OVER $500       001
50  13179181  08/17/2023 03:15:00 PM         THEFT    RETAIL THEFT       014
51  13179344  08/17/2023 07:25:00 PM         THEFT    RETAIL THEFT       014
53  13181885  08/20/2023 06:00:00 AM         THEFT  $500 AND UNDER       025
56  13184491  08/22/2023 11:44:00 AM         THEFT    RETAIL THEFT       014

 

Filtering (Primary Type == 'THEFT') and selecting columns are lazy operations: they return immediately because Dask only records what to do. The actual work runs when head() is called, and Dask processes the data in chunks rather than loading everything at once.
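
The idea of lazy, chunked processing can be imitated with plain Python generators. The sketch below is only a conceptual illustration using the standard library, not how Dask is implemented; the function names read_in_chunks and filter_thefts and the sample rows are hypothetical.

```python
import csv
import io
from itertools import islice

# Hypothetical sample data standing in for a large CSV file on disk.
SAMPLE_CSV = """ID,Primary Type,District
1,THEFT,001
2,BATTERY,014
3,THEFT,014
4,ASSAULT,025
5,THEFT,025
6,THEFT,001
"""


def read_in_chunks(handle, chunk_size):
    """Yield lists of at most chunk_size rows, one chunk at a time."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def filter_thefts(chunks):
    """Lazily keep THEFT rows; no work happens until rows are requested."""
    for chunk in chunks:
        for row in chunk:
            if row["Primary Type"] == "THEFT":
                yield row


# Building the pipeline reads nothing yet.
pipeline = filter_thefts(read_in_chunks(io.StringIO(SAMPLE_CSV), chunk_size=2))

# Asking for the first two matches pulls only as many chunks as needed.
first_two = list(islice(pipeline, 2))
print([row["ID"] for row in first_two])
```

Only the first two chunks are read to produce the two matches, which mirrors why asking for a handful of rows can be fast even when the file is very large.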

 

# Conclusion

 
We covered five alternatives to pandas and how to use them. The article keeps things simple and focused. Check the official documentation for each library for full details.

If you run into any issues, leave a comment and I’ll help.
 
 

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.


