

Image by Author | ChatGPT
# Introduction
Getting real-world data for your data science projects is often the hardest part. Toy datasets are easy to find, but for high-quality or real-time data you usually need to use APIs or build custom scraping pipelines to extract information from the web.
In this article, I share my 10 favorite free APIs, the ones I use daily for data collection, data integration, and building AI agents. These APIs are organized into five categories, spanning trusted data repositories, web scraping, and web search, so you can quickly choose the right tool and move from data to insight faster.
# Foundational Data Repositories
A foundational data repository is a community-based platform where different organizations and open-source contributors share their datasets with the wider world. With a simple command, you can access these datasets in your project.
// 1. Kaggle API
Kaggle datasets are extremely popular for data science projects. Instead of downloading them manually, you can create a data pipeline that automatically downloads the dataset, unzips it, and loads it into your workspace.
These datasets are shared by the open-source community for everyone to use. To get started, generate an API key from your Kaggle account and set it as an environment variable. After that, you can run the following command in your terminal. Kaggle also provides a Python SDK, which allows for easy integration with your code.
kaggle datasets download -d kingabzpro/world-vaccine-progress -p data --unzip
// 2. Hugging Face CLI
Similar to Kaggle, Hugging Face is also a data science and machine learning community where people share datasets, models, and demos. You can easily install the Hugging Face CLI and integrate it into your workflows using either CLI commands or Python code. Both options allow you to download datasets without needing an API key.
An API key is only required when the dataset is gated.
hf download kingabzpro/dermatology-qa-firecrawl-dataset --repo-type dataset
# Web and Crawling APIs
The web contains a wide variety of data. If you can't find the information you need on the platforms mentioned above, you may need to curate your own data by scraping the web or using a web search API.
// 3. Firecrawl
Firecrawl provides an API for extracting content from websites and converting it into markdown for easier AI integrations. It also comes with a scraping and extraction API that is integrated with an LLM (large language model) for advanced web scraping options.
This API is a must-have. I use it every day for data creation and for integrating it into my AI projects.
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://abid.work",
    "formats": ["markdown", "html"]
  }'
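The same scrape request can be prepared from Python. The sketch below uses only the standard library and mirrors the endpoint, headers, and JSON body of the curl call above; the helper names `build_scrape_request` and `send` are hypothetical, and the layout of the response is deliberately not assumed.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this article.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v2/scrape"


def build_scrape_request(page_url, api_key, formats=("markdown", "html")):
    """Build (but do not send) a POST request for the scrape endpoint.

    Hypothetical helper: it only reproduces the curl call's headers and body.
    """
    payload = {"url": page_url, "formats": list(formats)}
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON reply (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

To run it, read the key from the `FIRECRAWL_API_KEY` environment variable and call `send(build_scrape_request("https://abid.work", key))`. Keeping the request builder separate from the network call makes the payload easy to inspect before any credits are spent.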
// 4. Tavily
Tavily is a fast web search API that provides 1,000 search requests per month for free. It is both accurate and quick. You can use it to create datasets, integrate it into your AI projects, or use it as a simple search API for your development needs.
curl --request POST \
  --url https://api.tavily.com/search \
  --header "Authorization: Bearer YOUR_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "query": "who is Leo Messi?",
    "auto_parameters": false,
    "topic": "general",
    "search_depth": "basic",
    "chunks_per_source": 3,
    "max_results": 1,
    "days": 7,
    "include_answer": true,
    "include_raw_content": true,
    "include_images": false,
    "include_image_descriptions": false,
    "include_favicon": false,
    "include_domains": [],
    "exclude_domains": [],
    "country": null
  }'
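If you are using search results to build a dataset, the JSON reply usually needs flattening into rows. The sketch below is illustrative only: it assumes the reply carries a `results` list whose items have `title`, `url`, `content`, and `score` fields, which you should confirm against the Tavily documentation. `results_to_rows` is a hypothetical helper name.

```python
def results_to_rows(response):
    """Flatten a search reply into a list of flat dicts.

    Assumption: the reply is a dict with a "results" list; missing fields
    become None instead of raising, so partial replies still load.
    """
    rows = []
    for item in response.get("results", []):
        rows.append(
            {
                "title": item.get("title"),
                "url": item.get("url"),
                "content": item.get("content"),
                "score": item.get("score"),
            }
        )
    return rows


# Hand-written sample in the assumed shape, not real API output.
sample = {
    "query": "who is Leo Messi?",
    "results": [
        {"title": "Example page", "url": "https://example.com", "content": "text", "score": 0.9}
    ],
}
rows = results_to_rows(sample)
```

Each row can then be appended to a CSV file or passed to a data frame constructor.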
# Geospatial and Weather APIs
If you work with weather and geospatial datasets, you know that conditions keep changing. That is why you need real-time access to these datasets via API.
// 5. OpenWeatherMap
OpenWeatherMap is a service that provides global weather data via APIs, including current conditions, forecasts, nowcasts, historical records, and even minute-by-minute hyperlocal precipitation forecasts.
curl "https://api.openweathermap.org/data/2.5/weather?q=London&appid=YOUR_API_KEY&units=metric"
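A small Python sketch of the same call: one function builds the request URL with properly encoded parameters, and another pulls a few fields out of the reply. The field names (`name`, `main.temp`, `main.humidity`, `weather[0].description`) are assumptions about the current-weather response layout, and both helper names are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl example in this article.
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city, api_key, units="metric"):
    """Return the request URL with query parameters safely encoded."""
    params = {"q": city, "appid": api_key, "units": units}
    return f"{BASE_URL}?{urlencode(params)}"


def summarize(payload):
    """Extract a few fields from a reply dict (assumed layout, see lead-in)."""
    main = payload.get("main", {})
    weather = payload.get("weather") or [{}]
    return {
        "city": payload.get("name"),
        "temp": main.get("temp"),
        "humidity": main.get("humidity"),
        "description": weather[0].get("description"),
    }


# Hand-written sample in the assumed shape, not real API output.
sample = {
    "name": "London",
    "main": {"temp": 12.3, "humidity": 81},
    "weather": [{"description": "light rain"}],
}
summary = summarize(sample)
```

Using `urlencode` matters for city names with spaces or accents, which would otherwise produce an invalid URL.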
// 6. OpenStreetMap
OpenStreetMap provides world map data, and the Overpass API is a read-only web database that serves custom-selected parts of OSM and can be queried with Overpass QL. The example below fetches cafe nodes within a small London bounding box.
curl -G "https://overpass-api.de/api/interpreter" \
  --data-urlencode 'data=[out:json];node["amenity"="cafe"](51.50,-0.15,51.52,-0.10);out;'
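The query string in that request is easy to get wrong by hand, so it can help to generate it. The sketch below builds the same Overpass QL query from a bounding box given in the order south, west, north, east (latitude before longitude), then encodes it into a GET URL. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl example in this article.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def amenity_query(amenity, south, west, north, east):
    """Return an Overpass QL query for nodes with the given amenity tag.

    The bounding box order is (south, west, north, east).
    """
    bbox = f"{south},{west},{north},{east}"
    return f'[out:json];node["amenity"="{amenity}"]({bbox});out;'


def overpass_get_url(query):
    """Encode the query into the `data` parameter of a GET URL."""
    return f"{OVERPASS_URL}?{urlencode({'data': query})}"


query = amenity_query("cafe", 51.50, -0.15, 51.52, -0.10)
url = overpass_get_url(query)
```

Python drops trailing zeros when formatting floats, so the generated box reads `51.5,-0.15,51.52,-0.1`, which describes the same area as the curl example.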
# Financial Market Data APIs
Financial market data APIs are highly recommended if you are working on a financial project and need real-time data on stocks, crypto, and other finance-related information and news.
// 7. Alpha Vantage
Alpha Vantage is a financial data platform offering free APIs for real-time and historical market data across stocks, forex, cryptocurrencies, commodities, and options, with outputs in JSON or CSV. It also provides chart-ready time series at intraday, daily, weekly, and monthly intervals, and over 50 technical indicators for analysis.
curl "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=IBM&apikey=YOUR_API_KEY"
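The JSON reply nests prices under date keys with string values, so a little reshaping is needed before analysis. The sketch below assumes the daily series sits under a `Time Series (Daily)` key with numbered field names such as `1. open` and `4. close`; verify these against a real response. `daily_rows` is a hypothetical helper.

```python
def daily_rows(payload):
    """Convert a daily time-series reply into date-sorted rows of numbers.

    Assumption: payload["Time Series (Daily)"] maps ISO dates to dicts with
    keys "1. open", "2. high", "3. low", "4. close", "5. volume" (all strings).
    """
    series = payload.get("Time Series (Daily)", {})
    rows = []
    for day in sorted(series):  # ISO dates sort correctly as plain strings
        bar = series[day]
        rows.append(
            {
                "date": day,
                "open": float(bar["1. open"]),
                "high": float(bar["2. high"]),
                "low": float(bar["3. low"]),
                "close": float(bar["4. close"]),
                "volume": int(bar["5. volume"]),
            }
        )
    return rows


# Hand-written sample in the assumed shape, not real market data.
sample = {
    "Time Series (Daily)": {
        "2024-01-03": {"1. open": "11.0", "2. high": "12.0", "3. low": "10.5", "4. close": "11.5", "5. volume": "200"},
        "2024-01-02": {"1. open": "10.0", "2. high": "11.0", "3. low": "9.5", "4. close": "10.5", "5. volume": "100"},
    }
}
rows = daily_rows(sample)
```

An empty result is worth checking for, since a reply without the series key yields no rows rather than an exception.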
// 8. Yahoo Finance
Many beginners and practitioners use the yfinance library to access stock quotes, historical time series data, dividends and splits, as well as basic metadata. This allows them to create analysis-ready data frames for quick prototypes and classroom projects.
Yahoo Finance offers free stock quotes, news, portfolio tools, and coverage of international markets, enabling users to explore a wide range of market data at no direct cost.
import yfinance as yf
print(yf.download("AAPL", period="1y").head())
# Social and Community Data APIs
If you are working on a project to analyze text and community conversations from top social media platforms, these APIs provide easy access to real social media data.
// 9. Reddit
Reddit offers a rich, community-driven data source, and the Python Reddit API Wrapper (PRAW) makes it simple to access the official Reddit API for tasks like fetching posts, comments, and subreddit metadata in Python.
PRAW works by sending requests to Reddit's API under the hood and is commonly used in teaching and research to collect discussion threads for analysis.
import praw
r = praw.Reddit(
client_id="ID",
client_secret="SECRET",
user_agent="myapp:ds-project:v1 (by u/yourname)"
)
print([s.title for s in r.subreddit("Python").hot(limit=5)])
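For analysis you will usually want more than titles. The sketch below flattens submission objects into plain dict rows using the `id`, `title`, `score`, and `num_comments` attributes; `submissions_to_rows` is a hypothetical helper, and the stand-in objects only imitate those attributes so the example runs without credentials.

```python
from types import SimpleNamespace


def submissions_to_rows(submissions):
    """Turn an iterable of submission-like objects into flat dict rows."""
    return [
        {
            "id": s.id,
            "title": s.title,
            "score": s.score,
            "num_comments": s.num_comments,
        }
        for s in submissions
    ]


# Stand-in objects with made-up values, used in place of live API results.
fake_posts = [
    SimpleNamespace(id="abc1", title="First post", score=10, num_comments=2),
    SimpleNamespace(id="abc2", title="Second post", score=5, num_comments=0),
]
rows = submissions_to_rows(fake_posts)
```

With a configured client, the same helper can be called as `submissions_to_rows(r.subreddit("Python").hot(limit=5))`, and the resulting rows can be loaded straight into a data frame.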
// 10. X
X (formerly known as Twitter) provides a developer platform with REST endpoints for user and content retrieval, plus streaming options for real-time data. Access generally requires authentication, adherence to rate limits and policy, and selecting an access tier appropriate for your volume and use case.
curl -H "Authorization: Bearer YOUR_BEARER_TOKEN" \
  "https://api.x.com/2/users/by/username/jack"
# Final Thoughts
These APIs provide free access to data that is often difficult to obtain. They greatly enhance your ability to gather web data or improve your web scraping efforts, allowing you to create customized datasets.
I highly recommend bookmarking this article to revisit when you need high-quality, real-time data from the web. By leveraging these APIs, you can unlock valuable insights that will aid your research and analysis.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in technology management and a bachelor's degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.


















