Accessing Data Commons with the New Python API Client

Image by Editor

# Introduction

Data is at the core of any data professional’s work. Without useful and valid data sources, we cannot carry out our responsibilities. Moreover, poor-quality or irrelevant data will only cause our work to go to waste. That’s why access to reliable datasets is an important starting point for data professionals.

Data Commons is an open-source initiative by Google to organize the world’s available data and make it accessible for everyone to use. It’s free for anyone to query publicly available data. What sets Data Commons apart from other public dataset projects is that it already performs the schematic work, making data ready to use much more quickly.

Given the usefulness of Data Commons for our work, accessing it is becoming essential for many data tasks. Fortunately, Data Commons provides a new Python API client to access these datasets.

# Accessing Data Commons with Python

Data Commons works by organizing data into a queryable knowledge graph that unifies information from diverse sources. At its core, it uses the schema-based model from schema.org to standardize data representations.

Using this schema, Data Commons can connect data from various sources into a single graph where nodes represent entities (such as cities, locations, and people), events, and statistical variables. Edges depict the relationships between these nodes. Each node is unique and identifiable by a DCID (Data Commons ID), and many nodes include observations: measurements linked to a variable, an entity, and a period.
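To make the node, edge, and observation vocabulary concrete, here is a minimal toy sketch in plain Python. It is not how Data Commons stores its graph internally; the class names, the helper functions, the single `containedInPlace` edge, and the observation values are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node identified by a unique DCID."""
    dcid: str
    name: str
    # Outgoing edges: property label -> list of target DCIDs.
    edges: dict[str, list[str]] = field(default_factory=dict)


@dataclass(frozen=True)
class Observation:
    """A measurement tied to a variable, an entity, and a period."""
    variable: str
    entity: str
    date: str
    value: float


# Toy graph keyed by DCID, so each node is stored exactly once.
graph = {
    "country/IDN": Node("country/IDN", "Indonesia", {"containedInPlace": ["asia"]}),
    "asia": Node("asia", "Asia"),
}

# Placeholder values for illustration only, not real statistics.
observations = [
    Observation("worldBank/GFDD_AI_25", "country/IDN", "2019", 1.0),
    Observation("worldBank/GFDD_AI_25", "country/IDN", "2020", 2.0),
]


def neighbors(dcid: str, prop: str) -> list[str]:
    """Follow edges labelled `prop` from a node and return the target names."""
    return [graph[target].name for target in graph[dcid].edges.get(prop, [])]


def series(variable: str, entity: str) -> dict[str, float]:
    """Collect observations for one variable/entity pair, keyed by date."""
    return {o.date: o.value for o in observations
            if o.variable == variable and o.entity == entity}
```

Keying everything by DCID is the point of the sketch: the same identifier links a node, its edges, and its observations.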

With the Python API, we can easily access the knowledge graph to acquire the necessary data. Let’s take a look at how to do that.

First, we need to acquire a free API key to access Data Commons. Create a free account and copy the API key to a secure location. You can also use the trial API key, but access is more limited.

Next, install the Data Commons Python library. We will use the V2 API client, as it is the latest version. Run the following command to install the Data Commons client with optional support for Pandas DataFrames.

pip install "datacommons-client[Pandas]"

With the library installed, we’re ready to fetch data using the Data Commons Python client.

To create the client that will access the data from the cloud, run the following code.

from datacommons_client.client import DataCommonsClient

client = DataCommonsClient(api_key="YOUR-API-KEY")
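Rather than hardcoding the key in a script, one option is to read it from an environment variable. The sketch below is only a suggestion: the variable name `DC_API_KEY` and the helper `load_api_key` are hypothetical and not part of the Data Commons library.

```python
import os


def load_api_key(env=os.environ, var_name="DC_API_KEY"):
    """Return the API key stored in `var_name`, or raise a clear error.

    `DC_API_KEY` is a hypothetical variable name chosen for this example.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

The returned string can then be passed as the `api_key` argument when constructing `DataCommonsClient`, which keeps the key out of version control.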

One of the most important concepts in Data Commons is the entity, which refers to a persistent, physical thing in the real world, such as a city or a country. It is an important part of fetching data, as most datasets require specifying the entity. You can visit the Data Commons Place page to learn about all available entities.

For most users, the data we want to acquire is more specific: the statistical variables stored in Data Commons. To select the data we want to retrieve, we need to know the DCID of the statistical variable, which you can find via the Statistical Variable Explorer.

You can filter variables and select a dataset from the options in the explorer. For example, choose the World Bank dataset for “ATMs per 100,000 adults.” In this case, you can obtain the DCID by examining the information provided in the explorer.

If you click on the DCID, you can see all the information related to the node, including how it connects to other information.

Along with the statistical variable DCID, we also need to specify the entity DCID for the geography. We can explore the Data Commons Place page mentioned above, or we can use the following code to see the available DCIDs for a certain place name.

# Look up DCIDs by place name (returns multiple candidates)
resp = client.resolve.fetch_dcids_by_name(names="Indonesia").to_dict()
dcid_list = [c["dcid"] for c in resp["entities"][0]["candidates"]]
print(dcid_list)

With output similar to the following:

['country/IDN', 'geoId/...' , '...']

Using the code above, we fetch the DCID candidates available for a specific place name. For example, among the candidates for “Indonesia,” we can select country/IDN as the country DCID.
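When several names are resolved at once, or a name has no match, indexing `[0]` directly can be fragile. The sketch below parses a response dictionary more defensively. The `entities`, `candidates`, and `dcid` keys follow the snippet above, while the `node` key, both helper functions, and the sample data (including the made-up `example/other` DCID) are assumptions for illustration rather than real API output.

```python
def candidates_by_name(resp):
    """Map each queried name to its list of candidate DCIDs."""
    result = {}
    for entity in resp.get("entities", []):
        # "node" is assumed to hold the queried name.
        name = entity.get("node", "")
        result[name] = [c["dcid"] for c in entity.get("candidates", [])]
    return result


def pick_dcid(candidates, prefix="country/"):
    """Return the first candidate starting with `prefix`, or None."""
    return next((d for d in candidates if d.startswith(prefix)), None)


# Hand-written sample in the assumed response shape (not real API output).
sample = {
    "entities": [
        {"node": "Indonesia",
         "candidates": [{"dcid": "example/other"}, {"dcid": "country/IDN"}]},
        {"node": "Atlantis", "candidates": []},
    ]
}

found = candidates_by_name(sample)
print(pick_dcid(found["Indonesia"]))  # country/IDN
print(pick_dcid(found["Atlantis"]))   # None
```

Filtering by the `country/` prefix is one simple way to choose among candidates; other place types would need a different rule.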

All the information we need is now ready, and we only need to execute the following code:

variable = ["worldBank/GFDD_AI_25"]
entity = ["country/IDN"]

df = client.observations_dataframe(
    variable_dcids=variable,
    date="all",
    entity_dcids=entity
)

The result is returned as a Pandas DataFrame.

The code above returns all available observations for the selected variables and entities across the entire time frame. You will also notice that we’re using lists instead of single strings.

This is because we can pass multiple variables and entities simultaneously to acquire a combined dataset. For example, the code below fetches two distinct statistical variables for two entities at once.

variable = ["worldBank/GFDD_AI_25", "worldBank/SP_DYN_LE60_FE_IN"]
entity = ["country/IDN", "country/USA"]

df = client.observations_dataframe(
    variable_dcids=variable,
    date="all",
    entity_dcids=entity
)

With output like the next:

You may see that the ensuing DataFrame combines the variables and entities you set beforehand. With this technique, you may purchase the information you want with out executing separate queries for every mixture.
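If the DCIDs come from user input or another script, it may help to normalize them before making the single combined call. The following is a minimal sketch: `as_list` and `fetch_observations` are hypothetical helpers, and the only thing assumed about the client is the `observations_dataframe` keyword arguments used above.

```python
def as_list(value):
    """Wrap a single string in a list; copy any other iterable into a list."""
    return [value] if isinstance(value, str) else list(value)


def fetch_observations(client, variables, entities, date="all"):
    """Call observations_dataframe once with de-duplicated DCID lists."""
    # dict.fromkeys removes duplicates while preserving the original order.
    variable_dcids = list(dict.fromkeys(as_list(variables)))
    entity_dcids = list(dict.fromkeys(as_list(entities)))
    return client.observations_dataframe(
        variable_dcids=variable_dcids,
        date=date,
        entity_dcids=entity_dcids,
    )
```

With the client created earlier, a call such as `fetch_observations(client, "worldBank/GFDD_AI_25", ["country/IDN", "country/USA"])` would issue one request for both countries.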

That’s all you need to know about accessing Data Commons with the new Python API client. Use this library whenever you need reliable public data for your work.

# Wrapping Up

Data Commons is an open-source project by Google aimed at democratizing data access. The project is inherently different from many public data projects, as the datasets are built on top of a knowledge graph schema, which makes the data easier to unify.

In this article, we explored how to access datasets within the graph using Python, leveraging statistical variables and entities to retrieve observations.

I hope this has helped!

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.
