Data Visualization Explained (Part 5): Visualizing Time-Series Data in Python (Matplotlib, Plotly, and Altair)

This is the fifth article in my data visualization series.

It’s time to start building your own data visualizations. In this article, I’ll walk through the process of visualizing time-series data in Python in detail. If you have not read the previous articles in my data visualization series, I strongly recommend reading at least the previous article for a review of Python.

Over the course of coding visualizations in Python, I’ll focus on three Python packages: Matplotlib, Plotly, and Altair. One approach to learning these might involve writing 1-2 articles per package, each one delving into the chosen package in detail. While this is a valid approach, the focus of my series is not on any particular library; it’s about the data visualization process itself. These packages are simply tools, a means to an end.

As a result, I’ll structure this article and the ones that follow around a particular type of data visualization, and I’ll discuss how to implement that visualization in each of the listed packages so that you have a breadth of approaches available to you.

First up: a definition of time-series data.

What Is Time-Series Data?

Formally, time-series data involves a variable that is a function of time. In simple terms, this just means data that changes over time.

For example, a public company’s stock price over the last ten years is time-series data. If you’d prefer a more scientific example, consider the weather. A graph depicting the daily temperature of your favorite city over the course of the year is a graph that depicts time-series data.
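To make the definition concrete, here is a minimal sketch of a time series in pandas. The temperatures are made-up values for illustration, and the names `dates` and `temps` are my own.

```python
import pandas as pd

# Hypothetical daily temperatures (degrees Celsius) for one week
dates = pd.date_range("2024-01-01", periods=7, freq="D")
temps = pd.Series([3.1, 4.0, 2.5, 1.8, 2.2, 5.0, 6.3], index=dates, name="temp_c")

# The index is time; the values are the variable that changes with it
third_day = temps[pd.Timestamp("2024-01-03")]   # value observed on 3 January
daily_change = temps.diff().dropna()            # day-over-day change

print(third_day)
print(daily_change)
```

The key idea is that every observation is tied to a timestamp, which is what lets a chart place it along a time axis.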

Time-series data is an excellent starting point for data visualization for a few reasons:

It is an extremely common and useful type of data. A great deal of information depends on time, and understanding it provides meaningful insight into the subject of interest going forward.
There are tried-and-true methods for visualizing time-series data effectively, as you’ll see below. Master these, and you’ll be in good shape.
Compared with some other types of data, time-series visualizations are fairly intuitive to humans and align with our perception of time. This makes it easier to focus on the basic elements of visualization design when starting out, instead of getting bogged down in trying to make sense of very complex data.

Let’s start by looking at different visualization methods on a conceptual level.

How Is Time-Series Data Visualized?

The standard for time-series visualization is the familiar line chart:

This chart generally puts time on the x-axis and the variable that changes with time on the y-axis. This provides a view that appears to be “moving forward,” in line with humans’ linear perception of time.

Though the line chart is the standard, there are other, related possibilities.

Multiple Line Chart

This approach is a direct extension of a single line chart and displays several related time series on the same plot, allowing comparison between groups or categories (e.g., sales by region):

Area Chart

Functionally, an area chart is almost exactly the same as a line chart, but the area under the line is filled in. It emphasizes the magnitude of change:

Stacked Area Chart

Technically, the stacked area chart is the analogue of the multiple line chart, but it is a bit trickier to read. In particular, the total is cumulative, with the baseline for each stacked line starting at the one below it. For instance, at 2023 in the chart below, “Ages 25-64” represents about 4 billion people, since we start counting where “Ages 15-24” ends.
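The cumulative baseline can be worked out by hand with a running sum. The sketch below uses made-up numbers for three hypothetical groups (not the figures from the chart) to show where each band starts and ends.

```python
import numpy as np

# Hypothetical values for three groups at two points in time
groups = ["Group A", "Group B", "Group C"]
values = np.array([
    [2.0, 2.0],   # Group A
    [1.2, 1.3],   # Group B
    [3.8, 4.0],   # Group C
])

tops = np.cumsum(values, axis=0)   # upper edge of each band
baselines = tops - values          # lower edge of each band (the band below's top)

# Report the last time point: the thickness of a band is its actual value
for name, low, high in zip(groups, baselines[:, -1], tops[:, -1]):
    print(f"{name}: drawn from {low:.1f} to {high:.1f}, thickness {high - low:.1f}")
```

When reading a stacked area chart, it is the thickness of a band that carries its value, not the height of its upper edge.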

Bar Chart (Vertical or Horizontal)

Finally, in some cases, a bar chart can also be appropriate for time-series visualization. This approach is useful if you wish to show discrete time intervals, such as the monthly sum or yearly average of some metric, rather than continuous data. That said, I will not be coding bar charts in this article.

Now, let’s get to actually building these visualizations. In each of the examples below, I’ll walk through the code in a particular visualization library for constructing line charts and area charts. I’ve linked the data here and encourage you to follow along. To internalize these techniques, it is important to practice using them yourself.
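Although bar charts are out of scope here, the aggregation step behind them is worth a quick sketch. Assuming a hypothetical daily metric (a constant 1.0 per day, chosen so the results are easy to check), pandas can produce the monthly sum and yearly average like this:

```python
import pandas as pd

# Hypothetical daily metric covering 1 Jan 2023 through 31 Dec 2024
idx = pd.date_range("2023-01-01", "2024-12-31", freq="D")
daily = pd.Series(1.0, index=idx, name="metric")

# One value per calendar month, labeled by the first day of the month
monthly_sum = daily.resample("MS").sum()

# One value per calendar year
yearly_mean = daily.groupby(daily.index.year).mean()

print(monthly_sum.head(3))
print(yearly_mean)
```

Each aggregated series has one value per interval, which is the shape a bar chart expects.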
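If you do not have the data file at hand, a stand-in can be generated. The sketch below is an assumption on my part: the column names match those used in the code in this article, but the monthly frequency, date range, and values are invented, and `make_sales_data` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def make_sales_data(periods=36, seed=0):
    """Build a made-up monthly sales table with the columns used in this article."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2021-01-01", periods=periods, freq="MS")
    trend = np.linspace(0, 50, periods)
    return pd.DataFrame({
        "Date": dates,
        "Product A Sales": 100 + trend + rng.normal(0, 5, periods),
        "Product B Sales": 80 + 0.5 * trend + rng.normal(0, 5, periods),
        "Product C Sales": 60 + 1.5 * trend + rng.normal(0, 5, periods),
    })

df = make_sales_data()
print(df.head())
```

Calling `make_sales_data().to_csv('sales_data.csv', index=False)` writes a file that the `pd.read_csv('sales_data.csv')` calls in the following sections can load.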

Coding Time-Series Visualizations in Matplotlib

import pandas as pd
import matplotlib.pyplot as plt

# Load data
df = pd.read_csv('sales_data.csv')
df['Date'] = pd.to_datetime(df['Date'])

# Example 1: Simple Line Chart
fig1, ax1 = plt.subplots(figsize=(10, 6))
ax1.plot(df['Date'], df['Product A Sales'], linewidth=2)
ax1.set_xlabel('Date')
ax1.set_ylabel('Sales')
ax1.set_title('Product A Sales Over Time')
ax1.grid(True, alpha=0.3)
plt.tight_layout()
# Show with: fig1

# Example 2: Multiple Line Chart
fig2, ax2 = plt.subplots(figsize=(10, 6))
ax2.plot(df['Date'], df['Product A Sales'], label='Product A', linewidth=2)
ax2.plot(df['Date'], df['Product B Sales'], label='Product B', linewidth=2)
ax2.plot(df['Date'], df['Product C Sales'], label='Product C', linewidth=2)
ax2.set_xlabel('Date')
ax2.set_ylabel('Sales')
ax2.set_title('Sales Comparison - All Products')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
# Show with: fig2

# Example 3: Area Chart
fig3, ax3 = plt.subplots(figsize=(10, 6))
ax3.fill_between(df['Date'], df['Product A Sales'], alpha=0.4)
ax3.plot(df['Date'], df['Product A Sales'], linewidth=2)
ax3.set_xlabel('Date')
ax3.set_ylabel('Sales')
ax3.set_title('Product A Sales - Area Chart')
ax3.grid(True, alpha=0.3)
plt.tight_layout()
# Show with: fig3

# Example 4: Stacked Area Chart
fig4, ax4 = plt.subplots(figsize=(10, 6))
ax4.stackplot(df['Date'], df['Product A Sales'], df['Product B Sales'], df['Product C Sales'],
              labels=['Product A', 'Product B', 'Product C'], alpha=0.7)
ax4.set_xlabel('Date')
ax4.set_ylabel('Sales')
ax4.set_title('Total Sales - Stacked Area Chart')
ax4.legend(loc='upper left')
ax4.grid(True, alpha=0.3)
plt.tight_layout()
# Show with: fig4

Running this code produces the following four visualizations:

Let’s break the code down step by step to make sure you understand what is happening:

First, we load the data from a CSV file into pandas and ensure the date is properly represented as a datetime object.
Matplotlib structures charts within the Figure object, which represents the entire canvas. This can be created directly using plt.figure, but keeping separate variables from plt.subplots is more intuitive when working with several visualizations. Every call to plt.subplots defines a new, separate Figure (canvas).
The line fig1, ax1 = plt.subplots(figsize=(10, 6)) defines the first subplot; fig1 represents the canvas, while ax1 represents the actual plotting area within it and is the variable where you’ll make most changes.
Matplotlib has different functions for different charts. The plot function plots 2-D points and then connects them to construct a line chart. This is what we specify in the line ax1.plot(df['Date'], df['Product A Sales'], linewidth=2).
The remaining lines are primarily aesthetic functions that do exactly what their names suggest: labeling axes, adding gridlines, and specifying layout.
For the multiple line chart, the code is precisely the same, except we call plot three times: once for each set of x-y points that we want to graph to show all the products.
The area chart is almost identical to the line chart, apart from the addition of ax3.fill_between(df['Date'], df['Product A Sales'], alpha=0.4), which tells Matplotlib to shade the area below the line.
The stacked area chart, in contrast, requires us to use the stackplot function, which takes in all three data arrays we want to plot at once. The remaining aesthetic code, however, is the same.
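To see the difference between the Figure (canvas) and the Axes (plotting area) in practice, here is a small sketch that places two charts on a single Figure instead of creating one Figure per chart. The data is made up, and the names `ax_line` and `ax_area` are my own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop it in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Small made-up dataset standing in for sales_data.csv
df = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=6, freq="MS"),
    "Product A Sales": [10, 12, 15, 14, 18, 21],
    "Product B Sales": [8, 9, 9, 11, 12, 12],
})

# One Figure (canvas) holding two Axes (plotting areas) side by side
fig, (ax_line, ax_area) = plt.subplots(1, 2, figsize=(12, 4))

ax_line.plot(df["Date"], df["Product A Sales"], label="Product A")
ax_line.plot(df["Date"], df["Product B Sales"], label="Product B")
ax_line.set_title("Multiple Line Chart")
ax_line.legend()

ax_area.stackplot(df["Date"], df["Product A Sales"], df["Product B Sales"],
                  labels=["Product A", "Product B"], alpha=0.7)
ax_area.set_title("Stacked Area Chart")
ax_area.legend(loc="upper left")

fig.autofmt_xdate()   # tilt the date labels so they do not overlap
fig.tight_layout()
```

Every plotting and labeling call goes through an Axes, while layout calls that affect the whole canvas go through the Figure.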

Try programming these yourself in your favorite IDE or in a Jupyter notebook. What patterns do you see? Which chart do you like the most?

Also, remember that you do not need to memorize this syntax, especially if you are new to programming data visualizations or new to Python in general. Focus on trying to understand what is happening on a conceptual level; you can always look up the particular syntax and plug your data in as needed.

This will hold true for the remaining two examples as well.

Coding Time-Series Visualizations in Plotly

Here is the code to generate the same visualizations as above, this time in Plotly’s style:

import pandas as pd
import plotly.graph_objects as go

# Load data
df = pd.read_csv('sales_data.csv')
df['Date'] = pd.to_datetime(df['Date'])

# Example 1: Simple Line Chart
fig1 = go.Figure()
fig1.add_trace(go.Scatter(x=df['Date'], y=df['Product A Sales'], mode='lines', name='Product A'))
fig1.update_layout(
    title='Product A Sales Over Time',
    xaxis_title='Date',
    yaxis_title='Sales',
    template='plotly_white'
)
# Show with: fig1

# Example 2: Multiple Line Chart
fig2 = go.Figure()
fig2.add_trace(go.Scatter(x=df['Date'], y=df['Product A Sales'], mode='lines', name='Product A'))
fig2.add_trace(go.Scatter(x=df['Date'], y=df['Product B Sales'], mode='lines', name='Product B'))
fig2.add_trace(go.Scatter(x=df['Date'], y=df['Product C Sales'], mode='lines', name='Product C'))
fig2.update_layout(
    title='Sales Comparison - All Products',
    xaxis_title='Date',
    yaxis_title='Sales',
    template='plotly_white'
)
# Show with: fig2

# Example 3: Area Chart
fig3 = go.Figure()
fig3.add_trace(go.Scatter(
    x=df['Date'], y=df['Product A Sales'],
    mode='lines',
    name='Product A',
    fill='tozeroy'
))
fig3.update_layout(
    title='Product A Sales - Area Chart',
    xaxis_title='Date',
    yaxis_title='Sales',
    template='plotly_white'
)
# Show with: fig3

# Example 4: Stacked Area Chart
fig4 = go.Figure()
fig4.add_trace(go.Scatter(
    x=df['Date'], y=df['Product A Sales'],
    mode='lines',
    name='Product A',
    stackgroup='one'
))
fig4.add_trace(go.Scatter(
    x=df['Date'], y=df['Product B Sales'],
    mode='lines',
    name='Product B',
    stackgroup='one'
))
fig4.add_trace(go.Scatter(
    x=df['Date'], y=df['Product C Sales'],
    mode='lines',
    name='Product C',
    stackgroup='one'
))
fig4.update_layout(
    title='Total Sales - Stacked Area Chart',
    xaxis_title='Date',
    yaxis_title='Sales',
    template='plotly_white'
)
# Show with: fig4

We obtain the following four visualizations:

Here is a breakdown of the code:

Plotly is fully independent of Matplotlib. It uses similarly named Figure objects, but does not have any ax objects.
The Scatter function with mode “lines” is used to build a line chart with the specified x- and y-axis data. You can think of the add_trace function as adding a new component to an existing Figure. Thus, for the multiple line chart, we simply call add_trace with the appropriate Scatter inputs three times.
For labeling and aesthetics in Plotly, use the update_layout function.
The area chart is built identically to the line chart, with the addition of the optional argument fill='tozeroy'.
- Upon first glance, this may look like some obscure color, but it is actually saying “TO ZERO Y,” specifying to Plotly the area that should be filled in.
- If you’re having trouble visualizing this, try changing the argument to “tozerox” and see what happens.
For the stacked area chart, we need a different optional parameter: stackgroup='one'. Adding this to each of the Scatter calls tells Plotly that they are all to be built as part of the same stack.

An advantage of Plotly is that by default, all Plotly charts are interactive and come with the ability to zoom, hover for tooltips, and toggle the legend. (Note that the images above are saved as PNGs, so you will need to generate the plots yourself in order to see this.)

Coding Time-Series Visualizations in Altair

Let’s finish off by producing these four visualizations in Altair. Here is the code:

import pandas as pd
import altair as alt

# Load data
df = pd.read_csv('sales_data.csv')
df['Date'] = pd.to_datetime(df['Date'])

# Example 1: Simple Line Chart
chart1 = alt.Chart(df).mark_line().encode(
    x='Date:T',
    y='Product A Sales:Q'
).properties(
    title='Product A Sales Over Time',
    width=700,
    height=400
)
# Show with: chart1

# Example 2: Multiple Line Chart
# Reshape data for Altair
df_melted = df.melt(id_vars='Date', var_name='product', value_name='sales')

chart2 = alt.Chart(df_melted).mark_line().encode(
    x='Date:T',
    y='sales:Q',
    color='product:N'
).properties(
    title='Sales Comparison - All Products',
    width=700,
    height=400
)
# Show with: chart2

# Example 3: Area Chart
chart3 = alt.Chart(df).mark_area(opacity=0.7).encode(
    x='Date:T',
    y='Product A Sales:Q'
).properties(
    title='Product A Sales - Area Chart',
    width=700,
    height=400
)
# Show with: chart3

# Example 4: Stacked Area Chart
chart4 = alt.Chart(df_melted).mark_area(opacity=0.7).encode(
    x='Date:T',
    y='sales:Q',
    color='product:N'
).properties(
    title='Total Sales - Stacked Area Chart',
    width=700,
    height=400
)
# Show with: chart4

We obtain the following charts:

Let’s break down the code:

Altair has a slightly different structure from Matplotlib and Plotly. It takes some practice to grasp, but once you understand it, its intuitiveness makes building new visualizations easy.
Everything in Altair revolves around the Chart object, into which you pass your data. Then, you use a mark_ function to specify what kind of chart you want to build, and the encode function to specify which variables correspond to which visual elements on the chart (e.g., x-axis, y-axis, color, size, etc.).
For the line chart, we use the mark_line function, and then specify that we want the date on the x-axis and the sales on the y-axis.
The melt function does not change the data itself, just its structure. It puts all the products into a single column, a “long format” that is more amenable to Altair’s visualization model. For more details, check out this helpful article.
Once we transform the data as above, we can build our multiple line chart simply by adding a “color” encoding, as shown in the code. This is possible because all the product types are now available in a single column, and we can tell Altair to distinguish them by color.
The code for generating area charts showcases the beauty of Altair’s structure. Everything stays the same: all you need to do is change the function being used to mark_area!
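The reshaping step is easier to follow on a tiny table. This sketch applies the same melt call to two made-up rows so that the wide and long layouts can be compared directly.

```python
import pandas as pd

# Wide format: one row per date, one column per product (made-up values)
wide = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-01", "2024-02-01"]),
    "Product A Sales": [10, 12],
    "Product B Sales": [8, 9],
    "Product C Sales": [5, 7],
})

# Long format: one row per (date, product) pair
long = wide.melt(id_vars="Date", var_name="product", value_name="sales")

print(wide.shape, "->", long.shape)   # (2, 4) -> (6, 3)
print(long)
```

The six sales values are unchanged; they have only moved from three columns into one, with the former column names recorded in the product column.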

As you explore other types of visualizations on your own (and in future articles!), Altair’s model for building visualizations will become easier to implement (and hopefully appreciate).

What’s Next?

In future articles, I’ll cover how to use these libraries to build more types of visualizations. As you continue learning, remember that the purpose of these articles is not to master any one tool. This is about learning data visualization holistically, and my hope is that you have walked away from this article with a better understanding of how time-series data is visualized.

As for the code, that comfort comes with time and practice. For now, you should feel free to take the examples above and modify them for your own data as needed.

Until next time.