May 15, 2026
Time-Series Feature Engineering with Python Itertools
 

# Introduction

 
Time series feature engineering doesn't follow the same rules as tabular data. Observations aren't independent, row order isn't incidental, and the most useful features are rarely individual readings. You often need to identify patterns across time, such as rates of change, lag comparisons, and deviations from a rolling baseline.

Building lags, sliding windows, and grouping across resolutions are all, at their core, iteration problems over ordered sequences. Python's itertools module is a natural fit for this kind of work. It doesn't replace high-level pandas abstractions like .rolling(), but it gives you lower-level building blocks to construct exactly the features you need, with full control over the logic.

In this article, you'll build seven categories of time series features using itertools. You'll also apply each to a sample dataset.

You can get the code on GitHub.

 

# Creating a Sample Dataset

Before we start building the features, let's spin up a sample sensor dataset to work with throughout the article.

import numpy as np
import pandas as pd
import itertools

np.random.seed(42)

periods = 168  # one week of hourly readings
index = pd.date_range(start="2024-03-01", periods=periods, freq="h")
hours = np.arange(periods)

# Temperature (°C): daily cycle + slow drift + noise
temp_base = 3.5
temp_daily = 1.2 * np.sin(2 * np.pi * hours / 24)
temp_drift = 0.003 * hours
temp_noise = np.random.normal(0, 0.3, periods)
temperature = temp_base + temp_daily + temp_drift + temp_noise

# Humidity (%): inverse relationship with temperature + noise
humidity = 78 - 2.1 * (temperature - temp_base) + np.random.normal(0, 1.2, periods)

# Power draw (kW): peaks during business hours, higher on weekdays
day_of_week = index.dayofweek
business_hours = ((index.hour >= 8) & (index.hour <= 18)).astype(int)
weekend_factor = np.where(day_of_week >= 5, 0.6, 1.0)
power = (
    42.0
    + 18.0 * business_hours * weekend_factor
    + np.random.normal(0, 2.1, periods)
)

df = pd.DataFrame({
    "temperature_c": np.round(temperature, 3),
    "humidity_pct":  np.round(humidity, 2),
    "power_kw":      np.round(power, 2),
}, index=index)
df.index.name = "timestamp"

print(df.head(8))
print(f"\nShape: {df.shape}")

 

Output:

                     temperature_c  humidity_pct  power_kw
timestamp
2024-03-01 00:00:00          3.649         77.39     40.27
2024-03-01 01:00:00          3.772         76.52     41.33
2024-03-01 02:00:00          4.300         75.25     42.87
2024-03-01 03:00:00          4.814         74.26     40.82
2024-03-01 04:00:00          4.481         75.85     40.27
2024-03-01 05:00:00          4.604         76.09     42.51
2024-03-01 06:00:00          5.192         74.78     42.51
2024-03-01 07:00:00          4.910         76.03     40.94

Shape: (168, 3)

 

We now have 168 hourly readings across three sensor channels. Now let's build features.

 

# 1. Generating Lag Features with islice

Lag features are the most fundamental time series feature: the value of a variable at a fixed number of steps in the past. For example, values from 1 step ago, 6 steps ago, or 24 steps ago can each capture distinct patterns such as short-term fluctuations, recurring intra-period behavior, and longer-term trends or seasonality.

Let's build lag features for our sample dataset using islice:

sensor_readings = df["temperature_c"].tolist()
lag_offsets = [1, 6, 12, 24]

lag_features = {}
for lag in lag_offsets:
    lagged = list(itertools.islice(sensor_readings, 0, len(sensor_readings) - lag))
    # Pad the beginning with None to preserve index alignment
    lag_features[f"temp_lag_{lag}h"] = [None] * lag + lagged

lag_df = pd.DataFrame(lag_features, index=df.index)
lag_df["temperature_c"] = df["temperature_c"]

print(lag_df.iloc[24:30])

 

Output:

                     temp_lag_1h  temp_lag_6h  temp_lag_12h  temp_lag_24h  
timestamp
2024-03-02 00:00:00        2.831        2.082         3.609         3.649
2024-03-02 01:00:00        3.409        1.974         2.654         3.772
2024-03-02 02:00:00        3.919        2.960         2.425         4.300
2024-03-02 03:00:00        3.833        2.647         2.528         4.814
2024-03-02 04:00:00        4.542        2.986         2.205         4.481
2024-03-02 05:00:00        4.443        2.831         2.486         4.604

                     temperature_c
timestamp
2024-03-02 00:00:00          3.409
2024-03-02 01:00:00          3.919
2024-03-02 02:00:00          3.833
2024-03-02 03:00:00          4.542
2024-03-02 04:00:00          4.443
2024-03-02 05:00:00          4.659

 

islice(sensor_readings, 0, len(sensor_readings) - lag) takes the first len - lag readings, and the None padding at the front shifts them back by lag steps so that every lag feature stays aligned with the original index. islice itself is lazy and copies nothing; the values are only materialized when the result is collected into a list. The alignment matters when you later drop NaNs for model training.
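
To make the alignment concrete, here is a minimal sketch on a five-element list. The helper name lag_feature is hypothetical (it is not part of the article's code), and it assumes 0 <= lag <= len(values).

```python
import itertools

def lag_feature(values, lag):
    """Return `values` shifted back by `lag` steps, padded with None at the front."""
    # islice is lazy: nothing is copied until the chain is consumed below
    head = itertools.islice(values, 0, len(values) - lag)
    return list(itertools.chain(itertools.repeat(None, lag), head))

readings = [10, 11, 12, 13, 14]
lag_2 = lag_feature(readings, 2)
print(lag_2)  # [None, None, 10, 11, 12]
```

Position i of the result holds the reading from i - 2, so the lagged column has the same length as the input and can share its index.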

 

# 2. Building Rolling Window Features with islice and accumulate

A single lag value tells you what the sensor read at a point in the past. A rolling statistic tells you what the sensor has been doing over a window of time, which is often far more useful.

readings = df["temperature_c"].tolist()
window_size = 6  # 6-hour rolling window

rolling_features = []

for i in range(len(readings)):
    if i < window_size:
        rolling_features.append({
            "rolling_mean_6h": None,
            "rolling_std_6h":  None,
            "rolling_min_6h":  None,
            "rolling_max_6h":  None,
        })
        continue

    # The window covers the 6 readings before position i (current reading excluded)
    window = list(itertools.islice(readings, i - window_size, i))

    # Use accumulate to compute the running sum for the mean
    running_sum = list(itertools.accumulate(window))
    window_mean = running_sum[-1] / window_size
    window_mean_sq = sum(x**2 for x in window) / window_size
    # Clamp at zero so floating-point error cannot produce a negative variance
    window_var = max(window_mean_sq - window_mean**2, 0.0)

    rolling_features.append({
        "rolling_mean_6h": round(window_mean, 4),
        "rolling_std_6h":  round(window_var ** 0.5, 4),
        "rolling_min_6h":  round(min(window), 4),
        "rolling_max_6h":  round(max(window), 4),
    })

roll_df = pd.DataFrame(rolling_features, index=df.index)
roll_df["temperature_c"] = df["temperature_c"]

print(roll_df.iloc[6:12])

 

Output:

                     rolling_mean_6h  rolling_std_6h  rolling_min_6h  
timestamp
2024-03-01 06:00:00           4.2700          0.4256           3.649
2024-03-01 07:00:00           4.5272          0.4386           3.772
2024-03-01 08:00:00           4.7168          0.2929           4.300
2024-03-01 09:00:00           4.7372          0.2662           4.422
2024-03-01 10:00:00           4.6912          0.2728           4.422
2024-03-01 11:00:00           4.6095          0.3769           3.991

                     rolling_max_6h  temperature_c
timestamp
2024-03-01 06:00:00           4.814          5.192
2024-03-01 07:00:00           5.192          4.910
2024-03-01 08:00:00           5.192          4.422
2024-03-01 09:00:00           5.192          4.538
2024-03-01 10:00:00           5.192          3.991
2024-03-01 11:00:00           5.192          3.704

 

The accumulate call here computes the running sum of the window, so its last element, running_sum[-1], is the window total and no separate sum() call is needed for the mean. When large datasets are processed in a streaming fashion, avoiding redundant passes over the same data helps efficiency.
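
One way to push the single-pass idea further is to run accumulate once over the whole series and read every window total as the difference of two prefix sums. This is a sketch of that alternative, not the article's method; the name rolling_means_prefix is hypothetical, and it keeps the same convention of averaging the readings before each position.

```python
import itertools

def rolling_means_prefix(values, window):
    """Mean of the `window` values before each position; None until enough history exists."""
    # prefix[k] is the sum of values[:k]; initial=0 makes prefix[0] == 0
    prefix = list(itertools.accumulate(values, initial=0))
    means = []
    for i in range(len(values)):
        if i < window:
            means.append(None)
        else:
            # Sum of values[i - window:i] from two prefix sums
            means.append((prefix[i] - prefix[i - window]) / window)
    return means

means = rolling_means_prefix([1, 2, 3, 4, 5, 6], 3)
print(means)  # [None, None, None, 2.0, 3.0, 4.0]
```

Each window then costs one subtraction instead of a fresh pass over its elements. On long floating-point series the prefix sums can accumulate rounding error, so this trade-off is worth checking against your data.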

 

# 3. Creating Seasonal Interaction Features with product

Many time series exhibit layered seasonality, where multiple temporal cycles interact, such as time of day, day of week, and broader operational or cyclical periods. Interaction features that combine these dimensions can capture patterns that individual time components alone may miss.

Now let's build interaction features with product:

hours_of_day = list(range(24))
day_types = ["weekday", "weekend"]
operational_shifts = ["off_peak", "on_peak"]  # on_peak: 08:00–18:00

# Build a full lookup grid for all combinations
season_grid = list(itertools.product(hours_of_day, day_types, operational_shifts))
season_df = pd.DataFrame(season_grid, columns=["hour", "day_type", "shift"])

# Simulate expected baseline temperature per combination
np.random.seed(14)
season_df["baseline_temp_c"] = np.round(
    3.5
    + 0.8 * np.sin(2 * np.pi * season_df["hour"] / 24)
    + np.where(season_df["day_type"] == "weekend", 0.3, 0.0)
    + np.where(season_df["shift"] == "on_peak", 0.5, 0.0)
    + np.random.normal(0, 0.1, len(season_df)),
    3
)

print(season_df[season_df["hour"].isin([0, 8, 14, 20])].head(16).to_string(index=False))
print(f"\nTotal grid combinations: {len(season_df)}")

 

Output:

hour day_type    shift  baseline_temp_c
   0  weekday off_peak            3.655
   0  weekday  on_peak            4.008
   0  weekend off_peak            3.817
   0  weekend  on_peak            4.293
   8  weekday off_peak            4.325
   8  weekday  on_peak            4.601
   8  weekend off_peak            4.446
   8  weekend  on_peak            4.978
  14  weekday off_peak            3.370
  14  weekday  on_peak            3.628
  14  weekend off_peak            3.279
  14  weekend  on_peak            3.959
  20  weekday off_peak            2.726
  20  weekday  on_peak            3.256
  20  weekend off_peak            3.056
  20  weekend  on_peak            3.530

Total grid combinations: 96

 

This grid merges back onto your main dataset as a baseline_temp_c feature per row, giving every reading a context-aware expected value. The deviation from that baseline, temperature_c - baseline_temp_c, is then a useful anomaly detection feature.
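
The lookup-and-subtract step can be sketched with a plain dictionary keyed by the product tuples. Everything here is illustrative: expected_temp is a made-up deterministic rule standing in for the simulated baseline, context_key is a hypothetical helper, and the two observations are invented.

```python
import itertools

hours = range(24)
day_types = ["weekday", "weekend"]
shifts = ["off_peak", "on_peak"]

# Hypothetical baseline rule, standing in for the simulated baseline_temp_c column
def expected_temp(hour, day_type, shift):
    return 3.5 + (0.3 if day_type == "weekend" else 0.0) + (0.5 if shift == "on_peak" else 0.0)

# One baseline value per (hour, day_type, shift) combination
baseline = {
    key: expected_temp(*key)
    for key in itertools.product(hours, day_types, shifts)
}

def context_key(hour, weekday_index):
    """Map a reading's hour and weekday index (Monday == 0) to a grid key."""
    day_type = "weekend" if weekday_index >= 5 else "weekday"
    shift = "on_peak" if 8 <= hour <= 18 else "off_peak"
    return (hour, day_type, shift)

# (hour, weekday index, observed temperature): made-up sample readings
observations = [(9, 0, 4.4), (23, 5, 3.1)]
deviations = [
    round(temp - baseline[context_key(hour, dow)], 3)
    for hour, dow, temp in observations
]
print(deviations)  # [0.4, -0.7]
```

Under the 08:00 to 18:00 definition, shift is fully determined by hour, so only 48 of the 96 keys can ever be looked up. The full product grid is still convenient when the dimensions are independent or when the shift rule may change later.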

 

# 4. Extracting Sliding Window Statistics with tee

Sometimes you need to process the same sequence through multiple statistical lenses simultaneously (mean, variance, rate of change) without iterating over it multiple times. itertools.tee creates independent iterators from a single source, which is exactly what you need.

def sliding_window_stats(sequence, window_size):
    """Compute mean, range and rate-of-change over sliding windows using tee."""
    results = []
    it = iter(sequence)

    # Fill the first window; bail out if the sequence is too short
    window = list(itertools.islice(it, window_size))
    if len(window) < window_size:
        return results

    results.append({
        "window_mean":    round(sum(window) / window_size, 4),
        "window_range":   round(max(window) - min(window), 4),
        "rate_of_change": round(window[-1] - window[0], 4),
    })

    for next_val in it:
        # Slide the window forward by one reading
        window = window[1:] + [next_val]

        # tee creates two independent iterators over the same window
        iter_a, iter_b = itertools.tee(iter(window))

        values_a = list(iter_a)
        values_b = list(iter_b)

        mean_val = sum(values_a) / window_size
        results.append({
            "window_mean":    round(mean_val, 4),
            "window_range":   round(max(values_b) - min(values_b), 4),
            "rate_of_change": round(window[-1] - window[0], 4),
        })

    return results

power_readings = df["power_kw"].tolist()
stats = sliding_window_stats(power_readings, window_size=8)

# The first complete 8-reading window ends at position 7
stats_df = pd.DataFrame(stats, index=df.index[7:])
stats_df["power_kw"] = df["power_kw"].iloc[7:].values

print(stats_df.iloc[0:8])

 

Output:

                     window_mean  window_range  rate_of_change  power_kw
timestamp
2024-03-01 07:00:00      41.4400          2.60            0.67     40.94
2024-03-01 08:00:00      43.7825         18.74           17.68     59.01
2024-03-01 09:00:00      46.1775         20.22           17.62     60.49
2024-03-01 10:00:00      47.9387         20.22           16.14     56.96
2024-03-01 11:00:00      49.9663         20.22           16.77     57.04
2024-03-01 12:00:00      52.2437         19.55           15.98     58.49
2024-03-01 13:00:00      54.3738         19.55           17.04     59.55
2024-03-01 14:00:00      56.6412         19.71           19.71     60.65

 

As seen, tee lets you pass the same window iterator into two separate downstream computations without rewinding or copying the list yourself.
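
tee matters most when the source is a one-shot iterator that cannot be rewound, such as a generator. The sketch below is a variation on the idea rather than the article's function: it staggers several tee copies to produce the sliding windows themselves. The names sliding_windows and stream are hypothetical, and the readings are made up.

```python
import itertools

def sliding_windows(iterable, size):
    """Yield overlapping tuples of `size` consecutive items from any iterable."""
    iterators = itertools.tee(iterable, size)
    # Advance the k-th copy by k items so the copies are staggered
    for offset, it in enumerate(iterators):
        next(itertools.islice(it, offset, offset), None)
    return zip(*iterators)

def stream():
    # A one-shot generator standing in for a live sensor feed
    yield from [40.0, 41.0, 59.0, 60.0, 57.0]

stats = [
    {"mean": sum(w) / len(w), "range": max(w) - min(w), "rate_of_change": w[-1] - w[0]}
    for w in sliding_windows(stream(), 3)
]
print([s["rate_of_change"] for s in stats])  # [19.0, 19.0, -2.0]
```

Because the copies advance in lockstep through zip, tee only has to buffer about one window of items at a time, regardless of how long the stream is.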

 

# 5. Combining Multi-Resolution Time Features with chain

Useful time series features often come from multiple temporal resolutions simultaneously: the raw hourly reading, a 6-hour rolling mean, a 24-hour rolling mean, and a calendar feature like hour-of-day. These usually live in separate arrays and need assembling into one clean feature list. Here's how you can use chain to combine such features:

humidity = df["humidity_pct"].tolist()

def rolling_means(sequence, window):
    """Mean of the `window` readings before each position; None until enough history exists."""
    means = []
    for i in range(len(sequence)):
        if i < window:
            means.append(None)
        else:
            w = list(itertools.islice(sequence, i - window, i))
            means.append(round(sum(w) / window, 3))
    return means

rolling_6h       = rolling_means(humidity, 6)
rolling_24h      = rolling_means(humidity, 24)
hour_of_day      = df.index.hour.tolist()
is_business_hour = [1 if 8 <= h <= 18 else 0 for h in hour_of_day]

# chain assembles the feature name list from logically grouped sublists
feature_names = list(itertools.chain(
    ["humidity_raw"],
    ["humidity_roll_6h", "humidity_roll_24h"],
    ["hour_of_day", "is_business_hour"],
))

multi_res_df = pd.DataFrame({
    name: vals for name, vals in zip(
        feature_names,
        [humidity, rolling_6h, rolling_24h, hour_of_day, is_business_hour]
    )
}, index=df.index)

print(multi_res_df.iloc[24:30])

 

Output:

                     humidity_raw  humidity_roll_6h  humidity_roll_24h  
timestamp
2024-03-02 00:00:00         78.45            79.622             78.055
2024-03-02 01:00:00         75.63            79.105             78.100
2024-03-02 02:00:00         77.51            78.190             78.062
2024-03-02 03:00:00         76.27            78.088             78.157
2024-03-02 04:00:00         74.96            77.805             78.240
2024-03-02 05:00:00         75.75            77.208             78.203

                     hour_of_day  is_business_hour
timestamp
2024-03-02 00:00:00            0                 0
2024-03-02 01:00:00            1                 0
2024-03-02 02:00:00            2                 0
2024-03-02 03:00:00            3                 0
2024-03-02 04:00:00            4                 0
2024-03-02 05:00:00            5                 0

 

chain here assembles the feature name list from logically grouped sublists: raw sensor, rolling aggregates, calendar features. As your feature set grows across more sensor channels and more resolutions, chain keeps that assembly readable and easy to extend.
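
As a rough illustration of how that extension might look, the sketch below generates the name groups per sensor and flattens them with chain.from_iterable. The channel prefixes and suffixes are assumptions made for this example, not names used elsewhere in the article.

```python
import itertools

sensors = ["temperature", "humidity", "power"]  # hypothetical channel prefixes
suffixes = ["raw", "roll_6h", "roll_24h"]

# One sublist of feature names per sensor channel
per_sensor = [[f"{sensor}_{suffix}" for suffix in suffixes] for sensor in sensors]
calendar = ["hour_of_day", "is_business_hour"]

# from_iterable flattens the per-sensor groups; the outer chain appends calendar features
feature_names = list(
    itertools.chain(itertools.chain.from_iterable(per_sensor), calendar)
)
print(len(feature_names))  # 11
print(feature_names[:4])
# ['temperature_raw', 'temperature_roll_6h', 'temperature_roll_24h', 'humidity_raw']
```

Adding a sensor or a resolution then means appending one string to a list, and the flattened order stays grouped by channel.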

 

# 6. Computing Pairwise Temporal Correlations with combinations

In a multi-sensor setting, the relationships between variables over time often contain valuable signals that individual measurements alone can't capture. For example, simultaneous increases across two sensors may reveal emerging conditions or interactions that would not be apparent when each series is analyzed in isolation.

Incorporating features that reflect these joint dynamics can improve a model's ability to detect subtle patterns and dependencies. Let's try building pairwise correlations using combinations:

sensor_cols = ["temperature_c", "humidity_pct", "power_kw"]
window_size = 12

pairwise_features = {}

# combinations yields each unordered pair of sensors exactly once
for col_a, col_b in itertools.combinations(sensor_cols, 2):
    feature_name = f"corr_{col_a[:4]}_{col_b[:4]}_12h"
    correlations = []

    series_a = df[col_a].tolist()
    series_b = df[col_b].tolist()

    for i in range(len(series_a)):
        if i < window_size:
            correlations.append(None)
            continue

        win_a = list(itertools.islice(series_a, i - window_size, i))
        win_b = list(itertools.islice(series_b, i - window_size, i))

        mean_a = sum(win_a) / window_size
        mean_b = sum(win_b) / window_size

        # Population covariance and standard deviations over the window
        cov   = sum((a - mean_a) * (b - mean_b) for a, b in zip(win_a, win_b)) / window_size
        std_a = (sum((a - mean_a)**2 for a in win_a) / window_size) ** 0.5
        std_b = (sum((b - mean_b)**2 for b in win_b) / window_size) ** 0.5

        # Correlation is undefined when either window is constant
        corr = round(cov / (std_a * std_b), 4) if std_a > 0 and std_b > 0 else None
        correlations.append(corr)

    pairwise_features[feature_name] = correlations

corr_df = pd.DataFrame(pairwise_features, index=df.index)
print(corr_df.iloc[12:18])

 

Output:

                     corr_temp_humi_12h  corr_temp_powe_12h  
timestamp
2024-03-01 12:00:00             -0.6700             -0.2281
2024-03-01 13:00:00             -0.7208             -0.4960
2024-03-01 14:00:00             -0.7442             -0.6669
2024-03-01 15:00:00             -0.7678             -0.7076
2024-03-01 16:00:00             -0.8116             -0.7265
2024-03-01 17:00:00             -0.8368             -0.7482

                     corr_humi_powe_12h
timestamp
2024-03-01 12:00:00              0.5380
2024-03-01 13:00:00              0.6614
2024-03-01 14:00:00              0.7202
2024-03-01 15:00:00              0.7311
2024-03-01 16:00:00              0.7233
2024-03-01 17:00:00              0.7219
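
Each value above is a population Pearson correlation over the previous 12 readings, and combinations produces one column per unordered sensor pair (three pairs for three sensors). As a minimal sketch, the per-window computation can be isolated into a helper and checked on toy data. The pearson function name and the channel values are made up for this example.

```python
import itertools

def pearson(xs, ys):
    """Population Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/n factors of covariance and variances cancel, so raw sums are enough
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant window has no defined correlation
    return cov / (var_x * var_y) ** 0.5

# Made-up toy channels
channels = {
    "temp": [1.0, 2.0, 3.0, 4.0],
    "humi": [8.0, 6.0, 4.0, 2.0],  # falls as temp rises
    "powe": [1.0, 3.0, 2.0, 4.0],
}

pair_corr = {
    f"corr_{a}_{b}": round(pearson(channels[a], channels[b]), 4)
    for a, b in itertools.combinations(channels, 2)
}
print(pair_corr)
# {'corr_temp_humi': -1.0, 'corr_temp_powe': 0.8, 'corr_humi_powe': -0.8}
```

Keeping the statistic in its own function makes it easy to test on small inputs before applying it window by window.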

 

# 7. Accumulating Running Baselines with accumulate

A given value can carry different significance depending on when it occurs in a sequence. What matters is its deviation from the evolving baseline: the running mean up to that point in time. Using an incremental approach such as accumulate, you can compute this running mean efficiently without storing the entire history.

readings = df["temperature_c"].tolist()

running_sums   = list(itertools.accumulate(readings))
running_counts = list(itertools.accumulate([1] * len(readings)))
running_means  = [
    round(s / c, 4)
    for s, c in zip(running_sums, running_counts)
]

# Running max: highest temperature seen so far, useful for breach monitoring
running_max = list(itertools.accumulate(readings, max))

deviation_from_baseline = [
    round(r - m, 4)
    for r, m in zip(readings, running_means)
]

baseline_df = pd.DataFrame({
    "temperature_c":           readings,
    "running_mean":            running_means,
    "running_max":             running_max,
    "deviation_from_baseline": deviation_from_baseline,
}, index=df.index)

print(baseline_df.iloc[20:28])

 

Output:

                     temperature_c  running_mean  running_max  
timestamp
2024-03-01 20:00:00          2.960        3.5857        5.192
2024-03-01 21:00:00          2.647        3.5430        5.192
2024-03-01 22:00:00          2.986        3.5188        5.192
2024-03-01 23:00:00          2.831        3.4902        5.192
2024-03-02 00:00:00          3.409        3.4869        5.192
2024-03-02 01:00:00          3.919        3.5035        5.192
2024-03-02 02:00:00          3.833        3.5157        5.192
2024-03-02 03:00:00          4.542        3.5524        5.192

                     deviation_from_baseline
timestamp
2024-03-01 20:00:00                  -0.6257
2024-03-01 21:00:00                  -0.8960
2024-03-01 22:00:00                  -0.5328
2024-03-01 23:00:00                  -0.6592
2024-03-02 00:00:00                  -0.0779
2024-03-02 01:00:00                   0.4155
2024-03-02 02:00:00                   0.3173
2024-03-02 03:00:00                   0.9896
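
The listing above materializes every intermediate list. If the readings arrive as a stream, the same running mean and deviation can be produced lazily. This is a sketch under that assumption; running_deviation is a hypothetical name and the input values are made up.

```python
import itertools

def running_deviation(stream):
    """Yield (value, running_mean, deviation) for each item of a one-shot iterable."""
    # tee gives one copy for the raw values and one to feed accumulate
    values, for_sums = itertools.tee(stream)
    running_sums = itertools.accumulate(for_sums)
    for count, (value, total) in enumerate(zip(values, running_sums), start=1):
        mean = total / count
        yield value, mean, value - mean

rows = list(running_deviation(iter([2.0, 4.0, 6.0, 8.0])))
for row in rows:
    print(row)
# (2.0, 2.0, 0.0)
# (4.0, 3.0, 1.0)
# (6.0, 4.0, 2.0)
# (8.0, 5.0, 3.0)
```

As in the article's version, the running mean includes the current reading. Only the running total and a count are carried forward, so memory use does not grow with the length of the stream.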

 

# Summary

Time series feature engineering is fundamentally about describing context: what has this signal been doing, relative to what we expect it to be doing? Every function covered here is a different way of formalizing that question into a number a model can learn from.

Here's a summary of the patterns we've covered in this article:
 

| itertools Function | Time Series Feature | Example |
|---|---|---|
| islice | Lag features | Temperature 1h, 6h, 24h ago |
| islice + accumulate | Rolling window stats | 6h mean, std, min, max |
| product | Seasonal interaction grid | Hour × day type × shift baseline |
| tee | Parallel window statistics | Mean + range + rate of change |
| chain | Multi-resolution feature assembly | Raw + rolling + calendar features |
| combinations | Pairwise cross-sensor correlations | Temp–humidity, temp–power rolling corr |
| accumulate | Running baseline + deviation | Drift detection from historical mean |

 
And because itertools works at the iterator level, all of these patterns compose cleanly into streaming pipelines as well. Happy feature engineering!
 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.


