5 Useful DIY Python Functions for Parsing Dates and Times

Image by Author

# Introduction

Parsing dates and times is one of those tasks that seems simple until you actually try to do it. Python’s datetime module handles standard formats well, but real-world data is messy. User input, scraped web data, and legacy systems often throw curveballs.

This article walks you through five practical functions for handling common date and time parsing tasks. By the end, you’ll understand how to build flexible parsers that handle the messy date formats you see in projects.

Link to the code on GitHub

# 1. Parsing Relative Time Strings

Social media apps, chat applications, and activity feeds display timestamps like “5 minutes ago” or “2 days ago”. When you scrape or process this data, you need to convert these relative strings back into actual datetime objects.

Here’s a function that handles common relative time expressions:

from datetime import datetime, timedelta
import re

def parse_relative_time(time_string, reference_time=None):
    """
    Convert relative time strings to datetime objects.

    Examples: "2 hours ago", "3 days ago", "1 week ago"
    """
    if reference_time is None:
        reference_time = datetime.now()

    # Normalize the string
    time_string = time_string.lower().strip()

    # Pattern: number + time unit + "ago"
    pattern = r'(\d+)\s*(second|minute|hour|day|week|month|year)s?\s*ago'
    match = re.match(pattern, time_string)

    if not match:
        raise ValueError(f"Cannot parse: {time_string}")

    quantity = int(match.group(1))
    unit = match.group(2)

    # Map units to timedelta kwargs
    unit_mapping = {
        'second': 'seconds',
        'minute': 'minutes',
        'hour': 'hours',
        'day': 'days',
        'week': 'weeks',
    }

    if unit in unit_mapping:
        delta_kwargs = {unit_mapping[unit]: quantity}
        return reference_time - timedelta(**delta_kwargs)
    elif unit == 'month':
        # Approximate: 30 days per month
        return reference_time - timedelta(days=quantity * 30)
    elif unit == 'year':
        # Approximate: 365 days per year
        return reference_time - timedelta(days=quantity * 365)

The function uses a regular expression (regex) to extract the number and time unit from the string. The pattern (\d+) captures one or more digits, and (second|minute|hour|day|week|month|year) matches the time unit. The s? makes the plural ‘s’ optional, so both “hour” and “hours” work. Because re.match anchors at the start of the string, the number has to come first.
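To see what the two capture groups hold, the pattern can be run on its own. This is a minimal sketch; the sample strings are made up for illustration.

```python
import re

# Same pattern as in parse_relative_time: number, unit, optional plural "s", then "ago"
pattern = r'(\d+)\s*(second|minute|hour|day|week|month|year)s?\s*ago'

for sample in ["2 hours ago", "1 week ago", "15minutes ago"]:
    match = re.match(pattern, sample)
    # groups() returns (number, unit); the plural "s" is matched but not captured
    print(sample, "->", match.groups())

# A string with no leading number does not match at all
no_match = re.match(pattern, "yesterday")
print(no_match)  # None
```

Note that the unit comes back in its singular form either way, which is why the function can look it up directly in a dictionary.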

For units that timedelta supports directly (seconds through weeks), we create a timedelta and subtract it from the reference time. For months and years, we approximate using 30 and 365 days respectively. This isn’t perfect, but it’s good enough for most use cases.
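If the 30-day approximation is too coarse, one possible alternative is to step back whole calendar months and clamp the day to the end of the target month. The sketch below uses only the standard library; subtract_months is a hypothetical helper, not part of the function above.

```python
import calendar
from datetime import datetime

def subtract_months(dt, months):
    """Hypothetical helper: step back whole calendar months, clamping the day."""
    # Work with a zero-based month index so divmod handles the year rollover
    total = dt.year * 12 + (dt.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day to the last valid day of the target month
    last_day = calendar.monthrange(year, month)[1]
    return dt.replace(year=year, month=month, day=min(dt.day, last_day))

reference = datetime(2026, 3, 31, 9, 0)
print(subtract_months(reference, 1))   # 2026-02-28 09:00:00
print(subtract_months(reference, 3))   # 2025-12-31 09:00:00
print(subtract_months(reference, 12))  # 2025-03-31 09:00:00
```

Clamping is a design choice: “1 month before March 31” has no exact answer, and this version settles on the last day of February.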

The reference_time parameter lets you specify a different “now” for testing or when processing historical data.

Let’s test it:

result1 = parse_relative_time("2 hours ago")
result2 = parse_relative_time("3 days ago")
result3 = parse_relative_time("1 week ago")

print(f"2 hours ago: {result1}")
print(f"3 days ago: {result2}")
print(f"1 week ago: {result3}")

Output:

2 hours ago: 2026-01-06 12:09:34.584107
3 days ago: 2026-01-03 14:09:34.584504
1 week ago: 2025-12-30 14:09:34.584558

# 2. Extracting Dates from Natural Language Text

Sometimes you need to find dates buried in text: “The meeting is scheduled for January 15th, 2026” or “Please respond by March 3rd”. Instead of manually parsing the entire sentence, you want to extract just the date.

Here’s a function that finds and extracts dates from natural language:

import re
from datetime import datetime

def extract_date_from_text(text, current_year=None):
    """
    Extract dates from natural language text.

    Handles formats like:
    - "January 15th, 2024"
    - "March 3rd"
    - "Dec 25th, 2023"
    """
    if current_year is None:
        current_year = datetime.now().year

    # Month names (full and abbreviated)
    months = {
        'january': 1, 'jan': 1,
        'february': 2, 'feb': 2,
        'march': 3, 'mar': 3,
        'april': 4, 'apr': 4,
        'may': 5,
        'june': 6, 'jun': 6,
        'july': 7, 'jul': 7,
        'august': 8, 'aug': 8,
        'september': 9, 'sep': 9, 'sept': 9,
        'october': 10, 'oct': 10,
        'november': 11, 'nov': 11,
        'december': 12, 'dec': 12
    }

    # Pattern: Month Day(st/nd/rd/th), Year (year optional)
    pattern = r'(january|jan|february|feb|march|mar|april|apr|may|june|jun|july|jul|august|aug|september|sep|sept|october|oct|november|nov|december|dec)\s+(\d{1,2})(?:st|nd|rd|th)?(?:,?\s+(\d{4}))?'

    matches = re.findall(pattern, text.lower())

    if not matches:
        return None

    # Take the first match
    month_str, day_str, year_str = matches[0]

    month = months[month_str]
    day = int(day_str)
    year = int(year_str) if year_str else current_year

    return datetime(year, month, day)

The function builds a dictionary mapping month names (both full and abbreviated) to their numeric values. The regex pattern matches month names followed by day numbers with optional ordinal suffixes (st, nd, rd, th) and an optional year.

The (?:...) syntax creates a non-capturing group. This means we match the pattern but don’t save it separately. That’s useful for optional parts like the ordinal suffix and the separator before the year; the four year digits sit in their own capturing group inside it, so they are still returned.
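The difference is easiest to see in what re.findall returns. The following sketch uses a shortened month list and a made-up sentence purely for illustration.

```python
import re

text = "due march 3rd, then dec 25th, 2026"

# Capturing group around the suffix: findall returns it as an extra field
with_capture = re.findall(r'(march|dec)\s+(\d{1,2})(st|nd|rd|th)?', text)

# Non-capturing group: the suffix is still matched, but not returned
without_capture = re.findall(r'(march|dec)\s+(\d{1,2})(?:st|nd|rd|th)?', text)

# Optional year: the wrapper is non-capturing, the digits inside are captured
with_year = re.findall(
    r'(march|dec)\s+(\d{1,2})(?:st|nd|rd|th)?(?:,?\s+(\d{4}))?', text
)

print(with_capture)     # [('march', '3', 'rd'), ('dec', '25', 'th')]
print(without_capture)  # [('march', '3'), ('dec', '25')]
print(with_year)        # [('march', '3', ''), ('dec', '25', '2026')]
```

When the optional year is absent, findall fills that slot with an empty string, which is what lets the function fall back to a default year.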

When no year is provided, the function defaults to the current year. This is a reasonable default: if someone mentions “March 3rd” in January, they usually mean the upcoming March, not the previous year’s. The function never rolls forward, though, so a date that has already passed still resolves to the current year.
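If “upcoming” should hold all year round, a different policy is to roll over to the next year once the date has passed. This is a sketch of that alternative, not what the function above does; resolve_upcoming is a hypothetical helper, and it does not handle February 29 in non-leap years.

```python
from datetime import date

def resolve_upcoming(month, day, today):
    """Hypothetical helper: next occurrence of month/day on or after today."""
    candidate = date(today.year, month, day)
    if candidate < today:
        # The date has already passed this year, so roll over to next year
        candidate = date(today.year + 1, month, day)
    return candidate

print(resolve_upcoming(3, 3, date(2026, 1, 10)))  # 2026-03-03
print(resolve_upcoming(3, 3, date(2026, 12, 1)))  # 2027-03-03
```

Passing today explicitly, rather than calling date.today() inside the helper, keeps the behavior easy to test.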

Let’s test it with various text formats:

text1 = "The meeting is scheduled for January 15th, 2026 at 3pm"
text2 = "Please respond by March 3rd"
text3 = "Deadline: Dec 25th, 2026"

date1 = extract_date_from_text(text1)
date2 = extract_date_from_text(text2)
date3 = extract_date_from_text(text3)

print(f"From '{text1}': {date1}")
print(f"From '{text2}': {date2}")
print(f"From '{text3}': {date3}")

Output:

From 'The meeting is scheduled for January 15th, 2026 at 3pm': 2026-01-15 00:00:00
From 'Please respond by March 3rd': 2026-03-03 00:00:00
From 'Deadline: Dec 25th, 2026': 2026-12-25 00:00:00

# 3. Parsing Flexible Date Formats with Smart Detection

Real-world data comes in many formats. Writing separate parsers for each format is tedious. Instead, let’s build a function that tries multiple formats automatically.

Here’s a smart date parser that handles common formats:

from datetime import datetime

def parse_flexible_date(date_string):
    """
    Parse dates in multiple common formats.

    Tries various formats and returns the first match.
    """
    date_string = date_string.strip()

    # List of common date formats, tried in order
    formats = [
        '%Y-%m-%d',   # 2026-01-15
        '%Y/%m/%d',   # 2026/01/15
        '%d-%m-%Y',   # 15-01-2026
        '%d/%m/%Y',   # 15/01/2026
        '%m/%d/%Y',   # 01/15/2026
        '%d.%m.%Y',   # 15.01.2026
        '%Y%m%d',     # 20260115
        '%B %d, %Y',  # January 15, 2026
        '%b %d, %Y',  # Jan 15, 2026
        '%d %B %Y',   # 15 January 2026
        '%d %b %Y',   # 15 Jan 2026
    ]

    # Try each format
    for fmt in formats:
        try:
            return datetime.strptime(date_string, fmt)
        except ValueError:
            continue

    # If nothing worked, raise an error
    raise ValueError(f"Unable to parse date: {date_string}")

This function uses a brute-force approach. It tries each format until one works. The strptime function raises a ValueError if the date string doesn’t match the format, so we catch that exception and move to the next format.

The order of formats matters. We put International Organization for Standardization (ISO) format (%Y-%m-%d) first because it’s the most common in technical contexts. Ambiguous formats like %d/%m/%Y and %m/%d/%Y appear later. Because %d/%m/%Y is tried first, a string like “03/04/2026” is read as 3 April. If you know your data uses one consistently, reorder the list to prioritize it.
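The ambiguity is easy to reproduce with strptime alone. In this short sketch the sample strings are made up; the point is that the same input yields two different dates depending on which format wins.

```python
from datetime import datetime

ambiguous = "03/04/2026"

day_first = datetime.strptime(ambiguous, "%d/%m/%Y")
month_first = datetime.strptime(ambiguous, "%m/%d/%Y")

print(day_first.date())    # 2026-04-03
print(month_first.date())  # 2026-03-04

# A day above 12 removes the ambiguity: only one format can match
try:
    datetime.strptime("25/04/2026", "%m/%d/%Y")
except ValueError as exc:
    print("month-first rejected:", exc)
```

Only strings where both numbers are 12 or below are affected, so the order of the list silently decides their meaning.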

Let’s test it with various date formats:

# Test different formats
dates = [
    "2026-01-15",
    "15/01/2026",
    "01/15/2026",
    "15.01.2026",
    "20260115",
    "January 15, 2026",
    "15 Jan 2026"
]

for date_str in dates:
    parsed = parse_flexible_date(date_str)
    print(f"{date_str:20} -> {parsed}")

Output:

2026-01-15           -> 2026-01-15 00:00:00
15/01/2026           -> 2026-01-15 00:00:00
01/15/2026           -> 2026-01-15 00:00:00
15.01.2026           -> 2026-01-15 00:00:00
20260115             -> 2026-01-15 00:00:00
January 15, 2026     -> 2026-01-15 00:00:00
15 Jan 2026          -> 2026-01-15 00:00:00

This approach isn’t the most efficient, but it’s simple and handles the vast majority of date formats you’ll encounter.

# 4. Parsing Time Durations

Video players, workout trackers, and time-tracking apps display durations like “1h 30m” or “2:45:30”. When parsing user input or scraped data, you need to convert these to timedelta objects for calculations.

Here’s a function that parses common duration formats:

from datetime import timedelta
import re

def parse_duration(duration_string):
    """
    Parse duration strings into timedelta objects.

    Handles formats like:
    - "1h 30m 45s"
    - "2:45:30" (H:M:S)
    - "90 minutes"
    - "1.5 hours"
    """
    duration_string = duration_string.strip().lower()

    # Try colon format first (H:M:S or M:S)
    if ':' in duration_string:
        parts = duration_string.split(':')
        if len(parts) == 2:
            # M:S format
            minutes, seconds = map(int, parts)
            return timedelta(minutes=minutes, seconds=seconds)
        elif len(parts) == 3:
            # H:M:S format
            hours, minutes, seconds = map(int, parts)
            return timedelta(hours=hours, minutes=minutes, seconds=seconds)

    # Try unit-based format (1h 30m 45s)
    total_seconds = 0

    # Find hours
    hours_match = re.search(r'(\d+(?:\.\d+)?)\s*h(?:ours?)?', duration_string)
    if hours_match:
        total_seconds += float(hours_match.group(1)) * 3600

    # Find minutes
    minutes_match = re.search(r'(\d+(?:\.\d+)?)\s*m(?:in(?:ute)?s?)?', duration_string)
    if minutes_match:
        total_seconds += float(minutes_match.group(1)) * 60

    # Find seconds
    seconds_match = re.search(r'(\d+(?:\.\d+)?)\s*s(?:ec(?:ond)?s?)?', duration_string)
    if seconds_match:
        total_seconds += float(seconds_match.group(1))

    if total_seconds > 0:
        return timedelta(seconds=total_seconds)

    raise ValueError(f"Unable to parse duration: {duration_string}")

The function handles two main formats: colon-separated time and unit-based strings. For colon format, we split on the colon and interpret the parts as hours, minutes, and seconds (or just minutes and seconds for two-part durations).

For unit-based format, we use three separate regex patterns to find hours, minutes, and seconds. The pattern (\d+(?:\.\d+)?) matches integers or decimals like “1.5”. The pattern \s*h(?:ours?)? matches “h”, “hour”, or “hours” with optional whitespace.
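The hours pattern can be tried in isolation to check what it does and does not pick up. This is a small sketch with made-up inputs; the names number and hours_pattern are only for illustration.

```python
import re

number = r'(\d+(?:\.\d+)?)'              # integer or decimal, e.g. "2" or "1.5"
hours_pattern = number + r'\s*h(?:ours?)?'

results = {}
for sample in ["2h", "1.5 hours", "3 hour", "45s"]:
    match = re.search(hours_pattern, sample)
    # group(1) is the numeric part; None means no hours component was found
    results[sample] = float(match.group(1)) if match else None
    print(f"{sample!r:12} -> {results[sample]}")
```

The last input has no hours component, so the search returns no match and the hours contribution is simply skipped.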

Each matched value is converted to seconds and added to the total. This approach lets the function handle partial durations like “45s” or “2h 15m” without requiring all units to be present.
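Once durations are timedelta objects, the calculations themselves are plain arithmetic. As a brief sketch with made-up values standing in for parsed results:

```python
from datetime import timedelta

# Stand-ins for parsed durations such as "12:30", "45s", and "1.5 hours"
laps = [
    timedelta(minutes=12, seconds=30),
    timedelta(seconds=45),
    timedelta(hours=1.5),
]

# sum() needs a timedelta start value; the default start of 0 would fail
total = sum(laps, timedelta())

print(total)                  # 1:43:15
print(total.total_seconds())  # 6195.0
```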

Let’s now test the function with various duration formats:

durations = [
    "1h 30m 45s",
    "2:45:30",
    "90 minutes",
    "1.5 hours",
    "45s",
    "2h 15m"
]

for duration in durations:
    parsed = parse_duration(duration)
    print(f"{duration:15} -> {parsed}")

Output:

1h 30m 45s      -> 1:30:45
2:45:30         -> 2:45:30
90 minutes      -> 1:30:00
1.5 hours       -> 1:30:00
45s             -> 0:00:45
2h 15m          -> 2:15:00

# 5. Parsing ISO Week Dates

Some systems use ISO week dates instead of regular calendar dates. An ISO week date like “2026-W03-2” means “week 3 of 2026, day 2 (Tuesday)”. This format is common in business contexts where planning happens weekly.

Here’s a function to parse ISO week dates:

from datetime import datetime, timedelta

def parse_iso_week_date(iso_week_string):
    """
    Parse ISO week date format: YYYY-Www-D

    Example: "2024-W03-2" = Week 3 of 2024, Tuesday

    ISO week numbering:
    - Week 1 is the week with the first Thursday of the year
    - Days are numbered 1 (Monday) through 7 (Sunday)
    """
    # Parse the format: YYYY-Www-D
    parts = iso_week_string.split('-')

    if len(parts) != 3 or not parts[1].startswith('W'):
        raise ValueError(f"Invalid ISO week format: {iso_week_string}")

    year = int(parts[0])
    week = int(parts[1][1:])  # Remove 'W' prefix
    day = int(parts[2])

    if not (1 <= week <= 53):
        raise ValueError(f"Week must be between 1 and 53: {week}")

    if not (1 <= day <= 7):
        raise ValueError(f"Day must be between 1 and 7: {day}")

    # Find January 4th (always in week 1)
    jan_4 = datetime(year, 1, 4)

    # Find Monday of week 1
    week_1_monday = jan_4 - timedelta(days=jan_4.weekday())

    # Calculate the target date
    target_date = week_1_monday + timedelta(weeks=week - 1, days=day - 1)

    return target_date

ISO week dates follow specific rules. Week 1 is defined as the week containing the year’s first Thursday. This means week 1 might start in December of the previous year.
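The standard library’s date.isocalendar() makes the year-boundary behavior visible. The dates in this sketch were chosen for illustration because they fall around the start and end of ISO year 2026.

```python
from datetime import date

labels = []
for d in [date(2025, 12, 29), date(2026, 1, 1), date(2027, 1, 1)]:
    # isocalendar() returns (ISO year, ISO week, ISO weekday)
    iso_year, iso_week, iso_day = d.isocalendar()
    labels.append(f"{iso_year}-W{iso_week:02d}-{iso_day}")
    print(f"{d} -> {labels[-1]}")

# 2025-12-29 -> 2026-W01-1
# 2026-01-01 -> 2026-W01-4
# 2027-01-01 -> 2026-W53-5
```

The ISO year can differ from the calendar year at both ends: late December may already belong to week 1 of the next year, and early January may still belong to the last week of the previous one.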

The function uses a reliable approach: find January 4th (which is always in week 1), then find the Monday of that week. From there, we add the appropriate number of weeks and days to reach the target date.

The calculation jan_4.weekday() returns 0 for Monday through 6 for Sunday. Subtracting this from January 4th gives us the Monday of week 1. Then we add (week - 1) weeks and (day - 1) days to get the final date.
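One way to gain confidence in this arithmetic is to compare it with date.fromisocalendar, available in the standard library since Python 3.8. The sketch below repeats the January 4th calculation in a hypothetical helper, monday_of_week_1, and checks a few hand-picked inputs.

```python
from datetime import date, timedelta

def monday_of_week_1(year):
    """Hypothetical helper: January 4th is always in ISO week 1."""
    jan_4 = date(year, 1, 4)
    # weekday() is 0 for Monday, so this steps back to that week's Monday
    return jan_4 - timedelta(days=jan_4.weekday())

pairs = []
for year, week, day in [(2024, 3, 2), (2026, 1, 1), (2026, 53, 5)]:
    manual = monday_of_week_1(year) + timedelta(weeks=week - 1, days=day - 1)
    builtin = date.fromisocalendar(year, week, day)
    pairs.append((manual, builtin))
    print(f"{year}-W{week:02d}-{day}: manual={manual} builtin={builtin}")

# 2024-W03-2: manual=2024-01-16 builtin=2024-01-16
# 2026-W01-1: manual=2025-12-29 builtin=2025-12-29
# 2026-W53-5: manual=2027-01-01 builtin=2027-01-01
```

One difference worth knowing: fromisocalendar raises a ValueError for week 53 in a year that only has 52 ISO weeks, while the manual arithmetic quietly returns a date in the following ISO year.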

Let’s test it:

# Test ISO week dates
iso_dates = [
    "2024-W01-1",  # Week 1, Monday
    "2024-W03-2",  # Week 3, Tuesday
    "2024-W10-5",  # Week 10, Friday
]

for iso_date in iso_dates:
    parsed = parse_iso_week_date(iso_date)
    print(f"{iso_date} -> {parsed.strftime('%Y-%m-%d (%A)')}")

Output:

2024-W01-1 -> 2024-01-01 (Monday)
2024-W03-2 -> 2024-01-16 (Tuesday)
2024-W10-5 -> 2024-03-08 (Friday)

This format is less common than regular dates, but when encountered, having a parser ready saves significant time.

# Wrapping Up

Each function in this article uses regex patterns, strptime formats, or datetime arithmetic to handle variations in formatting. These techniques transfer to other parsing challenges: you can adapt the same patterns for custom date formats in your projects.

Building your own parsers helps you understand how date parsing works. When you run into a non-standard date format that standard libraries can’t handle, you’ll be ready to write a custom solution.

These functions are particularly useful for small scripts, prototypes, and learning projects where adding heavy external dependencies might be overkill. Happy coding!

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.