• Home
  • About Us
  • Contact Us
  • Disclaimer
  • Privacy Policy
Thursday, May 15, 2025
newsaiworld
  • Home
  • Artificial Intelligence
  • ChatGPT
  • Data Science
  • Machine Learning
  • Crypto Coins
  • Contact Us
No Result
View All Result
  • Home
  • Artificial Intelligence
  • ChatGPT
  • Data Science
  • Machine Learning
  • Crypto Coins
  • Contact Us
No Result
View All Result
Morning News
No Result
View All Result
Home Artificial Intelligence

Producing Information Dictionary for Excel Information Utilizing OpenPyxl and AI Brokers

Admin by Admin
May 8, 2025
in Artificial Intelligence
0
Ai Agent Data Docs Wide.jpeg
0
SHARES
0
VIEWS
Share on FacebookShare on Twitter

READ ALSO

Parquet File Format – All the pieces You Must Know!

Survival Evaluation When No One Dies: A Worth-Based mostly Strategy


Each firm I labored for till right now, there it was: the resilient MS Excel.

Excel was first launched in 1985 and has remained robust till right now. It has survived the rise of relational databases, the evolution of many programming languages, the Web with its infinite variety of on-line purposes, and at last, it’s also surviving the period of the AI.

Phew!

Do you might have any doubts about how resilient Excel is? I don’t.

I believe the explanation for that’s its practicality to start out and manipulate a doc rapidly. Take into consideration this case: we’re at work, in a gathering, and instantly the management shares a CSV file and asks for a fast calculation or a number of calculated numbers. Now, the choices are:

1. Open an IDE (or a pocket book) and begin coding like loopy to generate a easy matplotlib graphic;

2. Open Energy BI, import the information, and begin making a report with dynamic graphics.

3. Open the CSV in Excel, write a few formulation, and create a graphic.

I can’t converse for you, however many occasions I am going for possibility 3. Particularly as a result of Excel information are appropriate with every thing, simply shareable, and beginner-friendly.

I’m saying all of this as an Introduction to make my level that I don’t assume that Excel information are going away anytime quickly, even with the quick growth of AI. Many will love that, many will hate that.

So, my motion right here was to leverage AI to make Excel information higher documented. One of many fundamental complaints of knowledge groups about Excel is the dearth of finest practices and reproducibility, on condition that the names of the columns can have any names and knowledge sorts, however zero documentation.

So, I’ve created an AI Agent that reads the Excel file and creates this small documentation. Right here is the way it works:

  1. The Excel file is transformed to CSV and fed into the Massive Language Mannequin (LLM).
  2. The AI Agent generates the information dictionary with column data (variable identify, knowledge kind, description).
  3. The info dictionary will get added as feedback to the Excel file’s header.
  4. Output file saved with feedback.

Okay. Fingers-on now. Let’s get that achieved on this tutorial.

Code

Let’s code! | Picture generated by AI. Meta Llama, 2025. https://meta.ai

We are going to start by organising a digital surroundings. Create a venv with the instrument of your selection, resembling Poetry, Python Venv, Anaconda, or UV. I actually like UV, as it’s the quickest and the best, for my part. In case you have UV put in [5], open a terminal and create your venv.

uv init data-docs
cd data-docs
uv venv
uv add streamlit openpyxl pandas agno mcp google-genai

Now, allow us to import the required modules. This mission was created with Python 3.12.1, however I imagine Python 3.9 or larger would possibly do the trick already. We are going to use:

  • Agno: for the AI Agent administration
  • OpenPyxl: for the manipulation of Excel information
  • Streamlit: for the front-end interface.
  • Pandas, OS, JSON, Dedent and Google Genai as assist modules.
# Imports
import os
import json
import streamlit as st
from textwrap import dedent

from agno.agent import Agent
from agno.fashions.google import Gemini
from agno.instruments.file import FileTools

from openpyxl import load_workbook
from openpyxl.feedback import Remark
import pandas as pd

Nice. The following step is creating the capabilities we’ll have to deal with the Excel information and to create the AI Agent.

Discover that each one the capabilities have detailed docstrings. That is intentional as a result of LLMs use docstrings to know what a given perform does and determine whether or not to make use of it or not as a instrument.

So, for those who’re utilizing Python capabilities as Instruments for an AI Agent, ensure to make use of detailed docstrings. These days, with free copilots resembling Windsurf [6] it’s even simpler to create them.

Changing the file to CSV

This perform will:

  • Take the Excel file and skim solely the primary 10 rows. That is sufficient for us to ship to the LLM. Doing that, we’re additionally stopping sending too many tokens as enter and making this agent too costly.
  • Save the file as CSV to make use of as enter for the AI Agent. The CSV format is simpler for the mannequin to absorb, as it’s a bunch of textual content separated by commas. And we all know LLMs shine working with textual content.

Right here is the perform.

def convert_to_csv(file_path:str):
   """
    Use this instrument to transform the excel file to CSV.

    * file_path: Path to the Excel file to be transformed
    """
   # Load the file  
   df = pd.read_excel(file_path).head(10)

   # Convert to CSV
   st.write("Changing to CSV... :leftwards_arrow_with_hook:")
   return df.to_csv('temp.csv', index=False)

Let’s transfer on.

Creating the Agent

The following perform creates the AI agent. I’m utilizing Agno [1], as it is extremely versatile and simple to make use of. I additionally selected the mannequin Gemini 2.0 Flash. In the course of the check section, this was the best-performing mannequin producing the information docs. To make use of it, you will want an API Key from Google. Don’t neglect to get one right here [7].

The perform:

  • Receives the CSV output from the earlier perform.
  • Passes by the AI Agent, which generates the information dictionary with column identify, description, and knowledge kind.
  • Discover that the description argument is the immediate for the agent. Make it detailed and exact.
  • The info dictionary shall be saved as a JSON file utilizing a instrument known as FileTools that may learn and write information.
  • I’ve arrange retries=2 so we will work round any error on a primary strive.
def create_agent(apy_key):
    agent = Agent(
        mannequin=Gemini(id="gemini-2.0-flash", api_key=apy_key),
        description= dedent("""
                            You might be an agent that reads the temp.csv dataset offered to you and 
                            primarily based on the identify and knowledge kind of every column header, decide the next data:
                            - The info kinds of every column
                            - The outline of every column
                            - The primary column numer is 0

                            Utilizing the FileTools offered, create an information dictionary in JSON format that features the beneath data:
                            {: {ColName: , DataType: , Description: }}

                            If you're unable to find out the information kind or description of a column, return 'N/A' for that column for the lacking values.
                            
                            """),
        instruments=[ FileTools(read_files=True, save_files=True) ],
        retries=2,
        show_tool_calls=True
        )

    return agent

Okay. Now we’d like one other perform to avoid wasting the information dictionary to the file.

Including Information Dictionary to the File’s Header

That is the final perform to be created. It would:

  • Get the information dictionary json from the earlier step and the unique Excel file.
  • Add the information dictionary to the file’s header as feedback.
  • Save the output file.
  • As soon as the file is saved, it shows a obtain button for the person to get the modified file.
def add_comments_to_header(file_path:str, data_dict:dict="data_dict.json"):
    """
    Use this instrument so as to add the information dictionary {data_dict.json} as feedback to the header of an Excel file and save the output file.

    The perform takes the Excel file path as argument and provides the {data_dict.json} as feedback to every cell
    Begin counting from column 0
    within the first row of the Excel file, utilizing the next format:    
        * Column Quantity: 
        * Column Title: 
        * Information Sort: 
        * Description: 

    Parameters
    ----------
    * file_path : str
        The trail to the Excel file to be processed
    * data_dict : dict
        The info dictionary containing the column quantity, column identify, knowledge kind, description, and variety of null values

    """
    
    # Load the information dictionary
    data_dict = json.load(open(data_dict))

    # Load the workbook
    wb = load_workbook(file_path)

    # Get the lively worksheet
    ws = wb.lively

    # Iterate over every column within the first row (header)
    for n, col in enumerate(ws.iter_cols(min_row=1, max_row=1)):
        for header_cell in col:
            header_cell.remark = Remark(dedent(f"""
                              ColName: {data_dict[str(n)]['ColName']}, 
                              DataType: {data_dict[str(n)]['DataType']},
                              Description: {data_dict[str(n)]['Description']}
    """),'AI Agent')

    # Save the workbook
    st.write("Saving File... :floppy_disk:")
    wb.save('output.xlsx')

    # Create a obtain button
    with open('output.xlsx', 'rb') as f:
        st.download_button(
            label="Obtain output.xlsx",
            knowledge=f,
            file_name='output.xlsx',
            mime='utility/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
        )

Okay. The following step is to attach all of this collectively on a Streamlit front-end script.

Streamlit Entrance-Finish

On this step, I may have created a distinct file for the front-end and imported the capabilities in there. However I made a decision to make use of the identical file, so let’s begin with the well-known:

if __name__ == "__main__":

First, a few strains to configure the web page and messages displayed within the Net Utility. We are going to use the content material centered on the web page, and there’s some details about how the App works.

# Config web page Streamlit
    st.set_page_config(structure="centered", 
                       page_title="Information Docs", 
                       page_icon=":paperclip:",
                       initial_sidebar_state="expanded")
    
    # Title
    st.title("Information Docs :paperclip:")
    st.subheader("Generate an information dictionary to your Excel file.")
    st.caption("1. Enter your Gemini API key and the trail of the Excel file on the sidebar.")
    st.caption("2. Run the agent.")
    st.caption("3. The agent will generate an information dictionary and add it as feedback to the header of the Excel file.")
    st.caption("ColName:  | DataType:  | Description: ")
    
    st.divider()

Subsequent, we’ll arrange the sidebar, the place the person can enter their API Key from Google and choose a .xlsx file to be modified.

There’s a button to run the applying, one other to reset the app state, and a progress bar. Nothing too fancy.

with st.sidebar:
        # Enter your API key
        st.caption("Enter your API key and the trail of the Excel file.")
        api_key = st.text_input("API key: ", placeholder="Google Gemini API key", kind="password")
        
        # Add file
        input_file = st.file_uploader("File add", 
                                       kind='xlsx')
        

        # Run the agent
        agent_run = st.button("Run")

        # progress bar
        progress_bar = st.empty()
        progress_bar.progress(0, textual content="Initializing...")

        st.divider()

        # Reset session state
        if st.button("Reset Session"):
            st.session_state.clear()
            st.rerun()

As soon as the run button is clicked, it triggers the remainder of the code to run the Agent. Right here is the sequence of steps carried out:

  1. The primary perform is named to rework the file to CSV
  2. The progress is registered on the progress bar.
  3. The Agent is created.
  4. Progress bar up to date.
  5. A immediate is fed into the agent to learn the temp.csv file, create the information dictionary, and save the output to data_dictionary.json.
  6. The info dictionary is printed on the display, so the person can see what was generated whereas it’s being saved to the Excel file.
  7. The Excel file is modified and saved.
# Create the agent
    if agent_run:
        # Convert Excel file to CSV
        convert_to_csv(input_file)

        # Register progress
        progress_bar.progress(15, textual content="Processing CSV...")

        # Create the agent
        agent = create_agent(api_key)

        # Begin the script
        st.write("Working Agent... :runner:")

        # Register progress
        progress_bar.progress(50, textual content="AI Agent is operating...")

        # Run the agent    
        agent.print_response(dedent(f"""
                                1. Use FileTools to learn the temp.csv as enter to create the information dictionary for the columns within the dataset. 
                                2. Utilizing the FileTools instrument, save the information dictionary to a file named 'data_dict.json'.
                                
                                """),
                        markdown=True)

        # Print the information dictionary
        st.write("Producing Information Dictionary... :page_facing_up:")
        with open('data_dict.json', 'r') as f:
            data_dict = json.load(f)
            st.json(data_dict, expanded=False)

        # Add feedback to header
        add_comments_to_header(input_file, 'data_dict.json')

        # Take away momentary information
        st.write("Eradicating momentary information... :wastebasket:")
        os.take away('temp.csv')
        os.take away('data_dict.json')    
    
    # If file exists, present success message
    if os.path.exists('output.xlsx'):
        st.success("Performed! :white_check_mark:")
        os.take away('output.xlsx')

    # Progress bar finish
    progress_bar.progress(100, textual content="Performed!")

That’s it. Here’s a demonstration of the agent in motion.

Information Docs added to your Excel File. Picture by the writer.

Lovely end result!

Attempt It

You may strive the deployed app right here: https://excel-datadocs.streamlit.app/

Earlier than You Go

In my humble opinion, Excel information will not be going away anytime quickly. Loving or hating them, we’ll have to stay with them for some time.

Excel information are versatile, straightforward to deal with and share, thus they’re nonetheless very helpful for the routine ad-hoc duties at work.

Nevertheless, now we will leverage AI to assist us deal with these information and make them higher. Synthetic Intelligence is touching so many factors of our lives. The routine and instruments at work are solely one other one.

Let’s make the most of AI and work smarter daily!

In case you appreciated this content material, discover extra of my work in my web site and GitHub, shared beneath.

GitHub Repository

Right here is the GitHub Repository for this mission.

https://github.com/gurezende/Information-Dictionary-GenAI

Discover Me

Yow will discover extra about my work on my web site.

https://gustavorsantos.me

References

[1. Agno Docs] https://docs.agno.com/introduction/brokers

[2. Openpyxl Docs] https://openpyxl.readthedocs.io/en/steady/index.html

[3. Streamlit Docs] https://docs.streamlit.io/

[4. Data-Docs Web App] https://excel-datadocs.streamlit.app/

[5. Installing UV] https://docs.astral.sh/uv/getting-started/set up/

[6. Windsurf Coding Copilot] https://windsurf.com/vscode_tutorial

[7. Google Gemini API Key] https://ai.google.dev/gemini-api/docs/api-key

Tags: AgentsDataDictionaryExcelFilesGeneratingOpenPyxl

Related Posts

Image 109.png
Artificial Intelligence

Parquet File Format – All the pieces You Must Know!

May 14, 2025
Cover.png
Artificial Intelligence

Survival Evaluation When No One Dies: A Worth-Based mostly Strategy

May 14, 2025
Image 81.png
Artificial Intelligence

How I Lastly Understood MCP — and Bought It Working in Actual Life

May 13, 2025
Chatgpt Image May 10 2025 08 59 39 Am.png
Artificial Intelligence

Working Python Applications in Your Browser

May 12, 2025
Model Compression 2 1024x683.png
Artificial Intelligence

Mannequin Compression: Make Your Machine Studying Fashions Lighter and Sooner

May 12, 2025
Doppleware Ai Robot Facepalming Ar 169 V 6.1 Ffc36bad C0b8 41d7 Be9e 66484ca8c4f4 1 1.png
Artificial Intelligence

How To not Write an MCP Server

May 11, 2025
Next Post
A633230d 3f9a 4749 8bab 017c4b9435a0.jpeg

Why is the crypto market up right now?

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

POPULAR NEWS

Gemini 2.0 Fash Vs Gpt 4o.webp.webp

Gemini 2.0 Flash vs GPT 4o: Which is Higher?

January 19, 2025
0 3.png

College endowments be a part of crypto rush, boosting meme cash like Meme Index

February 10, 2025
How To Maintain Data Quality In The Supply Chain Feature.jpg

Find out how to Preserve Knowledge High quality within the Provide Chain

September 8, 2024
0khns0 Djocjfzxyr.jpeg

Constructing Data Graphs with LLM Graph Transformer | by Tomaz Bratanic | Nov, 2024

November 5, 2024
1vrlur6bbhf72bupq69n6rq.png

The Artwork of Chunking: Boosting AI Efficiency in RAG Architectures | by Han HELOIR, Ph.D. ☕️ | Aug, 2024

August 19, 2024

EDITOR'S PICK

Dall·e 2024 11 08 17.19.52 A Digital Illustration Showcasing Bitcoin Nearing Its All Time High Ath As Miners Hold Back On Selling Signaling Bullish Sentiment. The Image Promi.jpg

Bitcoin Nears ATH As Miners Maintain Again On Promoting—Bullish Alerts Flashing

November 9, 2024
Securities Exchange Commission.jpg

File $4.68 billion fines mark SEC’s hardest yr on crypto

September 9, 2024
Robotic Cyborg Hand Pressing Keyboard Laptop.jpg

From Chaos to Management: How Check Automation Supercharges Actual-Time Dataflow Processing

March 28, 2025
Mining20bitcoin Id 4b377401 Ef6e 45ef 966c E359e6f85fb0 Size900.jpg

Bitcoin Miner Marathon Shares Drop 8%: $138 Million Penalty and Income Challenges

August 3, 2024

About Us

Welcome to News AI World, your go-to source for the latest in artificial intelligence news and developments. Our mission is to deliver comprehensive and insightful coverage of the rapidly evolving AI landscape, keeping you informed about breakthroughs, trends, and the transformative impact of AI technologies across industries.

Categories

  • Artificial Intelligence
  • ChatGPT
  • Crypto Coins
  • Data Science
  • Machine Learning

Recent Posts

  • Kraken completes latest Proof of Reserves, elevating the bar for crypto platform transparency
  • LangGraph Orchestrator Brokers: Streamlining AI Workflow Automation
  • Intel Xeon 6 CPUs make their title in AI, HPC • The Register
  • Home
  • About Us
  • Contact Us
  • Disclaimer
  • Privacy Policy

© 2024 Newsaiworld.com. All rights reserved.

No Result
View All Result
  • Home
  • Artificial Intelligence
  • ChatGPT
  • Data Science
  • Machine Learning
  • Crypto Coins
  • Contact Us

© 2024 Newsaiworld.com. All rights reserved.

Are you sure want to unlock this post?
Unlock left : 0
Are you sure want to cancel subscription?