Introduction
have all the time walked side-by-side, holding arms.
I keep in mind listening to “Be taught Statistics to know what’s behind the algorithms” once I began learning Knowledge Science. Whereas all of that was fascinating to me, it was additionally actually overwhelming.
The actual fact is that there are too many statistical ideas, exams, and distributions to maintain monitor of. If you happen to don’t know what I’m speaking about, simply go to the Scipy.stats web page, and you’ll perceive.
In case you are sufficiently old within the Knowledge Science subject, you most likely bookmarked (and even printed) a kind of statistical check cheatsheets. They had been fashionable for some time. However now, the Giant Language Fashions have gotten type of a “second mind” for us, serving to us to shortly seek the advice of just about any info we li,ke with the additional advantage of getting it summarized and tailored to our wants.
With that in thoughts, my pondering was that selecting the best statistical check may be complicated as a result of it is dependent upon variable sorts, assumptions, and so on.
So, I assumed I may get an assistant to assist with that. Then, my mission took kind.
- I used LangGraph to construct a multi-step agent
- The front-end was constructed with Streamlit
- The Agent can shortly seek the advice of Scipy Stats documentation and retrieve the precise code for each particular scenario.
- Then, it provides us a pattern Python code
- It’s deployed in Streamlit Apps, in case you need to strive it.
- App Hyperlink: https://ai-statistical-advisor.streamlit.app/
Superb!
Let’s dive in and discover ways to construct this agent.
LangGraph
LangGraph is a library that helps construct advanced, multi-step purposes with massive language fashions (LLMs) by representing them as a graph. This graph structure permits the builders to create situations, loops, which make it helpful for creating refined brokers and chatbots that may determine what to do subsequent primarily based on the outcomes of a earlier step
It primarily turns a inflexible sequence of actions into a versatile, dynamic decision-making course of. In LangGraph, every node is a operate or device.
Subsequent, let’s be taught extra in regards to the agent we’re going to create on this submit.
Statistical Advisor Agent
This agent is a Statistical Advisor. So, the primary concept is that:
- The bot receives a statistics-related query, akin to “The right way to examine the technique of two teams“.
- It checks the query and determines if it must seek the advice of Scipy’s documentation or simply give a direct reply.
- If wanted, the agent makes use of a RAG device on embedded SciPy documentation
- Returns a solution.
- If relevant, it returns a pattern Python code on how one can carry out the statistical check.
Let’s shortly have a look at the Graph generated by LangGraph to point out this agent.

Nice. Now, let’s minimize to the chase and begin coding!
Code
To make issues simpler, I’ll break the event down into modules. First, let’s set up the packages we’ll want.
pip set up chromadb langchain-chroma langchain-community langchain-openai
langchain langgraph openai streamlit
Chunk and Embed
Subsequent, we’ll create the script to take our documentation and create chunks of textual content, in addition to embed these chunks. We do this to make it simpler for vector databases like ChromaDB to look and retrieve info.
So, I created this operate embed_docs() that you would be able to see within the GitHub repository linked right here.
- The operate takes Scipy’s documentation (which is open supply beneath BSD license)
- Splits it into chinks of 500 tokens and overlap of fifty tokens.
- Makes the embedding (remodel textual content into numerical values for optimized vector db search) utilizing
OpenAIEmbedding - Saves the embeddings in an occasion of
ChromaDB
Now the info is prepared as a data base for a Retrieval-Augmented Era (RAG). Nevertheless it wants a retriever that may search and discover the info. That’s what the retriever does.
Retriever
The get_doc_answer() operate will:
- Load the ChromaDB occasion beforehand created.
- Create an occasion of
OpenAI GPT 4o - Create a
retrieverobject - Glue all the things collectively in a
retrieval_chainthat will get a query from the person, sends it to the LLM - The mannequin makes use of the
retrieverto entry the ChromaDB occasion, get related information about statistical exams, and return the reply to the person.
Now we’ve the RAG accomplished with the paperwork embedded and the retriever prepared. Let’s transfer on to the Agent nodes.
Agent Nodes
LangGraph has this attention-grabbing structure that considers every node as a operate. Subsequently, now we should create the features to deal with every a part of the agent.
We’ll comply with the movement and begin with the classify_intent node. Since some nodes must work together with an LLM, we have to generate a consumer.
from rag.retriever import get_doc_answer
from openai import OpenAI
import os
from dotenv import load_dotenv
load_dotenv()
# Occasion of OpenAI
consumer = OpenAI()
As soon as we begin the agent, it should obtain a question from the person. So, this node will examine the query and determine if the subsequent node shall be a easy response or if it wants to look Scipy’s documentation.
def classify_intent(state):
"""Test if the person query wants a doc search or may be answered immediately."""
query = state["question"]
response = consumer.chat.completions.create(
mannequin="gpt-4o",
messages=[
{"role": "system", "content": "You are an assistant that decides if a question about statistical tests needs document lookup or not. If it is about definitions or choosing the right test, return 'search'. Otherwise return 'simple'."},
{"role": "user", "content": f"Question: {question}"}
]
)
resolution = response.selections[0].message.content material.strip().decrease()
return {"intent": resolution} # "search" or "easy"
If a query about statistical ideas or exams is requested, then the retrieve_info() node is activated. It performs the RAG within the documentation.
def retrieve_info(state):
"""Use the RAG device to reply from embedded docs."""
query = state["question"]
reply = get_doc_answer(query=query)
return {"rag_answer": reply}
As soon as the right chunk of textual content is retrieved from ChromaDB, the agent goes to the subsequent node to generate a solution.
def reply(state):
"""Construct the ultimate reply."""
if state.get("rag_answer"):
return {"final_answer": state["rag_answer"]}
else:
return {"final_answer": "I am undecided how one can assist with that but."}
Lastly, the final node is to generate a code, if that’s relevant. That means, if there may be a solution the place the check may be performed utilizing Scipy, there shall be a pattern code.
def generate_code(state):
"""Generate Python code to carry out the really useful statistical check."""
query = state["question"]
suggested_test = state.get("rag_answer") or "a statistical check"
immediate = f"""
You're a Python tutor.
Primarily based on the next person query, generate a brief Python code snippet utilizing scipy.stats that performs the suitable statistical check.
Consumer query:
{query}
Reply given:
{suggested_test}
Solely output code. Do not embody explanations.
"""
response = consumer.chat.completions.create(
mannequin="gpt-4o",
messages=[{"role": "user", "content": prompt}]
)
return {"code_snippet": response.selections[0].message.content material.strip()}
Discover one thing essential right here: all features in our nodes all the time have state as an argument as a result of the state is the only supply of reality for your entire workflow. Every operate, or “node,” within the graph reads from and writes to this central state object.
For instance:
- The
classify_intentoperate reads the query from the state and provides an intent key. - The
retrieve_infooperate can learn the identical query and add a rag_answer, which the reply operate lastly reads to assemble the final_answer. This shared state dictionary is how the totally different steps within the agent’s reasoning and action-taking course of keep related.
Subsequent, let’s put all the things collectively and construct our graph!
Constructing the Graph
The graph is the agent itself. So, what we’re doing right here is mainly telling LangGraph what the nodes are that we’ve and the way they join to one another, so the framework could make the data run in line with that movement.
Let’s import the modules.
from langgraph.graph import StateGraph, END
from typing_extensions import TypedDict
from langgraph_agent.nodes import classify_intent, retrieve_info, reply, generate_code
Outline our state schema. Keep in mind that dictionary that the agent makes use of to attach the steps of the method? That’s it.
# Outline the state schema (only a dictionary for now)
class TypedDictState(TypedDict):
query: str
intent: str
rag_answer: str
code_snippet: str
final_answer: str
Right here, we’ll create a operate that builds the graph.
- To inform LangGraph what the steps (features) within the course of are, we use
add_node - As soon as we’ve listed all of the features, we begin creating the perimeters, that are the connections between the nodes.
- We begin the method with
set_entry_point. That is the primary operate for use. - We use
add_edgeto attach one node to a different, utilizing the primary argument because the operate from which the data comes, and the second argument is the place it goes. - If we’ve a situation to comply with, we use
add_conditional_edges - We use
ENDto complete the graph andcompileto construct it.
def build_graph():
# Construct the LangGraph movement
builder = StateGraph(TypedDictState)
# Add nodes
builder.add_node("classify_intent", classify_intent)
builder.add_node("retrieve_info", retrieve_info)
builder.add_node("reply", reply)
builder.add_node("generate_code", generate_code)
# Outline movement
builder.set_entry_point("classify_intent")
builder.add_conditional_edges(
"classify_intent",
lambda state: state["intent"],
{
"search": "retrieve_info",
"easy": "reply"
}
)
builder.add_edge("retrieve_info", "reply")
builder.add_edge("reply", "generate_code")
builder.add_edge("generate_code", END)
return builder.compile()
With our graph builder operate prepared, all we’ve to do now’s create a fantastic front-end the place we are able to work together with this agent.
Let’s do this now.
Streamlit Entrance-Finish
The front-end is the ultimate piece of the puzzle, the place we create a Consumer Interface that enables us to simply enter a query in a correct textual content field and see the reply correctly formatted.
I selected Streamlit as a result of it is vitally straightforward to prototype and deploy. Let’s start with the imports.
import os
import time
import streamlit as st
Then, we configure the web page’s look.
# Config web page
st.set_page_config(page_title="Stats Advisor Agent",
page_icon='🤖',
format="large",
initial_sidebar_state="expanded")
Create a sidebar, the place the person can enter their OpenAI API key, together with a “Clear” session button.
# Add a spot to enter the API key
with st.sidebar:
api_key = st.text_input("OPENAI_API_KEY", sort="password")
# Save the API key to the surroundings variable
if api_key:
os.environ["OPENAI_API_KEY"] = api_key
# Clear
if st.button('Clear'):
st.rerun()
Subsequent, we arrange the web page title and directions and add a textual content field for the person to enter a query.
# Title and Directions
if not api_key:
st.warning("Please enter your OpenAI API key within the sidebar.")
st.title('Statistical Advisor Agent | 🤖')
st.caption('This AI Agent is educated to reply questions on statistical exams from the [Scipy](https://docs.scipy.org/doc/scipy/reference/stats.html) bundle.')
st.caption('Ask questions like: "What's the finest statistical check to check two means".')
st.divider()
# Consumer query
query = st.text_input(label="Ask me one thing:",
placeholder= "e.g. What's the finest check to check 3 teams means?")
Lastly, we are able to run the graph builder and show the reply on display.
# Run the graph
if st.button('Search'):
# Progress bar
progress_bar = st.progress(0)
with st.spinner("Pondering..", show_time=True):
from langgraph_agent.graph import build_graph
progress_bar.progress(10)
# Construct the graph
graph = build_graph()
consequence = graph.invoke({"query": query})
# Progress bar
progress_bar.progress(50)
# Print the consequence
st.subheader("📖 Reply:")
# Progress bar
progress_bar.progress(100)
st.write(consequence["final_answer"])
if "code_snippet" in consequence:
st.subheader("💻 Steered Python Code:")
st.write(consequence["code_snippet"])
Let’s see the consequence now.

Wow, the result’s spectacular!
- I requested: What’s the finest check to check two teams means?
- Reply: To check the technique of two teams, probably the most applicable check is usually the impartial two-sample t-test if the teams are impartial and the info is often distributed. If the info will not be usually distributed, a non-parametric check just like the Mann-Whitney U check may be extra appropriate. If the teams are paired or associated, a paired pattern t-test can be applicable.
Mission completed for what we proposed to create.
Strive It Your self
Do you need to give this Agent a Strive?
Go forward and check the deployed model now!
https://ai-statistical-advisor.streamlit.app
Earlier than You Go
It is a lengthy submit, I do know. However I hope it was price to learn it to the tip. We realized rather a lot about LangGraph. It makes us suppose differently about creating AI brokers.
The framework forces us to consider each step of the data, from the second a query is prompted to the LLM till the reply that shall be displayed. Questions like these begin to pop in your thoughts throughout the growth course of:
- What occurs after the person asks the query?
- Does the agent must confirm one thing earlier than shifting on?
- Are there situations to think about throughout the interplay?
This structure turns into a bonus as a result of it makes the entire course of cleaner and scalable, since including a brand new characteristic may be so simple as including a brand new operate (node).
However, LangGraph will not be as user-friendly as frameworks like Agno or CrewAI, which encapsulate many of those abstractions in easier strategies, making the method a lot simpler to be taught and develop, but additionally much less versatile.
In the long run, it’s all a matter of what downside is being solved and the way versatile you want it to be.
GitHub Repository
https://github.com/gurezende/AI-Statistical-Advisor
About Me
If you happen to preferred this content material and need to be taught extra about my work, right here is my web site, the place you too can discover all my contacts.
[1. LangGraph Docs] https://langchain-ai.github.io/langgraph/ideas/why-langgraph/
[2. Scipy Stats] https://docs.scipy.org/doc/scipy/reference/stats.html
[3. Streamlit Docs] https://docs.streamlit.io/
[4. Statistical Advisor App] https://ai-statistical-advisor.streamlit.app/
















