Why CrewAI’s Manager-Worker Architecture Fails, and How to Fix It

Multi-agent orchestration is one of the most promising applications of LLMs, and CrewAI has quickly become a popular framework for building agent teams. But one of its most important features, the hierarchical manager-worker process, simply does not function as documented. In real workflows, the manager does not effectively coordinate agents; instead, CrewAI executes tasks sequentially, leading to incorrect reasoning, unnecessary tool calls, and very high latency. This issue has been raised in several online forums with no clear resolution.

In this article, I demonstrate why CrewAI’s hierarchical process fails, show the evidence from actual Langfuse traces, and provide a reproducible pathway to make the manager-worker pattern work reliably using custom prompting.

Multi-agent Orchestration

Before we get into the details, let us understand what orchestration means in an agentic context. In simple terms, orchestration is managing and coordinating multiple inter-dependent tasks in a workflow. But haven’t workflow management tools (e.g., RPA) been available forever to do just that? So what changed with LLMs?

The answer is the ability of LLMs to understand meaning and intent from natural language instructions, just as people in a team would. While earlier workflow tools were rule-based and rigid, with LLMs functioning as agents, the expectation is that they will be able to understand the intent of the user’s query, use reasoning to create a multi-step plan, infer the tools to be used, derive their inputs in the correct formats, and synthesize all the different intermediate results into a precise response to the user’s query. The orchestration frameworks are meant to guide the LLM with appropriate prompts for planning, tool-calling, generating the response, etc.

Among the orchestration frameworks, CrewAI, with its natural-language-based definition of tasks, agents and crews, depends the most on the LLM’s ability to understand language and manage workflows. While not as deterministic as LangGraph (since LLM outputs cannot be fully deterministic), it abstracts away most of the complexity of routing, error handling, etc. into simple, user-friendly constructs with parameters that the user can tune for appropriate behavior. So it is a good framework for creating prototypes by product teams and even non-developers.

Except that the manager-worker pattern does not work as intended…

To illustrate, let’s take a use case to work with, and evaluate the response based on the following criteria:

Quality of orchestration
Quality of final response
Explainability
Latency and usage cost

Use Case

Take the case where a team of customer support agents resolves technical or billing tickets. When a ticket comes in, a triage agent categorizes the ticket, then assigns it to the technical or billing support specialist for resolution. There is a Customer Support Manager who coordinates the team, delegates tasks and validates the quality of the response.

Together they will be solving queries such as:

Why is my laptop overheating?
Why was I charged twice last month?
My laptop is overheating and also, I was charged twice last month?
My invoice amount is incorrect after system glitch?

The first query is purely technical, so only the technical support agent should be invoked by the manager; the second is billing only; and the third and fourth require answers from both the technical and billing agents.
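The routing expectations above can be written down as a small lookup so that each trace can be checked against them. This is only a sketch: the names EXPECTED_ROUTING and routing_is_correct are my own helpers for illustration, not part of CrewAI, and the check looks only at the specialists (the triage agent is expected to run for every ticket).

```python
# Expected specialists per test ticket, as described in the text.
# EXPECTED_ROUTING and routing_is_correct are hypothetical helpers, not CrewAI APIs.
TECH = "Technical Support Specialist"
BILLING = "Billing Support Specialist"

EXPECTED_ROUTING = {
    "Why is my laptop overheating?": {TECH},
    "Why was I charged twice last month?": {BILLING},
    "My laptop is overheating and also, I was charged twice last month?": {TECH, BILLING},
    "My invoice amount is incorrect after system glitch?": {TECH, BILLING},
}


def routing_is_correct(ticket, invoked_roles):
    """Return True if exactly the expected specialists were invoked for this ticket."""
    return set(invoked_roles) == EXPECTED_ROUTING[ticket]
```

For example, a trace that shows both specialists being called for the first ticket would fail this check, which is exactly the symptom examined later in the article.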

Let’s build this team of CrewAI agents and see how well it works.

Hierarchical Process

According to the CrewAI documentation, “adopting a hierarchical approach allows for a clear hierarchy in task management, where a ‘manager’ agent coordinates the workflow, delegates tasks, and validates outcomes for streamlined and effective execution.” Also, the manager agent can be created in two ways: automatically by CrewAI or explicitly set by the user. In the latter case, you have more control over the instructions to the manager agent. We will try both ways for our use case.

CrewAI Code

Following is the code for the use case. I have used gpt-4o as the LLM and Langfuse for observability.

from crewai import Agent, Crew, Process, Task, LLM
from dotenv import load_dotenv
import os
from observe import *  # Langfuse trace

load_dotenv()
verbose = False
max_iter = 4

API_VERSION = os.getenv('API_VERSION')
# Create your LLM
llm_a = LLM(
    model="gpt-4o",
    api_version=API_VERSION,
    temperature=0.2,
    max_tokens=8000,
)

# Define the manager agent
supervisor = Agent(
    role="Customer Support Manager",
    goal="Oversee the support team to ensure timely and effective resolution of customer inquiries. Use the tool to categorize the user query first, then decide the next steps. Synthesize responses from different agents if needed to provide a comprehensive answer to the customer.",
    backstory=( """
        You do not try to find an answer to the user ticket {ticket} yourself.
        You delegate tasks to coworkers based on the following logic:
        Note the category of the ticket first by using the triage agent.
        If the ticket is categorized as 'Both', always assign it first to the Technical Support Specialist, then to the Billing Support Specialist, then print the final combined response. Ensure that the final response answers both technical and billing issues raised in the ticket based on the responses from both Technical and Billing Support Specialists.
        ELSE
        If the ticket is categorized as 'Technical', assign it to the Technical Support Specialist, else skip this step.
        Before proceeding further, analyse the ticket category. If it is 'Technical', print the final response. Terminate further actions.
        ELSE
        If the ticket is categorized as 'Billing', assign it to the Billing Support Specialist.
        Finally, compile and present the final response to the customer based on the outputs from the assigned agents.
        """
    ),
    llm=llm_a,
    allow_delegation=True,
    verbose=verbose,
)

# Define the triage agent
triage_agent = Agent(
    role="Query Triage Specialist",
    goal="Categorize the user query into technical or billing related issues. If a query requires both aspects, reply with 'Both'.",
    backstory=(
        "You are a seasoned expert in analysing intent of user query. You answer precisely with one word: 'Technical', 'Billing' or 'Both'."
    ),
    llm=llm_a,
    allow_delegation=False,
    verbose=verbose,
)

# Define the technical support agent
technical_support_agent = Agent(
    role="Technical Support Specialist",
    goal="Resolve technical issues reported by customers promptly and effectively",
    backstory=(
        "You are a highly skilled technical support specialist with a strong background in troubleshooting software and hardware issues. "
        "Your primary responsibility is to assist customers in resolving technical problems, ensuring their satisfaction and the smooth operation of their products."
    ),
    llm=llm_a,
    allow_delegation=False,
    verbose=verbose,
)

# Define the billing support agent
billing_support_agent = Agent(
    role="Billing Support Specialist",
    goal="Address customer inquiries related to billing, payments, and account management",
    backstory=(
        "You are an experienced billing support specialist with expertise in handling customer billing inquiries. "
        "Your main objective is to provide clear and accurate information regarding billing processes, resolve payment issues, and assist with account management to ensure customer satisfaction."
    ),
    llm=llm_a,
    allow_delegation=False,
    verbose=verbose,
)

# Define tasks
categorize_tickets = Task(
    description="Categorize the incoming customer support ticket: '{ticket}' based on its content to determine if it is technical or billing-related. If a query requires both aspects, reply with 'Both'.",
    expected_output="A categorized ticket labeled as 'Technical' or 'Billing' or 'Both'. Do not be verbose, just reply with one word.",
    agent=triage_agent,
)

resolve_technical_issues = Task(
    description="Resolve technical issues described in the ticket: '{ticket}'",
    expected_output="Detailed solutions provided to each technical issue.",
    agent=technical_support_agent,
)

resolve_billing_issues = Task(
    description="Resolve billing issues described in the ticket: '{ticket}'",
    expected_output="Comprehensive responses to each billing-related inquiry.",
    agent=billing_support_agent,
)

# Instantiate your crew with a custom manager and hierarchical process
crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    # manager_llm=llm_a,  # Uncomment for auto-created manager
    manager_agent=supervisor,  # Comment out for auto-created manager
    process=Process.hierarchical,
    verbose=verbose,
)
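To run a ticket through the crew, the {ticket} placeholder in the task descriptions is filled through the inputs argument of kickoff. The small timing wrapper below is a sketch of my own (run_ticket is a hypothetical helper name); it only assumes that the object passed in exposes kickoff(inputs=...), as a CrewAI Crew does.

```python
import time


def run_ticket(crew, ticket):
    """Run one ticket through a crew and return (result, elapsed_seconds).

    `crew` is any object exposing kickoff(inputs=...), such as crew_q above.
    run_ticket is a hypothetical helper, not part of CrewAI.
    """
    start = time.perf_counter()
    # The key "ticket" matches the {ticket} placeholder used in the task descriptions.
    result = crew.kickoff(inputs={"ticket": ticket})
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Usage would look like result, secs = run_ticket(crew_q, "Why is my laptop overheating?"), followed by print(result). The wall-clock time measured this way will not necessarily match the latency reported by Langfuse.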

As is evident, the program mirrors the team of human agents. Not only is there a manager, a triage agent, and technical and billing support agents, but the CrewAI objects such as Agent, Task and Crew are self-evident in their meaning and easy to visualize. Another observation is that there is very little Python code; most of the reasoning, planning and behavior is natural-language based, which depends on the ability of the LLM to derive meaning and intent from language, then reason and plan for the goal.

CrewAI code therefore scores high on ease of development. It is a low-code way of creating a flow quickly, with most of the heavy lifting of the workflow being done by the orchestration framework rather than the developer.

How well does it work?

As we are testing the hierarchical process, the process parameter is set to Process.hierarchical in the Crew definition. We will try different features of CrewAI as follows and measure performance:

Manager agent auto-created by CrewAI
Using our custom manager agent

1. Auto-created manager agent

Input query: Why is my laptop overheating?

Here is the Langfuse trace:

The key observations are as follows:

First, the output is “Based on the provided context, it seems there is a misalignment between the nature of the issue (laptop overheating) and its categorization as a billing concern. To clarify the connection, it would be important to determine if the customer is requesting a refund for the laptop due to the overheating issue, disputing a charge related to the purchase or repair of the laptop, or seeking compensation for repair costs incurred due to the overheating…” For a query that was clearly a technical issue, this is a poor response.
Why does it happen? The left panel shows that the execution first went to the triage specialist, then to technical support and then, surprisingly, to the billing support specialist as well. The following graphic depicts this as well:

Looking closely, we find that the triage specialist correctly identified the ticket as “Technical” and the technical support agent gave a good answer as follows:

But then, instead of stopping and replying with the above as the response, the Crew Manager went to the Billing support specialist and tried to find a non-existent billing issue in the purely technical user query.

This resulted in the Billing agent’s response overwriting the Technical agent’s response, with the Crew Manager doing a sub-optimal job of validating the quality of the final response against the user’s query.

Why did it happen?

Because in the Crew task definition, I specified the tasks as categorize_tickets, resolve_technical_issues, resolve_billing_issues, and although the process is supposed to be hierarchical, the Crew Manager does not perform any orchestration, instead simply executing all the tasks sequentially.

crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    manager_llm=llm_a,
    process=Process.hierarchical,
    verbose=verbose,
)
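To make the failure mode concrete, here is a framework-free sketch of what purely sequential execution does to the final answer. It is not CrewAI's internal code: run_sequentially and the canned handler outputs are hypothetical stand-ins that only mimic the behavior observed in the trace, where every task runs in list order regardless of the triage result.

```python
def run_sequentially(tasks, ticket):
    """Run every task in list order and return (final_output, execution_order).

    The triage category is computed but never used to skip a task,
    which mirrors the behavior seen in the trace.
    """
    outputs = [(name, handler(ticket)) for name, handler in tasks]
    final_output = outputs[-1][1]  # whichever task ran last wins
    return final_output, [name for name, _ in outputs]


# Hypothetical stand-ins for the three tasks; the outputs are canned strings.
tasks = [
    ("categorize_tickets", lambda t: "Technical"),
    ("resolve_technical_issues", lambda t: "Clean the vents and check the fan."),
    ("resolve_billing_issues", lambda t: "No billing issue could be identified."),
]

final, order = run_sequentially(tasks, "Why is my laptop overheating?")
```

Here final holds the billing handler's text even though the ticket was categorized as Technical, which is the overwrite described above.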

If you now ask a billing-related query, it will appear to give a correct answer simply because resolve_billing_issues is the last task in the sequence.

What about a query that requires both technical and billing support, such as “My laptop is overheating and also I was charged twice last month?” In this case too, the triage agent correctly categorizes the ticket type as “Both”, and the technical and billing agents give correct answers to their individual queries, but the manager is unable to combine all the responses into a coherent answer to the user’s query. Instead, the final response only considers the billing response since that is the last task to be called in sequence.

Latency and Usage: As can be seen from the above image, the Crew execution took almost 38 secs and spent 15,759 tokens. The final output is only about 200 tokens. The rest of the tokens were spent on all the thinking, agent calling, generating intermediate responses, etc., all to generate an unsatisfactory response at the end. The performance can be classified as “Poor”.
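As a quick back-of-the-envelope check on the figures above, the share of tokens that ended up in the final answer can be computed directly. The 200-token figure is the approximate value quoted in the text, so the result is indicative only.

```python
total_tokens = 15759        # total tokens reported for the run
final_output_tokens = 200   # approximate size of the final answer

# Tokens spent on planning, delegation and intermediate responses.
overhead_tokens = total_tokens - final_output_tokens
# Fraction of all tokens that appear in the final answer.
final_share = final_output_tokens / total_tokens

print(f"Overhead: {overhead_tokens} tokens, final answer share: {final_share:.1%}")
```

In other words, only a little over one percent of the tokens spent went into the answer the customer actually sees.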

Evaluation of this approach

Quality of orchestration: Poor
Quality of final output: Poor
Explainability: Poor
Latency and Usage: Poor

But perhaps the above result is due to the fact that we relied on CrewAI’s built-in manager, which did not have our custom instructions. Therefore, in our next approach we replace the CrewAI automatic manager with our custom Manager agent, which has detailed instructions on what to do in case of Technical, Billing or Both tickets.

2. Using Custom Manager Agent

Our Customer Support Manager is defined with the following very specific instructions. Note that this requires some experimentation to get it working, and a generic manager prompt such as that mentioned in the CrewAI documentation will give the same erroneous results as the built-in manager agent above.

    role="Customer Support Manager",
    goal="Oversee the support team to ensure timely and effective resolution of customer inquiries. Use the tool to categorize the user query first, then decide the next steps. Synthesize responses from different agents if needed to provide a comprehensive answer to the customer.",
    backstory=( """
        You do not try to find an answer to the user ticket {ticket} yourself.
        You delegate tasks to coworkers based on the following logic:
        Note the category of the ticket first by using the triage agent.
        If the ticket is categorized as 'Both', always assign it first to the Technical Support Specialist, then to the Billing Support Specialist, then print the final combined response. Ensure that the final response answers both technical and billing issues raised in the ticket based on the responses from both Technical and Billing Support Specialists.
        ELSE
        If the ticket is categorized as 'Technical', assign it to the Technical Support Specialist, else skip this step.
        Before proceeding further, analyse the ticket category. If it is 'Technical', print the final response. Terminate further actions.
        ELSE
        If the ticket is categorized as 'Billing', assign it to the Billing Support Specialist.
        Finally, compile and present the final response to the customer based on the outputs from the assigned agents.
        """

And in the Crew definition, we use the custom manager instead of the built-in one:

crew_q = Crew(
    agents=[triage_agent, technical_support_agent, billing_support_agent],
    tasks=[categorize_tickets, resolve_technical_issues, resolve_billing_issues],
    # manager_llm=llm_a,
    manager_agent=supervisor,
    process=Process.hierarchical,
    verbose=verbose,
)
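For comparison, the branching that the manager's backstory asks the LLM to follow can be written as a few lines of plain Python. This is only a sketch of the intended decision logic, useful as a reference when reading the traces; specialists_for is a hypothetical helper, and CrewAI does not execute anything like it. In the actual crew the manager LLM has to infer this routing from the prompt.

```python
def specialists_for(category):
    """Return the specialist roles to invoke, in order, for a triage category.

    Mirrors the rules in the manager's backstory:
    'Both' -> technical first, then billing; otherwise only the matching specialist.
    """
    normalized = category.strip().lower()
    if normalized == "both":
        return ["Technical Support Specialist", "Billing Support Specialist"]
    if normalized == "technical":
        return ["Technical Support Specialist"]
    if normalized == "billing":
        return ["Billing Support Specialist"]
    # The triage agent is instructed to answer with exactly one of the three labels.
    raise ValueError(f"Unexpected triage category: {category!r}")
```

The order of the returned list matters for 'Both' tickets, since the backstory asks for the technical answer first and the billing answer second before the responses are combined.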

Let’s repeat the test cases.

Input query: Why is my laptop overheating?

The trace is the following:

Graph of Why is my laptop overheating?

The most important observation is that now, for this technical query, the flow did not go to the Billing support specialist agent. The manager correctly followed instructions, categorized the query as technical and stopped execution once the Technical Support Specialist had generated its response. From the response preview displayed, it is evident that it is a good response to the user query. Also, the latency is 24 secs and token usage is 10k.

Input query: Why was I charged twice last month?

The trace is as follows:

Response to ‘Why was I charged twice last month?’

Graph of Why was I charged twice last month?

As can be seen, the manager correctly skipped executing the Technical Support Specialist, although that came before the Billing agent in the Crew definition. Instead, the response generated is of good quality and comes from the Billing Support Specialist only. Latency is 16 secs and token usage is only 7,700.

Input query: My laptop is overheating and also, I was charged twice last month?

The trace shows the Manager executed both Technical and Billing support agents and provided a combined response.

The response preview in the figure above does not show the full response, which is as follows, and combines responses from both support agents. Latency is 38 secs and token usage is 20k, which is commensurate with the multi-agent orchestration and the detailed response generated.

Dear Customer,

Thank you for reaching out to us regarding the issues you're experiencing. We sincerely apologize for any inconvenience caused. Below are the detailed solutions to address your concerns:

**1. Laptop Overheating Issue:**
   - **Check for Proper Ventilation**: Ensure your laptop is placed on a hard, flat surface to allow proper airflow. Avoid using it on soft surfaces like beds or couches that can block the vents. Consider using a laptop cooling pad or stand with built-in fans to improve airflow.
   - **Clean the Laptop's Vents and Fans**: Dust and debris can accumulate in the vents and fans, restricting airflow. Power off the laptop, unplug it, and use a can of compressed air to gently blow out dust from the vents. If you are comfortable, you can clean the internal fans and components more thoroughly, or take the laptop to a professional technician for internal cleaning.
   - **Monitor Running Applications and Processes**: Open the Task Manager (Windows: Ctrl + Shift + Esc, macOS: Activity Monitor) and check for processes consuming high CPU or GPU usage. Close unnecessary applications or processes to reduce the load on the system.
   - **Update Drivers and Software**: Update your operating system, drivers (especially graphics drivers), and any other critical software to the latest versions.
   - **Check for Malware or Viruses**: Run a full system scan using a reputable antivirus program to detect and remove any malware.
   - **Adjust Power Settings**: Adjust your power settings to "Balanced" or "Power Saver" mode (Windows: Control Panel > Power Options, macOS: System Preferences > Energy Saver).
   - **Inspect the Laptop's Hardware**: If the laptop is still overheating, there may be an issue with the hardware, such as a failing fan or thermal paste that needs replacement. Consult a professional technician to inspect and replace the thermal paste or faulty hardware components if necessary.
   - **Environmental Factors**: Operate the laptop in a cool, well-ventilated environment. Avoid using the laptop in direct sunlight or near heat sources.
   - **Consider Upgrading Components**: If the laptop is older, consider upgrading components such as RAM or switching to an SSD to reduce the strain on the system and help with heat management.
   - **Monitor Temperature Levels**: Install a temperature monitoring tool (e.g., HWMonitor, Core Temp, or Macs Fan Control) to keep track of the CPU and GPU temperatures. This can help identify if the laptop is consistently running at high temperatures and provide insights into when the overheating occurs.

If the problem persists after trying all the above solutions, please contact the laptop manufacturer’s support team or a certified repair technician for further assistance.

**2. Duplicate Billing Issue:**
   - **Verification of Account Details**: I reviewed your account information to ensure accuracy and confirmed that the account is active and matches the reported issue.
   - **Billing Records Check**: Upon examining the billing records for last month, I identified two identical charges processed on the same day. This confirms the duplicate billing issue.
   - **Payment Method Details**: I verified the payment method used for the transactions and confirmed that both charges were processed using the same payment method.
   - **Resolution**:
       - Since the duplicate charge was confirmed, I've initiated a refund for the second transaction. The refund will be processed back to the original payment method within 5-7 business days.
       - Additionally, I've applied a credit to your account as a gesture of goodwill for the inconvenience caused.
   - **Customer Communication**: You'll receive an email or phone call from us to inform you of the resolution, including the refund timeline and the applied credit.

We sincerely apologize for the inconvenience caused by these issues. Please let us know if you need further assistance or have any additional questions. We're here to help

And our final, more nuanced query is as follows:

Input query: My invoice amount is incorrect after system glitch?

The trace shows that the query was correctly classified as ‘Both’, requiring responses from Technical and Billing agents, and a combined response was provided. Latency is 24 secs and tokens used are 12k.

The full response is included here:

Thank you for bringing this issue to our attention. Based on the technical investigation, here are the steps we are taking to resolve the system glitch that caused the incorrect invoice amount:

1. We have verified the details of the incorrect invoice amount and identified the issue in the system.
2. Our technical team has analyzed the system logs and reviewed the invoice calculation logic to pinpoint the root cause of the error.
3. The necessary fixes have been implemented to address the glitch, and the system has been thoroughly tested to ensure the issue is resolved.
4. A corrected invoice has been generated, and we have verified its accuracy.

Additionally, here’s the complete resolution for the billing issue related to the incorrect invoice amount caused by the system glitch:

1. **Corrected Invoice Verification**: The technical team has resolved the system glitch and generated a corrected invoice. The corrected invoice has been personally verified to ensure its accuracy, including cross-checking the amounts, dates, and any applicable charges or credits.

2. **Customer Communication**: We will promptly notify you about the resolution of the issue. This communication includes:
   - An apology for the inconvenience caused by the incorrect invoice.
   - Confirmation that the system glitch has been resolved.
   - Assurance that the corrected invoice has been thoroughly reviewed for accuracy.
   - A copy of the corrected invoice for your records.

3. **Additional Steps Taken**: To prevent similar issues in the future, the technical team has implemented measures to ensure system stability and accuracy in invoice generation.

4. **Account Adjustment (if applicable)**: If the incorrect invoice resulted in any overpayment or underpayment, the necessary adjustments will be made to your account. This includes issuing a refund for any overpayment or providing clear instructions for settling any outstanding balance.

5. **Follow-Up**: We are here to assist you with any further questions or concerns regarding your account or billing. Please do not hesitate to reach out to us, and we will be happy to help. For your convenience, we have provided direct contact information for further communication.

We sincerely apologize for any inconvenience this may have caused and assure you that we are taking steps to prevent similar issues in the future. Thank you for your understanding and patience.

Evaluation of this approach

Quality of orchestration: Good
Quality of final output: Good
Explainability: Good (we understand why it did what it did)
Latency and Usage: Fair (commensurate with the complexity of the output)
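To put the "Fair" rating in numbers, the figures quoted in the traces for the purely technical query can be compared across the two approaches. This is a rough sketch: the custom-manager token count is the rounded 10k reported above, the 38 secs is the "almost 38 secs" of the first run, and a single run per configuration is not a benchmark.

```python
# (latency in seconds, tokens) for "Why is my laptop overheating?", as quoted in the text.
auto_manager = (38, 15759)     # auto-created manager
custom_manager = (24, 10000)   # custom manager; token count is the rounded 10k figure

# Relative savings of the custom manager over the auto-created one.
latency_saving = (auto_manager[0] - custom_manager[0]) / auto_manager[0]
token_saving = (auto_manager[1] - custom_manager[1]) / auto_manager[1]

print(f"Latency saving: {latency_saving:.0%}, token saving: {token_saving:.0%}")
```

Both savings come out at roughly a third for this query, on top of the answer now being the correct one.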

Takeaway

In summary, the hierarchical Manager–Worker pattern in CrewAI does not function as documented. The core orchestration logic is weak: instead of allowing the manager to selectively delegate tasks, CrewAI executes all tasks sequentially, causing incorrect agent invocation, overwritten outputs, and inflated latency and token usage. The failure comes down to the framework’s internal routing: hierarchical mode does not enforce conditional branching or true delegation, so the final response is effectively determined by whichever task happens to run last. The fix is to introduce a custom manager agent with explicit, step-wise instructions. It uses the triage result, conditionally calls only the required agents, synthesizes their outputs, and terminates execution at the right point, which restores correct routing, improves output quality, and significantly optimises token costs.

Conclusion

CrewAI, in the spirit of keeping the LLM at the center of orchestration, depends upon it for most of the heavy lifting of orchestration, utilising user prompts combined with detailed scaffolding prompts embedded in the framework. Unlike LangGraph and AutoGen, this approach sacrifices determinism for developer-friendliness, and it sometimes results in unexpected behavior for critical features such as the manager-worker pattern, which is crucial for many real-life use cases. This article attempts to demonstrate a pathway for achieving the desired orchestration for this pattern using careful prompting. In future articles, I intend to explore more features of CrewAI, LangGraph and others for their applicability in practical use cases.

You can use CrewAI to design an interactive conversational assistant on a document store and further make the responses truly multimodal. Refer to my articles on GraphRAG Design and Multimodal RAG.

Connect with me and share your comments at www.linkedin.com/in/partha-sarkar-lets-talk-AI

All images in this article were drawn by me or generated using Copilot or Langfuse. Code shared is written by me.