On this article, you’ll learn to rework a fundamental tool-calling script right into a resilient agent that gracefully handles failures from misbehaving instruments, malformed mannequin outputs, and unavailable companies.
Matters we’ll cowl embrace:
- Easy methods to construction an iterative agent loop with a security cap on iteration depend.
- The 4 distinct classes of failure an agent encounters when calling instruments, and easy methods to deal with every one.
- Easy methods to design instrument error messages that train the mannequin easy methods to get well, decreasing wasted iterations.
Constructing a Multi-Device Gemma 4 Agent with Error Restoration
Introduction
In a earlier article, we wired up Gemma 4 to a handful of Python capabilities utilizing Ollama’s tool-calling API. That gave us a working single-turn dispatcher: the mannequin picks a instrument, our code runs it, the mannequin solutions. It’s a helpful start line, but it surely’s a great distance from an agent.
One of many issues that turns a tool-calling demo into an precise agent is the way it handles issues going incorrect. Instruments fail. The mannequin hallucinates a operate identify, or passes a string the place you needed a quantity, or asks a couple of metropolis your lookup desk has by no means heard of. An upstream API instances out. A required argument is lacking. Within the earlier tutorial, any of those would both crash the script or get swallowed by a attempt/besides that prints a message and offers up. That’s nice for a single path demo. It’s not nice for something you’d wish to go away working.
This text rebuilds the agent across the assumption that issues will go incorrect, and reveals easy methods to get well gracefully after they do. The sample is straightforward: catch errors on the boundary, convert them into messages the mannequin can learn, ship them again to the mannequin, and let the mannequin determine whether or not to retry, route round the issue, or clarify the failure to the consumer. We’ll additionally wrap every part in a correct iterative agent loop with a security cap on iteration depend.
The full script could be discovered right here. This text walks by way of the components that matter.
Rethinking the Device Loop
The unique dispatcher ran a single spherical: ship the consumer question, accumulate instrument calls, run them, ship the outcomes again, print the mannequin’s reply. That’s a one-shot interplay. It really works nice when the mannequin’s first response appropriately solutions the consumer’s query, but it surely has nowhere to go when one thing goes incorrect. If a instrument fails, the mannequin will get one probability to react after which we’re performed. If the mannequin desires to name one other instrument after seeing the primary consequence, too unhealthy; we already exited.
A correct agent loop is iterative. The construction is easy:
- Ship the present message historical past to the mannequin.
- If the mannequin produces instrument calls, execute every one, append each consequence to the historical past, and loop once more.
- If the mannequin produces a plain textual content response, that’s the ultimate reply. Return.
- Cap the loop at
MAX_ITERATIONSso a confused mannequin can’t burn by way of your CPU endlessly.
That final level is non-negotiable. Small fashions often get caught calling the identical instrument repeatedly, or oscillating between two instruments, and there’s nothing extra demoralizing than strolling again to your terminal to seek out your laptop computer’s followers screaming as a result of Gemma determined to search for the climate in London thirty instances in a row.
Right here’s the loop:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
def run_agent(user_query): messages = [{“role”: “user”, “content”: user_query}]
for iteration in vary(1, MAX_ITERATIONS + 1): payload = { “mannequin”: MODEL_NAME, “messages”: messages, “instruments”: available_tools, “stream”: False, }
print(f“[EXECUTION — iteration {iteration}]”) print(” ● Querying mannequin…n”)
attempt: response_data = call_ollama(payload) besides Exception as e: print(f” └─ [ERROR] Error calling Ollama API: {e}”) print(f” └─ Make certain Ollama is working and {MODEL_NAME} is pulled.”) return
message = response_data.get(“message”, {}) tool_calls = message.get(“tool_calls”) or []
# Department A: the mannequin desires to make use of instruments if tool_calls: print(f“[TOOL EXECUTION — {len(tool_calls)} call(s)]”) messages.append(message) tool_messages = print_tool_calls(tool_calls) messages.lengthen(tool_messages) print() proceed
# Department B: the mannequin produced a remaining reply print(“[RESPONSE]”) print(message.get(“content material”, “”) + “n”) return
# Security rail: we exhausted MAX_ITERATIONS with out a remaining reply print(“[RESPONSE]”) print( f“Hit the {MAX_ITERATIONS}-iteration cap with out a remaining reply. “ “This normally means the mannequin is caught in a tool-calling loop. “ “Strive simplifying the question.n” ) |
The sample is value committing to reminiscence as a result of it reveals up in each agent framework you’ll ever learn: the message historical past is the state. For every iteration we ship your entire dialog (the unique consumer question, the mannequin’s tool-call request, our instrument outcomes, any follow-up mannequin messages) again to the mannequin. The mannequin is stateless; the listing is the agent’s reminiscence.
This iterative construction can also be what makes error restoration potential. When a instrument fails and we ship the error again as a instrument message, the mannequin will get to see that error and react to it on the following iteration. With out the loop, there’s nothing to react into.
Constructing the Device Registry
Right here we construct our 4 instruments, all deterministic, all offline. No API keys, no community calls, no flaky exterior companies to debug. The purpose of this text is the error-handling structure, not the instruments themselves, so we wish the instruments to behave predictably so we are able to deal with the framework round them, and so we are able to intentionally set off each failure mode at will.
The instruments are:
get_weather(metropolis): appears to be like up a metropolis in a small dict of canned climate informationget_local_time(metropolis): computes the true present time in that metropolis’s timezone utilizingzoneinfoconvert_currency(quantity, from_currency, to_currency): does the maths in opposition to a hardcoded USD-anchored price deskget_city_population(metropolis): one other lookup in opposition to a small dict
The static information lives on the prime of the file:
|
CITY_DATA = { “london”: {“timezone”: “Europe/London”, “inhabitants”: 8_982_000}, “tokyo”: {“timezone”: “Asia/Tokyo”, “inhabitants”: 13_960_000}, “sao paulo”: {“timezone”: “America/Sao_Paulo”, “inhabitants”: 12_330_000}, “paris”: {“timezone”: “Europe/Paris”, “inhabitants”: 2_161_000}, “big apple”: {“timezone”: “America/New_York”, “inhabitants”: 8_336_000}, “sydney”: {“timezone”: “Australia/Sydney”, “inhabitants”: 5_312_000}, “mumbai”: {“timezone”: “Asia/Kolkata”, “inhabitants”: 20_410_000}, }
EXCHANGE_RATES = { “USD”: 1.00, “EUR”: 0.92, “GBP”: 0.79, “JPY”: 156.40, “BRL”: 5.12, “CAD”: 1.37, “AUD”: 1.51, “INR”: 83.20, } |
The capabilities are intentionally easy, however they increase on unhealthy enter reasonably than returning error strings. Right here’s get_weather:
|
def get_weather(metropolis: str) -> str: “”“Returns present climate situations for a recognized metropolis.”“” key = metropolis.decrease().strip() if key not in WEATHER_DATA: increase ValueError( f“Unknown metropolis: ‘{metropolis}’. Identified cities: {‘, ‘.be part of(sorted(WEATHER_DATA.keys()))}.” ) information = WEATHER_DATA[key] return f“The climate in {metropolis.title()} is {information[‘conditions’]} with a temperature of {information[‘temp_c’]}°C.” |
Two issues to name out about that error message. First, it’s particular: it tells the caller what went incorrect and what the legitimate choices are. Second, the instrument increases a ValueError reasonably than returning the error as a string. Don’t catch and string-format errors contained in the instrument; as an alternative, allow them to propagate. We wish the dispatcher to deal with each form of failure in a single place, and we wish the message the mannequin sees on a nasty enter to be informative sufficient that the mannequin can right itself.
get_local_time does the one actual work — precise timezone-aware datetime arithmetic — and that’s additionally the instrument we’ll later use to show sleek degradation in opposition to a simulated upstream failure:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
def get_local_time(metropolis: str) -> str: “”“Returns the present native time for a metropolis, with a cached fallback.”“” key = metropolis.decrease().strip()
# Simulate an upstream geocoding service that will fail unpredictably if SIMULATE_GEOCODING_OUTAGE and random.random() < 0.6: if key in TIMEZONE_FALLBACK_CACHE: tz_name = TIMEZONE_FALLBACK_CACHE[key] now = datetime.datetime.now(ZoneInfo(tz_name)) return ( f“[cached] The present native time in {metropolis.title()} is “ f“{now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}). “ “Observe: geocoding service is at present unavailable; this worth is from the native cache.” ) increase ToolUnavailableError( f“Geocoding service is unavailable and ‘{metropolis}’ shouldn’t be within the native cache. “ “Please attempt once more later or use a metropolis from the cache: “ f“{‘, ‘.be part of(sorted(TIMEZONE_FALLBACK_CACHE.keys()))}.” )
if key not in CITY_DATA: increase ValueError(f“Unknown metropolis: ‘{metropolis}’. Identified cities: {‘, ‘.be part of(sorted(CITY_DATA.keys()))}.”) tz_name = CITY_DATA[key][“timezone”] now = datetime.datetime.now(ZoneInfo(tz_name)) return f“The present native time in {metropolis.title()} is {now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}).” That <code>SIMULATE_GEOCODING_OUTAGE</code> flag lets us reproduce a actual–world failure mode with out needing actual infrastructure to fail. We‘ll come again to it.
The instrument schemas are unchanged from the earlier tutorial’s</a> fashion: normal Ollama operate–calling format, with clear descriptions of what every instrument does and what arguments it expects.
<h2>The 4 Error Restoration Patterns</h2> Time to get severe. There are 4 distinct failure modes you‘ll encounter when an agent talks to instruments, and every one wants its personal technique. They’re dealt with in a single dispatcher operate, however it‘s value understanding them as separate ideas.
Sample 1: Device Execution ErrorsThe primary protection is the dispatcher itself. It wraps each instrument name in a structured
def dispatch_tool_call(tool_call): function_name = tool_call[“function”][“name”] arguments = tool_call[“function”][“arguments”] or {}
# Protection 1: validate the instrument identify in opposition to the registry if function_name not in TOOL_FUNCTIONS: return “error”, ( f”Unknown instrument ‘{function_name}‘. “ f”Legitimate instruments are: {‘, ‘.be part of(TOOL_FUNCTIONS.keys())}.“ )
func = TOOL_FUNCTIONS[function_name]
# Protection 2: catch argument errors (incorrect varieties, lacking or further args) attempt: consequence = func(**arguments) return “okay“, str(consequence) besides TypeError as e: return “error“, f”Dangerous arguments for {function_name}: {e}“ besides ValueError as e: return “error“, str(e) besides ToolUnavailableError as e: return “error“, f”Device quickly unavailable: {e}“ besides Exception as e: return “error“, f”Sudden error in {function_name}: {kind(e).__name__}: {e}“ |
The important thing perception: return the error to the mannequin as a instrument consequence as an alternative of elevating it again to the agent loop. The mannequin can learn the error, see that it requested for “Atlantis” and Atlantis isn’t a recognized metropolis, and pivot to a unique metropolis, or apologize to the consumer. In the event you increase as an alternative, you’ve stripped the mannequin of the power to get well.
Discover the 4 totally different exception varieties and the catch-all on the backside. Each corresponds to an actual class of failure: area errors (ValueError), signature mismatches (TypeError), infrastructure outages (ToolUnavailableError), and the Don Rumsfeld unknown unknowns (Exception). Separating them provides you cleaner error messages, which give the mannequin higher indicators for restoration.
The catch-all is vital and maybe controversial. Some fashion guides will let you know by no means to catch a naked Exception. In an agent dispatcher, the choice — letting an surprising exception kill the loop — is worse. The mannequin loses the possibility to get well, the consumer loses the response, and also you lose the dialog historical past you could possibly have used to debug what occurred. Higher to catch, log, and hand the message to the mannequin.
Sample 2: Malformed Device Calls From the Mannequin
The mannequin often hallucinates a instrument identify that doesn’t exist, or sends arguments beneath the incorrect keys (city as an alternative of metropolis, for instance). The primary protection within the snippet above handles the primary case: earlier than we even attempt to dispatch, we verify the identify in opposition to the registry and return a corrective message itemizing the legitimate names.
The incorrect-argument case is dealt with by the second protection. Python’s **arguments unpacking raises TypeError if the mannequin sends a key phrase the operate doesn’t settle for, or omits a required one. We catch the TypeError, format it cleanly, and the mannequin will get a helpful error on the following iteration:
|
[ERROR]: Dangerous arguments for get_weather: get_weather() received an surprising key phrase argument ‘city’ |
That message comprises every part the mannequin must right itself: the instrument identify, the offending argument, and an implicit sign that the suitable identify is one thing else. In apply the mannequin normally fixes the decision on its subsequent flip.
There’s additionally a extra delicate argument-related failure: kind drift. The mannequin is aware of quantity needs to be a quantity, however in longer conversations it often begins sending "100" as a string. Letting convert_currency increase on that will pressure an additional flip for the mannequin to right itself. A greater method is defensive coercion within the instrument itself:
|
def convert_currency(quantity: float, from_currency: str, to_currency: str) -> str: # Defensive kind coercion: the mannequin typically sends numbers as strings attempt: quantity = float(quantity) besides (TypeError, ValueError): increase ValueError(f“‘quantity’ have to be a quantity, received: {quantity!r}”) # … remainder of the operate |
This silently fixes the widespread case ("100" turns into 100.0) whereas nonetheless elevating a clear error for the genuinely damaged case ("fifty"). The precept: be liberal in what you settle for from the mannequin, and strict in what you complain about.
Sample 3: Area-Degree Errors
These are the errors the instrument itself raises when the inputs are well-formed however the request can’t be glad, similar to asking for the climate in Atlantis, or changing from a foreign money that isn’t within the price desk. These ought to produce error messages that train the mannequin easy methods to get well, not simply say “failed.”
Evaluate these two error messages:
|
Good: “Unknown metropolis: ‘Atlantis’. Identified cities: london, mumbai, big apple, paris, sao paulo, sydney, tokyo.” |
The nice model provides the mannequin every part it must both retry with a sound enter or clarify the limitation to the consumer. The unhealthy model forces the mannequin to guess. Each error message within the instrument capabilities follows this sample: say what went incorrect, and the place potential, listing the legitimate alternate options.
This isn’t only a UX nicety. It straight impacts what number of iterations the agent loop will burn earlier than attending to an excellent reply. A imprecise error can price you a full further spherical journey whereas the mannequin gropes for a repair. A particular error normally will get corrected on the very subsequent flip or, when the enter is genuinely unrecoverable, lets the mannequin produce a clear rationalization with out making an attempt once more in any respect.
Sample 4: Swish Degradation for Unavailable Instruments
The final sample is for the state of affairs the place a instrument isn’t damaged, simply gone — a geocoding service is down, an API quota is exhausted, a database is having a nasty day. You may have three choices right here, roughly so as of how a lot you belief the mannequin to deal with the state of affairs:
- Return a cached or default worth and flag it within the consequence. Greatest when the instrument’s freshness isn’t vital.
- Skip the instrument totally and return a transparent message about what couldn’t be offered. Let the mannequin determine whether or not to retry or work round it.
- Floor the outage to the consumer by having the agent cease and ask for steering.
get_local_time demonstrates choice 1. When SIMULATE_GEOCODING_OUTAGE is on and the random verify journeys, the instrument first tries the native cache:
|
if SIMULATE_GEOCODING_OUTAGE and random.random() < 0.6: if key in TIMEZONE_FALLBACK_CACHE: tz_name = TIMEZONE_FALLBACK_CACHE[key] now = datetime.datetime.now(ZoneInfo(tz_name)) return ( f“[cached] The present native time in {metropolis.title()} is “ f“{now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}). “ “Observe: geocoding service is at present unavailable; this worth is from the native cache.” ) increase ToolUnavailableError( f“Geocoding service is unavailable and ‘{metropolis}’ shouldn’t be within the native cache. “ “Please attempt once more later or use a metropolis from the cache: “ f“{‘, ‘.be part of(sorted(TIMEZONE_FALLBACK_CACHE.keys()))}.” ) |
If town is within the cache, the instrument returns a profitable consequence tagged with [cached] and a notice explaining that the reside service is unavailable. The mannequin sees a wonderfully usable reply and a small caveat it might probably select to say to the consumer. If town isn’t within the cache, the instrument falls by way of to choice 2: it raises ToolUnavailableError with a message itemizing what is cached.
That ToolUnavailableError is deliberately a separate exception kind reasonably than a ValueError. The dispatcher provides it its personal catch arm with a definite error prefix (“Device quickly unavailable”) so the mannequin can inform the distinction between “you requested for one thing I don’t have” and “the service is down proper now.” These two failures have very totally different acceptable responses — retry later versus choose a unique enter — and giving the mannequin a transparent sign helps it choose the suitable one.
In manufacturing, you’d lengthen this sample with a retry-with-backoff coverage earlier than falling by way of to the fallback. The construction stays the identical: the dispatcher distinguishes recoverable from unrecoverable failures, and the mannequin is instructed sufficient about every one to make a wise subsequent transfer.
Placing It All Collectively
Time to truly run the factor. Right here’s a question that workouts every part — a number of cities, a number of instruments, and an intentional unhealthy enter to set off error restoration in flight:
|
python most important.py “What is the climate in London, Tokyo, and Atlantis proper now? And convert 50 GBP to JPY.” |
The precise iteration depend and tool-call ordering will differ from run to run relying on how Gemma decides to sequence the work, however right here’s a consultant hint, barely trimmed:

Have a look at what occurred in iteration 3. The mannequin requested about Atlantis, the instrument raised ValueError, the dispatcher transformed it into an error message itemizing the legitimate cities, and the mannequin — on iteration 5 — folded that info right into a clear response. It didn’t retry Atlantis. It didn’t crash. It seen the failure, built-in it with the profitable outcomes, and produced a solution that acknowledged the limitation. That’s your entire payoff of the error-recovery structure in a single hint.
To see sleek degradation in motion, flip SIMULATE_GEOCODING_OUTAGE to True and run a question that asks for native time:
|
python most important.py “What is the native time in London and Paris?” |
About 60% of the time you’ll see the [cached] prefix within the instrument consequence and the mannequin will point out the cached supply in its remaining response. The remainder of the time the instrument will return efficiently and the cached path gained’t set off. Both means, the loop completes and the consumer will get a solution.
Conclusion
We constructed three issues on prime of the inspiration from the primary tutorial: an iterative agent loop with a tough iteration cap, a layered dispatcher that catches each class of instrument failure, and gear capabilities whose error messages train the mannequin easy methods to get well. Collectively they’re the distinction between a tool-calling demo and an agent you’d really wish to go away working unsupervised.
A couple of pure subsequent steps embrace:
- Persistent reminiscence throughout periods, so the agent can bear in mind what it discovered about you final week
- Retry-with-backoff insurance policies for transient upstream failures
- Reincorporating the exterior APIs rather than the static lookup tables, which largely simply means accepting that timeouts and price limits turn out to be a part of the traditional failure floor
The full script is on GitHub. Clone it, run it, break it intentionally to look at the restoration in motion, and incorporate the following steps above.
On this article, you’ll learn to rework a fundamental tool-calling script right into a resilient agent that gracefully handles failures from misbehaving instruments, malformed mannequin outputs, and unavailable companies.
Matters we’ll cowl embrace:
- Easy methods to construction an iterative agent loop with a security cap on iteration depend.
- The 4 distinct classes of failure an agent encounters when calling instruments, and easy methods to deal with every one.
- Easy methods to design instrument error messages that train the mannequin easy methods to get well, decreasing wasted iterations.
Constructing a Multi-Device Gemma 4 Agent with Error Restoration
Introduction
In a earlier article, we wired up Gemma 4 to a handful of Python capabilities utilizing Ollama’s tool-calling API. That gave us a working single-turn dispatcher: the mannequin picks a instrument, our code runs it, the mannequin solutions. It’s a helpful start line, but it surely’s a great distance from an agent.
One of many issues that turns a tool-calling demo into an precise agent is the way it handles issues going incorrect. Instruments fail. The mannequin hallucinates a operate identify, or passes a string the place you needed a quantity, or asks a couple of metropolis your lookup desk has by no means heard of. An upstream API instances out. A required argument is lacking. Within the earlier tutorial, any of those would both crash the script or get swallowed by a attempt/besides that prints a message and offers up. That’s nice for a single path demo. It’s not nice for something you’d wish to go away working.
This text rebuilds the agent across the assumption that issues will go incorrect, and reveals easy methods to get well gracefully after they do. The sample is straightforward: catch errors on the boundary, convert them into messages the mannequin can learn, ship them again to the mannequin, and let the mannequin determine whether or not to retry, route round the issue, or clarify the failure to the consumer. We’ll additionally wrap every part in a correct iterative agent loop with a security cap on iteration depend.
The full script could be discovered right here. This text walks by way of the components that matter.
Rethinking the Device Loop
The unique dispatcher ran a single spherical: ship the consumer question, accumulate instrument calls, run them, ship the outcomes again, print the mannequin’s reply. That’s a one-shot interplay. It really works nice when the mannequin’s first response appropriately solutions the consumer’s query, but it surely has nowhere to go when one thing goes incorrect. If a instrument fails, the mannequin will get one probability to react after which we’re performed. If the mannequin desires to name one other instrument after seeing the primary consequence, too unhealthy; we already exited.
A correct agent loop is iterative. The construction is easy:
- Ship the present message historical past to the mannequin.
- If the mannequin produces instrument calls, execute every one, append each consequence to the historical past, and loop once more.
- If the mannequin produces a plain textual content response, that’s the ultimate reply. Return.
- Cap the loop at
MAX_ITERATIONSso a confused mannequin can’t burn by way of your CPU endlessly.
That final level is non-negotiable. Small fashions often get caught calling the identical instrument repeatedly, or oscillating between two instruments, and there’s nothing extra demoralizing than strolling again to your terminal to seek out your laptop computer’s followers screaming as a result of Gemma determined to search for the climate in London thirty instances in a row.
Right here’s the loop:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
def run_agent(user_query): messages = [{“role”: “user”, “content”: user_query}]
for iteration in vary(1, MAX_ITERATIONS + 1): payload = { “mannequin”: MODEL_NAME, “messages”: messages, “instruments”: available_tools, “stream”: False, }
print(f“[EXECUTION — iteration {iteration}]”) print(” ● Querying mannequin…n”)
attempt: response_data = call_ollama(payload) besides Exception as e: print(f” └─ [ERROR] Error calling Ollama API: {e}”) print(f” └─ Make certain Ollama is working and {MODEL_NAME} is pulled.”) return
message = response_data.get(“message”, {}) tool_calls = message.get(“tool_calls”) or []
# Department A: the mannequin desires to make use of instruments if tool_calls: print(f“[TOOL EXECUTION — {len(tool_calls)} call(s)]”) messages.append(message) tool_messages = print_tool_calls(tool_calls) messages.lengthen(tool_messages) print() proceed
# Department B: the mannequin produced a remaining reply print(“[RESPONSE]”) print(message.get(“content material”, “”) + “n”) return
# Security rail: we exhausted MAX_ITERATIONS with out a remaining reply print(“[RESPONSE]”) print( f“Hit the {MAX_ITERATIONS}-iteration cap with out a remaining reply. “ “This normally means the mannequin is caught in a tool-calling loop. “ “Strive simplifying the question.n” ) |
The sample is value committing to reminiscence as a result of it reveals up in each agent framework you’ll ever learn: the message historical past is the state. For every iteration we ship your entire dialog (the unique consumer question, the mannequin’s tool-call request, our instrument outcomes, any follow-up mannequin messages) again to the mannequin. The mannequin is stateless; the listing is the agent’s reminiscence.
This iterative construction can also be what makes error restoration potential. When a instrument fails and we ship the error again as a instrument message, the mannequin will get to see that error and react to it on the following iteration. With out the loop, there’s nothing to react into.
Constructing the Device Registry
Right here we construct our 4 instruments, all deterministic, all offline. No API keys, no community calls, no flaky exterior companies to debug. The purpose of this text is the error-handling structure, not the instruments themselves, so we wish the instruments to behave predictably so we are able to deal with the framework round them, and so we are able to intentionally set off each failure mode at will.
The instruments are:
get_weather(metropolis): appears to be like up a metropolis in a small dict of canned climate informationget_local_time(metropolis): computes the true present time in that metropolis’s timezone utilizingzoneinfoconvert_currency(quantity, from_currency, to_currency): does the maths in opposition to a hardcoded USD-anchored price deskget_city_population(metropolis): one other lookup in opposition to a small dict
The static information lives on the prime of the file:
|
CITY_DATA = { “london”: {“timezone”: “Europe/London”, “inhabitants”: 8_982_000}, “tokyo”: {“timezone”: “Asia/Tokyo”, “inhabitants”: 13_960_000}, “sao paulo”: {“timezone”: “America/Sao_Paulo”, “inhabitants”: 12_330_000}, “paris”: {“timezone”: “Europe/Paris”, “inhabitants”: 2_161_000}, “big apple”: {“timezone”: “America/New_York”, “inhabitants”: 8_336_000}, “sydney”: {“timezone”: “Australia/Sydney”, “inhabitants”: 5_312_000}, “mumbai”: {“timezone”: “Asia/Kolkata”, “inhabitants”: 20_410_000}, }
EXCHANGE_RATES = { “USD”: 1.00, “EUR”: 0.92, “GBP”: 0.79, “JPY”: 156.40, “BRL”: 5.12, “CAD”: 1.37, “AUD”: 1.51, “INR”: 83.20, } |
The capabilities are intentionally easy, however they increase on unhealthy enter reasonably than returning error strings. Right here’s get_weather:
|
def get_weather(metropolis: str) -> str: “”“Returns present climate situations for a recognized metropolis.”“” key = metropolis.decrease().strip() if key not in WEATHER_DATA: increase ValueError( f“Unknown metropolis: ‘{metropolis}’. Identified cities: {‘, ‘.be part of(sorted(WEATHER_DATA.keys()))}.” ) information = WEATHER_DATA[key] return f“The climate in {metropolis.title()} is {information[‘conditions’]} with a temperature of {information[‘temp_c’]}°C.” |
Two issues to name out about that error message. First, it’s particular: it tells the caller what went incorrect and what the legitimate choices are. Second, the instrument increases a ValueError reasonably than returning the error as a string. Don’t catch and string-format errors contained in the instrument; as an alternative, allow them to propagate. We wish the dispatcher to deal with each form of failure in a single place, and we wish the message the mannequin sees on a nasty enter to be informative sufficient that the mannequin can right itself.
get_local_time does the one actual work — precise timezone-aware datetime arithmetic — and that’s additionally the instrument we’ll later use to show sleek degradation in opposition to a simulated upstream failure:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
def get_local_time(metropolis: str) -> str: “”“Returns the present native time for a metropolis, with a cached fallback.”“” key = metropolis.decrease().strip()
# Simulate an upstream geocoding service that will fail unpredictably if SIMULATE_GEOCODING_OUTAGE and random.random() < 0.6: if key in TIMEZONE_FALLBACK_CACHE: tz_name = TIMEZONE_FALLBACK_CACHE[key] now = datetime.datetime.now(ZoneInfo(tz_name)) return ( f“[cached] The present native time in {metropolis.title()} is “ f“{now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}). “ “Observe: geocoding service is at present unavailable; this worth is from the native cache.” ) increase ToolUnavailableError( f“Geocoding service is unavailable and ‘{metropolis}’ shouldn’t be within the native cache. “ “Please attempt once more later or use a metropolis from the cache: “ f“{‘, ‘.be part of(sorted(TIMEZONE_FALLBACK_CACHE.keys()))}.” )
if key not in CITY_DATA: increase ValueError(f“Unknown metropolis: ‘{metropolis}’. Identified cities: {‘, ‘.be part of(sorted(CITY_DATA.keys()))}.”) tz_name = CITY_DATA[key][“timezone”] now = datetime.datetime.now(ZoneInfo(tz_name)) return f“The present native time in {metropolis.title()} is {now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}).” That <code>SIMULATE_GEOCODING_OUTAGE</code> flag lets us reproduce a actual–world failure mode with out needing actual infrastructure to fail. We‘ll come again to it.
The instrument schemas are unchanged from the earlier tutorial’s</a> fashion: normal Ollama operate–calling format, with clear descriptions of what every instrument does and what arguments it expects.
<h2>The 4 Error Restoration Patterns</h2> Time to get severe. There are 4 distinct failure modes you‘ll encounter when an agent talks to instruments, and every one wants its personal technique. They’re dealt with in a single dispatcher operate, however it‘s value understanding them as separate ideas.
Sample 1: Device Execution ErrorsThe primary protection is the dispatcher itself. It wraps each instrument name in a structured
def dispatch_tool_call(tool_call): function_name = tool_call[“function”][“name”] arguments = tool_call[“function”][“arguments”] or {}
# Protection 1: validate the instrument identify in opposition to the registry if function_name not in TOOL_FUNCTIONS: return “error”, ( f”Unknown instrument ‘{function_name}‘. “ f”Legitimate instruments are: {‘, ‘.be part of(TOOL_FUNCTIONS.keys())}.“ )
func = TOOL_FUNCTIONS[function_name]
# Protection 2: catch argument errors (incorrect varieties, lacking or further args) attempt: consequence = func(**arguments) return “okay“, str(consequence) besides TypeError as e: return “error“, f”Dangerous arguments for {function_name}: {e}“ besides ValueError as e: return “error“, str(e) besides ToolUnavailableError as e: return “error“, f”Device quickly unavailable: {e}“ besides Exception as e: return “error“, f”Sudden error in {function_name}: {kind(e).__name__}: {e}“ |
The important thing perception: return the error to the mannequin as a instrument consequence as an alternative of elevating it again to the agent loop. The mannequin can learn the error, see that it requested for “Atlantis” and Atlantis isn’t a recognized metropolis, and pivot to a unique metropolis, or apologize to the consumer. In the event you increase as an alternative, you’ve stripped the mannequin of the power to get well.
Discover the 4 totally different exception varieties and the catch-all on the backside. Each corresponds to an actual class of failure: area errors (ValueError), signature mismatches (TypeError), infrastructure outages (ToolUnavailableError), and the Don Rumsfeld unknown unknowns (Exception). Separating them provides you cleaner error messages, which give the mannequin higher indicators for restoration.
The catch-all is vital and maybe controversial. Some fashion guides will let you know by no means to catch a naked Exception. In an agent dispatcher, the choice — letting an surprising exception kill the loop — is worse. The mannequin loses the possibility to get well, the consumer loses the response, and also you lose the dialog historical past you could possibly have used to debug what occurred. Higher to catch, log, and hand the message to the mannequin.
Sample 2: Malformed Device Calls From the Mannequin
The mannequin often hallucinates a instrument identify that doesn’t exist, or sends arguments beneath the incorrect keys (city as an alternative of metropolis, for instance). The primary protection within the snippet above handles the primary case: earlier than we even attempt to dispatch, we verify the identify in opposition to the registry and return a corrective message itemizing the legitimate names.
The incorrect-argument case is dealt with by the second protection. Python’s **arguments unpacking raises TypeError if the mannequin sends a key phrase the operate doesn’t settle for, or omits a required one. We catch the TypeError, format it cleanly, and the mannequin will get a helpful error on the following iteration:
|
[ERROR]: Dangerous arguments for get_weather: get_weather() received an surprising key phrase argument ‘city’ |
That message comprises every part the mannequin must right itself: the instrument identify, the offending argument, and an implicit sign that the suitable identify is one thing else. In apply the mannequin normally fixes the decision on its subsequent flip.
There’s additionally a extra delicate argument-related failure: kind drift. The mannequin is aware of quantity needs to be a quantity, however in longer conversations it often begins sending "100" as a string. Letting convert_currency increase on that will pressure an additional flip for the mannequin to right itself. A greater method is defensive coercion within the instrument itself:
|
def convert_currency(quantity: float, from_currency: str, to_currency: str) -> str: # Defensive kind coercion: the mannequin typically sends numbers as strings attempt: quantity = float(quantity) besides (TypeError, ValueError): increase ValueError(f“‘quantity’ have to be a quantity, received: {quantity!r}”) # … remainder of the operate |
This silently fixes the widespread case ("100" turns into 100.0) whereas nonetheless elevating a clear error for the genuinely damaged case ("fifty"). The precept: be liberal in what you settle for from the mannequin, and strict in what you complain about.
Sample 3: Area-Degree Errors
These are the errors the instrument itself raises when the inputs are well-formed however the request can’t be glad, similar to asking for the climate in Atlantis, or changing from a foreign money that isn’t within the price desk. These ought to produce error messages that train the mannequin easy methods to get well, not simply say “failed.”
Evaluate these two error messages:
|
Good: “Unknown metropolis: ‘Atlantis’. Identified cities: london, mumbai, big apple, paris, sao paulo, sydney, tokyo.” |
The nice model provides the mannequin every part it must both retry with a sound enter or clarify the limitation to the consumer. The unhealthy model forces the mannequin to guess. Each error message within the instrument capabilities follows this sample: say what went incorrect, and the place potential, listing the legitimate alternate options.
This isn’t only a UX nicety. It straight impacts what number of iterations the agent loop will burn earlier than attending to an excellent reply. A imprecise error can price you a full further spherical journey whereas the mannequin gropes for a repair. A particular error normally will get corrected on the very subsequent flip or, when the enter is genuinely unrecoverable, lets the mannequin produce a clear rationalization with out making an attempt once more in any respect.
Sample 4: Swish Degradation for Unavailable Instruments
The final sample is for the state of affairs the place a instrument isn’t damaged, simply gone — a geocoding service is down, an API quota is exhausted, a database is having a nasty day. You may have three choices right here, roughly so as of how a lot you belief the mannequin to deal with the state of affairs:
- Return a cached or default worth and flag it within the consequence. Greatest when the instrument’s freshness isn’t vital.
- Skip the instrument totally and return a transparent message about what couldn’t be offered. Let the mannequin determine whether or not to retry or work round it.
- Floor the outage to the consumer by having the agent cease and ask for steering.
get_local_time demonstrates choice 1. When SIMULATE_GEOCODING_OUTAGE is on and the random verify journeys, the instrument first tries the native cache:
|
if SIMULATE_GEOCODING_OUTAGE and random.random() < 0.6: if key in TIMEZONE_FALLBACK_CACHE: tz_name = TIMEZONE_FALLBACK_CACHE[key] now = datetime.datetime.now(ZoneInfo(tz_name)) return ( f“[cached] The present native time in {metropolis.title()} is “ f“{now.strftime(‘%H:%M on %A, %B %d, %Y’)} ({tz_name}). “ “Observe: geocoding service is at present unavailable; this worth is from the native cache.” ) increase ToolUnavailableError( f“Geocoding service is unavailable and ‘{metropolis}’ shouldn’t be within the native cache. “ “Please attempt once more later or use a metropolis from the cache: “ f“{‘, ‘.be part of(sorted(TIMEZONE_FALLBACK_CACHE.keys()))}.” ) |
If town is within the cache, the instrument returns a profitable consequence tagged with [cached] and a notice explaining that the reside service is unavailable. The mannequin sees a wonderfully usable reply and a small caveat it might probably select to say to the consumer. If town isn’t within the cache, the instrument falls by way of to choice 2: it raises ToolUnavailableError with a message itemizing what is cached.
That ToolUnavailableError is deliberately a separate exception kind reasonably than a ValueError. The dispatcher provides it its personal catch arm with a definite error prefix (“Device quickly unavailable”) so the mannequin can inform the distinction between “you requested for one thing I don’t have” and “the service is down proper now.” These two failures have very totally different acceptable responses — retry later versus choose a unique enter — and giving the mannequin a transparent sign helps it choose the suitable one.
In manufacturing, you’d lengthen this sample with a retry-with-backoff coverage earlier than falling by way of to the fallback. The construction stays the identical: the dispatcher distinguishes recoverable from unrecoverable failures, and the mannequin is instructed sufficient about every one to make a wise subsequent transfer.
Placing It All Collectively
Time to truly run the factor. Right here’s a question that workouts every part — a number of cities, a number of instruments, and an intentional unhealthy enter to set off error restoration in flight:
|
python most important.py “What is the climate in London, Tokyo, and Atlantis proper now? And convert 50 GBP to JPY.” |
The precise iteration depend and tool-call ordering will differ from run to run relying on how Gemma decides to sequence the work, however right here’s a consultant hint, barely trimmed:

Have a look at what occurred in iteration 3. The mannequin requested about Atlantis, the instrument raised ValueError, the dispatcher transformed it into an error message itemizing the legitimate cities, and the mannequin — on iteration 5 — folded that info right into a clear response. It didn’t retry Atlantis. It didn’t crash. It seen the failure, built-in it with the profitable outcomes, and produced a solution that acknowledged the limitation. That’s your entire payoff of the error-recovery structure in a single hint.
To see sleek degradation in motion, flip SIMULATE_GEOCODING_OUTAGE to True and run a question that asks for native time:
|
python most important.py “What is the native time in London and Paris?” |
About 60% of the time you’ll see the [cached] prefix within the instrument consequence and the mannequin will point out the cached supply in its remaining response. The remainder of the time the instrument will return efficiently and the cached path gained’t set off. Both means, the loop completes and the consumer will get a solution.
Conclusion
We constructed three issues on prime of the inspiration from the primary tutorial: an iterative agent loop with a tough iteration cap, a layered dispatcher that catches each class of instrument failure, and gear capabilities whose error messages train the mannequin easy methods to get well. Collectively they’re the distinction between a tool-calling demo and an agent you’d really wish to go away working unsupervised.
A couple of pure subsequent steps embrace:
- Persistent reminiscence throughout periods, so the agent can bear in mind what it discovered about you final week
- Retry-with-backoff insurance policies for transient upstream failures
- Reincorporating the exterior APIs rather than the static lookup tables, which largely simply means accepting that timeouts and price limits turn out to be a part of the traditional failure floor
The full script is on GitHub. Clone it, run it, break it intentionally to look at the restoration in motion, and incorporate the following steps above.
















