Structured Outputs with LLMs: JSON Mode, Function Calling, and When to Use Each

In previous posts, we’ve talked a lot about popular techniques for optimizing the performance and cost of AI applications, like response streaming or prompt caching. Today, I want to talk about something a bit different but equally important for building real AI apps: structured, machine-readable outputs.

So far, in most of the examples I’ve shared, we’ve been dealing with free-text responses from an AI model. The user asks a question, the model responds in natural language, and we just display that response to the user in some way. Fairly simple and straightforward. But what happens when we need the model to return data in a specific format (e.g., a JSON object) so that we can process it programmatically afterward? What if we need the model to extract specific fields from a text or image, populate a database entry, or trigger a subsequent action based on its response? In these cases, getting back a wall of text won’t be very convenient. 🤔

Luckily, there are several solutions to this problem. There are two main approaches for obtaining structured, machine-readable outputs from an LLM: JSON Mode and Function Calling (also called tool use). These two are often confused with one another (which is to be expected since they both deal with structured outputs, duh), but they serve quite different purposes. On top of this, OpenAI has introduced a stricter variant of Function Calling called Structured Outputs, which takes schema enforcement one step further, as we’ll see. In this post, we’ll take a closer look at all three, understand how each one works under the hood, and figure out when to use each.

So, let’s have a look!

1. What’s JSON Mode?

JSON Mode is the simpler approach for getting machine-readable outputs from an LLM. It’s essentially a parameter you can set in an API request to instruct the model to always return a valid JSON object. And that’s really all there is to it! However, this simplicity comes at a cost: there are no guarantees on the structure or schema of the JSON (remember, we didn’t define any schema, field names, or types), just that it will be valid, parseable JSON.

For example, using OpenAI’s API in Python, we can enable JSON Mode by adding the parameter response_format={"type": "json_object"} to our call to the model. More specifically, it would look something like this:

from openai import OpenAI

client = OpenAI(api_key="your_api_key")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    # JSON Mode: guarantees valid JSON, but not any particular schema
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. Always respond in JSON format."
        },
        {
            "role": "user",
            "content": "Extract the name, age, and city from this text: 'Maria is 32 years old and lives in Athens.'"
        }
    ]
)

# the JSON arrives as a string in the message content
print(response.choices[0].message.content)

And the response would look something like this:

{
  "name": "Maria",
  "age": 32,
  "city": "Athens"
}

And voilà! ✨ With just one simple parameter change, we get valid JSON back every time. No need for string parsing or strange regex hacks.

There’s a catch, though. JSON Mode does guarantee that the output is valid JSON, but it does not guarantee a specific structure. If we run the same example several times, we may get slightly different field names or a slightly different structure each time. For example, one run might return "name", and another "full_name". That’s a problem if we’re trying to reliably extract specific fields programmatically.

Another thing is that beyond setting response_format={"type": "json_object"}, it’s good practice to also explicitly instruct the model to respond in JSON in the system prompt. In the example above, notice how we also added “Always respond in JSON format” to the system prompt. Without this, the model may return valid JSON sometimes, but not always, since its behaviour may become unpredictable.
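Because JSON Mode only guarantees parseable JSON, downstream code usually has to defend itself against drifting field names. Here is a minimal sketch of that idea. The alias table (FIELD_ALIASES) and the helper (normalize_person) are hypothetical names made up for illustration, not part of any API, and the sample string stands in for a real model response:

```python
import json

# Hypothetical alias table: canonical field -> key names the model might use.
FIELD_ALIASES = {
    "name": ("name", "full_name"),
    "age": ("age",),
    "city": ("city", "location"),
}


def normalize_person(raw: str) -> dict:
    """Parse a JSON Mode response and map known aliases to canonical keys."""
    data = json.loads(raw)  # JSON Mode makes this step safe, but nothing more
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    person = {}
    for field, aliases in FIELD_ALIASES.items():
        for key in aliases:
            if key in data:
                person[field] = data[key]
                break
        else:
            # None of the known aliases was present in the response.
            raise ValueError(f"missing field: {field}")
    return person


# A stand-in for a model response that used "full_name" instead of "name".
print(normalize_person('{"full_name": "Maria", "age": 32, "city": "Athens"}'))
```

This kind of glue code works, but every new alias has to be anticipated by hand, which is exactly the gap a schema is meant to close.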

2. What’s Function Calling?

Function Calling (or tool use) is a more advanced approach for getting structured, machine-readable outputs from an LLM. Instead of just asking the model to format its response as JSON, we define a specific schema. That is, we explicitly provide a formal description of the structure we want the output to follow, so the model is more constrained to return data that matches that schema. In other words, with Function Calling we define upfront what fields we expect, what types those fields should be, which are required and which aren’t, and so on.

Here’s how the same extraction example would look using Function Calling:

from openai import OpenAI
import json

client = OpenAI(api_key="your_api_key")

# define the schema of the output we expect
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_person_info",
            "description": "Extract personal information from a text",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "The full name of the person"
                    },
                    "age": {
                        "type": "integer",
                        "description": "The age of the person"
                    },
                    "city": {
                        "type": "string",
                        "description": "The city the person lives in"
                    }
                },
                "required": ["name", "age", "city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    # force the model to call this specific function
    tool_choice={"type": "function", "function": {"name": "extract_person_info"}},
    messages=[
        {
            "role": "user",
            "content": "Extract the name, age, and city from this text: 'Maria is 32 years old and lives in Athens.'"
        }
    ]
)

# parse the structured output (the arguments arrive as a JSON string)
tool_call = response.choices[0].message.tool_calls[0]
result = json.loads(tool_call.function.arguments)
print(result)

And the output would look like this:

{
  "name": "Maria",
  "age": 32,
  "city": "Athens"
}

The output for this example with Function Calling is identical to the one we got using JSON Mode. However, the key difference is that, unlike JSON Mode, with Function Calling the output is going to be consistent; it is steered to follow the defined schema, with consistent field names, types, and any other attributes we define on it.

Bonus: A little more on Function Calling

Before moving on to Structured Outputs, it’s worth pausing to elaborate on the original motivation and use behind Function Calling, which goes well beyond just getting structured outputs. Essentially, the concept of Function Calling is the foundation of agentic AI workflows. More specifically, in an agentic setup, the LLM isn’t just responding to a user’s question; rather, it’s deciding which action to take next based on the user’s input.

For example, let’s imagine a customer support assistant that can either look up an order, issue a refund, or escalate to a human agent, depending on what the user is asking. With Function Calling, we can define each of these candidate actions as a “tool” (function), and the model’s output will specify which one to call and with what arguments, based on its input. The example below defines the first two:

from openai import OpenAI

client = OpenAI(api_key="your_api_key")

# candidate actions the model can choose between
tools = [
    {
        "type": "function",
        "function": {
            "name": "lookup_order",
            "description": "Look up the status of a customer order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "The order ID"}
                },
                "required": ["order_id"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "issue_refund",
            "description": "Issue a refund for a customer order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string"},
                    "reason": {"type": "string"}
                },
                "required": ["order_id", "reason"]
            }
        }
    }
]

# no tool_choice here: the model decides which tool (if any) to call
response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    messages=[
        {"role": "user", "content": "I want a refund for order #12345, it arrived broken."}
    ]
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # "issue_refund"
print(tool_call.function.arguments)  # '{"order_id": "12345", "reason": "arrived broken"}'

So, the message in the API response looks something like this:

ChatCompletionMessage(
    content=None,
    role='assistant',
    tool_calls=[
        ChatCompletionMessageToolCall(
            id='call_abc123',
            type='function',
            function=Function(
                name='issue_refund',
                arguments='{"order_id": "12345", "reason": "arrived broken"}'
            )
        )
    ]
)

And the print statements would hypothetically output:

issue_refund
{"order_id": "12345", "reason": "arrived broken"}

So, what is happening here? The model returns a tool_calls object instead of a regular text response (notice how content is None). Inside the tool_calls object, we can see that the model decided to call issue_refund (not lookup_order), and filled in the arguments on its own based on what the user said. We then parse these arguments and execute the actual refund logic in our system.

Notice how the model didn’t just return the requested data, but rather decided which of the candidate actions was the most appropriate to perform, then filled in the appropriate arguments in its response. We can then take these arguments and actually execute the corresponding action in our system. This is the real power of Function Calling, and it’s why it’s such a foundational component of agentic AI applications.
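The “execute the corresponding action” step can be sketched as a small dispatcher. Everything here is hypothetical scaffolding for illustration: the handler bodies are placeholders, and HANDLERS and dispatch are made-up names. In a real app, the two values passed to dispatch would come from the function name and arguments of the model’s tool call:

```python
import json


def lookup_order(order_id: str) -> str:
    # Placeholder for a real order-system query.
    return f"Order {order_id}: shipped"


def issue_refund(order_id: str, reason: str) -> str:
    # Placeholder for real refund logic.
    return f"Refund issued for order {order_id} ({reason})"


# Hypothetical registry mapping tool names to local Python functions.
HANDLERS = {"lookup_order": lookup_order, "issue_refund": issue_refund}


def dispatch(tool_name: str, arguments_json: str) -> str:
    """Run the local handler the model selected, with the arguments it filled in."""
    if tool_name not in HANDLERS:
        raise ValueError(f"unknown tool: {tool_name}")
    arguments = json.loads(arguments_json)  # arguments arrive as a JSON string
    return HANDLERS[tool_name](**arguments)


# Hard-coded stand-ins for the name and arguments of a model tool call.
print(dispatch("issue_refund", '{"order_id": "12345", "reason": "arrived broken"}'))
```

Keeping the registry explicit means the model can only ever trigger functions we have deliberately exposed.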

But let’s get back to machine-readable outputs for now; we’ll talk more about agentic AI workflows and Function Calling in another post.

3. What about Structured Outputs?

A stricter variation of Function Calling is Structured Outputs. Even though Function Calling guides the model to produce an output that follows a defined schema, this isn’t really hard-constrained. In practice, this means that some deviations from the defined schema may still occur. Such deviations may be:

A field marked as required is, in fact, omitted when the model struggles to determine its value
Extra fields not defined in our schema are added
A field defined as integer comes back as a string "32" instead of 32

…and so forth.

This happens because, in Function Calling, the model is trying to follow the schema, but this is still best-effort generation. Like any LLM output, the output here is still fundamentally tokens being predicted one at a time, with the schema being just a strong hint. There’s still a real chance for that token-by-token generation to be derailed somewhere along the way and produce outputs that deviate from the defined schema.
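When the schema is only a hint, it is common to validate the parsed arguments before acting on them. The following is a minimal sketch that checks for the three deviations listed above. It assumes a flat object schema and handles only a small subset of JSON Schema; find_schema_violations is a hypothetical helper, and it treats any field outside the schema as a violation for illustration:

```python
def find_schema_violations(arguments: dict, schema: dict) -> list:
    """Check a flat object against a small subset of JSON Schema."""
    # Only these primitive types are handled in this sketch.
    python_types = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    properties = schema.get("properties", {})
    problems = []
    for field in schema.get("required", []):
        if field not in arguments:
            problems.append(f"missing required field: {field}")
    for field, value in arguments.items():
        if field not in properties:
            problems.append(f"unexpected field: {field}")
            continue
        expected = properties[field].get("type")
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if expected in python_types and (
            is_bool_as_number or not isinstance(value, python_types[expected])
        ):
            problems.append(f"wrong type for {field}: expected {expected}")
    return problems


person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "city": {"type": "string"},
    },
    "required": ["name", "age", "city"],
}

# A made-up deviating output: age as a string, city missing, one extra field.
for problem in find_schema_violations({"name": "Maria", "age": "32", "nickname": "M"}, person_schema):
    print(problem)
```

A full JSON Schema validation library would cover nested objects and more keywords; the point here is only that best-effort output needs a check like this somewhere downstream.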

Structured Outputs, on the other hand, takes Function Calling one step further by guaranteeing that every field in the defined schema will always appear in the output exactly as defined, with no surprises and no missing or extra fields. The key differentiator is that OpenAI uses constrained decoding behind the scenes. This means that at each token step, the model is only allowed to generate tokens that keep the output valid according to the schema. In other words, the schema is enforced at the generation level, instead of just being requested as part of the prompt.
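To make the idea of constrained decoding concrete, here is a deliberately tiny toy. It is not OpenAI’s implementation: the “schema” is just a list of two allowed strings, the token scores are made up, and constrained_greedy_decode is a hypothetical name. What it does show is the core mechanism of masking out, at every step, any token that would make the output invalid:

```python
def constrained_greedy_decode(step_scores, valid_outputs):
    """Greedy decoding that only picks tokens keeping the output a valid prefix."""
    output = ""
    for scores in step_scores:
        # Keep only tokens after which some valid output is still reachable.
        allowed = {
            token: score
            for token, score in scores.items()
            if any(valid.startswith(output + token) for valid in valid_outputs)
        }
        if not allowed:
            raise ValueError("no valid continuation")
        # Pick the highest-scoring token among the allowed ones.
        output += max(allowed, key=allowed.get)
    return output


# Toy "schema": the only valid outputs are these two strings.
valid_outputs = ['{"age":32}', '{"age":41}']

# Made-up per-step token scores. At step 2 the unconstrained favourite is a
# quoted string, which would break the integer type.
step_scores = [
    {'{"age":': 0.9, '{"years":': 0.1},
    {'"32"': 0.6, '32': 0.3, '41': 0.1},
    {'}': 0.8, ',': 0.2},
]

print(constrained_greedy_decode(step_scores, valid_outputs))
```

Without the mask, greedy decoding would have picked the quoted "32" at the second step; with it, that token is never a candidate. Production systems typically derive the allowed set from a grammar built from the schema rather than from a list of strings.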

OpenAI’s Structured Outputs can be activated by simply setting strict: true in the function definition:

tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_person_info",
            "strict": True,  # enables Structured Outputs
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "city": {"type": "string"}
                },
                "required": ["name", "age", "city"],
                "additionalProperties": False
            }
        }
    }
]

But again, this comes at a cost. Structured Outputs is available on GPT-4o and later models; for older models, JSON Mode is the fallback. Not every JSON Schema feature is supported, and the first request with a new schema may be a bit slower, since OpenAI preprocesses the schema.
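Two of the restrictions OpenAI documents for strict mode are that every object must set additionalProperties to false and must list all of its properties as required. A minimal sketch of a pre-flight check for just these two rules follows; strict_mode_problems is a hypothetical helper, and the real list of supported and unsupported schema features is longer:

```python
def strict_mode_problems(schema: dict, path: str = "root") -> list:
    """List places where a schema breaks the two strict-mode rules checked here."""
    problems = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(properties) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed as required: {missing}")
        # Nested objects must satisfy the same rules.
        for name, subschema in properties.items():
            problems.extend(strict_mode_problems(subschema, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_mode_problems(schema["items"], f"{path}[]"))
    return problems


# A schema that would be fine for plain Function Calling but not for strict mode.
loose_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
}

for problem in strict_mode_problems(loose_schema):
    print(problem)
```

Running a check like this before sending a request can surface schema problems earlier than an API error would.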

Nonetheless, it’s the strictest way to enforce a specific schema on the model’s outputs, with no room for deviation. For production systems where reliability and consistency really matter, this is usually the safest option.

But aren’t all these the same thing?

JSON Mode, Function Calling, and Structured Outputs may seem to do the same thing, since all of them essentially get you JSON back from the model. However, as we’ve already seen, they’re meaningfully different in what they guarantee and what they’re designed for. In particular:

Schema enforcement: JSON Mode returns valid JSON, but with no structural guarantees. Function Calling returns valid JSON that matches a defined schema, following specific field names, types, and required fields, but deviations are still possible. Structured Outputs goes one step further, enforcing that schema at the generation level, rendering deviations impossible.
Use case: JSON Mode is for cases where we need a machine-readable response but can live with a variable format. Function Calling was primarily designed for cases where the model needs to trigger an action or pass arguments to an external tool, and thus is essentially the general case of machine-readable outputs. Structured Outputs is Function Calling with a reliability guarantee, making it ideal for production pipelines where we need consistency in outputs.
Ease of setup: JSON Mode is the lightest option to set up; just a single parameter change with no schema definition. On the flip side, for Function Calling and Structured Outputs, we also need to think through and set up the JSON schema.

Having said that, OpenAI itself recommends always using Structured Outputs instead of JSON Mode whenever possible, as a general rule of thumb.

On my mind

Obtaining machine-readable outputs from LLMs, and choosing the right approach for doing so, can make a big difference in the reliability and maintainability of any AI application. Free-text responses are great for conversational interfaces, but the moment our LLM is a component in a larger system (feeding data downstream, triggering actions, populating databases, etc.), structured responses are essential. JSON Mode, Function Calling, and Structured Outputs can all provide such outputs, each at a different level of strictness. Like many decisions in AI engineering, the right choice depends on what you’re building and how much variability you can tolerate.

All images by the author, unless mentioned otherwise.