Getting Started with the Claude API in Python

# Introduction

You want to add Claude to a Python application. Creating an account and making your first API call is easy: the official documentation can get you from zero to a working request in a few minutes. The questions that follow are usually more practical:

What does the response object contain?
How do you stream responses so users can see output as it's generated?
How do you structure prompts and handle responses in a production application?

The Claude Python SDK handles much of the underlying API interaction. It provides typed response objects, built-in retry handling, and a simple interface for working with the Messages API.

This article walks you through setup, your first API call, reading the response, system prompts, and streaming. By the end, you'll have a working foundation.

# Prerequisites and Installation

You need Python 3.9 or higher, a free Claude Console account, and an API key from the Console's Settings > API Keys page. Adding $5 in credit is enough to work through everything in this article.

With these in place, install the SDK:

pip install anthropic

Never hardcode your API key in source files. Store it as an environment variable instead:

export ANTHROPIC_API_KEY="YOUR-API-KEY-HERE"

Alternatively, add it to a .env file at the project root if you're using python-dotenv, and call load_dotenv() before creating the client. The SDK reads ANTHROPIC_API_KEY from your environment, so you don't need to pass it anywhere in your code.
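A missing key only surfaces as an authentication error once a request is attempted. The following is a minimal sketch of failing fast with a clearer message. `require_api_key` is a hypothetical helper, not part of the SDK, and it assumes you want the check to run before the client is constructed.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return ANTHROPIC_API_KEY, or raise a descriptive error if it is unset.

    Hypothetical helper: the SDK reads the variable on its own. This only
    surfaces a missing or empty key before any request is made.
    """
    source = os.environ if env is None else env
    key = source.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set. Export it in your shell or "
            "load it from a .env file before creating the client."
        )
    return key


# Usage (assumes the anthropic package is installed):
#   require_api_key()
#   client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from the environment
```

Passing `env` explicitly makes the check easy to test without touching the real environment.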

# Making Your First API Call

The entry point for most interactions is client.messages.create(). Let's ask Claude to explain what a context window is, which is something you'll need to understand as you use the API.

You pass three things: the model ID, a max_tokens limit, and a messages list. The messages list is always a list of dicts, each with a "role" and a "content" key.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=256,
    messages=[
        {
            "role": "user",
            "content": "In one sentence, what is a context window?"
        }
    ]
)

# The reply text lives in the first content block
print(response.content[0].text)

The model field takes the exact model ID string. max_tokens is a hard ceiling on how many output tokens Claude will produce. The response stops there even if the thought isn't complete, so set it high enough for open-ended requests. The messages list must always start with a "user" turn.

Sample output:

A context window is the maximum amount of text (measured in tokens) that a language
model can process and consider at one time, encompassing both your input and its output.
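A malformed messages list is only rejected once the request reaches the API, so it can help to check its shape locally. The following is a minimal sketch that assumes only the rules stated above: each entry is a dict with "role" and "content", and the first turn is "user". `check_messages` is a hypothetical helper. The API's own validation remains authoritative.

```python
from typing import Any, Dict, List

# System prompts go in the top-level `system` parameter, not in messages.
ALLOWED_ROLES = {"user", "assistant"}


def check_messages(messages: List[Dict[str, Any]]) -> None:
    """Raise ValueError if `messages` does not match the shape described above."""
    if not messages:
        raise ValueError("messages must contain at least one turn")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict) or not all(k in msg for k in ("role", "content")):
            raise ValueError(f"message {i} needs 'role' and 'content' keys")
        if msg["role"] not in ALLOWED_ROLES:
            raise ValueError(f"message {i} has unsupported role {msg['role']!r}")
    if messages[0]["role"] != "user":
        raise ValueError("the first message must have role 'user'")


# Passes silently for the list used in the example above.
check_messages([{"role": "user", "content": "In one sentence, what is a context window?"}])
```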

# Understanding the Response Object

The response from messages.create() is a typed Message object. It's worth inspecting the full structure before building anything on top of it.

Replace the print line in the previous example with:

print(response)

Running that gives you the full object:

Message(
  id='msg_01XFDUDYJgAACzvnptvVoYEL',
  type='message',
  role='assistant',
  content=[TextBlock(text='A context window is...', type='text')],
  model='claude-sonnet-5',
  stop_reason='end_turn',
  stop_sequence=None,
  usage=Usage(input_tokens=19, output_tokens=42)
)

A few fields here matter more than they first appear. stop_reason tells you why Claude stopped generating. end_turn means Claude finished on its own terms. If you see max_tokens, your limit cut the response off, and you may need to raise it or rethink the prompt.
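In application code, you typically branch on this field. The following is a minimal sketch that assumes `response` is the Message returned by messages.create(). `describe_stop` is a hypothetical helper, and a `SimpleNamespace` from the standard library stands in for a real Message so the example runs without an API call.

```python
from types import SimpleNamespace


def describe_stop(response) -> str:
    """Summarize response.stop_reason for logging or a retry decision."""
    reason = response.stop_reason
    if reason == "end_turn":
        return "complete"
    if reason == "max_tokens":
        return "truncated: raise max_tokens or ask for a shorter answer"
    # Other values exist (for example, when a stop sequence matches); report them as-is.
    return f"stopped: {reason}"


# SimpleNamespace stands in for a Message here.
print(describe_stop(SimpleNamespace(stop_reason="max_tokens")))
```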

The usage field tracks both input and output tokens for the request. Anthropic bills based on these counts, and they are also how you can tell when a prompt is creeping toward the model's context limit. content is a list. For a plain text response it usually holds a single TextBlock, so response.content[0].text is the idiomatic way to pull the text out.
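Indexing `[0]` assumes that the first block is text. When a response can contain other block types (for example, tool-use blocks once you enable tool use), a more defensive option is to join every text block. The sketch below does that and also totals the token counts from `usage`. The helper names are hypothetical, and `SimpleNamespace` objects stand in for the SDK's types so the example runs offline.

```python
from types import SimpleNamespace


def response_text(response) -> str:
    """Concatenate the text of every content block whose type is "text"."""
    return "".join(block.text for block in response.content if block.type == "text")


def total_tokens(response) -> int:
    """Input plus output tokens, as reported in response.usage."""
    return response.usage.input_tokens + response.usage.output_tokens


# Stand-in for a Message, mirroring the fields printed above.
fake = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="A context window is...")],
    usage=SimpleNamespace(input_tokens=19, output_tokens=42),
)
print(response_text(fake))  # A context window is...
print(total_tokens(fake))   # 61
```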

# Using System Prompts

A system prompt lets you give Claude a persistent role, set constraints, or provide context that should apply across the entire conversation. You pass it as a top-level system parameter, separate from the messages list, rather than as a message.

Here we configure Claude to act as a code reviewer that responds only with Python code and skips general explanations:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=512,
    # The system prompt is a top-level argument, not an entry in messages
    system=(
        "You are a Python code reviewer. "
        "Respond only with corrected or improved Python code. "
        "Do not explain changes unless the user explicitly asks."
    ),
    messages=[
        {
            "role": "user",
            "content": (
                "def get_user(id):\n"
                "    db = connect()\n"
                "    return db.query('SELECT * FROM users WHERE id=' + id)"
            )
        }
    ]
)

print(response.content[0].text)

The system prompt sits above the conversation in Claude's context. It carries the same authority across all turns, so role instructions, formatting rules, and domain constraints you set there apply without your repeating them in every message.
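One way to keep a system prompt consistent across calls is to define it once and build every request from it. The following is a minimal sketch that returns keyword arguments for client.messages.create(). `review_request` and `REVIEWER_SYSTEM` are hypothetical names, and the default model ID is simply the one used in this article's examples.

```python
from typing import Any, Dict, List

# A reviewer system prompt like the one above, defined once and reused.
REVIEWER_SYSTEM = (
    "You are a Python code reviewer. "
    "Respond only with corrected or improved Python code. "
    "Do not explain changes unless the user explicitly asks."
)


def review_request(code: str, model: str = "claude-sonnet-5", max_tokens: int = 512) -> Dict[str, Any]:
    """Build keyword arguments for client.messages.create().

    The system prompt travels as its own top-level argument; `messages`
    holds only the user turn.
    """
    messages: List[Dict[str, str]] = [{"role": "user", "content": code}]
    return {"model": model, "max_tokens": max_tokens, "system": REVIEWER_SYSTEM, "messages": messages}


# Usage with the SDK (requires an API key):
#   response = client.messages.create(**review_request("def f(x): return x+1"))
```

Keeping the request builder separate from the network call also makes the request shape easy to inspect in tests.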

# Streaming Responses

For requests where Claude may take a few seconds to respond, streaming lets you display text as it arrives instead of waiting for the full response. The SDK exposes this through client.messages.stream(), used as a context manager.

The text_stream iterator yields individual text chunks in real time. Each chunk is a string fragment, not necessarily a full sentence. Pass end="" and flush=True to print() so that output appears continuously instead of being buffered:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-5",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Walk me through what happens when a Python list grows beyond its initial capacity."
        }
    ]
) as stream:
    # text_stream yields text fragments as they arrive
    for chunk in stream.text_stream:
        print(chunk, end="", flush=True)

print()  # newline after stream ends

The context manager ensures that the HTTP connection closes cleanly when the block exits, even if an exception is raised mid-stream. If you need the complete Message object after streaming, including token usage counts, call stream.get_final_message() before the block closes.

Sample output:

Python lists are dynamic arrays. When you append an element and the list has no
room, Python allocates a new, larger block of memory (typically about 1.125x the current
size), copies all existing elements into it, and releases the old block. This
operation is O(n) in the worst case, but because it happens infrequently relative to
the number of appends, the amortized cost per append stays O(1). You can pre-allocate
capacity with a list comprehension or by passing an iterable to the list constructor
if you know the final size upfront.
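If you want to both display a stream and keep its full text, you can wrap the printing loop from the streaming example in a function. The following is a minimal sketch: `print_stream` is a hypothetical helper that accepts any iterable of strings, such as `stream.text_stream`, so it can be exercised with a plain list.

```python
import sys
from typing import Iterable, TextIO


def print_stream(chunks: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write each text chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", file=out, flush=True)
        parts.append(chunk)
    print(file=out)  # finish with a newline once the stream ends
    return "".join(parts)


# Usage with the SDK (requires an API key):
#   with client.messages.stream(...) as stream:
#       text = print_stream(stream.text_stream)
#       final = stream.get_final_message()  # call inside the block
#       print(final.usage.output_tokens)
```

Returning the joined text saves a second pass when you only need the string. For token counts, get_final_message() is still the place to look.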

# Next Steps

You now have the core building blocks: requests, structured responses, system prompts, and streaming.

Next, you can learn about error handling, token usage, and multi-turn conversations. Because the API is stateless, you must send the conversation history with every request. The SDK documentation shows the recommended approach.
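Because the API doesn't remember earlier turns, a multi-turn chat means resending the whole history each time. The following is a minimal sketch of that bookkeeping. `Conversation` and `send` are hypothetical names, not SDK APIs. The function that performs the request is injected so that the history logic can be tested with a fake and no network call.

```python
from typing import Callable, Dict, List

Turn = Dict[str, str]


class Conversation:
    """Keeps the full message history and resends it on every request.

    `send` is any callable that takes the message list and returns the
    assistant's reply text, e.g. a wrapper around client.messages.create().
    """

    def __init__(self, send: Callable[[List[Turn]], str]) -> None:
        self.send = send
        self.history: List[Turn] = []

    def ask(self, text: str) -> str:
        turn = {"role": "user", "content": text}
        # Send every earlier turn plus the new one; history is only updated
        # after a successful reply, so a failed call leaves it unchanged.
        reply = self.send(self.history + [turn])
        self.history += [turn, {"role": "assistant", "content": reply}]
        return reply


# Possible wrapper around the SDK (requires an API key):
#   def send_with_sdk(messages):
#       response = client.messages.create(model="claude-sonnet-5", max_tokens=512, messages=messages)
#       return response.content[0].text
#   chat = Conversation(send_with_sdk)
```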

The API reference also covers features such as structured outputs and tool use. Happy exploring!

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.