10 Python One-Liners for Calling LLMs from Your Code

Picture by Creator

Introduction

You don’t all the time want a heavy wrapper, a giant shopper class, or dozens of strains of boilerplate to name a big language mannequin. Generally one well-crafted line of Python does all of the work: ship a immediate, obtain a response. That form of simplicity can velocity up prototyping or embedding LLM calls inside scripts or pipelines with out architectural overhead.

On this article, you’ll see ten Python one-liners that decision and work together with LLMs. We’ll cowl:

Every snippet comes with a short rationalization and a hyperlink to official documentation, so you’ll be able to confirm what’s taking place below the hood. By the top, you’ll know not solely the best way to drop in quick LLM calls but additionally perceive when and why every sample works.

Setting Up

Earlier than dropping within the one-liners, there are some things to arrange so that they run easily:

Set up required packages (solely as soon as):

pip set up openai anthropic google-generativeai requests httpx

pip set up openai anthropic google–generativeai requests httpx

Guarantee your API keys are set in surroundings variables, by no means hard-coded in your scripts. For instance:

export OPENAI_API_KEY=”sk-…” export ANTHROPIC_API_KEY=”claude-yourkey” export GOOGLE_API_KEY=”your_google_key”

export OPENAI_API_KEY=“sk-…”

export ANTHROPIC_API_KEY=“claude-yourkey”

export GOOGLE_API_KEY=“your_google_key”

For native setups (Ollama, LM Studio, vLLM), you want the mannequin server operating domestically and listening on the proper port (for example, Ollama’s default REST API runs at http://localhost:11434).

All one-liners assume you employ the proper mannequin title and that the mannequin is both accessible through cloud or domestically. With that in place, you’ll be able to paste every one-liner straight into your Python REPL or script and get a response, topic to quota or native useful resource limits.

Hosted API One-Liners (Cloud Fashions)

Hosted APIs are the simplest approach to begin utilizing giant language fashions. You don’t must run a mannequin domestically or fear about GPU reminiscence; simply set up the shopper library, set your API key, and ship a immediate. These APIs are maintained by the mannequin suppliers themselves, so that they’re dependable, safe, and incessantly up to date.

The next one-liners present the best way to name among the hottest hosted fashions straight from Python. Every instance sends a easy message to the mannequin and prints the generated response.

1. OpenAI GPT Chat Completion

OpenAI’s API offers entry to GPT fashions like GPT-4o and GPT-4o-mini. The SDK handles all the pieces from authentication to response parsing.

from openai import OpenAI; print(OpenAI().chat.completions.create(mannequin=”gpt-4o-mini”, messages=[{“role”:”user”,”content”:”Explain vector similarity”}]).decisions[0].message.content material)

from openai import OpenAI; print(OpenAI().chat.completions.create(mannequin=“gpt-4o-mini”, messages=[{“role”:“user”,“content”:“Explain vector similarity”}]).decisions[0].message.content material)

What it does: It creates a shopper, sends a message to GPT-4o-mini, and prints the mannequin’s reply.

Why it really works: The openai Python bundle wraps the REST API cleanly. You solely want your OPENAI_API_KEY set as an surroundings variable.

Documentation: OpenAI Chat Completions API

2. Anthropic Claude

Anthropic’s Claude fashions (Claude 3, Claude 3.5 Sonnet, and so forth.) are recognized for his or her lengthy context home windows and detailed reasoning. Their Python SDK follows an analogous chat-message format to OpenAI’s.

from anthropic import Anthropic; print(Anthropic().messages.create(mannequin=”claude-3-5-sonnet”, messages=[{“role”:”user”,”content”:”How does chain of thought prompting work?”}]).content material[0].textual content)

from anthropic import Anthropic; print(Anthropic().messages.create(mannequin=“claude-3-5-sonnet”, messages=[{“role”:“user”,“content”:“How does chain of thought prompting work?”}]).content material[0].textual content)

What it does: Initializes the Claude shopper, sends a message, and prints the textual content of the primary response block.

Why it really works: The .messages.create() technique makes use of a typical message schema (function + content material), returning structured output that’s simple to extract.

Documentation: Anthropic Claude API Reference

3. Google Gemini

Google’s Gemini API (through the google-generativeai library) makes it easy to name multimodal and textual content fashions with minimal setup. The important thing distinction is that Gemini’s API treats each immediate as “content material technology,” whether or not it’s textual content, code, or reasoning.

import os, google.generativeai as genai; genai.configure(api_key=os.getenv(“GOOGLE_API_KEY”)); print(genai.GenerativeModel(“gemini-1.5-flash”).generate_content(“Describe retrieval-augmented technology”).textual content)

import os, google.generativeai as genai; genai.configure(api_key=os.getenv(“GOOGLE_API_KEY”)); print(genai.GenerativeModel(“gemini-1.5-flash”).generate_content(“Describe retrieval-augmented technology”).textual content)

What it does: Calls the Gemini 1.5 Flash mannequin to explain retrieval-augmented technology (RAG) and prints the returned textual content.

Why it really works: GenerativeModel() units the mannequin title, and generate_content() handles the immediate/response stream. You simply want your GOOGLE_API_KEY configured.

Documentation: Google Gemini API Quickstart

4. Mistral AI (REST request)

Mistral offers a easy chat-completions REST API. You ship an inventory of messages and obtain a structured JSON response in return.

import requests, json; print(requests.publish(“https://api.mistral.ai/v1/chat/completions”, headers={“Authorization”:”Bearer YOUR_MISTRAL_API_KEY”}, json={“mannequin”:”mistral-tiny”,”messages”:[{“role”:”user”,”content”:”Define fine-tuning”}]}).json()[“choices”][0][“message”][“content”])

import requests, json; print(requests.publish(“https://api.mistral.ai/v1/chat/completions”, headers={“Authorization”:“Bearer YOUR_MISTRAL_API_KEY”}, json={“mannequin”:“mistral-tiny”,“messages”:[{“role”:“user”,“content”:“Define fine-tuning”}]}).json()[“choices”][0][“message”][“content”])

What it does: Posts a chat request to Mistral’s API and prints the assistant message.

Why it really works: The endpoint accepts an OpenAI-style messages array and returns decisions -> message -> content material.
Try the Mistral API reference and quickstart.

5. Hugging Face Inference API

If you happen to host a mannequin or use a public one on Hugging Face, you’ll be able to name it with a single POST. The text-generation activity returns generated textual content in JSON.

import requests; print(requests.publish(“https://api-inference.huggingface.co/fashions/mistralai/Mistral-7B-Instruct-v0.2”, headers={“Authorization”:”Bearer YOUR_HF_TOKEN”}, json={“inputs”:”Write a haiku about information”}).json()[0][“generated_text”])

import requests; print(requests.publish(“https://api-inference.huggingface.co/fashions/mistralai/Mistral-7B-Instruct-v0.2”, headers={“Authorization”:“Bearer YOUR_HF_TOKEN”}, json={“inputs”:“Write a haiku about information”}).json()[0][“generated_text”])

What it does: Sends a immediate to a hosted mannequin on Hugging Face and prints the generated textual content.

Why it really works: The Inference API exposes task-specific endpoints; for textual content technology, it returns an inventory with generated_text.
Documentation: Inference API and Textual content Era activity pages.

Native Mannequin One-Liners

Working fashions in your machine offers you privateness and management. You keep away from community latency and maintain information native. The tradeoff is about up: you want the server operating and a mannequin pulled. The one-liners beneath assume you’ve already began the native service.

6. Ollama (Native Llama 3 or Mistral)

Ollama exposes a easy REST API on localhost:11434. Use /api/generate for prompt-style technology or /api/chat for chat turns.

import requests; print(requests.publish(“http://localhost:11434/api/generate”, json={“mannequin”:”llama3″,”immediate”:”What’s vector search?”}).textual content)

import requests; print(requests.publish(“http://localhost:11434/api/generate”, json={“mannequin”:“llama3”,“immediate”:“What’s vector search?”}).textual content)

What it does: Sends a generate request to your native Ollama server and prints the uncooked response textual content.

Why it really works: Ollama runs an area HTTP server with endpoints like /api/generate and /api/chat. It’s essential to have the app operating and the mannequin pulled first. See official API documentation.

7. LM Studio (OpenAI-Appropriate Endpoint)

LM Studio can serve native fashions behind OpenAI-style endpoints similar to /v1/chat/completions. Begin the server from the Developer tab, then name it like several OpenAI-compatible backend.

import requests; print(requests.publish(“http://localhost:1234/v1/chat/completions”, json={“mannequin”:”phi-3″,”messages”:[{“role”:”user”,”content”:”Explain embeddings”}]}).json()[“choices”][0][“message”][“content”])

import requests; print(requests.publish(“http://localhost:1234/v1/chat/completions”, json={“mannequin”:“phi-3”,“messages”:[{“role”:“user”,“content”:“Explain embeddings”}]}).json()[“choices”][0][“message”][“content”])

What it does: Calls an area chat completion and prints the message content material.

Why it really works: LM Studio exposes OpenAI-compatible routes and likewise helps an enhanced API. Latest releases additionally add /v1/responses help. Examine the docs in case your native construct makes use of a unique route.

8. vLLM (Self-Hosted LLM Server)

vLLM offers a high-performance server with OpenAI-compatible APIs. You possibly can run it domestically or on a GPU field, then name /v1/chat/completions.

import requests; print(requests.publish(“http://localhost:8000/v1/chat/completions”, json={“mannequin”:”mistral”,”messages”:[{“role”:”user”,”content”:”Give me three LLM optimization tricks”}]}).json()[“choices”][0][“message”][“content”])

import requests; print(requests.publish(“http://localhost:8000/v1/chat/completions”, json={“mannequin”:“mistral”,“messages”:[{“role”:“user”,“content”:“Give me three LLM optimization tricks”}]}).json()[“choices”][0][“message”][“content”])

What it does: Sends a chat request to a vLLM server and prints the primary response message.

Why it really works: vLLM implements OpenAI-compatible Chat and Completions APIs, so any OpenAI-style shopper or plain requests name works as soon as the server is operating. Examine the documentation.

Useful Methods and Ideas

As soon as you realize the fundamentals of sending requests to LLMs, a couple of neat tips make your workflow quicker and smoother. These remaining two examples reveal the best way to stream responses in real-time and the best way to execute asynchronous API calls with out blocking your program.

9. Streaming Responses from OpenAI

Streaming lets you print every token as it’s generated by the mannequin, moderately than ready for the complete message. It’s good for interactive apps or CLI instruments the place you need output to seem immediately.

from openai import OpenAI; [print(c.choices[0].delta.content material or “”, finish=””) for c in OpenAI().chat.completions.create(mannequin=”gpt-4o-mini”, messages=[{“role”:”user”,”content”:”Stream a poem”}], stream=True)]

from openai import OpenAI; [print(c.choices[0].delta.content material or “”, finish=“”) for c in OpenAI().chat.completions.create(mannequin=“gpt-4o-mini”, messages=[{“role”:“user”,“content”:“Stream a poem”}], stream=True)]

What it does: Sends a immediate to GPT-4o-mini and prints tokens as they arrive, simulating a “dwell typing” impact.

Why it really works: The stream=True flag in OpenAI’s API returns partial occasions. Every chunk accommodates a delta.content material area, which this one-liner prints because it streams in.

Documentation: OpenAI Streaming Information.

10. Async Calls with httpx

Asynchronous calls allow you to question fashions with out blocking your app, making them ultimate for making a number of requests concurrently or integrating LLMs into internet servers.

import asyncio, httpx; print(asyncio.run(httpx.AsyncClient().publish(“https://api.mistral.ai/v1/chat/completions”, headers={“Authorization”:”Bearer TOKEN”}, json={“mannequin”:”mistral-tiny”,”messages”:[{“role”:”user”,”content”:”Hello”}]})).json()[“choices”][0][“message”][“content”])

import asyncio, httpx; print(asyncio.run(httpx.AsyncClient().publish(“https://api.mistral.ai/v1/chat/completions”, headers={“Authorization”:“Bearer TOKEN”}, json={“mannequin”:“mistral-tiny”,“messages”:[{“role”:“user”,“content”:“Hello”}]})).json()[“choices”][0][“message”][“content”])

What it does: Posts a chat request to Mistral’s API asynchronously, then prints the mannequin’s reply as soon as full.

Why it really works: The httpx library helps async I/O, so community calls don’t block the principle thread. This sample is useful for light-weight concurrency in scripts or apps.

Documentation: Async Help.

Wrapping Up

Every of those one-liners is greater than a fast demo; it’s a constructing block. You possibly can flip any of them right into a operate, wrap them inside a command-line device, or construct them right into a backend service. The identical code that matches on one line can simply develop into manufacturing workflows when you add error dealing with, caching, or logging.

If you wish to discover additional, test the official documentation for detailed parameters like temperature, max tokens, and streaming choices. Every supplier maintains dependable references:

The actual takeaway is that Python makes working with LLMs each accessible and versatile. Whether or not you’re operating GPT-4o within the cloud or Llama 3 domestically, you’ll be able to attain production-grade outcomes with only a few strains of code.

Retaining Possibilities Sincere: The Jacobian Adjustment

The Machine Studying “Creation Calendar” Day 24: Transformers for Textual content in Excel

Picture by Creator

Introduction

On this article, you’ll see ten Python one-liners that decision and work together with LLMs. We’ll cowl:

Setting Up

Earlier than dropping within the one-liners, there are some things to arrange so that they run easily:

Set up required packages (solely as soon as):

pip set up openai anthropic google-generativeai requests httpx

pip set up openai anthropic google–generativeai requests httpx

Guarantee your API keys are set in surroundings variables, by no means hard-coded in your scripts. For instance:

export OPENAI_API_KEY=”sk-…” export ANTHROPIC_API_KEY=”claude-yourkey” export GOOGLE_API_KEY=”your_google_key”

export OPENAI_API_KEY=“sk-…”

export ANTHROPIC_API_KEY=“claude-yourkey”

export GOOGLE_API_KEY=“your_google_key”

Hosted API One-Liners (Cloud Fashions)

The next one-liners present the best way to name among the hottest hosted fashions straight from Python. Every instance sends a easy message to the mannequin and prints the generated response.

1. OpenAI GPT Chat Completion

OpenAI’s API offers entry to GPT fashions like GPT-4o and GPT-4o-mini. The SDK handles all the pieces from authentication to response parsing.

What it does: It creates a shopper, sends a message to GPT-4o-mini, and prints the mannequin’s reply.

Why it really works: The openai Python bundle wraps the REST API cleanly. You solely want your OPENAI_API_KEY set as an surroundings variable.

Documentation: OpenAI Chat Completions API

2. Anthropic Claude

What it does: Initializes the Claude shopper, sends a message, and prints the textual content of the primary response block.

Why it really works: The .messages.create() technique makes use of a typical message schema (function + content material), returning structured output that’s simple to extract.

Documentation: Anthropic Claude API Reference

3. Google Gemini

What it does: Calls the Gemini 1.5 Flash mannequin to explain retrieval-augmented technology (RAG) and prints the returned textual content.

Why it really works: GenerativeModel() units the mannequin title, and generate_content() handles the immediate/response stream. You simply want your GOOGLE_API_KEY configured.

Documentation: Google Gemini API Quickstart

4. Mistral AI (REST request)

Mistral offers a easy chat-completions REST API. You ship an inventory of messages and obtain a structured JSON response in return.

What it does: Posts a chat request to Mistral’s API and prints the assistant message.

Why it really works: The endpoint accepts an OpenAI-style messages array and returns decisions -> message -> content material.
Try the Mistral API reference and quickstart.

5. Hugging Face Inference API

If you happen to host a mannequin or use a public one on Hugging Face, you’ll be able to name it with a single POST. The text-generation activity returns generated textual content in JSON.

What it does: Sends a immediate to a hosted mannequin on Hugging Face and prints the generated textual content.

Native Mannequin One-Liners

6. Ollama (Native Llama 3 or Mistral)

Ollama exposes a easy REST API on localhost:11434. Use /api/generate for prompt-style technology or /api/chat for chat turns.

import requests; print(requests.publish(“http://localhost:11434/api/generate”, json={“mannequin”:”llama3″,”immediate”:”What’s vector search?”}).textual content)

import requests; print(requests.publish(“http://localhost:11434/api/generate”, json={“mannequin”:“llama3”,“immediate”:“What’s vector search?”}).textual content)

What it does: Sends a generate request to your native Ollama server and prints the uncooked response textual content.

7. LM Studio (OpenAI-Appropriate Endpoint)

LM Studio can serve native fashions behind OpenAI-style endpoints similar to /v1/chat/completions. Begin the server from the Developer tab, then name it like several OpenAI-compatible backend.

What it does: Calls an area chat completion and prints the message content material.

8. vLLM (Self-Hosted LLM Server)

vLLM offers a high-performance server with OpenAI-compatible APIs. You possibly can run it domestically or on a GPU field, then name /v1/chat/completions.

What it does: Sends a chat request to a vLLM server and prints the primary response message.

Useful Methods and Ideas

9. Streaming Responses from OpenAI

What it does: Sends a immediate to GPT-4o-mini and prints tokens as they arrive, simulating a “dwell typing” impact.

Documentation: OpenAI Streaming Information.

10. Async Calls with httpx

Asynchronous calls allow you to question fashions with out blocking your app, making them ultimate for making a number of requests concurrently or integrating LLMs into internet servers.

What it does: Posts a chat request to Mistral’s API asynchronously, then prints the mannequin’s reply as soon as full.

Why it really works: The httpx library helps async I/O, so community calls don’t block the principle thread. This sample is useful for light-weight concurrency in scripts or apps.

Documentation: Async Help.

Wrapping Up

10 Python One-Liners for Calling LLMs from Your Code

Retaining Possibilities Sincere: The Jacobian Adjustment

The Machine Studying “Creation Calendar” Day 24: Transformers for Textual content in Excel

Related Posts

Retaining Possibilities Sincere: The Jacobian Adjustment

The Machine Studying “Creation Calendar” Day 24: Transformers for Textual content in Excel

The Machine Studying “Introduction Calendar” Day 23: CNN in Excel

Cease Retraining Blindly: Use PSI to Construct a Smarter Monitoring Pipeline

The Machine Studying “Creation Calendar” Day 20: Gradient Boosted Linear Regression in Excel

How I Optimized My Leaf Raking Technique Utilizing Linear Programming

7 Pandas Methods to Deal with Giant Datasets

Leave a Reply Cancel reply

POPULAR NEWS

Chainlink’s Run to $20 Beneficial properties Steam Amid LINK Taking the Helm because the High Creating DeFi Challenge ⋆ ZyCrypto

Easy methods to Use LLMs for Highly effective Computerized Evaluations

Gemini 2.0 Flash vs GPT 4o: Which is Higher?

XMN is accessible for buying and selling!

College endowments be a part of crypto rush, boosting meme cash like Meme Index

EDITOR'S PICK

AI-powered Reddit search now obtainable to pick customers • The Register

Trump’s crypto czar David Sacks confirms promoting all Bitcoin, Ether, and Solana earlier than administration started

Getting Began with Highly effective Knowledge Tables in Your Python Internet Apps | by Tom Gotsman | Oct, 2024

Why Tech Wants a Soul

About Us

Categories

Recent Posts

Are you sure want to unlock this post?

Are you sure want to cancel subscription?

10 Python One-Liners for Calling LLMs from Your Code

Introduction

Setting Up

Hosted API One-Liners (Cloud Fashions)

1. OpenAI GPT Chat Completion

2. Anthropic Claude

3. Google Gemini

4. Mistral AI (REST request)

5. Hugging Face Inference API

Native Mannequin One-Liners

6. Ollama (Native Llama 3 or Mistral)

7. LM Studio (OpenAI-Appropriate Endpoint)

8. vLLM (Self-Hosted LLM Server)

Useful Methods and Ideas

9. Streaming Responses from OpenAI

10. Async Calls with httpx

Wrapping Up

READ ALSO

Introduction

Setting Up

Hosted API One-Liners (Cloud Fashions)

1. OpenAI GPT Chat Completion

2. Anthropic Claude

3. Google Gemini

4. Mistral AI (REST request)

5. Hugging Face Inference API

Native Mannequin One-Liners

6. Ollama (Native Llama 3 or Mistral)

7. LM Studio (OpenAI-Appropriate Endpoint)

8. vLLM (Self-Hosted LLM Server)

Useful Methods and Ideas

9. Streaming Responses from OpenAI

10. Async Calls with httpx

Wrapping Up

Related Posts

Leave a Reply Cancel reply

POPULAR NEWS

EDITOR'S PICK

About Us

Categories

Recent Posts

Are you sure want to unlock this post?

Are you sure want to cancel subscription?