OpenAI Python SDK with Keeptrusts Gateway

The Keeptrusts gateway is OpenAI-compatible, so the official openai Python SDK works out of the box. This guide covers client configuration, streaming, async patterns, function calling, and graceful handling of policy blocks.

Use this page when

  • You are configuring the openai Python SDK to route through the Keeptrusts gateway.
  • You need streaming, async, or function-calling patterns that work with gateway policy enforcement.
  • You want to handle 409 policy blocks gracefully in Python applications.
  • You are adding correlation headers or custom metadata to enrich the audit trail.

Primary audience

  • Primary: Python developers integrating AI into applications via the OpenAI SDK
  • Secondary: AI Engineers building production AI services, Backend Developers adding LLM features

Client Configuration

Basic Setup

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
)

The only change is base_url. The gateway forwards the api_key to the upstream provider.

Environment-Based Configuration

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("LLM_GATEWAY_URL", "http://localhost:41002/v1"),
    api_key=os.environ["OPENAI_API_KEY"],
    timeout=30.0,
    max_retries=2,
)

Use environment variables so the same code works across local development, staging, and production gateways.

Custom Headers

Pass metadata to the gateway for richer audit trails:

import uuid

correlation_id = str(uuid.uuid4())  # e.g., one ID per inbound request

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key=os.environ["OPENAI_API_KEY"],
    default_headers={
        "X-Request-Source": "billing-service",
        "X-Correlation-Id": correlation_id,
    },
)
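
default_headers applies to every request the client sends. For a per-request correlation ID, the openai SDK also accepts extra_headers on individual calls:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Correlation-Id": str(uuid.uuid4())},
)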

Streaming Responses

Streaming works identically to direct OpenAI usage. The gateway evaluates output policies on the assembled stream.

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain AI governance briefly."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()

If a policy triggers mid-stream, the gateway terminates the stream and returns the policy error in the final chunk.
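
To handle a mid-stream block defensively, wrap the iteration in a try/except. A minimal sketch, assuming the SDK surfaces the gateway's policy error as an openai.APIError raised while iterating:

from openai import APIError

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain AI governance briefly."}],
    stream=True,
)

try:
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
except APIError as e:
    # Assumption: a mid-stream policy block surfaces here as an APIError;
    # inspect e for the policy details returned by the gateway.
    print(f"\n[stream terminated by policy: {e}]")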

Collecting Streamed Content

collected = []
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "List three benefits of policy enforcement."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        collected.append(delta.content)

full_response = "".join(collected)

Async Client

For high-throughput applications, use the async client:

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
)

async def governed_completion(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def main():
    results = await asyncio.gather(
        governed_completion("What is prompt injection?"),
        governed_completion("Explain data redaction."),
        governed_completion("Define AI compliance."),
    )
    for r in results:
        print(r[:80], "...")

asyncio.run(main())

Each concurrent request is independently evaluated by the gateway's policy chain.
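
Because each request is evaluated separately, one blocked prompt need not fail a whole batch. A minimal sketch using return_exceptions=True to keep partial results; the 409 check matches the policy-block handling shown later in this guide:

from openai import APIStatusError

async def main():
    results = await asyncio.gather(
        governed_completion("What is prompt injection?"),
        governed_completion("Explain data redaction."),
        return_exceptions=True,  # collect failures instead of cancelling the batch
    )
    for r in results:
        if isinstance(r, APIStatusError) and r.status_code == 409:
            print("request blocked by policy")
        elif isinstance(r, BaseException):
            raise r
        else:
            print(r[:80], "...")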

Function Calling

Function calling passes through the gateway transparently. Policies can inspect tool names and arguments.

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_account_balance",
            "description": "Retrieve the balance for a customer account",
            "parameters": {
                "type": "object",
                "properties": {
                    "account_id": {"type": "string", "description": "The account identifier"}
                },
                "required": ["account_id"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the balance for account AC-1234?"}],
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        print(f"Tool: {call.function.name}")
        print(f"Args: {call.function.arguments}")
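
To complete the round trip, execute the tool and send its result back to the model. A sketch following the standard OpenAI tool-calling flow; the get_account_balance body here is a stand-in for your real lookup:

import json

def get_account_balance(account_id: str) -> str:
    # Stand-in for a real balance lookup.
    return json.dumps({"account_id": account_id, "balance": "1024.00 USD"})

messages = [{"role": "user", "content": "What is the balance for account AC-1234?"}]
messages.append(message)  # the assistant message containing tool_calls

for call in message.tool_calls:
    args = json.loads(call.function.arguments)
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": get_account_balance(**args),
    })

final = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)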

See Governed Function Calling for tool validation policies and budget limits.

Handling Policy Blocks (409)

When a policy blocks a request, the gateway returns HTTP 409. The SDK raises an APIStatusError:

from openai import APIStatusError

def safe_completion(prompt: str) -> str | None:
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    except APIStatusError as e:
        if e.status_code == 409:
            error_body = e.response.json()
            policy_name = error_body.get("error", {}).get("policy", "unknown")
            print(f"Blocked by policy: {policy_name}")
            return None
        raise

Structured Error Envelope

{
  "error": {
    "type": "policy_violation",
    "message": "Request blocked: prompt contains disallowed content",
    "policy": "block-prompt-injection",
    "code": "input_blocked"
  }
}
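
A small helper can lift these fields out of a caught APIStatusError for structured logging (a sketch; the key names assume the envelope shown above):

def policy_details(e: APIStatusError) -> dict:
    """Extract policy fields from a 409 response body."""
    body = e.response.json().get("error", {})
    return {
        "type": body.get("type"),
        "policy": body.get("policy"),
        "code": body.get("code"),
        "message": body.get("message"),
    }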

Retry with Backoff for 429

from openai import RateLimitError
import time

def governed_call(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
        except RateLimitError:
            wait = 2 ** attempt
            print(f"Rate limited, retrying in {wait}s...")
            time.sleep(wait)
    raise RuntimeError("Exceeded max retries for rate limit")
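
Usage is a drop-in replacement for a direct call:

response = governed_call([{"role": "user", "content": "Summarize our data retention policy."}])
print(response.choices[0].message.content)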

Multi-Provider Routing

The gateway can route to different providers based on model name:

# Routes through the gateway to OpenAI
openai_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

# Routes through the gateway to Anthropic (if configured)
anthropic_response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}],
)

Same client, same SDK, same policies — the gateway resolves the provider from the model name in your config.
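
In application code this can be as simple as a task-keyed model map (illustrative only; the model names must match entries in your gateway configuration):

MODEL_BY_TASK = {
    "drafting": "gpt-4o",
    "review": "claude-sonnet-4-20250514",
}

def complete(task: str, prompt: str):
    # Same client and policies; the gateway picks the provider from the model name.
    return client.chat.completions.create(
        model=MODEL_BY_TASK[task],
        messages=[{"role": "user", "content": prompt}],
    )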

Best Practices

  1. Always set base_url via environment variable — avoids hardcoding gateway addresses.
  2. Handle 409 explicitly — policy blocks are expected behavior, not crashes.
  3. Use async for batch workloads — the gateway handles concurrent requests independently.
  4. Pass correlation headers — enriches the audit trail for debugging.
  5. Keep api_key in env vars — the gateway forwards it securely; never log it.
  6. Test with observe-only policies first — validate integration before enforcing blocks.

Next steps

For AI systems

  • Canonical terms: OpenAI Python SDK, base_url, openai.OpenAI, streaming, async client (AsyncOpenAI), function calling, X-Request-Source header, policy block (409).
  • Key config: OpenAI(base_url="http://localhost:41002/v1", api_key=...). Gateway forwards api_key to upstream provider.
  • Multi-provider: same client instance routes to different providers based on model name in gateway config.
  • Best next pages: TypeScript SDK Patterns, Streaming Patterns, Error Handling.

For engineers

  • Only change is base_url — the gateway is OpenAI-compatible. api_key is forwarded to the upstream provider securely.
  • Use environment variables for base_url so the same code works across local, staging, and production gateways.
  • Pass default_headers with X-Request-Source and X-Correlation-Id for richer audit trails.
  • Handle openai.APIStatusError with status_code == 409 to catch policy blocks — do not retry these.
  • Use AsyncOpenAI for batch workloads; the gateway handles concurrent requests independently.
  • Streaming works identically to direct OpenAI usage; output policies evaluate on the assembled stream.

For leaders

  • Zero SDK migration cost — existing OpenAI Python code adopts governance with a single URL change.
  • Gateway policy enforcement is transparent to the application; developers get compliance without refactoring.
  • Correlation headers enable end-to-end traceability from application logs to gateway decision events.
  • Multi-provider routing through the same client removes vendor lock-in at the application layer.