OpenAI Python SDK with Keeptrusts Gateway
The Keeptrusts gateway is OpenAI-compatible, so the official openai Python SDK works out of the box. This guide covers client configuration, streaming, async patterns, function calling, and graceful handling of policy blocks.
Use this page when
- You are configuring the openai Python SDK to route through the Keeptrusts gateway.
- You need streaming, async, or function-calling patterns that work with gateway policy enforcement.
- You want to handle 409 policy blocks gracefully in Python applications.
- You are adding correlation headers or custom metadata to enrich the audit trail.
Primary audience
- Primary: Python developers integrating AI into applications via the OpenAI SDK
- Secondary: AI Engineers building production AI services, Backend Developers adding LLM features
Client Configuration
Basic Setup
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
)
The only change is base_url. The gateway forwards the api_key to the upstream provider.
Environment-Based Configuration
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("LLM_GATEWAY_URL", "http://localhost:41002/v1"),
    api_key=os.environ["OPENAI_API_KEY"],
    timeout=30.0,
    max_retries=2,
)
Use environment variables so the same code works across local development, staging, and production gateways.
Custom Headers
Pass metadata to the gateway for richer audit trails:
client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key=os.environ["OPENAI_API_KEY"],
    default_headers={
        "X-Request-Source": "billing-service",
        "X-Correlation-Id": correlation_id,
    },
)
Streaming Responses
Streaming works identically to direct OpenAI usage. The gateway evaluates output policies on the assembled stream.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain AI governance briefly."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()
If a policy triggers mid-stream, the gateway terminates the stream and returns the policy error in the final chunk.
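Exactly how that error surfaces can depend on your gateway configuration, so the sketch below hedges: it checks the finish reason on each chunk and also catches APIError raised while iterating. Both checks are assumptions about how the block is reported, not guaranteed behavior.

from openai import APIError

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain AI governance briefly."}],
    stream=True,
)

try:
    for chunk in stream:
        if not chunk.choices:
            continue
        choice = chunk.choices[0]
        if choice.delta.content:
            print(choice.delta.content, end="", flush=True)
        # Assumption: a policy block may surface as a non-standard finish_reason
        if choice.finish_reason and choice.finish_reason != "stop":
            print(f"\n[stream ended: {choice.finish_reason}]")
except APIError as e:
    # Assumption: the gateway may instead abort the stream with an error payload
    print(f"\n[stream blocked: {e.message}]")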
Collecting Streamed Content
collected = []
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "List three benefits of policy enforcement."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        collected.append(delta.content)

full_response = "".join(collected)
Async Client
For high-throughput applications, use the async client:
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
)

async def governed_completion(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def main():
    results = await asyncio.gather(
        governed_completion("What is prompt injection?"),
        governed_completion("Explain data redaction."),
        governed_completion("Define AI compliance."),
    )
    for r in results:
        print(r[:80], "...")

asyncio.run(main())
Each concurrent request is independently evaluated by the gateway's policy chain.
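Because each request stands alone, one blocked prompt should not sink an entire batch. A minimal sketch that isolates failures with return_exceptions and treats 409s as skippable results (the 409 handling mirrors the pattern described later in this guide):

from openai import APIStatusError

async def governed_batch(prompts: list[str]) -> list[str | None]:
    results = await asyncio.gather(
        *(governed_completion(p) for p in prompts),
        return_exceptions=True,  # keep one policy block from failing the whole batch
    )
    cleaned: list[str | None] = []
    for r in results:
        if isinstance(r, APIStatusError) and r.status_code == 409:
            cleaned.append(None)  # blocked by policy; skip this item
        elif isinstance(r, Exception):
            raise r  # re-raise anything unexpected
        else:
            cleaned.append(r)
    return cleaned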
Function Calling
Function calling passes through the gateway transparently. Policies can inspect tool names and arguments.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_account_balance",
            "description": "Retrieve the balance for a customer account",
            "parameters": {
                "type": "object",
                "properties": {
                    "account_id": {"type": "string", "description": "The account identifier"}
                },
                "required": ["account_id"],
            },
        },
    }
]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the balance for account AC-1234?"}],
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        print(f"Tool: {call.function.name}")
        print(f"Args: {call.function.arguments}")
See Governed Function Calling for tool validation policies and budget limits.
Handling Policy Blocks (409)
When a policy blocks a request, the gateway returns HTTP 409. The SDK raises an APIStatusError:
from openai import APIStatusError

def safe_completion(prompt: str) -> str | None:
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    except APIStatusError as e:
        if e.status_code == 409:
            error_body = e.response.json()
            policy_name = error_body.get("error", {}).get("policy", "unknown")
            print(f"Blocked by policy: {policy_name}")
            return None
        raise
Structured Error Envelope
{
  "error": {
    "type": "policy_violation",
    "message": "Request blocked: prompt contains disallowed content",
    "policy": "block-prompt-injection",
    "code": "input_blocked"
  }
}
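If callers need more than a log line, a small helper can lift these fields out of the exception. This sketch assumes the envelope shape shown above (parse_policy_block is a hypothetical name) and falls back to defaults if the body is not JSON:

from openai import APIStatusError

def parse_policy_block(e: APIStatusError) -> dict:
    # Assumes the envelope documented above; unknown shapes fall back to defaults
    try:
        body = e.response.json()
    except ValueError:
        body = {}
    error = body.get("error", {}) if isinstance(body, dict) else {}
    return {
        "type": error.get("type", "unknown"),
        "policy": error.get("policy", "unknown"),
        "code": error.get("code", "unknown"),
        "message": error.get("message", str(e)),
    }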
Retry with Backoff for 429
import time

from openai import RateLimitError

def governed_call(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
        except RateLimitError:
            wait = 2 ** attempt
            print(f"Rate limited, retrying in {wait}s...")
            time.sleep(wait)
    raise RuntimeError("Exceeded max retries for rate limit")
Multi-Provider Routing
The gateway can route to different providers based on model name:
# Routes through the gateway to OpenAI
openai_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

# Routes through the gateway to Anthropic (if configured)
anthropic_response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}],
)
Same client, same SDK, same policies — the gateway resolves the provider from the model name in your config.
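Because the provider is resolved from the model name, a thin wrapper keeps application code provider-agnostic. A minimal sketch (complete is a hypothetical helper name):

def complete(model: str, prompt: str) -> str:
    # The same gateway-backed client serves every provider configured behind it
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

openai_answer = complete("gpt-4o", "Hello")
anthropic_answer = complete("claude-sonnet-4-20250514", "Hello")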
Best Practices
- Always set base_url via an environment variable — avoids hardcoding gateway addresses.
- Handle 409 explicitly — policy blocks are expected behavior, not crashes.
- Use async for batch workloads — the gateway handles concurrent requests independently.
- Pass correlation headers — enriches the audit trail for debugging.
- Keep api_key in env vars — the gateway forwards it securely; never log it.
- Test with observe-only policies first — validate integration before enforcing blocks.
Next steps
- TypeScript SDK Patterns — Node.js and Vercel AI SDK integration
- Streaming Patterns — deep dive into SSE and chunked transfer
- Error Handling — comprehensive error envelope reference
For AI systems
- Canonical terms: OpenAI Python SDK, base_url, openai.OpenAI, streaming, async client (AsyncOpenAI), function calling, X-Request-Source header, policy block (409).
- Key config: OpenAI(base_url="http://localhost:41002/v1", api_key=...). Gateway forwards api_key to the upstream provider.
- Multi-provider: same client instance routes to different providers based on model name in gateway config.
- Best next pages: TypeScript SDK Patterns, Streaming Patterns, Error Handling.
For engineers
- Only change is base_url — the gateway is OpenAI-compatible. api_key is forwarded to the upstream provider securely.
- Use environment variables for base_url so the same code works across local, staging, and production gateways.
- Pass default_headers with X-Request-Source and X-Correlation-Id for richer audit trails.
- Handle openai.APIStatusError with status_code == 409 to catch policy blocks — do not retry these.
- Use AsyncOpenAI for batch workloads; the gateway handles concurrent requests independently.
- Streaming works identically to direct OpenAI usage; output policies evaluate on the assembled stream.
For leaders
- Zero SDK migration cost — existing OpenAI Python code adopts governance with a single URL change.
- Gateway policy enforcement is transparent to the application; developers get compliance without refactoring.
- Correlation headers enable end-to-end traceability from application logs to gateway decision events.
- Multi-provider routing through the same client removes vendor lock-in at the application layer.