OpenAI Python SDK with Keeptrusts Gateway
The Keeptrusts gateway is OpenAI-compatible, so the official openai Python SDK works out of the box. This guide covers client configuration, streaming, async patterns, function calling, and graceful handling of policy blocks.
Use this page when
- You are configuring the openai Python SDK to route through the Keeptrusts gateway.
- You need streaming, async, or function-calling patterns that work with gateway policy enforcement.
- You want to handle 409 policy blocks gracefully in Python applications.
- You are adding correlation headers or custom metadata to enrich the audit trail.
Primary audience
- Primary: Python developers integrating AI into applications via the OpenAI SDK
- Secondary: AI Engineers building production AI services, Backend Developers adding LLM features
Client Configuration
Basic Setup
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
)
The only change is base_url. The gateway forwards the api_key to the upstream provider.
Environment-Based Configuration
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("LLM_GATEWAY_URL", "http://localhost:41002/v1"),
    api_key=os.environ["OPENAI_API_KEY"],
    timeout=30.0,
    max_retries=2,
)
Use environment variables so the same code works across local development, staging, and production gateways.
Custom Headers
Pass metadata to the gateway for richer audit trails:
client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key=os.environ["OPENAI_API_KEY"],
    default_headers={
        "X-Request-Source": "billing-service",
        "X-Correlation-Id": correlation_id,
    },
)
Streaming Responses
Streaming works identically to direct OpenAI usage. The gateway evaluates output policies on the assembled stream.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain AI governance briefly."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()
If a policy triggers mid-stream, the gateway terminates the stream and returns the policy error in the final chunk.
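Exactly how that error surfaces can depend on your gateway configuration, so the sketch below hedges: it checks the finish reason on each chunk and also catches APIError raised while iterating. Both checks are assumptions about how the block is reported, not guaranteed behavior.

from openai import APIError

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain AI governance briefly."}],
    stream=True,
)

try:
    for chunk in stream:
        if not chunk.choices:
            continue
        choice = chunk.choices[0]
        if choice.delta.content:
            print(choice.delta.content, end="", flush=True)
        # Assumption: a policy block may surface as a non-standard finish_reason
        if choice.finish_reason and choice.finish_reason != "stop":
            print(f"\n[stream ended: {choice.finish_reason}]")
except APIError as e:
    # Assumption: the gateway may instead abort the stream with an error payload
    print(f"\n[stream blocked: {e.message}]")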
Collecting Streamed Content
collected = []
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "List three benefits of policy enforcement."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        collected.append(delta.content)

full_response = "".join(collected)
Async Client
For high-throughput applications, use the async client:
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="http://localhost:41002/v1",
    api_key="sk-...",
)

async def governed_completion(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def main():
    results = await asyncio.gather(
        governed_completion("What is prompt injection?"),
        governed_completion("Explain data redaction."),
        governed_completion("Define AI compliance."),
    )
    for r in results:
        print(r[:80], "...")

asyncio.run(main())
Each concurrent request is independently evaluated by the gateway's policy chain.
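Because each request stands alone, one blocked prompt should not sink an entire batch. A minimal sketch that isolates failures with return_exceptions and treats 409s as skippable results (the 409 handling mirrors the pattern described later in this guide):

from openai import APIStatusError

async def governed_batch(prompts: list[str]) -> list[str | None]:
    results = await asyncio.gather(
        *(governed_completion(p) for p in prompts),
        return_exceptions=True,  # keep one policy block from failing the whole batch
    )
    cleaned: list[str | None] = []
    for r in results:
        if isinstance(r, APIStatusError) and r.status_code == 409:
            cleaned.append(None)  # blocked by policy; skip this item
        elif isinstance(r, Exception):
            raise r  # re-raise anything unexpected
        else:
            cleaned.append(r)
    return cleaned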
Function Calling
Function calling passes through the gateway transparently. Policies can inspect tool names and arguments.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_account_balance",
            "description": "Retrieve the balance for a customer account",
            "parameters": {
                "type": "object",
                "properties": {
                    "account_id": {"type": "string", "description": "The account identifier"}
                },
                "required": ["account_id"],
            },
        },
    }
]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the balance for account AC-1234?"}],
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        print(f"Tool: {call.function.name}")
        print(f"Args: {call.function.arguments}")
See Governed Function Calling for tool validation policies and budget limits.
Handling Policy Blocks (409)
When a policy blocks a request, the gateway returns HTTP 409. The SDK raises an APIStatusError:
from openai import APIStatusError

def safe_completion(prompt: str) -> str | None:
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    except APIStatusError as e:
        if e.status_code == 409:
            error_body = e.response.json()
            policy_name = error_body.get("error", {}).get("policy", "unknown")
            print(f"Blocked by policy: {policy_name}")
            return None
        raise
Structured Error Envelope
{
  "error": {
    "type": "policy_violation",
    "message": "Request blocked: prompt contains disallowed content",
    "policy": "block-prompt-injection",
    "code": "input_blocked"
  }
}
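If callers need more than a log line, a small helper can lift these fields out of the exception. This sketch assumes the envelope shape shown above (parse_policy_block is a hypothetical name) and falls back to defaults if the body is not JSON:

from openai import APIStatusError

def parse_policy_block(e: APIStatusError) -> dict:
    # Assumes the envelope documented above; unknown shapes fall back to defaults
    try:
        body = e.response.json()
    except ValueError:
        body = {}
    error = body.get("error", {}) if isinstance(body, dict) else {}
    return {
        "type": error.get("type", "unknown"),
        "policy": error.get("policy", "unknown"),
        "code": error.get("code", "unknown"),
        "message": error.get("message", str(e)),
    }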
Retry with Backoff for 429
import time

from openai import RateLimitError

def governed_call(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
        except RateLimitError:
            wait = 2 ** attempt
            print(f"Rate limited, retrying in {wait}s...")
            time.sleep(wait)
    raise RuntimeError("Exceeded max retries for rate limit")
Multi-Provider Routing
The gateway can route to different providers based on model name:
# Routes through the gateway to OpenAI
openai_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

# Routes through the gateway to Anthropic (if configured)
anthropic_response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}],
)
Same client, same SDK, same policies — the gateway resolves the provider from the model name in your config.
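Because the provider is resolved from the model name, a thin wrapper keeps application code provider-agnostic. A minimal sketch (complete is a hypothetical helper name):

def complete(model: str, prompt: str) -> str:
    # The same gateway-backed client serves every provider configured behind it
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

openai_answer = complete("gpt-4o", "Hello")
anthropic_answer = complete("claude-sonnet-4-20250514", "Hello")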
Best Practices
- Always set base_url via an environment variable — avoids hardcoding gateway addresses.
- Handle 409 explicitly — policy blocks are expected behavior, not crashes.
- Use async for batch workloads — the gateway handles concurrent requests independently.
- Pass correlation headers — enriches the audit trail for debugging.
- Keep api_key in env vars — the gateway forwards it securely; never log it.
- Test with observe-only policies first — validate integration before enforcing blocks.
Next steps
- TypeScript SDK Patterns — Node.js and Vercel AI SDK integration
- Streaming Patterns — deep dive into SSE and chunked transfer
- Error Handling — comprehensive error envelope reference
For AI systems
- Canonical terms: OpenAI Python SDK, base_url, openai.OpenAI, streaming, async client (AsyncOpenAI), function calling, X-Request-Source header, policy block (409).
- Key config: OpenAI(base_url="http://localhost:41002/v1", api_key=...). Gateway forwards api_key to the upstream provider.
- Multi-provider: same client instance routes to different providers based on model name in gateway config.
- Best next pages: TypeScript SDK Patterns, Streaming Patterns, Error Handling.
For engineers
- Only change is base_url — the gateway is OpenAI-compatible. api_key is forwarded to the upstream provider securely.
- Use environment variables for base_url so the same code works across local, staging, and production gateways.
- Pass default_headers with X-Request-Source and X-Correlation-Id for richer audit trails.
- Handle openai.APIStatusError with status_code == 409 to catch policy blocks — do not retry these.
- Use AsyncOpenAI for batch workloads; the gateway handles concurrent requests independently.
- Streaming works identically to direct OpenAI usage; output policies evaluate on the assembled stream.
For leaders
- Zero SDK migration cost — existing OpenAI Python code adopts governance with a single URL change.
- Gateway policy enforcement is transparent to the application; developers get compliance without refactoring.
- Correlation headers enable end-to-end traceability from application logs to gateway decision events.
- Multi-provider routing through the same client removes vendor lock-in at the application layer.