OpenAI SDK Drop-In Compatibility
The Keeptrusts unified gateway exposes an OpenAI-compatible
POST /v1/chat/completions endpoint. Any standard OpenAI SDK can connect by
changing two values: base_url and api_key. No wrapper library or custom
client is required.
Use this page when
- You want to route existing OpenAI SDK calls through Keeptrusts for governance, cost control, or multi-provider routing.
- You are migrating from direct OpenAI access to a governed gateway.
- You need to verify which features work with which providers through the unified endpoint.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Quick setup
Every OpenAI SDK accepts a base URL override and an API key. Point both at your Keeptrusts gateway:
| Parameter | Value |
|---|---|
| base_url | https://<your-gateway>/v1, or http://localhost:8080/v1 for local |
| api_key | Your Keeptrusts API key (Access Key or Gateway Key) |
The gateway transparently resolves the requested model, enforces policy, and routes to the configured upstream provider.
Python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="kt_your_keeptrusts_api_key",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from Keeptrusts!"}],
)
print(response.choices[0].message.content)
Streaming
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain governance in one paragraph."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no choices (for example, a trailing usage chunk), so guard the index.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content or ""
    print(content, end="", flush=True)
Function calling
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in London?"}],
    tools=tools,
)
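When the model decides to call a tool, the reply carries tool calls rather than text, and the caller is expected to run them and send the results back. The following is a minimal sketch of that dispatch step, assuming the standard OpenAI tool-call shape (an `id` plus `function.name` and a JSON-encoded `function.arguments`). `TOOL_REGISTRY`, `run_tool_calls`, and the body of `get_weather` are hypothetical names used only for illustration.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical local implementation; a real one would call a weather service."""
    return f"Weather lookup for {city} is not implemented in this sketch."

# Maps tool names declared in `tools` to local callables (illustrative registry).
TOOL_REGISTRY = {"get_weather": get_weather}

def run_tool_calls(tool_calls: list) -> list:
    """Execute each tool call and build the `tool` role messages to send back."""
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        # The OpenAI format delivers arguments as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"] or "{}")
        handler = TOOL_REGISTRY.get(name)
        output = handler(**args) if handler else f"Unknown tool: {name}"
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": output}
        )
    return results
```

With the Python SDK, the calls are typically found on `response.choices[0].message.tool_calls`, and each one can be converted to a plain dict with `.model_dump()`. Append the assistant message and the returned `tool` messages to `messages`, then call `client.chat.completions.create` again to get the final answer.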
Node.js / TypeScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "kt_your_keeptrusts_api_key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello from Keeptrusts!" }],
});
console.log(response.choices[0].message.content);
Streaming
const stream = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Explain governance." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
Go
package main

import (
	"context"
	"fmt"

	"github.com/openai/openai-go"
	"github.com/openai/openai-go/option"
)

func main() {
	client := openai.NewClient(
		option.WithBaseURL("http://localhost:8080/v1"),
		option.WithAPIKey("kt_your_keeptrusts_api_key"),
	)

	resp, err := client.Chat.Completions.New(context.Background(),
		openai.ChatCompletionNewParams{
			Model: "gpt-4o",
			Messages: []openai.ChatCompletionMessageParamUnion{
				openai.UserMessage("Hello from Keeptrusts!"),
			},
		},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
Rust
use reqwest::Client;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new();
    let resp = client
        .post("http://localhost:8080/v1/chat/completions")
        .header("Authorization", "Bearer kt_your_keeptrusts_api_key")
        .json(&json!({
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Hello from Keeptrusts!"}]
        }))
        .send()
        .await?;

    let body: serde_json::Value = resp.json().await?;
    // as_str() yields the raw text; printing the Value directly would include JSON quotes.
    let content = body["choices"][0]["message"]["content"].as_str().unwrap_or_default();
    println!("{}", content);
    Ok(())
}
Supported features matrix
| Feature | Status | Notes |
|---|---|---|
| Chat completions | Supported | All providers via unified endpoint |
| Streaming (SSE) | Supported | Normalized to OpenAI chunk format |
| Function / tool calls | Supported | Requires model support (tool_calls capability) |
| JSON mode | Supported | response_format: { type: "json_object" } |
| JSON schema mode | Supported | response_format: { type: "json_schema", ... } |
| Vision (image input) | Supported | Provider must support image input modality |
| Logprobs | Partial | Passed through when provider supports it |
| Seed / determinism | Partial | Passed through; not all providers honor it |
| Embeddings | Not yet | Planned for a future phase |
| Audio (TTS / STT) | Not yet | Planned for a future phase |
| Fine-tuning API | Not yet | Use provider directly for fine-tuning |
| Assistants / Threads | Not yet | Use provider directly |
| Batch API | Not yet | Use provider directly |
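The two `response_format` rows above correspond to the request shapes sketched below, following the OpenAI chat completions format. The schema name `city_weather`, its fields, and the `parse_json_reply` helper are illustrative assumptions, not part of Keeptrusts.

```python
import json

# JSON mode: the model is asked to return a syntactically valid JSON object.
json_mode = {"type": "json_object"}

# JSON schema mode: the output is constrained to a named schema.
# The schema name and fields are illustrative placeholders.
json_schema_mode = {
    "type": "json_schema",
    "json_schema": {
        "name": "city_weather",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temperature_c": {"type": "number"},
            },
            "required": ["city", "temperature_c"],
            "additionalProperties": False,
        },
    },
}

def parse_json_reply(content: str) -> dict:
    """Parse message content returned under either mode; raises ValueError on bad input."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data
```

Either dict can be passed as `response_format=...` to `client.chat.completions.create`, and the reply text in `response.choices[0].message.content` can then be handed to `parse_json_reply`. Whether a given upstream honors the schema strictly depends on the provider, so validating the parsed result remains a reasonable precaution.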
Provider compatibility matrix
The unified endpoint normalizes provider-specific formats transparently. Each provider has specific capabilities and limitations:
| Provider | Chat | Streaming | Tools | JSON mode | Vision | Notes |
|---|---|---|---|---|---|---|
| OpenAI | Yes | Yes | Yes | Yes | Yes | Full compatibility |
| Anthropic | Yes | Yes | Yes | Yes | Yes | max_completion_tokens mapped to max_tokens |
| Google Gemini | Yes | Yes | Yes | Yes | Yes | n mapped to candidate_count |
| AWS Bedrock | Yes | Yes | Yes | Partial | Model-dependent | Uses SigV4 auth internally |
| Azure OpenAI | Yes | Yes | Yes | Yes | Yes | Same as OpenAI format |
| Together AI | Yes | Yes | Yes | Yes | No | OpenAI-compatible format |
| Groq | Yes | Yes | Yes | Yes | No | OpenAI-compatible format |
| Mistral | Yes | Yes | Yes | Yes | No | OpenAI-compatible format |
| Cohere | Yes | Yes | Yes | Partial | No | Response format translated |
| Fireworks AI | Yes | Yes | Yes | Yes | No | OpenAI-compatible format |
Multi-provider routing
The unified endpoint supports transparent provider routing. Specify the model and optionally pin a provider with a prefix:
messages = [{"role": "user", "content": "Hello from Keeptrusts!"}]

# Auto-route to the cheapest provider for this model
client.chat.completions.create(model="gpt-4o", messages=messages)

# Pin to a specific provider
client.chat.completions.create(model="openai/gpt-4o", messages=messages)
client.chat.completions.create(model="anthropic/claude-4-sonnet", messages=messages)
Model aliasing
Family aliases resolve to the latest stable version:
| Alias | Resolves to |
|---|---|
| claude-sonnet | claude-4-sonnet |
| claude-opus | claude-4-opus |
| gemini-pro | gemini-2.5-pro |
| gemini-flash | gemini-2.5-flash |
Deprecated models are automatically routed to their successor. The response
includes Deprecation: true and X-Model-Deprecated-Successor headers.
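A client can watch for these headers and surface the successor model so callers know to update their configuration. The sketch below works on any header mapping; `deprecated_successor` is a hypothetical helper, and it assumes the `Deprecation` header carries the literal value `true` as described above.

```python
from typing import Mapping, Optional

def deprecated_successor(headers: Mapping[str, str]) -> Optional[str]:
    """Return the successor model if the response flags the requested model as deprecated."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    if lowered.get("deprecation", "").strip().lower() != "true":
        return None
    return lowered.get("x-model-deprecated-successor")
```

With the Python SDK, response headers are available through `client.chat.completions.with_raw_response.create(...)`: the returned object exposes `.headers`, and `.parse()` yields the usual completion object.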
Response metadata
Every response includes an x_keeptrusts object with gateway metadata:
{
  "x_keeptrusts": {
    "routing_decision_id": "kt_decision_req-abc123",
    "provider_used": "openai",
    "cache_status": null,
    "region": "us-east-1"
  }
}
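Because `x_keeptrusts` is not part of the standard OpenAI response schema, it is safest to read it defensively. A minimal sketch, assuming the response body has already been parsed into a dict; `gateway_metadata` is a hypothetical helper name.

```python
def gateway_metadata(body: dict) -> dict:
    """Return the x_keeptrusts object from a parsed response body, or {} if absent."""
    meta = body.get("x_keeptrusts")
    return meta if isinstance(meta, dict) else {}

def describe_routing(body: dict) -> str:
    """Build a short log line from the gateway metadata fields shown above."""
    meta = gateway_metadata(body)
    provider = meta.get("provider_used", "unknown")
    region = meta.get("region", "unknown")
    return f"provider={provider} region={region}"
```

With the Python SDK, one way to obtain the dict is `response.model_dump()`. This assumes the SDK preserves unknown top-level fields, which is worth verifying against the SDK version in use.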
Rate-limit headers
The gateway injects rate-limit headers on every response:
| Header | Description |
|---|---|
| X-RateLimit-Limit-Requests | Maximum requests per minute |
| X-RateLimit-Remaining-Requests | Remaining requests in current window |
| X-RateLimit-Limit-Tokens | Maximum tokens per minute |
| X-RateLimit-Remaining-Tokens | Remaining tokens in current window |
| X-RateLimit-Reset | Seconds until the window resets |
| X-RateLimit-Scope | org or key |
When budget utilization exceeds 80%, the response also includes
X-Budget-Remaining-USD and X-Budget-Limit-USD headers.
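These headers can drive client-side pacing. The sketch below parses them into a small structure and computes how long to pause once a request or token allowance reaches zero. `RateLimitState`, `read_rate_limit`, and `seconds_to_wait` are hypothetical names, and the code assumes the header values are plain numbers as the table suggests.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass
class RateLimitState:
    remaining_requests: Optional[int]
    remaining_tokens: Optional[int]
    reset_seconds: Optional[float]
    scope: Optional[str]

def _to_number(value, cast):
    """Convert a header value with `cast`, returning None when missing or malformed."""
    try:
        return cast(value) if value is not None else None
    except (TypeError, ValueError):
        return None

def read_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    return RateLimitState(
        remaining_requests=_to_number(h.get("x-ratelimit-remaining-requests"), int),
        remaining_tokens=_to_number(h.get("x-ratelimit-remaining-tokens"), int),
        reset_seconds=_to_number(h.get("x-ratelimit-reset"), float),
        scope=h.get("x-ratelimit-scope"),
    )

def seconds_to_wait(state: RateLimitState) -> float:
    """Wait for the window reset only when a request or token allowance is exhausted."""
    exhausted = state.remaining_requests == 0 or state.remaining_tokens == 0
    if not exhausted:
        return 0.0
    return state.reset_seconds or 0.0
```

A caller might sleep for `seconds_to_wait(read_rate_limit(raw.headers))` before the next request, where `raw` comes from the SDK's `with_raw_response` accessor.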
Error format
Errors follow the OpenAI error shape regardless of upstream provider:
{
  "error": {
    "message": "Model 'nonexistent' not found in catalog",
    "type": "invalid_request_error",
    "param": "model",
    "code": null
  }
}
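Since the shape is the same for every upstream, one parser can cover all providers. A minimal sketch, assuming the error body has been decoded into a dict; `GatewayError` and `parse_error` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GatewayError:
    message: str
    type: Optional[str]
    param: Optional[str]
    code: Optional[str]

def parse_error(body) -> Optional[GatewayError]:
    """Extract the OpenAI-style error object from a response body, or None if absent."""
    err = body.get("error") if isinstance(body, dict) else None
    if not isinstance(err, dict):
        return None
    return GatewayError(
        message=str(err.get("message", "")),
        type=err.get("type"),
        param=err.get("param"),
        code=err.get("code"),
    )
```

With the Python SDK, non-2xx responses raise subclasses of `openai.APIStatusError`. Inside the `except` block, `exc.status_code` gives the HTTP status and `parse_error(exc.response.json())` recovers the fields above.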
What's next
- Multi-provider setup — configure credentials for multiple providers
- Error handling — handle gateway errors
- Streaming patterns — advanced streaming usage
- Cost optimization — leverage routing policies to reduce spend