Routing Across Multiple AI Models
The Keeptrusts gateway can route requests across multiple providers and models. This guide covers provider configuration, model groups, fallback chains, cost-based routing, and A/B testing setups.
Use this page when
- You need to configure multiple LLM providers in a single gateway for multi-model routing.
- You want to set up model groups with weighted load balancing or fallback chains.
- You are implementing cost-based routing, A/B testing, or latency-based model selection.
- You need to troubleshoot routing errors (unknown provider, weight misconfiguration, missing keys).
Primary audience
- Primary: Platform Engineers configuring multi-provider gateway routing
- Secondary: AI Engineers implementing fallback strategies, Technical Leaders planning provider diversification
Multi-Provider Configuration
Define multiple providers in your policy config:
pack:
  name: multi-model-routing-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai
      provider:
        base_url: https://api.openai.com/v1
        secret_key_ref:
          env: OPENAI_API_KEY
    - id: anthropic
      provider:
        base_url: https://api.anthropic.com/v1
        secret_key_ref:
          env: ANTHROPIC_API_KEY
    - id: azure-openai
      provider:
        base_url: https://my-instance.openai.azure.com/openai/deployments/gpt-4o/
        secret_key_ref:
          env: AZURE_OPENAI_API_KEY
    - id: local-ollama
      provider:
        base_url: http://localhost:11434/v1
        secret_key_ref:
          env: OLLAMA_DUMMY_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Each provider has its own API key, base URL, and authentication scheme; the gateway normalizes them all behind a single OpenAI-compatible interface.
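Once these providers are configured, the same client code can reach any of them through the gateway. A short illustration, assuming the gateway resolves bare model names to their configured providers (the model names here are illustrative):

from openai import OpenAI

# One client, one endpoint; the gateway handles each provider's auth and native API.
client = OpenAI(base_url="http://localhost:41002/v1", api_key="kt_gk_...")

# Each call targets a different configured provider by model name.
for model in ["gpt-4o-mini", "claude-sonnet-4", "llama3.1"]:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "ping"}],
    )
    print(model, "->", reply.choices[0].message.content)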
Model Groups
Model groups let you map a single logical model name to multiple physical deployments:
model_groups:
  - name: fast-chat
    models:
      - provider: openai
        model: gpt-4o-mini
        weight: 70
      - provider: anthropic
        model: claude-haiku
        weight: 30
  - name: premium-chat
    models:
      - provider: openai
        model: gpt-4o
        weight: 50
      - provider: anthropic
        model: claude-sonnet-4
        weight: 50
Request using the group name instead of a specific model:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="kt_gk_...",
)

# Routes to gpt-4o-mini (70%) or claude-haiku (30%)
response = client.chat.completions.create(
    model="fast-chat",
    messages=[{"role": "user", "content": "Summarize this document."}],
)
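Under the hood, weighted routing amounts to a weighted random draw per request. A minimal sketch of the idea, not the gateway's actual implementation:

import random
from collections import Counter

# Mirrors the fast-chat group above: (provider, model, weight)
FAST_CHAT = [
    ("openai", "gpt-4o-mini", 70),
    ("anthropic", "claude-haiku", 30),
]

def pick_model(group):
    """Weighted random selection; over many requests the split
    converges to the configured 70/30 distribution."""
    choices = [(provider, model) for provider, model, _ in group]
    weights = [weight for _, _, weight in group]
    return random.choices(choices, weights=weights, k=1)[0]

# Rough check of the distribution
print(Counter(pick_model(FAST_CHAT) for _ in range(10_000)))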
Fallback Chains
Fallback chains try the next model if the primary fails or times out:
model_groups:
  - name: reliable-chat
    strategy: fallback
    models:
      - provider: openai
        model: gpt-4o
        timeout_ms: 5000
      - provider: anthropic
        model: claude-sonnet-4
        timeout_ms: 10000
      - provider: local-ollama
        model: llama3.1
        timeout_ms: 30000
Fallback Behavior
Request → gpt-4o (5s timeout)
├─ Success → return response
└─ Timeout/Error → claude-sonnet-4 (10s timeout)
   ├─ Success → return response
   └─ Timeout/Error → llama3.1 (30s timeout)
      ├─ Success → return response
      └─ Error → return 502 to client
The decision event records which model ultimately served the request and how many fallbacks occurred.
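Conceptually, the fallback strategy is a sequential loop with a per-model timeout. A client-side sketch of the same behavior, useful for reasoning about timeouts even though the gateway runs this loop internally (endpoint and key are the placeholders used throughout this page):

import httpx  # pip install httpx

# Mirrors the reliable-chat chain above: (model, timeout in seconds)
CHAIN = [("gpt-4o", 5.0), ("claude-sonnet-4", 10.0), ("llama3.1", 30.0)]

def call_with_fallback(prompt: str) -> dict:
    fallbacks = 0
    for model, timeout_s in CHAIN:
        try:
            resp = httpx.post(
                "http://localhost:41002/v1/chat/completions",
                headers={"Authorization": "Bearer kt_gk_..."},
                json={"model": model,
                      "messages": [{"role": "user", "content": prompt}]},
                timeout=timeout_s,
            )
            resp.raise_for_status()
            return {"model": model, "fallbacks": fallbacks, "body": resp.json()}
        except httpx.HTTPError:  # timeout, connection failure, or non-2xx
            fallbacks += 1  # move on to the next model in the chain
    raise RuntimeError("all models failed")  # the gateway returns 502 here

Note how the worst case adds up: a request can spend 5 + 10 = 15 seconds failing before the last model even starts, which is why a timeout on every fallback model matters.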
Testing Fallback Behavior
# Send a request and check which model served it
curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer kt_gk_..." \
  -d '{"model": "reliable-chat", "messages": [{"role": "user", "content": "Hello"}]}' \
  | jq '.model'
Check fallback events:
kt events tail --filter "model=reliable-chat" --limit 5 --output json | \
  jq '.[] | {actual_model: .resolved_model, fallbacks: .fallback_count}'
Cost-Based Routing
Route requests based on cost constraints:
model_groups:
  - name: cost-optimized
    strategy: cost
    budget_per_request_usd: 0.01
    models:
      - provider: openai
        model: gpt-4o-mini
        cost_per_1k_input: 0.00015
        cost_per_1k_output: 0.0006
      - provider: openai
        model: gpt-4o
        cost_per_1k_input: 0.0025
        cost_per_1k_output: 0.01
The gateway estimates token count from the prompt and routes to the cheapest model that fits within the per-request budget. If the cheap model cannot handle the estimated token count within budget, it falls back to the more capable model.
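The arithmetic behind that decision is straightforward. A sketch of the selection rule, with the token estimation and the exact fallback behavior simplified to assumptions:

# Mirrors the cost-optimized group above (prices per 1k tokens, USD)
MODELS = [
    {"model": "gpt-4o-mini", "input": 0.00015, "output": 0.0006},
    {"model": "gpt-4o", "input": 0.0025, "output": 0.01},
]

def estimate_cost(m, input_tokens, expected_output_tokens):
    return ((input_tokens / 1000) * m["input"]
            + (expected_output_tokens / 1000) * m["output"])

def route(input_tokens, expected_output_tokens, budget=0.01):
    """Cheapest model within budget; fall back to the most capable if none fits."""
    in_budget = [m for m in MODELS
                 if estimate_cost(m, input_tokens, expected_output_tokens) <= budget]
    pool = in_budget or [MODELS[-1]]  # MODELS[-1] is the more capable model
    return min(pool, key=lambda m: estimate_cost(m, input_tokens, expected_output_tokens))["model"]

# A ~2,000-token prompt with ~500 expected output tokens costs
# 2.0 * 0.00015 + 0.5 * 0.0006 = 0.0006 USD on gpt-4o-mini, well under budget.
print(route(2000, 500))  # -> gpt-4o-mini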
Monitoring Cost Routing
kt events tail --limit 20 --output json | \
  jq '.[] | {model: .resolved_model, estimated_cost: .estimated_cost_usd, actual_cost: .actual_cost_usd}'
A/B Testing Models
Use weighted routing to A/B test model performance:
model_groups:
  - name: chat-experiment
    strategy: weighted
    models:
      - provider: openai
        model: gpt-4o
        weight: 50
        tag: control
      - provider: anthropic
        model: claude-sonnet-4
        weight: 50
        tag: experiment
Analyzing A/B Results
Query events by tag to compare performance:
# Control group metrics
kt events tail --filter "tag=control" --limit 100 --output json | \
  jq '{
    avg_latency: ([.[].latency_ms] | add / length),
    avg_tokens: ([.[].tokens.total] | add / length),
    block_rate: (([.[] | select(.decision == "blocked")] | length) / length * 100)
  }'

# Experiment group metrics
kt events tail --filter "tag=experiment" --limit 100 --output json | \
  jq '{
    avg_latency: ([.[].latency_ms] | add / length),
    avg_tokens: ([.[].tokens.total] | add / length),
    block_rate: (([.[] | select(.decision == "blocked")] | length) / length * 100)
  }'
Session Affinity
For multi-turn conversations, ensure the same model serves all turns:
model_groups:
  - name: chat-experiment
    strategy: weighted
    session_affinity: true  # sticky by conversation ID
    models:
      - provider: openai
        model: gpt-4o
        weight: 50
      - provider: anthropic
        model: claude-sonnet-4
        weight: 50
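A common way to implement stickiness is to replace the random draw with a deterministic hash of the conversation ID, so every turn of a conversation lands on the same model. A sketch of that idea, not necessarily the gateway's exact mechanism:

import hashlib

# Mirrors the chat-experiment group above: (provider, model, weight)
EXPERIMENT = [("openai", "gpt-4o", 50), ("anthropic", "claude-sonnet-4", 50)]

def sticky_pick(conversation_id: str, group=EXPERIMENT):
    """Hash the conversation ID to a stable point in [0, total_weight),
    then walk the cumulative weights to find the matching model."""
    total = sum(weight for _, _, weight in group)
    digest = hashlib.sha256(conversation_id.encode()).digest()
    point = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for provider, model, weight in group:
        cumulative += weight
        if point < cumulative:
            return provider, model

# Every turn of conversation "conv-42" resolves to the same model:
print(sticky_pick("conv-42"))
print(sticky_pick("conv-42"))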
Provider-Specific Configuration
OpenAI
pack:
  name: multi-model-routing-providers-7
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai
      provider:
        base_url: https://api.openai.com/v1
        secret_key_ref:
          env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Azure OpenAI
pack:
  name: multi-model-routing-providers-8
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: azure-openai
      provider:
        base_url: https://my-instance.openai.azure.com/openai/deployments/gpt-4o/
        secret_key_ref:
          env: AZURE_OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Anthropic
pack:
  name: multi-model-routing-providers-9
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: anthropic
      provider:
        base_url: https://api.anthropic.com/v1
        secret_key_ref:
          env: ANTHROPIC_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Local Ollama
pack:
  name: multi-model-routing-providers-10
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: local-ollama
      provider:
        base_url: http://localhost:11434/v1
        secret_key_ref:
          env: OLLAMA_DUMMY_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Validating Your Routing Config
Always validate before deploying:
kt policy lint --file policy-config.yaml
Common validation errors:
| Error | Cause | Fix |
|---|---|---|
| Unknown provider in model group | Provider name doesn't match the providers list | Check spelling |
| Weights must sum to 100 | Weighted group weights don't total 100 | Adjust weights |
| Duplicate model group name | Two groups share the same name | Rename one |
| Missing secret_key_ref | Provider key env var not configured | Add the env var reference |
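For an extra pre-flight check in CI, a small script can catch the weight-sum error before deploy. This sketch assumes the field names match the examples on this page:

import sys
import yaml  # pip install pyyaml

def check_weights(path: str) -> int:
    """Return the number of weighted groups whose weights don't sum to 100."""
    with open(path) as f:
        config = yaml.safe_load(f)
    failures = 0
    for group in config.get("model_groups", []):
        if group.get("strategy", "weighted") != "weighted":
            continue  # fallback/cost groups don't use percentage weights
        total = sum(m.get("weight", 0) for m in group.get("models", []))
        if total != 100:
            print(f"{group['name']}: weights sum to {total}, expected 100")
            failures += 1
    return failures

if __name__ == "__main__":
    sys.exit(1 if check_weights("policy-config.yaml") else 0)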
Best Practices
| Practice | Why |
|---|---|
| Use model groups, not hardcoded models | Enables routing changes without code deploys |
| Set timeouts on every fallback model | Prevents cascading slowness |
| Start A/B tests with 50/50 splits | Statistically cleaner comparison |
| Monitor cost events daily | Catch unexpected cost spikes early |
| Keep a local Ollama fallback for dev | Works offline, no API costs |
| Validate configs before every deploy | Catches routing errors early |
Next steps
- Debugging AI Requests with Events — trace which model served each request
- Managing API Keys & Gateway Keys — scope keys to specific model groups
- Local Development Setup — configure multi-model routing locally
For AI systems
- Canonical terms: multi-model routing, providers, model groups, fallback chain, weighted routing, cost-based routing, A/B testing, secret_key_ref, model_groups.
- Config: the providers list in policy-config.yaml defines each target with an id, secret_key_ref.env, and base_url; model_groups map logical names to physical models with weights.
- Gateway normalizes the OpenAI-compatible interface across all providers (OpenAI, Anthropic, Azure, Ollama, etc.).
- Best next pages: Debugging with Events, API Key Management, Local Development Setup.
For engineers
- Define multiple providers in policy-config.yaml, each with its own API key, base URL, and auth scheme.
- Use model groups to map logical names (e.g., fast-chat, premium-chat) to multiple physical models with weights.
- Configure fallback chains so requests automatically retry on a backup provider if the primary fails.
- Set per-model timeouts in fallback chains to prevent cascading slowness.
- Validate configs before deploying; kt policy lint catches routing errors (unknown providers, weight sums).
- Keep a local Ollama provider as a fallback for development; it works offline with no API costs.
For leaders
- Multi-provider routing eliminates single-vendor dependency and provides resilience against provider outages.
- Model groups enable routing changes (A/B tests, cost optimization, failover) without code deploys.
- Cost-based routing automatically selects the cheapest model that meets quality thresholds.
- A/B testing different providers with governance applied gives production-realistic comparison data.
- Provider diversification is a risk management strategy — negotiate better pricing with multiple viable alternatives.