Routing Across Multiple AI Models

The Keeptrusts gateway can route requests across multiple providers and models. This guide covers provider configuration, model groups, fallback chains, cost-based routing, and A/B testing setups.

Use this page when

  • You need to configure multiple LLM providers in a single gateway for multi-model routing.
  • You want to set up model groups with weighted load balancing or fallback chains.
  • You are implementing cost-based routing, A/B testing, or latency-based model selection.
  • You need to troubleshoot routing errors (unknown provider, weight misconfiguration, missing keys).

Primary audience

  • Primary: Platform Engineers configuring multi-provider gateway routing
  • Secondary: AI Engineers implementing fallback strategies, Technical Leaders planning provider diversification

Multi-Provider Configuration

Define multiple providers in your policy config:

pack:
  name: multi-model-routing-providers-1
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: openai
        provider:
          base_url: https://api.openai.com/v1
          secret_key_ref:
            env: OPENAI_API_KEY
      - id: anthropic
        provider:
          base_url: https://api.anthropic.com/v1
          secret_key_ref:
            env: ANTHROPIC_API_KEY
      - id: azure-openai
        provider:
          base_url: https://my-instance.openai.azure.com/openai/deployments/gpt-4o/
          secret_key_ref:
            env: AZURE_OPENAI_API_KEY
      - id: local-ollama
        provider:
          base_url: http://localhost:11434/v1
          secret_key_ref:
            env: OLLAMA_DUMMY_KEY
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true

Each provider has its own API key, base URL, and authentication scheme. The gateway normalizes the OpenAI-compatible interface across all providers.
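
Before starting the gateway, it helps to confirm that every env var referenced by a secret_key_ref is actually set. A minimal preflight sketch in Python, using the variable names from the config above:

import os
import sys

# Env vars referenced by secret_key_ref in the config above.
# OLLAMA_DUMMY_KEY is presumably a placeholder, as the name suggests;
# local Ollama does not bill per request.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "AZURE_OPENAI_API_KEY",
    "OLLAMA_DUMMY_KEY",
]

missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
if missing:
    sys.exit(f"Missing provider key env vars: {', '.join(missing)}")
print("All provider key env vars are set.")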

Model Groups

Model groups let you map a single logical model name to multiple physical deployments:

model_groups:
  - name: fast-chat
    models:
      - provider: openai
        model: gpt-4o-mini
        weight: 70
      - provider: anthropic
        model: claude-haiku
        weight: 30

  - name: premium-chat
    models:
      - provider: openai
        model: gpt-4o
        weight: 50
      - provider: anthropic
        model: claude-sonnet-4
        weight: 50
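
Conceptually, weighted selection is a proportional random choice over the group's models. The sketch below illustrates the idea with the fast-chat weights; it is not the gateway's actual implementation:

import random

# The fast-chat group from the config above: (provider, model, weight).
FAST_CHAT = [
    ("openai", "gpt-4o-mini", 70),
    ("anthropic", "claude-haiku", 30),
]

def pick_model(group):
    """Pick one (provider, model) pair in proportion to its weight."""
    weights = [weight for _, _, weight in group]
    provider, model, _ = random.choices(group, weights=weights, k=1)[0]
    return provider, model

# Over many requests, roughly 70% land on gpt-4o-mini and 30% on claude-haiku.
print(pick_model(FAST_CHAT))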

Request using the group name instead of a specific model:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="kt_gk_...",
)

# Routes to gpt-4o-mini (70%) or claude-haiku (30%)
response = client.chat.completions.create(
    model="fast-chat",
    messages=[{"role": "user", "content": "Summarize this document."}],
)

Fallback Chains

Fallback chains try the next model if the primary fails or times out:

model_groups:
  - name: reliable-chat
    strategy: fallback
    models:
      - provider: openai
        model: gpt-4o
        timeout_ms: 5000
      - provider: anthropic
        model: claude-sonnet-4
        timeout_ms: 10000
      - provider: local-ollama
        model: llama3.1
        timeout_ms: 30000

Fallback Behavior

Request → gpt-4o (5s timeout)
  ├─ Success → return response
  └─ Timeout/Error → claude-sonnet-4 (10s timeout)
       ├─ Success → return response
       └─ Timeout/Error → llama3.1 (30s timeout)
            ├─ Success → return response
            └─ Error → return 502 to client
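
Conceptually, the fallback strategy is a loop over the chain with a per-entry timeout, as in this simplified Python sketch (the gateway implements this server-side; the error types and send_request helper here are illustrative):

def route_with_fallback(chain, send_request):
    """Try each (model, timeout_ms) entry in order; return the first success.

    send_request is a hypothetical stand-in for the gateway's provider call,
    assumed to raise TimeoutError or ConnectionError on failure.
    """
    for model, timeout_ms in chain:
        try:
            return send_request(model, timeout=timeout_ms / 1000)
        except (TimeoutError, ConnectionError):
            continue  # counted as a fallback in the decision event
    raise RuntimeError("all models failed")  # the gateway returns 502 here

RELIABLE_CHAT = [("gpt-4o", 5000), ("claude-sonnet-4", 10000), ("llama3.1", 30000)]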

The decision event records which model ultimately served the request and how many fallbacks occurred.

Testing Fallback Behavior

# Send a request and check which model served it
curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer kt_gk_..." \
  -d '{"model": "reliable-chat", "messages": [{"role": "user", "content": "Hello"}]}' \
  | jq '.model'

Check fallback events:

kt events tail --filter "model=reliable-chat" --limit 5 --output json | \
jq '.[] | {actual_model: .resolved_model, fallbacks: .fallback_count}'

Cost-Based Routing

Route requests based on cost constraints:

model_groups:
  - name: cost-optimized
    strategy: cost
    budget_per_request_usd: 0.01
    models:
      - provider: openai
        model: gpt-4o-mini
        cost_per_1k_input: 0.00015
        cost_per_1k_output: 0.0006
      - provider: openai
        model: gpt-4o
        cost_per_1k_input: 0.0025
        cost_per_1k_output: 0.01

The gateway estimates token count from the prompt and routes to the cheapest model that fits within the per-request budget. If the cheap model cannot handle the estimated token count within budget, it falls back to the more capable model.
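
As a worked example with the prices above, suppose a request is estimated at 2,000 input tokens and 500 output tokens. The arithmetic, in a short sketch (the gateway's token estimator is internal; the selection loop is illustrative):

# (model, cost per 1K input tokens, cost per 1K output tokens), cheapest first
MODELS = [
    ("gpt-4o-mini", 0.00015, 0.0006),
    ("gpt-4o", 0.0025, 0.01),
]
BUDGET_USD = 0.01

def estimated_cost(in_tokens, out_tokens, in_rate, out_rate):
    return in_tokens / 1000 * in_rate + out_tokens / 1000 * out_rate

# gpt-4o-mini: 2.0 * 0.00015 + 0.5 * 0.0006 = $0.0006 -> well within budget
# gpt-4o:      2.0 * 0.0025  + 0.5 * 0.01   = $0.0100 -> exactly at the limit
for name, in_rate, out_rate in MODELS:
    cost = estimated_cost(2000, 500, in_rate, out_rate)
    if cost <= BUDGET_USD:
        print(f"route to {name}: estimated ${cost:.4f}")
        break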

Monitoring Cost Routing

kt events tail --limit 20 --output json | \
  jq '.[] | {model: .resolved_model, estimated_cost: .estimated_cost_usd, actual_cost: .actual_cost_usd}'

A/B Testing Models

Use weighted routing to A/B test model performance:

model_groups:
  - name: chat-experiment
    strategy: weighted
    models:
      - provider: openai
        model: gpt-4o
        weight: 50
        tag: control
      - provider: anthropic
        model: claude-sonnet-4
        weight: 50
        tag: experiment

Analyzing A/B Results

Query events by tag to compare performance:

# Control group metrics
kt events tail --filter "tag=control" --limit 100 --output json | \
  jq '{
    avg_latency: ([.[].latency_ms] | add / length),
    avg_tokens: ([.[].tokens.total] | add / length),
    block_rate: (([.[] | select(.decision == "blocked")] | length) / length * 100)
  }'

# Experiment group metrics
kt events tail --filter "tag=experiment" --limit 100 --output json | \
  jq '{
    avg_latency: ([.[].latency_ms] | add / length),
    avg_tokens: ([.[].tokens.total] | add / length),
    block_rate: (([.[] | select(.decision == "blocked")] | length) / length * 100)
  }'
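
To compare the groups side by side, save the raw events for each tag (for example, kt events tail --filter "tag=control" --limit 100 --output json > control.json, and likewise for experiment.json) and aggregate them in a short script. A sketch, assuming each file holds a JSON array with the latency_ms field shown above:

import json

def avg_latency(path):
    with open(path) as f:
        events = json.load(f)  # raw `kt events tail --output json` output
    return sum(e["latency_ms"] for e in events) / len(events)

control = avg_latency("control.json")
experiment = avg_latency("experiment.json")
print(f"control avg latency:    {control:.1f} ms")
print(f"experiment avg latency: {experiment:.1f} ms")
print(f"delta:                  {experiment - control:+.1f} ms")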

Session Affinity

For multi-turn conversations, ensure the same model serves all turns:

model_groups:
  - name: chat-experiment
    strategy: weighted
    session_affinity: true  # sticky by conversation ID
    models:
      - provider: openai
        model: gpt-4o
        weight: 50
      - provider: anthropic
        model: claude-sonnet-4
        weight: 50
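
On the client side, affinity needs a stable conversation identifier on every turn. One plausible way to pass it is a request header, as in the sketch below; the X-Conversation-Id header name is a hypothetical placeholder, so check your gateway version for the exact mechanism:

import uuid

from openai import OpenAI

client = OpenAI(base_url="http://localhost:41002/v1", api_key="kt_gk_...")

# One ID per conversation; every turn reuses it so the gateway can stick
# the whole session to a single model.
conversation_id = str(uuid.uuid4())

response = client.chat.completions.create(
    model="chat-experiment",
    messages=[{"role": "user", "content": "Hi"}],
    # NOTE: hypothetical header name, for illustration only.
    extra_headers={"X-Conversation-Id": conversation_id},
)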

Provider-Specific Configuration

OpenAI

pack:
  name: multi-model-routing-providers-7
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: openai
        provider:
          base_url: https://api.openai.com/v1
          secret_key_ref:
            env: OPENAI_API_KEY
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true

Azure OpenAI

pack:
  name: multi-model-routing-providers-8
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: azure-openai
        provider:
          base_url: https://my-instance.openai.azure.com/openai/deployments/gpt-4o/
          secret_key_ref:
            env: AZURE_OPENAI_API_KEY
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true

Anthropic

pack:
  name: multi-model-routing-providers-9
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: anthropic
        provider:
          base_url: https://api.anthropic.com/v1
          secret_key_ref:
            env: ANTHROPIC_API_KEY
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true

Local Ollama

pack:
  name: multi-model-routing-providers-10
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: local-ollama
        provider:
          base_url: http://localhost:11434/v1
          secret_key_ref:
            env: OLLAMA_DUMMY_KEY
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true

Validating Your Routing Config

Always validate before deploying:

kt policy lint --file policy-config.yaml
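
To make validation automatic, you can gate deploys on the lint result. A minimal sketch that wraps the command above in a pre-deploy check (the script structure is illustrative; only the kt invocation comes from this guide):

import subprocess
import sys

# Fail the deploy if the routing config does not lint cleanly.
result = subprocess.run(
    ["kt", "policy", "lint", "--file", "policy-config.yaml"],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    print(result.stdout + result.stderr)
    sys.exit("policy lint failed; aborting deploy")
print("policy lint passed")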

Common validation errors:

Error                           | Cause                                       | Fix
Unknown provider in model group | Provider name doesn't match providers list | Check spelling
Weights must sum to 100         | Weighted group percentages are wrong       | Adjust weights
Duplicate model group name      | Two groups with the same name              | Rename one
Missing secret_key_ref          | Provider key env var not configured        | Add the env var reference

Best Practices

Practice                               | Why
Use model groups, not hardcoded models | Enables routing changes without code deploys
Set timeouts on every fallback model   | Prevents cascading slowness
Start A/B tests with 50/50 splits      | Statistically cleaner comparison
Monitor cost events daily              | Catch unexpected cost spikes early
Keep a local Ollama fallback for dev   | Works offline, no API costs
Validate configs before every deploy   | Catches routing errors early

Next steps

For AI systems

  • Canonical terms: multi-model routing, providers, model groups, fallback chain, weighted routing, cost-based routing, A/B testing, secret_key_ref, model_groups.
  • Config: providers targets in policy-config.yaml, each with id, provider.base_url, and secret_key_ref.env. model_groups map logical names to physical models with weights.
  • Gateway normalizes the OpenAI-compatible interface across all providers (OpenAI, Anthropic, Azure, Ollama, etc.).
  • Best next pages: Debugging with Events, API Key Management, Local Development Setup.

For engineers

  • Define multiple providers in policy-config.yaml — each with its own API key, base URL, and auth scheme.
  • Use model groups to map logical names (e.g., fast-chat, premium-chat) to multiple physical models with weights.
  • Configure fallback chains so requests automatically retry on a backup provider if the primary fails.
  • Set per-model timeouts in fallback chains to prevent cascading slowness.
  • Validate configs before deploying — kt policy lint catches routing errors (unknown providers, weight sums).
  • Keep a local Ollama provider as a fallback for development — works offline with no API costs.

For leaders

  • Multi-provider routing eliminates single-vendor dependency and provides resilience against provider outages.
  • Model groups enable routing changes (A/B tests, cost optimization, failover) without code deploys.
  • Cost-based routing automatically selects the cheapest model that meets quality thresholds.
  • A/B testing different providers with governance applied gives production-realistic comparison data.
  • Provider diversification is a risk management strategy — negotiate better pricing with multiple viable alternatives.