Routing Across Multiple AI Models
The Keeptrusts gateway can route requests across multiple providers and models. This guide covers provider configuration, model groups, fallback chains, cost-based routing, and A/B testing setups.
Use this page when
- You need to configure multiple LLM providers in a single gateway for multi-model routing.
- You want to set up model groups with weighted load balancing or fallback chains.
- You are implementing cost-based routing, A/B testing, or latency-based model selection.
- You need to troubleshoot routing errors (unknown provider, weight misconfiguration, missing keys).
Primary audience
- Primary: Platform Engineers configuring multi-provider gateway routing
- Secondary: AI Engineers implementing fallback strategies, Technical Leaders planning provider diversification
Multi-Provider Configuration
Define multiple providers in your policy config:
pack:
  name: multi-model-routing-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai
      provider:
        base_url: https://api.openai.com/v1
        secret_key_ref:
          env: OPENAI_API_KEY
    - id: anthropic
      provider:
        base_url: https://api.anthropic.com/v1
        secret_key_ref:
          env: ANTHROPIC_API_KEY
    - id: azure-openai
      provider:
        base_url: https://my-instance.openai.azure.com/openai/deployments/gpt-4o/
        secret_key_ref:
          env: AZURE_OPENAI_API_KEY
    - id: local-ollama
      provider:
        base_url: http://localhost:11434/v1
        secret_key_ref:
          env: OLLAMA_DUMMY_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Each provider has its own API key, base URL, and authentication scheme; the gateway normalizes them all behind a single OpenAI-compatible interface.
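Once these providers are configured, the same client code can reach any of them through the gateway. A short illustration, assuming the gateway resolves bare model names to their configured providers (the model names here are illustrative):

from openai import OpenAI

# One client, one endpoint; the gateway handles each provider's auth and native API.
client = OpenAI(base_url="http://localhost:41002/v1", api_key="kt_gk_...")

# Each call targets a different configured provider by model name.
for model in ["gpt-4o-mini", "claude-sonnet-4", "llama3.1"]:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "ping"}],
    )
    print(model, "->", reply.choices[0].message.content)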
Model Groups
Model groups let you map a single logical model name to multiple physical deployments:
model_groups:
  - name: fast-chat
    models:
      - provider: openai
        model: gpt-4o-mini
        weight: 70
      - provider: anthropic
        model: claude-haiku
        weight: 30
  - name: premium-chat
    models:
      - provider: openai
        model: gpt-4o
        weight: 50
      - provider: anthropic
        model: claude-sonnet-4
        weight: 50
Request using the group name instead of a specific model:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="kt_gk_...",
)

# Routes to gpt-4o-mini (70%) or claude-haiku (30%)
response = client.chat.completions.create(
    model="fast-chat",
    messages=[{"role": "user", "content": "Summarize this document."}],
)
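Under the hood, weighted routing amounts to a weighted random draw per request. A minimal sketch of the idea, not the gateway's actual implementation:

import random
from collections import Counter

# Mirrors the fast-chat group above: (provider, model, weight)
FAST_CHAT = [
    ("openai", "gpt-4o-mini", 70),
    ("anthropic", "claude-haiku", 30),
]

def pick_model(group):
    """Weighted random selection; over many requests the split
    converges to the configured 70/30 distribution."""
    choices = [(provider, model) for provider, model, _ in group]
    weights = [weight for _, _, weight in group]
    return random.choices(choices, weights=weights, k=1)[0]

# Rough check of the distribution
print(Counter(pick_model(FAST_CHAT) for _ in range(10_000)))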
Fallback Chains
Fallback chains try the next model if the primary fails or times out:
model_groups:
  - name: reliable-chat
    strategy: fallback
    models:
      - provider: openai
        model: gpt-4o
        timeout_ms: 5000
      - provider: anthropic
        model: claude-sonnet-4
        timeout_ms: 10000
      - provider: local-ollama
        model: llama3.1
        timeout_ms: 30000
Fallback Behavior
Request → gpt-4o (5s timeout)
├─ Success → return response
└─ Timeout/Error → claude-sonnet-4 (10s timeout)
   ├─ Success → return response
   └─ Timeout/Error → llama3.1 (30s timeout)
      ├─ Success → return response
      └─ Error → return 502 to client
The decision event records which model ultimately served the request and how many fallbacks occurred.
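Conceptually, the fallback strategy is a sequential loop with a per-model timeout. A client-side sketch of the same behavior, useful for reasoning about timeouts even though the gateway runs this loop internally (endpoint and key are the placeholders used throughout this page):

import httpx  # pip install httpx

# Mirrors the reliable-chat chain above: (model, timeout in seconds)
CHAIN = [("gpt-4o", 5.0), ("claude-sonnet-4", 10.0), ("llama3.1", 30.0)]

def call_with_fallback(prompt: str) -> dict:
    fallbacks = 0
    for model, timeout_s in CHAIN:
        try:
            resp = httpx.post(
                "http://localhost:41002/v1/chat/completions",
                headers={"Authorization": "Bearer kt_gk_..."},
                json={"model": model,
                      "messages": [{"role": "user", "content": prompt}]},
                timeout=timeout_s,
            )
            resp.raise_for_status()
            return {"model": model, "fallbacks": fallbacks, "body": resp.json()}
        except httpx.HTTPError:  # timeout, connection failure, or non-2xx
            fallbacks += 1  # move on to the next model in the chain
    raise RuntimeError("all models failed")  # the gateway returns 502 here

Note how the worst case adds up: a request can spend 5 + 10 = 15 seconds failing before the last model even starts, which is why a timeout on every fallback model matters.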
Testing Fallback Behavior
# Send a request and check which model served it
curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer kt_gk_..." \
  -d '{"model": "reliable-chat", "messages": [{"role": "user", "content": "Hello"}]}' \
  | jq '.model'
Check fallback events:
kt events tail --filter "model=reliable-chat" --limit 5 --output json | \
  jq '.[] | {actual_model: .resolved_model, fallbacks: .fallback_count}'
Cost-Based Routing
Route requests based on cost constraints:
model_groups:
  - name: cost-optimized
    strategy: cost
    budget_per_request_usd: 0.01
    models:
      - provider: openai
        model: gpt-4o-mini
        cost_per_1k_input: 0.00015
        cost_per_1k_output: 0.0006
      - provider: openai
        model: gpt-4o
        cost_per_1k_input: 0.0025
        cost_per_1k_output: 0.01
The gateway estimates token count from the prompt and routes to the cheapest model that fits within the per-request budget. If the cheap model cannot handle the estimated token count within budget, it falls back to the more capable model.
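The arithmetic behind that decision is straightforward. A sketch of the selection rule, with the token estimation and the exact fallback behavior simplified to assumptions:

# Mirrors the cost-optimized group above (prices per 1k tokens, USD)
MODELS = [
    {"model": "gpt-4o-mini", "input": 0.00015, "output": 0.0006},
    {"model": "gpt-4o", "input": 0.0025, "output": 0.01},
]

def estimate_cost(m, input_tokens, expected_output_tokens):
    return ((input_tokens / 1000) * m["input"]
            + (expected_output_tokens / 1000) * m["output"])

def route(input_tokens, expected_output_tokens, budget=0.01):
    """Cheapest model within budget; fall back to the most capable if none fits."""
    in_budget = [m for m in MODELS
                 if estimate_cost(m, input_tokens, expected_output_tokens) <= budget]
    pool = in_budget or [MODELS[-1]]  # MODELS[-1] is the more capable model
    return min(pool, key=lambda m: estimate_cost(m, input_tokens, expected_output_tokens))["model"]

# A ~2,000-token prompt with ~500 expected output tokens costs
# 2.0 * 0.00015 + 0.5 * 0.0006 = 0.0006 USD on gpt-4o-mini, well under budget.
print(route(2000, 500))  # -> gpt-4o-mini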
Monitoring Cost Routing
kt events tail --limit 20 --output json | \
  jq '.[] | {model: .resolved_model, estimated_cost: .estimated_cost_usd, actual_cost: .actual_cost_usd}'
A/B Testing Models
Use weighted routing to A/B test model performance:
model_groups:
  - name: chat-experiment
    strategy: weighted
    models:
      - provider: openai
        model: gpt-4o
        weight: 50
        tag: control
      - provider: anthropic
        model: claude-sonnet-4
        weight: 50
        tag: experiment
Analyzing A/B Results
Query events by tag to compare performance:
# Control group metrics
kt events tail --filter "tag=control" --limit 100 --output json | \
  jq '{
    avg_latency: ([.[].latency_ms] | add / length),
    avg_tokens: ([.[].tokens.total] | add / length),
    block_rate: (([.[] | select(.decision == "blocked")] | length) / length * 100)
  }'

# Experiment group metrics
kt events tail --filter "tag=experiment" --limit 100 --output json | \
  jq '{
    avg_latency: ([.[].latency_ms] | add / length),
    avg_tokens: ([.[].tokens.total] | add / length),
    block_rate: (([.[] | select(.decision == "blocked")] | length) / length * 100)
  }'
Session Affinity
For multi-turn conversations, ensure the same model serves all turns:
model_groups:
  - name: chat-experiment
    strategy: weighted
    session_affinity: true  # sticky by conversation ID
    models:
      - provider: openai
        model: gpt-4o
        weight: 50
      - provider: anthropic
        model: claude-sonnet-4
        weight: 50
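A common way to implement stickiness is to replace the random draw with a deterministic hash of the conversation ID, so every turn of a conversation lands on the same model. A sketch of that idea, not necessarily the gateway's exact mechanism:

import hashlib

# Mirrors the chat-experiment group above: (provider, model, weight)
EXPERIMENT = [("openai", "gpt-4o", 50), ("anthropic", "claude-sonnet-4", 50)]

def sticky_pick(conversation_id: str, group=EXPERIMENT):
    """Hash the conversation ID to a stable point in [0, total_weight),
    then walk the cumulative weights to find the matching model."""
    total = sum(weight for _, _, weight in group)
    digest = hashlib.sha256(conversation_id.encode()).digest()
    point = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for provider, model, weight in group:
        cumulative += weight
        if point < cumulative:
            return provider, model

# Every turn of conversation "conv-42" resolves to the same model:
print(sticky_pick("conv-42"))
print(sticky_pick("conv-42"))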
Provider-Specific Configuration
OpenAI
pack:
  name: multi-model-routing-providers-7
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai
      provider:
        base_url: https://api.openai.com/v1
        secret_key_ref:
          env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Azure OpenAI
pack:
  name: multi-model-routing-providers-8
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: azure-openai
      provider:
        base_url: https://my-instance.openai.azure.com/openai/deployments/gpt-4o/
        secret_key_ref:
          env: AZURE_OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Anthropic
pack:
  name: multi-model-routing-providers-9
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: anthropic
      provider:
        base_url: https://api.anthropic.com/v1
        secret_key_ref:
          env: ANTHROPIC_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Local Ollama
pack:
  name: multi-model-routing-providers-10
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: local-ollama
      provider:
        base_url: http://localhost:11434/v1
        secret_key_ref:
          env: OLLAMA_DUMMY_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Validating Your Routing Config
Always validate before deploying:
kt policy lint --file policy-config.yaml
Common validation errors:
| Error | Cause | Fix |
|---|---|---|
| Unknown provider in model group | Provider name doesn't match the providers list | Check spelling |
| Weights must sum to 100 | Weighted group weights don't total 100 | Adjust weights |
| Duplicate model group name | Two groups share the same name | Rename one |
| Missing secret_key_ref | Provider key env var not configured | Add the env var reference |
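For an extra pre-flight check in CI, a small script can catch the weight-sum error before deploy. This sketch assumes the field names match the examples on this page:

import sys
import yaml  # pip install pyyaml

def check_weights(path: str) -> int:
    """Return the number of weighted groups whose weights don't sum to 100."""
    with open(path) as f:
        config = yaml.safe_load(f)
    failures = 0
    for group in config.get("model_groups", []):
        if group.get("strategy", "weighted") != "weighted":
            continue  # fallback/cost groups don't use percentage weights
        total = sum(m.get("weight", 0) for m in group.get("models", []))
        if total != 100:
            print(f"{group['name']}: weights sum to {total}, expected 100")
            failures += 1
    return failures

if __name__ == "__main__":
    sys.exit(1 if check_weights("policy-config.yaml") else 0)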
Best Practices
| Practice | Why |
|---|---|
| Use model groups, not hardcoded models | Enables routing changes without code deploys |
| Set timeouts on every fallback model | Prevents cascading slowness |
| Start A/B tests with 50/50 splits | Statistically cleaner comparison |
| Monitor cost events daily | Catch unexpected cost spikes early |
| Keep a local Ollama fallback for dev | Works offline, no API costs |
| Validate configs before every deploy | Catches routing errors early |
Next steps
- Debugging AI Requests with Events — trace which model served each request
- Managing API Keys & Gateway Keys — scope keys to specific model groups
- Local Development Setup — configure multi-model routing locally
For AI systems
- Canonical terms: multi-model routing, providers, model groups, fallback chain, weighted routing, cost-based routing, A/B testing, secret_key_ref, model_groups.
- Config: the providers list in policy-config.yaml defines each target with an id, secret_key_ref.env, and base_url; model_groups map logical names to physical models with weights.
- Gateway normalizes the OpenAI-compatible interface across all providers (OpenAI, Anthropic, Azure, Ollama, etc.).
- Best next pages: Debugging with Events, API Key Management, Local Development Setup.
For engineers
- Define multiple providers in policy-config.yaml, each with its own API key, base URL, and auth scheme.
- Use model groups to map logical names (e.g., fast-chat, premium-chat) to multiple physical models with weights.
- Configure fallback chains so requests automatically retry on a backup provider if the primary fails.
- Set per-model timeouts in fallback chains to prevent cascading slowness.
- Validate configs before deploying; kt policy lint catches routing errors (unknown providers, weight sums).
- Keep a local Ollama provider as a fallback for development; it works offline with no API costs.
For leaders
- Multi-provider routing eliminates single-vendor dependency and provides resilience against provider outages.
- Model groups enable routing changes (A/B tests, cost optimization, failover) without code deploys.
- Cost-based routing automatically selects the cheapest model that meets quality thresholds.
- A/B testing different providers with governance applied gives production-realistic comparison data.
- Provider diversification is a risk management strategy — negotiate better pricing with multiple viable alternatives.