Multi-Provider Fallback

Keeptrusts supports automatic failover between multiple LLM providers. When the primary provider fails, is rate-limited, or exceeds latency thresholds, the gateway automatically routes to the next available provider.

Use this page when

  • You need to configure automatic failover between multiple LLM providers (OpenAI, Anthropic, Azure, etc.).
  • You want to route requests by latency, cost, or ordered priority.
  • You need circuit breaker configuration to prevent cascading failures across providers.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Configuration

pack:
  name: multi-provider-fallback-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: primary
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: fallback-1
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
    - id: fallback-2
      provider: google
      model: gemini-2.0-flash
      secret_key_ref:
        env: GOOGLE_API_KEY
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true

Fallback Strategies

Strategy     Behavior
ordered      Try providers in the order listed
round-robin  Distribute requests evenly across providers
latency      Route to the fastest-responding provider
cost         Route to the cheapest available provider
random       Randomly select from available providers

Ordered Fallback

providers:
  fallback:
    strategy: ordered
    max_retries: 2
    retry_on:
      - 5xx
      - timeout
      - rate_limit
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
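
Conceptually, ordered fallback behaves like the Python sketch below (illustrative only, not the gateway's implementation; `ProviderError` and the `send` callback are hypothetical stand-ins). Errors classified under `retry_on` advance to the next target; anything else fails the request immediately.

```python
class ProviderError(Exception):
    """Simplified provider failure carrying a retry-classification kind."""
    def __init__(self, kind):
        super().__init__(kind)
        self.kind = kind

RETRY_ON = {"5xx", "timeout", "rate_limit"}   # mirrors retry_on in the config

def call_with_fallback(targets, send, max_retries=2):
    """Try targets in listed order. `send(target)` returns a response or
    raises ProviderError. Retryable failures fall through to the next
    target, up to max_retries retries beyond the first attempt."""
    last_error = None
    for attempt, target in enumerate(targets):
        if attempt > max_retries:
            break
        try:
            return send(target)
        except ProviderError as err:
            if err.kind not in RETRY_ON:
                raise          # e.g. a 400 bad request: do not fail over
            last_error = err
    if last_error is None:
        raise RuntimeError("no targets configured")
    raise last_error
```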

Latency-Based Routing

providers:
  fallback:
    strategy: latency
    latency_window: 5m
    max_latency_ms: 5000
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY

Routes to the provider with the lowest average latency over the trailing 5-minute window.
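
The selection logic can be sketched as follows (an illustration of the trailing-window average, not the gateway's implementation; `LatencyRouter` is a hypothetical name). Targets whose average exceeds `max_latency_ms`, or that have no samples yet, are skipped.

```python
import time
from collections import defaultdict, deque

class LatencyRouter:
    def __init__(self, window_secs=300, max_latency_ms=5000):
        self.window = window_secs
        self.max_latency_ms = max_latency_ms
        self.samples = defaultdict(deque)   # target id -> (timestamp, latency_ms)

    def record(self, target, latency_ms, now=None):
        now = time.time() if now is None else now
        self.samples[target].append((now, latency_ms))

    def _avg(self, target, now):
        q = self.samples[target]
        while q and q[0][0] < now - self.window:   # drop expired samples
            q.popleft()
        if not q:
            return None
        return sum(ms for _, ms in q) / len(q)

    def pick(self, targets, now=None):
        now = time.time() if now is None else now
        scored = [(self._avg(t, now), t) for t in targets]
        eligible = [(avg, t) for avg, t in scored
                    if avg is not None and avg <= self.max_latency_ms]
        # With no usable samples, fall back to the first listed target.
        return min(eligible)[1] if eligible else targets[0]
```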

Cost-Based Routing

providers:
  fallback:
    strategy: cost
    max_cost_per_1k_tokens: 0.01
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
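
A minimal sketch of the selection rule (illustrative; `pick_by_cost`, the price table, and the `healthy` callback are hypothetical): only targets within the `max_cost_per_1k_tokens` cap are eligible, and the cheapest eligible one wins.

```python
def pick_by_cost(targets, prices, max_cost_per_1k=0.01, healthy=lambda t: True):
    """targets: ordered target ids; prices: id -> USD per 1k tokens."""
    eligible = [t for t in targets
                if healthy(t) and prices.get(t, float("inf")) <= max_cost_per_1k]
    if not eligible:
        raise RuntimeError("no provider within the cost cap")
    return min(eligible, key=lambda t: prices[t])
```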

Circuit Breaker

Prevent cascading failures with the built-in circuit breaker:

providers:
  circuit_breaker:
    failure_threshold: 5
    recovery_timeout_secs: 30
    half_open_max_requests: 3

State      Behavior
Closed     Normal operation; requests are forwarded
Open       Provider marked unhealthy after failure_threshold consecutive failures; all requests routed to fallback
Half-Open  After recovery_timeout_secs, allow half_open_max_requests through to test recovery
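
The state machine can be sketched in a few lines of Python (illustrative only, not the gateway's implementation):

```python
class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout_secs=30,
                 half_open_max_requests=3):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout_secs
        self.half_open_max = half_open_max_requests
        self.state = "closed"
        self.failures = 0
        self.opened_at = None
        self.half_open_inflight = 0

    def allow(self, now):
        if self.state == "closed":
            return True
        if self.state == "open":
            if now - self.opened_at >= self.recovery_timeout:
                self.state = "half-open"       # probe the provider again
                self.half_open_inflight = 0
            else:
                return False
        # half-open: admit a limited number of probe requests
        if self.half_open_inflight < self.half_open_max:
            self.half_open_inflight += 1
            return True
        return False

    def on_success(self):
        self.state = "closed"
        self.failures = 0

    def on_failure(self, now):
        if self.state == "half-open":
            self.state = "open"                # probe failed: reopen
            self.opened_at = now
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = now
```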

Warm-Up Lane for New Providers

When using lowest_latency or highest_throughput routing strategies, newly added providers may not receive traffic because they lack sufficient performance samples. The warmup_ratio setting (default: 0.1) controls the probability that an under-sampled provider is promoted to the front of the routing list for warm-up traffic.

providers:
  routing:
    strategy: lowest_latency
    warmup_ratio: 0.1
    min_sample_count: 5
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY

Set warmup_ratio: 0 to disable warm-up traffic entirely.
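
The promotion rule can be sketched like this (illustrative; names beyond the config keys, such as `apply_warmup` and the `rng` parameter, are hypothetical): with probability `warmup_ratio`, one target that has fewer than `min_sample_count` samples is moved to the front of the routing list.

```python
import random

def apply_warmup(ranked, sample_counts, warmup_ratio=0.1, min_sample_count=5,
                 rng=random):
    """ranked: targets ordered by the base strategy (best first)."""
    cold = [t for t in ranked if sample_counts.get(t, 0) < min_sample_count]
    if cold and rng.random() < warmup_ratio:
        promoted = rng.choice(cold)            # give a cold target warm-up traffic
        return [promoted] + [t for t in ranked if t != promoted]
    return ranked
```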

Circuit Breaker Resilience

Circuit breaker state now survives configuration reloads. When you reload the gateway config, providers that were in an Open (degraded) circuit state retain that state — they won't be treated as healthy simply because the config was reloaded.

Health Monitoring

Enable active health checks for proactive failure detection:

providers:
  health_check:
    enabled: true
    interval_secs: 30
    timeout_ms: 5000
    healthy_threshold: 2
    unhealthy_threshold: 3
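
The threshold logic behaves roughly like this sketch (illustrative; `HealthTracker` is a hypothetical name): a target flips to unhealthy after `unhealthy_threshold` consecutive probe failures and back to healthy after `healthy_threshold` consecutive successes.

```python
class HealthTracker:
    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self.streak = 0   # consecutive probe results opposing the current state

    def observe(self, ok):
        """Feed one probe result; returns the current health status."""
        if ok == self.healthy:
            self.streak = 0                   # result agrees with current state
            return self.healthy
        self.streak += 1
        needed = (self.healthy_threshold if not self.healthy
                  else self.unhealthy_threshold)
        if self.streak >= needed:
            self.healthy = not self.healthy   # threshold reached: flip state
            self.streak = 0
        return self.healthy
```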

Use Cases

High Availability

pack:
  name: multi-provider-fallback-providers-8
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: azure-secondary
      provider: azure
      model: gpt-4o
      secret_key_ref:
        env: AZURE_API_KEY
    - id: anthropic-tertiary
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true

Cost Optimization

Route to the cheapest provider that meets quality thresholds:

pack:
  name: multi-provider-fallback-example-9
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: deepseek
      provider: deepseek
      model: deepseek-chat
      secret_key_ref:
        env: DEEPSEEK_API_KEY
    - id: groq
      provider: groq
      model: llama-3.1-70b-versatile
      secret_key_ref:
        env: GROQ_API_KEY
    - id: openai
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - quality-scorer
policy:
  quality-scorer:
    thresholds:
      min_aggregate: 0.7

Regional Failover

pack:
  name: multi-provider-fallback-providers-10
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: eu-primary
      provider: mistral
      model: mistral-large-latest
      secret_key_ref:
        env: MISTRAL_API_KEY
    - id: eu-secondary
      provider: vertex
      model: gemini-2.0-flash
      secret_key_ref:
        env: GOOGLE_APPLICATION_CREDENTIALS
    - id: eu-fallback
      provider: azure
      model: gpt-4o
      secret_key_ref:
        env: AZURE_API_KEY
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true

For AI systems

  • Canonical terms: multi-provider fallback, provider routing, circuit breaker, failover strategy, warmup lane.
  • Config keys: providers.targets[], providers.fallback.strategy (ordered/round-robin/latency/cost/random), providers.circuit_breaker, providers.health_check, providers.routing.warmup_ratio.
  • Strategies: ordered, round-robin, latency, cost, random.
  • Circuit breaker states: Closed → Open → Half-Open.
  • Related pages: kt gateway run, Managed Mode, Streaming & SSE.

For engineers

  • Prerequisites: At least two provider targets defined in policy-config.yaml with valid secret_key_ref credentials.
  • Validate: Run kt policy lint --file policy-config.yaml to confirm the fallback config is structurally valid. Start the gateway and send a test request while one provider is down to confirm failover triggers.
  • Monitor: Use curl http://localhost:8080/keeptrusts/providers/metrics | jq to see per-provider latency, error counts, and circuit breaker state.
  • Troubleshooting: If all providers are circuit-broken, requests will fail with 503. Reduce failure_threshold or increase recovery_timeout_secs to allow faster recovery.

For leaders

  • Multi-provider fallback is the primary mechanism for LLM availability — SLA commitments depend on having at least one healthy fallback provider.
  • Cost-based routing can reduce AI spend by 30–70% by preferring cheaper providers when quality thresholds are met.
  • Circuit breaker prevents cascading cost from retrying a failing provider — failed requests are routed away immediately.
  • Regional fallback configurations help meet data residency requirements (e.g., EU-only providers as primary, with global fallback for availability).
