Multi-Provider Fallback

Keeptrusts supports automatic failover between multiple LLM providers. When the primary provider fails, is rate-limited, or exceeds latency thresholds, the gateway automatically routes to the next available provider.

Use this page when

  • You need to configure automatic failover between multiple LLM providers (OpenAI, Anthropic, Azure, etc.).
  • You want to route requests by latency, cost, or ordered priority.
  • You need circuit breaker configuration to prevent cascading failures across providers.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Configuration

pack:
  name: multi-provider-fallback-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: primary
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: fallback-1
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
    - id: fallback-2
      provider: google
      model: gemini-2.0-flash
      secret_key_ref:
        env: GOOGLE_API_KEY
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true

Fallback Strategies

Strategy     Behavior
ordered      Try providers in the order listed
round-robin  Distribute requests evenly across providers
latency      Route to the fastest-responding provider
cost         Route to the cheapest available provider
random       Randomly select from available providers

Ordered Fallback

providers:
  fallback:
    strategy: ordered
    max_retries: 2
    retry_on:
      - 5xx
      - timeout
      - rate_limit
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
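
Conceptually, ordered fallback behaves like the Python sketch below (illustrative only, not the gateway's implementation; `ProviderError` and the `send` callback are hypothetical stand-ins). Errors classified under `retry_on` advance to the next target; anything else fails the request immediately.

```python
class ProviderError(Exception):
    """Simplified provider failure carrying a retry-classification kind."""
    def __init__(self, kind):
        super().__init__(kind)
        self.kind = kind

RETRY_ON = {"5xx", "timeout", "rate_limit"}   # mirrors retry_on in the config

def call_with_fallback(targets, send, max_retries=2):
    """Try targets in listed order. `send(target)` returns a response or
    raises ProviderError. Retryable failures fall through to the next
    target, up to max_retries retries beyond the first attempt."""
    last_error = None
    for attempt, target in enumerate(targets):
        if attempt > max_retries:
            break
        try:
            return send(target)
        except ProviderError as err:
            if err.kind not in RETRY_ON:
                raise          # e.g. a 400 bad request: do not fail over
            last_error = err
    if last_error is None:
        raise RuntimeError("no targets configured")
    raise last_error
```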

Latency-Based Routing

providers:
  fallback:
    strategy: latency
    latency_window: 5m
    max_latency_ms: 5000
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY

Routes to the provider with the lowest average latency over the trailing 5-minute window.
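
The selection logic can be sketched as follows (an illustration of the trailing-window average, not the gateway's implementation; `LatencyRouter` is a hypothetical name). Targets whose average exceeds `max_latency_ms`, or that have no samples yet, are skipped.

```python
import time
from collections import defaultdict, deque

class LatencyRouter:
    def __init__(self, window_secs=300, max_latency_ms=5000):
        self.window = window_secs
        self.max_latency_ms = max_latency_ms
        self.samples = defaultdict(deque)   # target id -> (timestamp, latency_ms)

    def record(self, target, latency_ms, now=None):
        now = time.time() if now is None else now
        self.samples[target].append((now, latency_ms))

    def _avg(self, target, now):
        q = self.samples[target]
        while q and q[0][0] < now - self.window:   # drop expired samples
            q.popleft()
        if not q:
            return None
        return sum(ms for _, ms in q) / len(q)

    def pick(self, targets, now=None):
        now = time.time() if now is None else now
        scored = [(self._avg(t, now), t) for t in targets]
        eligible = [(avg, t) for avg, t in scored
                    if avg is not None and avg <= self.max_latency_ms]
        # With no usable samples, fall back to the first listed target.
        return min(eligible)[1] if eligible else targets[0]
```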

Cost-Based Routing

providers:
  fallback:
    strategy: cost
    max_cost_per_1k_tokens: 0.01
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
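
A minimal sketch of the selection rule (illustrative; `pick_by_cost`, the price table, and the `healthy` callback are hypothetical): only targets within the `max_cost_per_1k_tokens` cap are eligible, and the cheapest eligible one wins.

```python
def pick_by_cost(targets, prices, max_cost_per_1k=0.01, healthy=lambda t: True):
    """targets: ordered target ids; prices: id -> USD per 1k tokens."""
    eligible = [t for t in targets
                if healthy(t) and prices.get(t, float("inf")) <= max_cost_per_1k]
    if not eligible:
        raise RuntimeError("no provider within the cost cap")
    return min(eligible, key=lambda t: prices[t])
```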

Circuit Breaker

Prevent cascading failures with the built-in circuit breaker:

providers:
  circuit_breaker:
    failure_threshold: 5
    recovery_timeout_secs: 30
    half_open_max_requests: 3

State      Behavior
Closed     Normal operation; requests are forwarded
Open       Provider marked unhealthy after failure_threshold consecutive failures; all requests routed to fallback
Half-Open  After recovery_timeout_secs, allow half_open_max_requests through to test recovery
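
The state machine can be sketched in a few lines of Python (illustrative only, not the gateway's implementation):

```python
class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout_secs=30,
                 half_open_max_requests=3):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout_secs
        self.half_open_max = half_open_max_requests
        self.state = "closed"
        self.failures = 0
        self.opened_at = None
        self.half_open_inflight = 0

    def allow(self, now):
        if self.state == "closed":
            return True
        if self.state == "open":
            if now - self.opened_at >= self.recovery_timeout:
                self.state = "half-open"       # probe the provider again
                self.half_open_inflight = 0
            else:
                return False
        # half-open: admit a limited number of probe requests
        if self.half_open_inflight < self.half_open_max:
            self.half_open_inflight += 1
            return True
        return False

    def on_success(self):
        self.state = "closed"
        self.failures = 0

    def on_failure(self, now):
        if self.state == "half-open":
            self.state = "open"                # probe failed: reopen
            self.opened_at = now
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = now
```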

Warm-Up Lane for New Providers

When using lowest_latency or highest_throughput routing strategies, newly added providers may not receive traffic because they lack sufficient performance samples. The warmup_ratio setting (default: 0.1) controls the probability that an under-sampled provider is promoted to the front of the routing list for warm-up traffic.

providers:
  routing:
    strategy: lowest_latency
    warmup_ratio: 0.1
    min_sample_count: 5
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY

Set warmup_ratio: 0 to disable warm-up traffic entirely.
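
The promotion rule can be sketched like this (illustrative; names beyond the config keys, such as `apply_warmup` and the `rng` parameter, are hypothetical): with probability `warmup_ratio`, one target that has fewer than `min_sample_count` samples is moved to the front of the routing list.

```python
import random

def apply_warmup(ranked, sample_counts, warmup_ratio=0.1, min_sample_count=5,
                 rng=random):
    """ranked: targets ordered by the base strategy (best first)."""
    cold = [t for t in ranked if sample_counts.get(t, 0) < min_sample_count]
    if cold and rng.random() < warmup_ratio:
        promoted = rng.choice(cold)            # give a cold target warm-up traffic
        return [promoted] + [t for t in ranked if t != promoted]
    return ranked
```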

Circuit Breaker Resilience

Circuit breaker state now survives configuration reloads. When you reload the gateway config, providers that were in an Open (degraded) circuit state retain that state — they won't be treated as healthy simply because the config was reloaded.

Health Monitoring

Enable active health checks for proactive failure detection:

providers:
  health_check:
    enabled: true
    interval_secs: 30
    timeout_ms: 5000
    healthy_threshold: 2
    unhealthy_threshold: 3
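
The threshold logic behaves roughly like this sketch (illustrative; `HealthTracker` is a hypothetical name): a target flips to unhealthy after `unhealthy_threshold` consecutive probe failures and back to healthy after `healthy_threshold` consecutive successes.

```python
class HealthTracker:
    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self.streak = 0   # consecutive probe results opposing the current state

    def observe(self, ok):
        """Feed one probe result; returns the current health status."""
        if ok == self.healthy:
            self.streak = 0                   # result agrees with current state
            return self.healthy
        self.streak += 1
        needed = (self.healthy_threshold if not self.healthy
                  else self.unhealthy_threshold)
        if self.streak >= needed:
            self.healthy = not self.healthy   # threshold reached: flip state
            self.streak = 0
        return self.healthy
```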

Use Cases

High Availability

pack:
  name: multi-provider-fallback-providers-8
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: azure-secondary
      provider: azure
      model: gpt-4o
      secret_key_ref:
        env: AZURE_API_KEY
    - id: anthropic-tertiary
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true

Cost Optimization

Route to the cheapest provider that meets quality thresholds:

pack:
  name: multi-provider-fallback-example-9
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: deepseek
      provider: deepseek
      model: deepseek-chat
      secret_key_ref:
        env: DEEPSEEK_API_KEY
    - id: groq
      provider: groq
      model: llama-3.1-70b-versatile
      secret_key_ref:
        env: GROQ_API_KEY
    - id: openai
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - quality-scorer
policy:
  quality-scorer:
    thresholds:
      min_aggregate: 0.7

Regional Failover

pack:
  name: multi-provider-fallback-providers-10
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: eu-primary
      provider: mistral
      model: mistral-large-latest
      secret_key_ref:
        env: MISTRAL_API_KEY
    - id: eu-secondary
      provider: vertex
      model: gemini-2.0-flash
      secret_key_ref:
        env: GOOGLE_APPLICATION_CREDENTIALS
    - id: eu-fallback
      provider: azure
      model: gpt-4o
      secret_key_ref:
        env: AZURE_API_KEY
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true

For AI systems

  • Canonical terms: multi-provider fallback, provider routing, circuit breaker, failover strategy, warmup lane.
  • Config keys: providers.targets[], providers.fallback.strategy (ordered/round-robin/latency/cost/random), providers.circuit_breaker, providers.health_check, providers.routing.warmup_ratio.
  • Strategies: ordered, round-robin, latency, cost, random.
  • Circuit breaker states: Closed → Open → Half-Open.
  • Related pages: kt gateway run, Managed Mode, Streaming & SSE.

For engineers

  • Prerequisites: At least two provider targets defined in policy-config.yaml with valid secret_key_ref credentials.
  • Validate: Run kt policy lint --file policy-config.yaml to confirm the fallback config is structurally valid. Start the gateway and send a test request while one provider is down to confirm failover triggers.
  • Monitor: Use curl http://localhost:8080/keeptrusts/providers/metrics | jq to see per-provider latency, error counts, and circuit breaker state.
  • Troubleshooting: If all providers are circuit-broken, requests will fail with 503. Reduce failure_threshold or increase recovery_timeout_secs to allow faster recovery.

For leaders

  • Multi-provider fallback is the primary mechanism for LLM availability — SLA commitments depend on having at least one healthy fallback provider.
  • Cost-based routing can reduce AI spend by 30–70% by preferring cheaper providers when quality thresholds are met.
  • Circuit breaker prevents cascading cost from retrying a failing provider — failed requests are routed away immediately.
  • Regional fallback configurations help meet data residency requirements (e.g., EU-only providers as primary, with global fallback for availability).
