# Multi-Provider Fallback
Keeptrusts supports automatic failover between multiple LLM providers. When the primary provider fails, is rate-limited, or exceeds latency thresholds, the gateway automatically routes to the next available provider.
## Use this page when
- You need to configure automatic failover between multiple LLM providers (OpenAI, Anthropic, Azure, etc.).
- You want to route requests by latency, cost, or ordered priority.
- You need circuit breaker configuration to prevent cascading failures across providers.
## Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
## Configuration

```yaml
pack:
  name: multi-provider-fallback-providers-1
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: primary
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: fallback-1
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
    - id: fallback-2
      provider: google
      model: gemini-2.0-flash
      secret_key_ref:
        env: GOOGLE_API_KEY

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
## Fallback Strategies

| Strategy | Behavior |
|---|---|
| `ordered` | Try providers in the order listed |
| `round-robin` | Distribute requests evenly across providers |
| `latency` | Route to the fastest-responding provider |
| `cost` | Route to the cheapest available provider |
| `random` | Randomly select from available providers |
### Ordered Fallback

```yaml
providers:
  fallback:
    strategy: ordered
    max_retries: 2
    retry_on:
      - 5xx
      - timeout
      - rate_limit
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
```
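The retry behavior can be sketched as follows. This is an illustrative Python model, not the Keeptrusts implementation: it walks the target list in order, retries on the failure classes listed under `retry_on`, and stops once the `max_retries` budget is spent. The `send` callback and the error-message convention are assumptions for the sketch.

```python
# Failure classes that trigger fallback to the next target (mirrors retry_on).
RETRYABLE = {"5xx", "timeout", "rate_limit"}

def call_with_ordered_fallback(targets, send, max_retries=2):
    """targets: ordered list of target ids; send(target) returns a response
    or raises RuntimeError whose message names a failure class."""
    last_error = None
    for attempt, target in enumerate(targets):
        if attempt > max_retries:
            break                      # retry budget exhausted
        try:
            return send(target)
        except RuntimeError as err:
            if str(err) not in RETRYABLE:
                raise                  # non-retryable errors fail fast
            last_error = err
    raise RuntimeError(f"all providers exhausted: {last_error}")
```

Note that non-retryable errors (for example, an invalid API key) propagate immediately rather than burning through the fallback chain.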
### Latency-Based Routing

```yaml
providers:
  fallback:
    strategy: latency
    latency_window: 5m
    max_latency_ms: 5000
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
```

Routes to the provider with the lowest average latency over the trailing 5-minute window.
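The selection rule can be sketched like this (assumed mechanics, not the actual Keeptrusts scheduler): average each target's latency samples inside the trailing window, drop targets whose average exceeds `max_latency_ms`, and pick the lowest remaining average.

```python
from statistics import mean

def pick_lowest_latency(samples, now, window_secs=300, max_latency_ms=5000):
    """samples: {target_id: [(timestamp_secs, latency_ms), ...]}.
    Returns the target with the lowest in-window average, or None."""
    averages = {}
    for target, points in samples.items():
        # Keep only samples inside the trailing window.
        recent = [ms for ts, ms in points if now - ts <= window_secs]
        if recent and mean(recent) <= max_latency_ms:
            averages[target] = mean(recent)
    if not averages:
        return None                    # no target has usable, fast-enough samples
    return min(averages, key=averages.get)
```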
### Cost-Based Routing

```yaml
providers:
  fallback:
    strategy: cost
    max_cost_per_1k_tokens: 0.01
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
```
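A minimal sketch of the cost rule, assuming the gateway knows each target's per-1k-token price (the price table below is hypothetical): targets above `max_cost_per_1k_tokens` are excluded, and the cheapest remaining target wins.

```python
def pick_cheapest(prices, max_cost_per_1k_tokens=0.01):
    """prices: {target_id: USD cost per 1k tokens}.
    Returns the cheapest eligible target, or None if all exceed the cap."""
    eligible = {t: c for t, c in prices.items() if c <= max_cost_per_1k_tokens}
    if not eligible:
        return None
    return min(eligible, key=eligible.get)
```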
## Circuit Breaker

Prevent cascading failures with the built-in circuit breaker:

```yaml
providers:
  circuit_breaker:
    failure_threshold: 5
    recovery_timeout_secs: 30
    half_open_max_requests: 3
```

| State | Behavior |
|---|---|
| Closed | Normal operation; requests are forwarded |
| Open | Provider marked unhealthy after `failure_threshold` consecutive failures; all requests routed to fallback |
| Half-Open | After `recovery_timeout_secs`, allow `half_open_max_requests` test requests to probe recovery |
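The state machine in the table above can be modeled in a few lines. This is an illustrative sketch, not gateway source; the parameter names follow the config keys, and the injectable `clock` is an assumption to make the timeout testable.

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout_secs=30,
                 half_open_max_requests=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout_secs = recovery_timeout_secs
        self.half_open_max_requests = half_open_max_requests
        self.clock = clock
        self.state = "closed"
        self.failures = 0            # consecutive failures while closed
        self.opened_at = 0.0
        self.half_open_probes = 0

    def allow_request(self):
        if self.state == "closed":
            return True
        if self.state == "open":
            if self.clock() - self.opened_at < self.recovery_timeout_secs:
                return False         # still cooling off; use fallback
            self.state = "half-open" # timeout elapsed: start probing
            self.half_open_probes = 0
        # Half-open: admit a limited number of probe requests.
        if self.half_open_probes < self.half_open_max_requests:
            self.half_open_probes += 1
            return True
        return False

    def record_success(self):
        self.state = "closed"
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"      # trip (or re-trip) the breaker
            self.opened_at = self.clock()
            self.failures = 0
```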
## Warm-Up Lane for New Providers

When using the `lowest_latency` or `highest_throughput` routing strategies, newly added providers may not receive traffic because they lack sufficient performance samples. The `warmup_ratio` setting (default: `0.1`) controls the probability that an under-sampled provider is promoted to the front of the routing list to receive warm-up traffic.

```yaml
providers:
  routing:
    strategy: lowest_latency
    warmup_ratio: 0.1
    min_sample_count: 5
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
```

Set `warmup_ratio: 0` to disable warm-up traffic entirely.
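The promotion mechanic can be sketched like this (assumed behavior; the tie-breaking among multiple cold targets is a guess): with probability `warmup_ratio`, a target with fewer than `min_sample_count` samples is moved to the front of the routing order so it can accumulate performance data.

```python
import random

def apply_warmup(ordered, sample_counts, warmup_ratio=0.1,
                 min_sample_count=5, rng=random.random):
    """ordered: routing order from the primary strategy.
    sample_counts: {target_id: number of recorded samples}."""
    cold = [t for t in ordered if sample_counts.get(t, 0) < min_sample_count]
    if cold and rng() < warmup_ratio:
        promoted = cold[0]           # promote an under-sampled target
        return [promoted] + [t for t in ordered if t != promoted]
    return ordered                   # normal routing order
```

With `warmup_ratio: 0` the roll never succeeds, so cold targets stay wherever the primary strategy puts them.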
## Circuit Breaker Resilience

Circuit breaker state survives configuration reloads. When you reload the gateway config, providers whose circuit was Open (degraded) retain that state; they are not treated as healthy simply because the config was reloaded.
## Health Monitoring

Enable active health checks for proactive failure detection:

```yaml
providers:
  health_check:
    enabled: true
    interval_secs: 30
    timeout_ms: 5000
    healthy_threshold: 2
    unhealthy_threshold: 3
```
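The threshold semantics can be sketched as follows (assumed, consistent with common health-check conventions): a target flips to unhealthy after `unhealthy_threshold` consecutive failed probes, and back to healthy after `healthy_threshold` consecutive successful ones; a probe that agrees with the current state resets the streak.

```python
class HealthTracker:
    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self.streak = 0              # consecutive probes contradicting current state

    def record_probe(self, ok):
        """ok: True if the probe succeeded. Returns current health."""
        if ok == self.healthy:
            self.streak = 0          # probe agrees with current state
            return self.healthy
        self.streak += 1
        needed = (self.healthy_threshold if not self.healthy
                  else self.unhealthy_threshold)
        if self.streak >= needed:
            self.healthy = not self.healthy
            self.streak = 0
        return self.healthy
```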
## Use Cases

### High Availability

```yaml
pack:
  name: multi-provider-fallback-providers-8
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: azure-secondary
      provider: azure
      model: gpt-4o
      secret_key_ref:
        env: AZURE_API_KEY
    - id: anthropic-tertiary
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
### Cost Optimization

Route to the cheapest provider that meets quality thresholds:

```yaml
pack:
  name: multi-provider-fallback-example-9
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: deepseek
      provider: deepseek
      model: deepseek-chat
      secret_key_ref:
        env: DEEPSEEK_API_KEY
    - id: groq
      provider: groq
      model: llama-3.1-70b-versatile
      secret_key_ref:
        env: GROQ_API_KEY
    - id: openai
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY

policies:
  chain:
    - quality-scorer

policy:
  quality-scorer:
    thresholds:
      min_aggregate: 0.7
```
### Regional Failover

```yaml
pack:
  name: multi-provider-fallback-providers-10
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: eu-primary
      provider: mistral
      model: mistral-large-latest
      secret_key_ref:
        env: MISTRAL_API_KEY
    - id: eu-secondary
      provider: vertex
      model: gemini-2.0-flash
      secret_key_ref:
        env: GOOGLE_APPLICATION_CREDENTIALS
    - id: eu-fallback
      provider: azure
      model: gpt-4o
      secret_key_ref:
        env: AZURE_API_KEY

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
## For AI systems

- Canonical terms: multi-provider fallback, provider routing, circuit breaker, failover strategy, warm-up lane.
- Config keys: `providers.targets[]`, `providers.fallback.strategy` (`ordered` / `round-robin` / `latency` / `cost` / `random`), `providers.circuit_breaker`, `providers.health_check`, `providers.routing.warmup_ratio`.
- Strategies: `ordered`, `round-robin`, `latency`, `cost`, `random`.
- Circuit breaker states: Closed → Open → Half-Open.
- Related pages: kt gateway run, Managed Mode, Streaming & SSE.
## For engineers

- Prerequisites: at least two provider targets defined in `policy-config.yaml` with valid `secret_key_ref` credentials.
- Validate: run `kt policy lint --file policy-config.yaml` to confirm the fallback config is structurally valid, then start the gateway and send a test request while one provider is down to confirm failover triggers.
- Monitor: use `curl http://localhost:8080/keeptrusts/providers/metrics | jq` to see per-provider latency, error counts, and circuit breaker state.
- Troubleshooting: if all providers are circuit-broken, requests fail with 503. Reduce `recovery_timeout_secs` so half-open probes are attempted sooner, or raise `failure_threshold` so circuits trip less eagerly.
## For leaders
- Multi-provider fallback is the primary mechanism for LLM availability — SLA commitments depend on having at least one healthy fallback provider.
- Cost-based routing can reduce AI spend by 30–70% by preferring cheaper providers when quality thresholds are met.
- Circuit breaker prevents cascading cost from retrying a failing provider — failed requests are routed away immediately.
- Regional fallback configurations help meet data residency requirements (e.g., EU-only providers as primary, with global fallback for availability).
## Next steps
- kt gateway run — Start the gateway with multi-provider configs
- Streaming & SSE — How streaming works with fallback
- Managed Mode — Deploy fallback configs via managed polling
- CLI overview