Achieve 99.9% Uptime with Multi-Provider Routing
A single LLM provider is a single point of failure. Rate limits, outages, and degraded performance can take your AI features offline without warning. Keeptrusts eliminates this risk by routing across multiple providers with automatic failover, circuit breakers, and real-time health monitoring.
Use this page when
- You need to eliminate single-provider failure risk for production AI workloads.
- You are configuring failover, circuit breakers, or latency-based routing across multiple LLM providers.
- You want to understand how the gateway detects provider degradation and automatically reroutes traffic.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
What you'll achieve
- Automatic failover across multiple providers with zero application changes
- Circuit breaker protection that removes unhealthy providers from rotation before they cause cascading failures
- Sub-second failover with pre-warmed connections and configurable retry logic
- Latency-based routing that sends requests to the fastest available provider
- Real-time provider health metrics visible in the console
Basic multi-provider failover
The simplest resilience configuration defines multiple provider targets in priority order:
```yaml
pack:
  name: multi-provider-resilience-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: primary-openai
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: fallback-azure
      provider: azure-openai
      model: gpt-4o
      base_url: https://my-resource.openai.azure.com
      secret_key_ref:
        env: AZURE_OPENAI_KEY
    - id: fallback-anthropic
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
The gateway tries `primary-openai` first. If it fails (rate limit, timeout, 5xx error), the request automatically routes to `fallback-azure`, then `fallback-anthropic`.
Failover triggers:
- HTTP 429 (rate limit)
- HTTP 5xx (server error)
- Request timeout
- Content filter blocks
- Zero-token completions (empty responses)
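The ordered strategy can be pictured as a simple loop over the targets list. This sketch uses hypothetical names (`ProviderError`, `call_provider`) to illustrate the behavior, not the gateway's actual internals:

```python
# Sketch of ordered failover: try each target in priority order and move
# on to the next target when a request fails with a retryable error.
# The error categories mirror the failover triggers listed above.

RETRYABLE = {"rate_limit", "server_error", "timeout",
             "content_filter", "empty_completion"}

class ProviderError(Exception):
    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason

def route_ordered(targets, request, call_provider):
    """Try targets in order; fail over on any retryable error."""
    last_err = None
    for target in targets:
        try:
            return call_provider(target, request)
        except ProviderError as err:
            if err.reason not in RETRYABLE:
                raise  # non-retryable (e.g. malformed request) surfaces immediately
            last_err = err  # retryable: fall through to the next target
    raise last_err  # every target failed
```

Only retryable categories trigger failover; a genuinely bad request would fail the same way on every provider, so it is surfaced to the caller instead.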
Circuit breakers
Circuit breakers prevent a failing provider from receiving traffic until it recovers. Without them, every request to a down provider wastes time on a doomed attempt.
```yaml
provider_routing:
  strategy: ordered
  circuit_breaker:
    error_threshold: 5
    window_seconds: 60
    recovery_timeout_seconds: 30
    half_open_max_requests: 3
```
How it works:
| State | Behavior |
|---|---|
| Closed (normal) | Requests flow normally. Errors are counted. |
| Open (tripped) | After error_threshold failures in window_seconds, all requests skip this provider. |
| Half-open (probing) | After recovery_timeout_seconds, allow half_open_max_requests to test recovery. |
| Closed (recovered) | If probe requests succeed, the provider is back in rotation. |
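The state machine in the table can be sketched in a few dozen lines. Parameter names mirror the config keys above; the internals (sliding error window, probe counting) are illustrative, not the gateway's actual implementation:

```python
import time

class CircuitBreaker:
    """Minimal closed -> open -> half-open breaker matching the table above."""

    def __init__(self, error_threshold=5, window_seconds=60,
                 recovery_timeout_seconds=30, half_open_max_requests=3):
        self.error_threshold = error_threshold
        self.window_seconds = window_seconds
        self.recovery_timeout_seconds = recovery_timeout_seconds
        self.half_open_max_requests = half_open_max_requests
        self.state = "closed"
        self.errors = []       # timestamps of recent failures
        self.opened_at = 0.0
        self.probes = 0

    def allow(self, now=None):
        """Should a request be sent to this provider right now?"""
        now = time.monotonic() if now is None else now
        if self.state == "open":
            if now - self.opened_at >= self.recovery_timeout_seconds:
                self.state, self.probes = "half_open", 0  # start probing
            else:
                return False
        if self.state == "half_open" and self.probes >= self.half_open_max_requests:
            return False
        return True

    def record(self, success, now=None):
        """Record a request outcome and update the breaker state."""
        now = time.monotonic() if now is None else now
        if self.state == "half_open":
            self.probes += 1
            if success:
                if self.probes >= self.half_open_max_requests:
                    self.state, self.errors = "closed", []  # recovered
            else:
                self.state, self.opened_at = "open", now    # re-trip
            return
        if success:
            return
        # Count failures inside the sliding window; trip on the threshold.
        self.errors = [t for t in self.errors if now - t <= self.window_seconds]
        self.errors.append(now)
        if len(self.errors) >= self.error_threshold:
            self.state, self.opened_at = "open", now
```

With the defaults above, five failures inside a minute trip the breaker, all traffic skips the provider for 30 seconds, and three successful probes close it again.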
Retry configuration
Configure retries for transient failures that don't warrant a full provider switch:
```yaml
provider_routing:
  strategy: ordered
  retry:
    max_retries: 2
    backoff_ms: 200
    retry_on:
      - timeout
      - rate_limit
      - server_error
```
Retries happen within the same provider before failing over to the next one. This handles brief rate-limit bursts without switching providers unnecessarily.
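Per-provider retry can be sketched as follows. The linear backoff shape is an assumption on our part (this page only specifies `backoff_ms`), as is the convention that the exception message names the error category:

```python
import time

def call_with_retry(call, max_retries=2, backoff_ms=200,
                    retry_on=("timeout", "rate_limit", "server_error")):
    """Retry the same provider up to max_retries times before giving up,
    at which point the gateway would fail over to the next target.
    `call` is assumed to raise an exception whose str() names the category."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as err:
            if str(err) not in retry_on or attempt == max_retries:
                raise  # exhausted retries or non-retryable -> fail over
            time.sleep(backoff_ms / 1000 * (attempt + 1))  # assumed linear backoff
```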
Latency-based routing
For user-facing applications where response time matters, route to the fastest provider:
```yaml
pack:
  name: multi-provider-resilience-providers-4
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-gpt4o
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: azure-gpt4o
      provider: azure-openai
      model: gpt-4o
      base_url: https://my-resource.openai.azure.com
      secret_key_ref:
        env: AZURE_OPENAI_KEY
    - id: groq-llama
      provider: groq
      model: llama-3.3-70b-versatile
      secret_key_ref:
        env: GROQ_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Keeptrusts tracks per-provider latency and automatically routes to the fastest option. Rankings refresh after `min_sample_count` successful responses within `window_seconds`.
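A minimal sketch of latency-based ranking, assuming a rolling average over `window_seconds` gated by `min_sample_count` (the gateway's actual scoring may differ, e.g. it could use percentiles):

```python
import time
from collections import defaultdict

class LatencyRouter:
    """Pick the target with the lowest average latency over a rolling window.
    A target is only ranked once it has min_sample_count recent samples;
    with no ranked targets we fall back to priority order. Illustrative only."""

    def __init__(self, targets, window_seconds=300, min_sample_count=10):
        self.targets = targets
        self.window_seconds = window_seconds
        self.min_sample_count = min_sample_count
        self.samples = defaultdict(list)  # target id -> [(timestamp, latency_ms)]

    def observe(self, target, latency_ms, now=None):
        """Record one successful response, dropping samples outside the window."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_seconds
        kept = [(t, l) for t, l in self.samples[target] if t >= cutoff]
        kept.append((now, latency_ms))
        self.samples[target] = kept

    def pick(self, now=None):
        """Return the fastest ranked target, or the first target if none qualify."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_seconds
        ranked = []
        for target in self.targets:
            recent = [l for t, l in self.samples[target] if t >= cutoff]
            if len(recent) >= self.min_sample_count:
                ranked.append((sum(recent) / len(recent), target))
        return min(ranked)[1] if ranked else self.targets[0]
```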
Model groups for workload isolation
Separate routing strategies for different workload types:
```yaml
providers:
  model_groups:
    - name: user-facing
      models:
        - provider: openai
          model: gpt-4o
        - provider: azure-openai
          model: gpt-4o
      routing: lowest_latency
    - name: batch-processing
      models:
        - provider: anthropic
          model: claude-sonnet-4-20250514
        - provider: openai
          model: gpt-4o-mini
      routing: cost_optimized
    - name: embeddings
      models:
        - provider: openai
          model: text-embedding-3-large
        - provider: voyage
          model: voyage-3
      routing: round_robin
```
Applications target the group name instead of a specific provider. Each group has its own routing strategy and failover chain.
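Assuming the gateway exposes an OpenAI-compatible chat endpoint (an assumption; this page doesn't specify the wire format), targeting a group is just a matter of putting the group name where a model name would normally go:

```python
import json

def group_chat_payload(group_name, messages):
    """Build an OpenAI-style chat payload where `model` is a Keeptrusts
    model-group name ("user-facing") instead of a concrete provider model.
    Hypothetical helper; confirm the payload shape against your gateway."""
    return json.dumps({"model": group_name, "messages": messages})

payload = group_chat_payload("user-facing",
                             [{"role": "user", "content": "Hello"}])
```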
Context compression for cross-provider failover
When failing over to a provider with lower context limits, Keeptrusts can automatically compress conversation history:
```yaml
provider_routing:
  strategy: ordered
  context_compression:
    enabled: true
    strategy: summarize_oldest
    target_ratio: 0.6
```
This ensures long conversations don't fail when failing over from a 128K-context provider to a 32K-context provider.
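One plausible reading of `summarize_oldest`, sketched with character counts standing in for tokens; the eviction order and the `summarize` callable (e.g. a cheap model call) are assumptions, not the gateway's documented behavior:

```python
def compress_history(messages, target_ratio=0.6, summarize=None):
    """Replace the oldest messages with a single summary until the history
    fits target_ratio of its original size. `messages` is a list of
    {"role": ..., "content": ...} dicts; a leading system prompt is kept."""
    if summarize is None:
        summarize = lambda msgs: "Summary of %d earlier messages." % len(msgs)
    budget = target_ratio * sum(len(m["content"]) for m in messages)
    kept = list(messages)
    dropped = []
    # Pop the oldest message (after any leading system prompt) until we fit.
    while sum(len(m["content"]) for m in kept) > budget and len(kept) > 1:
        start = 1 if kept[0]["role"] == "system" else 0
        dropped.append(kept.pop(start))
    if dropped:
        summary = {"role": "system", "content": summarize(dropped)}
        insert_at = 1 if kept and kept[0]["role"] == "system" else 0
        kept.insert(insert_at, summary)
    return kept
```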
Monitoring provider health
Gateway metrics endpoint
```bash
curl http://localhost:8080/keeptrusts/providers/metrics | jq .
```
Returns per-provider metrics including:
- Request count and success rate
- Average and p95 latency
- Current circuit breaker state
- Error counts by category
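A small health check can be built on that endpoint. The response shape in the comment below is an assumption (this page only lists the metric categories), so adjust the keys to match your gateway's actual payload:

```python
import json

def unhealthy_providers(metrics_json):
    """Return the ids of providers whose circuit breaker is not closed.
    Assumed payload shape:
      {"providers": [{"id": ..., "circuit_breaker": {"state": ...}}, ...]}"""
    data = json.loads(metrics_json)
    return [p["id"] for p in data.get("providers", [])
            if p.get("circuit_breaker", {}).get("state") != "closed"]
```

Wiring this into a cron job or alerting pipeline lets you catch an open breaker before users notice degraded responses.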
Console visibility
The Gateways page in the console shows:
- Active provider targets and their health status
- Current routing strategy
- Recent failover events
- Error rate trends
Production-ready configuration
A full resilience configuration combining all features:
```yaml
pack:
  name: resilient-gateway
  version: '1.0'
provider_routing:
  strategy: lowest_latency
  lowest_latency:
    window_seconds: 300
    min_sample_count: 10
  circuit_breaker:
    error_threshold: 5
    window_seconds: 60
    recovery_timeout_seconds: 30
    half_open_max_requests: 3
  retry:
    max_retries: 2
    backoff_ms: 200
    retry_on:
      - timeout
      - rate_limit
      - server_error
  context_compression:
    enabled: true
    strategy: summarize_oldest
    target_ratio: 0.6
providers:
  targets:
    - id: openai-gpt4o
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: azure-gpt4o
      provider: azure-openai
      model: gpt-4o
      base_url: https://my-resource.openai.azure.com
      secret_key_ref:
        env: AZURE_OPENAI_KEY
    - id: anthropic-sonnet
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      retention_days: 90
```
Quick wins
- Add a second provider target — instant failover protection with `strategy: ordered`
- Enable circuit breakers — prevent cascading failures from a single provider outage
- Monitor `/keeptrusts/providers/metrics` — know your providers' health before users notice
- Use `lowest_latency` routing for user-facing workloads — automatic performance optimization
- Enable context compression — prevent failover failures from context-length mismatches
For AI systems
- Canonical terms: provider_routing, ordered strategy, circuit_breaker, failover, latency-based routing, health monitoring, retry.
- Config keys: `provider_routing.strategy`, `provider_routing.circuit_breaker`, `providers.targets[].id`, `providers.targets[].provider`.
- Failover triggers: HTTP 429, HTTP 5xx, timeout, content filter block, zero-token completion.
- Circuit breaker states: closed → open → half-open → closed.
- Best next pages: Reduce AI Spend, Provider Routing, Gateways & Actions.
For engineers
- Prerequisites: API keys for at least two providers (e.g., OpenAI + Anthropic or Azure OpenAI).
- Define multiple `providers.targets` in priority order with `provider_routing.strategy: ordered`.
- Add `circuit_breaker` config to remove failing providers from rotation automatically.
- Validate: kill or rate-limit the primary provider and confirm requests route to the fallback within seconds.
- Monitor: check the console for circuit-breaker state transitions and provider health metrics.
For leaders
- Single-provider dependency means a provider outage takes your AI features offline; multi-provider routing eliminates this.
- Circuit breakers prevent cascading failures: a degraded provider is automatically removed, not retried.
- Sub-second failover is invisible to end users — no error states, no manual intervention required.
- Multi-provider resilience is a prerequisite for SLA commitments on AI-powered features.
Next steps
- Provider Routing — all 11 routing strategies explained
- Circuit Breaker and Retry — advanced resilience configuration
- Model Groups — workload isolation patterns
- Reduce AI Spend — combine resilience with cost optimization
- Centralize AI Observability — monitor health across all providers