Achieve 99.9% Uptime with Multi-Provider Routing
A single LLM provider is a single point of failure. Rate limits, outages, and degraded performance can take your AI features offline without warning. Keeptrusts eliminates this risk by routing across multiple providers with automatic failover, circuit breakers, and real-time health monitoring.
Use this page when
- You need to eliminate single-provider failure risk for production AI workloads.
- You are configuring failover, circuit breakers, or latency-based routing across multiple LLM providers.
- You want to understand how the gateway detects provider degradation and automatically reroutes traffic.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
What you'll achieve
- Automatic failover across multiple providers with zero application changes
- Circuit breaker protection that removes unhealthy providers from rotation before they cause cascading failures
- Sub-second failover with pre-warmed connections and configurable retry logic
- Latency-based routing that sends requests to the fastest available provider
- Real-time provider health metrics visible in the console
Basic multi-provider failover
The simplest resilience configuration defines multiple provider targets in priority order:
```yaml
pack:
  name: multi-provider-resilience-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: primary-openai
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: fallback-azure
      provider: azure-openai
      model: gpt-4o
      base_url: https://my-resource.openai.azure.com
      secret_key_ref:
        env: AZURE_OPENAI_KEY
    - id: fallback-anthropic
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
The gateway tries `primary-openai` first. If it fails (rate limit, timeout, 5xx error), the request automatically routes to `fallback-azure`, then `fallback-anthropic`.
Failover triggers:
- HTTP 429 (rate limit)
- HTTP 5xx (server error)
- Request timeout
- Content filter blocks
- Zero-token completions (empty responses)
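The ordered strategy can be pictured as a simple loop over the targets list. This sketch uses hypothetical names (`ProviderError`, `call_provider`) to illustrate the behavior, not the gateway's actual internals:

```python
# Sketch of ordered failover: try each target in priority order and move
# on to the next target when a request fails with a retryable error.
# The error categories mirror the failover triggers listed above.

RETRYABLE = {"rate_limit", "server_error", "timeout",
             "content_filter", "empty_completion"}

class ProviderError(Exception):
    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason

def route_ordered(targets, request, call_provider):
    """Try targets in order; fail over on any retryable error."""
    last_err = None
    for target in targets:
        try:
            return call_provider(target, request)
        except ProviderError as err:
            if err.reason not in RETRYABLE:
                raise  # non-retryable (e.g. malformed request) surfaces immediately
            last_err = err  # retryable: fall through to the next target
    raise last_err  # every target failed
```

Only retryable categories trigger failover; a genuinely bad request would fail the same way on every provider, so it is surfaced to the caller instead.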
Circuit breakers
Circuit breakers prevent a failing provider from receiving traffic until it recovers. Without them, every request to a down provider wastes time on a doomed attempt.
```yaml
provider_routing:
  strategy: ordered
  circuit_breaker:
    error_threshold: 5
    window_seconds: 60
    recovery_timeout_seconds: 30
    half_open_max_requests: 3
```
How it works:
| State | Behavior |
|---|---|
| Closed (normal) | Requests flow normally. Errors are counted. |
| Open (tripped) | After error_threshold failures in window_seconds, all requests skip this provider. |
| Half-open (probing) | After recovery_timeout_seconds, allow half_open_max_requests to test recovery. |
| Closed (recovered) | If probe requests succeed, the provider is back in rotation. |
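The state machine in the table can be sketched in a few dozen lines. Parameter names mirror the config keys above; the internals (sliding error window, probe counting) are illustrative, not the gateway's actual implementation:

```python
import time

class CircuitBreaker:
    """Minimal closed -> open -> half-open breaker matching the table above."""

    def __init__(self, error_threshold=5, window_seconds=60,
                 recovery_timeout_seconds=30, half_open_max_requests=3):
        self.error_threshold = error_threshold
        self.window_seconds = window_seconds
        self.recovery_timeout_seconds = recovery_timeout_seconds
        self.half_open_max_requests = half_open_max_requests
        self.state = "closed"
        self.errors = []       # timestamps of recent failures
        self.opened_at = 0.0
        self.probes = 0

    def allow(self, now=None):
        """Should a request be sent to this provider right now?"""
        now = time.monotonic() if now is None else now
        if self.state == "open":
            if now - self.opened_at >= self.recovery_timeout_seconds:
                self.state, self.probes = "half_open", 0  # start probing
            else:
                return False
        if self.state == "half_open" and self.probes >= self.half_open_max_requests:
            return False
        return True

    def record(self, success, now=None):
        """Record a request outcome and update the breaker state."""
        now = time.monotonic() if now is None else now
        if self.state == "half_open":
            self.probes += 1
            if success:
                if self.probes >= self.half_open_max_requests:
                    self.state, self.errors = "closed", []  # recovered
            else:
                self.state, self.opened_at = "open", now    # re-trip
            return
        if success:
            return
        # Count failures inside the sliding window; trip on the threshold.
        self.errors = [t for t in self.errors if now - t <= self.window_seconds]
        self.errors.append(now)
        if len(self.errors) >= self.error_threshold:
            self.state, self.opened_at = "open", now
```

With the defaults above, five failures inside a minute trip the breaker, all traffic skips the provider for 30 seconds, and three successful probes close it again.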
Retry configuration
Configure retries for transient failures that don't warrant a full provider switch:
```yaml
provider_routing:
  strategy: ordered
  retry:
    max_retries: 2
    backoff_ms: 200
    retry_on:
      - timeout
      - rate_limit
      - server_error
```
Retries happen within the same provider before failing over to the next one. This handles brief rate-limit bursts without switching providers unnecessarily.
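Per-provider retry can be sketched as follows. The linear backoff shape is an assumption on our part (this page only specifies `backoff_ms`), as is the convention that the exception message names the error category:

```python
import time

def call_with_retry(call, max_retries=2, backoff_ms=200,
                    retry_on=("timeout", "rate_limit", "server_error")):
    """Retry the same provider up to max_retries times before giving up,
    at which point the gateway would fail over to the next target.
    `call` is assumed to raise an exception whose str() names the category."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as err:
            if str(err) not in retry_on or attempt == max_retries:
                raise  # exhausted retries or non-retryable -> fail over
            time.sleep(backoff_ms / 1000 * (attempt + 1))  # assumed linear backoff
```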
Latency-based routing
For user-facing applications where response time matters, route to the fastest provider:
```yaml
pack:
  name: multi-provider-resilience-providers-4
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-gpt4o
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: azure-gpt4o
      provider: azure-openai
      model: gpt-4o
      base_url: https://my-resource.openai.azure.com
      secret_key_ref:
        env: AZURE_OPENAI_KEY
    - id: groq-llama
      provider: groq
      model: llama-3.3-70b-versatile
      secret_key_ref:
        env: GROQ_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Keeptrusts tracks per-provider latency and automatically routes to the fastest option. Rankings refresh after `min_sample_count` successful responses within `window_seconds`.
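A minimal sketch of latency-based ranking, assuming a rolling average over `window_seconds` gated by `min_sample_count` (the gateway's actual scoring may differ, e.g. it could use percentiles):

```python
import time
from collections import defaultdict

class LatencyRouter:
    """Pick the target with the lowest average latency over a rolling window.
    A target is only ranked once it has min_sample_count recent samples;
    with no ranked targets we fall back to priority order. Illustrative only."""

    def __init__(self, targets, window_seconds=300, min_sample_count=10):
        self.targets = targets
        self.window_seconds = window_seconds
        self.min_sample_count = min_sample_count
        self.samples = defaultdict(list)  # target id -> [(timestamp, latency_ms)]

    def observe(self, target, latency_ms, now=None):
        """Record one successful response, dropping samples outside the window."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_seconds
        kept = [(t, l) for t, l in self.samples[target] if t >= cutoff]
        kept.append((now, latency_ms))
        self.samples[target] = kept

    def pick(self, now=None):
        """Return the fastest ranked target, or the first target if none qualify."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_seconds
        ranked = []
        for target in self.targets:
            recent = [l for t, l in self.samples[target] if t >= cutoff]
            if len(recent) >= self.min_sample_count:
                ranked.append((sum(recent) / len(recent), target))
        return min(ranked)[1] if ranked else self.targets[0]
```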
Model groups for workload isolation
Separate routing strategies for different workload types:
```yaml
providers:
  model_groups:
    - name: user-facing
      models:
        - provider: openai
          model: gpt-4o
        - provider: azure-openai
          model: gpt-4o
      routing: lowest_latency
    - name: batch-processing
      models:
        - provider: anthropic
          model: claude-sonnet-4-20250514
        - provider: openai
          model: gpt-4o-mini
      routing: cost_optimized
    - name: embeddings
      models:
        - provider: openai
          model: text-embedding-3-large
        - provider: voyage
          model: voyage-3
      routing: round_robin
```
Applications target the group name instead of a specific provider. Each group has its own routing strategy and failover chain.
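Assuming the gateway exposes an OpenAI-compatible chat endpoint (an assumption; this page doesn't specify the wire format), targeting a group is just a matter of putting the group name where a model name would normally go:

```python
import json

def group_chat_payload(group_name, messages):
    """Build an OpenAI-style chat payload where `model` is a Keeptrusts
    model-group name ("user-facing") instead of a concrete provider model.
    Hypothetical helper; confirm the payload shape against your gateway."""
    return json.dumps({"model": group_name, "messages": messages})

payload = group_chat_payload("user-facing",
                             [{"role": "user", "content": "Hello"}])
```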
Context compression for cross-provider failover
When failing over to a provider with lower context limits, Keeptrusts can automatically compress conversation history:
```yaml
provider_routing:
  strategy: ordered
  context_compression:
    enabled: true
    strategy: summarize_oldest
    target_ratio: 0.6
```
This ensures long conversations don't fail when failing over from a 128K-context provider to a 32K-context provider.
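One plausible reading of `summarize_oldest`, sketched with character counts standing in for tokens; the eviction order and the `summarize` callable (e.g. a cheap model call) are assumptions, not the gateway's documented behavior:

```python
def compress_history(messages, target_ratio=0.6, summarize=None):
    """Replace the oldest messages with a single summary until the history
    fits target_ratio of its original size. `messages` is a list of
    {"role": ..., "content": ...} dicts; a leading system prompt is kept."""
    if summarize is None:
        summarize = lambda msgs: "Summary of %d earlier messages." % len(msgs)
    budget = target_ratio * sum(len(m["content"]) for m in messages)
    kept = list(messages)
    dropped = []
    # Pop the oldest message (after any leading system prompt) until we fit.
    while sum(len(m["content"]) for m in kept) > budget and len(kept) > 1:
        start = 1 if kept[0]["role"] == "system" else 0
        dropped.append(kept.pop(start))
    if dropped:
        summary = {"role": "system", "content": summarize(dropped)}
        insert_at = 1 if kept and kept[0]["role"] == "system" else 0
        kept.insert(insert_at, summary)
    return kept
```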
Monitoring provider health
Gateway metrics endpoint
```bash
curl http://localhost:8080/keeptrusts/providers/metrics | jq .
```
Returns per-provider metrics including:
- Request count and success rate
- Average and p95 latency
- Current circuit breaker state
- Error counts by category
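A small health check can be built on that endpoint. The response shape in the comment below is an assumption (this page only lists the metric categories), so adjust the keys to match your gateway's actual payload:

```python
import json

def unhealthy_providers(metrics_json):
    """Return the ids of providers whose circuit breaker is not closed.
    Assumed payload shape:
      {"providers": [{"id": ..., "circuit_breaker": {"state": ...}}, ...]}"""
    data = json.loads(metrics_json)
    return [p["id"] for p in data.get("providers", [])
            if p.get("circuit_breaker", {}).get("state") != "closed"]
```

Wiring this into a cron job or alerting pipeline lets you catch an open breaker before users notice degraded responses.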
Console visibility
The Gateways page in the console shows:
- Active provider targets and their health status
- Current routing strategy
- Recent failover events
- Error rate trends
Production-ready configuration
A full resilience configuration combining all features:
```yaml
pack:
  name: resilient-gateway
  version: '1.0'
provider_routing:
  strategy: lowest_latency
  lowest_latency:
    window_seconds: 300
    min_sample_count: 10
  circuit_breaker:
    error_threshold: 5
    window_seconds: 60
    recovery_timeout_seconds: 30
    half_open_max_requests: 3
  retry:
    max_retries: 2
    backoff_ms: 200
    retry_on:
      - timeout
      - rate_limit
      - server_error
  context_compression:
    enabled: true
    strategy: summarize_oldest
    target_ratio: 0.6
providers:
  targets:
    - id: openai-gpt4o
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: azure-gpt4o
      provider: azure-openai
      model: gpt-4o
      base_url: https://my-resource.openai.azure.com
      secret_key_ref:
        env: AZURE_OPENAI_KEY
    - id: anthropic-sonnet
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      retention_days: 90
```
Quick wins
- Add a second provider target — instant failover protection with `strategy: ordered`
- Enable circuit breakers — prevent cascading failures from a single provider outage
- Monitor `/keeptrusts/providers/metrics` — know your providers' health before users notice
- Use `lowest_latency` routing for user-facing workloads — automatic performance optimization
- Enable context compression — prevent failover failures from context-length mismatches
For AI systems
- Canonical terms: provider_routing, ordered strategy, circuit_breaker, failover, latency-based routing, health monitoring, retry.
- Config keys: `provider_routing.strategy`, `provider_routing.circuit_breaker`, `providers.targets[].id`, `providers.targets[].provider`.
- Failover triggers: HTTP 429, HTTP 5xx, timeout, content filter block, zero-token completion.
- Circuit breaker states: closed → open → half-open → closed.
- Best next pages: Reduce AI Spend, Provider Routing, Gateways & Actions.
For engineers
- Prerequisites: API keys for at least two providers (e.g., OpenAI + Anthropic or Azure OpenAI).
- Define multiple `providers.targets` in priority order with `provider_routing.strategy: ordered`.
- Add `circuit_breaker` config to remove failing providers from rotation automatically.
- Validate: kill or rate-limit the primary provider and confirm requests route to the fallback within seconds.
- Monitor: check the console for circuit-breaker state transitions and provider health metrics.
For leaders
- Single-provider dependency means a provider outage takes your AI features offline; multi-provider routing eliminates this.
- Circuit breakers prevent cascading failures: a degraded provider is automatically removed, not retried.
- Sub-second failover is invisible to end users — no error states, no manual intervention required.
- Multi-provider resilience is a prerequisite for SLA commitments on AI-powered features.
Next steps
- Provider Routing — all 11 routing strategies explained
- Circuit Breaker and Retry — advanced resilience configuration
- Model Groups — workload isolation patterns
- Reduce AI Spend — combine resilience with cost optimization
- Centralize AI Observability — monitor health across all providers