# Resilience Engineering for AI Services
LLM providers experience rate limits, outages, and latency spikes. This guide covers resilience patterns you can implement through the Keeptrusts gateway and at the application layer to keep your AI-powered services available.
## Use this page when
- You are configuring multi-provider failover chains in the gateway
- You need retry strategies with exponential backoff and jitter for LLM requests
- You are implementing circuit breakers to isolate provider failures
- You want graceful degradation patterns when all providers are unavailable
## Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
## Provider Failover

### Multi-Provider Configuration
Configure multiple providers with failover priority (`priority: 1` is tried first):

```yaml
pack:
  name: resilience-patterns-providers-1
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: openai-primary
        priority: 1
        provider:
          base_url: https://api.openai.com/v1
          secret_key_ref:
            env: OPENAI_API_KEY
      - id: azure-openai-fallback
        priority: 2
        provider:
          base_url: https://myorg.openai.azure.com/openai/deployments
          secret_key_ref:
            env: AZURE_OPENAI_KEY
      - id: anthropic-fallback
        priority: 3
        provider:
          base_url: https://api.anthropic.com/v1
          secret_key_ref:
            env: ANTHROPIC_API_KEY
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true
```
### Failover Flow
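In outline, the gateway tries each target in priority order and moves to the next only when the failure is retryable. A minimal application-level sketch of that flow, where `call_provider` and `ProviderError` are illustrative stand-ins rather than gateway APIs:

```python
# Sketch of the failover flow: try each target in priority order and
# move on only when the failure is retryable. `call_provider` and
# `ProviderError` are illustrative stand-ins, not gateway APIs.
RETRYABLE = {429, 500, 502, 503}

class ProviderError(Exception):
    def __init__(self, status: int):
        super().__init__(f"provider returned {status}")
        self.status = status

def call_with_failover(targets, request, call_provider):
    """Try each provider target in order; re-raise non-retryable errors."""
    last_error = None
    for target in targets:
        try:
            return call_provider(target, request)
        except ProviderError as e:
            if e.status not in RETRYABLE:
                raise  # client errors must not trigger failover
            last_error = e  # retryable: fall through to the next target
    if last_error is None:
        raise RuntimeError("no provider targets configured")
    raise last_error  # every target failed
```

Note that non-retryable statuses (for example a `409` policy block) propagate immediately rather than triggering failover.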
### Model Mapping for Failover

When failing over between providers, map equivalent models:
```yaml
model_mapping:
  gpt-4o:
    openai-primary: gpt-4o
    azure-openai-fallback: gpt-4o
    anthropic-fallback: claude-sonnet-4-20250514
  gpt-4o-mini:
    openai-primary: gpt-4o-mini
    azure-openai-fallback: gpt-4o-mini
    anthropic-fallback: claude-haiku-4-20250414
```
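In application terms the mapping is a two-level lookup: requested model, then active failover target, to the provider-specific model name. A sketch mirroring the YAML above (the `resolve_model` helper is illustrative, not a gateway API):

```python
# Mirror of the model_mapping YAML: requested model -> failover target
# -> provider-specific model name. `resolve_model` is illustrative.
MODEL_MAPPING = {
    "gpt-4o": {
        "openai-primary": "gpt-4o",
        "azure-openai-fallback": "gpt-4o",
        "anthropic-fallback": "claude-sonnet-4-20250514",
    },
    "gpt-4o-mini": {
        "openai-primary": "gpt-4o-mini",
        "azure-openai-fallback": "gpt-4o-mini",
        "anthropic-fallback": "claude-haiku-4-20250414",
    },
}

def resolve_model(requested: str, target: str) -> str:
    """Return the equivalent model for a target, or fail loudly if unmapped."""
    try:
        return MODEL_MAPPING[requested][target]
    except KeyError:
        raise ValueError(f"no mapping for {requested!r} on target {target!r}")
```

Failing loudly on an unmapped pair is deliberate: silently sending an unknown model name to a fallback provider would turn a failover into a different error.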
## Retry Strategies

### Exponential Backoff with Jitter

Never retry on a fixed interval; use exponential backoff with jitter so that synchronized clients do not create a thundering herd:
```yaml
gateway:
  retry:
    max_attempts: 3
    initial_delay: 500ms
    max_delay: 10s
    backoff_multiplier: 2.0
    jitter: true
    retryable_codes: [429, 500, 502, 503]
```
### Retry Timing Visualization
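With the settings above (500 ms initial delay, 2.0 multiplier, 10 s cap), the base delay before each retry is deterministic and jitter adds a random amount on top. A small sketch of the timing (helper names are illustrative):

```python
import random

def retry_delays(initial=0.5, multiplier=2.0, max_delay=10.0, attempts=3):
    """Base delays in seconds before each retry, mirroring gateway.retry."""
    return [min(initial * multiplier ** i, max_delay) for i in range(attempts)]

def with_jitter(delay: float) -> float:
    """Add up to 50% random jitter so synchronized clients spread out."""
    return delay + random.uniform(0, delay * 0.5)
```

`retry_delays()` returns `[0.5, 1.0, 2.0]` for the default three attempts; the 10 s cap only takes effect from the sixth attempt onward.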
### Application-Level Retry

For fine-grained control, implement retries in your application:
```python
import httpx
import random
import asyncio


async def call_with_retry(
    messages: list,
    max_attempts: int = 3,
    base_delay: float = 0.5,
):
    for attempt in range(max_attempts):
        try:
            async with httpx.AsyncClient() as client:
                response = await client.post(
                    "http://localhost:41002/v1/chat/completions",
                    json={"model": "gpt-4o", "messages": messages},
                    timeout=120.0,
                )
                response.raise_for_status()
                return response.json()
        except httpx.HTTPStatusError as e:
            if e.response.status_code not in (429, 500, 502, 503):
                raise  # Non-retryable error
            if attempt == max_attempts - 1:
                raise  # Final attempt failed
            delay = base_delay * (2 ** attempt)
            jitter = random.uniform(0, delay * 0.5)
            await asyncio.sleep(delay + jitter)
```
## Circuit Breaker Patterns

### Three-State Circuit Breaker
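A circuit breaker moves between three states: closed (traffic flows normally), open (requests fail fast after too many consecutive failures), and half-open (one probe is allowed after a cooldown). A minimal sketch of the state machine, with an assumed 5-failure threshold and 30 s cooldown; this is not the gateway's internal implementation:

```python
import time

class CircuitBreaker:
    """Minimal three-state circuit breaker: closed -> open -> half-open."""

    def __init__(self, failure_threshold: int = 5, cooldown: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.cooldown:
                self.state = "half-open"  # let one probe through
                return True
            return False  # fail fast while cooling down
        return True  # closed or half-open

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"  # probe succeeded, resume normal traffic

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = time.monotonic()
```

A failed probe in the half-open state reopens the breaker immediately, which matches the close-on-successful-probe behavior described under "For engineers" below.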
### Per-Provider Circuit Breakers

Each provider gets an independent circuit breaker:
```yaml
pack:
  name: resilience-patterns-providers-4
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: openai
        provider:
          base_url: https://api.openai.com/v1
          secret_key_ref:
            env: OPENAI_API_KEY
      - id: anthropic
        provider:
          base_url: https://api.anthropic.com/v1
          secret_key_ref:
            env: ANTHROPIC_API_KEY
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true
```
## Bulkheading

Isolate provider connections so a misbehaving provider cannot exhaust resources for others:
```yaml
pack:
  name: resilience-patterns-providers-5
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: openai
        provider:
          base_url: https://api.openai.com/v1
          secret_key_ref:
            env: OPENAI_API_KEY
      - id: anthropic
        provider:
          base_url: https://api.anthropic.com/v1
          secret_key_ref:
            env: ANTHROPIC_API_KEY
      - id: local-llm
        provider:
          base_url: http://localhost:8080/v1
          secret_key_ref:
            env: LOCAL_KEY
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true
```
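The same isolation can be applied in application code with per-provider semaphores, so a hanging provider can exhaust at most its own slots rather than the whole connection pool. A sketch with hypothetical concurrency limits:

```python
import asyncio

# Per-provider concurrency caps (hypothetical limits): a slow provider
# blocks at most its own slots, never the whole pool.
BULKHEADS = {
    "openai": asyncio.Semaphore(50),
    "anthropic": asyncio.Semaphore(50),
    "local-llm": asyncio.Semaphore(10),
}

async def call_bulkheaded(provider_id: str, call):
    """Run `call` inside the named provider's bulkhead."""
    async with BULKHEADS[provider_id]:
        return await call()
```

Requires Python 3.10+ so that semaphores created at import time bind to the running event loop lazily.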
## Graceful Degradation

### When the Gateway Is Down

Design your application to handle gateway unavailability:
```typescript
// Minimal message shape for this example
type Message = { role: string; content: string };

async function getCompletion(messages: Message[]): Promise<string> {
  try {
    // Primary path: through the governed gateway
    const response = await fetch('http://kt-gateway:41002/v1/chat/completions', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: 'gpt-4o', messages }),
      signal: AbortSignal.timeout(5000),
    });
    if (!response.ok) {
      throw new Error(`Gateway returned ${response.status}`);
    }
    return (await response.json()).choices[0].message.content;
  } catch {
    // Degraded path: return a safe fallback
    console.error('[AI] Gateway unreachable, returning fallback');
    return 'I am temporarily unable to process your request. Please try again shortly.';
  }
}
```
### Degradation Tiers
| Tier | Condition | Behavior |
|---|---|---|
| Full | Gateway + provider healthy | Normal AI responses with full governance |
| Reduced | Primary provider down | Failover to secondary provider, governance intact |
| Minimal | Gateway overloaded | Shed non-critical requests, prioritize critical paths |
| Offline | Gateway unreachable | Static fallback responses, queue requests for replay |
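The table above can be encoded as a small decision function over health signals (the signal names here are illustrative, not a gateway API):

```python
def degradation_tier(gateway_up: bool, primary_up: bool, overloaded: bool) -> str:
    """Map health signals to the degradation tiers in the table (illustrative)."""
    if not gateway_up:
        return "offline"   # static fallbacks, queue for replay
    if overloaded:
        return "minimal"   # shed non-critical requests
    if not primary_up:
        return "reduced"   # failover provider, governance intact
    return "full"          # normal AI responses
```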
### Request Priority and Shedding

```yaml
gateway:
  load_shedding:
    # Start shedding when concurrent requests exceed this
    max_concurrent: 500
    # Priority header for request classification
    priority_header: X-Request-Priority
    # Shed low-priority first
    shed_order: [low, medium, high, critical]
```

```shell
# Critical requests are the last to be shed
curl -H "X-Request-Priority: critical" \
  http://kt-gateway:41002/v1/chat/completions \
  -d '{"model":"gpt-4o","messages":[...]}'
```
## Health Monitoring

### Gateway Health Check

```shell
# Check gateway health
kt health

# Detailed status including provider circuit breaker states
kt health --verbose
```
### Provider Health Dashboard

Monitor provider health via the console dashboard:

```shell
# Tail real-time events to see failures
kt events tail --filter "status=error"

# Check event counts by provider
kt events stats --group-by provider --last 1h
```
## Resilience Testing

### Chaos Engineering with the Gateway

Test resilience by simulating failures:

```shell
# Simulate provider timeout
kt gateway run --test-mode \
  --inject-fault openai:timeout:5s

# Simulate rate limiting
kt gateway run --test-mode \
  --inject-fault openai:rate-limit:80%

# Simulate intermittent errors
kt gateway run --test-mode \
  --inject-fault anthropic:error:503:30%
```
## Next steps
- Performance Engineering the AI Gateway — optimize throughput
- Observability for AI-Governed Systems — monitor resilience metrics
- Capacity Planning for AI Workloads — size for failure scenarios
## For AI systems

- Canonical terms: provider failover, `providers[].priority`, `model_mapping`, `gateway.retry`, exponential backoff with jitter, circuit breaker, bulkhead, `retryable_codes: [429, 500, 502, 503]`, graceful degradation
- Key configuration: `providers[].priority` (1 = primary, 2 = fallback), `gateway.retry.max_attempts: 3`, `gateway.retry.initial_delay: 500ms`, `gateway.retry.backoff_multiplier: 2.0`
- Best next pages: Performance Engineering, Capacity Planning, Architecture Patterns
## For engineers

- Configure failover: assign `priority: 1` (primary), `priority: 2` (secondary), `priority: 3` (tertiary) across providers
- Model mapping: map `gpt-4o` to equivalent models across providers (e.g., `claude-sonnet-4-20250514` for Anthropic)
- Retry only transient, retryable failures: `[429, 500, 502, 503]`; never retry `409` policy blocks or other `4xx` client errors
- Circuit breaker: open after N consecutive failures, half-open after cooldown, close on a successful probe
- Bulkhead: isolate traffic per consumer group so one team's failure doesn't cascade to others
## For leaders

- Multi-provider failover removes single-vendor dependency, so provider outages become transparent to applications
- Retry and circuit breaker patterns reduce user-visible errors during provider degradation without manual intervention
- Cost implication: failover traffic may route to more expensive providers, so monitor cost trends during and after incidents