
Resilience Engineering for AI Services

LLM providers experience rate limits, outages, and latency spikes. This guide covers resilience patterns you can implement through the Keeptrusts gateway and at the application layer to keep your AI-powered services available.

Use this page when

  • You are configuring multi-provider failover chains in the gateway
  • You need retry strategies with exponential backoff and jitter for LLM requests
  • You are implementing circuit breakers to isolate provider failures
  • You want graceful degradation patterns when all providers are unavailable

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Provider Failover

Multi-Provider Configuration

Configure multiple providers with failover priority:

pack:
  name: resilience-patterns-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-primary
      priority: 1
      provider:
        base_url: https://api.openai.com/v1
        secret_key_ref:
          env: OPENAI_API_KEY
    - id: azure-openai-fallback
      priority: 2
      provider:
        base_url: https://myorg.openai.azure.com/openai/deployments
        secret_key_ref:
          env: AZURE_OPENAI_KEY
    - id: anthropic-fallback
      priority: 3
      provider:
        base_url: https://api.anthropic.com/v1
        secret_key_ref:
          env: ANTHROPIC_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
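Conceptually, the gateway walks the target list in ascending priority until one provider succeeds. The sketch below illustrates that order at the application level; the provider ids match the configuration above, but the injectable `send` callable is illustrative, not a Keeptrusts API.

```python
# Try each configured target in ascending priority, moving to the next
# target when a request fails. `send(target, payload)` raises on failure.
PROVIDERS = [
    {"id": "openai-primary", "priority": 1},
    {"id": "azure-openai-fallback", "priority": 2},
    {"id": "anthropic-fallback", "priority": 3},
]

def call_with_failover(payload: dict, send) -> dict:
    last_error = None
    for target in sorted(PROVIDERS, key=lambda t: t["priority"]):
        try:
            return send(target, payload)
        except Exception as exc:
            last_error = exc  # remember the failure, fall through to next target
    raise RuntimeError("all providers failed") from last_error
```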

Failover Flow

When the primary target fails with a retryable error or times out, the gateway replays the request against the next target in priority order, continuing down the chain until a provider responds successfully or the chain is exhausted.

Model Mapping for Failover

When failing over between providers, map equivalent models:

model_mapping:
  gpt-4o:
    openai-primary: gpt-4o
    azure-openai-fallback: gpt-4o
    anthropic-fallback: claude-sonnet-4-20250514

  gpt-4o-mini:
    openai-primary: gpt-4o-mini
    azure-openai-fallback: gpt-4o-mini
    anthropic-fallback: claude-haiku-4-20250414
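A mapping like the one above can be resolved with a simple nested lookup at failover time. The function name and the default-to-requested behavior below are illustrative, not gateway internals.

```python
# Resolve the provider-equivalent model for a requested model name,
# falling back to the requested name when no mapping entry exists.
MODEL_MAPPING = {
    "gpt-4o": {
        "openai-primary": "gpt-4o",
        "azure-openai-fallback": "gpt-4o",
        "anthropic-fallback": "claude-sonnet-4-20250514",
    },
}

def resolve_model(requested: str, provider_id: str) -> str:
    return MODEL_MAPPING.get(requested, {}).get(provider_id, requested)
```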

Retry Strategies

Exponential Backoff with Jitter

Never retry with fixed intervals — use exponential backoff with jitter to prevent thundering herd:

gateway:
  retry:
    max_attempts: 3
    initial_delay: 500ms
    max_delay: 10s
    backoff_multiplier: 2.0
    jitter: true
    retryable_codes: [429, 500, 502, 503]

Retry Timing Visualization

With the configuration above, the first retry waits roughly 500 ms and the second roughly 1 s, each plus random jitter; the delay doubles with every attempt and is capped at 10 s.
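The delay schedule implied by those settings (500 ms initial delay, 2.0 multiplier, 10 s cap) can be computed directly. This is a sketch of the math, not gateway code; the 50% jitter fraction mirrors the application-level example later in this guide.

```python
import random

def backoff_delay(attempt: int, initial: float = 0.5, multiplier: float = 2.0,
                  max_delay: float = 10.0, jitter: bool = True) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    delay = min(initial * multiplier ** attempt, max_delay)
    if jitter:
        # Up to 50% random jitter spreads out synchronized clients.
        delay += random.uniform(0, delay * 0.5)
    return delay
```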

Application-Level Retry

For fine-grained control, implement retries in your application:

import asyncio
import random

import httpx

async def call_with_retry(
    messages: list,
    max_attempts: int = 3,
    base_delay: float = 0.5,
):
    for attempt in range(max_attempts):
        try:
            async with httpx.AsyncClient() as client:
                response = await client.post(
                    "http://localhost:41002/v1/chat/completions",
                    json={"model": "gpt-4o", "messages": messages},
                    timeout=120.0,
                )
                response.raise_for_status()
                return response.json()
        except httpx.HTTPStatusError as e:
            if e.response.status_code not in (429, 500, 502, 503):
                raise  # Non-retryable error
            if attempt == max_attempts - 1:
                raise  # Final attempt failed

            # Exponential backoff with up to 50% random jitter
            delay = base_delay * (2 ** attempt)
            jitter = random.uniform(0, delay * 0.5)
            await asyncio.sleep(delay + jitter)

Circuit Breaker Patterns

Three-State Circuit Breaker

A circuit breaker cycles through three states: closed (requests flow normally), open (after a run of consecutive failures, requests fail fast without reaching the provider), and half-open (after a cooldown, a single probe request tests recovery; a success closes the circuit again).

Per-Provider Circuit Breakers

Each provider gets an independent circuit breaker:

pack:
  name: resilience-patterns-providers-4
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai
      provider:
        base_url: https://api.openai.com/v1
        secret_key_ref:
          env: OPENAI_API_KEY
    - id: anthropic
      provider:
        base_url: https://api.anthropic.com/v1
        secret_key_ref:
          env: ANTHROPIC_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
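The same closed/open/half-open cycle can be implemented per provider at the application layer. This is a minimal sketch; the class name, thresholds, and cooldown are illustrative and not part of the gateway configuration schema.

```python
import time

class CircuitBreaker:
    """One instance per provider: open after N consecutive failures,
    half-open after a cooldown, close again on a successful probe."""

    def __init__(self, failure_threshold: int = 5, cooldown: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the circuit opened

    def allow_request(self, now=None) -> bool:
        # Closed: always allow. Open: allow a probe only after the cooldown.
        if self.opened_at is None:
            return True
        now = time.monotonic() if now is None else now
        return now - self.opened_at >= self.cooldown  # half-open probe

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the circuit

    def record_failure(self, now=None) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic() if now is None else now  # open
```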

Bulkheading

Isolate provider connections so a misbehaving provider cannot exhaust resources for others:

pack:
  name: resilience-patterns-providers-5
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai
      provider:
        base_url: https://api.openai.com/v1
        secret_key_ref:
          env: OPENAI_API_KEY
    - id: anthropic
      provider:
        base_url: https://api.anthropic.com/v1
        secret_key_ref:
          env: ANTHROPIC_API_KEY
    - id: local-llm
      provider:
        base_url: http://localhost:8080/v1
        secret_key_ref:
          env: LOCAL_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
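At the application layer, the same isolation can be sketched with one bounded semaphore per provider, so a slow provider saturates only its own compartment. The limits and provider ids below are illustrative.

```python
import asyncio

# One concurrency compartment per provider; a misbehaving provider can
# exhaust only its own slots, never the others'.
BULKHEADS = {
    "openai": asyncio.Semaphore(50),
    "anthropic": asyncio.Semaphore(50),
    "local-llm": asyncio.Semaphore(10),
}

async def call_in_bulkhead(provider_id: str, send):
    """Run the awaitable factory `send` inside the provider's compartment."""
    async with BULKHEADS[provider_id]:
        return await send()
```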

Graceful Degradation

When the Gateway Is Down

Design your application to handle gateway unavailability:

async function getCompletion(messages: Message[]): Promise<string> {
  try {
    // Primary path: through the governed gateway
    const response = await fetch('http://kt-gateway:41002/v1/chat/completions', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: 'gpt-4o', messages }),
      signal: AbortSignal.timeout(5000),
    });
    if (!response.ok) throw new Error(`Gateway returned ${response.status}`);
    return (await response.json()).choices[0].message.content;
  } catch {
    // Degraded path: return a safe fallback
    console.error('[AI] Gateway unreachable, returning fallback');
    return 'I am temporarily unable to process your request. Please try again shortly.';
  }
}

Degradation Tiers

Tier      Condition                    Behavior
Full      Gateway + provider healthy   Normal AI responses with full governance
Reduced   Primary provider down        Failover to secondary provider, governance intact
Minimal   Gateway overloaded           Shed non-critical requests, prioritize critical paths
Offline   Gateway unreachable          Static fallback responses, queue requests for replay
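The tiers above can be derived from a few health signals. The function and signal names in this sketch are illustrative, not a gateway API.

```python
# Map health signals to the degradation tier from the table above.
def degradation_tier(gateway_up: bool, primary_up: bool, overloaded: bool) -> str:
    if not gateway_up:
        return "offline"   # static fallbacks, queue requests for replay
    if overloaded:
        return "minimal"   # shed non-critical requests
    if not primary_up:
        return "reduced"   # failover provider, governance intact
    return "full"
```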

Request Priority and Shedding

gateway:
  load_shedding:
    # Start shedding when concurrent requests exceed this
    max_concurrent: 500
    # Priority header for request classification
    priority_header: X-Request-Priority
    # Shed low-priority first
    shed_order: [low, medium, high, critical]

# Critical request — last to be shed
curl -H "X-Request-Priority: critical" \
  http://kt-gateway:41002/v1/chat/completions \
  -d '{"model":"gpt-4o","messages":[...]}'
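One way shedding in shed_order can work is to reject progressively higher priority tiers as overload deepens. The sketch below assumes one extra tier is shed per 100 requests of overload; that step size, and the function itself, are assumptions for illustration, not documented gateway behavior.

```python
# Shed lowest-priority requests first once concurrency exceeds the limit.
SHED_ORDER = ["low", "medium", "high", "critical"]
MAX_CONCURRENT = 500

def should_shed(priority: str, in_flight: int) -> bool:
    overload = in_flight - MAX_CONCURRENT
    if overload <= 0:
        return False
    # Assumption: each 100 requests of overload sheds one more tier.
    tiers_to_shed = min(1 + overload // 100, len(SHED_ORDER))
    return SHED_ORDER.index(priority) < tiers_to_shed
```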

Health Monitoring

Gateway Health Check

# Check gateway health
kt health

# Detailed status including provider circuit breaker states
kt health --verbose

Provider Health Dashboard

Monitor provider health via the console dashboard:

# Tail real-time events to see failures
kt events tail --filter "status=error"

# Check event counts by provider
kt events stats --group-by provider --last 1h

Resilience Testing

Chaos Engineering with the Gateway

Test resilience by simulating failures:

# Simulate provider timeout
kt gateway run --test-mode \
--inject-fault openai:timeout:5s

# Simulate rate limiting
kt gateway run --test-mode \
--inject-fault openai:rate-limit:80%

# Simulate intermittent errors
kt gateway run --test-mode \
--inject-fault anthropic:error:503:30%

Next steps

For AI systems

  • Canonical terms: provider failover, providers[].priority, model_mapping, gateway.retry, exponential backoff with jitter, circuit breaker, bulkhead, retryable_codes: [429, 500, 502, 503], graceful degradation
  • Key configuration: providers[].priority (1 = primary, 2 = fallback), gateway.retry.max_attempts: 3, gateway.retry.initial_delay: 500ms, gateway.retry.backoff_multiplier: 2.0
  • Best next pages: Performance Engineering, Capacity Planning, Architecture Patterns

For engineers

  • Configure failover: assign priority: 1 (primary), priority: 2 (secondary), priority: 3 (tertiary) across providers
  • Model mapping: map gpt-4o to equivalent models across providers (e.g., claude-sonnet-4-20250514 for Anthropic)
  • Retry only idempotent failures: [429, 500, 502, 503] — never retry 409 policy blocks or 4xx client errors
  • Circuit breaker: open after N consecutive failures, half-open after cooldown, close on successful probe
  • Bulkhead: isolate traffic per consumer group so one team’s failure doesn’t cascade to others

For leaders

  • Multi-provider failover eliminates single-vendor dependency — provider outages become transparent to applications
  • Retry and circuit breaker patterns reduce user-visible errors without manual intervention during provider degradation
  • Cost implication: failover traffic may route to more expensive providers — monitor cost trends during and after incidents