
Circuit Breakers & Retry

Keeptrusts includes built-in circuit breakers and retry policies that protect your application against upstream LLM provider failures. Together they form a two-layer resilience system: retries absorb transient errors at the individual request level, while circuit breakers stop the gateway from spending time and tokens on a provider that is persistently degraded.

Use this page when

  • You need the exact command, config, API, or integration details for Circuit Breakers & Retry.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Circuit Breaker

A circuit breaker wraps each provider target and tracks its recent failure history. When failures exceed a threshold, the circuit "opens" and the gateway immediately routes to the next available provider without waiting for a timeout. After a cooldown period the circuit enters the "half-open" state and probes the provider with a limited number of real requests; if they succeed, the circuit closes again.

States

                 ┌─────────────┐
           ┌────►│   CLOSED    │──────────────────────────────────┐
           │     └─────────────┘  consecutive failures ≥ threshold│
all probes │                                                      ▼
succeed    │                                           ┌──────────────────┐
           │     ┌──────────────────┐                  │       OPEN       │
           └─────│    HALF-OPEN     │◄─────────────────┤  (reject fast)   │
                 │ (limited probes) │ cooldown_seconds └──────────────────┘
                 └──────────────────┘
                          │ any probe fails
                          └───────────────────────────► OPEN
State       Behaviour
Closed      Normal operation. Failures are counted.
Open        All requests to this provider are immediately rejected without making an upstream call.
Half-Open   A limited number of probe requests are forwarded. Success → Closed; failure → Open again.
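The transitions above can be sketched as a small state machine. This is an illustrative sketch only — the class and method names (`CircuitBreaker`, `allow_request`, `record_failure`) are hypothetical, not the Keeptrusts implementation; the field names mirror the configuration fields documented below:

```python
import time

class CircuitBreaker:
    """Minimal sketch of the Closed → Open → Half-Open cycle."""

    def __init__(self, failure_threshold=5, cooldown_seconds=60, half_open_successes=2):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.half_open_successes = half_open_successes
        self.state = "closed"
        self.failures = 0
        self.probe_successes = 0
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.cooldown_seconds:
                self.state = "half_open"       # cooldown expired: start probing
                self.probe_successes = 0
                return True
            return False                       # reject fast while Open
        return True                            # Closed or Half-Open: forward

    def record_success(self):
        if self.state == "half_open":
            self.probe_successes += 1
            if self.probe_successes >= self.half_open_successes:
                self.state = "closed"          # enough probes succeeded
                self.failures = 0
        else:
            self.failures = 0                  # any success resets the count

    def record_failure(self):
        if self.state == "half_open" or self.failures + 1 >= self.failure_threshold:
            self.state = "open"                # any probe failure reopens
            self.opened_at = time.monotonic()
            self.failures = 0
        else:
            self.failures += 1
```

Note that a single probe failure in Half-Open reopens the circuit immediately, which matches the diagram above.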

Configuration fields

Field                           Type      Default   Description
enabled                         bool      false     Enable circuit breaker for this target or globally.
consecutive_failure_threshold   integer   5         Number of consecutive failures before the circuit opens.
cooldown_seconds                integer   60        Seconds to wait in the Open state before entering Half-Open.
half_open_successes             integer   2         Number of consecutive successes required in Half-Open to close the circuit.

Per-target configuration

pack:
  name: circuit-breaker-retry-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-primary
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
      circuit_breaker:
        enabled: true
        consecutive_failure_threshold: 5
        cooldown_seconds: 60
        half_open_successes: 2
    - id: azure-backup
      provider: azure:chat:gpt-4o
      secret_key_ref:
        env: AZURE_OPENAI_API_KEY
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true

Global circuit breaker defaults

Set global defaults that apply to all targets that do not declare their own circuit_breaker block:

pack:
  name: circuit-breaker-retry-providers-2
  version: 1.0.0
  enabled: true
circuit_breaker_defaults:
  enabled: true
  consecutive_failure_threshold: 5
  cooldown_seconds: 60
  half_open_successes: 2
providers:
  targets:
    - id: openai-primary
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: groq-fast
      provider: groq:chat:llama-3.3-70b-versatile
      secret_key_ref:
        env: GROQ_API_KEY
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true

Retry Policy

The retry policy controls how many times the gateway attempts a request before declaring failure, which error conditions trigger retries, and how long to wait between attempts.

Configuration fields

Field              Type      Default       Description
max_retries        integer   2             Total retry attempts across all triggers.
per_trigger        map       {}            Override max_retries for specific error types.
backoff.strategy   string    exponential   Backoff timing: fixed, linear, or exponential.
backoff.base_ms    integer   200           Starting delay in milliseconds.
backoff.delay_ms   integer   500           Increment for linear, or base delay for fixed.
backoff.max_ms     integer   10000         Maximum delay cap regardless of strategy.
jitter             bool      true          Add ±20% random jitter to backoff delays to avoid thundering herds.

Error triggers

Trigger                   Condition
rate_limit                Provider returns HTTP 429.
timeout                   No response within the configured request timeout.
service_unavailable       Provider returns HTTP 503 or HTTP 502.
context_window_exceeded   Provider returns a context-length error (HTTP 400 with a context-window error code).
server_error              Any 5xx response not matched by a more specific trigger.
empty_response            Provider returns HTTP 200 but with zero content in the completion.
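The trigger table above amounts to a classifier over the upstream response. The sketch below is illustrative only — `timed_out` and `context_error` stand in for gateway-internal signals, and the function name is hypothetical; the status codes and trigger names follow the table:

```python
def classify_trigger(status_code, content="", timed_out=False, context_error=False):
    """Map an upstream response to a retry trigger name, or None on success."""
    if timed_out:
        return "timeout"                       # no response within the timeout
    if status_code == 429:
        return "rate_limit"
    if status_code in (502, 503):
        return "service_unavailable"
    if status_code == 400 and context_error:
        return "context_window_exceeded"
    if 500 <= status_code <= 599:
        return "server_error"                  # any other 5xx
    if status_code == 200 and not content:
        return "empty_response"                # 200 but zero completion content
    return None                                # success: nothing to retry
```

Ordering matters: the specific triggers (429, 502/503, context errors) are checked before the generic server_error bucket, matching the "not matched by a more specific trigger" rule above.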

Full retry configuration example

pack:
  name: circuit-breaker-retry-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-primary
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
      retry:
        max_retries: 2
        per_trigger:
          rate_limit: 5
          context_window_exceeded: 0
        backoff:
          strategy: exponential
          base_ms: 200
          max_ms: 10000
        jitter: true
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true

Backoff strategies

Fixed

Every retry waits the same delay_ms regardless of attempt number.

retry:
  max_retries: 3
  backoff:
    strategy: fixed
    delay_ms: 1000   # always wait 1 second

Delays: 1000ms → 1000ms → 1000ms
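All three strategies can be sketched with the documented fields. This is an illustrative sketch: the exact linear and exponential formulas (base_ms plus a per-attempt increment; base_ms doubling per attempt) are common conventions assumed here, not taken from the gateway source:

```python
import random

def backoff_delay_ms(attempt, strategy="exponential",
                     base_ms=200, delay_ms=500, max_ms=10_000, jitter=False):
    """Delay in ms before retry `attempt` (1-based) under each strategy."""
    if strategy == "fixed":
        delay = delay_ms                       # same delay every attempt
    elif strategy == "linear":
        delay = base_ms + (attempt - 1) * delay_ms
    else:                                      # exponential
        delay = base_ms * 2 ** (attempt - 1)
    delay = min(delay, max_ms)                 # cap applies regardless of strategy
    if jitter:
        delay *= random.uniform(0.8, 1.2)      # ±20% jitter (thundering-herd guard)
    return delay

# exponential, base_ms=200: 200 → 400 → 800 → 1600 → ... capped at max_ms
```

The `min(..., max_ms)` cap is what the best practices below insist on: without it, exponential delays exceed ten seconds by attempt 7.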


Combining Circuit Breaker + Retry with Fallbacks

The full resilience system layers retry, circuit breaker, and group fallback into a single decision pipeline:

Request
   │
   ▼
Retry attempt 1 → upstream call
   │ fails (timeout)
   ▼
Retry attempt 2 → upstream call
   │ fails (5xx)
   ▼
Retry attempt 3 → upstream call
   │ fails (5xx)
   │ consecutive_failure_threshold reached → circuit opens
   ▼
Route to fallback provider (circuit breaker short-circuits this target)
   │
   ▼
Response to client

Complete example

pack:
  name: resilient-chat
  version: 1.0.0
provider_routing:
  strategy: ordered
  fallback_enabled: true
circuit_breaker_defaults:
  enabled: true
  consecutive_failure_threshold: 4
  cooldown_seconds: 60
  half_open_successes: 2
model_groups:
  - name: primary-chat
    fallback_group: backup-chat
    targets:
      - id: openai-primary
        weight: 1
  - name: backup-chat
    targets:
      - id: anthropic-backup
        weight: 1
providers:
  targets:
    - id: openai-primary
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-backup
      provider: anthropic:chat:claude-3-5-sonnet-20241022
      secret_key_ref:
        env: ANTHROPIC_API_KEY

What happens when OpenAI degrades:

  1. Request arrives and is forwarded to openai-primary.
  2. The upstream returns 503. The retry policy forwards to openai-primary up to 3 more times (for service_unavailable), each time with exponential backoff.
  3. After 4 consecutive failures, the circuit breaker opens. Subsequent requests to openai-primary are immediately rejected without any upstream calls.
  4. The fallback group backup-chat is activated. Requests are now forwarded to anthropic-backup.
  5. After 60 seconds, the openai-primary circuit enters Half-Open. Two consecutive probe requests succeed, and the circuit closes. Traffic shifts back to openai-primary.

Zero Completion Insurance

Zero Completion Insurance (ZCI) is an additional retry layer that activates when a provider returns a technically successful HTTP 200 response but with no usable completion content. This happens when providers stream an empty choices[0].message.content, return a stop reason of length with zero tokens, or produce a low-quality output that fails a configured assertion.

Configuration fields

Field                 Type      Description
enabled               bool      Enable ZCI for this target or globally.
conditions            list      One or more conditions that trigger ZCI.
action                string    What to do when a condition fires: retry_same, retry_fallback, or return_error.
retry_with_fallback   bool      If true, retry on the next available provider rather than the same one.
max_zci_retries       integer   Maximum ZCI-specific retry attempts (default: 2).

Conditions

Condition            Description
empty_response       The response body contains no completion tokens.
low_quality_score    A configured quality scorer rates the response below threshold.
failed_assertion     A post-processing policy assertion is not satisfied.
stop_reason_length   The model stopped generating due to the token limit (truncated output).

ZCI configuration example

pack:
  name: circuit-breaker-retry-providers-8
  version: 1.0.0
  enabled: true
zero_completion_insurance:
  enabled: true
  conditions:
    - empty_response
    - stop_reason_length
  action: retry_fallback
  retry_with_fallback: true
  max_zci_retries: 2
providers:
  targets:
    - id: openai-primary
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-backup
      provider: anthropic:chat:claude-3-5-sonnet-20241022
      secret_key_ref:
        env: ANTHROPIC_API_KEY
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true

When openai-primary returns an empty response, ZCI fires with action: retry_fallback, and the gateway immediately retries the request on anthropic-backup rather than returning the empty response to the client.
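The empty_response and stop_reason_length checks can be sketched against an OpenAI-style chat completion payload. The field paths (choices[0].message.content, finish_reason) follow the OpenAI chat schema; the function itself is an illustrative sketch, not Keeptrusts code:

```python
def zci_condition(response: dict):
    """Return the ZCI condition triggered by a completion payload, or None."""
    choice = response.get("choices", [{}])[0]
    content = (choice.get("message") or {}).get("content") or ""
    if not content.strip():
        return "empty_response"        # HTTP 200 but no completion tokens
    if choice.get("finish_reason") == "length":
        return "stop_reason_length"    # generation truncated by the token limit
    return None                        # usable completion: ZCI does not fire
```

The other two conditions (low_quality_score, failed_assertion) depend on configured scorers and policy assertions, so they cannot be reduced to a payload check like this.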


Observability

Every circuit breaker state change and every retry attempt emits a structured event in the Keeptrusts event stream, giving full visibility into resilience behaviour in production.

Circuit breaker events

Event                         Fields                                  Description
circuit_breaker.opened        target_id, failure_count, threshold     Circuit transitioned from Closed → Open.
circuit_breaker.half_opened   target_id, cooldown_elapsed_ms          Cooldown expired; circuit entered Half-Open probe mode.
circuit_breaker.closed        target_id, probe_successes              All probes succeeded; circuit closed and normal routing resumed.
circuit_breaker.rejected      target_id                               A request was rejected because the circuit is currently Open.

Retry events

Event             Fields                                            Description
retry.attempt     target_id, attempt_number, trigger, backoff_ms    A retry was scheduled.
retry.exhausted   target_id, total_attempts, last_trigger           All retry attempts were consumed; request will fail or be routed to fallback.
zci.triggered     target_id, condition, action                      Zero Completion Insurance activated on a successful-status but empty/low-quality response.

Example: alert on circuit opening

Use the Keeptrusts console event rule engine to create an alert when any circuit opens in production:

alert_rules:
  - name: circuit-breaker-opened
    event_type: circuit_breaker.opened
    severity: high
    channels:
      - pagerduty
      - slack-ops
    message: "Circuit breaker opened for provider {{ target_id }} after {{ failure_count }} failures."

Best Practices

  1. Set context_window_exceeded retries to 0. Retrying a context-length error on the same provider always fails — the model cannot process a prompt that exceeds its window. Either truncate the prompt or route to a provider with a larger context window.

  2. Keep consecutive_failure_threshold low for user-facing paths (3–5) and higher for batch paths (8–10). Low thresholds protect real-time UX from slow provider degradation; higher thresholds tolerate normal variance in batch workloads.

  3. Always set max_ms on exponential backoff. Without a cap, exponential backoff can produce delays of tens of seconds on attempt 6+, turning a transient error into an apparent hang.

  4. Enable jitter: true in multi-instance deployments. Without jitter, all gateway instances back off to the same retry schedule and retry simultaneously, creating thundering herd traffic spikes against a recovering upstream.

  5. Use per-trigger rate_limit retries generously. Rate limit responses (HTTP 429) are expected under normal conditions. Setting per_trigger.rate_limit: 5 with exponential backoff gracefully absorbs token bucket refill cycles without surfacing 429s to clients.

  6. Monitor circuit breaker open/close events. Every circuit state change emits a structured event in the Keeptrusts event stream (circuit_breaker.opened, circuit_breaker.half_opened, circuit_breaker.closed). Alert on circuit_breaker.opened for any production provider to detect upstream degradation before it impacts SLOs.

For AI systems

  • Canonical terms: Keeptrusts Circuit Breaker, retry policy, Zero Completion Insurance (ZCI), backoff strategy.
  • Config keys: circuit_breaker.enabled, circuit_breaker.consecutive_failure_threshold, circuit_breaker.cooldown_seconds, circuit_breaker.half_open_successes, circuit_breaker_defaults, retry.max_retries, retry.per_trigger, retry.backoff.strategy (fixed | linear | exponential), retry.backoff.base_ms, retry.backoff.max_ms, retry.jitter, zero_completion_insurance.
  • Circuit states: Closed → Open → Half-Open → Closed.
  • Retry triggers: rate_limit, timeout, service_unavailable, context_window_exceeded, server_error, empty_response.
  • ZCI conditions: empty_response, low_quality_score, failed_assertion, stop_reason_length.
  • Event types: circuit_breaker.opened, circuit_breaker.half_opened, circuit_breaker.closed, circuit_breaker.rejected, retry.attempt, retry.exhausted, zci.triggered.
  • Best next pages: Provider Fallback, Model Groups, Provider Routing.

For engineers

  • Prerequisites: At least two provider targets configured for fallback to be useful alongside circuit breakers.
  • Set context_window_exceeded retries to 0 — retrying on the same provider always fails for context errors.
  • Always set backoff.max_ms to cap exponential backoff (recommended: 8000–15000ms).
  • Enable jitter: true in multi-instance deployments to prevent thundering herd retry storms.
  • Monitor: filter Events by event_type: circuit_breaker.opened and alert on it to detect upstream degradation early.
  • Test: temporarily reduce consecutive_failure_threshold to 1 and cause a single failure to verify the circuit opens and the fallback activates.

For leaders

  • Availability impact: Circuit breakers with fallback providers can achieve 99.9%+ effective uptime even when individual providers experience outages.
  • Cost trade-off: Retry policies consume additional tokens on retry attempts; set per_trigger budgets per error type to control wasted spend.
  • SLO alignment: Set consecutive_failure_threshold low (3–5) for user-facing endpoints and higher (8–10) for batch workloads.
  • Zero Completion Insurance prevents silent quality degradation by retrying empty or truncated responses on a backup provider.

Next steps