Circuit Breakers & Retry
Keeptrusts includes built-in circuit breakers and retry policies that protect your application against upstream LLM provider failures. Together they form a two-layer resilience system: retries absorb transient errors at the individual request level, while circuit breakers stop the system from wasting time and tokens on a provider that is persistently degraded.
Use this page when
- You need the exact command, config, API, or integration details for Circuit Breakers & Retry.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- You want a guided rollout instead of a reference page; use the linked workflow pages in Next steps.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Circuit Breaker
A circuit breaker wraps each provider target and tracks its recent failure history. When failures exceed a threshold, the circuit "opens" and the gateway immediately routes to the next available provider without waiting for a timeout. After a cooldown period the circuit enters the "half-open" state and probes the provider with a limited number of real requests; if they succeed, the circuit closes again.
States
CLOSED ──(consecutive failures ≥ threshold)──► OPEN (reject fast)
OPEN ──(cooldown_seconds elapsed)────────────► HALF-OPEN (limited probes)
HALF-OPEN ──(all probes succeed)─────────────► CLOSED
HALF-OPEN ──(any probe fails)────────────────► OPEN
| State | Behaviour |
|---|---|
| Closed | Normal operation. Failures are counted. |
| Open | All requests to this provider are immediately rejected without making an upstream call. |
| Half-Open | A limited number of probe requests are forwarded. Success → Closed; failure → Open again. |
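The transitions in the table above can be sketched as a small state machine. This is illustrative Python only, not the gateway's implementation; cooldown timing is driven externally via `cooldown_elapsed()`:

```python
class CircuitBreaker:
    """Minimal sketch of the Closed / Open / Half-Open state machine."""

    def __init__(self, failure_threshold=5, half_open_successes=2):
        self.failure_threshold = failure_threshold
        self.half_open_successes = half_open_successes
        self.state = "closed"
        self.failures = 0
        self.probe_successes = 0

    def record_success(self):
        if self.state == "half-open":
            self.probe_successes += 1
            if self.probe_successes >= self.half_open_successes:
                self.state = "closed"        # all probes succeeded
                self.failures = 0
        else:
            self.failures = 0                # success resets the failure count

    def record_failure(self):
        if self.state == "half-open":
            self.state = "open"              # any probe failure reopens
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.state = "open"              # consecutive failures ≥ threshold

    def cooldown_elapsed(self):
        if self.state == "open":
            self.state = "half-open"         # cooldown_seconds expired; start probing
            self.probe_successes = 0
```

A caller would invoke `record_success`/`record_failure` after each upstream call and `cooldown_elapsed` from a timer.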
Configuration fields
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable the circuit breaker for this target or globally. |
| consecutive_failure_threshold | integer | 5 | Number of consecutive failures before the circuit opens. |
| cooldown_seconds | integer | 60 | Seconds to wait in the Open state before entering Half-Open. |
| half_open_successes | integer | 2 | Number of consecutive successes required in Half-Open to close the circuit. |
Per-target configuration
Each target can declare its own circuit_breaker block, which overrides any global defaults:

pack:
  name: circuit-breaker-retry-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-primary
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
      circuit_breaker:
        enabled: true
        consecutive_failure_threshold: 5
        cooldown_seconds: 60
        half_open_successes: 2
    - id: azure-backup
      provider: azure:chat:gpt-4o
      secret_key_ref:
        env: AZURE_OPENAI_API_KEY
      circuit_breaker:
        enabled: true
        consecutive_failure_threshold: 3
        cooldown_seconds: 30
        half_open_successes: 1
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Global circuit breaker defaults
Set global defaults that apply to all targets that do not declare their own circuit_breaker block:
pack:
  name: circuit-breaker-retry-providers-2
  version: 1.0.0
  enabled: true
circuit_breaker_defaults:
  enabled: true
  consecutive_failure_threshold: 5
  cooldown_seconds: 60
  half_open_successes: 2
providers:
  targets:
    - id: openai-primary
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: groq-fast
      provider: groq:chat:llama-3.3-70b-versatile
      secret_key_ref:
        env: GROQ_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Retry Policy
The retry policy controls how many times the gateway attempts a request before declaring failure, which error conditions trigger retries, and how long to wait between attempts.
Configuration fields
| Field | Type | Default | Description |
|---|---|---|---|
| max_retries | integer | 2 | Total retry attempts across all triggers. |
| per_trigger | map | {} | Override max_retries for specific error types. |
| backoff.strategy | string | exponential | Backoff timing: fixed, linear, or exponential. |
| backoff.base_ms | integer | 200 | Starting delay in milliseconds (linear and exponential). |
| backoff.delay_ms | integer | 500 | Increment per attempt for linear; the constant delay for fixed. |
| backoff.max_ms | integer | 10000 | Maximum delay cap regardless of strategy. |
| jitter | bool | true | Add ±20% random jitter to backoff delays to avoid thundering herds. |
Error triggers
| Trigger | Condition |
|---|---|
| rate_limit | Provider returns HTTP 429. |
| timeout | No response within the configured request timeout. |
| service_unavailable | Provider returns HTTP 503 or HTTP 502. |
| context_window_exceeded | Provider returns a context-length error (HTTP 400 with a context-window error code). |
| server_error | Any 5xx response not matched by a more specific trigger. |
| empty_response | Provider returns HTTP 200 but with zero content in the completion. |
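The trigger taxonomy above can be expressed as a classification function. This is a sketch only; `context_window_exceeded` is omitted because detecting it requires inspecting the provider-specific error code in the 400 response body:

```python
def classify_trigger(status=None, timed_out=False, content="ok"):
    """Map an upstream outcome to a retry trigger name, or None (sketch)."""
    if timed_out:
        return "timeout"                  # no response within the request timeout
    if status == 429:
        return "rate_limit"
    if status in (502, 503):
        return "service_unavailable"
    if status is not None and 500 <= status <= 599:
        return "server_error"             # any other 5xx
    if status == 200 and not content.strip():
        return "empty_response"           # HTTP 200 with zero completion content
    return None                           # success: nothing to retry
```

Note the ordering: the specific 502/503 check must run before the generic 5xx fallthrough.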
Full retry configuration example
pack:
  name: circuit-breaker-retry-providers-3
  version: 1.0.0
  enabled: true
retry:
  max_retries: 2
  per_trigger:
    rate_limit: 5
    service_unavailable: 3
    context_window_exceeded: 0
  backoff:
    strategy: exponential
    base_ms: 200
    max_ms: 10000
  jitter: true
providers:
  targets:
    - id: openai-primary
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Backoff strategies
Fixed
Every retry waits the same delay_ms regardless of attempt number.

retry:
  max_retries: 3
  backoff:
    strategy: fixed
    delay_ms: 1000 # always wait 1 second

Delays: 1000ms → 1000ms → 1000ms
Linear
Each retry adds another delay_ms to the previous delay, starting from base_ms.

retry:
  max_retries: 4
  backoff:
    strategy: linear
    base_ms: 200
    delay_ms: 300

Delays: 200ms → 500ms → 800ms → 1100ms
Exponential
Delay doubles on each retry, starting from base_ms and capped at max_ms.

retry:
  max_retries: 5
  backoff:
    strategy: exponential
    base_ms: 250
    max_ms: 8000
  jitter: true

Delays: ~250ms → ~500ms → ~1000ms → ~2000ms → ~4000ms (with jitter applied)
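The three delay schedules can be reproduced with a small helper. This is illustrative only; the parameter names mirror the config fields above, and jitter is off by default so the sequences are deterministic:

```python
import random

def backoff_delays(strategy, retries, base_ms=200, delay_ms=500,
                   max_ms=10_000, jitter=False):
    """Return the list of delays (ms) preceding each retry attempt."""
    delays = []
    for attempt in range(retries):
        if strategy == "fixed":
            delay = delay_ms                      # same delay every attempt
        elif strategy == "linear":
            delay = base_ms + attempt * delay_ms  # base_ms, then +delay_ms per attempt
        elif strategy == "exponential":
            delay = base_ms * 2 ** attempt        # doubles each attempt
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        delay = min(delay, max_ms)                # max_ms caps every strategy
        if jitter:
            delay = int(delay * random.uniform(0.8, 1.2))  # ±20% jitter
        delays.append(delay)
    return delays
```

For example, `backoff_delays("linear", 4, base_ms=200, delay_ms=300)` yields the 200 → 500 → 800 → 1100 ms schedule shown above.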
Combining Circuit Breaker + Retry with Fallbacks
The full resilience system layers retry, circuit breaker, and group fallback into a single decision pipeline:
Request
│
▼
Retry attempt 1 → upstream call
│ fails (timeout)
▼
Retry attempt 2 → upstream call
│ fails (5xx)
▼
Retry attempt 3 → upstream call
│ fails (5xx)
│ consecutive_failure_threshold reached → circuit opens
▼
Route to fallback provider (circuit breaker short-circuits this target)
│
▼
Response to client
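The pipeline above can be sketched end to end. Illustrative Python only: `call`, `breakers`, and the threshold handling are simplified stand-ins for the gateway internals:

```python
def send_with_resilience(request, targets, call, breakers,
                         threshold=4, max_retries=2):
    """Sketch of the retry → circuit breaker → fallback pipeline.

    `call(target, request)` stands in for the upstream call and raises on
    failure; `breakers` maps each target id to a mutable dict like
    {"state": "closed", "failures": 0}.
    """
    last_error = None
    for target in targets:                        # ordered fallback chain
        cb = breakers[target]
        if cb["state"] == "open":
            continue                              # short-circuit an open target
        for _attempt in range(1 + max_retries):   # first try + retries
            try:
                response = call(target, request)
                cb["failures"] = 0                # success resets the count
                return response
            except Exception as err:
                last_error = err
                cb["failures"] += 1
                if cb["failures"] >= threshold:
                    cb["state"] = "open"          # circuit opens; stop retrying
                    break
    raise RuntimeError(f"all targets failed: {last_error!r}")
```

After the primary's circuit opens, subsequent calls skip it entirely and go straight to the fallback target.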
Complete example
pack:
  name: resilient-chat
  version: 1.0.0
provider_routing:
  strategy: ordered
  fallback_enabled: true
circuit_breaker_defaults:
  enabled: true
  consecutive_failure_threshold: 4
  cooldown_seconds: 60
  half_open_successes: 2
retry:
  max_retries: 3
  per_trigger:
    service_unavailable: 3
  backoff:
    strategy: exponential
    base_ms: 250
    max_ms: 8000
  jitter: true
model_groups:
  - name: primary-chat
    fallback_group: backup-chat
    targets:
      - id: openai-primary
        weight: 1
  - name: backup-chat
    targets:
      - id: anthropic-backup
        weight: 1
providers:
  targets:
    - id: openai-primary
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-backup
      provider: anthropic:chat:claude-3-5-sonnet-20241022
      secret_key_ref:
        env: ANTHROPIC_API_KEY
What happens when OpenAI degrades:
- Request arrives and is forwarded to `openai-primary`.
- The upstream returns 503. The retry policy retries `openai-primary` up to 3 more times (the `service_unavailable` trigger), each time with exponential backoff.
- After 4 consecutive failures, the circuit breaker opens. Subsequent requests to `openai-primary` are immediately rejected without any upstream calls.
- The fallback group `backup-chat` is activated. Requests are now forwarded to `anthropic-backup`.
- After 60 seconds, the `openai-primary` circuit enters Half-Open. Two consecutive probe requests succeed, and the circuit closes. Traffic shifts back to `openai-primary`.
Zero Completion Insurance
Zero Completion Insurance (ZCI) is an additional retry layer that activates when a provider returns a technically successful HTTP 200 response but with no usable completion content. This happens when providers stream an empty `choices[0].message.content`, return a stop reason of `length` with zero tokens, or produce a low-quality output that fails a configured assertion.
Configuration fields
| Field | Type | Description |
|---|---|---|
| enabled | bool | Enable ZCI for this target or globally. |
| conditions | list | One or more conditions that trigger ZCI. |
| action | string | What to do when a condition fires: retry_same, retry_fallback, or return_error. |
| retry_with_fallback | bool | If true, retry on the next available provider rather than the same one. |
| max_zci_retries | integer | Maximum ZCI-specific retry attempts (default: 2). |
Conditions
| Condition | Description |
|---|---|
| empty_response | The response body contains no completion tokens. |
| low_quality_score | A configured quality scorer rates the response below threshold. |
| failed_assertion | A post-processing policy assertion is not satisfied. |
| stop_reason_length | The model stopped generating due to the token limit (truncated output). |
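Two of these conditions can be checked directly on the response. The sketch below assumes an OpenAI-style chat completion shape; `low_quality_score` and `failed_assertion` depend on configured scorers and policy assertions, so they are omitted:

```python
def zci_condition(response, conditions):
    """Return the first ZCI condition the response triggers, else None (sketch)."""
    choice = response["choices"][0]
    content = choice["message"].get("content") or ""
    if "empty_response" in conditions and not content.strip():
        return "empty_response"          # HTTP 200 but no completion tokens
    if "stop_reason_length" in conditions and choice.get("finish_reason") == "length":
        return "stop_reason_length"      # generation truncated at the token limit
    return None
```

If this returns a condition name, the configured action (retry_same, retry_fallback, or return_error) would fire.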
ZCI configuration example
pack:
  name: circuit-breaker-retry-providers-8
  version: 1.0.0
  enabled: true
zero_completion_insurance:
  enabled: true
  conditions:
    - empty_response
    - stop_reason_length
  action: retry_fallback
  retry_with_fallback: true
  max_zci_retries: 2
providers:
  targets:
    - id: openai-primary
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-backup
      provider: anthropic:chat:claude-3-5-sonnet-20241022
      secret_key_ref:
        env: ANTHROPIC_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
When `openai-primary` returns an empty response, ZCI fires with `action: retry_fallback`, and the gateway immediately retries the request on `anthropic-backup` rather than returning the empty response to the client.
Observability
Every circuit breaker state change and every retry attempt emits a structured event in the Keeptrusts event stream, giving full visibility into resilience behaviour in production.
Circuit breaker events
| Event | Fields | Description |
|---|---|---|
| circuit_breaker.opened | target_id, failure_count, threshold | Circuit transitioned from Closed → Open. |
| circuit_breaker.half_opened | target_id, cooldown_elapsed_ms | Cooldown expired; circuit entered Half-Open probe mode. |
| circuit_breaker.closed | target_id, probe_successes | All probes succeeded; circuit closed and normal routing resumed. |
| circuit_breaker.rejected | target_id | A request was rejected because the circuit is currently Open. |
Retry events
| Event | Fields | Description |
|---|---|---|
| retry.attempt | target_id, attempt_number, trigger, backoff_ms | A retry was scheduled. |
| retry.exhausted | target_id, total_attempts, last_trigger | All retry attempts were consumed; the request will fail or be routed to fallback. |
| zci.triggered | target_id, condition, action | Zero Completion Insurance activated on a successful-status but empty or low-quality response. |
Example: alert on circuit opening
Use the Keeptrusts console event rule engine to create an alert when any circuit opens in production:
alert_rules:
  - name: circuit-breaker-opened
    event_type: circuit_breaker.opened
    severity: high
    channels:
      - pagerduty
      - slack-ops
    message: "Circuit breaker opened for provider {{ target_id }} after {{ failure_count }} failures."
Best Practices
- Set `context_window_exceeded` retries to `0`. Retrying a context-length error on the same provider always fails: the model cannot process a prompt that exceeds its window. Either truncate the prompt or route to a provider with a larger context window.
- Keep `consecutive_failure_threshold` low for user-facing paths (3–5) and higher for batch paths (8–10). Low thresholds protect real-time UX from slow provider degradation; higher thresholds tolerate normal variance in batch workloads.
- Always set `max_ms` on exponential backoff. Without a cap, exponential backoff can produce delays of tens of seconds by attempt 6+, turning a transient error into an apparent hang.
- Enable `jitter: true` in multi-instance deployments. Without jitter, all gateway instances follow the same retry schedule and retry simultaneously, creating thundering-herd traffic spikes against a recovering upstream.
- Use per-trigger `rate_limit` retries generously. Rate-limit responses (HTTP 429) are expected under normal conditions. Setting `per_trigger.rate_limit: 5` with exponential backoff gracefully absorbs token-bucket refill cycles without surfacing 429s to clients.
- Monitor circuit breaker open/close events. Every circuit state change emits a structured event in the Keeptrusts event stream (`circuit_breaker.opened`, `circuit_breaker.half_opened`, `circuit_breaker.closed`). Alert on `circuit_breaker.opened` for any production provider to detect upstream degradation before it impacts SLOs.
For AI systems
- Canonical terms: Keeptrusts Circuit Breaker, retry policy, Zero Completion Insurance (ZCI), backoff strategy.
- Config keys: `circuit_breaker.enabled`, `circuit_breaker.consecutive_failure_threshold`, `circuit_breaker.cooldown_seconds`, `circuit_breaker.half_open_successes`, `circuit_breaker_defaults`, `retry.max_retries`, `retry.per_trigger`, `retry.backoff.strategy` (fixed | linear | exponential), `retry.backoff.base_ms`, `retry.backoff.max_ms`, `retry.jitter`, `zero_completion_insurance`.
- Circuit states: Closed → Open → Half-Open → Closed.
- Retry triggers: `rate_limit`, `timeout`, `service_unavailable`, `context_window_exceeded`, `server_error`, `empty_response`.
- ZCI conditions: `empty_response`, `low_quality_score`, `failed_assertion`, `stop_reason_length`.
- Event types: `circuit_breaker.opened`, `circuit_breaker.half_opened`, `circuit_breaker.closed`, `circuit_breaker.rejected`, `retry.attempt`, `retry.exhausted`, `zci.triggered`.
- Best next pages: Provider Fallback, Model Groups, Provider Routing.
For engineers
- Prerequisites: At least two provider targets configured for fallback to be useful alongside circuit breakers.
- Set `context_window_exceeded` retries to `0`; retrying on the same provider always fails for context errors.
- Always set `backoff.max_ms` to cap exponential backoff (recommended: 8000–15000 ms).
- Enable `jitter: true` in multi-instance deployments to prevent thundering-herd retry storms.
- Monitor: filter Events by `event_type: circuit_breaker.opened` and alert on it to detect upstream degradation early.
- Test: temporarily reduce `consecutive_failure_threshold` to 1 and cause a single failure to verify that the circuit opens and the fallback activates.
For leaders
- Availability impact: Circuit breakers with fallback providers can achieve 99.9%+ effective uptime even when individual providers experience outages.
- Cost trade-off: Retry policies consume additional tokens on retry attempts; set `per_trigger` budgets per error type to control wasted spend.
- SLO alignment: Set `consecutive_failure_threshold` low (3–5) for user-facing endpoints and higher (8–10) for batch workloads.
- Zero Completion Insurance prevents silent quality degradation by retrying empty or truncated responses on a backup provider.
Next steps
- Provider Fallback — configure multi-provider fallback chains
- Model Groups — define fallback groups with cascading tiers
- Provider Routing — routing strategies that complement circuit breakers
- Rate Limiting — prevent upstream rate limits from triggering unnecessary retries