Provider Fallback
Keeptrusts's provider fallback system automatically retries failed requests against backup providers based on configurable error triggers. When a provider fails in a way that matches your trigger list, the gateway transparently re-submits the request to the next available provider in the fallback chain — without changing any code in your application.
Use this page when
- You need the exact command, config, API, or integration details for Provider Fallback.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Fallback is distinct from the Circuit Breaker system. Circuit breakers prevent calls to a persistently degraded provider; fallback handles the individual request that encounters an error and routes it somewhere else before returning a failure to the client.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
ProviderFallback Configuration
The provider_fallback block is placed at the top level of your policy config and applies globally unless overridden at the route level.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `false` | Enable automatic provider fallback. |
| `triggers` | list of strings | `["rate_limit_exceeded", "timeout", "service_unavailable"]` | Error conditions that activate the fallback chain. |
| `max_fallback_attempts` | integer | `3` | Maximum number of providers to try before returning the error to the client. |
| `content_policy` | string | `"stop"` | What to do when a provider's response is blocked by a content policy rule. |
| `context_window` | string | `"abort"` | What to do when the request exceeds the target provider's context window limit. |
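Taken together, a fully spelled-out block looks like the following sketch. The values shown are the documented defaults, except `enabled`, which defaults to `false` and is switched on here for illustration:

```yaml
provider_fallback:
  enabled: true
  triggers:
    - rate_limit_exceeded
    - timeout
    - service_unavailable
  max_fallback_attempts: 3
  content_policy: stop
  context_window: abort
```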
triggers Values
| Trigger | When It Fires |
|---|---|
| `rate_limit_exceeded` | Provider returns HTTP 429 or a rate-limit response body. |
| `timeout` | No response received within the target's configured `timeout_ms`. |
| `service_unavailable` | Provider returns HTTP 503 or 502, or the connection is refused. |
| `model_not_found` | Provider returns HTTP 404 indicating the requested model is unavailable. |
| `context_window_exceeded` | Provider returns an error indicating the token count exceeds the model's context window. |
| `invalid_response` | Provider returns a malformed, non-JSON, or unexpected response body. |
| `auth_error` | Provider returns HTTP 401 or 403. Useful when rotating API keys and a key expires mid-deployment. |
Triggers are OR-evaluated: fallback fires if any single trigger condition is met.
content_policy Values
| Value | Behaviour |
|---|---|
| `stop` | If the provider's response is blocked by Keeptrusts's content policy, treat it as a fatal error and return a policy-blocked response to the client. Do not try the next provider. |
| `continue` | Try the next provider in the fallback chain when a content policy block occurs. Useful when different providers have different refusal tendencies for edge-case prompts. |
| `retry_with_modified` | Strip the policy-blocked output tokens from the request context and retry the next provider with the sanitized version. |
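For example, to cascade past a content-policy block instead of failing the request, a minimal sketch (other fields take their documented defaults):

```yaml
provider_fallback:
  enabled: true
  content_policy: continue   # try the next provider when a response is policy-blocked
```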
context_window Values
| Value | Behaviour |
|---|---|
| `abort` | If the primary provider would exceed its context window, immediately return an error to the client rather than truncating. Use when correctness is critical and lossy truncation is unacceptable. |
| `truncate` | Truncate the oldest conversation turns (excluding the system prompt) to fit within the next provider's documented context window, then proceed with the fallback attempt. The truncated portion is logged in the event as `context_truncation_tokens`. |
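A sketch of a policy that truncates on overflow; pairing the `context_window_exceeded` trigger with `truncate` is an assumption based on the trigger table above:

```yaml
provider_fallback:
  enabled: true
  triggers:
    - context_window_exceeded
  context_window: truncate   # drop oldest turns to fit the next provider's window
```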
Ordered Fallback
The most common fallback pattern is ordered: try providers left-to-right and stop at the first success.
Ordered fallback works hand-in-hand with the ordered routing strategy. Set provider_routing.strategy: ordered and provider_fallback.enabled: true together.
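As a sketch, the two settings named above sit side by side in the policy config:

```yaml
provider_routing:
  strategy: ordered      # try targets left-to-right
provider_fallback:
  enabled: true          # cascade to the next target when a trigger fires
```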
Example: Four-Provider Cascade
```yaml
pack:
  name: fallback-configuration-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-primary
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-fallback
      provider: anthropic:claude-3-5-sonnet-20241022
      secret_key_ref:
        env: ANTHROPIC_API_KEY
    - id: groq-fallback
      provider: groq:llama-3.3-70b-versatile
      secret_key_ref:
        env: GROQ_API_KEY
    - id: cerebras-fallback
      provider: cerebras:llama3.1-70b
      secret_key_ref:
        env: CEREBRAS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
In this configuration, a rate_limit_exceeded error on openai-primary causes an immediate retry against anthropic-fallback. If Anthropic also returns a timeout, the gateway falls through to groq-fallback, and then to cerebras-fallback. If all four attempts fail, the original error (from the last attempted provider) is returned to the client.
Trigger-Specific Fallback
You can restrict fallback to only a subset of error types. For example, to fall back on capacity errors but not on auth errors (forcing an alert instead of silently masking a misconfiguration):
```yaml
provider_fallback:
  enabled: true
  triggers:
    - rate_limit_exceeded
    - timeout
    - service_unavailable
  max_fallback_attempts: 2
```
Omitting auth_error and model_not_found from the trigger list means those errors are immediately surfaced to the client rather than causing a cascade to the next provider.
Context Window Overflow Handling
When a conversation grows long enough to exceed a provider's token limit, the request fails before any tokens are generated. Provider fallback can handle this case by routing to a provider with a larger context window.
Example: Overflow to Higher-Context Model
```yaml
pack:
  name: fallback-configuration-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: gpt-4o-standard
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: gemini-extended
      provider: google:gemini-2.0-flash
      secret_key_ref:
        env: GOOGLE_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
When a request exceeds GPT-4o's 128k context window, it is automatically re-routed to `gemini-extended`, whose 1M-token context window can absorb it. Note that overflow only activates the cascade if `context_window_exceeded` is included in your trigger list; it is not part of the default set.
context_window: truncate Behavior
When context_window: truncate is set and a fallback target also has a smaller context than the request, Keeptrusts truncates the conversation history before forwarding. The truncation strategy is:
- The system prompt is never truncated. It is always preserved in full.
- Tool call results and tool definitions are preserved.
- The oldest user/assistant turn pairs are removed first, from the beginning of the conversation, until the message list fits within the target's context window.
- The number of tokens removed is logged in the event under the `context_truncation_tokens` field.
If truncation would remove more than 50% of the original conversation, Keeptrusts emits a context_truncation_warning event and proceeds. If truncation would remove the most recent user message itself, the request is aborted rather than producing a semantically empty request.
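The removal order described above can be modeled with a short sketch. This is illustrative only, not Keeptrusts source code; the `fits` predicate stands in for the gateway's token-count check against the target's context window:

```python
def truncate_history(messages, fits):
    """Drop the oldest user/assistant pairs until `fits(messages)` is true.

    messages: list of {"role": ..., "content": ...} dicts in conversation order.
    fits: predicate returning True when a message list fits the target window.
    """
    # The system prompt is never truncated; separate it from the turns.
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    # Remove the oldest pair of turns at a time until the list fits.
    while not fits(system + turns) and len(turns) > 2:
        turns = turns[2:]
    if not fits(system + turns):
        # Truncating further would cut into the most recent exchange: abort
        # instead of forwarding a semantically empty request.
        raise ValueError("request aborted: cannot fit most recent user message")
    return system + turns
```

Under this model, a six-message conversation squeezed into a four-message budget keeps the system prompt and the newest exchange while the oldest pair is dropped.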
Fallback + model_groups
Model groups allow you to define named logical groups of providers. Fallback integrates with model groups through the fallback_group field on a ModelGroup entry.
When a request to a model group exhausts all members of the group, Keeptrusts falls through to the fallback_group — a named model group that acts as the next-tier cascade.
Example: Tiered Group Fallback
```yaml
model_groups:
  - id: premium-tier
    members:
      - openai-gpt4o
      - anthropic-sonnet
    routing: round_robin
    fallback_group: standard-tier
  - id: standard-tier
    members:
      - groq-llama3
      - cerebras-llama3
    routing: round_robin
    fallback_group: free-tier
  - id: free-tier
    members:
      - openrouter-mistral
    routing: ordered
provider_fallback:
  enabled: true
  triggers:
    - rate_limit_exceeded
    - timeout
    - service_unavailable
  max_fallback_attempts: 6
```
In this configuration:
- Requests are distributed across `premium-tier` (GPT-4o and Claude Sonnet) by round robin.
- If all premium-tier providers are unavailable, the request cascades to `standard-tier` (Groq and Cerebras).
- If all standard-tier providers are unavailable, the request cascades to `free-tier` (Mistral via OpenRouter).
This pattern is particularly useful in production systems where you want to gracefully degrade service quality rather than return errors.
Observability
Every fallback attempt produces a structured event in the Keeptrusts event log. These events allow you to monitor fallback frequency, identify which providers are triggering the most fallbacks, and set alerts when fallback rates exceed expected thresholds.
Fallback Event Fields
| Field | Description |
|---|---|
| `event_type` | `"provider_fallback"` |
| `attempt_number` | Which attempt this was (1 = primary, 2 = first fallback, etc.). |
| `trigger` | The error type that caused the fallback (e.g. `"rate_limit_exceeded"`). |
| `from_provider` | The provider that triggered the fallback. |
| `to_provider` | The provider being tried on this attempt. |
| `original_error` | Full error message or status code from the failing provider. |
| `context_truncation_tokens` | Number of tokens removed if `context_window: truncate` was applied. |
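Assembled from the fields above, a representative event might look like the following; the concrete values are illustrative, not captured output:

```json
{
  "event_type": "provider_fallback",
  "attempt_number": 2,
  "trigger": "rate_limit_exceeded",
  "from_provider": "openai-primary",
  "to_provider": "anthropic-fallback",
  "original_error": "HTTP 429",
  "context_truncation_tokens": 0
}
```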
Querying Fallback Events
```sh
# Count fallbacks by trigger type over the last 24 hours
curl -s "https://api.keeptrusts.com/v1/events?event_type=provider_fallback&limit=5000" \
  -H "Authorization: Bearer $KEEPTRUSTS_API_TOKEN" \
  | jq 'group_by(.trigger) | map({trigger: .[0].trigger, count: length}) | sort_by(-.count)'

# Identify which providers trigger the most fallbacks
curl -s "https://api.keeptrusts.com/v1/events?event_type=provider_fallback&limit=5000" \
  -H "Authorization: Bearer $KEEPTRUSTS_API_TOKEN" \
  | jq 'group_by(.from_provider) | map({provider: .[0].from_provider, fallbacks: length})'
```
Console Monitoring
In the Keeptrusts console, navigate to Events and filter by event_type: provider_fallback. The events graph on the Dashboard also charts fallback event frequency as a separate series alongside blocked and allowed events, giving you at-a-glance visibility into provider reliability trends.
Best Practices
- **Include `timeout` in triggers.** Timeouts are the most common cause of unexpected fallback chains. Without this trigger, a request to a slow provider waits out the full timeout and then fails without cascading. Always pair the `timeout` trigger with a conservative `timeout_ms` value on each target (e.g., 15–30 seconds).
- **Set `max_fallback_attempts` no higher than your provider count.** If you have three providers and `max_fallback_attempts: 10`, the extra attempts are wasted; there are only three providers to try. Set it equal to the number of providers in your active chain to avoid surprising behavior.
- **Avoid `auth_error` in triggers during initial setup.** Including `auth_error` in triggers can silently mask a misconfigured or expired API key by falling through to the next provider. Leave it out of your trigger list until your key-rotation process is stable; add it later if you want zero-downtime key rotation.
- **Use `context_window: abort` for document-processing pipelines.** When your application sends structured documents with precise length contracts (e.g., a legal review pipeline that must process the entire document), truncation would produce incorrect outputs. Set `context_window: abort` so the application receives an explicit error and can handle oversized documents in its own logic.
- **Monitor `context_truncation_tokens` in events.** Frequent large truncations indicate that your primary model's context window is too small for your typical conversation length. The truncation warnings give you data to make a model-upgrade decision before your users notice degraded response quality.
- **Combine fallback with circuit breakers at each provider.** Fallback handles the individual request that fails; circuit breakers handle sustained degradation of a provider. Using both together means a single timeout triggers fallback (the fast path), while five consecutive timeouts open the circuit (preventing further slow calls to a degraded provider). See Circuit Breakers & Retry for the companion configuration.
Route-Level Fallback Overrides
Fallback behavior can be scoped to individual routes. For example, you might want aggressive fallback on a user-facing chat route but none on an internal batch-processing route where correct provider attribution matters for billing:
```yaml
pack:
  name: fallback-configuration-routes-5
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
routes:
  - path: "/v1/chat/completions"
    provider_fallback:
      enabled: true
      triggers:
        - rate_limit_exceeded
        - timeout
        - service_unavailable
      max_fallback_attempts: 3
  - path: "/v1/batch"
    provider_fallback:
      enabled: false
```
A route-level `provider_fallback` block overrides the top-level block for that path; any field omitted at the route level inherits the top-level value.
Fallback with Retry
Fallback and retry are complementary:
- Retry re-sends the request to the same provider after a brief delay. Good for transient network glitches or brief rate-limit windows.
- Fallback sends the request to a different provider after the retry budget is exhausted. Good for sustained outages or capacity exhaustion.
Configure them together to get both behaviors:
```yaml
retry:
  enabled: true
  max_attempts: 2
  initial_delay_ms: 500
  strategy: exponential_backoff
  triggers:
    - timeout
    - service_unavailable
provider_fallback:
  enabled: true
  triggers:
    - rate_limit_exceeded
    - timeout
    - service_unavailable
  max_fallback_attempts: 3
```
When both are active, the gateway first retries the same provider up to retry.max_attempts times; only after exhausting retries does it trigger the fallback chain to the next provider.
Fallback Latency Budget
Each provider attempt consumes wall-clock time. With three providers each having a 30-second timeout, a worst-case chain can take 90 seconds before returning an error to the client. Plan your fallback chain with this in mind:
| Provider Position | Recommended timeout_ms | Rationale |
|---|---|---|
| Primary | 25–30 seconds | Full quality; latency acceptable |
| First fallback | 15–20 seconds | Slightly faster provider or tier |
| Second fallback | 5–10 seconds | Fast/cheap provider; lower quality ok |
| Last-resort | 3–5 seconds | Minimal latency; best-effort only |
Tightening timeouts on fallback providers reflects a key insight: by the time you reach the second or third fallback, time is already scarce. A user waiting >30 seconds will have abandoned the request regardless.
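The table above can be expressed as per-target `timeout_ms` settings. This sketch reuses the target ids from the four-provider cascade example; placing `timeout_ms` on each target follows the per-target timeout described in the triggers table:

```yaml
providers:
  targets:
    - id: openai-primary
      timeout_ms: 30000    # primary: full quality, latency acceptable
    - id: anthropic-fallback
      timeout_ms: 20000    # first fallback: slightly faster tier
    - id: groq-fallback
      timeout_ms: 10000    # second fallback: fast/cheap provider
    - id: cerebras-fallback
      timeout_ms: 5000     # last resort: best-effort only
```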
Fallback Health Probe Integration
When a provider is known to be unavailable (open circuit), Keeptrusts skips it in the fallback chain rather than wasting a fallback attempt on it. The effective fallback chain at any moment is: all providers whose circuits are currently closed or half-open, tried in the order defined in providers.targets.
This means a fallback chain of four providers where the second is circuit-opened behaves as a three-provider chain: primary → third provider → fourth provider. The skipped provider continues its cooldown independently and re-enters the chain when its circuit closes.
To inspect the current circuit state and effective fallback chain, query the gateway health endpoint:
```sh
curl -s "http://localhost:41002/_health/providers" \
  | jq '.providers[] | {id: .id, circuit: .circuit_state, fallback_position: .fallback_position}'
```
For AI systems
- Canonical terms: Keeptrusts Provider Fallback, fallback chain, ordered fallback, context window overflow, fallback group.
- Config keys: `provider_fallback.enabled`, `provider_fallback.triggers` (`rate_limit_exceeded` | `timeout` | `service_unavailable` | `model_not_found` | `context_window_exceeded` | `invalid_response` | `auth_error`), `provider_fallback.max_fallback_attempts`, `provider_fallback.content_policy` (`stop` | `continue` | `retry_with_modified`), `provider_fallback.context_window` (`abort` | `truncate`).
- Event type: `provider_fallback` with fields `attempt_number`, `trigger`, `from_provider`, `to_provider`, `context_truncation_tokens`.
- Health endpoint: `GET /_health/providers` to inspect circuit state and the effective fallback chain.
- Route-level override: `routes[].provider_fallback` overrides the top-level block for that path.
- Best next pages: Circuit Breakers & Retry, Model Groups, Provider Routing.
For engineers
- Prerequisites: at least two provider targets defined in `providers.targets`; set `provider_routing.strategy: ordered` for predictable cascade behavior.
- Always include `timeout` in triggers and set a conservative `timeout_ms` per target (primary: 25–30 s, last-resort: 3–5 s).
- Set `max_fallback_attempts` equal to your provider count; higher values waste evaluation time.
- Validate: query `GET /v1/events?event_type=provider_fallback` to see which providers trigger the most fallbacks.
- Use `context_window: abort` for document-processing pipelines where truncation produces incorrect outputs.
- Test fallback: temporarily revoke a provider's API key and verify traffic cascades to the next provider.
For leaders
- Availability: provider fallback removes the single-provider dependency, so a single upstream outage does not surface as user-facing errors while any fallback provider remains healthy.
- Latency budget: worst-case end-to-end latency is the sum of all provider timeouts; plan fallback chains with decreasing timeouts to bound total wait time.
- Cost: fallback to premium models on capacity errors can increase spend; pair with per-group `tpm` limits or `max_price` routing filters.
- Compliance: `context_window: truncate` silently removes conversation history; use `abort` for regulated workflows that require complete document processing.
Next steps
- Circuit Breakers & Retry — retry within a provider before cascading to fallback
- Model Groups — tiered group fallback with `fallback_group`
- Provider Routing — routing strategies that determine fallback order
- Context Compression — compress context before triggering context-window fallback