Provider Fallback
Keeptrusts's provider fallback system automatically retries failed requests against backup providers based on configurable error triggers. When a provider fails in a way that matches your trigger list, the gateway transparently re-submits the request to the next available provider in the fallback chain — without changing any code in your application.
Use this page when
- You need the exact command, config, API, or integration details for Provider Fallback.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Fallback is distinct from the Circuit Breaker system. Circuit breakers prevent calls to a persistently degraded provider; fallback handles the individual request that encounters an error and routes it somewhere else before returning a failure to the client.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
ProviderFallback Configuration
The provider_fallback block is placed at the top level of your policy config and applies globally unless overridden at the route level.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `false` | Enable automatic provider fallback. |
| `triggers` | list of strings | `["rate_limit_exceeded", "timeout", "service_unavailable"]` | Error conditions that activate the fallback chain. |
| `max_fallback_attempts` | integer | `3` | Maximum number of providers to try before returning the error to the client. |
| `content_policy` | string | `"stop"` | What to do when a provider's response is blocked by a content policy rule. |
| `context_window` | string | `"abort"` | What to do when the request exceeds the target provider's context window limit. |
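Taken together, a fully spelled-out block looks like the following sketch. The values shown are the documented defaults, except `enabled`, which defaults to `false` and is switched on here for illustration:

```yaml
provider_fallback:
  enabled: true
  triggers:
    - rate_limit_exceeded
    - timeout
    - service_unavailable
  max_fallback_attempts: 3
  content_policy: stop
  context_window: abort
```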
triggers Values
| Trigger | When It Fires |
|---|---|
| `rate_limit_exceeded` | Provider returns HTTP 429 or a rate-limit response body. |
| `timeout` | No response received within the target's configured `timeout_ms`. |
| `service_unavailable` | Provider returns HTTP 503 or 502, or the connection is refused. |
| `model_not_found` | Provider returns HTTP 404 indicating the requested model is unavailable. |
| `context_window_exceeded` | Provider returns an error indicating the token count exceeds the model's context window. |
| `invalid_response` | Provider returns a malformed, non-JSON, or unexpected response body. |
| `auth_error` | Provider returns HTTP 401 or 403. Useful when rotating API keys and a key expires mid-deployment. |
Triggers are OR-evaluated: fallback fires if any single trigger condition is met.
content_policy Values
| Value | Behaviour |
|---|---|
| `stop` | If the provider's response is blocked by Keeptrusts's content policy, treat it as a fatal error and return a policy-blocked response to the client. Do not try the next provider. |
| `continue` | Try the next provider in the fallback chain when a content policy block occurs. Useful when different providers have different refusal tendencies for edge-case prompts. |
| `retry_with_modified` | Strip the policy-blocked output tokens from the request context and retry the next provider with the sanitized version. |
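For example, to cascade past a content-policy block instead of failing the request, a minimal sketch (other fields take their documented defaults):

```yaml
provider_fallback:
  enabled: true
  content_policy: continue   # try the next provider when a response is policy-blocked
```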
context_window Values
| Value | Behaviour |
|---|---|
| `abort` | If the primary provider would exceed its context window, immediately return an error to the client rather than truncating. Use when correctness is critical and lossy truncation is unacceptable. |
| `truncate` | Truncate the oldest conversation turns (excluding the system prompt) to fit within the next provider's documented context window, then proceed with the fallback attempt. The truncated portion is logged in the event as `context_truncation_tokens`. |
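A sketch of a policy that truncates on overflow; pairing the `context_window_exceeded` trigger with `truncate` is an assumption based on the trigger table above:

```yaml
provider_fallback:
  enabled: true
  triggers:
    - context_window_exceeded
  context_window: truncate   # drop oldest turns to fit the next provider's window
```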
Ordered Fallback
The most common fallback pattern is ordered: try providers left-to-right and stop at the first success.
Ordered fallback works hand-in-hand with the ordered routing strategy. Set provider_routing.strategy: ordered and provider_fallback.enabled: true together.
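As a sketch, the two settings named above sit side by side in the policy config:

```yaml
provider_routing:
  strategy: ordered      # try targets left-to-right
provider_fallback:
  enabled: true          # cascade to the next target when a trigger fires
```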
Example: Four-Provider Cascade
```yaml
pack:
  name: fallback-configuration-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-primary
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-fallback
      provider: anthropic:claude-3-5-sonnet-20241022
      secret_key_ref:
        env: ANTHROPIC_API_KEY
    - id: groq-fallback
      provider: groq:llama-3.3-70b-versatile
      secret_key_ref:
        env: GROQ_API_KEY
    - id: cerebras-fallback
      provider: cerebras:llama3.1-70b
      secret_key_ref:
        env: CEREBRAS_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
In this configuration, a rate_limit_exceeded error on openai-primary causes an immediate retry against anthropic-fallback. If Anthropic also returns a timeout, the gateway falls through to groq-fallback, and then to cerebras-fallback. If all four attempts fail, the original error (from the last attempted provider) is returned to the client.
Trigger-Specific Fallback
You can restrict fallback to only a subset of error types. For example, to fall back on capacity errors but not on auth errors (forcing an alert instead of silently masking a misconfiguration):
```yaml
provider_fallback:
  enabled: true
  triggers:
    - rate_limit_exceeded
    - timeout
    - service_unavailable
  max_fallback_attempts: 2
```
Omitting auth_error and model_not_found from the trigger list means those errors are immediately surfaced to the client rather than causing a cascade to the next provider.
Context Window Overflow Handling
When a conversation grows long enough to exceed a provider's token limit, the request fails before any tokens are generated. Provider fallback can handle this case by routing to a provider with a larger context window.
Example: Overflow to Higher-Context Model
```yaml
pack:
  name: fallback-configuration-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: gpt-4o-standard
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: gemini-extended
      provider: google:gemini-2.0-flash
      secret_key_ref:
        env: GOOGLE_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
When a request exceeds GPT-4o's 128k context window, it is automatically re-routed to `gemini-extended`, whose 1M-token context window can absorb it. Note that overflow only activates the cascade if `context_window_exceeded` is included in your trigger list; it is not part of the default set.
context_window: truncate Behavior
When context_window: truncate is set and a fallback target also has a smaller context than the request, Keeptrusts truncates the conversation history before forwarding. The truncation strategy is:
- The system prompt is never truncated. It is always preserved in full.
- Tool call results and tool definitions are preserved.
- The oldest user/assistant turn pairs are removed first, from the beginning of the conversation, until the message list fits within the target's context window.
- The number of tokens removed is logged in the event under the `context_truncation_tokens` field.
If truncation would remove more than 50% of the original conversation, Keeptrusts emits a context_truncation_warning event and proceeds. If truncation would remove the most recent user message itself, the request is aborted rather than producing a semantically empty request.
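The removal order described above can be modeled with a short sketch. This is illustrative only, not Keeptrusts source code; the `fits` predicate stands in for the gateway's token-count check against the target's context window:

```python
def truncate_history(messages, fits):
    """Drop the oldest user/assistant pairs until `fits(messages)` is true.

    messages: list of {"role": ..., "content": ...} dicts in conversation order.
    fits: predicate returning True when a message list fits the target window.
    """
    # The system prompt is never truncated; separate it from the turns.
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    # Remove the oldest pair of turns at a time until the list fits.
    while not fits(system + turns) and len(turns) > 2:
        turns = turns[2:]
    if not fits(system + turns):
        # Truncating further would cut into the most recent exchange: abort
        # instead of forwarding a semantically empty request.
        raise ValueError("request aborted: cannot fit most recent user message")
    return system + turns
```

Under this model, a six-message conversation squeezed into a four-message budget keeps the system prompt and the newest exchange while the oldest pair is dropped.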
Fallback + model_groups
Model groups allow you to define named logical groups of providers. Fallback integrates with model groups through the fallback_group field on a ModelGroup entry.
When a request to a model group exhausts all members of the group, Keeptrusts falls through to the fallback_group — a named model group that acts as the next-tier cascade.
Example: Tiered Group Fallback
```yaml
model_groups:
  - id: premium-tier
    members:
      - openai-gpt4o
      - anthropic-sonnet
    routing: round_robin
    fallback_group: standard-tier
  - id: standard-tier
    members:
      - groq-llama3
      - cerebras-llama3
    routing: round_robin
    fallback_group: free-tier
  - id: free-tier
    members:
      - openrouter-mistral
    routing: ordered
provider_fallback:
  enabled: true
  triggers:
    - rate_limit_exceeded
    - timeout
    - service_unavailable
  max_fallback_attempts: 6
```
In this configuration:
- Requests are distributed across `premium-tier` (GPT-4o and Claude Sonnet) by round robin.
- If all premium-tier providers are unavailable, the request cascades to `standard-tier` (Groq and Cerebras).
- If all standard-tier providers are unavailable, the request cascades to `free-tier` (Mistral via OpenRouter).
This pattern is particularly useful in production systems where you want to gracefully degrade service quality rather than return errors.
Observability
Every fallback attempt produces a structured event in the Keeptrusts event log. These events allow you to monitor fallback frequency, identify which providers are triggering the most fallbacks, and set alerts when fallback rates exceed expected thresholds.
Fallback Event Fields
| Field | Description |
|---|---|
| `event_type` | `"provider_fallback"` |
| `attempt_number` | Which attempt this was (1 = primary, 2 = first fallback, etc.). |
| `trigger` | The error type that caused the fallback (e.g. `"rate_limit_exceeded"`). |
| `from_provider` | The provider that triggered the fallback. |
| `to_provider` | The provider being tried on this attempt. |
| `original_error` | Full error message or status code from the failing provider. |
| `context_truncation_tokens` | Number of tokens removed if `context_window: truncate` was applied. |
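Assembled from the fields above, a representative event might look like the following; the concrete values are illustrative, not captured output:

```json
{
  "event_type": "provider_fallback",
  "attempt_number": 2,
  "trigger": "rate_limit_exceeded",
  "from_provider": "openai-primary",
  "to_provider": "anthropic-fallback",
  "original_error": "HTTP 429",
  "context_truncation_tokens": 0
}
```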
Querying Fallback Events
```sh
# Count fallbacks by trigger type over the last 24 hours
curl -s "https://api.keeptrusts.com/v1/events?event_type=provider_fallback&limit=5000" \
  -H "Authorization: Bearer $KEEPTRUSTS_API_TOKEN" \
  | jq 'group_by(.trigger) | map({trigger: .[0].trigger, count: length}) | sort_by(-.count)'

# Identify which providers trigger the most fallbacks
curl -s "https://api.keeptrusts.com/v1/events?event_type=provider_fallback&limit=5000" \
  -H "Authorization: Bearer $KEEPTRUSTS_API_TOKEN" \
  | jq 'group_by(.from_provider) | map({provider: .[0].from_provider, fallbacks: length})'
```
Console Monitoring
In the Keeptrusts console, navigate to Events and filter by event_type: provider_fallback. The events graph on the Dashboard also charts fallback event frequency as a separate series alongside blocked and allowed events, giving you at-a-glance visibility into provider reliability trends.
Best Practices
- **Include `timeout` in triggers.** Timeouts are the most common cause of unexpected fallback chains. Without this trigger, a request to a slow provider waits out the full timeout and then fails without cascading. Always pair the `timeout` trigger with a conservative `timeout_ms` value on each target (e.g., 15–30 seconds).
- **Set `max_fallback_attempts` no higher than your provider count.** If you have three providers and `max_fallback_attempts: 10`, the extra attempts are wasted; there are only three providers to try. Set it equal to the number of providers in your active chain to avoid surprising behavior.
- **Avoid `auth_error` in triggers during initial setup.** Including `auth_error` in triggers can silently mask a misconfigured or expired API key by falling through to the next provider. Leave it out of your trigger list until your key-rotation process is stable; add it later if you want zero-downtime key rotation.
- **Use `context_window: abort` for document-processing pipelines.** When your application sends structured documents with precise length contracts (e.g., a legal review pipeline that must process the entire document), truncation would produce incorrect outputs. Set `context_window: abort` so the application receives an explicit error and can handle oversized documents in its own logic.
- **Monitor `context_truncation_tokens` in events.** Frequent large truncations indicate that your primary model's context window is too small for your typical conversation length. The truncation warnings give you data to make a model-upgrade decision before your users notice degraded response quality.
- **Combine fallback with circuit breakers at each provider.** Fallback handles the individual request that fails; circuit breakers handle sustained degradation of a provider. Using both together means a single timeout triggers fallback (the fast path), while five consecutive timeouts open the circuit (preventing further slow calls to a degraded provider). See Circuit Breakers & Retry for the companion configuration.
Route-Level Fallback Overrides
Fallback behavior can be scoped to individual routes. For example, you might want aggressive fallback on a user-facing chat route but none on an internal batch-processing route where correct provider attribution matters for billing:
```yaml
pack:
  name: fallback-configuration-routes-5
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
routes:
  - path: "/v1/chat/completions"
    provider_fallback:
      enabled: true
      triggers:
        - rate_limit_exceeded
        - timeout
        - service_unavailable
      max_fallback_attempts: 3
  - path: "/v1/batch"
    provider_fallback:
      enabled: false
```
A route-level `provider_fallback` block overrides the top-level block for that path; any field omitted at the route level inherits the top-level value.
Fallback with Retry
Fallback and retry are complementary:
- Retry re-sends the request to the same provider after a brief delay. Good for transient network glitches or brief rate-limit windows.
- Fallback sends the request to a different provider after the retry budget is exhausted. Good for sustained outages or capacity exhaustion.
Configure them together to get both behaviors:
```yaml
retry:
  enabled: true
  max_attempts: 2
  initial_delay_ms: 500
  strategy: exponential_backoff
  triggers:
    - timeout
    - service_unavailable
provider_fallback:
  enabled: true
  triggers:
    - rate_limit_exceeded
    - timeout
    - service_unavailable
  max_fallback_attempts: 3
```
When both are active, the gateway first retries the same provider up to retry.max_attempts times; only after exhausting retries does it trigger the fallback chain to the next provider.
Fallback Latency Budget
Each provider attempt consumes wall-clock time. With three providers each having a 30-second timeout, a worst-case chain can take 90 seconds before returning an error to the client. Plan your fallback chain with this in mind:
| Provider Position | Recommended timeout_ms | Rationale |
|---|---|---|
| Primary | 25–30 seconds | Full quality; latency acceptable |
| First fallback | 15–20 seconds | Slightly faster provider or tier |
| Second fallback | 5–10 seconds | Fast/cheap provider; lower quality ok |
| Last-resort | 3–5 seconds | Minimal latency; best-effort only |
Tightening timeouts on fallback providers reflects a key insight: by the time you reach the second or third fallback, time is already scarce. A user waiting >30 seconds will have abandoned the request regardless.
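The table above can be expressed as per-target `timeout_ms` settings. This sketch reuses the target ids from the four-provider cascade example; placing `timeout_ms` on each target follows the per-target timeout described in the triggers table:

```yaml
providers:
  targets:
    - id: openai-primary
      timeout_ms: 30000    # primary: full quality, latency acceptable
    - id: anthropic-fallback
      timeout_ms: 20000    # first fallback: slightly faster tier
    - id: groq-fallback
      timeout_ms: 10000    # second fallback: fast/cheap provider
    - id: cerebras-fallback
      timeout_ms: 5000     # last resort: best-effort only
```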
Fallback Health Probe Integration
When a provider is known to be unavailable (open circuit), Keeptrusts skips it in the fallback chain rather than wasting a fallback attempt on it. The effective fallback chain at any moment is: all providers whose circuits are currently closed or half-open, tried in the order defined in providers.targets.
This means a fallback chain of four providers where the second is circuit-opened behaves as a three-provider chain: primary → third provider → fourth provider. The skipped provider continues its cooldown independently and re-enters the chain when its circuit closes.
To inspect the current circuit state and effective fallback chain, query the gateway health endpoint:
```sh
curl -s "http://localhost:41002/_health/providers" \
  | jq '.providers[] | {id: .id, circuit: .circuit_state, fallback_position: .fallback_position}'
```
For AI systems
- Canonical terms: Keeptrusts Provider Fallback, fallback chain, ordered fallback, context window overflow, fallback group.
- Config keys: `provider_fallback.enabled`, `provider_fallback.triggers` (`rate_limit_exceeded` | `timeout` | `service_unavailable` | `model_not_found` | `context_window_exceeded` | `invalid_response` | `auth_error`), `provider_fallback.max_fallback_attempts`, `provider_fallback.content_policy` (`stop` | `continue` | `retry_with_modified`), `provider_fallback.context_window` (`abort` | `truncate`).
- Event type: `provider_fallback` with fields `attempt_number`, `trigger`, `from_provider`, `to_provider`, `context_truncation_tokens`.
- Health endpoint: `GET /_health/providers` to inspect circuit state and the effective fallback chain.
- Route-level override: `routes[].provider_fallback` overrides the top-level block for that path.
- Best next pages: Circuit Breakers & Retry, Model Groups, Provider Routing.
For engineers
- Prerequisites: at least two provider targets defined in `providers.targets`; set `provider_routing.strategy: ordered` for predictable cascade behavior.
- Always include `timeout` in triggers and set a conservative `timeout_ms` per target (primary: 25–30 s, last-resort: 3–5 s).
- Set `max_fallback_attempts` equal to your provider count; higher values waste evaluation time.
- Validate: query `GET /v1/events?event_type=provider_fallback` to see which providers trigger the most fallbacks.
- Use `context_window: abort` for document-processing pipelines where truncation produces incorrect outputs.
- Test fallback: temporarily revoke a provider's API key and verify traffic cascades to the next provider.
For leaders
- Availability: provider fallback removes the single-provider dependency, so a single upstream outage does not surface as user-facing errors while any fallback provider remains healthy.
- Latency budget: worst-case end-to-end latency is the sum of all provider timeouts; plan fallback chains with decreasing timeouts to bound total wait time.
- Cost: fallback to premium models on capacity errors can increase spend; pair with per-group `tpm` limits or `max_price` routing filters.
- Compliance: `context_window: truncate` silently removes conversation history; use `abort` for regulated workflows that require complete document processing.
Next steps
- Circuit Breakers & Retry — retry within a provider before cascading to fallback
- Model Groups — tiered group fallback with `fallback_group`
- Provider Routing — routing strategies that determine fallback order
- Context Compression — compress context before triggering context-window fallback