Provider Fallback

Keeptrusts's provider fallback system automatically retries failed requests against backup providers based on configurable error triggers. When a provider fails in a way that matches your trigger list, the gateway transparently re-submits the request to the next available provider in the fallback chain — without changing any code in your application.

Use this page when

  • You need the exact command, config, API, or integration details for Provider Fallback.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • You want a guided rollout instead of a reference page — in that case, use the linked workflow pages in Next steps.

Fallback is distinct from the Circuit Breaker system. Circuit breakers prevent calls to a persistently degraded provider; fallback handles the individual request that encounters an error and routes it somewhere else before returning a failure to the client.


Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

provider_fallback Configuration

The provider_fallback block is placed at the top level of your policy config and applies globally unless overridden at the route level.

Fields

| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable automatic provider fallback. |
| triggers | list of strings | ["rate_limit_exceeded", "timeout", "service_unavailable"] | Error conditions that activate the fallback chain. |
| max_fallback_attempts | integer | 3 | Maximum number of providers to try before returning the error to the client. |
| content_policy | string | "stop" | What to do when a provider's response is blocked by a content policy rule. |
| context_window | string | "abort" | What to do when the request exceeds the target provider's context window limit. |

triggers Values

| Trigger | When It Fires |
|---|---|
| rate_limit_exceeded | Provider returns HTTP 429 or a rate-limit response body. |
| timeout | No response received within the target's configured timeout_ms. |
| service_unavailable | Provider returns HTTP 503 or 502, or the connection is refused. |
| model_not_found | Provider returns HTTP 404 indicating the requested model is unavailable. |
| context_window_exceeded | Provider returns an error indicating the token count exceeds the model's context window. |
| invalid_response | Provider returns a malformed, non-JSON, or unexpected response body. |
| auth_error | Provider returns HTTP 401 or 403. Useful when rotating API keys and a key expires mid-deployment. |

Triggers are OR-evaluated: fallback fires if any single trigger condition is met.

content_policy Values

| Value | Behaviour |
|---|---|
| stop | If the provider's response is blocked by Keeptrusts's content policy, treat it as a fatal error and return a policy-blocked response to the client. Do not try the next provider. |
| continue | Try the next provider in the fallback chain when a content policy block occurs. Useful when different providers have different refusal tendencies for edge-case prompts. |
| retry_with_modified | Strip the policy-blocked output tokens from the request context and retry the next provider with the sanitized version. |

context_window Values

| Value | Behaviour |
|---|---|
| abort | If the primary provider would exceed its context window, immediately return an error to the client rather than truncating. Use when correctness is critical and lossy truncation is unacceptable. |
| truncate | Truncate the oldest conversation turns (excluding the system prompt) to fit within the next provider's documented context window, then proceed with the fallback attempt. The truncated portion is logged in the event as context_truncation_tokens. |
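Taken together, a fully specified provider_fallback block might look like this sketch (values illustrative; defaults are listed in the Fields table above):

```yaml
provider_fallback:
  enabled: true
  triggers:
    - rate_limit_exceeded
    - timeout
    - service_unavailable
    - context_window_exceeded
  max_fallback_attempts: 3
  content_policy: stop
  context_window: truncate
```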

Ordered Fallback

The most common fallback pattern is ordered: try providers left-to-right and stop at the first success.

Ordered fallback works hand-in-hand with the ordered routing strategy. Set provider_routing.strategy: ordered and provider_fallback.enabled: true together.

Example: Four-Provider Cascade

```yaml
pack:
  name: fallback-configuration-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-primary
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-fallback
      provider: anthropic:claude-3-5-sonnet-20241022
      secret_key_ref:
        env: ANTHROPIC_API_KEY
    - id: groq-fallback
      provider: groq:llama-3.3-70b-versatile
      secret_key_ref:
        env: GROQ_API_KEY
    - id: cerebras-fallback
      provider: cerebras:llama3.1-70b
      secret_key_ref:
        env: CEREBRAS_API_KEY
# top-level fallback block enabling the cascade (see the Fields section)
provider_fallback:
  enabled: true
  triggers:
    - rate_limit_exceeded
    - timeout
    - service_unavailable
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```

In this configuration, a rate_limit_exceeded error on openai-primary causes an immediate retry against anthropic-fallback. If Anthropic also returns a timeout, the gateway falls through to groq-fallback, and then to cerebras-fallback. If all four attempts fail, the original error (from the last attempted provider) is returned to the client.
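The cascade described above can be sketched as a simple loop. This is illustrative pseudologic, not Keeptrusts source code; `call_provider` is a hypothetical stand-in for the gateway's provider client, returning a `(result, error)` pair:

```python
def send_with_fallback(request, targets, call_provider,
                       triggers, max_fallback_attempts=3):
    """Try providers left-to-right; fall through only on trigger errors."""
    last_error = None
    for target in targets[:max_fallback_attempts]:
        result, error = call_provider(target, request)
        if error is None:
            return result                 # first success wins
        last_error = error
        if error not in triggers:         # non-trigger errors surface immediately
            break
    raise RuntimeError(f"all attempts failed: {last_error}")
```

Note that an error outside the trigger list stops the cascade immediately, mirroring the behaviour described under Trigger-Specific Fallback.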

Trigger-Specific Fallback

You can restrict fallback to only a subset of error types. For example, to fall back on capacity errors but not on auth errors (forcing an alert instead of silently masking a misconfiguration):

```yaml
provider_fallback:
  enabled: true
  triggers:
    - rate_limit_exceeded
    - timeout
    - service_unavailable
  max_fallback_attempts: 2
```

Omitting auth_error and model_not_found from the trigger list means those errors are immediately surfaced to the client rather than causing a cascade to the next provider.


Context Window Overflow Handling

When a conversation grows long enough to exceed a provider's token limit, the request fails before any tokens are generated. Provider fallback can handle this case by routing to a provider with a larger context window.

Example: Overflow to Higher-Context Model

```yaml
pack:
  name: fallback-configuration-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: gpt-4o-standard
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: gemini-extended
      provider: google:gemini-2.0-flash
      secret_key_ref:
        env: GOOGLE_API_KEY
# fall back when the request overflows the primary model's context window
provider_fallback:
  enabled: true
  triggers:
    - context_window_exceeded
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```

When a request exceeds GPT-4o's 128k-token context window, the context_window_exceeded trigger fires and the request is automatically re-routed to gemini-extended, whose 1M-token context window can accommodate it.

context_window: truncate Behavior

When context_window: truncate is set and a fallback target also has a smaller context than the request, Keeptrusts truncates the conversation history before forwarding. The truncation strategy is:

  1. The system prompt is never truncated. It is always preserved in full.
  2. Tool call results and tool definitions are preserved.
  3. The oldest user/assistant turn pairs are removed first, from the beginning of the conversation, until the message list fits within the target's context window.
  4. The number of tokens removed is logged in the event under the context_truncation_tokens field.

If truncation would remove more than 50% of the original conversation, Keeptrusts emits a context_truncation_warning event and proceeds. If truncation would remove the most recent user message itself, the request is aborted rather than producing a semantically empty request.
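The four-step truncation order can be sketched as follows. This is a simplified illustration, not the gateway's implementation: `fits` stands in for the target's token-budget check, and tool-message handling is omitted for brevity:

```python
def truncate_history(messages, fits):
    """Keep the system prompt; drop the oldest non-system messages
    until `fits(messages)` is satisfied. Returns (kept, removed_count).
    """
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    removed = 0
    # never remove the most recent message -- the gateway aborts instead
    while not fits(system + turns) and len(turns) > 1:
        turns.pop(0)       # oldest turn goes first
        removed += 1
    return system + turns, removed
```

The `removed` count corresponds to what the gateway reports (in tokens) as context_truncation_tokens.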


Fallback + model_groups

Model groups allow you to define named logical groups of providers. Fallback integrates with model groups through the fallback_group field on a ModelGroup entry.

When a request to a model group exhausts all members of the group, Keeptrusts falls through to the fallback_group — a named model group that acts as the next-tier cascade.

Example: Tiered Group Fallback

```yaml
model_groups:
  - id: premium-tier
    members:
      - openai-gpt4o
      - anthropic-sonnet
    routing: round_robin
    fallback_group: standard-tier

  - id: standard-tier
    members:
      - groq-llama3
      - cerebras-llama3
    routing: round_robin
    fallback_group: free-tier

  - id: free-tier
    members:
      - openrouter-mistral
    routing: ordered

provider_fallback:
  enabled: true
  triggers:
    - rate_limit_exceeded
    - timeout
    - service_unavailable
  max_fallback_attempts: 6
```

In this configuration:

  1. Requests are distributed across premium-tier (GPT-4o and Claude Sonnet) by round robin.
  2. If all premium-tier providers are unavailable, the request cascades to standard-tier (Groq and Cerebras).
  3. If all standard-tier providers are unavailable, the request cascades to free-tier (Mistral via OpenRouter).

This pattern is particularly useful in production systems where you want to gracefully degrade service quality rather than return errors.
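The tier-to-tier cascade amounts to flattening the fallback_group chain into one ordered provider list. A minimal sketch of that resolution (illustrative only, not gateway code):

```python
def resolve_cascade(groups, start):
    """Flatten a chain of model groups into one ordered provider list."""
    order, seen, gid = [], set(), start
    while gid and gid not in seen:        # `seen` guards against cycles
        seen.add(gid)
        group = groups[gid]
        order.extend(group["members"])
        gid = group.get("fallback_group")
    return order
```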


Observability

Every fallback attempt produces a structured event in the Keeptrusts event log. These events allow you to monitor fallback frequency, identify which providers are triggering the most fallbacks, and set alerts when fallback rates exceed expected thresholds.

Fallback Event Fields

| Field | Description |
|---|---|
| event_type | "provider_fallback" |
| attempt_number | Which attempt this was (1 = primary, 2 = first fallback, etc.). |
| trigger | The error type that caused the fallback (e.g. "rate_limit_exceeded"). |
| from_provider | The provider that triggered the fallback. |
| to_provider | The provider being tried on this attempt. |
| original_error | Full error message or status code from the failing provider. |
| context_truncation_tokens | Number of tokens removed if context_window: truncate was applied. |
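Assembled from the fields above, a single fallback event might look like this (all values illustrative):

```json
{
  "event_type": "provider_fallback",
  "attempt_number": 2,
  "trigger": "rate_limit_exceeded",
  "from_provider": "openai-primary",
  "to_provider": "anthropic-fallback",
  "original_error": "HTTP 429: rate limit reached",
  "context_truncation_tokens": 0
}
```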

Querying Fallback Events

```shell
# Count fallbacks by trigger type over the last 24 hours
curl -s "https://api.keeptrusts.com/v1/events?event_type=provider_fallback&limit=5000" \
  -H "Authorization: Bearer $KEEPTRUSTS_API_TOKEN" \
  | jq 'group_by(.trigger) | map({trigger: .[0].trigger, count: length}) | sort_by(-.count)'

# Identify which providers trigger the most fallbacks
curl -s "https://api.keeptrusts.com/v1/events?event_type=provider_fallback&limit=5000" \
  -H "Authorization: Bearer $KEEPTRUSTS_API_TOKEN" \
  | jq 'group_by(.from_provider) | map({provider: .[0].from_provider, fallbacks: length})'
```

Console Monitoring

In the Keeptrusts console, navigate to Events and filter by event_type: provider_fallback. The events graph on the Dashboard also charts fallback event frequency as a separate series alongside blocked and allowed events, giving you at-a-glance visibility into provider reliability trends.


Best Practices

  1. Include timeout in triggers. Timeouts are the most common cause of unexpected fallback chains. Without this trigger, a slow provider blocks the chain until the full timeout expires rather than cascading promptly. Always pair the timeout trigger with a conservative timeout_ms value on each target (e.g., 15–30 seconds).

  2. Set max_fallback_attempts no higher than your provider count. If you have three providers and max_fallback_attempts: 10, the extra attempts are wasted — there are only three providers to try. Set it equal to the number of providers in your active chain to avoid surprising behavior.

  3. Avoid auth_error in triggers during initial setup. Including auth_error in triggers can silently mask a misconfigured or expired API key by falling through to the next provider. Leave it out of your trigger list until your key-rotation process is stable; add it later if you want zero-downtime key rotation.

  4. Use context_window: abort for document-processing pipelines. When your application sends structured documents with precise length contracts (e.g., a legal review pipeline that must process the entire document), truncation would produce incorrect outputs. Set context_window: abort so the application receives an explicit error and can handle oversized documents in its own logic.

  5. Monitor context_truncation_tokens in events. Frequent large truncations indicate that your primary model's context window is too small for your typical conversation length. The truncation warnings give you data to make a model-upgrade decision before your users notice degraded response quality.

  6. Combine fallback with circuit breakers at each provider. Fallback handles the individual request that fails; circuit breakers handle sustained degradation of a provider. Using both together means: a single timeout triggers fallback (fast path), while five consecutive timeouts open the circuit (preventing further slow calls to a degraded provider). See Circuit Breakers & Retry for the companion configuration.


Route-Level Fallback Overrides

Fallback behavior can be scoped to individual routes. For example, you might want aggressive fallback on a user-facing chat route but none on an internal batch-processing route where correct provider attribution matters for billing:

```yaml
pack:
  name: fallback-configuration-routes-5
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
routes:
  - path: "/v1/chat/completions"
    provider_fallback:
      enabled: true
      triggers:
        - rate_limit_exceeded
        - timeout
        - service_unavailable
      max_fallback_attempts: 3
  - path: "/v1/batch"
    provider_fallback:
      enabled: false
```

Route-level provider_fallback overrides the top-level block for that path on a per-field basis: any field set at the route level wins, and any field omitted at the route level falls back to the top-level value.
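The per-field inheritance described above behaves like a shallow dictionary merge. A minimal sketch (illustrative, not the gateway's config loader):

```python
def effective_fallback(top_level, route_level=None):
    """Route-level fields win; fields omitted at the route level inherit."""
    merged = dict(top_level)
    merged.update(route_level or {})
    return merged
```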


Fallback with Retry

Fallback and retry are complementary:

  • Retry re-sends the request to the same provider after a brief delay. Good for transient network glitches or brief rate-limit windows.
  • Fallback sends the request to a different provider after the retry budget is exhausted. Good for sustained outages or capacity exhaustion.

Configure them together to get both behaviors:

```yaml
retry:
  enabled: true
  max_attempts: 2
  initial_delay_ms: 500
  strategy: exponential_backoff
  triggers:
    - timeout
    - service_unavailable

provider_fallback:
  enabled: true
  triggers:
    - rate_limit_exceeded
    - timeout
    - service_unavailable
  max_fallback_attempts: 3
```

When both are active, the gateway first retries the same provider up to retry.max_attempts times; only after exhausting retries does it trigger the fallback chain to the next provider.
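The nesting of the two mechanisms — an inner retry loop per provider, an outer fallback loop across providers — can be sketched as follows. Again this is illustrative pseudologic; `call_provider` is a hypothetical stand-in returning a `(result, error)` pair:

```python
import time

def send_with_retry_and_fallback(request, targets, call_provider,
                                 max_attempts=2, initial_delay_ms=500,
                                 max_fallback_attempts=3):
    """Retry the same provider first; only then cascade to the next one."""
    last_error = None
    for target in targets[:max_fallback_attempts]:
        delay_ms = initial_delay_ms
        for _ in range(max_attempts):
            result, error = call_provider(target, request)
            if error is None:
                return result
            last_error = error
            time.sleep(delay_ms / 1000)   # exponential backoff between retries
            delay_ms *= 2
        # retry budget exhausted for this provider -> fall through
    raise RuntimeError(f"all providers failed: {last_error}")
```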


Fallback Latency Budget

Each provider attempt consumes wall-clock time. With three providers each having a 30-second timeout, a worst-case chain can take 90 seconds before returning an error to the client. Plan your fallback chain with this in mind:

| Provider Position | Recommended timeout_ms | Rationale |
|---|---|---|
| Primary | 25,000–30,000 | Full quality; latency acceptable |
| First fallback | 15,000–20,000 | Slightly faster provider or tier |
| Second fallback | 5,000–10,000 | Fast/cheap provider; lower quality acceptable |
| Last resort | 3,000–5,000 | Minimal latency; best-effort only |

Tightening timeouts on fallback providers reflects a key insight: by the time you reach the second or third fallback, time is already scarce. A user waiting >30 seconds will have abandoned the request regardless.


Fallback Health Probe Integration

When a provider is known to be unavailable (open circuit), Keeptrusts skips it in the fallback chain rather than wasting a fallback attempt on it. The effective fallback chain at any moment is: all providers whose circuits are currently closed or half-open, tried in the order defined in providers.targets.

This means a fallback chain of four providers where the second is circuit-opened behaves as a three-provider chain: primary → third provider → fourth provider. The skipped provider continues its cooldown independently and re-enters the chain when its circuit closes.
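Building the effective chain amounts to filtering the configured target order by circuit state. A minimal sketch (illustrative only; `circuit_state` maps target IDs to "closed", "half_open", or "open"):

```python
def effective_chain(targets, circuit_state):
    """Skip providers whose circuit is open; closed and half-open stay in."""
    return [t for t in targets
            if circuit_state.get(t, "closed") != "open"]
```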

To inspect the current circuit state and effective fallback chain, query the gateway health endpoint:

```shell
curl -s "http://localhost:41002/_health/providers" \
  | jq '.providers[] | {id: .id, circuit: .circuit_state, fallback_position: .fallback_position}'
```

For AI systems

  • Canonical terms: Keeptrusts Provider Fallback, fallback chain, ordered fallback, context window overflow, fallback group.
  • Config keys: provider_fallback.enabled, provider_fallback.triggers (rate_limit_exceeded | timeout | service_unavailable | model_not_found | context_window_exceeded | invalid_response | auth_error), provider_fallback.max_fallback_attempts, provider_fallback.content_policy (stop | continue | retry_with_modified), provider_fallback.context_window (abort | truncate).
  • Event type: provider_fallback with fields attempt_number, trigger, from_provider, to_provider, context_truncation_tokens.
  • Health endpoint: GET /_health/providers to inspect circuit state and effective fallback chain.
  • Route-level override: routes[].provider_fallback overrides the top-level block per field for that path; fields omitted at the route level inherit the top-level values.
  • Best next pages: Circuit Breakers & Retry, Model Groups, Provider Routing.

For engineers

  • Prerequisites: at least two provider targets defined in providers.targets; set provider_routing.strategy: ordered for predictable cascade behavior.
  • Always include timeout in triggers and set conservative timeout_ms per target (primary: 25–30s, last-resort: 3–5s).
  • Set max_fallback_attempts equal to your provider count — higher values waste evaluation time.
  • Validate: query GET /v1/events?event_type=provider_fallback to see which providers trigger the most fallbacks.
  • Use context_window: abort for document-processing pipelines where truncation produces incorrect outputs.
  • Test fallback: temporarily revoke a provider’s API key and verify traffic cascades to the next provider.

For leaders

  • Availability: provider fallback eliminates single-provider dependency — users never see errors from a single upstream outage.
  • Latency budget: worst-case end-to-end latency = sum of all provider timeouts; plan fallback chains with decreasing timeouts to bound total wait time.
  • Cost: fallback to premium models on capacity errors can increase spend; pair with per-group tpm limits or max_price routing filters.
  • Compliance: context_window: truncate silently removes conversation history — use abort for regulated workflows that require complete document processing.

Next steps