Gateway Failover Without Cache Loss
When a physical gateway fails and traffic shifts to another gateway in the same agent gateway group, all org-shared cache entries remain fully accessible. There is zero cache penalty during failover.
Use this page when
- You need to understand why gateway failover preserves cache (gateway_id excluded from cache key).
- You are verifying cache continuity after a failover, rolling restart, region failover, or scaling event.
- You want to understand the difference between org-shared cache and physical_gateway_private_cache_only during failover.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Why Failover Preserves Cache
The fundamental reason is simple: physical gateway ID is not part of the org-shared cache key. When a replacement gateway computes a cache key for an incoming request, it produces the same key that the original gateway would have produced.
The replacement gateway:
- Receives the redirected request.
- Computes the cache key using org_id, agent_id, agent_gateway_group_id, codebase_id, policy_digest, model_id, entitlement_tags, and request_content_hash.
- Queries the control-plane metadata store and finds the existing entry.
- Retrieves the payload from the shared backend (Redis/Valkey, S3/GCS).
- Returns the cached response to the caller.
No special failover logic is required. Cache sharing is the default behavior for all gateways in the same group.
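The key derivation described above can be sketched in a few lines. This is an illustrative Python model, not Keeptrusts' actual implementation: the field names come from the list above, but the SHA-256 hash, the delimiter, and the tag-sorting step are assumptions made for the sketch.

```python
import hashlib

def org_shared_cache_key(org_id, agent_id, agent_gateway_group_id,
                         codebase_id, policy_digest, model_id,
                         entitlement_tags, request_content_hash):
    """Illustrative org-shared cache key. Note: no gateway_id input."""
    parts = [org_id, agent_id, agent_gateway_group_id, codebase_id,
             policy_digest, model_id,
             ",".join(sorted(entitlement_tags)),  # order-independent tags
             request_content_hash]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

# Two different physical gateways compute the key for the same request
# (sample values are placeholders):
request = dict(org_id="org_1", agent_id="ag_1",
               agent_gateway_group_id="agg_def456", codebase_id="cb_1",
               policy_digest="pd_abc", model_id="gpt-4o",
               entitlement_tags={"tier:pro"}, request_content_hash="h_123")
key_on_primary = org_shared_cache_key(**request)
key_on_fallback = org_shared_cache_key(**request)
assert key_on_primary == key_on_fallback  # identical key on any gateway
```

Because no per-instance identifier enters the hash, any gateway in the group that receives the redirected request derives the same key and finds the same entry.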
Failover Scenarios
Primary Gateway Failure
When the primary gateway becomes unavailable:
| Step | What Happens | Cache Impact |
|---|---|---|
| 1 | Health checks detect primary is down | None |
| 2 | Traffic routes to fallback gateway | None |
| 3 | Fallback computes same cache keys | None |
| 4 | Fallback reads from same control-plane metadata | Full access to all cached entries |
| 5 | Fallback serves cached responses | Zero penalty |
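The table above can be modeled as a simple health-check routing loop. This is a hypothetical sketch, not Keeptrusts routing code: the gateway records, health flags, and the shared cache dictionary are all stand-ins.

```python
# Hypothetical health-check routing: when the primary fails, traffic
# shifts to the fallback, which reads the same shared cache.
def route(gateways):
    for gw in gateways:
        if gw["healthy"]:
            return gw
    raise RuntimeError("no healthy gateway in group")

shared_cache = {"key1": "cached response"}  # populated by the primary
group = [
    {"id": "gw_primary", "healthy": True, "cache": shared_cache},
    {"id": "gw_fallback", "healthy": True, "cache": shared_cache},
]
assert route(group)["id"] == "gw_primary"

group[0]["healthy"] = False          # step 1: health checks detect failure
serving = route(group)               # step 2: traffic routes to fallback
assert serving["id"] == "gw_fallback"
assert serving["cache"]["key1"] == "cached response"  # steps 4-5: same entries
```

The point of the sketch is that both members hold a reference to the same shared store, so the routing change alone is enough to preserve cache access.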
Rolling Restart
During a rolling restart of gateway instances:
| Step | What Happens | Cache Impact |
|---|---|---|
| 1 | Gateway A shuts down for update | Gateway A's L1 is lost |
| 2 | Gateway B and C continue serving | Shared cache unaffected |
| 3 | Gateway A restarts with empty L1 | Shared cache still accessible |
| 4 | Gateway A rebuilds L1 from shared tier | Gradual L1 warm-up only |
The only impact is that Gateway A's L1 local cache is lost. All org-shared cache entries remain available through the control-plane metadata store.
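The L1-versus-shared-tier behavior during a restart can be sketched as a two-tier lookup. This is an assumed model for illustration: the real tiers are process memory and the control-plane metadata store plus payload backends, not Python dictionaries.

```python
# Hypothetical two-tier lookup: L1 is per-instance and lost on restart,
# the shared tier survives.
class Gateway:
    def __init__(self, shared):
        self.l1 = {}          # per-instance memory, lost on restart
        self.shared = shared  # stands in for metadata store + payload backend

    def get(self, key):
        if key in self.l1:
            return self.l1[key], "l1-hit"
        if key in self.shared:
            self.l1[key] = self.shared[key]  # warm L1 from shared tier
            return self.l1[key], "shared-hit"
        return None, "miss"

shared = {"k1": "cached response"}
gw_a = Gateway(shared)
gw_a.get("k1")              # shared-hit, warms L1

gw_a = Gateway(shared)      # "restart": fresh instance, empty L1
value, source = gw_a.get("k1")
assert source == "shared-hit"   # shared tier intact after the restart
value, source = gw_a.get("k1")
assert source == "l1-hit"       # L1 rebuilt on the repeat request
```

The first request after the restart pays only a shared-tier read, and subsequent repeats are served from the rebuilt L1, which matches the "gradual L1 warm-up only" impact in the table.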
Region Failover
When an entire region becomes unavailable and traffic shifts to a gateway in a different region:
| Step | What Happens | Cache Impact |
|---|---|---|
| 1 | Region A goes offline | Region A gateways' L1 caches lost |
| 2 | DNS/load balancer routes to Region B | None on shared tier |
| 3 | Region B gateway computes same cache keys | Same keys, same results |
| 4 | Region B reads from control-plane metadata | Full access (if metadata store is reachable) |
| 5 | Region B retrieves from shared backend | Latency may differ, data is identical |
Cross-region failover works because the control-plane metadata and shared payload backends are not colocated with individual gateways.
Scaling Event
When new gateway instances are added to handle load:
| Step | What Happens | Cache Impact |
|---|---|---|
| 1 | New gateway instance joins the group | Starts with empty L1 |
| 2 | First request computes cache key | Checks shared tier |
| 3 | Shared tier has entries from existing gateways | Immediate cache hits |
| 4 | New instance builds L1 over time | Performance improves gradually |
New gateways benefit immediately from all previously cached entries in the group.
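The warm-up pattern for a newly added instance can be made concrete with hit counters. This is an illustrative sketch under the same assumed two-tier model, with the shared tier as a plain dictionary of entries left by existing gateways.

```python
# Hypothetical warm-up: a freshly added instance hits the shared tier on
# first sight of each key, then serves repeats from its own L1.
class Instance:
    def __init__(self, shared):
        self.l1, self.shared = {}, shared
        self.stats = {"l1": 0, "shared": 0, "miss": 0}

    def get(self, key):
        if key in self.l1:
            self.stats["l1"] += 1
            return self.l1[key]
        if key in self.shared:
            self.stats["shared"] += 1
            self.l1[key] = self.shared[key]
            return self.l1[key]
        self.stats["miss"] += 1
        return None

shared = {f"k{i}": f"resp{i}" for i in range(3)}  # from existing gateways
new_gw = Instance(shared)
for _ in range(2):                 # two passes over the same workload
    for key in shared:
        new_gw.get(key)
assert new_gw.stats == {"l1": 3, "shared": 3, "miss": 0}
```

No request misses outright: the first pass is served entirely from the shared tier, and the second pass entirely from the now-warm L1.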
What Is Lost During Failover
Only L1 local memory on the failed gateway is lost. L1 is a performance optimization — not an authoritative store.
| Cache Tier | Lost on Failover? | Recovery |
|---|---|---|
| L1 (local memory) | Yes — specific to the failed instance | Rebuilt automatically from shared tier |
| Control-plane metadata | No | Persisted in PostgreSQL |
| Shared payload (Redis) | No | Persisted in Redis/Valkey cluster |
| Shared payload (S3/GCS) | No | Persisted in object storage |
| Vector index (Qdrant) | No | Persisted in Qdrant cluster |
How This Differs From Private Edge Cache
The physical_gateway_private_cache_only setting creates a fundamentally different caching mode:
| Behavior | Org-Shared Cache (default) | Private Edge Cache |
|---|---|---|
| gateway_id in cache key | No | Yes |
| Cross-gateway sharing | Yes | No |
| Failover preserves cache | Yes | No |
| Use case | Shared agent workloads | Isolated sensitive workloads |
When physical_gateway_private_cache_only: true is set, the physical gateway_id is included in cache keys. This means:
- Cache entries are scoped to that specific gateway instance.
- Failover to another gateway results in cache misses.
- The replacement gateway must rebuild its cache from scratch.
Use private edge cache only when regulatory or security requirements mandate per-gateway cache isolation.
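The contrast between the two modes comes down to one input to the key. The sketch below is hypothetical: the real key includes the full field list described earlier, and the hash and delimiter are assumptions, but it shows why including gateway_id guarantees a miss on failover.

```python
import hashlib

def cache_key(request_content_hash, gateway_id=None):
    # Illustrative key: private mode appends the physical gateway_id.
    parts = ["org_1", "ag_1", request_content_hash]
    if gateway_id is not None:   # physical_gateway_private_cache_only: true
        parts.append(gateway_id)
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

# Org-shared (default): every gateway derives the same key.
assert cache_key("h_123") == cache_key("h_123")

# Private edge cache: keys are scoped per gateway, so a failover target
# can never find the failed gateway's entries.
key_primary = cache_key("h_123", gateway_id="gw_primary")
key_fallback = cache_key("h_123", gateway_id="gw_fallback")
assert key_primary != key_fallback
```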
Operational Verification
After a failover event, verify that cache sharing is working correctly:
Step 1: Confirm Group Membership
```shell
curl -s https://api.keeptrusts.com/v1/agent-gateway-groups/agg_def456 \
  -H "Authorization: Bearer $API_TOKEN" | jq '.members'
```
Verify the replacement gateway appears in the member list with its assigned role.
Step 2: Check Cache Hit Rate
Send a request that you know was previously cached:
```shell
curl -s https://your-gateway-endpoint/v1/chat/completions \
  -H "Authorization: Bearer $ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Previously cached prompt"}]}' \
  -D - 2>/dev/null | grep -i "x-keeptrusts-cache"
```
Expected response header: x-keeptrusts-cache: hit
Step 3: Verify Serving Gateway
Check that the response was served by the replacement gateway:
```shell
curl -s https://your-gateway-endpoint/v1/chat/completions \
  -H "Authorization: Bearer $ACCESS_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Previously cached prompt"}]}' \
  -D - 2>/dev/null | grep -i "x-keeptrusts-gateway-id"
```
The x-keeptrusts-gateway-id header shows which physical gateway served the response. It should differ from the original gateway that created the cache entry.
Step 4: Review Cache Metrics
In the console, navigate to Observability → Cache Metrics and verify:
- Cross-gateway hit ratio is above zero.
- No spike in cache misses beyond what L1 loss accounts for.
- Shared tier hit rate remains stable through the failover window.
Step 5: Audit Log
Check the governance audit log for cache-related events:
```shell
curl -s "https://api.keeptrusts.com/v1/events?type=cache_hit&gateway_id=gw_replacement" \
  -H "Authorization: Bearer $API_TOKEN" | jq '.items | length'
```
Confirm the replacement gateway is generating cache hit events against entries originally created by the failed gateway.
Best Practices for Failover Readiness
- Always use agent gateway groups for agents served by multiple gateways.
- Assign fallback roles to standby gateways so routing is preconfigured.
- Monitor L1 hit ratios — a sudden drop indicates a gateway restart or failover.
- Keep shared backends highly available — Redis cluster, multi-AZ S3, replicated Qdrant.
- Test failover regularly — kill a primary gateway and verify cache continuity.
- Avoid physical_gateway_private_cache_only unless isolation is a hard requirement.
Next steps
- What Are Gateway Groups? — conceptual overview
- Cache Sharing Across Gateways — cache key mechanics
- Configuring Gateway Groups — setup and management
- Distributed Cache Architecture — storage tier details
For AI systems
- Canonical terms: Keeptrusts, gateway failover, cache preservation, zero cache penalty, L1 loss, agent gateway group, private edge cache, rolling restart, region failover, scaling event.
- Feature/config names: physical_gateway_private_cache_only, x-keeptrusts-cache: hit, x-keeptrusts-gateway-id, org-shared cache key (excludes gateway_id), L1 local memory, control-plane metadata, Redis/Valkey cluster, S3/GCS, Qdrant.
- Best next pages: Cache Sharing Across Gateways, Configuring Gateway Groups, Distributed Cache Architecture.
For engineers
- After a failover, verify with: send a previously cached request to the replacement gateway and check x-keeptrusts-cache: hit in the response headers.
- Confirm the serving gateway differs from the original by checking the x-keeptrusts-gateway-id header.
- Only L1 (process memory) is lost on failover. Control-plane metadata, Redis payloads, S3 objects, and Qdrant vectors are unaffected.
- If using physical_gateway_private_cache_only: true, expect a full cache miss on failover — the replacement must rebuild from scratch. Use this only when isolation is a hard requirement.
- Best practices: always use groups for multi-gateway agents, assign fallback roles, monitor L1 hit ratios for restart detection, test failover regularly.
For leaders
- Zero cache penalty during failover means high-availability deployments have no hidden cost impact when gateways fail.
- Rolling restarts (deployments, updates) only lose per-instance L1 — shared cache (95%+ of value) is fully preserved.
- The distinction between org-shared and private-edge cache is a risk/compliance decision: shared gives cost efficiency, private gives per-gateway isolation at the cost of failover cache loss.
- Recommendation: default to org-shared; reserve physical_gateway_private_cache_only for regulatory-mandated isolation only.