Gateway Failover Without Cache Loss

When a physical gateway fails and traffic shifts to another gateway in the same agent gateway group, all org-shared cache entries remain fully accessible. There is zero cache penalty during failover.

Use this page when

  • You need to understand why gateway failover preserves cache (gateway_id excluded from cache key).
  • You are verifying cache continuity after a failover, rolling restart, region failover, or scaling event.
  • You want to understand the difference between org-shared cache and physical_gateway_private_cache_only during failover.

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Why Failover Preserves Cache

The fundamental reason is simple: physical gateway ID is not part of the org-shared cache key. When a replacement gateway computes a cache key for an incoming request, it produces the same key that the original gateway would have produced.

The replacement gateway:

  1. Receives the redirected request.
  2. Computes the cache key using org_id, agent_id, agent_gateway_group_id, codebase_id, policy_digest, model_id, entitlement_tags, and request_content_hash.
  3. Queries the control-plane metadata store — finds the existing entry.
  4. Retrieves the payload from the shared backend (Redis/Valkey, S3/GCS).
  5. Returns the cached response to the caller.

No special failover logic is required. Cache sharing is the default behavior for all gateways in the same group.
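The key derivation in step 2 can be sketched as follows. This is a toy illustration only: the field names come from the list above, but the concatenation order and hashing scheme are assumptions, not the gateway's actual implementation. The point is that no gateway_id appears anywhere in the input, so any member of the group derives the identical key.

```shell
#!/usr/bin/env bash
# Illustrative sketch: derive an org-shared cache key by hashing the
# documented fields. Field order and hash choice are assumptions.
cache_key() {
  local org_id="$1" agent_id="$2" group_id="$3" codebase_id="$4"
  local policy_digest="$5" model_id="$6" entitlement_tags="$7" content_hash="$8"
  printf '%s|%s|%s|%s|%s|%s|%s|%s' \
    "$org_id" "$agent_id" "$group_id" "$codebase_id" \
    "$policy_digest" "$model_id" "$entitlement_tags" "$content_hash" \
    | sha256sum | cut -d' ' -f1
}

# Two different physical gateways computing the key for the same request
# (all identifier values below are placeholders):
key_on_primary=$(cache_key org_1 agent_9 agg_def456 cb_2 pd_3 gpt-4o tag_a req_abc)
key_on_fallback=$(cache_key org_1 agent_9 agg_def456 cb_2 pd_3 gpt-4o tag_a req_abc)
[ "$key_on_primary" = "$key_on_fallback" ] && echo "same key on both gateways"
```

Because the key is a pure function of request and organization fields, "failover" from the cache's point of view is just another gateway asking for the same key.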

Failover Scenarios

Primary Gateway Failure

When the primary gateway becomes unavailable:

Step | What Happens | Cache Impact
---- | ------------ | ------------
1 | Health checks detect primary is down | None
2 | Traffic routes to fallback gateway | None
3 | Fallback computes same cache keys | None
4 | Fallback reads from same control-plane metadata | Full access to all cached entries
5 | Fallback serves cached responses | Zero penalty

Rolling Restart

During a rolling restart of gateway instances:

Step | What Happens | Cache Impact
---- | ------------ | ------------
1 | Gateway A shuts down for update | Gateway A's L1 is lost
2 | Gateways B and C continue serving | Shared cache unaffected
3 | Gateway A restarts with empty L1 | Shared cache still accessible
4 | Gateway A rebuilds L1 from shared tier | Gradual L1 warm-up only

The only impact is that Gateway A's L1 local cache is lost. All org-shared cache entries remain available through the control-plane metadata store.
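The recovery path can be modeled with a two-tier lookup: an in-process associative array standing in for L1, and a file standing in for the shared tier. Everything here is a toy model for illustration, not the gateway's actual code.

```shell
#!/usr/bin/env bash
# Toy model of the two cache tiers across a rolling restart.
declare -A l1                      # L1: per-instance process memory
shared_tier=$(mktemp)              # shared tier: survives instance restarts

cache_put() { l1["$1"]="$2"; echo "$1=$2" >> "$shared_tier"; }

cache_get() {
  local key="$1"
  if [ -n "${l1[$key]+x}" ]; then
    echo "l1-hit:${l1[$key]}"
  elif grep -q "^$key=" "$shared_tier"; then
    local val; val=$(grep "^$key=" "$shared_tier" | tail -1 | cut -d= -f2)
    l1["$key"]="$val"              # warm L1 back up from the shared tier
    echo "shared-hit:$val"
  else
    echo "miss"
  fi
}

cache_put k1 cached-response
unset l1; declare -A l1            # restart: L1 wiped, shared tier intact
cache_get k1                       # served from shared tier, rewarms L1
cache_get k1                       # now served from L1 again
```

After the simulated restart, the first read is a shared-tier hit that repopulates L1; subsequent reads are L1 hits again, mirroring the gradual warm-up described above.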

Region Failover

When an entire region becomes unavailable and traffic shifts to a gateway in a different region:

Step | What Happens | Cache Impact
---- | ------------ | ------------
1 | Region A goes offline | Region A gateways' L1 caches lost
2 | DNS/load balancer routes to Region B | None on shared tier
3 | Region B gateway computes same cache keys | Same keys, same results
4 | Region B reads from control-plane metadata | Full access (if metadata store is reachable)
5 | Region B retrieves from shared backend | Latency may differ, data is identical

Cross-region failover works because the control-plane metadata and shared payload backends are not colocated with individual gateways.

Scaling Event

When new gateway instances are added to handle load:

Step | What Happens | Cache Impact
---- | ------------ | ------------
1 | New gateway instance joins the group | Starts with empty L1
2 | First request computes cache key | Checks shared tier
3 | Shared tier has entries from existing gateways | Immediate cache hits
4 | New instance builds L1 over time | Performance improves gradually

New gateways benefit immediately from all previously cached entries in the group.

What Is Lost During Failover

Only L1 local memory on the failed gateway is lost. L1 is a performance optimization — not an authoritative store.

Cache Tier | Lost on Failover? | Recovery
---------- | ----------------- | --------
L1 (local memory) | Yes — specific to the failed instance | Rebuilt automatically from shared tier
Control-plane metadata | No | Persisted in PostgreSQL
Shared payload (Redis) | No | Persisted in Redis/Valkey cluster
Shared payload (S3/GCS) | No | Persisted in object storage
Vector index (Qdrant) | No | Persisted in Qdrant cluster

How This Differs From Private Edge Cache

The physical_gateway_private_cache_only setting creates a fundamentally different caching mode:

Behavior | Org-Shared Cache (default) | Private Edge Cache
-------- | -------------------------- | ------------------
gateway_id in cache key | No | Yes
Cross-gateway sharing | Yes | No
Failover preserves cache | Yes | No
Use case | Shared agent workloads | Isolated sensitive workloads

When physical_gateway_private_cache_only: true is set, the physical gateway_id is included in cache keys. This means:

  • Cache entries are scoped to that specific gateway instance.
  • Failover to another gateway results in cache misses.
  • The replacement gateway must rebuild its cache from scratch.

Use private edge cache only when regulatory or security requirements mandate per-gateway cache isolation.
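The contrast with the org-shared key can be sketched the same way. Here the gateway_id is folded into the hash input (again, the hashing scheme and field layout are illustrative assumptions), so two gateways derive different keys for the identical request and a failover target cannot find the original entries.

```shell
#!/usr/bin/env bash
# Illustrative sketch of private edge cache keying: gateway_id is part
# of the hash input, so each instance derives its own key space.
private_cache_key() {
  local gateway_id="$1" request_fields="$2"
  printf '%s|%s' "$gateway_id" "$request_fields" | sha256sum | cut -d' ' -f1
}

# Same request fields, two different physical gateways (placeholder IDs):
key_gw_a=$(private_cache_key gw_primary  "org_1|agent_9|req_abc")
key_gw_b=$(private_cache_key gw_fallback "org_1|agent_9|req_abc")
[ "$key_gw_a" != "$key_gw_b" ] && echo "different keys: failover misses"
```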

Operational Verification

After a failover event, verify that cache sharing is working correctly:

Step 1: Confirm Group Membership

curl -s https://api.keeptrusts.com/v1/agent-gateway-groups/agg_def456 \
-H "Authorization: Bearer $API_TOKEN" | jq '.members'

Verify the replacement gateway appears in the member list with its assigned role.

Step 2: Check Cache Hit Rate

Send a request that you know was previously cached:

curl -s https://your-gateway-endpoint/v1/chat/completions \
-H "Authorization: Bearer $ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Previously cached prompt"}]}' \
-D - 2>/dev/null | grep -i "x-keeptrusts-cache"

Expected response header: x-keeptrusts-cache: hit

Step 3: Verify Serving Gateway

Check that the response was served by the replacement gateway:

curl -s https://your-gateway-endpoint/v1/chat/completions \
-H "Authorization: Bearer $ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Previously cached prompt"}]}' \
-D - 2>/dev/null | grep -i "x-keeptrusts-gateway-id"

The x-keeptrusts-gateway-id header shows which physical gateway served the response. It should differ from the original gateway that created the cache entry.

Step 4: Review Cache Metrics

In the console, navigate to Observability → Cache Metrics and verify:

  • Cross-gateway hit ratio is above zero.
  • No spike in cache misses beyond what L1 loss accounts for.
  • Shared tier hit rate remains stable through the failover window.

Step 5: Audit Log

Check the governance audit log for cache-related events:

curl -s "https://api.keeptrusts.com/v1/events?type=cache_hit&gateway_id=gw_replacement" \
-H "Authorization: Bearer $API_TOKEN" | jq '.items | length'

Confirm the replacement gateway is generating cache hit events against entries originally created by the failed gateway.

Best Practices for Failover Readiness

  1. Always use agent gateway groups for agents served by multiple gateways.
  2. Assign fallback roles to standby gateways so routing is preconfigured.
  3. Monitor L1 hit ratios — a sudden drop indicates a gateway restart or failover.
  4. Keep shared backends highly available — Redis cluster, multi-AZ S3, replicated Qdrant.
  5. Test failover regularly — kill a primary gateway and verify cache continuity.
  6. Avoid physical_gateway_private_cache_only unless isolation is a hard requirement.
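Practices 1 and 2 might look like the following group definition. This is a hypothetical sketch: the schema and field names below are illustrative assumptions, not documented configuration.

```yaml
# Hypothetical agent gateway group definition. All field names are
# illustrative assumptions, not a documented schema.
agent_gateway_group:
  id: agg_def456
  members:
    - gateway_id: gw_primary
      role: primary
    - gateway_id: gw_fallback
      role: fallback        # preconfigured routing target on failover
  cache:
    physical_gateway_private_cache_only: false   # default: org-shared cache
```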

Next steps

For AI systems

  • Canonical terms: Keeptrusts, gateway failover, cache preservation, zero cache penalty, L1 loss, agent gateway group, private edge cache, rolling restart, region failover, scaling event.
  • Feature/config names: physical_gateway_private_cache_only, x-keeptrusts-cache: hit, x-keeptrusts-gateway-id, org-shared cache key (excludes gateway_id), L1 local memory, control-plane metadata, Redis/Valkey cluster, S3/GCS, Qdrant.
  • Best next pages: Cache Sharing Across Gateways, Configuring Gateway Groups, Distributed Cache Architecture.

For engineers

  • After a failover, verify with: send a previously cached request to the replacement gateway and check x-keeptrusts-cache: hit in the response headers.
  • Confirm the serving gateway differs from the original by checking the x-keeptrusts-gateway-id header.
  • Only L1 (process memory) is lost on failover. Control-plane metadata, Redis payloads, S3 objects, and Qdrant vectors are unaffected.
  • If using physical_gateway_private_cache_only: true, expect full cache miss on failover — the replacement must rebuild from scratch. Use this only when isolation is a hard requirement.
  • Best practices: always use groups for multi-gateway agents, assign fallback roles, monitor L1 hit ratios for restart detection, test failover regularly.

For leaders

  • Zero cache penalty during failover means high-availability deployments have no hidden cost impact when gateways fail.
  • Rolling restarts (deployments, updates) only lose per-instance L1 — shared cache (95%+ of value) is fully preserved.
  • The distinction between org-shared and private-edge cache is a risk/compliance decision: shared gives cost efficiency, private gives per-gateway isolation at the cost of failover cache loss.
  • Recommendation: default to org-shared; reserve physical_gateway_private_cache_only for regulatory-mandated isolation only.