Gateway Failover Without Cache Loss

When a physical gateway fails and traffic shifts to another gateway in the same agent gateway group, all org-shared cache entries remain fully accessible. There is zero cache penalty during failover.

Use this page when

  • You need to understand why gateway failover preserves cache (gateway_id excluded from cache key).
  • You are verifying cache continuity after a failover, rolling restart, region failover, or scaling event.
  • You want to understand the difference between org-shared cache and physical_gateway_private_cache_only during failover.

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Why Failover Preserves Cache

The fundamental reason is simple: physical gateway ID is not part of the org-shared cache key. When a replacement gateway computes a cache key for an incoming request, it produces the same key that the original gateway would have produced.

The replacement gateway:

  1. Receives the redirected request.
  2. Computes the cache key using org_id, agent_id, agent_gateway_group_id, codebase_id, policy_digest, model_id, entitlement_tags, and request_content_hash.
  3. Queries the control-plane metadata store — finds the existing entry.
  4. Retrieves the payload from the shared backend (Redis/Valkey, S3/GCS).
  5. Returns the cached response to the caller.

No special failover logic is required. Cache sharing is the default behavior for all gateways in the same group.
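The key derivation in step 2 can be sketched as follows. This is a toy illustration only: the field names come from the list above, but the concatenation order and hashing scheme are assumptions, not the gateway's actual implementation. The point is that no gateway_id appears anywhere in the input, so any member of the group derives the identical key.

```shell
#!/usr/bin/env bash
# Illustrative sketch: derive an org-shared cache key by hashing the
# documented fields. Field order and hash choice are assumptions.
cache_key() {
  local org_id="$1" agent_id="$2" group_id="$3" codebase_id="$4"
  local policy_digest="$5" model_id="$6" entitlement_tags="$7" content_hash="$8"
  printf '%s|%s|%s|%s|%s|%s|%s|%s' \
    "$org_id" "$agent_id" "$group_id" "$codebase_id" \
    "$policy_digest" "$model_id" "$entitlement_tags" "$content_hash" \
    | sha256sum | cut -d' ' -f1
}

# Two different physical gateways computing the key for the same request
# (all identifier values below are placeholders):
key_on_primary=$(cache_key org_1 agent_9 agg_def456 cb_2 pd_3 gpt-4o tag_a req_abc)
key_on_fallback=$(cache_key org_1 agent_9 agg_def456 cb_2 pd_3 gpt-4o tag_a req_abc)
[ "$key_on_primary" = "$key_on_fallback" ] && echo "same key on both gateways"
```

Because the key is a pure function of request and organization fields, "failover" from the cache's point of view is just another gateway asking for the same key.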

Failover Scenarios

Primary Gateway Failure

When the primary gateway becomes unavailable:

Step | What Happens | Cache Impact
---- | ------------ | ------------
1 | Health checks detect primary is down | None
2 | Traffic routes to fallback gateway | None
3 | Fallback computes same cache keys | None
4 | Fallback reads from same control-plane metadata | Full access to all cached entries
5 | Fallback serves cached responses | Zero penalty

Rolling Restart

During a rolling restart of gateway instances:

Step | What Happens | Cache Impact
---- | ------------ | ------------
1 | Gateway A shuts down for update | Gateway A's L1 is lost
2 | Gateways B and C continue serving | Shared cache unaffected
3 | Gateway A restarts with empty L1 | Shared cache still accessible
4 | Gateway A rebuilds L1 from shared tier | Gradual L1 warm-up only

The only impact is that Gateway A's L1 local cache is lost. All org-shared cache entries remain available through the control-plane metadata store.
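The recovery path can be modeled with a two-tier lookup: an in-process associative array standing in for L1, and a file standing in for the shared tier. Everything here is a toy model for illustration, not the gateway's actual code.

```shell
#!/usr/bin/env bash
# Toy model of the two cache tiers across a rolling restart.
declare -A l1                      # L1: per-instance process memory
shared_tier=$(mktemp)              # shared tier: survives instance restarts

cache_put() { l1["$1"]="$2"; echo "$1=$2" >> "$shared_tier"; }

cache_get() {
  local key="$1"
  if [ -n "${l1[$key]+x}" ]; then
    echo "l1-hit:${l1[$key]}"
  elif grep -q "^$key=" "$shared_tier"; then
    local val; val=$(grep "^$key=" "$shared_tier" | tail -1 | cut -d= -f2)
    l1["$key"]="$val"              # warm L1 back up from the shared tier
    echo "shared-hit:$val"
  else
    echo "miss"
  fi
}

cache_put k1 cached-response
unset l1; declare -A l1            # restart: L1 wiped, shared tier intact
cache_get k1                       # served from shared tier, rewarms L1
cache_get k1                       # now served from L1 again
```

After the simulated restart, the first read is a shared-tier hit that repopulates L1; subsequent reads are L1 hits again, mirroring the gradual warm-up described above.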

Region Failover

When an entire region becomes unavailable and traffic shifts to a gateway in a different region:

Step | What Happens | Cache Impact
---- | ------------ | ------------
1 | Region A goes offline | Region A gateways' L1 caches lost
2 | DNS/load balancer routes to Region B | None on shared tier
3 | Region B gateway computes same cache keys | Same keys, same results
4 | Region B reads from control-plane metadata | Full access (if metadata store is reachable)
5 | Region B retrieves from shared backend | Latency may differ, data is identical

Cross-region failover works because the control-plane metadata and shared payload backends are not colocated with individual gateways.

Scaling Event

When new gateway instances are added to handle load:

Step | What Happens | Cache Impact
---- | ------------ | ------------
1 | New gateway instance joins the group | Starts with empty L1
2 | First request computes cache key | Checks shared tier
3 | Shared tier has entries from existing gateways | Immediate cache hits
4 | New instance builds L1 over time | Performance improves gradually

New gateways benefit immediately from all previously cached entries in the group.

What Is Lost During Failover

Only L1 local memory on the failed gateway is lost. L1 is a performance optimization — not an authoritative store.

Cache Tier | Lost on Failover? | Recovery
---------- | ----------------- | --------
L1 (local memory) | Yes — specific to the failed instance | Rebuilt automatically from shared tier
Control-plane metadata | No | Persisted in PostgreSQL
Shared payload (Redis) | No | Persisted in Redis/Valkey cluster
Shared payload (S3/GCS) | No | Persisted in object storage
Vector index (Qdrant) | No | Persisted in Qdrant cluster

How This Differs From Private Edge Cache

The physical_gateway_private_cache_only setting creates a fundamentally different caching mode:

Behavior | Org-Shared Cache (default) | Private Edge Cache
-------- | -------------------------- | ------------------
gateway_id in cache key | No | Yes
Cross-gateway sharing | Yes | No
Failover preserves cache | Yes | No
Use case | Shared agent workloads | Isolated sensitive workloads

When physical_gateway_private_cache_only: true is set, the physical gateway_id is included in cache keys. This means:

  • Cache entries are scoped to that specific gateway instance.
  • Failover to another gateway results in cache misses.
  • The replacement gateway must rebuild its cache from scratch.

Use private edge cache only when regulatory or security requirements mandate per-gateway cache isolation.
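The contrast with the org-shared key can be sketched the same way. Here the gateway_id is folded into the hash input (again, the hashing scheme and field layout are illustrative assumptions), so two gateways derive different keys for the identical request and a failover target cannot find the original entries.

```shell
#!/usr/bin/env bash
# Illustrative sketch of private edge cache keying: gateway_id is part
# of the hash input, so each instance derives its own key space.
private_cache_key() {
  local gateway_id="$1" request_fields="$2"
  printf '%s|%s' "$gateway_id" "$request_fields" | sha256sum | cut -d' ' -f1
}

# Same request fields, two different physical gateways (placeholder IDs):
key_gw_a=$(private_cache_key gw_primary  "org_1|agent_9|req_abc")
key_gw_b=$(private_cache_key gw_fallback "org_1|agent_9|req_abc")
[ "$key_gw_a" != "$key_gw_b" ] && echo "different keys: failover misses"
```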

Operational Verification

After a failover event, verify that cache sharing is working correctly:

Step 1: Confirm Group Membership

curl -s https://api.keeptrusts.com/v1/agent-gateway-groups/agg_def456 \
-H "Authorization: Bearer $API_TOKEN" | jq '.members'

Verify the replacement gateway appears in the member list with its assigned role.

Step 2: Check Cache Hit Rate

Send a request that you know was previously cached:

curl -s https://your-gateway-endpoint/v1/chat/completions \
-H "Authorization: Bearer $ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Previously cached prompt"}]}' \
-D - 2>/dev/null | grep -i "x-keeptrusts-cache"

Expected response header: x-keeptrusts-cache: hit

Step 3: Verify Serving Gateway

Check that the response was served by the replacement gateway:

curl -s https://your-gateway-endpoint/v1/chat/completions \
-H "Authorization: Bearer $ACCESS_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Previously cached prompt"}]}' \
-D - 2>/dev/null | grep -i "x-keeptrusts-gateway-id"

The x-keeptrusts-gateway-id header shows which physical gateway served the response. It should differ from the original gateway that created the cache entry.

Step 4: Review Cache Metrics

In the console, navigate to Observability → Cache Metrics and verify:

  • Cross-gateway hit ratio is above zero.
  • No spike in cache misses beyond what L1 loss accounts for.
  • Shared tier hit rate remains stable through the failover window.

Step 5: Audit Log

Check the governance audit log for cache-related events:

curl -s "https://api.keeptrusts.com/v1/events?type=cache_hit&gateway_id=gw_replacement" \
-H "Authorization: Bearer $API_TOKEN" | jq '.items | length'

Confirm the replacement gateway is generating cache hit events against entries originally created by the failed gateway.

Best Practices for Failover Readiness

  1. Always use agent gateway groups for agents served by multiple gateways.
  2. Assign fallback roles to standby gateways so routing is preconfigured.
  3. Monitor L1 hit ratios — a sudden drop indicates a gateway restart or failover.
  4. Keep shared backends highly available — Redis cluster, multi-AZ S3, replicated Qdrant.
  5. Test failover regularly — kill a primary gateway and verify cache continuity.
  6. Avoid physical_gateway_private_cache_only unless isolation is a hard requirement.
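Practices 1 and 2 might look like the following group definition. This is a hypothetical sketch: the schema and field names below are illustrative assumptions, not documented configuration.

```yaml
# Hypothetical agent gateway group definition. All field names are
# illustrative assumptions, not a documented schema.
agent_gateway_group:
  id: agg_def456
  members:
    - gateway_id: gw_primary
      role: primary
    - gateway_id: gw_fallback
      role: fallback        # preconfigured routing target on failover
  cache:
    physical_gateway_private_cache_only: false   # default: org-shared cache
```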

Next steps

For AI systems

  • Canonical terms: Keeptrusts, gateway failover, cache preservation, zero cache penalty, L1 loss, agent gateway group, private edge cache, rolling restart, region failover, scaling event.
  • Feature/config names: physical_gateway_private_cache_only, x-keeptrusts-cache: hit, x-keeptrusts-gateway-id, org-shared cache key (excludes gateway_id), L1 local memory, control-plane metadata, Redis/Valkey cluster, S3/GCS, Qdrant.
  • Best next pages: Cache Sharing Across Gateways, Configuring Gateway Groups, Distributed Cache Architecture.

For engineers

  • After a failover, verify with: send a previously cached request to the replacement gateway and check x-keeptrusts-cache: hit in the response headers.
  • Confirm the serving gateway differs from the original by checking the x-keeptrusts-gateway-id header.
  • Only L1 (process memory) is lost on failover. Control-plane metadata, Redis payloads, S3 objects, and Qdrant vectors are unaffected.
  • If using physical_gateway_private_cache_only: true, expect full cache miss on failover — the replacement must rebuild from scratch. Use this only when isolation is a hard requirement.
  • Best practices: always use groups for multi-gateway agents, assign fallback roles, monitor L1 hit ratios for restart detection, test failover regularly.

For leaders

  • Zero cache penalty during failover means high-availability deployments have no hidden cost impact when gateways fail.
  • Rolling restarts (deployments, updates) only lose per-instance L1 — shared cache (95%+ of value) is fully preserved.
  • The distinction between org-shared and private-edge cache is a risk/compliance decision: shared gives cost efficiency, private gives per-gateway isolation at the cost of failover cache loss.
  • Recommendation: default to org-shared; reserve physical_gateway_private_cache_only for regulatory-mandated isolation only.