Distributed Cache Architecture: L1 to Control Plane
Agent gateway group cache sharing relies on a three-tier architecture. No gateway ever fetches cache entries directly from another gateway. Instead, all cache state flows through centralized, auditable layers.
Use this page when
- You need to understand the three-tier cache architecture (L1 memory, control-plane metadata in PostgreSQL, shared payload/vector backends).
- You are planning infrastructure for Redis/Valkey, S3/GCS, or Qdrant to support agent gateway group caching.
- You want to understand cache read/write flows and why gateways never communicate cache data directly.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Architecture Overview
┌─────────────────────────────────────────────────────────┐
│                   Agent Gateway Group                   │
│                                                         │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐           │
│  │Gateway A │    │Gateway B │    │Gateway C │           │
│  │   (L1)   │    │   (L1)   │    │   (L1)   │           │
│  └────┬─────┘    └────┬─────┘    └────┬─────┘           │
│       │               │               │                 │
└───────┼───────────────┼───────────────┼─────────────────┘
        │               │               │
        ▼               ▼               ▼
┌─────────────────────────────────────────────────────────┐
│           Control-Plane Metadata (PostgreSQL)           │
│      Authoritative org-shared cache metadata store      │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│            Shared Payload / Vector Backends             │
│         Redis/Valkey │ S3/GCS │ Qdrant (vector)         │
└─────────────────────────────────────────────────────────┘
Tier 1: Gateway L1 Local Memory
Each gateway maintains a local in-memory cache (L1). This is the fastest tier — responses served from L1 have sub-millisecond overhead.
Characteristics
| Property | Value |
|---|---|
| Storage location | Gateway process memory |
| Latency | Sub-millisecond |
| Scope | Single gateway instance |
| Persistence | None — lost on restart |
| Size | Configurable, typically 256 MB–2 GB |
| Eviction | LRU (Least Recently Used) |
Purpose
L1 exists as a performance optimization only. It reduces repeated round-trips to the control plane for frequently accessed entries. L1 is not authoritative — the control-plane metadata store is the source of truth.
What L1 Stores
- Full response payloads for recent cache hits
- Negative cache markers (confirmed misses) to avoid repeated lookups
- TTL-bounded entries that expire independently of the shared tier
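A minimal sketch of this tier, assuming entry count (rather than bytes) as the size budget; the class and method names are illustrative, not the gateway's actual implementation:

```python
import time
from collections import OrderedDict

class L1Cache:
    """In-process L1 sketch: LRU eviction plus per-entry TTL expiry."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # key -> (expires_at, payload)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, payload = item
        if now >= expires_at:            # TTL expired: drop the entry
            del self._entries[key]
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return payload

    def put(self, key, payload, ttl_seconds, now=None):
        now = time.monotonic() if now is None else now
        self._entries[key] = (now + ttl_seconds, payload)
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
```

A real L1 would also store negative markers as a distinct sentinel payload so confirmed misses are cached, too.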
L1 and Group Sharing
L1 is private to each gateway instance. Gateway A's L1 does not directly populate Gateway B's L1. However, when Gateway B queries the shared tier and gets a hit (originally written by Gateway A), Gateway B populates its own L1 for subsequent requests.
Tier 2: Control-Plane Metadata (PostgreSQL)
The control-plane metadata store is the authoritative record of all org-shared cache entries. It lives in the platform's PostgreSQL database alongside other governance data.
Characteristics
| Property | Value |
|---|---|
| Storage location | PostgreSQL (control-plane database) |
| Latency | 1–10 ms (network dependent) |
| Scope | Org-wide, group-scoped |
| Persistence | Durable, backed up |
| Size | Metadata only — not full payloads |
| Eviction | TTL-based + retention policies |
What the Metadata Store Contains
- Cache entry keys and their associated gates (org, codebase, policy, model, entitlement)
- Pointers to payload locations in the shared backend
- Entry creation timestamps and TTLs
- Audit fields: which gateway created the entry, which agent group owns it
- Validity flags: whether the entry has been invalidated
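The entry shape above can be sketched as follows; the field names and key-derivation scheme are assumptions for illustration, not the real schema:

```python
import hashlib
from dataclasses import dataclass

def cache_key(org_id: str, codebase_id: str, policy_version: str,
              model: str, entitlement: str, prompt: str) -> str:
    """Every gate participates in the key, so entries can never match
    across orgs, codebases, policies, models, or entitlements."""
    material = "|".join([org_id, codebase_id, policy_version, model, entitlement, prompt])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

@dataclass
class CacheMetadata:
    """One control-plane row: metadata and a payload pointer, never the payload itself."""
    key: str
    org_id: str
    backend: str            # e.g. "redis" or "s3"
    payload_pointer: str    # location in the shared backend
    created_by_gateway: str # audit field
    ttl_seconds: int
    invalidated: bool = False
```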
Why Metadata Lives in the Control Plane
Centralizing metadata in PostgreSQL provides:
- Auditability — every cache entry is traceable through the governance audit log.
- Topology independence — gateways can be added, removed, or replaced without metadata loss.
- Consistent security enforcement — entitlement and residency checks happen at the metadata layer.
- Transactional guarantees — cache invalidation is atomic and consistent.
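A sketch of the transactional invalidation guarantee, using an in-memory SQLite database as a stand-in for PostgreSQL (the table and column names are assumed, not the real schema):

```python
import sqlite3

# In-memory SQLite stands in for the control-plane database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cache_metadata (
        key         TEXT PRIMARY KEY,
        org_id      TEXT NOT NULL,
        pointer     TEXT NOT NULL,
        invalidated INTEGER NOT NULL DEFAULT 0
    )""")
conn.executemany(
    "INSERT INTO cache_metadata (key, org_id, pointer) VALUES (?, ?, ?)",
    [("k1", "org-1", "redis/p1"), ("k2", "org-1", "s3/p2"), ("k3", "org-2", "redis/p3")])
conn.commit()

def invalidate_org_entries(conn, org_id):
    """One transaction flips every entry for the org; concurrent readers see
    either all entries valid or all invalidated, never a partial state."""
    with conn:  # commits on success, rolls back on any error
        conn.execute(
            "UPDATE cache_metadata SET invalidated = 1 WHERE org_id = ?", (org_id,))
```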
Tier 3: Shared Payload and Vector Backends
Full response payloads and vector embeddings live in dedicated shared storage backends. The control-plane metadata store holds pointers to these locations.
Supported Backends
| Backend | Use Case | Stored Data |
|---|---|---|
| Redis / Valkey | Hot payload cache | Serialized LLM responses, small payloads |
| S3 / GCS | Large or cold payloads | Large responses, batch results, artifacts |
| Qdrant | Semantic vector cache | Embedding vectors for similarity-based cache lookup |
Redis / Valkey
You configure Redis or Valkey as the hot payload store for frequently accessed cache entries. Gateways retrieve payloads by reference (the pointer stored in control-plane metadata).
- Low latency (1–5 ms)
- Suitable for responses under 1 MB
- TTL-based eviction aligned with cache entry TTLs
- Cluster mode supported for horizontal scaling
S3 / GCS
For larger payloads or entries that you want to retain longer, configure an S3-compatible or GCS backend. Gateways retrieve payloads via presigned URLs generated by the control plane.
- Higher latency (50–200 ms)
- Cost-effective for large or infrequently accessed entries
- Lifecycle policies manage storage costs
- Data residency controls via bucket location
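A sketch of how a gateway might route a payload between the hot and cold backends, taking the 1 MB sizing guidance above as the cutoff (the function name and threshold are illustrative, not configuration the platform exposes):

```python
HOT_PAYLOAD_LIMIT = 1 * 1024 * 1024  # 1 MB, per the Redis/Valkey sizing note above

def choose_backend(payload: bytes, retain_long_term: bool = False) -> str:
    """Route small hot payloads to Redis/Valkey; route large or
    long-retention payloads to S3/GCS."""
    if retain_long_term or len(payload) >= HOT_PAYLOAD_LIMIT:
        return "s3"
    return "redis"
```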
Qdrant (Vector Cache)
When semantic caching is enabled, vector embeddings are stored in Qdrant. The control plane coordinates similarity searches across the vector index.
- Enables cache hits for semantically equivalent (not just identical) requests
- Configurable similarity threshold
- Org-scoped collections ensure isolation
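A pure-Python sketch of the threshold-gated similarity lookup that Qdrant provides at scale (the 0.92 default threshold here is an arbitrary illustration):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def semantic_lookup(query_vec, index, threshold=0.92):
    """index: list of (entry_key, embedding) pairs. Returns the key of the
    closest entry that clears the similarity threshold, else None (miss)."""
    best_key, best_score = None, threshold
    for key, vec in index:
        score = cosine_similarity(query_vec, vec)
        if score >= best_score:
            best_key, best_score = key, score
    return best_key
```

Raising the threshold trades hit rate for precision: near-duplicates still hit, merely related requests fall through to the LLM.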
No Direct Gateway-to-Gateway Cache Fetches
A deliberate architectural constraint: gateways never communicate cache data directly to each other. All cache access flows through the control plane and shared backends.
Why This Constraint Exists
| Reason | Explanation |
|---|---|
| Centralized auditability | Every cache access is logged at the control plane |
| Topology independence | Gateway count can change without reconfiguring mesh connections |
| Simpler security model | One enforcement point for entitlement and residency checks |
| No split-brain risk | Metadata consistency is managed by PostgreSQL, not distributed consensus |
| Easier operations | No gateway discovery, no peer authentication, no gossip protocols |
What This Means in Practice
- Adding a new gateway to a group requires zero configuration on existing gateways.
- Removing a gateway from a group has no impact on other gateways' cache access.
- Network partitions between gateways do not affect cache consistency.
- You can replace all gateways in a group simultaneously without losing shared cache.
Cache Read Flow
When a gateway receives a request and checks the shared cache:
- L1 check — the gateway checks local memory and returns immediately on a hit.
- Metadata lookup — the gateway queries the control-plane metadata store with the computed cache key.
- Gate validation — the control plane verifies org, entitlement, residency, and policy gates.
- Payload retrieval — if metadata exists and gates pass, the gateway fetches the payload from the referenced shared backend.
- L1 population — the gateway stores the result in L1 for future requests.
- Response — the cached response is returned to the caller.
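The steps above can be sketched end to end; plain dicts stand in for the L1, metadata store, and payload backends, and only the org gate is shown:

```python
def read_from_cache(key, l1, metadata_store, backends, caller_org):
    # 1. L1 check: fastest path, private to this gateway
    payload = l1.get(key)
    if payload is not None:
        return payload
    # 2. Metadata lookup against the control plane (the source of truth)
    meta = metadata_store.get(key)
    if meta is None or meta["invalidated"]:
        return None  # full miss: forward the request to the LLM
    # 3. Gate validation (org gate shown; policy/entitlement/residency elided)
    if meta["org_id"] != caller_org:
        return None
    # 4. Payload retrieval by pointer from the shared backend
    payload = backends[meta["backend"]].get(meta["pointer"])
    if payload is None:
        return None
    # 5. L1 population for subsequent requests on this gateway
    l1[key] = payload
    return payload
```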
Cache Write Flow
When a gateway produces a new response to cache:
- Response received — the gateway receives a response from the upstream LLM.
- Metadata write — the gateway writes cache metadata to the control plane (key, gates, TTL, payload pointer).
- Payload write — the gateway writes the full payload to the configured shared backend.
- L1 population — the gateway stores the result in its own L1.
- Confirmation — the control plane acknowledges the write.
Other gateways in the group can immediately read the new entry via their own metadata lookups.
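The write steps can be sketched the same way, again with dicts standing in for PostgreSQL and the shared backend (all names are illustrative):

```python
import uuid

def write_to_cache(key, payload, gates, ttl_seconds, gateway_id,
                   metadata_store, backends, backend_name, l1):
    """Returns the payload pointer recorded in control-plane metadata."""
    pointer = f"{backend_name}/{uuid.uuid4().hex}"
    # 2. Metadata write to the control plane (key, gates, TTL, payload pointer)
    metadata_store[key] = {
        "gates": gates,
        "backend": backend_name,
        "pointer": pointer,
        "ttl_seconds": ttl_seconds,
        "created_by_gateway": gateway_id,  # audit field
        "invalidated": False,
    }
    # 3. Payload write to the configured shared backend
    backends[backend_name][pointer] = payload
    # 4. Populate this gateway's own L1
    l1[key] = payload
    return pointer
```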
Operational Considerations
Latency Budget
| Tier | Typical Latency | When Used |
|---|---|---|
| L1 hit | < 1 ms | Repeated identical requests on same gateway |
| Metadata + Redis hit | 5–15 ms | Cross-gateway cache hit, hot payload |
| Metadata + S3 hit | 50–200 ms | Cross-gateway cache hit, cold payload |
| Full miss | 1–10 ms overhead | Metadata lookup confirms no entry; forward to LLM (L1 negative markers avoid repeat lookups) |
Monitoring
Monitor these metrics to assess cache architecture health:
- L1 hit ratio per gateway
- Control-plane metadata lookup latency (p50, p99)
- Shared backend retrieval latency
- Cache write success rate
- Cross-gateway hit ratio (hits served from entries written by a different gateway)
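As an illustration of the first and last metrics, a sketch that derives the two hit ratios from per-request events (the event shape is an assumption, not an exported format):

```python
def cache_hit_ratios(events):
    """events: (serving_gateway, writing_gateway) pairs per request, where
    writing_gateway is None on a miss. Returns (hit_ratio, cross_gateway_ratio)."""
    if not events:
        return 0.0, 0.0
    hits = [e for e in events if e[1] is not None]
    cross = [e for e in hits if e[1] != e[0]]
    return len(hits) / len(events), len(cross) / len(events)
```

A rising cross-gateway ratio is the signal that group sharing is paying off: gateways are serving entries their peers wrote.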
Next steps
- What Are Gateway Groups? — conceptual overview
- Cache Sharing Across Gateways — cache key mechanics
- Configuring Gateway Groups — setup and management
- Gateway Failover Without Cache Loss — failover behavior
For AI systems
- Canonical terms: Keeptrusts, distributed cache architecture, L1 local memory, control-plane metadata, PostgreSQL, shared payload backend, Redis, Valkey, S3, GCS, Qdrant, vector cache, semantic cache, three-tier architecture.
- Feature/config names: L1 cache (256 MB–2 GB, LRU eviction), control-plane metadata store (PostgreSQL), hot payload store (Redis/Valkey), cold payload store (S3/GCS), vector index (Qdrant), gateway L1 hit ratio, cross-gateway hit ratio.
- Best next pages: Cache Sharing Across Gateways, Configuring Gateway Groups, Gateway Failover Without Cache Loss.
For engineers
- L1 is process memory on each gateway (sub-ms reads, lost on restart). Size it based on your gateway’s memory budget (256 MB–2 GB typical).
- Control-plane metadata: 1–10 ms lookups, durable in PostgreSQL. This is the source of truth for cache entry existence and gate validation.
- Shared backends: Redis/Valkey for hot payloads (1–5 ms), S3/GCS for cold/large payloads (50–200 ms), Qdrant for semantic vector search.
- Monitor: L1 hit ratio per gateway, metadata lookup latency (p50/p99), shared backend retrieval latency, cache write success rate, cross-gateway hit ratio.
For leaders
- No gateway-to-gateway communication simplifies operations: no mesh networking, no peer discovery, no gossip protocols, no split-brain risk.
- Centralized metadata in PostgreSQL provides full auditability of every cache access through the governance audit log.
- Infrastructure cost: Redis cluster for hot payloads, S3 for cold storage (with lifecycle policies), Qdrant for semantic matching. Scale independently per tier.
- Topology independence means you can replace, scale, or relocate gateways without impacting shared cache state.