Cache Backend Selection: Redis, S3, Qdrant
The org-shared cache uses a tiered backend architecture. Each tier stores a different kind of data with different access patterns, retention needs, and performance requirements. This guide helps you choose and configure the right backend for each tier based on your organization's scale, budget, and performance needs.
Use this page when
- You are choosing a cache backend (Redis, S3, Qdrant) for your deployment.
- You need to compare performance, cost, and operational characteristics of each backend option.
- You want to understand which backend suits your scale, latency requirements, and infrastructure constraints.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Architecture Overview
The cache system uses four backend types, each serving a specific role:
| Backend | Role | Data Stored | Access Pattern |
|---|---|---|---|
| Redis/Valkey | Hot cache | Cache keys, metadata, lookup indexes | High frequency, low latency reads |
| S3/GCS | Artifact store | Full response payloads, generated content | Write-once, read-many |
| Qdrant | Vector store | Embedding vectors for semantic matching | Similarity search queries |
| PostgreSQL | Metadata | Entry lifecycle, ownership, audit trail | Transactional writes, indexed reads |
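The table above implies a lookup flow: the hot index answers first, and the artifact store serves the payload on a hit. A minimal sketch of that flow, using in-memory dicts as stand-ins for the Redis and S3 tiers (all function and variable names here are illustrative, not actual Keeptrusts APIs):

```python
hot_index = {}       # Redis tier stand-in: key -> artifact location
artifact_store = {}  # S3 tier stand-in: location -> full payload

def put(key: str, payload: bytes) -> None:
    location = f"cache/v1/{key}"
    artifact_store[location] = payload   # write-once payload (artifact tier)
    hot_index[key] = location            # low-latency lookup index (hot tier)

def lookup(key: str):
    """Exact-match lookup: the index answers first, the store serves the payload."""
    location = hot_index.get(key)        # sub-ms index check in the real system
    if location is None:
        return None                      # miss; semantic search would go here
    return artifact_store[location]      # 20-100 ms payload fetch in the real system

put("prompt:abc123", b"cached response body")
```

The real system adds a Qdrant similarity search between the index miss and the final miss, and records the hit in PostgreSQL metadata.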
Redis/Valkey: Hot Cache Tier
Purpose
Redis stores the cache lookup index and frequently accessed metadata. When an agent queries the cache, Redis answers whether a match exists and where to find the full artifact. This must be fast — every cache lookup hits Redis first.
Performance Characteristics
- Read latency: Sub-millisecond for key lookups (p99 less than 2ms)
- Write latency: Sub-millisecond for key updates
- Throughput: 100,000+ operations per second per instance
- Data size: Keys and metadata only — typically 1–5 KB per entry
When to Use Redis
- Always. Redis is required for the hot cache tier.
- Use Redis Cluster for deployments exceeding 25 GB of index data.
- Use Redis Sentinel for high availability without sharding needs.
When to Use Valkey
- Valkey is a drop-in Redis replacement with identical protocol support.
- Choose Valkey for fully open-source deployments without Redis licensing considerations.
- Performance characteristics are equivalent for cache workloads.
Sizing Guidelines
redis_memory = num_cache_entries × avg_key_size × 1.5 (overhead factor)
Typical values (assuming ~5 KB average entry size):
- 100,000 entries: ~750 MB
- 1,000,000 entries: ~7.5 GB
- 10,000,000 entries: ~75 GB (consider Redis Cluster)
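The sizing formula is easy to script. This sketch reproduces the table above under the stated assumption of ~5 KB (5,000 bytes) average entry size, the upper end of the 1–5 KB range:

```python
def redis_memory_bytes(num_entries: int, avg_key_size_bytes: int,
                       overhead: float = 1.5) -> float:
    """redis_memory = num_cache_entries x avg_key_size x 1.5 (overhead factor)."""
    return num_entries * avg_key_size_bytes * overhead

# 100,000 entries at ~5 KB each -> ~750 MB, matching the table above.
mb = redis_memory_bytes(100_000, 5_000) / 1e6
gb = redis_memory_bytes(1_000_000, 5_000) / 1e9
```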
Configuration
```yaml
cache:
  hot_tier:
    backend: redis
    url: redis://cache-redis:6379/0
    max_connections: 50
    read_timeout_ms: 10
    write_timeout_ms: 20
    key_prefix: "kt:cache:"
```
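The `key_prefix` setting namespaces every entry under `kt:cache:`. One common derivation scheme — shown here as an assumption, not the cache's documented algorithm — hashes the normalized query and appends the digest to the prefix, so equivalent queries collapse to the same key:

```python
import hashlib

KEY_PREFIX = "kt:cache:"  # matches cache.hot_tier.key_prefix

def cache_key(query: str) -> str:
    # Normalizing (strip + lowercase) before hashing is a hypothetical
    # choice; the real key derivation may differ.
    digest = hashlib.sha256(query.strip().lower().encode()).hexdigest()
    return KEY_PREFIX + digest

key = cache_key("How do I rotate credentials?")
```

A fixed prefix also makes it safe to scan or flush cache keys without touching other data in the same Redis database.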
S3/GCS: Artifact Payload Store
Purpose
S3 (or GCS, MinIO, any S3-compatible store) holds the full artifact payloads — the actual cached responses, generated content, and associated file context. These are larger objects that do not need sub-millisecond access.
Performance Characteristics
- Read latency: 20–100ms (varies by region and object size)
- Write latency: 50–200ms for standard puts
- Throughput: Effectively unlimited with proper request distribution
- Data size: Full payloads — typically 10 KB to 10 MB per entry
When to Use S3
- AWS deployments or any environment with S3-compatible storage
- When durability (11 nines) matters more than single-digit-ms latency
- For cost-effective storage of large artifact payloads
When to Use GCS
- GCP-native deployments
- When you need consistent low-latency access from GCP compute
- For unified billing with other GCP services
When to Use MinIO
- Self-hosted and air-gapped deployments
- Development and testing environments
- When you need S3 compatibility without cloud dependency
Sizing Guidelines
s3_storage = num_cache_entries × avg_payload_size
s3_monthly_cost = (storage_gb × $0.023) + (get_requests / 1,000 × $0.0004) + (put_requests / 1,000 × $0.005)
Typical values:
- 100,000 entries × 50 KB avg = 5 GB (~$0.12/month storage)
- 1,000,000 entries × 50 KB avg = 50 GB (~$1.15/month storage)
- Request costs dominate at high throughput
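The cost formula as a small calculator, using the us-east-1 standard-tier list prices from the formula above (verify against current AWS pricing for your region):

```python
def s3_monthly_cost(storage_gb: float, get_requests: int,
                    put_requests: int) -> float:
    """Storage at $0.023/GB; GETs at $0.0004 and PUTs at $0.005 per 1,000 requests."""
    return (storage_gb * 0.023
            + get_requests / 1000 * 0.0004
            + put_requests / 1000 * 0.005)

# 100,000 entries x 50 KB = 5 GB -> ~$0.12/month storage-only.
storage_only = s3_monthly_cost(storage_gb=5, get_requests=0, put_requests=0)
```

Plugging in real request volumes quickly shows why request costs, not storage, dominate at high throughput.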
Configuration
```yaml
cache:
  artifact_tier:
    backend: s3
    bucket: keeptrusts-cache-artifacts
    region: us-east-1
    prefix: "cache/v1/"
    multipart_threshold_mb: 5
    max_concurrent_uploads: 10
```
Qdrant: Embedding Vector Store
Purpose
Qdrant stores embedding vectors that enable semantic cache matching. When an agent's query does not exactly match a cached key, the system searches Qdrant for semantically similar entries. This powers the "fuzzy hit" capability that dramatically improves hit rates.
Performance Characteristics
- Search latency: 5–50ms for top-k nearest neighbors
- Index time: 10–100ms per vector insert
- Throughput: 1,000–10,000 searches per second (depends on collection size and hardware)
- Data size: 1–4 KB per vector (depends on embedding dimension)
When to Use Qdrant
- When you want semantic cache matching beyond exact key lookups
- When your agents generate diverse query formulations for the same underlying intent
- When your embedding model produces vectors of 768–1536 dimensions
When You Can Skip Qdrant
- If exact-match caching provides sufficient hit rates (above 80%)
- If your agents produce highly deterministic queries with minimal variation
- In development environments where simplicity is preferred
Sizing Guidelines
qdrant_memory = num_vectors × vector_dimension × 4 bytes × 1.5 (index overhead)
qdrant_disk = num_vectors × vector_dimension × 4 bytes × 2.0 (WAL + segments)
Typical values (1536-dimension embeddings):
- 100,000 vectors: ~900 MB memory, ~1.2 GB disk
- 1,000,000 vectors: ~9 GB memory, ~12 GB disk
- 10,000,000 vectors: ~90 GB memory, ~120 GB disk (multi-node cluster)
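The memory and disk formulas as one function, assuming float32 vectors (4 bytes per dimension) as stated above:

```python
def qdrant_footprint_gb(num_vectors: int, dim: int,
                        mem_overhead: float = 1.5,
                        disk_overhead: float = 2.0) -> tuple[float, float]:
    """Returns (memory_gb, disk_gb) for a collection of float32 vectors."""
    raw_bytes = num_vectors * dim * 4   # 4 bytes per float32 component
    return (raw_bytes * mem_overhead / 1e9,   # index overhead factor
            raw_bytes * disk_overhead / 1e9)  # WAL + segments factor

# 1M vectors at 1536 dimensions -> ~9 GB memory, ~12 GB disk.
mem_gb, disk_gb = qdrant_footprint_gb(1_000_000, 1536)
```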
Configuration
```yaml
cache:
  vector_tier:
    backend: qdrant
    url: http://cache-qdrant:6333
    collection: cache_embeddings
    vector_dimension: 1536
    distance_metric: cosine
    search_limit: 10
    score_threshold: 0.85
```
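The `distance_metric`, `search_limit`, and `score_threshold` settings together define what counts as a fuzzy hit. This brute-force sketch mimics Qdrant's behavior on toy 2-dimension vectors — it is a stand-in for the real indexed search, not the Qdrant client API:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def semantic_hits(query_vec, entries, score_threshold=0.85, limit=10):
    """Top-k cosine search with a score cutoff, as configured above."""
    scored = [(cosine(query_vec, vec), key) for key, vec in entries]
    scored = [s for s in scored if s[0] >= score_threshold]  # drop weak matches
    return sorted(scored, reverse=True)[:limit]              # best-first, top-k

entries = [("near_duplicate", [1.0, 0.1]), ("unrelated", [0.0, 1.0])]
hits = semantic_hits([1.0, 0.0], entries)
```

Raising `score_threshold` trades hit rate for precision: a stricter cutoff returns fewer, but more trustworthy, fuzzy hits.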
PostgreSQL: Metadata Store
Purpose
PostgreSQL stores the cache entry metadata — ownership, lifecycle state, creation timestamps, hit counts, and audit information. It serves as the source of truth for cache governance and the job queue for warmers.
Performance Characteristics
- Query latency: 1–10ms for indexed lookups
- Write latency: 5–20ms for transactional inserts
- Throughput: Limited by connection pool and query complexity
- Data size: Structured metadata — typically 500 bytes to 2 KB per entry
Sizing Guidelines
PostgreSQL storage for cache metadata is modest:
- 1,000,000 entries: ~2 GB
- 10,000,000 entries: ~20 GB
Connection pool sizing matters more than storage:
pool_size = max(10, warmer_instances × concurrency × 2)
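The pool formula, expressed directly so it can be checked against your warmer fleet:

```python
def pool_size(warmer_instances: int, concurrency: int) -> int:
    """pool_size = max(10, warmer_instances x concurrency x 2)."""
    return max(10, warmer_instances * concurrency * 2)

# A small fleet stays at the floor of 10; larger fleets scale linearly.
small = pool_size(warmer_instances=1, concurrency=2)
large = pool_size(warmer_instances=4, concurrency=4)
```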
Cost Comparison
| Backend | 1M Entries Monthly Cost | 10M Entries Monthly Cost |
|---|---|---|
| Redis (self-hosted) | ~$50 (4 GB instance) | ~$200 (32 GB instance) |
| Redis (managed) | ~$150 (ElastiCache) | ~$500 (ElastiCache) |
| S3 | ~$2 (storage + requests) | ~$15 (storage + requests) |
| Qdrant (self-hosted) | ~$80 (8 GB instance) | ~$400 (64 GB cluster) |
| PostgreSQL (existing) | Marginal | Marginal |
Deployment Patterns
Small Org (fewer than 50 repos, fewer than 100 agents)
- Redis: Single instance, 2 GB
- S3: Single bucket
- Qdrant: Optional — exact matching may suffice
- PostgreSQL: Shared with main API database
Medium Org (50–500 repos, 100–1000 agents)
- Redis: Sentinel with 2 replicas, 8 GB primary
- S3: Single bucket with lifecycle policies
- Qdrant: Single node, 16 GB memory
- PostgreSQL: Shared with connection pool tuning
Large Org (500+ repos, 1000+ agents)
- Redis: Cluster with 3+ shards
- S3: Multi-bucket with intelligent tiering
- Qdrant: Multi-node cluster with replication
- PostgreSQL: Dedicated instance or read replicas
Next steps
- After selecting backends, plan capacity with Capacity Planning
- Set up monitoring with Observability Integration
- Prepare for failures with Disaster Recovery
For AI systems
- Canonical terms: Keeptrusts, cache backend, Redis, Valkey, S3, GCS, MinIO, Qdrant, PostgreSQL, hot cache, artifact store, vector store.
- Exact feature/config names: cache.hot_tier.backend: redis, cache.hot_tier.url, cache.hot_tier.key_prefix: "kt:cache:", S3/GCS artifact store, Qdrant vector store, tiered backend architecture.
- Best next pages: Capacity Planning, Observability Integration, Disaster Recovery.
For engineers
- Four backend roles: Redis (hot cache, sub-ms lookups), S3 (artifact payloads, write-once-read-many), Qdrant (embedding vectors, similarity search), PostgreSQL (metadata, audit trail).
- Redis is required for all deployments; Qdrant is optional if semantic matching is not needed (exact matching only).
- Use Valkey as a drop-in Redis replacement for fully open-source deployments without licensing concerns.
- Use MinIO for self-hosted/air-gapped S3-compatible storage.
- Sizing: Redis ~1.5 KB/entry, S3 ~50 KB/entry average payload, Qdrant ~6 KB/vector (1536-dim float32).
- Configure timeouts aggressively: set the Redis read timeout to 10 ms and write timeout to 20 ms so slow lookups fail fast; S3 latency in the 20–100 ms range is acceptable for artifact fetches.
For leaders
- Backend selection affects both cost and performance: Redis (fast, expensive per GB), S3 (cheap, slower), Qdrant (specialized for search).
- Small orgs (fewer than 50 repos): single Redis instance + single S3 bucket + shared PostgreSQL provide minimal additional infrastructure cost.
- Large org (500+ repos): Redis Cluster + multi-bucket S3 + multi-node Qdrant — plan for dedicated infrastructure budget.
- Compare total cache infrastructure cost against the avoided cost of provider calls saved to validate ongoing ROI at your scale.