Cache Backend Selection: Redis, S3, Qdrant
The org-shared cache uses a tiered backend architecture. Each tier stores a different kind of data with different access patterns, retention needs, and performance requirements. This guide helps you choose and configure the right backend for each tier based on your organization's scale, budget, and performance needs.
Use this page when
- You are choosing a cache backend (Redis, S3, Qdrant) for your deployment.
- You need to compare performance, cost, and operational characteristics of each backend option.
- You want to understand which backend suits your scale, latency requirements, and infrastructure constraints.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Architecture Overview
The cache system uses four backend types, each serving a specific role:
| Backend | Role | Data Stored | Access Pattern |
|---|---|---|---|
| Redis/Valkey | Hot cache | Cache keys, metadata, lookup indexes | High frequency, low latency reads |
| S3/GCS | Artifact store | Full response payloads, generated content | Write-once, read-many |
| Qdrant | Vector store | Embedding vectors for semantic matching | Similarity search queries |
| PostgreSQL | Metadata | Entry lifecycle, ownership, audit trail | Transactional writes, indexed reads |
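The table above implies a lookup flow: the hot index answers first, and the artifact store serves the payload on a hit. A minimal sketch of that flow, using in-memory dicts as stand-ins for the Redis and S3 tiers (all function and variable names here are illustrative, not actual Keeptrusts APIs):

```python
hot_index = {}       # Redis tier stand-in: key -> artifact location
artifact_store = {}  # S3 tier stand-in: location -> full payload

def put(key: str, payload: bytes) -> None:
    location = f"cache/v1/{key}"
    artifact_store[location] = payload   # write-once payload (artifact tier)
    hot_index[key] = location            # low-latency lookup index (hot tier)

def lookup(key: str):
    """Exact-match lookup: the index answers first, the store serves the payload."""
    location = hot_index.get(key)        # sub-ms index check in the real system
    if location is None:
        return None                      # miss; semantic search would go here
    return artifact_store[location]      # 20-100 ms payload fetch in the real system

put("prompt:abc123", b"cached response body")
```

The real system adds a Qdrant similarity search between the index miss and the final miss, and records the hit in PostgreSQL metadata.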
Redis/Valkey: Hot Cache Tier
Purpose
Redis stores the cache lookup index and frequently accessed metadata. When an agent queries the cache, Redis answers whether a match exists and where to find the full artifact. This must be fast — every cache lookup hits Redis first.
Performance Characteristics
- Read latency: Sub-millisecond for key lookups (p99 less than 2ms)
- Write latency: Sub-millisecond for key updates
- Throughput: 100,000+ operations per second per instance
- Data size: Keys and metadata only — typically 1–5 KB per entry
When to Use Redis
- Always. Redis is required for the hot cache tier.
- Use Redis Cluster for deployments exceeding 25 GB of index data.
- Use Redis Sentinel for high availability without sharding needs.
When to Use Valkey
- Valkey is a drop-in Redis replacement with identical protocol support.
- Choose Valkey for fully open-source deployments without Redis licensing considerations.
- Performance characteristics are equivalent for cache workloads.
Sizing Guidelines
redis_memory = num_cache_entries × avg_key_size × 1.5 (overhead factor)
Typical values (assuming ~5 KB average entry size):
- 100,000 entries: ~750 MB
- 1,000,000 entries: ~7.5 GB
- 10,000,000 entries: ~75 GB (consider Redis Cluster)
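The sizing formula is easy to script. This sketch reproduces the table above under the stated assumption of ~5 KB (5,000 bytes) average entry size, the upper end of the 1–5 KB range:

```python
def redis_memory_bytes(num_entries: int, avg_key_size_bytes: int,
                       overhead: float = 1.5) -> float:
    """redis_memory = num_cache_entries x avg_key_size x 1.5 (overhead factor)."""
    return num_entries * avg_key_size_bytes * overhead

# 100,000 entries at ~5 KB each -> ~750 MB, matching the table above.
mb = redis_memory_bytes(100_000, 5_000) / 1e6
gb = redis_memory_bytes(1_000_000, 5_000) / 1e9
```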
Configuration
```yaml
cache:
  hot_tier:
    backend: redis
    url: redis://cache-redis:6379/0
    max_connections: 50
    read_timeout_ms: 10
    write_timeout_ms: 20
    key_prefix: "kt:cache:"
```
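The `key_prefix` setting namespaces every entry under `kt:cache:`. One common derivation scheme — shown here as an assumption, not the cache's documented algorithm — hashes the normalized query and appends the digest to the prefix, so equivalent queries collapse to the same key:

```python
import hashlib

KEY_PREFIX = "kt:cache:"  # matches cache.hot_tier.key_prefix

def cache_key(query: str) -> str:
    # Normalizing (strip + lowercase) before hashing is a hypothetical
    # choice; the real key derivation may differ.
    digest = hashlib.sha256(query.strip().lower().encode()).hexdigest()
    return KEY_PREFIX + digest

key = cache_key("How do I rotate credentials?")
```

A fixed prefix also makes it safe to scan or flush cache keys without touching other data in the same Redis database.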
S3/GCS: Artifact Payload Store
Purpose
S3 (or GCS, MinIO, any S3-compatible store) holds the full artifact payloads — the actual cached responses, generated content, and associated file context. These are larger objects that do not need sub-millisecond access.
Performance Characteristics
- Read latency: 20–100ms (varies by region and object size)
- Write latency: 50–200ms for standard puts
- Throughput: Effectively unlimited with proper request distribution
- Data size: Full payloads — typically 10 KB to 10 MB per entry
When to Use S3
- AWS deployments or any environment with S3-compatible storage
- When durability (11 nines) matters more than single-digit-ms latency
- For cost-effective storage of large artifact payloads
When to Use GCS
- GCP-native deployments
- When you need consistent low-latency access from GCP compute
- For unified billing with other GCP services
When to Use MinIO
- Self-hosted and air-gapped deployments
- Development and testing environments
- When you need S3 compatibility without cloud dependency
Sizing Guidelines
s3_storage = num_cache_entries × avg_payload_size
s3_monthly_cost = (storage_gb × $0.023) + (get_requests / 1,000 × $0.0004) + (put_requests / 1,000 × $0.005)
Typical values:
- 100,000 entries × 50 KB avg = 5 GB (~$0.12/month storage)
- 1,000,000 entries × 50 KB avg = 50 GB (~$1.15/month storage)
- Request costs dominate at high throughput
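The cost formula as a small calculator, using the us-east-1 standard-tier list prices from the formula above (verify against current AWS pricing for your region):

```python
def s3_monthly_cost(storage_gb: float, get_requests: int,
                    put_requests: int) -> float:
    """Storage at $0.023/GB; GETs at $0.0004 and PUTs at $0.005 per 1,000 requests."""
    return (storage_gb * 0.023
            + get_requests / 1000 * 0.0004
            + put_requests / 1000 * 0.005)

# 100,000 entries x 50 KB = 5 GB -> ~$0.12/month storage-only.
storage_only = s3_monthly_cost(storage_gb=5, get_requests=0, put_requests=0)
```

Plugging in real request volumes quickly shows why request costs, not storage, dominate at high throughput.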
Configuration
```yaml
cache:
  artifact_tier:
    backend: s3
    bucket: keeptrusts-cache-artifacts
    region: us-east-1
    prefix: "cache/v1/"
    multipart_threshold_mb: 5
    max_concurrent_uploads: 10
```
Qdrant: Embedding Vector Store
Purpose
Qdrant stores embedding vectors that enable semantic cache matching. When an agent's query does not exactly match a cached key, the system searches Qdrant for semantically similar entries. This powers the "fuzzy hit" capability that dramatically improves hit rates.
Performance Characteristics
- Search latency: 5–50ms for top-k nearest neighbors
- Index time: 10–100ms per vector insert
- Throughput: 1,000–10,000 searches per second (depends on collection size and hardware)
- Data size: 1–4 KB per vector (depends on embedding dimension)
When to Use Qdrant
- When you want semantic cache matching beyond exact key lookups
- When your agents generate diverse query formulations for the same underlying intent
- When your embedding model produces vectors of 768–1536 dimensions
When You Can Skip Qdrant
- If exact-match caching provides sufficient hit rates (above 80%)
- If your agents produce highly deterministic queries with minimal variation
- In development environments where simplicity is preferred
Sizing Guidelines
qdrant_memory = num_vectors × vector_dimension × 4 bytes × 1.5 (index overhead)
qdrant_disk = num_vectors × vector_dimension × 4 bytes × 2.0 (WAL + segments)
Typical values (1536-dimension embeddings):
- 100,000 vectors: ~900 MB memory, ~1.2 GB disk
- 1,000,000 vectors: ~9 GB memory, ~12 GB disk
- 10,000,000 vectors: ~90 GB memory, ~120 GB disk (multi-node cluster)
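The memory and disk formulas as one function, assuming float32 vectors (4 bytes per dimension) as stated above:

```python
def qdrant_footprint_gb(num_vectors: int, dim: int,
                        mem_overhead: float = 1.5,
                        disk_overhead: float = 2.0) -> tuple[float, float]:
    """Returns (memory_gb, disk_gb) for a collection of float32 vectors."""
    raw_bytes = num_vectors * dim * 4   # 4 bytes per float32 component
    return (raw_bytes * mem_overhead / 1e9,   # index overhead factor
            raw_bytes * disk_overhead / 1e9)  # WAL + segments factor

# 1M vectors at 1536 dimensions -> ~9 GB memory, ~12 GB disk.
mem_gb, disk_gb = qdrant_footprint_gb(1_000_000, 1536)
```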
Configuration
```yaml
cache:
  vector_tier:
    backend: qdrant
    url: http://cache-qdrant:6333
    collection: cache_embeddings
    vector_dimension: 1536
    distance_metric: cosine
    search_limit: 10
    score_threshold: 0.85
```
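The `distance_metric`, `search_limit`, and `score_threshold` settings together define what counts as a fuzzy hit. This brute-force sketch mimics Qdrant's behavior on toy 2-dimension vectors — it is a stand-in for the real indexed search, not the Qdrant client API:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def semantic_hits(query_vec, entries, score_threshold=0.85, limit=10):
    """Top-k cosine search with a score cutoff, as configured above."""
    scored = [(cosine(query_vec, vec), key) for key, vec in entries]
    scored = [s for s in scored if s[0] >= score_threshold]  # drop weak matches
    return sorted(scored, reverse=True)[:limit]              # best-first, top-k

entries = [("near_duplicate", [1.0, 0.1]), ("unrelated", [0.0, 1.0])]
hits = semantic_hits([1.0, 0.0], entries)
```

Raising `score_threshold` trades hit rate for precision: a stricter cutoff returns fewer, but more trustworthy, fuzzy hits.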
PostgreSQL: Metadata Store
Purpose
PostgreSQL stores the cache entry metadata — ownership, lifecycle state, creation timestamps, hit counts, and audit information. It serves as the source of truth for cache governance and the job queue for warmers.
Performance Characteristics
- Query latency: 1–10ms for indexed lookups
- Write latency: 5–20ms for transactional inserts
- Throughput: Limited by connection pool and query complexity
- Data size: Structured metadata — typically 500 bytes to 2 KB per entry
Sizing Guidelines
PostgreSQL storage for cache metadata is modest:
- 1,000,000 entries: ~2 GB
- 10,000,000 entries: ~20 GB
Connection pool sizing matters more than storage:
pool_size = max(10, warmer_instances × concurrency × 2)
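The pool formula, expressed directly so it can be checked against your warmer fleet:

```python
def pool_size(warmer_instances: int, concurrency: int) -> int:
    """pool_size = max(10, warmer_instances x concurrency x 2)."""
    return max(10, warmer_instances * concurrency * 2)

# A small fleet stays at the floor of 10; larger fleets scale linearly.
small = pool_size(warmer_instances=1, concurrency=2)
large = pool_size(warmer_instances=4, concurrency=4)
```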
Cost Comparison
| Backend | 1M Entries Monthly Cost | 10M Entries Monthly Cost |
|---|---|---|
| Redis (self-hosted) | ~$50 (4 GB instance) | ~$200 (32 GB instance) |
| Redis (managed) | ~$150 (ElastiCache) | ~$500 (ElastiCache) |
| S3 | ~$2 (storage + requests) | ~$15 (storage + requests) |
| Qdrant (self-hosted) | ~$80 (8 GB instance) | ~$400 (64 GB cluster) |
| PostgreSQL (existing) | Marginal | Marginal |
Deployment Patterns
Small Org (fewer than 50 repos, fewer than 100 agents)
- Redis: Single instance, 2 GB
- S3: Single bucket
- Qdrant: Optional — exact matching may suffice
- PostgreSQL: Shared with main API database
Medium Org (50–500 repos, 100–1000 agents)
- Redis: Sentinel with 2 replicas, 8 GB primary
- S3: Single bucket with lifecycle policies
- Qdrant: Single node, 16 GB memory
- PostgreSQL: Shared with connection pool tuning
Large Org (500+ repos, 1000+ agents)
- Redis: Cluster with 3+ shards
- S3: Multi-bucket with intelligent tiering
- Qdrant: Multi-node cluster with replication
- PostgreSQL: Dedicated instance or read replicas
Next steps
- After selecting backends, plan capacity with Capacity Planning
- Set up monitoring with Observability Integration
- Prepare for failures with Disaster Recovery
For AI systems
- Canonical terms: Keeptrusts, cache backend, Redis, Valkey, S3, GCS, MinIO, Qdrant, PostgreSQL, hot cache, artifact store, vector store.
- Exact feature/config names: cache.hot_tier.backend: redis, cache.hot_tier.url, cache.hot_tier.key_prefix: "kt:cache:", S3/GCS artifact store, Qdrant vector store, tiered backend architecture.
- Best next pages: Capacity Planning, Observability Integration, Disaster Recovery.
For engineers
- Four backend roles: Redis (hot cache, sub-ms lookups), S3 (artifact payloads, write-once-read-many), Qdrant (embedding vectors, similarity search), PostgreSQL (metadata, audit trail).
- Redis is required for all deployments; Qdrant is optional if semantic matching is not needed (exact matching only).
- Use Valkey as a drop-in Redis replacement for fully open-source deployments without licensing concerns.
- Use MinIO for self-hosted/air-gapped S3-compatible storage.
- Sizing: Redis ~1.5 KB/entry, S3 ~50 KB/entry average payload, Qdrant ~6 KB/vector (1536-dim float32).
- Configure timeouts aggressively: set the Redis read timeout to 10 ms and write timeout to 20 ms so slow lookups fail fast; S3 latency in the 20–100 ms range is acceptable for artifact fetches.
For leaders
- Backend selection affects both cost and performance: Redis (fast, expensive per GB), S3 (cheap, slower), Qdrant (specialized for search).
- Small orgs (fewer than 50 repos): single Redis instance + single S3 bucket + shared PostgreSQL provide minimal additional infrastructure cost.
- Large org (500+ repos): Redis Cluster + multi-bucket S3 + multi-node Qdrant — plan for dedicated infrastructure budget.
- Compare total cache infrastructure cost against the avoided cost of provider calls saved to validate ongoing ROI at your scale.