Cache Backend Selection: Redis, S3, Qdrant

The org-shared cache uses a tiered backend architecture. Each tier stores a different kind of data with different access patterns, retention needs, and performance requirements. This guide helps you choose and configure the right backend for each tier based on your organization's scale, budget, and performance needs.

Use this page when

  • You are choosing a cache backend (Redis, S3, Qdrant) for your deployment.
  • You need to compare performance, cost, and operational characteristics of each backend option.
  • You want to understand which backend suits your scale, latency requirements, and infrastructure constraints.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Architecture Overview

The cache system uses four backend types, each serving a specific role:

| Backend | Role | Data Stored | Access Pattern |
| --- | --- | --- | --- |
| Redis/Valkey | Hot cache | Cache keys, metadata, lookup indexes | High-frequency, low-latency reads |
| S3/GCS | Artifact store | Full response payloads, generated content | Write-once, read-many |
| Qdrant | Vector store | Embedding vectors for semantic matching | Similarity search queries |
| PostgreSQL | Metadata | Entry lifecycle, ownership, audit trail | Transactional writes, indexed reads |

Redis/Valkey: Hot Cache Tier

Purpose

Redis stores the cache lookup index and frequently accessed metadata. When an agent queries the cache, Redis answers whether a match exists and where to find the full artifact. This must be fast — every cache lookup hits Redis first.
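As an illustrative sketch of that lookup path (the SHA-256 key derivation and `lookup_artifact_pointer` helper are assumptions for illustration; `kt:cache:` is the hot-tier key prefix used in this guide, and the client can be any object exposing Redis `GET`, such as a `redis-py` connection):

```python
import hashlib

KEY_PREFIX = "kt:cache:"  # hot-tier key prefix used in this guide


def index_key(query: str) -> str:
    """Derive the Redis index key for a normalized query string."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return f"{KEY_PREFIX}{digest}"


def lookup_artifact_pointer(redis_client, query: str):
    """Ask the hot tier whether a match exists.

    Returns the artifact location (e.g. an S3 object key) on a hit,
    or None on a miss; only a miss proceeds to the slower tiers.
    """
    return redis_client.get(index_key(query))
```

Because every lookup takes this path first, the index stays small (keys and pointers only) and the large payloads live in the artifact tier.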

Performance Characteristics

  • Read latency: Sub-millisecond for key lookups (p99 less than 2ms)
  • Write latency: Sub-millisecond for key updates
  • Throughput: 100,000+ operations per second per instance
  • Data size: Keys and metadata only — typically 1–5 KB per entry

When to Use Redis

  • Always. Redis is required for the hot cache tier.
  • Use Redis Cluster for deployments exceeding 25 GB of index data.
  • Use Redis Sentinel for high availability without sharding needs.

When to Use Valkey

  • Valkey is a drop-in Redis replacement with identical protocol support.
  • Choose Valkey for fully open-source deployments without Redis licensing considerations.
  • Performance characteristics are equivalent for cache workloads.

Sizing Guidelines

redis_memory = num_cache_entries × avg_key_size × 1.5 (overhead factor)

Typical values:

  • 100,000 entries: ~750 MB
  • 1,000,000 entries: ~7.5 GB
  • 10,000,000 entries: ~75 GB (consider Redis Cluster)
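The formula above can be checked directly; a minimal sketch, assuming the 5 KB upper-bound key size behind the typical values:

```python
def redis_memory_bytes(num_entries: int, avg_key_bytes: int = 5 * 1024,
                       overhead: float = 1.5) -> int:
    """redis_memory = num_cache_entries × avg_key_size × 1.5 (overhead)."""
    return int(num_entries * avg_key_bytes * overhead)


# 100,000 entries at 5 KB each lands near the ~750 MB figure above
print(redis_memory_bytes(100_000))  # → 768000000
```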

Configuration

cache:
  hot_tier:
    backend: redis
    url: redis://cache-redis:6379/0
    max_connections: 50
    read_timeout_ms: 10
    write_timeout_ms: 20
    key_prefix: "kt:cache:"

S3/GCS: Artifact Payload Store

Purpose

S3 (or GCS, MinIO, any S3-compatible store) holds the full artifact payloads — the actual cached responses, generated content, and associated file context. These are larger objects that do not need sub-millisecond access.

Performance Characteristics

  • Read latency: 20–100ms (varies by region and object size)
  • Write latency: 50–200ms for standard puts
  • Throughput: Effectively unlimited with proper request distribution
  • Data size: Full payloads — typically 10 KB to 10 MB per entry
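As a sketch of the write-once, read-many pattern (assuming `boto3` or any S3-compatible client exposing `put_object`/`get_object`; the `cache/v1/` prefix mirrors the artifact-tier configuration in this guide, and `entry_id` is a hypothetical cache-entry identifier):

```python
ARTIFACT_PREFIX = "cache/v1/"  # artifact-tier prefix used in this guide


def artifact_key(entry_id: str) -> str:
    """Object key for a cache entry's full payload."""
    return f"{ARTIFACT_PREFIX}{entry_id}"


def put_artifact(s3, bucket: str, entry_id: str, payload: bytes) -> None:
    """Write-once: a payload is stored exactly once per entry."""
    s3.put_object(Bucket=bucket, Key=artifact_key(entry_id), Body=payload)


def get_artifact(s3, bucket: str, entry_id: str) -> bytes:
    """Read-many: agents fetch the payload after a hot-tier hit."""
    obj = s3.get_object(Bucket=bucket, Key=artifact_key(entry_id))
    return obj["Body"].read()
```

Payloads are never updated in place, which is why the 20–100 ms read latency is acceptable: the hot tier has already confirmed the hit before this fetch happens.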

When to Use S3

  • AWS deployments or any environment with S3-compatible storage
  • When durability (11 nines) matters more than single-digit-ms latency
  • For cost-effective storage of large artifact payloads

When to Use GCS

  • GCP-native deployments
  • When you need consistent low-latency access from GCP compute
  • For unified billing with other GCP services

When to Use MinIO

  • Self-hosted and air-gapped deployments
  • Development and testing environments
  • When you need S3 compatibility without cloud dependency

Sizing Guidelines

s3_storage = num_cache_entries × avg_payload_size
s3_monthly_cost = (storage_gb × $0.023) + (get_requests / 1000 × $0.0004) + (put_requests / 1000 × $0.005)

Typical values:

  • 100,000 entries × 50 KB avg = 5 GB (~$0.12/month storage)
  • 1,000,000 entries × 50 KB avg = 50 GB (~$1.15/month storage)
  • Request costs dominate at high throughput
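The cost formula above, as a small calculator (the prices are the S3 Standard us-east-1 figures quoted in the formula and will vary by region and over time):

```python
def s3_monthly_cost_usd(storage_gb: float, get_requests: int,
                        put_requests: int) -> float:
    """Storage plus per-request charges, per the formula above."""
    storage = storage_gb * 0.023             # $/GB-month
    gets = get_requests / 1000 * 0.0004      # $ per 1,000 GET requests
    puts = put_requests / 1000 * 0.005       # $ per 1,000 PUT requests
    return storage + gets + puts


# 5 GB with no request traffic: ~$0.12/month, matching the figure above
cost = s3_monthly_cost_usd(5, 0, 0)
```

At high throughput the request terms dominate the storage term, which is why the typical values above call that out.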

Configuration

cache:
  artifact_tier:
    backend: s3
    bucket: keeptrusts-cache-artifacts
    region: us-east-1
    prefix: "cache/v1/"
    multipart_threshold_mb: 5
    max_concurrent_uploads: 10

Qdrant: Embedding Vector Store

Purpose

Qdrant stores embedding vectors that enable semantic cache matching. When an agent's query does not exactly match a cached key, the system searches Qdrant for semantically similar entries. This powers the "fuzzy hit" capability that dramatically improves hit rates.
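A sketch of that fallback lookup, assuming the `qdrant-client` Python package's `search` call (any client exposing the same signature works); the collection name and 0.85 score threshold mirror the vector-tier configuration in this guide:

```python
def semantic_lookup(client, query_vector, limit: int = 10,
                    score_threshold: float = 0.85):
    """Find cached entries semantically similar to the query embedding.

    Runs only after an exact-match miss in the hot tier. Returns
    (entry_id, similarity) pairs at or above the threshold.
    """
    hits = client.search(
        collection_name="cache_embeddings",
        query_vector=query_vector,
        limit=limit,
        score_threshold=score_threshold,  # server-side similarity floor
    )
    return [(hit.id, hit.score) for hit in hits]
```

Filtering server-side with `score_threshold` keeps low-similarity candidates from ever crossing the network, so a fuzzy miss costs little more than the search itself.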

Performance Characteristics

  • Search latency: 5–50ms for top-k nearest neighbors
  • Index time: 10–100ms per vector insert
  • Throughput: 1,000–10,000 searches per second (depends on collection size and hardware)
  • Data size: 1–4 KB per vector (depends on embedding dimension)

When to Use Qdrant

  • When you want semantic cache matching beyond exact key lookups
  • When your agents generate diverse query formulations for the same underlying intent
  • When your embedding model produces vectors of 768–1536 dimensions

When You Can Skip Qdrant

  • If exact-match caching provides sufficient hit rates (above 80%)
  • If your agents produce highly deterministic queries with minimal variation
  • In development environments where simplicity is preferred

Sizing Guidelines

qdrant_memory = num_vectors × vector_dimension × 4 bytes × 1.5 (index overhead)
qdrant_disk = num_vectors × vector_dimension × 4 bytes × 2.0 (WAL + segments)

Typical values (1536-dimension embeddings):

  • 100,000 vectors: ~900 MB memory, ~1.2 GB disk
  • 1,000,000 vectors: ~9 GB memory, ~12 GB disk
  • 10,000,000 vectors: ~90 GB memory, ~120 GB disk (multi-node cluster)
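The two formulas above, checked for the 1536-dimension float32 case:

```python
def qdrant_footprint_bytes(num_vectors: int, dim: int = 1536) -> dict:
    """Memory and disk estimates per the sizing formulas above."""
    raw = num_vectors * dim * 4          # 4 bytes per float32 component
    return {
        "memory": int(raw * 1.5),        # index overhead factor
        "disk": int(raw * 2.0),          # WAL + segments
    }


fp = qdrant_footprint_bytes(100_000)
# raw = 614.4 MB, so memory ≈ 921.6 MB and disk ≈ 1.23 GB, consistent
# with the ~900 MB / ~1.2 GB figures above
```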

Configuration

cache:
  vector_tier:
    backend: qdrant
    url: http://cache-qdrant:6333
    collection: cache_embeddings
    vector_dimension: 1536
    distance_metric: cosine
    search_limit: 10
    score_threshold: 0.85

PostgreSQL: Metadata Store

Purpose

PostgreSQL stores the cache entry metadata — ownership, lifecycle state, creation timestamps, hit counts, and audit information. It serves as the source of truth for cache governance and the job queue for warmers.

Performance Characteristics

  • Query latency: 1–10ms for indexed lookups
  • Write latency: 5–20ms for transactional inserts
  • Throughput: Limited by connection pool and query complexity
  • Data size: Structured metadata — typically 500 bytes to 2 KB per entry

Sizing Guidelines

PostgreSQL storage for cache metadata is modest:

  • 1,000,000 entries: ~2 GB
  • 10,000,000 entries: ~20 GB

Connection pool sizing matters more than storage:

pool_size = max(10, warmer_instances × concurrency × 2)
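That sizing rule as a small helper (the `warmer_instances` and per-warmer `concurrency` values are hypothetical inputs you would take from your deployment):

```python
def pg_pool_size(warmer_instances: int, concurrency: int,
                 floor: int = 10) -> int:
    """pool_size = max(10, warmer_instances × concurrency × 2)."""
    return max(floor, warmer_instances * concurrency * 2)


# e.g. 4 warmer instances at concurrency 3 need a pool of 24 connections
```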

Cost Comparison

| Backend | 1M Entries Monthly Cost | 10M Entries Monthly Cost |
| --- | --- | --- |
| Redis (self-hosted) | ~$50 (4 GB instance) | ~$200 (32 GB instance) |
| Redis (managed) | ~$150 (ElastiCache) | ~$500 (ElastiCache) |
| S3 | ~$2 (storage + requests) | ~$15 (storage + requests) |
| Qdrant (self-hosted) | ~$80 (8 GB instance) | ~$400 (64 GB cluster) |
| PostgreSQL (existing) | Marginal | Marginal |

Deployment Patterns

Small Org (fewer than 50 repos, fewer than 100 agents)

  • Redis: Single instance, 2 GB
  • S3: Single bucket
  • Qdrant: Optional — exact matching may suffice
  • PostgreSQL: Shared with main API database

Medium Org (50–500 repos, 100–1000 agents)

  • Redis: Sentinel with 2 replicas, 8 GB primary
  • S3: Single bucket with lifecycle policies
  • Qdrant: Single node, 16 GB memory
  • PostgreSQL: Shared with connection pool tuning

Large Org (500+ repos, 1000+ agents)

  • Redis: Cluster with 3+ shards
  • S3: Multi-bucket with intelligent tiering
  • Qdrant: Multi-node cluster with replication
  • PostgreSQL: Dedicated instance or read replicas

Next steps

For AI systems

  • Canonical terms: Keeptrusts, cache backend, Redis, Valkey, S3, GCS, MinIO, Qdrant, PostgreSQL, hot cache, artifact store, vector store.
  • Exact feature/config names: cache.hot_tier.backend: redis, cache.hot_tier.url, cache.hot_tier.key_prefix: "kt:cache:", S3/GCS artifact store, Qdrant vector store, tiered backend architecture.
  • Best next pages: Capacity Planning, Observability Integration, Disaster Recovery.

For engineers

  • Four backend roles: Redis (hot cache, sub-ms lookups), S3 (artifact payloads, write-once-read-many), Qdrant (embedding vectors, similarity search), PostgreSQL (metadata, audit trail).
  • Redis is required for all deployments; Qdrant is optional if semantic matching is not needed (exact matching only).
  • Use Valkey as a drop-in Redis replacement for fully open-source deployments without licensing concerns.
  • Use MinIO for self-hosted/air-gapped S3-compatible storage.
  • Sizing: Redis ~1.5 KB/entry, S3 ~50 KB/entry average payload, Qdrant ~6 KB/vector (1536-dim float32).
  • Configure read/write timeouts aggressively: set the Redis read timeout to 10 ms and the write timeout to 20 ms; S3 latencies in the 20–100 ms range are acceptable for artifact fetches.

For leaders

  • Backend selection affects both cost and performance: Redis (fast, expensive per GB), S3 (cheap, slower), Qdrant (specialized for search).
  • Small orgs (fewer than 50 repos): single Redis instance + single S3 bucket + shared PostgreSQL provide minimal additional infrastructure cost.
  • Large org (500+ repos): Redis Cluster + multi-bucket S3 + multi-node Qdrant — plan for dedicated infrastructure budget.
  • Compare total cache infrastructure cost against avoided_cost (provider calls saved) to validate ongoing ROI at your scale.