Capacity Planning for Cached Engineering Orgs
Proper capacity planning ensures your cache infrastructure grows with your organization without service degradation or unexpected cost spikes. This guide provides formulas, reference values, and planning horizons for each cache backend.
Use this page when
- You are planning cache infrastructure capacity for your organization's growth trajectory.
- You need formulas or heuristics for sizing storage, memory, and compute based on engineer count and repo count.
- You want to estimate costs and resource requirements before scaling up.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Core Formula
The fundamental capacity formula for org-shared cache is:
total_storage = repos × avg_artifacts_per_repo × avg_artifact_size
Each backend stores a different projection of this data, so you apply the formula with backend-specific multipliers.
Input Variables
Gather these values for your organization:
| Variable | How to Measure | Typical Range |
|---|---|---|
| `repos` | Count of monitored repositories | 10–5,000 |
| `avg_artifacts_per_repo` | Files × indexing density | 50–2,000 |
| `avg_artifact_size` | Mean response payload size | 5 KB–100 KB |
| `embedding_dimension` | Your embedding model's output size | 768–1,536 |
| `monthly_growth_rate` | New repos + new content per month | 5–20% |
| `hit_rate` | Current or target hit rate | 60–85% |
| `daily_queries` | Total cache lookups per day | 1,000–1,000,000 |
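The core formula and the variables above can be combined into a small sizing helper. This is a minimal Python sketch; the 500-repo example values are illustrative, not measurements from a real org.

```python
# Sketch: apply the core capacity formula to the input variables above.
def total_storage_bytes(repos: int, avg_artifacts_per_repo: int,
                        avg_artifact_size_kb: float) -> float:
    """total_storage = repos × avg_artifacts_per_repo × avg_artifact_size"""
    return repos * avg_artifacts_per_repo * avg_artifact_size_kb * 1024

# Example (illustrative): 500 repos, 1,000 artifacts each, 50 KB payloads.
entries = 500 * 1_000                              # 500,000 cache entries
storage = total_storage_bytes(500, 1_000, 50)
print(f"{entries:,} entries, {storage / 1e9:.1f} GB raw payload")
```

The raw payload figure feeds the S3 projection below; each other backend applies its own multiplier to the same `entries` count.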
Redis Memory Sizing
Redis stores cache keys, metadata indexes, and lookup tables. Each cache entry requires a Redis key with associated metadata.
Formula
redis_memory = entries × key_overhead
Where:
- `entries = repos × avg_artifacts_per_repo`
- `key_overhead = 1.5 KB` (key string + hash fields + TTL + internal overhead)
Reference Sizing Table
| Repositories | Artifacts/Repo | Total Entries | Redis Memory |
|---|---|---|---|
| 50 | 200 | 10,000 | 15 MB |
| 200 | 500 | 100,000 | 150 MB |
| 500 | 1,000 | 500,000 | 750 MB |
| 1,000 | 1,000 | 1,000,000 | 1.5 GB |
| 2,000 | 1,500 | 3,000,000 | 4.5 GB |
| 5,000 | 2,000 | 10,000,000 | 15 GB |
Planning Recommendations
- Provision 2× your calculated need for headroom and peak handling
- Set Redis `maxmemory-policy` to `allkeys-lru` so cold entries evict gracefully
- Plan for Redis Cluster when projected memory exceeds 16 GB per instance
- Budget for replication: multiply by 2 (primary + replica) or 3 (primary + 2 replicas)
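The eviction and headroom recommendations above translate into two `redis.conf` directives. A sketch, sized for a hypothetical 500,000-entry deployment (750 MB calculated, provisioned at 2×):

```
# redis.conf -- sized for ~500,000 entries (750 MB calculated, 2x headroom)
maxmemory 1536mb
maxmemory-policy allkeys-lru
```

With `allkeys-lru`, hitting `maxmemory` evicts the least-recently-used keys rather than rejecting writes, which is the graceful-degradation behavior a cache wants.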
S3 Storage Projection
S3 stores the full artifact payloads — the largest data volume in your cache system.
Formula
s3_storage = entries × avg_payload_size
s3_monthly_requests = daily_queries × 30 × (1 - hit_rate) × fill_factor
Where `fill_factor` accounts for the fact that only cache misses trigger new S3 writes (typically 1.0 for writes; add read amplification if hits also fetch payloads from S3).
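Both S3 formulas above can be sketched in a few lines of Python; the query volume and hit rate in the example are illustrative assumptions.

```python
# Sketch: S3 storage and monthly write-request projection.
def s3_projection(entries: int, avg_payload_kb: float,
                  daily_queries: int, hit_rate: float,
                  fill_factor: float = 1.0):
    storage_gb = entries * avg_payload_kb / 1e6           # KB -> GB (decimal)
    monthly_writes = daily_queries * 30 * (1 - hit_rate) * fill_factor
    return storage_gb, monthly_writes

# 1M entries at 50 KB each; 100k queries/day at an 80% hit rate.
storage, writes = s3_projection(1_000_000, 50, 100_000, 0.80)
print(f"{storage:.0f} GB stored, {writes:,.0f} PUTs/month")
# -> 50 GB stored, 600,000 PUTs/month
```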
Reference Sizing Table
| Total Entries | Avg Payload | S3 Storage | Monthly Cost (S3 Standard) |
|---|---|---|---|
| 10,000 | 20 KB | 200 MB | < $1 |
| 100,000 | 50 KB | 5 GB | ~$0.12 |
| 500,000 | 50 KB | 25 GB | ~$0.58 |
| 1,000,000 | 50 KB | 50 GB | ~$1.15 |
| 5,000,000 | 75 KB | 375 GB | ~$8.63 |
| 10,000,000 | 100 KB | 1 TB | ~$23.00 |
Cost Optimization
- Use S3 Intelligent-Tiering for entries with variable access patterns
- Set lifecycle rules to transition cold entries to Glacier after 90 days
- Enable S3 analytics to identify access pattern shifts
- Consider same-region access to minimize transfer costs
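The 90-day Glacier transition above can be expressed as an S3 lifecycle rule. A sketch in the JSON shape accepted by `aws s3api put-bucket-lifecycle-configuration`; the rule ID and `cache/` prefix are hypothetical placeholders for your bucket layout.

```json
{
  "Rules": [
    {
      "ID": "cache-cold-to-glacier",
      "Filter": { "Prefix": "cache/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```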
Qdrant Cluster Sizing
Qdrant stores embedding vectors for semantic cache matching. Memory requirements scale linearly with vector count and dimension.
Formula
qdrant_memory = vectors × dimension × 4 bytes × index_overhead
qdrant_disk = vectors × dimension × 4 bytes × persistence_factor
Where:
- `vectors = entries` (one vector per cache entry)
- `dimension` = your embedding model output (768, 1024, or 1536)
- `index_overhead` = 1.5 (HNSW index plus metadata)
- `persistence_factor` = 2.0 (WAL + segment storage)
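A minimal sketch of the two Qdrant formulas above, assuming float32 (4-byte) vector components:

```python
# Sketch: Qdrant memory and disk estimate for float32 vectors.
INDEX_OVERHEAD = 1.5       # HNSW index plus metadata
PERSISTENCE_FACTOR = 2.0   # WAL + segment storage

def qdrant_sizing_gb(vectors: int, dimension: int):
    raw_bytes = vectors * dimension * 4          # 4 bytes per float32 component
    memory_gb = raw_bytes * INDEX_OVERHEAD / 1e9
    disk_gb = raw_bytes * PERSISTENCE_FACTOR / 1e9
    return memory_gb, disk_gb

# 1M vectors at 1536 dimensions: ~9 GB memory, ~12 GB disk (table row 4).
mem_gb, disk_gb = qdrant_sizing_gb(1_000_000, 1536)
```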
Reference Sizing Table (1536 dimensions)
| Vectors | Memory (with index) | Disk | Recommended Instance |
|---|---|---|---|
| 10,000 | 90 MB | 120 MB | 2 GB instance |
| 100,000 | 900 MB | 1.2 GB | 4 GB instance |
| 500,000 | 4.5 GB | 6 GB | 8 GB instance |
| 1,000,000 | 9 GB | 12 GB | 16 GB instance |
| 5,000,000 | 45 GB | 60 GB | 64 GB cluster (3 nodes) |
| 10,000,000 | 90 GB | 120 GB | 128 GB cluster (3+ nodes) |
Cluster Configuration
For collections exceeding single-node capacity:
```yaml
qdrant:
  cluster:
    nodes: 3
    replication_factor: 2
    shard_count: 6            # 2 shards per node
collection:
  vector_size: 1536
  distance: Cosine
  on_disk_payload: true       # Keep payloads on disk, vectors in memory
```
PostgreSQL Capacity
PostgreSQL stores cache metadata — relatively small but important for planning connection pools and disk.
Formula
pg_storage = entries × metadata_row_size
pg_connections = warmer_instances × concurrency + api_instances × pool_size
Where:
- `metadata_row_size` = ~2 KB (including indexes)
- Typical connection pool: 10–50 per API instance
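The two PostgreSQL formulas above can be sketched together; the warmer and API instance counts in the example are illustrative assumptions.

```python
# Sketch: PostgreSQL storage and connection-count estimate.
METADATA_ROW_KB = 2  # per-entry metadata row size, including indexes

def pg_requirements(entries: int, warmer_instances: int, concurrency: int,
                    api_instances: int, pool_size: int):
    storage_gb = entries * METADATA_ROW_KB / 1e6   # KB -> GB (decimal)
    connections = warmer_instances * concurrency + api_instances * pool_size
    return storage_gb, connections

# 1M entries; 4 warmers × 5 workers; 3 API instances × 10-connection pools.
storage_gb, conns = pg_requirements(1_000_000, 4, 5, 3, 10)
# -> 2.0 GB of metadata, 50 connections needed
```

The connection figure is what you size `max_connections` (or a pooler like PgBouncer) against, with headroom for admin sessions.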
Reference Table
| Entries | Storage | Recommended Pool Size |
|---|---|---|
| 100,000 | 200 MB | 20 connections |
| 1,000,000 | 2 GB | 50 connections |
| 10,000,000 | 20 GB | 100 connections |
Growth Projection
Project capacity needs over your planning horizon (typically 6–12 months):
future_entries = current_entries × (1 + monthly_growth_rate) ^ months
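The compound-growth formula above in Python form:

```python
# Sketch: project the entry count forward at a fixed monthly growth rate.
def future_entries(current: int, monthly_growth_rate: float, months: int) -> int:
    return round(current * (1 + monthly_growth_rate) ** months)

# 500,000 entries at 10% monthly growth, three months out.
print(future_entries(500_000, 0.10, 3))  # -> 665500
```

Feed the projected entry count back through the per-backend formulas to get the figures in the 12-month table.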
Example 12-Month Projection
Starting with 500,000 entries and 10% monthly growth:
| Month | Entries | Redis | S3 | Qdrant Memory |
|---|---|---|---|---|
| 0 | 500,000 | 750 MB | 25 GB | 4.5 GB |
| 3 | 665,500 | 1.0 GB | 33 GB | 6.0 GB |
| 6 | 885,780 | 1.3 GB | 44 GB | 8.0 GB |
| 9 | 1,178,860 | 1.8 GB | 59 GB | 10.6 GB |
| 12 | 1,569,210 | 2.4 GB | 78 GB | 14.1 GB |
Capacity Planning Checklist
Run this checklist quarterly:
- Measure current entry count and growth rate
- Compare Redis memory usage against provisioned capacity
- Check S3 storage growth against budget projections
- Verify Qdrant memory headroom exceeds 30%
- Confirm PostgreSQL connection pool is not saturated
- Review warmer throughput against projected job volume
- Update provisioning plan if any backend exceeds 70% utilization
- Schedule infrastructure changes at least 2 weeks ahead of projected need
Budget Planning
Combine backend costs for total cache infrastructure budget:
monthly_cost = redis_cost + s3_cost + qdrant_cost + compute_cost
Where:
- `redis_cost` = instance cost based on memory tier
- `s3_cost` = storage + request fees
- `qdrant_cost` = instance cost based on memory/compute
- `compute_cost` = warmer instances × instance price
Compare against avoided_cost (provider calls saved by cache hits) to validate ROI.
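The budget and ROI comparison can be sketched as follows; every dollar figure here is an illustrative assumption, not published pricing.

```python
# Sketch: total monthly cache cost vs. avoided provider spend.
def monthly_cost(redis_cost: float, s3_cost: float,
                 qdrant_cost: float, compute_cost: float) -> float:
    return redis_cost + s3_cost + qdrant_cost + compute_cost

def avoided_cost(daily_queries: int, hit_rate: float,
                 cost_per_provider_call: float) -> float:
    # Every cache hit is a provider call you did not pay for.
    return daily_queries * 30 * hit_rate * cost_per_provider_call

cost = monthly_cost(redis_cost=50, s3_cost=1.15, qdrant_cost=120, compute_cost=80)
saved = avoided_cost(daily_queries=100_000, hit_rate=0.75,
                     cost_per_provider_call=0.01)
roi_positive = saved > cost   # ~$22,500 saved vs. ~$251 spent
```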
Next steps
- Choose backends based on your scale with Cache Backend Selection
- Monitor utilization trends with Observability Integration
- Prepare for failures with Disaster Recovery
For AI systems
- Canonical terms: Keeptrusts, capacity planning, Redis memory sizing, S3 storage projection, Qdrant cluster sizing, monthly growth rate, infrastructure budget.
- Exact feature/config names: `redis_memory = entries × key_overhead`, `s3_storage = entries × avg_payload_size`, `qdrant_memory = vectors × dimension × 4 bytes × 1.5`, `maxmemory-policy: allkeys-lru`, Redis Cluster threshold (16 GB).
- Best next pages: Cache Backend Selection, Observability Integration, Disaster Recovery.
For engineers
- Core formula: `total_storage = repos × avg_artifacts_per_repo × avg_artifact_size` — apply with backend-specific multipliers.
- Redis: ~1.5 KB/entry; plan for 2× calculated need for headroom; Redis Cluster when projected >16 GB per instance.
- S3: cost is dominated by storage volume (pennies per GB) plus request fees; use lifecycle rules for cold entries.
- Qdrant: `vectors × dimension × 4 bytes × 1.5` overhead; the HNSW index adds roughly 50% on top of raw vector memory.
- Quarterly review checklist: compare usage vs. provisioned capacity, verify no backend >70% utilization, schedule changes 2 weeks ahead.
- Budget: `monthly_cost = redis + s3 + qdrant + compute` — compare against `avoided_cost` to validate ROI.
For leaders
- Cache infrastructure cost is typically 1–5% of the avoided LLM cost — the ROI is strongly positive at all scales.
- Plan for monthly growth rate (5–20%) as new repos are added and existing repos generate more entries.
- S3 storage costs are negligible even at large scale (1M entries at 50 KB = 50 GB ≈ $1.15/month).
- Redis and Qdrant are the primary scaling cost drivers; right-size instances based on entry count projections and review quarterly.