Capacity Planning for Cached Engineering Orgs
Proper capacity planning ensures your cache infrastructure grows with your organization without service degradation or unexpected cost spikes. This guide provides formulas, reference values, and planning horizons for each cache backend.
Use this page when
- You are planning cache infrastructure capacity for your organization's growth trajectory.
- You need formulas or heuristics for sizing storage, memory, and compute based on engineer count and repo count.
- You want to estimate costs and resource requirements before scaling up.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Core Formula
The fundamental capacity formula for org-shared cache is:
total_storage = repos × avg_artifacts_per_repo × avg_artifact_size
Each backend stores a different projection of this data, so you apply the formula with backend-specific multipliers.
Input Variables
Gather these values for your organization:
| Variable | How to Measure | Typical Range |
|---|---|---|
| `repos` | Count of monitored repositories | 10–5,000 |
| `avg_artifacts_per_repo` | Files × indexing density | 50–2,000 |
| `avg_artifact_size` | Mean response payload size | 5 KB–100 KB |
| `embedding_dimension` | Your embedding model's output size | 768–1,536 |
| `monthly_growth_rate` | New repos + new content per month | 5–20% |
| `hit_rate` | Current or target hit rate | 60–85% |
| `daily_queries` | Total cache lookups per day | 1,000–1,000,000 |
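The core formula and the variables above can be combined into a small sizing helper. This is a minimal Python sketch; the 500-repo example values are illustrative, not measurements from a real org.

```python
# Sketch: apply the core capacity formula to the input variables above.
def total_storage_bytes(repos: int, avg_artifacts_per_repo: int,
                        avg_artifact_size_kb: float) -> float:
    """total_storage = repos × avg_artifacts_per_repo × avg_artifact_size"""
    return repos * avg_artifacts_per_repo * avg_artifact_size_kb * 1024

# Example (illustrative): 500 repos, 1,000 artifacts each, 50 KB payloads.
entries = 500 * 1_000                              # 500,000 cache entries
storage = total_storage_bytes(500, 1_000, 50)
print(f"{entries:,} entries, {storage / 1e9:.1f} GB raw payload")
```

The raw payload figure feeds the S3 projection below; each other backend applies its own multiplier to the same `entries` count.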
Redis Memory Sizing
Redis stores cache keys, metadata indexes, and lookup tables. Each cache entry requires a Redis key with associated metadata.
Formula
redis_memory = entries × key_overhead
Where:
- `entries = repos × avg_artifacts_per_repo`
- `key_overhead = 1.5 KB` (key string + hash fields + TTL + internal overhead)
Reference Sizing Table
| Repositories | Artifacts/Repo | Total Entries | Redis Memory |
|---|---|---|---|
| 50 | 200 | 10,000 | 15 MB |
| 200 | 500 | 100,000 | 150 MB |
| 500 | 1,000 | 500,000 | 750 MB |
| 1,000 | 1,000 | 1,000,000 | 1.5 GB |
| 2,000 | 1,500 | 3,000,000 | 4.5 GB |
| 5,000 | 2,000 | 10,000,000 | 15 GB |
Planning Recommendations
- Provision 2× your calculated need for headroom and peak handling
- Set Redis `maxmemory-policy` to `allkeys-lru` so cold entries evict gracefully
- Plan for Redis Cluster when projected memory exceeds 16 GB per instance
- Budget for replication: multiply by 2 (primary + replica) or 3 (primary + 2 replicas)
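The eviction and headroom recommendations above translate into two `redis.conf` directives. A sketch, sized for a hypothetical 500,000-entry deployment (750 MB calculated, provisioned at 2×):

```
# redis.conf -- sized for ~500,000 entries (750 MB calculated, 2x headroom)
maxmemory 1536mb
maxmemory-policy allkeys-lru
```

With `allkeys-lru`, hitting `maxmemory` evicts the least-recently-used keys rather than rejecting writes, which is the graceful-degradation behavior a cache wants.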
S3 Storage Projection
S3 stores the full artifact payloads — the largest data volume in your cache system.
Formula
s3_storage = entries × avg_payload_size
s3_monthly_requests = daily_queries × 30 × (1 - hit_rate) × fill_factor
Where `fill_factor` accounts for the fact that only cache misses trigger new S3 writes (typically 1.0 for writes; add read amplification if hits also fetch payloads from S3).
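Both S3 formulas above can be sketched in a few lines of Python; the query volume and hit rate in the example are illustrative assumptions.

```python
# Sketch: S3 storage and monthly write-request projection.
def s3_projection(entries: int, avg_payload_kb: float,
                  daily_queries: int, hit_rate: float,
                  fill_factor: float = 1.0):
    storage_gb = entries * avg_payload_kb / 1e6           # KB -> GB (decimal)
    monthly_writes = daily_queries * 30 * (1 - hit_rate) * fill_factor
    return storage_gb, monthly_writes

# 1M entries at 50 KB each; 100k queries/day at an 80% hit rate.
storage, writes = s3_projection(1_000_000, 50, 100_000, 0.80)
print(f"{storage:.0f} GB stored, {writes:,.0f} PUTs/month")
# -> 50 GB stored, 600,000 PUTs/month
```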
Reference Sizing Table
| Total Entries | Avg Payload | S3 Storage | Monthly Cost (S3 Standard) |
|---|---|---|---|
| 10,000 | 20 KB | 200 MB | < $1 |
| 100,000 | 50 KB | 5 GB | ~$0.12 |
| 500,000 | 50 KB | 25 GB | ~$0.58 |
| 1,000,000 | 50 KB | 50 GB | ~$1.15 |
| 5,000,000 | 75 KB | 375 GB | ~$8.63 |
| 10,000,000 | 100 KB | 1 TB | ~$23.00 |
Cost Optimization
- Use S3 Intelligent-Tiering for entries with variable access patterns
- Set lifecycle rules to transition cold entries to Glacier after 90 days
- Enable S3 analytics to identify access pattern shifts
- Consider same-region access to minimize transfer costs
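The 90-day Glacier transition above can be expressed as an S3 lifecycle rule. A sketch in the JSON shape accepted by `aws s3api put-bucket-lifecycle-configuration`; the rule ID and `cache/` prefix are hypothetical placeholders for your bucket layout.

```json
{
  "Rules": [
    {
      "ID": "cache-cold-to-glacier",
      "Filter": { "Prefix": "cache/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```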
Qdrant Cluster Sizing
Qdrant stores embedding vectors for semantic cache matching. Memory requirements scale linearly with vector count and dimension.
Formula
qdrant_memory = vectors × dimension × 4 bytes × index_overhead
qdrant_disk = vectors × dimension × 4 bytes × persistence_factor
Where:
- `vectors = entries` (one vector per cache entry)
- `dimension` = your embedding model output (768, 1024, or 1536)
- `index_overhead` = 1.5 (HNSW index plus metadata)
- `persistence_factor` = 2.0 (WAL + segment storage)
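A minimal sketch of the two Qdrant formulas above, assuming float32 (4-byte) vector components:

```python
# Sketch: Qdrant memory and disk estimate for float32 vectors.
INDEX_OVERHEAD = 1.5       # HNSW index plus metadata
PERSISTENCE_FACTOR = 2.0   # WAL + segment storage

def qdrant_sizing_gb(vectors: int, dimension: int):
    raw_bytes = vectors * dimension * 4          # 4 bytes per float32 component
    memory_gb = raw_bytes * INDEX_OVERHEAD / 1e9
    disk_gb = raw_bytes * PERSISTENCE_FACTOR / 1e9
    return memory_gb, disk_gb

# 1M vectors at 1536 dimensions: ~9 GB memory, ~12 GB disk (table row 4).
mem_gb, disk_gb = qdrant_sizing_gb(1_000_000, 1536)
```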
Reference Sizing Table (1536 dimensions)
| Vectors | Memory (with index) | Disk | Recommended Instance |
|---|---|---|---|
| 10,000 | 90 MB | 120 MB | 2 GB instance |
| 100,000 | 900 MB | 1.2 GB | 4 GB instance |
| 500,000 | 4.5 GB | 6 GB | 8 GB instance |
| 1,000,000 | 9 GB | 12 GB | 16 GB instance |
| 5,000,000 | 45 GB | 60 GB | 64 GB cluster (3 nodes) |
| 10,000,000 | 90 GB | 120 GB | 128 GB cluster (3+ nodes) |
Cluster Configuration
For collections exceeding single-node capacity:
```yaml
qdrant:
  cluster:
    nodes: 3
    replication_factor: 2
    shard_count: 6            # 2 shards per node
collection:
  vector_size: 1536
  distance: Cosine
  on_disk_payload: true       # Keep payloads on disk, vectors in memory
```
PostgreSQL Capacity
PostgreSQL stores cache metadata — relatively small but important for planning connection pools and disk.
Formula
pg_storage = entries × metadata_row_size
pg_connections = warmer_instances × concurrency + api_instances × pool_size
Where:
- `metadata_row_size` = ~2 KB (including indexes)
- Typical connection pool: 10–50 per API instance
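The two PostgreSQL formulas above can be sketched together; the warmer and API instance counts in the example are illustrative assumptions.

```python
# Sketch: PostgreSQL storage and connection-count estimate.
METADATA_ROW_KB = 2  # per-entry metadata row size, including indexes

def pg_requirements(entries: int, warmer_instances: int, concurrency: int,
                    api_instances: int, pool_size: int):
    storage_gb = entries * METADATA_ROW_KB / 1e6   # KB -> GB (decimal)
    connections = warmer_instances * concurrency + api_instances * pool_size
    return storage_gb, connections

# 1M entries; 4 warmers × 5 workers; 3 API instances × 10-connection pools.
storage_gb, conns = pg_requirements(1_000_000, 4, 5, 3, 10)
# -> 2.0 GB of metadata, 50 connections needed
```

The connection figure is what you size `max_connections` (or a pooler like PgBouncer) against, with headroom for admin sessions.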
Reference Table
| Entries | Storage | Recommended Pool Size |
|---|---|---|
| 100,000 | 200 MB | 20 connections |
| 1,000,000 | 2 GB | 50 connections |
| 10,000,000 | 20 GB | 100 connections |
Growth Projection
Project capacity needs over your planning horizon (typically 6–12 months):
future_entries = current_entries × (1 + monthly_growth_rate) ^ months
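The compound-growth formula above in Python form:

```python
# Sketch: project the entry count forward at a fixed monthly growth rate.
def future_entries(current: int, monthly_growth_rate: float, months: int) -> int:
    return round(current * (1 + monthly_growth_rate) ** months)

# 500,000 entries at 10% monthly growth, three months out.
print(future_entries(500_000, 0.10, 3))  # -> 665500
```

Feed the projected entry count back through the per-backend formulas to get the figures in the 12-month table.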
Example 12-Month Projection
Starting with 500,000 entries and 10% monthly growth:
| Month | Entries | Redis | S3 | Qdrant Memory |
|---|---|---|---|---|
| 0 | 500,000 | 750 MB | 25 GB | 4.5 GB |
| 3 | 665,500 | 1.0 GB | 33 GB | 6.0 GB |
| 6 | 885,780 | 1.3 GB | 44 GB | 8.0 GB |
| 9 | 1,178,860 | 1.8 GB | 59 GB | 10.6 GB |
| 12 | 1,569,210 | 2.4 GB | 78 GB | 14.1 GB |
Capacity Planning Checklist
Run this checklist quarterly:
- Measure current entry count and growth rate
- Compare Redis memory usage against provisioned capacity
- Check S3 storage growth against budget projections
- Verify Qdrant memory headroom exceeds 30%
- Confirm PostgreSQL connection pool is not saturated
- Review warmer throughput against projected job volume
- Update provisioning plan if any backend exceeds 70% utilization
- Schedule infrastructure changes at least 2 weeks ahead of projected need
Budget Planning
Combine backend costs for total cache infrastructure budget:
monthly_cost = redis_cost + s3_cost + qdrant_cost + compute_cost
Where:
- `redis_cost` = instance cost based on memory tier
- `s3_cost` = storage + request fees
- `qdrant_cost` = instance cost based on memory/compute
- `compute_cost` = warmer instances × instance price
Compare against avoided_cost (provider calls saved by cache hits) to validate ROI.
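The budget and ROI comparison can be sketched as follows; every dollar figure here is an illustrative assumption, not published pricing.

```python
# Sketch: total monthly cache cost vs. avoided provider spend.
def monthly_cost(redis_cost: float, s3_cost: float,
                 qdrant_cost: float, compute_cost: float) -> float:
    return redis_cost + s3_cost + qdrant_cost + compute_cost

def avoided_cost(daily_queries: int, hit_rate: float,
                 cost_per_provider_call: float) -> float:
    # Every cache hit is a provider call you did not pay for.
    return daily_queries * 30 * hit_rate * cost_per_provider_call

cost = monthly_cost(redis_cost=50, s3_cost=1.15, qdrant_cost=120, compute_cost=80)
saved = avoided_cost(daily_queries=100_000, hit_rate=0.75,
                     cost_per_provider_call=0.01)
roi_positive = saved > cost   # ~$22,500 saved vs. ~$251 spent
```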
Next steps
- Choose backends based on your scale with Cache Backend Selection
- Monitor utilization trends with Observability Integration
- Prepare for failures with Disaster Recovery
For AI systems
- Canonical terms: Keeptrusts, capacity planning, Redis memory sizing, S3 storage projection, Qdrant cluster sizing, monthly growth rate, infrastructure budget.
- Exact feature/config names: `redis_memory = entries × key_overhead`, `s3_storage = entries × avg_payload_size`, `qdrant_memory = vectors × dimension × 4 bytes × 1.5`, `maxmemory-policy: allkeys-lru`, Redis Cluster threshold (16 GB).
- Best next pages: Cache Backend Selection, Observability Integration, Disaster Recovery.
For engineers
- Core formula: `total_storage = repos × avg_artifacts_per_repo × avg_artifact_size` — apply with backend-specific multipliers.
- Redis: ~1.5 KB/entry; plan for 2× calculated need for headroom; Redis Cluster when projected >16 GB per instance.
- S3: cost is dominated by storage volume (pennies per GB) plus request fees; use lifecycle rules for cold entries.
- Qdrant: `vectors × dimension × 4 bytes × 1.5` overhead; the HNSW index adds roughly 50% on top of raw vector memory.
- Quarterly review checklist: compare usage vs. provisioned capacity, verify no backend >70% utilization, schedule changes 2 weeks ahead.
- Budget: `monthly_cost = redis + s3 + qdrant + compute` — compare against `avoided_cost` to validate ROI.
For leaders
- Cache infrastructure cost is typically 1–5% of the avoided LLM cost — the ROI is strongly positive at all scales.
- Plan for monthly growth rate (5–20%) as new repos are added and existing repos generate more entries.
- S3 storage costs are negligible even at large scale (1M entries at 50 KB = 50 GB ≈ $1.15/month).
- Redis and Qdrant are the primary scaling cost drivers; right-size instances based on entry count projections and review quarterly.