Capacity Planning for Cached Engineering Orgs

Proper capacity planning ensures your cache infrastructure grows with your organization without service degradation or unexpected cost spikes. This guide provides formulas, reference values, and planning horizons for each cache backend.

Use this page when

  • You are planning cache infrastructure capacity for your organization's growth trajectory.
  • You need formulas or heuristics for sizing storage, memory, and compute based on engineer count and repo count.
  • You want to estimate costs and resource requirements before scaling up.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Core Formula

The fundamental capacity formula for org-shared cache is:

total_storage = repos × avg_artifacts_per_repo × avg_artifact_size

Each backend stores a different projection of this data, so you apply the formula with backend-specific multipliers.
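The core formula can be sketched as a small helper (a minimal sketch; the input values below are illustrative, using decimal units to match the tables in this guide):

```python
def total_storage_bytes(repos: int, avg_artifacts_per_repo: int,
                        avg_artifact_size_bytes: int) -> int:
    # total_storage = repos × avg_artifacts_per_repo × avg_artifact_size
    return repos * avg_artifacts_per_repo * avg_artifact_size_bytes

# Example: 500 repos, 1,000 artifacts each, 50 KB average payload.
storage = total_storage_bytes(500, 1_000, 50_000)
print(f"{storage / 1e9:.0f} GB")  # 25 GB
```

The backend-specific multipliers in the sections below replace `avg_artifact_size_bytes` with per-backend overhead figures.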

Input Variables

Gather these values for your organization:

| Variable | How to Measure | Typical Range |
|---|---|---|
| repos | Count of monitored repositories | 10–5,000 |
| avg_artifacts_per_repo | Files × indexing density | 50–2,000 |
| avg_artifact_size | Mean response payload size | 5 KB–100 KB |
| embedding_dimension | Your embedding model's output size | 768–1,536 |
| monthly_growth_rate | New repos + new content per month | 5–20% |
| hit_rate | Current or target hit rate | 60–85% |
| daily_queries | Total cache lookups per day | 1,000–1,000,000 |

Redis Memory Sizing

Redis stores cache keys, metadata indexes, and lookup tables. Each cache entry requires a Redis key with associated metadata.

Formula

redis_memory = entries × key_overhead

Where:

  • entries = repos × avg_artifacts_per_repo
  • key_overhead = 1.5 KB (key string + hash fields + TTL + internal overhead)
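As a sketch, the formula translates directly to code (the 1.5 KB overhead is the rule of thumb stated above; the example inputs match the sizing table below):

```python
KEY_OVERHEAD_BYTES = 1_500  # key string + hash fields + TTL + internal overhead

def redis_memory_bytes(repos: int, avg_artifacts_per_repo: int) -> int:
    entries = repos * avg_artifacts_per_repo
    return entries * KEY_OVERHEAD_BYTES

# 500 repos × 1,000 artifacts → 500,000 entries → 750 MB
print(f"{redis_memory_bytes(500, 1_000) / 1e6:.0f} MB")
```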

Reference Sizing Table

| Repositories | Artifacts/Repo | Total Entries | Redis Memory |
|---|---|---|---|
| 50 | 200 | 10,000 | 15 MB |
| 200 | 500 | 100,000 | 150 MB |
| 500 | 1,000 | 500,000 | 750 MB |
| 1,000 | 1,000 | 1,000,000 | 1.5 GB |
| 2,000 | 1,500 | 3,000,000 | 4.5 GB |
| 5,000 | 2,000 | 10,000,000 | 15 GB |

Planning Recommendations

  • Provision 2× your calculated need for headroom and peak handling
  • Set Redis maxmemory-policy to allkeys-lru so cold entries evict gracefully
  • Plan for Redis Cluster when projected memory exceeds 16 GB per instance
  • Budget for replication: multiply by 2 (primary + replica) or 3 (primary + 2 replicas)
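The first two recommendations can be expressed as a redis.conf fragment (a sketch; the maxmemory value is illustrative, sized at 2× the 750 MB calculated for 500,000 entries):

```
# redis.conf — illustrative values
maxmemory 1536mb              # 2× the calculated 750 MB need, for headroom
maxmemory-policy allkeys-lru  # evict cold entries gracefully under pressure
```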

S3 Storage Projection

S3 stores the full artifact payloads — the largest data volume in your cache system.

Formula

s3_storage = entries × avg_payload_size
s3_monthly_requests = daily_queries × 30 × (1 - hit_rate) × fill_factor

Where fill_factor accounts for the fact that only misses trigger new S3 writes (typically 1.0 for writes, plus read amplification for hits).
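A sketch of both projections (the hit_rate and fill_factor values are illustrative assumptions, not measurements):

```python
def s3_storage_bytes(entries: int, avg_payload_bytes: int) -> int:
    return entries * avg_payload_bytes

def s3_monthly_write_requests(daily_queries: int, hit_rate: float,
                              fill_factor: float = 1.0) -> float:
    # Only cache misses trigger new S3 writes.
    return daily_queries * 30 * (1 - hit_rate) * fill_factor

print(f"{s3_storage_bytes(500_000, 50_000) / 1e9:.0f} GB")       # 25 GB
print(f"{s3_monthly_write_requests(100_000, 0.75):,.0f} writes")  # 750,000 writes
```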

Reference Sizing Table

| Total Entries | Avg Payload | S3 Storage | Monthly Cost (S3 Standard) |
|---|---|---|---|
| 10,000 | 20 KB | 200 MB | < $1 |
| 100,000 | 50 KB | 5 GB | ~$0.12 |
| 500,000 | 50 KB | 25 GB | ~$0.58 |
| 1,000,000 | 50 KB | 50 GB | ~$1.15 |
| 5,000,000 | 75 KB | 375 GB | ~$8.63 |
| 10,000,000 | 100 KB | 1 TB | ~$23.00 |

Cost Optimization

  • Use S3 Intelligent-Tiering for entries with variable access patterns
  • Set lifecycle rules to transition cold entries to Glacier after 90 days
  • Enable S3 analytics to identify access pattern shifts
  • Consider same-region access to minimize transfer costs

Qdrant Cluster Sizing

Qdrant stores embedding vectors for semantic cache matching. Memory requirements scale linearly with vector count and dimension.

Formula

qdrant_memory = vectors × dimension × 4 bytes × index_overhead
qdrant_disk = vectors × dimension × 4 bytes × persistence_factor

Where:

  • vectors = entries (one vector per cache entry)
  • dimension = your embedding model output (768, 1024, or 1536)
  • index_overhead = 1.5 (HNSW index plus metadata)
  • persistence_factor = 2.0 (WAL + segment storage)
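Both formulas as a sketch, using the overhead factors defined above (the example reproduces the 1,000,000-vector row of the table below, before rounding):

```python
BYTES_PER_FLOAT32 = 4
INDEX_OVERHEAD = 1.5       # HNSW index plus metadata
PERSISTENCE_FACTOR = 2.0   # WAL + segment storage

def qdrant_memory_bytes(vectors: int, dimension: int) -> float:
    return vectors * dimension * BYTES_PER_FLOAT32 * INDEX_OVERHEAD

def qdrant_disk_bytes(vectors: int, dimension: int) -> float:
    return vectors * dimension * BYTES_PER_FLOAT32 * PERSISTENCE_FACTOR

# 1M vectors at 1536 dimensions → ~9 GB memory, ~12 GB disk
print(f"{qdrant_memory_bytes(1_000_000, 1536) / 1e9:.1f} GB")
print(f"{qdrant_disk_bytes(1_000_000, 1536) / 1e9:.1f} GB")
```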

Reference Sizing Table (1536 dimensions)

| Vectors | Memory (with index) | Disk | Recommended Instance |
|---|---|---|---|
| 10,000 | 90 MB | 120 MB | 2 GB instance |
| 100,000 | 900 MB | 1.2 GB | 4 GB instance |
| 500,000 | 4.5 GB | 6 GB | 8 GB instance |
| 1,000,000 | 9 GB | 12 GB | 16 GB instance |
| 5,000,000 | 45 GB | 60 GB | 64 GB cluster (3 nodes) |
| 10,000,000 | 90 GB | 120 GB | 128 GB cluster (3+ nodes) |

Cluster Configuration

For collections exceeding single-node capacity:

```yaml
qdrant:
  cluster:
    nodes: 3
    replication_factor: 2
    shard_count: 6          # 2 shards per node
  collection:
    vector_size: 1536
    distance: Cosine
    on_disk_payload: true   # Keep payloads on disk, vectors in memory
```

PostgreSQL Capacity

PostgreSQL stores cache metadata — relatively small but important for planning connection pools and disk.

Formula

pg_storage = entries × metadata_row_size
pg_connections = warmer_instances × concurrency + api_instances × pool_size

Where:

  • metadata_row_size = ~2 KB (including indexes)
  • Typical connection pool: 10–50 per API instance
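A sketch of both formulas (the instance counts and pool size in the example are illustrative assumptions):

```python
METADATA_ROW_BYTES = 2_000  # ~2 KB per row, including indexes

def pg_storage_bytes(entries: int) -> int:
    return entries * METADATA_ROW_BYTES

def pg_connections(warmer_instances: int, concurrency: int,
                   api_instances: int, pool_size: int) -> int:
    return warmer_instances * concurrency + api_instances * pool_size

print(f"{pg_storage_bytes(1_000_000) / 1e9:.0f} GB")  # 2 GB
print(pg_connections(2, 10, 3, 10), "connections")    # 50 connections
```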

Reference Table

| Entries | Storage | Recommended Pool Size |
|---|---|---|
| 100,000 | 200 MB | 20 connections |
| 1,000,000 | 2 GB | 50 connections |
| 10,000,000 | 20 GB | 100 connections |

Growth Projection

Project capacity needs over your planning horizon (typically 6–12 months):

future_entries = current_entries × (1 + monthly_growth_rate) ^ months

Example 12-Month Projection

Starting with 500,000 entries and 10% monthly growth:

| Month | Entries | Redis | S3 | Qdrant Memory |
|---|---|---|---|---|
| 0 | 500,000 | 750 MB | 25 GB | 4.5 GB |
| 3 | 665,500 | 1.0 GB | 33 GB | 6.0 GB |
| 6 | 885,780 | 1.3 GB | 44 GB | 8.0 GB |
| 9 | 1,178,860 | 1.8 GB | 59 GB | 10.6 GB |
| 12 | 1,569,210 | 2.4 GB | 78 GB | 14.1 GB |
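The projection above can be reproduced with a short script that combines the growth formula with the per-backend sizing rules from this guide (1.5 KB/entry for Redis, 50 KB payloads for S3, 1536-dimension vectors for Qdrant); minor rounding differences from the table are expected:

```python
def project_entries(current: int, monthly_growth: float, months: int) -> int:
    # future_entries = current_entries × (1 + monthly_growth_rate) ^ months
    return round(current * (1 + monthly_growth) ** months)

for month in (0, 3, 6, 9, 12):
    entries = project_entries(500_000, 0.10, month)
    redis_gb = entries * 1_500 / 1e9           # 1.5 KB per entry
    s3_gb = entries * 50_000 / 1e9             # 50 KB average payload
    qdrant_gb = entries * 1536 * 4 * 1.5 / 1e9  # float32 vectors + HNSW overhead
    print(f"month {month:2d}: {entries:>9,} entries, "
          f"Redis {redis_gb:.2f} GB, S3 {s3_gb:.0f} GB, Qdrant {qdrant_gb:.1f} GB")
```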

Capacity Planning Checklist

Run this checklist quarterly:

  • Measure current entry count and growth rate
  • Compare Redis memory usage against provisioned capacity
  • Check S3 storage growth against budget projections
  • Verify Qdrant memory headroom exceeds 30%
  • Confirm PostgreSQL connection pool is not saturated
  • Review warmer throughput against projected job volume
  • Update provisioning plan if any backend exceeds 70% utilization
  • Schedule infrastructure changes at least 2 weeks ahead of projected need

Budget Planning

Combine backend costs for total cache infrastructure budget:

monthly_cost = redis_cost + s3_cost + qdrant_cost + compute_cost

Where:

  • redis_cost = instance cost based on memory tier
  • s3_cost = storage + request fees
  • qdrant_cost = instance cost based on memory/compute
  • compute_cost = warmer instances × instance price

Compare against avoided_cost (provider calls saved by cache hits) to validate ROI.
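The budget check can be sketched as follows (all dollar amounts are illustrative assumptions, not reference prices):

```python
def monthly_cost(redis: float, s3: float, qdrant: float, compute: float) -> float:
    return redis + s3 + qdrant + compute

def roi_positive(cost: float, avoided_cost: float) -> bool:
    # The cache pays for itself when avoided provider spend exceeds infra spend.
    return avoided_cost > cost

cost = monthly_cost(redis=120.0, s3=1.15, qdrant=250.0, compute=80.0)
print(f"${cost:.2f}/month")                       # $451.15/month
print(roi_positive(cost, avoided_cost=9_000.0))   # True
```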

Next steps

For AI systems

  • Canonical terms: Keeptrusts, capacity planning, Redis memory sizing, S3 storage projection, Qdrant cluster sizing, monthly growth rate, infrastructure budget.
  • Exact feature/config names: redis_memory = entries × key_overhead, s3_storage = entries × avg_payload_size, qdrant_memory = vectors × dimension × 4 bytes × 1.5, maxmemory-policy: allkeys-lru, Redis Cluster threshold (16 GB).
  • Best next pages: Cache Backend Selection, Observability Integration, Disaster Recovery.

For engineers

  • Core formula: total_storage = repos × avg_artifacts_per_repo × avg_artifact_size — apply with backend-specific multipliers.
  • Redis: ~1.5 KB/entry; plan for 2× calculated need for headroom; Redis Cluster when projected >16 GB per instance.
  • S3: cost is dominated by storage volume (pennies per GB) plus request fees; use lifecycle rules for cold entries.
  • Qdrant: vectors × dimension × 4 bytes × 1.5 overhead; HNSW index doubles memory needs.
  • Monthly review checklist: compare usage vs. provisioned capacity, verify no backend >70% utilization, schedule changes 2 weeks ahead.
  • Budget: monthly_cost = redis + s3 + qdrant + compute — compare against avoided_cost to validate ROI.

For leaders

  • Cache infrastructure cost is typically 1–5% of the avoided LLM cost — the ROI is strongly positive at all scales.
  • Plan for monthly growth rate (5–20%) as new repos are added and existing repos generate more entries.
  • S3 storage costs are negligible even at large scale (1M entries at 50 KB = 50 GB ≈ $1.15/month).
  • Redis and Qdrant are the primary scaling cost drivers; right-size instances based on entry count projections and review quarterly.