Tuning Cache Warmer Concurrency
The KEEPTRUSTS_CACHE_WARMER_CONCURRENCY setting controls how many cache generation jobs each warmer instance processes simultaneously. Tuning this value correctly ensures your cache stays warm without overwhelming your infrastructure.
Use this page when
- You need to tune KEEPTRUSTS_CACHE_WARMER_CONCURRENCY based on your workload, resource constraints, and queue depth.
- You are deciding between vertical scaling (more concurrency per instance) and horizontal scaling (more instances).
- You see queue depth alerts firing and need to determine the right scaling response.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Default Configuration
Each warmer instance defaults to processing 4 concurrent jobs:
```shell
KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=4
```
This means a single warmer process runs up to 4 artifact generation tasks in parallel. Each task may involve LLM calls, file parsing, embedding computation, or graph construction depending on the artifact type.
Understanding Resource Consumption
Different artifact types have different resource profiles:
| Artifact Type | CPU | Memory | LLM Calls | Typical Duration |
|---|---|---|---|---|
| repo_map | Low | Low | 0 | 5–15s |
| dependency_graph | Medium | Medium | 0 | 10–30s |
| test_map | Medium | Low | 0 | 10–20s |
| api_inventory | Medium | Medium | 1–3 | 30–60s |
| symbol_index | High | High | 0 | 30–120s |
| embedding_index | High | High | Many | 60–300s |
| file_summary | Low | Low | 1 per file | 10–30s per file |
When tuning concurrency, consider the mix of artifact types in your queue. A queue dominated by embedding_index jobs needs lower concurrency than one processing mostly repo_map entries.
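To make the effect of the job mix concrete, here is a minimal sketch that estimates queue drain time from the artifact mix. The durations are midpoints of the ranges in the table above; the function name is illustrative, not part of Keeptrusts:

```python
# Illustrative only: durations are midpoints of the "Typical Duration"
# column above; the helper itself is not part of Keeptrusts.
TYPICAL_DURATION_S = {
    "repo_map": 10,
    "dependency_graph": 20,
    "test_map": 15,
    "api_inventory": 45,
    "symbol_index": 75,
    "embedding_index": 180,
}

def estimated_drain_seconds(queue, concurrency):
    """Rough lower bound: total work divided evenly across parallel slots."""
    total = sum(TYPICAL_DURATION_S[kind] for kind in queue)
    return total / concurrency

# Ten embedding_index jobs take ~18x longer to drain than ten repo_map jobs.
print(estimated_drain_seconds(["embedding_index"] * 10, 4))  # 450.0
print(estimated_drain_seconds(["repo_map"] * 10, 4))         # 25.0
```

This is why a queue full of heavy jobs may justify adding instances rather than raising per-instance concurrency: each slot also has to fit the job's memory footprint.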
When to Increase Concurrency
Increase KEEPTRUSTS_CACHE_WARMER_CONCURRENCY when:
Large Repositories
Repositories with 500K+ lines of code generate many warming jobs on connect. Higher concurrency reduces the time to full cache coverage:
```shell
KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=8
```
Many Connected Repositories
Organizations with 50+ repositories produce a steady stream of commit-triggered jobs. Increase concurrency to prevent queue buildup:
```shell
KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=6
```
Slow Embedding Generation
If your embedding model has high latency (>1s per chunk), higher concurrency keeps the pipeline saturated while waiting for responses:
```shell
KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=10
```
Rapid Development Cadence
Teams pushing 50+ commits per day need warmers that keep pace with change velocity:
```shell
KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=8
```
When to Decrease Concurrency
Decrease KEEPTRUSTS_CACHE_WARMER_CONCURRENCY when:
Limited Compute Resources
If the warmer shares a host with other services, high concurrency can starve critical workloads:
```shell
KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=2
```
Shared Infrastructure
When multiple warmer instances compete for the same database connection pool or LLM endpoint, reduce per-instance concurrency to avoid contention:
```shell
KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=2
```
Memory Constraints
symbol_index and embedding_index jobs can consume 500MB–2GB each. On memory-constrained hosts, limit concurrency to prevent OOM kills:
```shell
KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=2
```
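A quick way to pick a memory-safe value is to divide available memory by the worst-case job footprint. This helper is a hypothetical sketch, using the 2GB upper bound quoted above as the default worst case:

```python
# Hypothetical helper, not part of Keeptrusts. The 2 GB default comes
# from the symbol_index/embedding_index figure quoted above.
def safe_concurrency(available_memory_gb, worst_case_job_gb=2):
    """Largest concurrency whose worst-case jobs still fit in memory."""
    return max(1, int(available_memory_gb // worst_case_job_gb))

print(safe_concurrency(6))  # 3: six GB fits three 2 GB jobs
print(safe_concurrency(1))  # 1: never go below one slot
```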
Rate-Limited LLM Providers
If your LLM provider enforces strict rate limits, reduce concurrency to stay within quota:
```shell
KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=3
```
Monitoring Queue Depth
Track queue depth to determine if your concurrency settings are adequate. Check the console under Settings → Engineering Cache → Warmers → Queue:
- Queue depth: Number of jobs waiting. Sustained depth > 50 suggests you need more capacity.
- Oldest job age: Time the oldest queued job has been waiting. If this exceeds 10 minutes during normal operation, increase concurrency or add instances.
- Processing rate: Jobs completed per minute. Compare against job arrival rate.
- Average job duration: Helps predict drain time for the current queue.
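These metrics combine into a simple capacity check: the queue drains only when the processing rate exceeds the arrival rate. A minimal sketch (function and parameter names are illustrative):

```python
# Illustrative capacity check built from the metrics listed above.
def drain_minutes(queue_depth, processing_rate, arrival_rate):
    """Minutes until the queue empties; None if the backlog is growing."""
    net = processing_rate - arrival_rate  # net jobs cleared per minute
    if net <= 0:
        return None  # arrivals outpace completions: add capacity
    return queue_depth / net

# 120 queued jobs, completing 30/min, arriving 10/min -> empty in 6 minutes.
print(drain_minutes(120, 30, 10))  # 6.0
print(drain_minutes(120, 10, 30))  # None (queue is growing)
```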
Alerting on Queue Depth
Configure alerts to notify you when the queue grows beyond acceptable levels:
```yaml
engineering_cache:
  alerts:
    queue_depth_warning: 50
    queue_depth_critical: 200
    oldest_job_age_warning_seconds: 600
    oldest_job_age_critical_seconds: 1800
```
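If you evaluate these thresholds in your own monitoring glue, the severity logic reduces to a pair of comparisons. A hedged sketch, with the defaults taken from the config above (the function itself is not a Keeptrusts API):

```python
# Sketch of the severity logic; defaults mirror the alert config above.
def alert_level(queue_depth, oldest_job_age_s,
                depth_warn=50, depth_crit=200,
                age_warn=600, age_crit=1800):
    """Map the two queue metrics onto ok/warning/critical."""
    if queue_depth >= depth_crit or oldest_job_age_s >= age_crit:
        return "critical"
    if queue_depth >= depth_warn or oldest_job_age_s >= age_warn:
        return "warning"
    return "ok"

print(alert_level(30, 120))   # ok
print(alert_level(75, 120))   # warning: depth past the warning threshold
print(alert_level(75, 2000))  # critical: oldest job past 1800s
```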
Horizontal Scaling with Multiple Workers
Instead of increasing concurrency on a single instance, you can deploy multiple warmer instances. Each instance independently claims jobs from the shared queue using advisory locks, preventing duplicate work.
Scaling Formula
A good starting point:
Total concurrency = instances × KEEPTRUSTS_CACHE_WARMER_CONCURRENCY
For example, 3 instances with concurrency 4 gives you 12 parallel jobs total.
When to Scale Horizontally vs. Vertically
| Approach | Use When |
|---|---|
| Increase concurrency (vertical) | Single host has spare CPU/memory |
| Add instances (horizontal) | Need fault tolerance, host is at capacity |
| Both | Large organizations with high warming demand |
Deploying Multiple Instances
In Docker Compose, use the replicas setting:
```yaml
services:
  keeptrusts-cache-warmer:
    image: keeptrusts/api:latest
    command: ["/usr/local/bin/worker_cache_warmer"]
    environment:
      KEEPTRUSTS_CACHE_WARMER_CONCURRENCY: "4"
    deploy:
      replicas: 3
```
In Kubernetes, set the Deployment replicas:
```yaml
spec:
  replicas: 3
```
Avoiding Database Connection Exhaustion
Each concurrent job holds a database connection. Calculate your total connection requirement:
Max connections = instances × concurrency × 2 (jobs + polling)
Ensure your database connection pool (KEEPTRUSTS_DB_MAX_CONNECTIONS) accommodates all warmer instances plus the main API. Consider using PgBouncer for connection pooling in large deployments.
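The connection math above can be sketched as a small sizing helper (hypothetical code, not a Keeptrusts utility):

```python
# Hypothetical sizing helper applying the rule of thumb above:
# each job slot holds one connection, plus one for queue polling.
def warmer_db_connections(instances, concurrency):
    """Connections consumed by the warmer fleet alone."""
    return instances * concurrency * 2

# 3 instances at concurrency 4 need 24 connections before counting
# the main API; KEEPTRUSTS_DB_MAX_CONNECTIONS must cover both.
print(warmer_db_connections(3, 4))  # 24
```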
Recommended Configurations
| Organization Size | Repos | Instances | Concurrency | Total Parallel |
|---|---|---|---|---|
| Small (1–10 devs) | 1–5 | 1 | 4 | 4 |
| Medium (10–50 devs) | 5–20 | 2 | 4 | 8 |
| Large (50–200 devs) | 20–100 | 3 | 6 | 18 |
| Enterprise (200+ devs) | 100+ | 5+ | 8 | 40+ |
These are starting points. Monitor queue depth and job age after deployment and adjust based on observed behavior.
Verifying Your Configuration
After changing concurrency settings, verify the warmer is operating correctly:
```shell
# Check warmer logs for the new concurrency value
docker compose logs keeptrusts-cache-warmer | grep "concurrency"
# Expected: INFO worker_cache_warmer: starting cache warmer worker concurrency=8

# Monitor queue drain rate
watch -n5 'curl -s http://localhost:8081/health | jq .queue_depth'
```
Next steps
- Setting Up Cache Warmers — Initial warmer deployment.
- Warmer Triggers: Connect, Commit, and Miss — Understand job sources.
- Cache TTL and Expiry — Balance freshness with warming capacity.
For AI systems
- Canonical terms: Keeptrusts, KEEPTRUSTS_CACHE_WARMER_CONCURRENCY, horizontal scaling, queue depth, advisory lock, warmer instances, worker_cache_warmer.
- Config keys: KEEPTRUSTS_CACHE_WARMER_CONCURRENCY, engineering_cache.alerts.queue_depth_warning, engineering_cache.alerts.queue_depth_critical, engineering_cache.alerts.oldest_job_age_warning_seconds.
- Best next pages: Setting Up Cache Warmers, Warmer Triggers, Cache TTL and Expiry.
For engineers
- Default: 4 concurrent jobs per instance. Resource profiles vary: repo_map is light (5–15s), embedding_index is heavy (60–300s, 500MB–2GB).
- Increase concurrency (6–8) for: large repos (500K+ LOC), many connected repos (50+), slow embedding models, high commit velocity (50+/day).
- Decrease concurrency (2) for: shared hosts, memory constraints, rate-limited LLM providers.
- Total concurrency formula: instances × KEEPTRUSTS_CACHE_WARMER_CONCURRENCY. E.g., 3 instances × 4 = 12 parallel jobs.
- Max DB connections needed: instances × concurrency × 2. Ensure the pool accommodates all warmers plus the main API.
- Monitor: sustained queue depth > 50 or oldest job age > 10 min means you need more capacity.
For leaders
- Concurrency directly trades infrastructure cost against cache freshness latency.
- Under-provisioned warmers mean slower cache fills and delayed cost savings after code changes.
- Horizontal scaling provides fault tolerance (one instance failure doesn’t stop warming).
- Enterprise deployments (200+ devs, 100+ repos) typically need 5+ instances with concurrency 8 (40+ parallel jobs).