Tuning Cache Warmer Concurrency

The KEEPTRUSTS_CACHE_WARMER_CONCURRENCY setting controls how many cache generation jobs each warmer instance processes simultaneously. Tuning this value correctly ensures your cache stays warm without overwhelming your infrastructure.

Use this page when

  • You need to tune KEEPTRUSTS_CACHE_WARMER_CONCURRENCY based on your workload, resource constraints, and queue depth.
  • You are deciding between vertical scaling (more concurrency per instance) and horizontal scaling (more instances).
  • You see queue depth alerts firing and need to determine the right scaling response.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Default Configuration

Each warmer instance defaults to processing 4 concurrent jobs:

KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=4

This means a single warmer process runs up to 4 artifact generation tasks in parallel. Each task may involve LLM calls, file parsing, embedding computation, or graph construction depending on the artifact type.
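The bounded-pool behavior can be pictured with a minimal sketch. This is not the actual warmer implementation (which is not documented here), just the general pattern, assuming an asyncio-style worker where a semaphore caps in-flight jobs:

```python
import asyncio

CONCURRENCY = 4  # mirrors KEEPTRUSTS_CACHE_WARMER_CONCURRENCY

async def run_jobs(num_jobs: int, concurrency: int) -> int:
    """Run num_jobs dummy warming tasks, at most `concurrency` at once.
    Returns the peak number of simultaneously running tasks."""
    sem = asyncio.Semaphore(concurrency)
    active = 0
    peak = 0

    async def job() -> None:
        nonlocal active, peak
        async with sem:                # claim a slot in the pool
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stands in for LLM calls / parsing
            active -= 1

    await asyncio.gather(*(job() for _ in range(num_jobs)))
    return peak

if __name__ == "__main__":
    peak = asyncio.run(run_jobs(num_jobs=20, concurrency=CONCURRENCY))
    print(f"peak concurrency: {peak}")  # never exceeds 4
```

However many jobs are queued, only 4 touch the CPU, memory, and LLM endpoint at once, which is why the setting bounds resource consumption per instance.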

Understanding Resource Consumption

Different artifact types have different resource profiles:

| Artifact Type | CPU | Memory | LLM Calls | Typical Duration |
|---|---|---|---|---|
| repo_map | Low | Low | 0 | 5–15s |
| dependency_graph | Medium | Medium | 0 | 10–30s |
| test_map | Medium | Low | 0 | 10–20s |
| api_inventory | Medium | Medium | 1–3 | 30–60s |
| symbol_index | High | High | 0 | 30–120s |
| embedding_index | High | High | Many | 60–300s |
| file_summary | Low | Low | 1 per file | 10–30s per file |

When tuning concurrency, consider the mix of artifact types in your queue. A queue dominated by embedding_index jobs needs lower concurrency than one processing mostly repo_map entries.
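One way to reason about a mixed queue is to compute the average job duration weighted by the mix. A rough sketch, using midpoints of the duration ranges from the table above (the queue mixes here are illustrative):

```python
# Midpoint durations (seconds) from the resource-profile table above.
DURATION = {
    "repo_map": 10,
    "dependency_graph": 20,
    "test_map": 15,
    "api_inventory": 45,
    "symbol_index": 75,
    "embedding_index": 180,
}

def avg_job_seconds(queue_mix: dict[str, int]) -> float:
    """Average duration of a job given counts per artifact type."""
    total_jobs = sum(queue_mix.values())
    total_seconds = sum(DURATION[t] * n for t, n in queue_mix.items())
    return total_seconds / total_jobs

# A queue dominated by heavy embedding jobs...
heavy = avg_job_seconds({"embedding_index": 40, "repo_map": 10})
# ...versus one dominated by light repo_map jobs.
light = avg_job_seconds({"embedding_index": 10, "repo_map": 40})
print(f"heavy mix: {heavy:.0f}s/job, light mix: {light:.0f}s/job")
```

The heavier the average job, the longer each concurrency slot stays occupied, so the same setting produces very different memory pressure and drain rates depending on the mix.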

When to Increase Concurrency

Increase KEEPTRUSTS_CACHE_WARMER_CONCURRENCY when:

Large Repositories

Repositories with 500K+ lines of code generate many warming jobs on connect. Higher concurrency reduces the time to full cache coverage:

KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=8

Many Connected Repositories

Organizations with 50+ repositories produce a steady stream of commit-triggered jobs. Increase concurrency to prevent queue buildup:

KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=6

Slow Embedding Generation

If your embedding model has high latency (>1s per chunk), higher concurrency keeps the pipeline saturated while waiting for responses:

KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=10

Rapid Development Cadence

Teams pushing 50+ commits per day need warmers that keep pace with change velocity:

KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=8

When to Decrease Concurrency

Decrease KEEPTRUSTS_CACHE_WARMER_CONCURRENCY when:

Limited Compute Resources

If the warmer shares a host with other services, high concurrency can starve critical workloads:

KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=2

Shared Infrastructure

When multiple warmer instances compete for the same database connection pool or LLM endpoint, reduce per-instance concurrency to avoid contention:

KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=2

Memory Constraints

symbol_index and embedding_index jobs can consume 500MB–2GB each. On memory-constrained hosts, limit concurrency to prevent OOM kills:

KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=2

Rate-Limited LLM Providers

If your LLM provider enforces strict rate limits, reduce concurrency to stay within quota:

KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=3
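To pick a value against a quota, note that each in-flight job issues roughly calls_per_job calls every job_duration seconds. A back-of-the-envelope sketch (the quota and job profile below are illustrative, loosely based on the api_inventory row above):

```python
def max_concurrency_for_quota(requests_per_min: float,
                              calls_per_job: float,
                              job_duration_s: float) -> int:
    """Largest concurrency whose steady-state LLM call rate stays in quota."""
    # Each concurrent job issues calls_per_job calls every job_duration_s seconds.
    calls_per_min_per_job = calls_per_job * 60.0 / job_duration_s
    return int(requests_per_min // calls_per_min_per_job)

# e.g. a 12 req/min quota, api_inventory jobs making ~3 calls over ~45s:
print(max_concurrency_for_quota(12, 3, 45))  # 3 concurrent jobs stay in quota
```

Bursts can still exceed the steady-state rate, so leave headroom below the computed ceiling.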

Monitoring Queue Depth

Track queue depth to determine if your concurrency settings are adequate. Check the console under Settings → Engineering Cache → Warmers → Queue:

  • Queue depth: Number of jobs waiting. Sustained depth > 50 suggests you need more capacity.
  • Oldest job age: Time the oldest queued job has been waiting. If this exceeds 10 minutes during normal operation, increase concurrency or add instances.
  • Processing rate: Jobs completed per minute. Compare against job arrival rate.
  • Average job duration: Helps predict drain time for the current queue.
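Together, these metrics let you estimate how long the current queue will take to drain. A back-of-the-envelope sketch (all inputs hypothetical):

```python
def drain_minutes(queue_depth: int,
                  instances: int,
                  concurrency: int,
                  avg_job_seconds: float) -> float:
    """Approximate minutes to empty the queue, ignoring new arrivals."""
    jobs_per_minute = instances * concurrency * 60.0 / avg_job_seconds
    return queue_depth / jobs_per_minute

# 200 queued jobs, 2 instances at concurrency 4, ~30s per job:
print(f"{drain_minutes(200, 2, 4, 30):.1f} min")  # ~12.5 minutes
```

If the estimated drain time regularly exceeds your alert thresholds, add capacity before the queue alerts fire rather than after.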

Alerting on Queue Depth

Configure alerts to notify you when the queue grows beyond acceptable levels:

engineering_cache:
  alerts:
    queue_depth_warning: 50
    queue_depth_critical: 200
    oldest_job_age_warning_seconds: 600
    oldest_job_age_critical_seconds: 1800

Horizontal Scaling with Multiple Workers

Instead of increasing concurrency on a single instance, you can deploy multiple warmer instances. Each instance independently claims jobs from the shared queue using advisory locks, preventing duplicate work.

Scaling Formula

A good starting point:

Total concurrency = instances × KEEPTRUSTS_CACHE_WARMER_CONCURRENCY

For example, 3 instances with concurrency 4 gives you 12 parallel jobs total.

When to Scale Horizontally vs. Vertically

| Approach | Use When |
|---|---|
| Increase concurrency (vertical) | Single host has spare CPU/memory |
| Add instances (horizontal) | Need fault tolerance, or host is at capacity |
| Both | Large organizations with high warming demand |

Deploying Multiple Instances

In Docker Compose, use the replicas setting:

services:
  keeptrusts-cache-warmer:
    image: keeptrusts/api:latest
    command: ["/usr/local/bin/worker_cache_warmer"]
    environment:
      KEEPTRUSTS_CACHE_WARMER_CONCURRENCY: "4"
    deploy:
      replicas: 3

In Kubernetes, set the Deployment replicas:

spec:
  replicas: 3

Avoiding Database Connection Exhaustion

Each concurrent job holds a database connection. Calculate your total connection requirement:

Max connections = instances × concurrency × 2 (jobs + polling)

Ensure your database connection pool (KEEPTRUSTS_DB_MAX_CONNECTIONS) accommodates all warmer instances plus the main API. Consider using PgBouncer for connection pooling in large deployments.
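The budget check is simple arithmetic. A sketch encoding the rule of thumb above (API_POOL is a hypothetical reservation for the main API):

```python
API_POOL = 20  # hypothetical connections reserved for the main API

def warmer_connections(instances: int, concurrency: int) -> int:
    # instances × concurrency × 2, per the rule of thumb above
    # (the ×2 covers queue polling on top of the per-job connections).
    return instances * concurrency * 2

def pool_is_sufficient(db_max_connections: int,
                       instances: int,
                       concurrency: int) -> bool:
    """True if KEEPTRUSTS_DB_MAX_CONNECTIONS covers warmers plus the API."""
    return warmer_connections(instances, concurrency) + API_POOL <= db_max_connections

# 3 instances at concurrency 4 need 24 warmer connections:
print(warmer_connections(3, 4))      # 24
print(pool_is_sufficient(50, 3, 4))  # True: 24 + 20 <= 50
```

Run this check before scaling out: adding instances multiplies connection demand even when per-instance concurrency stays constant.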

| Organization Size | Repos | Instances | Concurrency | Total Parallel |
|---|---|---|---|---|
| Small (1–10 devs) | 1–5 | 1 | 4 | 4 |
| Medium (10–50 devs) | 5–20 | 2 | 4 | 8 |
| Large (50–200 devs) | 20–100 | 3 | 6 | 18 |
| Enterprise (200+ devs) | 100+ | 5+ | 8 | 40+ |

These are starting points. Monitor queue depth and job age after deployment and adjust based on observed behavior.

Verifying Your Configuration

After changing concurrency settings, verify the warmer is operating correctly:

# Check warmer logs for the new concurrency value
docker compose logs keeptrusts-cache-warmer | grep "concurrency"
# Expected: INFO worker_cache_warmer: starting cache warmer worker concurrency=8

# Monitor queue drain rate
watch -n5 'curl -s http://localhost:8081/health | jq .queue_depth'

Next steps

For AI systems

  • Canonical terms: Keeptrusts, KEEPTRUSTS_CACHE_WARMER_CONCURRENCY, horizontal scaling, queue depth, advisory lock, warmer instances, worker_cache_warmer.
  • Config keys: KEEPTRUSTS_CACHE_WARMER_CONCURRENCY, engineering_cache.alerts.queue_depth_warning, engineering_cache.alerts.queue_depth_critical, engineering_cache.alerts.oldest_job_age_warning_seconds.
  • Best next pages: Setting Up Cache Warmers, Warmer Triggers, Cache TTL and Expiry.

For engineers

  • Default: 4 concurrent jobs per instance. Resource profiles vary: repo_map is light (5–15s), embedding_index is heavy (60–300s, 500MB–2GB).
  • Increase concurrency (6–8) for: large repos (500K+ LOC), many connected repos (50+), slow embedding models, high commit velocity (50+/day).
  • Decrease concurrency (2) for: shared hosts, memory constraints, rate-limited LLM providers.
  • Total concurrency formula: instances × KEEPTRUSTS_CACHE_WARMER_CONCURRENCY. E.g., 3 instances × 4 = 12 parallel jobs.
  • Max DB connections needed: instances × concurrency × 2. Ensure pool accommodates all warmers + main API.
  • Monitor: sustained queue depth > 50 or oldest job age > 10 min means you need more capacity.

For leaders

  • Concurrency directly trades infrastructure cost against cache freshness latency.
  • Under-provisioned warmers mean slower cache fills and delayed cost savings after code changes.
  • Horizontal scaling provides fault tolerance (one instance failure doesn’t stop warming).
  • Enterprise deployments (200+ devs, 100+ repos) typically need 5+ instances with concurrency 8 (40+ parallel jobs).