Cache Hit Rates: What Good Looks Like
Cache hit rate is the single most important metric for understanding the effectiveness of your org-shared cache. This guide defines what good looks like at each stage of maturity and how to troubleshoot when hit rates are below expectations.
Use this page when
- You need to define target cache hit rates for your team profile and maturity stage.
- You are troubleshooting unexpectedly low hit rates and need a diagnostic checklist.
- You want to interpret savings dashboard charts and understand what the numbers mean.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
What Hit Rate Means
Hit rate = cache hits ÷ total cacheable requests × 100%
- Cache hit: A request whose response was served from cache without calling the upstream provider
- Total cacheable requests: All requests eligible for caching (excludes requests with `no-cache` headers or isolation policy requirements)
An 85% hit rate means 85 out of every 100 cacheable requests are served from cache at zero provider cost.
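The formula above can be sketched as a small helper (the function name and the example counts are illustrative, not part of any product API):

```python
def hit_rate(cache_hits: int, cacheable_requests: int) -> float:
    """Hit rate as a percentage of cacheable requests served from cache."""
    if cacheable_requests == 0:
        return 0.0  # empty-cache hour 1: no cacheable traffic yet
    return cache_hits / cacheable_requests * 100

# 85 of 100 cacheable requests served from cache
print(hit_rate(85, 100))  # 85.0
```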
Hit Rate Benchmarks
By Team Profile
| Team profile | Expected steady-state hit rate | Rationale |
|---|---|---|
| 100 engineers, 1-3 shared repos | 85-95% | Maximum overlap, high density |
| 100 engineers, 5-10 repos | 75-88% | Strong overlap, some dispersion |
| 50 engineers, 3-5 repos | 70-85% | Good overlap, smaller fill pool |
| 20 engineers, 2-3 repos | 55-70% | Moderate overlap, fewer repeated questions |
| 10 engineers, diverse repos | 35-55% | Lower overlap, more unique questions |
By Time Period
Hit rate improves over time as the cache fills:
| Period | Expected hit rate | What's happening |
|---|---|---|
| Hour 1 | 0-10% | Cache is empty, everything misses |
| Hours 2-4 | 10-25% | Common questions starting to hit |
| Hours 4-12 | 25-50% | Active fill period, coverage building |
| Day 1-3 | 50-70% | Major codebase areas covered |
| Week 1 | 65-80% | Strong coverage, long-tail filling |
| Week 2-4 | 75-88% | Near steady-state for most teams |
| Month 2+ | 80-95% | Mature cache, mostly incremental refills |
The Maturity Model
Week 1: Foundation
Expected hit rate: 40-65%
During the first week:
- Core modules and frequently-accessed code get cached
- High-traffic patterns (morning standup questions, common errors) start hitting
- Fabric artifacts complete building and start contributing to convergence
- Single-flight fill catches concurrent duplicates
Key indicator: Hit rate should be climbing daily. If it's flat after day 3, investigate.
Month 1: Growth
Expected hit rate: 70-85%
By the end of the first month:
- Most common codebase questions are cached
- Fabric context attachment has stabilized hit patterns
- TTL-expired entries get refilled naturally by ongoing traffic
- Less common modules start getting coverage from individual engineers
Key indicator: Daily savings should exceed daily fill cost by 5-10×.
Month 3: Maturity
Expected hit rate: 80-92%
At three months:
- Long-tail coverage is strong
- Seasonal patterns (sprint planning, release cycles) are captured
- New engineer onboarding hits existing cache entries consistently
- Incremental fill cost is minimal compared to savings
Key indicator: Monthly savings should be 80-95% of what uncached spend would have been.
Steady State: Maintenance
Expected hit rate: 82-95%
Mature caches maintain high hit rates with:
- Automatic refill on TTL expiry (from ongoing traffic)
- Incremental fabric refresh on code changes
- Stable cache key patterns (consistent entitlement digests, config versions)
Key indicator: Hit rate should be stable week-over-week with minor fluctuation around code changes and deployments.
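Why stable cache key patterns matter can be illustrated with a hypothetical key derivation. The components and hashing below are assumptions for illustration, not the product's actual scheme; the point is that entries are shared only when every component matches:

```python
import hashlib

def cache_key(prompt: str, model: str, entitlement_digest: str, config_version: int) -> str:
    """Illustrative cache key: a config_version bump or a different
    entitlement digest yields a new key, which is why policy changes and
    permission fragmentation reduce hit rate."""
    material = f"{model}|{entitlement_digest}|{config_version}|{prompt}"
    return hashlib.sha256(material.encode()).hexdigest()

# Same prompt, same model, same entitlements -> shared entry
k1 = cache_key("explain the auth flow", "model-a", "digest-x", 3)
k2 = cache_key("explain the auth flow", "model-a", "digest-x", 3)
# Config version bump -> different key, cache miss
k3 = cache_key("explain the auth flow", "model-a", "digest-x", 4)
```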
Reading the Savings Dashboard
Navigate to Cost Center → Savings to find your hit rate metrics:
Hit Rate Chart
The hit rate chart shows hourly/daily/weekly hit rate over time. Look for:
- Upward trend in first week: Normal fill behavior
- Stable plateau after week 2: Healthy steady state
- Periodic dips: Normal — correspond to code deployments or TTL expiry waves
- Sudden drops: Investigate — may indicate config version bump, policy change, or infrastructure issue
Avoided Cost
Avoided cost is the provider spend you skipped on requests served from cache:
Avoided cost = (cache_hits × avg_tokens_per_request × cost_per_token)
This number should grow over time and represent 70-90% of your theoretical uncached spend.
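A hedged sketch of the avoided-cost calculation, with illustrative numbers (your token averages and per-token pricing will differ):

```python
def avoided_cost(cache_hits: int, avg_tokens_per_request: float, cost_per_token: float) -> float:
    """Provider spend avoided by serving hits from cache."""
    return cache_hits * avg_tokens_per_request * cost_per_token

# 50,000 hits averaging 2,000 tokens at $0.000003/token -> ~$300 avoided
print(avoided_cost(50_000, 2_000, 0.000003))
```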
Cache Entries
Total active cache entries indicates coverage depth:
| Entries | Interpretation |
|---|---|
| < 1,000 | Early fill phase, limited coverage |
| 1,000 - 10,000 | Good coverage of core codebase |
| 10,000 - 100,000 | Deep coverage including long-tail |
| > 100,000 | Comprehensive, likely multi-repo |
Fill Events
Fill events (cache misses that populated the cache) should decline over time:
- Week 1: High fill rate (cache being built)
- Month 1: Moderate fill rate (long-tail filling)
- Month 3: Low fill rate (mostly refills and genuinely new content)
Factors That Affect Hit Rate
Factors That Increase Hit Rate
| Factor | Impact | Why |
|---|---|---|
| More engineers on same repos | +10-20% | More redundancy to deduplicate |
| Fabric enabled with all artifact types | +15-25% | Context convergence increases key overlap |
| Longer TTL | +5-10% | Entries stay available longer |
| Stable codebase (low churn) | +10-15% | Less cache invalidation |
| Consistent policy config | +5-10% | No config_version bumps invalidating entries |
| Sprint-focused work (same area) | +5-10% | Team exploring same code simultaneously |
Factors That Decrease Hit Rate
| Factor | Impact | Why |
|---|---|---|
| Many independent repos | -15-30% | Less overlap between engineers |
| High code churn | -10-20% | Frequent invalidation of cached responses |
| Frequent policy changes | -5-15% | Config version bumps invalidate entries |
| Short TTL (< 1 hour) | -10-20% | Entries expire before reuse |
| Highly diverse team tasks | -10-15% | Less natural prompt overlap |
| New repo just connected | -20-40% (temporary) | Fill phase hasn't completed |
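The two tables above support a back-of-envelope estimate: start from your team-profile baseline and add the midpoint impact of each applicable factor. The function and the example numbers are illustrative, not a supported calculator:

```python
def estimate_hit_rate(baseline: float, adjustments: list[float]) -> float:
    """Rough steady-state estimate: profile baseline plus midpoint factor
    impacts, clamped to the 0-100% range."""
    return max(0.0, min(100.0, baseline + sum(adjustments)))

# 50 engineers on 3-5 repos (~78% baseline midpoint),
# fabric fully enabled (+20), high code churn (-15)
print(estimate_hit_rate(78.0, [20.0, -15.0]))  # 83.0
```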
Troubleshooting Low Hit Rates
Hit Rate Below 30% After Week 1
Possible causes:
- Fabric not building: Check Repositories → Fabric Status. If artifacts are stuck in "Queued" or "Error", the cache lacks the convergence layer.
  Fix: Ensure `worker_cache_warmer` is running. Check for credential or rate-limit errors.
- TTL too short: If TTL is under 1 hour, entries expire before other engineers can hit them.
  Fix: Increase TTL to at least 24 hours for initial deployment.
- Low traffic volume: If only 5-10 engineers are active, there may not be enough redundancy to generate hits.
  Fix: Onboard more engineers to the cached gateway. Critical mass is around 20+ active users.
- Diverse, unrelated work: If engineers are all working on completely independent code areas, overlap is naturally low.
  Fix: Prioritize connecting the shared repositories that most engineers depend on.
Hit Rate Drops Suddenly
Possible causes:
- Config version bump: A policy change incremented the config version, invalidating all entries.
  Fix: Expected behavior. Hit rate recovers within hours as the cache refills. Batch policy changes to avoid repeated invalidation.
- Cache infrastructure failure: Redis/memory store had an outage or restart.
  Fix: Check cache backend health. After recovery, hit rate will rebuild naturally.
- Major code refactoring: Large structural changes invalidated many fabric artifacts and cached responses.
  Fix: Expected behavior after major refactors. Allow 24-48 hours for recovery.
- TTL wave expiry: If many entries were created at the same time, they expire at the same time.
  Fix: Consider adding jitter to TTL (±10%) to spread expiry over time.
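TTL jitter can be sketched in a few lines, assuming TTLs are set in seconds at fill time (the ±10% default mirrors the suggestion above; the function itself is illustrative):

```python
import random

def jittered_ttl(base_ttl_seconds: int, jitter_fraction: float = 0.10) -> float:
    """Scale the base TTL by a random factor in [1 - j, 1 + j] so entries
    filled in the same wave don't all expire together."""
    factor = random.uniform(1 - jitter_fraction, 1 + jitter_fraction)
    return base_ttl_seconds * factor

# A 24-hour TTL lands somewhere between ~21.6h and ~26.4h
ttl = jittered_ttl(24 * 3600)
```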
Hit Rate Plateaus Below Expected
Possible causes:
- Missing artifact types: If `embedding_index` is disabled, semantic matching can't identify differently-worded equivalent questions.
  Fix: Enable all artifact types, especially `embedding_index` for semantic deduplication.
- Entitlement fragmentation: If your org has many different permission levels, the entitlement digest fragments the cache into small pools.
  Fix: Simplify your permission model where possible. Engineers with identical repo access share cache entries.
- Model diversity: If engineers use many different models, each model has its own cache entries.
  Fix: Standardize on 1-2 models for code-related work. Cache entries are model-specific.
Setting Hit Rate Goals
Realistic Goals by Phase
| Phase | Goal | Action if not met |
|---|---|---|
| End of Week 1 | > 40% | Check fabric status, verify traffic volume |
| End of Month 1 | > 70% | Tune TTL, verify artifact coverage |
| End of Month 3 | > 80% | Investigate fragmentation, review access patterns |
| Steady state | > 82% | Maintain — this is excellent for most teams |
When to Accept a Lower Hit Rate
Some scenarios legitimately have lower hit rates:
- Research/prototyping teams: Genuinely novel questions dominate (40-60% acceptable)
- Multi-language polyglot repos: Less pattern overlap (55-70% acceptable)
- Pre-release rapid iteration: Code changes faster than cache fills (50-65% acceptable)
The goal isn't always 95% — it's maximizing savings given your team's natural work patterns.
Next steps
- Reducing Redundant LLM Calls — eliminate remaining waste
- Estimating Fill Cost — understand the investment side
- How 100 Engineers Share One Cache — the mechanics driving hit rate
For AI systems
- Canonical terms: Keeptrusts, cache hit rate, savings dashboard, maturity model, hit rate benchmarks, miss reasons, cache performance, avoided cost.
- Console paths: Cost Center → Savings, Cost Center → Cache Performance, hit rate chart, avoided cost card.
- Best next pages: Reducing Redundant LLM Calls, Estimating Fill Cost, Savings Dashboard Walkthrough.
For engineers
- Hit rate = cache hits ÷ total cacheable requests × 100%. Monitor under Cost Center → Savings.
- Troubleshoot low rates: verify fabric is building (Repositories → Fabric Status), TTL ≥ 24h, traffic volume ≥ 20 active users, shared repos connected.
- Sudden drops: check for config version bumps, cache infrastructure outages, or major refactors. Recovery is typically 24–48h.
- Plateau below expected: enable all artifact types (especially `embedding_index`), simplify permission model, standardize on 1–2 models.
- Realistic steady-state goals: 82–95% for 100+ engineers on shared repos; 55–70% for 20 engineers on diverse repos.
For leaders
- Hit rate directly determines ROI: 80% hit rate = 80% cost reduction on cacheable traffic.
- Factors that boost hit rate: more engineers on same repos, fabric enabled, stable codebases, consistent tooling.
- Factors that reduce hit rate: many independent repos, high code churn, frequent policy changes, short TTL.
- Set phase-appropriate goals: >40% end of week 1, >70% end of month 1, >80% end of month 3.
- Some scenarios legitimately have lower rates (research teams, polyglot repos) — maximize savings given natural work patterns.