Cache Hit Rates: What Good Looks Like
Cache hit rate is the single most important metric for understanding the effectiveness of your org-shared cache. This guide defines what good looks like at each stage of maturity and how to troubleshoot when hit rates are below expectations.
Use this page when
- You need to define target cache hit rates for your team profile and maturity stage.
- You are troubleshooting unexpectedly low hit rates and need a diagnostic checklist.
- You want to interpret savings dashboard charts and understand what the numbers mean.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
What Hit Rate Means
Hit rate = cache hits ÷ total cacheable requests × 100%
- Cache hit: A request whose response was served from cache without calling the upstream provider
- Total cacheable requests: All requests eligible for caching (excludes requests with `no-cache` headers or isolation policy requirements)
An 85% hit rate means 85 out of every 100 cacheable requests are served from cache at zero provider cost.
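The formula above can be sketched as a small helper (the function name and the example counts are illustrative, not part of any product API):

```python
def hit_rate(cache_hits: int, cacheable_requests: int) -> float:
    """Hit rate as a percentage of cacheable requests served from cache."""
    if cacheable_requests == 0:
        return 0.0  # empty-cache hour 1: no cacheable traffic yet
    return cache_hits / cacheable_requests * 100

# 85 of 100 cacheable requests served from cache
print(hit_rate(85, 100))  # 85.0
```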
Hit Rate Benchmarks
By Team Profile
| Team profile | Expected steady-state hit rate | Rationale |
|---|---|---|
| 100 engineers, 1-3 shared repos | 85-95% | Maximum overlap, high density |
| 100 engineers, 5-10 repos | 75-88% | Strong overlap, some dispersion |
| 50 engineers, 3-5 repos | 70-85% | Good overlap, smaller fill pool |
| 20 engineers, 2-3 repos | 55-70% | Moderate overlap, fewer repeated questions |
| 10 engineers, diverse repos | 35-55% | Lower overlap, more unique questions |
By Time Period
Hit rate improves over time as the cache fills:
| Period | Expected hit rate | What's happening |
|---|---|---|
| Hour 1 | 0-10% | Cache is empty, everything misses |
| Hours 2-4 | 10-25% | Common questions starting to hit |
| Hours 4-12 | 25-50% | Active fill period, coverage building |
| Day 1-3 | 50-70% | Major codebase areas covered |
| Week 1 | 65-80% | Strong coverage, long-tail filling |
| Week 2-4 | 75-88% | Near steady-state for most teams |
| Month 2+ | 80-95% | Mature cache, mostly incremental refills |
The Maturity Model
Week 1: Foundation
Expected hit rate: 40-65%
During the first week:
- Core modules and frequently-accessed code get cached
- High-traffic patterns (morning standup questions, common errors) start hitting
- Fabric artifacts complete building and start contributing to convergence
- Single-flight fill catches concurrent duplicates
Key indicator: Hit rate should be climbing daily. If it's flat after day 3, investigate.
Month 1: Growth
Expected hit rate: 70-85%
By the end of the first month:
- Most common codebase questions are cached
- Fabric context attachment has stabilized hit patterns
- TTL-expired entries get refilled naturally by ongoing traffic
- Less common modules start getting coverage from individual engineers
Key indicator: Daily savings should exceed daily fill cost by 5-10×.
Month 3: Maturity
Expected hit rate: 80-92%
At three months:
- Long-tail coverage is strong
- Seasonal patterns (sprint planning, release cycles) are captured
- New engineer onboarding hits existing cache entries consistently
- Incremental fill cost is minimal compared to savings
Key indicator: Monthly savings should be 80-95% of what uncached spend would have been.
Steady State: Maintenance
Expected hit rate: 82-95%
Mature caches maintain high hit rates with:
- Automatic refill on TTL expiry (from ongoing traffic)
- Incremental fabric refresh on code changes
- Stable cache key patterns (consistent entitlement digests, config versions)
Key indicator: Hit rate should be stable week-over-week with minor fluctuation around code changes and deployments.
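Why stable cache key patterns matter can be illustrated with a hypothetical key derivation. The components and hashing below are assumptions for illustration, not the product's actual scheme; the point is that entries are shared only when every component matches:

```python
import hashlib

def cache_key(prompt: str, model: str, entitlement_digest: str, config_version: int) -> str:
    """Illustrative cache key: a config_version bump or a different
    entitlement digest yields a new key, which is why policy changes and
    permission fragmentation reduce hit rate."""
    material = f"{model}|{entitlement_digest}|{config_version}|{prompt}"
    return hashlib.sha256(material.encode()).hexdigest()

# Same prompt, same model, same entitlements -> shared entry
k1 = cache_key("explain the auth flow", "model-a", "digest-x", 3)
k2 = cache_key("explain the auth flow", "model-a", "digest-x", 3)
# Config version bump -> different key, cache miss
k3 = cache_key("explain the auth flow", "model-a", "digest-x", 4)
```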
Reading the Savings Dashboard
Navigate to Cost Center → Savings to find your hit rate metrics:
Hit Rate Chart
The hit rate chart shows hourly/daily/weekly hit rate over time. Look for:
- Upward trend in first week: Normal fill behavior
- Stable plateau after week 2: Healthy steady state
- Periodic dips: Normal — correspond to code deployments or TTL expiry waves
- Sudden drops: Investigate — may indicate config version bump, policy change, or infrastructure issue
Avoided Cost
Avoided cost is the provider spend you skipped on requests served from cache:
Avoided cost = (cache_hits × avg_tokens_per_request × cost_per_token)
This number should grow over time and represent 70-90% of your theoretical uncached spend.
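A hedged sketch of the avoided-cost calculation, with illustrative numbers (your token averages and per-token pricing will differ):

```python
def avoided_cost(cache_hits: int, avg_tokens_per_request: float, cost_per_token: float) -> float:
    """Provider spend avoided by serving hits from cache."""
    return cache_hits * avg_tokens_per_request * cost_per_token

# 50,000 hits averaging 2,000 tokens at $0.000003/token -> ~$300 avoided
print(avoided_cost(50_000, 2_000, 0.000003))
```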
Cache Entries
Total active cache entries indicates coverage depth:
| Entries | Interpretation |
|---|---|
| < 1,000 | Early fill phase, limited coverage |
| 1,000 - 10,000 | Good coverage of core codebase |
| 10,000 - 100,000 | Deep coverage including long-tail |
| > 100,000 | Comprehensive, likely multi-repo |
Fill Events
Fill events (cache misses that populated the cache) should decline over time:
- Week 1: High fill rate (cache being built)
- Month 1: Moderate fill rate (long-tail filling)
- Month 3: Low fill rate (mostly refills and genuinely new content)
Factors That Affect Hit Rate
Factors That Increase Hit Rate
| Factor | Impact | Why |
|---|---|---|
| More engineers on same repos | +10-20% | More redundancy to deduplicate |
| Fabric enabled with all artifact types | +15-25% | Context convergence increases key overlap |
| Longer TTL | +5-10% | Entries stay available longer |
| Stable codebase (low churn) | +10-15% | Less cache invalidation |
| Consistent policy config | +5-10% | No config_version bumps invalidating entries |
| Sprint-focused work (same area) | +5-10% | Team exploring same code simultaneously |
Factors That Decrease Hit Rate
| Factor | Impact | Why |
|---|---|---|
| Many independent repos | -15-30% | Less overlap between engineers |
| High code churn | -10-20% | Frequent invalidation of cached responses |
| Frequent policy changes | -5-15% | Config version bumps invalidate entries |
| Short TTL (< 1 hour) | -10-20% | Entries expire before reuse |
| Highly diverse team tasks | -10-15% | Less natural prompt overlap |
| New repo just connected | -20-40% (temporary) | Fill phase hasn't completed |
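The two tables above support a back-of-envelope estimate: start from your team-profile baseline and add the midpoint impact of each applicable factor. The function and the example numbers are illustrative, not a supported calculator:

```python
def estimate_hit_rate(baseline: float, adjustments: list[float]) -> float:
    """Rough steady-state estimate: profile baseline plus midpoint factor
    impacts, clamped to the 0-100% range."""
    return max(0.0, min(100.0, baseline + sum(adjustments)))

# 50 engineers on 3-5 repos (~78% baseline midpoint),
# fabric fully enabled (+20), high code churn (-15)
print(estimate_hit_rate(78.0, [20.0, -15.0]))  # 83.0
```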
Troubleshooting Low Hit Rates
Hit Rate Below 30% After Week 1
Possible causes:
- Fabric not building: Check Repositories → Fabric Status. If artifacts are stuck in "Queued" or "Error", the cache lacks the convergence layer.
  Fix: Ensure `worker_cache_warmer` is running. Check for credential or rate-limit errors.
- TTL too short: If TTL is under 1 hour, entries expire before other engineers can hit them.
  Fix: Increase TTL to at least 24 hours for initial deployment.
- Low traffic volume: If only 5-10 engineers are active, there may not be enough redundancy to generate hits.
  Fix: Onboard more engineers to the cached gateway. Critical mass is around 20+ active users.
- Diverse, unrelated work: If engineers are all working on completely independent code areas, overlap is naturally low.
  Fix: Prioritize connecting the shared repositories that most engineers depend on.
Hit Rate Drops Suddenly
Possible causes:
- Config version bump: A policy change incremented the config version, invalidating all entries.
  Fix: Expected behavior. Hit rate recovers within hours as the cache refills. Batch policy changes to avoid repeated invalidation.
- Cache infrastructure failure: Redis/memory store had an outage or restart.
  Fix: Check cache backend health. After recovery, hit rate will rebuild naturally.
- Major code refactoring: Large structural changes invalidated many fabric artifacts and cached responses.
  Fix: Expected behavior after major refactors. Allow 24-48 hours for recovery.
- TTL wave expiry: If many entries were created at the same time, they expire at the same time.
  Fix: Consider adding jitter to TTL (±10%) to spread expiry over time.
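TTL jitter can be sketched in a few lines, assuming TTLs are set in seconds at fill time (the ±10% default mirrors the suggestion above; the function itself is illustrative):

```python
import random

def jittered_ttl(base_ttl_seconds: int, jitter_fraction: float = 0.10) -> float:
    """Scale the base TTL by a random factor in [1 - j, 1 + j] so entries
    filled in the same wave don't all expire together."""
    factor = random.uniform(1 - jitter_fraction, 1 + jitter_fraction)
    return base_ttl_seconds * factor

# A 24-hour TTL lands somewhere between ~21.6h and ~26.4h
ttl = jittered_ttl(24 * 3600)
```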
Hit Rate Plateaus Below Expected
Possible causes:
- Missing artifact types: If `embedding_index` is disabled, semantic matching can't identify differently-worded equivalent questions.
  Fix: Enable all artifact types, especially `embedding_index` for semantic deduplication.
- Entitlement fragmentation: If your org has many different permission levels, the entitlement digest fragments the cache into small pools.
  Fix: Simplify your permission model where possible. Engineers with identical repo access share cache entries.
- Model diversity: If engineers use many different models, each model has its own cache entries.
  Fix: Standardize on 1-2 models for code-related work. Cache entries are model-specific.
Setting Hit Rate Goals
Realistic Goals by Phase
| Phase | Goal | Action if not met |
|---|---|---|
| End of Week 1 | > 40% | Check fabric status, verify traffic volume |
| End of Month 1 | > 70% | Tune TTL, verify artifact coverage |
| End of Month 3 | > 80% | Investigate fragmentation, review access patterns |
| Steady state | > 82% | Maintain — this is excellent for most teams |
When to Accept a Lower Hit Rate
Some scenarios legitimately have lower hit rates:
- Research/prototyping teams: Genuinely novel questions dominate (40-60% acceptable)
- Multi-language polyglot repos: Less pattern overlap (55-70% acceptable)
- Pre-release rapid iteration: Code changes faster than cache fills (50-65% acceptable)
The goal isn't always 95% — it's maximizing savings given your team's natural work patterns.
Next steps
- Reducing Redundant LLM Calls — eliminate remaining waste
- Estimating Fill Cost — understand the investment side
- How 100 Engineers Share One Cache — the mechanics driving hit rate
For AI systems
- Canonical terms: Keeptrusts, cache hit rate, savings dashboard, maturity model, hit rate benchmarks, miss reasons, cache performance, avoided cost.
- Console paths: Cost Center → Savings, Cost Center → Cache Performance, hit rate chart, avoided cost card.
- Best next pages: Reducing Redundant LLM Calls, Estimating Fill Cost, Savings Dashboard Walkthrough.
For engineers
- Hit rate = cache hits ÷ total cacheable requests × 100%. Monitor under Cost Center → Savings.
- Troubleshoot low rates: verify fabric is building (Repositories → Fabric Status), TTL ≥ 24h, traffic volume ≥ 20 active users, shared repos connected.
- Sudden drops: check for config version bumps, cache infrastructure outages, or major refactors. Recovery is typically 24–48h.
- Plateau below expected: enable all artifact types (especially `embedding_index`), simplify permission model, standardize on 1–2 models.
- Realistic steady-state goals: 82–95% for 100+ engineers on shared repos; 55–70% for 20 engineers on diverse repos.
For leaders
- Hit rate directly determines ROI: 80% hit rate = 80% cost reduction on cacheable traffic.
- Factors that boost hit rate: more engineers on same repos, fabric enabled, stable codebases, consistent tooling.
- Factors that reduce hit rate: many independent repos, high code churn, frequent policy changes, short TTL.
- Set phase-appropriate goals: >40% end of week 1, >70% end of month 1, >80% end of month 3.
- Some scenarios legitimately have lower rates (research teams, polyglot repos) — maximize savings given natural work patterns.