
Cache Hit Rates: What Good Looks Like

Cache hit rate is the single most important metric for understanding how effective your org-shared cache is. This guide defines what good looks like at each stage of maturity and explains how to troubleshoot hit rates that fall below expectations.

Use this page when

  • You need to define target cache hit rates for your team profile and maturity stage.
  • You are troubleshooting unexpectedly low hit rates and need a diagnostic checklist.
  • You want to interpret savings dashboard charts and understand what the numbers mean.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

What Hit Rate Means

Hit rate = cache hits ÷ total cacheable requests × 100%
  • Cache hit: A request whose response was served from cache without calling the upstream provider
  • Total cacheable requests: All requests eligible for caching (excludes requests with no-cache headers or isolation policy requirements)

An 85% hit rate means 85 out of every 100 cacheable requests are served from cache at zero provider cost.
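The definition above can be sketched in a few lines. The function name and the example counts are illustrative assumptions, not part of any product API; the key point is that the denominator is cacheable requests, not all requests.

```python
def hit_rate(cache_hits: int, cacheable_requests: int) -> float:
    """Return the hit rate as a percentage of cacheable requests."""
    if cacheable_requests <= 0:
        return 0.0  # no eligible traffic yet, so there is nothing to measure
    if not 0 <= cache_hits <= cacheable_requests:
        raise ValueError("cache_hits must be between 0 and cacheable_requests")
    return 100.0 * cache_hits / cacheable_requests


# Hypothetical counts: 120 requests in total, 20 of them excluded from
# caching (no-cache headers or isolation policy), 85 served from cache.
total_requests = 120
excluded_requests = 20
print(hit_rate(85, total_requests - excluded_requests))
```

Dividing by all 120 requests instead would understate the rate, because excluded requests can never hit.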

Hit Rate Benchmarks

By Team Profile

| Team profile | Expected steady-state hit rate | Rationale |
|---|---|---|
| 100 engineers, 1-3 shared repos | 85-95% | Maximum overlap, high density |
| 100 engineers, 5-10 repos | 75-88% | Strong overlap, some dispersion |
| 50 engineers, 3-5 repos | 70-85% | Good overlap, smaller fill pool |
| 20 engineers, 2-3 repos | 55-70% | Moderate overlap, fewer repeated questions |
| 10 engineers, diverse repos | 35-55% | Lower overlap, more unique questions |

By Time Period

Hit rate improves over time as the cache fills:

| Period | Expected hit rate | What's happening |
|---|---|---|
| Hour 1 | 0-10% | Cache is empty, everything misses |
| Hours 2-4 | 10-25% | Common questions starting to hit |
| Hours 4-12 | 25-50% | Active fill period, coverage building |
| Day 1-3 | 50-70% | Major codebase areas covered |
| Week 1 | 65-80% | Strong coverage, long-tail filling |
| Week 2-4 | 75-88% | Near steady-state for most teams |
| Month 2+ | 80-95% | Mature cache, mostly incremental refills |

The Maturity Model

Week 1: Foundation

Expected hit rate: 40-65%

During the first week:

  • Core modules and frequently accessed code get cached
  • High-traffic patterns (morning standup questions, common errors) start hitting
  • Fabric artifacts complete building and start contributing to convergence
  • Single-flight fill catches concurrent duplicates
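Single-flight fill means that when several identical requests miss at the same moment, only one of them calls the upstream provider and the rest wait for its result. The sketch below shows the general technique with asyncio; the class and method names are hypothetical, and it is not a description of how the gateway implements it.

```python
import asyncio


class SingleFlight:
    """Coalesce concurrent fills for the same cache key into one upstream call."""

    def __init__(self):
        self._inflight = {}  # cache key -> asyncio.Task for the fill in progress

    async def do(self, key, fill):
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key: start the fill and register it.
            task = asyncio.create_task(fill())
            self._inflight[key] = task
            # Forget the task once it finishes so later misses can refill.
            task.add_done_callback(lambda _t: self._inflight.pop(key, None))
        # Every caller, first or not, awaits the same task.
        return await task


async def demo():
    upstream_calls = 0

    async def fill():
        nonlocal upstream_calls
        upstream_calls += 1
        await asyncio.sleep(0.01)  # stand-in for the provider round trip
        return "response"

    flight = SingleFlight()
    results = await asyncio.gather(*(flight.do("same-key", fill) for _ in range(5)))
    return upstream_calls, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo, five concurrent requests for the same key result in one upstream call, and all five callers receive the same response.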

Key indicator: Hit rate should be climbing daily. If it's flat after day 3, investigate.
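One way to automate the "flat after day 3" check is to compare the latest daily hit rate with the day-3 value. This is a minimal sketch; the function name and the 1-point minimum gain are assumptions you would tune for your own traffic.

```python
def flat_after_day_3(daily_hit_rates, min_gain_points=1.0):
    """Return True if the hit rate has stopped climbing after day 3.

    daily_hit_rates: daily hit rates in percent, oldest first (day 1 at index 0).
    min_gain_points: assumed minimum gain, in percentage points, that still
    counts as climbing.
    """
    if len(daily_hit_rates) < 4:
        return False  # not enough data to judge anything past day 3
    day_3 = daily_hit_rates[2]
    latest = daily_hit_rates[-1]
    return latest - day_3 < min_gain_points


print(flat_after_day_3([10, 30, 42, 42.3, 42.1]))  # stalled near 42%
print(flat_after_day_3([10, 30, 42, 50, 57]))      # still climbing
```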

Month 1: Growth

Expected hit rate: 70-85%

By the end of the first month:

  • Most common codebase questions are cached
  • Fabric context attachment has stabilized hit patterns
  • TTL-expired entries get refilled naturally by ongoing traffic
  • Less common modules start getting coverage from individual engineers

Key indicator: Daily savings should exceed daily fill cost by 5-10×.
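The Month 1 indicator is a simple ratio of savings to fill cost. The sketch below assumes you can export both figures per day in the same currency; the function names and the dollar amounts are hypothetical.

```python
def savings_to_fill_ratio(daily_savings: float, daily_fill_cost: float) -> float:
    """Return how many times daily savings exceed daily fill cost."""
    if daily_fill_cost <= 0:
        return float("inf")  # nothing was spent on fills that day
    return daily_savings / daily_fill_cost


def meets_month_1_indicator(daily_savings, daily_fill_cost, min_ratio=5.0):
    """Check the lower bound of the 5-10x range given above."""
    return savings_to_fill_ratio(daily_savings, daily_fill_cost) >= min_ratio


# Hypothetical day: $600 avoided, $100 spent filling the cache.
print(savings_to_fill_ratio(600.0, 100.0))
print(meets_month_1_indicator(600.0, 100.0))
```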

Month 3: Maturity

Expected hit rate: 80-92%

At three months:

  • Long-tail coverage is strong
  • Seasonal patterns (sprint planning, release cycles) are captured
  • New engineer onboarding hits existing cache entries consistently
  • Incremental fill cost is minimal compared to savings

Key indicator: Monthly savings should be 80-95% of what uncached spend would have been.

Steady State: Maintenance

Expected hit rate: 82-95%

Mature caches maintain high hit rates with:

  • Automatic refill on TTL expiry (from ongoing traffic)
  • Incremental fabric refresh on code changes
  • Stable cache key patterns (consistent entitlement digests, config versions)

Key indicator: Hit rate should be stable week-over-week with minor fluctuation around code changes and deployments.
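Week-over-week stability can be checked by comparing the mean of the last seven daily values with the mean of the seven before them. This is a sketch under assumptions: the 3-point tolerance for "minor fluctuation" is not a documented threshold, and the function names are illustrative.

```python
from statistics import mean


def week_over_week_change(daily_hit_rates):
    """Return the change in mean hit rate, in percentage points,
    between the previous 7 days and the most recent 7 days."""
    if len(daily_hit_rates) < 14:
        raise ValueError("need at least 14 daily values")
    previous_week = mean(daily_hit_rates[-14:-7])
    current_week = mean(daily_hit_rates[-7:])
    return current_week - previous_week


def is_stable(daily_hit_rates, tolerance_points=3.0):
    """True if the weekly mean moved by no more than the assumed tolerance."""
    return abs(week_over_week_change(daily_hit_rates)) <= tolerance_points


history = [88] * 7 + [87] * 7  # hypothetical two weeks of daily hit rates
print(week_over_week_change(history), is_stable(history))
```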

Reading the Savings Dashboard

Navigate to Cost & Spend → Savings to find your hit rate metrics.

Hit Rate Chart

The hit rate chart shows hourly/daily/weekly hit rate over time. Look for:

  • Upward trend in first week: Normal fill behavior
  • Stable plateau after week 2: Healthy steady state
  • Periodic dips: Normal — correspond to code deployments or TTL expiry waves
  • Sudden drops: Investigate — may indicate config version bump, policy change, or infrastructure issue
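To separate sudden drops from ordinary dips, you can compare each point with a trailing average. The sketch below assumes an exported series of hit rate values; the 6-point window and the 15-point drop threshold are illustrative assumptions, not dashboard settings.

```python
from statistics import mean


def sudden_drops(hit_rates, window=6, drop_points=15.0):
    """Return the indices where the hit rate falls at least `drop_points`
    percentage points below the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(hit_rates)):
        baseline = mean(hit_rates[i - window:i])
        if baseline - hit_rates[i] >= drop_points:
            flagged.append(i)
    return flagged


# Hypothetical hourly series: a stable plateau, then a sharp fall at index 7.
series = [82, 83, 81, 84, 82, 83, 80, 45]
print(sudden_drops(series))
```

A small dip such as the 80 at index 6 stays under the threshold, while the fall to 45 is flagged for investigation.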

Avoided Cost

Avoided cost is the spend you would have incurred without the cache, multiplied by the hit rate. Expressed in terms of hit counts:

Avoided cost = (cache_hits × avg_tokens_per_request × cost_per_token)

This number should grow over time and represent 70-90% of your theoretical uncached spend.
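The formula above translates directly into code. The request counts, token average, and per-token price below are hypothetical values chosen for illustration; substitute your own figures.

```python
def avoided_cost(cache_hits, avg_tokens_per_request, cost_per_token):
    """Spend avoided because these requests were served from cache."""
    return cache_hits * avg_tokens_per_request * cost_per_token


def uncached_spend(cacheable_requests, avg_tokens_per_request, cost_per_token):
    """Theoretical spend if every cacheable request had gone upstream."""
    return cacheable_requests * avg_tokens_per_request * cost_per_token


# Hypothetical month: 10,000 cacheable requests, 8,500 of them cache hits.
hits, requests = 8_500, 10_000
tokens, price = 2_000, 0.000003  # assumed average tokens and price per token

saved = avoided_cost(hits, tokens, price)
share = saved / uncached_spend(requests, tokens, price)
print(f"avoided ${saved:.2f}, {share:.0%} of theoretical uncached spend")
```

When hits and misses have the same average token count, the share of uncached spend equals the hit rate.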

Cache Entries

Total active cache entries indicates coverage depth:

| Entries | Interpretation |
|---|---|
| < 1,000 | Early fill phase, limited coverage |
| 1,000 - 10,000 | Good coverage of core codebase |
| 10,000 - 100,000 | Deep coverage including long-tail |
| > 100,000 | Comprehensive, likely multi-repo |

Fill Events

Fill events (cache misses that populated the cache) should decline over time:

  • Week 1: High fill rate (cache being built)
  • Month 1: Moderate fill rate (long-tail filling)
  • Month 3: Low fill rate (mostly refills and genuinely new content)

Factors That Affect Hit Rate

Factors That Increase Hit Rate

| Factor | Impact | Why |
|---|---|---|
| More engineers on same repos | +10 to +20% | More redundancy to deduplicate |
| Fabric enabled with all artifact types | +15 to +25% | Context convergence increases key overlap |
| Longer TTL | +5 to +10% | Entries stay available longer |
| Stable codebase (low churn) | +10 to +15% | Less cache invalidation |
| Consistent policy config | +5 to +10% | No config_version bumps invalidating entries |
| Sprint-focused work (same area) | +5 to +10% | Team exploring same code simultaneously |

Factors That Decrease Hit Rate

| Factor | Impact | Why |
|---|---|---|
| Many independent repos | -15 to -30% | Less overlap between engineers |
| High code churn | -10 to -20% | Frequent invalidation of cached responses |
| Frequent policy changes | -5 to -15% | Config version bumps invalidate entries |
| Short TTL (< 1 hour) | -10 to -20% | Entries expire before reuse |
| Highly diverse team tasks | -10 to -15% | Less natural prompt overlap |
| New repo just connected | -20 to -40% (temporary) | Fill phase hasn't completed |

Troubleshooting Low Hit Rates

Hit Rate Below 30% After Week 1

Possible causes:

  1. Fabric not building: Check Repositories → Fabric Status. If artifacts are stuck in "Queued" or "Error", the cache lacks the convergence layer.

    Fix: Ensure worker_cache_warmer is running. Check for credential or rate-limit errors.

  2. TTL too short: If TTL is under 1 hour, entries expire before other engineers can hit them.

    Fix: Increase TTL to at least 24 hours for initial deployment.

  3. Low traffic volume: If only 5-10 engineers are active, there may not be enough redundancy to generate hits.

    Fix: Onboard more engineers to the cached gateway. Critical mass is roughly 20 or more active users.

  4. Diverse, unrelated work: If engineers are all working on completely independent code areas, overlap is naturally low.

    Fix: Prioritize connecting the shared repositories that most engineers depend on.

Hit Rate Drops Suddenly

Possible causes:

  1. Config version bump: A policy change incremented the config version, invalidating all entries.

    Fix: Expected behavior. Hit rate recovers within hours as the cache refills. Batch policy changes to avoid repeated invalidation.

  2. Cache infrastructure failure: Redis/memory store had an outage or restart.

    Fix: Check cache backend health. After recovery, hit rate will rebuild naturally.

  3. Major code refactoring: Large structural changes invalidated many fabric artifacts and cached responses.

    Fix: Expected behavior after major refactors. Allow 24-48 hours for recovery.

  4. TTL wave expiry: If many entries were created at the same time, they expire at the same time.

    Fix: Consider adding jitter to TTL (±10%) to spread expiry over time.
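TTL jitter can be sketched as a random multiplier applied when an entry is written. The function name and parameters below are hypothetical; how a TTL is actually set depends on your cache backend.

```python
import random


def jittered_ttl(base_ttl_seconds, jitter_fraction=0.10, rng=random):
    """Return the base TTL scaled by a random factor in [1 - j, 1 + j].

    With jitter_fraction=0.10 this is the +/-10% spread suggested above, so
    entries written together no longer expire together.
    """
    factor = 1.0 + rng.uniform(-jitter_fraction, jitter_fraction)
    return base_ttl_seconds * factor


# A 24-hour base TTL lands somewhere between about 21.6 and 26.4 hours.
base = 24 * 60 * 60
print([round(jittered_ttl(base) / 3600, 1) for _ in range(5)])
```

Passing a seeded `random.Random` instance as `rng` makes the spread reproducible in tests.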

Hit Rate Plateaus Below Expected

Possible causes:

  1. Missing artifact types: If embedding_index is disabled, semantic matching can't identify differently-worded equivalent questions.

    Fix: Enable all artifact types, especially embedding_index for semantic deduplication.

  2. Entitlement fragmentation: If your org has many different permission levels, the entitlement digest fragments the cache into small pools.

    Fix: Simplify your permission model where possible. Engineers with identical repo access share cache entries.

  3. Model diversity: If engineers use many different models, each model has its own cache entries.

    Fix: Standardize on 1-2 models for code-related work. Cache entries are model-specific.
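Entitlement and model fragmentation are easier to see with a toy cache key. The sketch below is purely illustrative: the key layout, field order, and hashing are assumptions for demonstration, not the product's actual key format. It shows why the same prompt lands in separate pools when the model or entitlement digest differs.

```python
import hashlib


def cache_key(prompt, model, entitlement_digest, config_version):
    """Hypothetical key: every field that must match for a hit is hashed in."""
    parts = [model, entitlement_digest, str(config_version), prompt]
    payload = "\x1f".join(parts)  # unit separator avoids ambiguous joins
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


prompt = "How does the retry middleware work?"

# Same model, same entitlements, same config: the two engineers share an entry.
alice = cache_key(prompt, "model-a", "digest-team-x", 7)
bob = cache_key(prompt, "model-a", "digest-team-x", 7)

# A different entitlement digest or a different model gives a separate entry.
carol = cache_key(prompt, "model-a", "digest-team-y", 7)
dave = cache_key(prompt, "model-b", "digest-team-x", 7)

print(alice == bob, alice == carol, alice == dave)
```

Under this assumed layout, the number of separate pools grows with the number of distinct model and entitlement combinations in use, which is why the fixes above reduce both.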

Setting Hit Rate Goals

Realistic Goals by Phase

| Phase | Goal | Action if not met |
|---|---|---|
| End of Week 1 | > 40% | Check fabric status, verify traffic volume |
| End of Month 1 | > 70% | Tune TTL, verify artifact coverage |
| End of Month 3 | > 80% | Investigate fragmentation, review access patterns |
| Steady state | > 82% | Maintain; this is excellent for most teams |
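The phase goals can be encoded as a small lookup so that a periodic report suggests the matching follow-up. This is a minimal sketch: the dictionary keys and function name are hypothetical, and only the three phases with a remediation action are included.

```python
# Goal thresholds (percent) and follow-up actions, taken from the table above.
PHASE_GOALS = {
    "end_of_week_1": (40.0, "Check fabric status, verify traffic volume"),
    "end_of_month_1": (70.0, "Tune TTL, verify artifact coverage"),
    "end_of_month_3": (80.0, "Investigate fragmentation, review access patterns"),
}


def action_if_not_met(phase, observed_hit_rate):
    """Return the suggested action, or None when the goal is met."""
    goal, action = PHASE_GOALS[phase]
    if observed_hit_rate > goal:  # goals are stated as strict "greater than"
        return None
    return action


print(action_if_not_met("end_of_week_1", 35.0))
print(action_if_not_met("end_of_month_1", 74.0))
```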

When to Accept a Lower Hit Rate

Some scenarios legitimately have lower hit rates:

  • Research/prototyping teams: Genuinely novel questions dominate (40-60% acceptable)
  • Multi-language polyglot repos: Less pattern overlap (55-70% acceptable)
  • Pre-release rapid iteration: Code changes faster than cache fills (50-65% acceptable)

The goal isn't always 95% — it's maximizing savings given your team's natural work patterns.

Next steps

For AI systems

For engineers

  • Hit rate = cache hits ÷ total cacheable requests × 100%. Monitor under Cost Center → Savings.
  • Troubleshoot low rates: verify fabric is building (Repositories → Fabric Status), TTL ≥ 24h, traffic volume ≥ 20 active users, shared repos connected.
  • Sudden drops: check for config version bumps, cache infrastructure outages, or major refactors. Recovery is typically 24–48h.
  • Plateau below expected: enable all artifact types (especially embedding_index), simplify permission model, standardize on 1–2 models.
  • Realistic steady-state goals: 85–95% for 100 engineers on 1–3 shared repos; 55–70% for 20 engineers on 2–3 repos.

For leaders

  • Hit rate directly determines ROI: an 80% hit rate means roughly an 80% cost reduction on cacheable traffic, assuming hits and misses have similar average token counts.
  • Factors that boost hit rate: more engineers on same repos, fabric enabled, stable codebases, consistent tooling.
  • Factors that reduce hit rate: many independent repos, high code churn, frequent policy changes, short TTL.
  • Set phase-appropriate goals: >40% end of week 1, >70% end of month 1, >80% end of month 3.
  • Some scenarios legitimately have lower rates (research teams, polyglot repos) — maximize savings given natural work patterns.