Monitoring Cache Hit Rates Across Your Org
Cache hit rate is the single most important metric for understanding the value your org-shared cache delivers. A high hit rate means your teams reuse cached results effectively, avoiding redundant LLM calls and reducing cost. This guide shows you how to monitor hit rates at every level, establish benchmarks, and detect degradation early.
Use this page when
- You need to set up monitoring for cache hit rates across your organization.
- You are building dashboards or alerts for cache performance metrics.
- You want to identify teams or repos with low hit rates that need configuration tuning.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Understanding Hit Rate Levels
Hit rates are tracked at four levels of granularity. Each level helps you answer different operational questions; a short computational sketch follows the four levels below.
Organization Level
The org-level hit rate aggregates all cache lookups across every team and repository in your organization. This is your primary executive metric.
- What it tells you: Overall cache effectiveness for budget planning
- Where to find it: Console → Cache → Org Overview
- Healthy range: 60–85%
Team Level
Team-level hit rates show how effectively each team benefits from the shared cache. Teams working on stable codebases typically show higher rates than teams doing greenfield development.
- What it tells you: Which teams benefit most and which need cache configuration review
- Where to find it: Console → Cache → Team Breakdown
- Healthy range: 50–90% (varies by team activity)
Repository Level
Repository-level hit rates reveal which codebases generate the most cache value. Stable libraries and shared infrastructure repos typically have the highest hit rates.
- What it tells you: Which repos drive cache ROI and which may need warmer adjustments
- Where to find it: Console → Cache → Repository Metrics
- Healthy range: 40–95% (highly variable)
Agent Level
Agent-level hit rates track individual agent instances. This helps you identify agents with misconfigured cache settings or agents that generate unique queries unlikely to hit the cache.
- What it tells you: Whether specific agents are properly integrated with the cache layer
- Where to find it: Console → Cache → Agent Details
- Healthy range: 30–80% (depends on query diversity)
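At every level the computation is the same rollup over hit and miss counters; only the grouping key changes. The sketch below illustrates this with a hypothetical export format: the record shape, field names, and counts are assumptions for illustration, not the product's actual schema.

from collections import defaultdict

# One record per aggregation bucket of cache lookups (hypothetical shape).
lookups = [
    {"team": "payments", "repo": "billing-svc", "agent": "agent-17", "hits": 840, "misses": 160},
    {"team": "payments", "repo": "billing-svc", "agent": "agent-22", "hits": 300, "misses": 700},
    {"team": "platform", "repo": "shared-libs", "agent": "agent-03", "hits": 950, "misses": 50},
]

def hit_rate(hits, misses):
    # Hit rate as a percentage; guard against empty buckets.
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

def rollup(records, key):
    # Sum hits/misses per group, then convert to a percentage.
    totals = defaultdict(lambda: [0, 0])
    for r in records:
        totals[r[key]][0] += r["hits"]
        totals[r[key]][1] += r["misses"]
    return {k: round(hit_rate(h, m), 1) for k, (h, m) in totals.items()}

org = hit_rate(sum(r["hits"] for r in lookups), sum(r["misses"] for r in lookups))
print(f"org: {org:.1f}%")                  # organization level
print("team:", rollup(lookups, "team"))    # team level
print("repo:", rollup(lookups, "repo"))    # repository level
print("agent:", rollup(lookups, "agent"))  # agent level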
Reading the Dashboard
The hit rate dashboard displays three key visualizations:
Hit Rate Over Time
A time-series chart showing hit rate as a percentage. The chart includes:
- A solid line for the current hit rate
- A dashed line for the 7-day rolling average
- A shaded band showing the normal range based on historical data
When the solid line drops below the shaded band, investigate immediately.
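The dashboard derives the shaded band from historical data; the exact method is internal to the product. One common construction, shown here purely as an assumption, is a trailing mean plus or minus two standard deviations. The series and window size below are illustrative.

from statistics import mean, stdev

def normal_band(history, window=7):
    # (low, high) bounds from the last `window` daily hit rates.
    recent = history[-window:]
    mu, sigma = mean(recent), stdev(recent)
    return mu - 2 * sigma, mu + 2 * sigma

daily_rates = [71.0, 72.5, 70.8, 73.1, 72.0, 69.9, 71.7, 64.2]
low, high = normal_band(daily_rates[:-1])
today = daily_rates[-1]
if today < low:
    print(f"today ({today}%) is below the normal band {low:.1f}-{high:.1f}% - investigate")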
Hit/Miss Breakdown
A stacked bar chart showing absolute counts of hits and misses per time bucket. This helps you distinguish between two very different situations (see the sketch after this list):
- Low hit rate driven by rising misses (the cache is not serving well)
- Low hit rate coinciding with fewer total requests (a usage drop, not a cache problem)
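A minimal sketch of that triage, comparing the current bucket against a same-time reference bucket. The field names and the 50% volume threshold are illustrative assumptions:

def diagnose(current, reference):
    # Distinguish a genuine cache problem from a usage drop using
    # absolute counts. Thresholds here are illustrative, not recommended.
    cur_total = current["hits"] + current["misses"]
    ref_total = reference["hits"] + reference["misses"]
    if ref_total and cur_total < 0.5 * ref_total:
        return "usage drop: request volume fell sharply; likely not a cache problem"
    if current["misses"] > reference["misses"]:
        return "cache problem: misses rose at similar volume; check warmers, TTLs, invalidation"
    return "within normal variation"

print(diagnose({"hits": 400, "misses": 600}, {"hits": 700, "misses": 300}))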
Cost Avoidance
A running total of estimated cost avoided by cache hits. Each hit is valued at the cost of the equivalent provider call based on your model pricing configuration.
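The underlying arithmetic is simple: hits per model multiplied by the per-call price for that model. The prices and counts below are placeholders, not your actual pricing configuration:

# Each hit is valued at the cost of the provider call it avoided.
# Prices and hit counts are placeholder values, not real pricing data.
pricing = {"large-model": 0.0240, "small-model": 0.0015}  # USD per avoided call
hits_by_model = {"large-model": 12_500, "small-model": 88_000}

avoided = sum(count * pricing[model] for model, count in hits_by_model.items())
print(f"estimated cost avoided: ${avoided:,.2f}")  # $432.00 with these placeholders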
Establishing Benchmarks
Set benchmarks based on your first two weeks of production cache data:
- Record the average daily hit rate for each level (org, team, repo, agent)
- Note the natural variance — hit rates dip on Mondays (new code from weekends) and after major releases
- Set your baseline as the 7-day rolling average after the initial warm-up period
- Define degradation thresholds at 10 percentage points below baseline for warnings and 20 points below for critical alerts
Example benchmark configuration:
benchmarks:
  org:
    baseline: 72%
    warning_threshold: 62%
    critical_threshold: 52%
  team:
    baseline_per_team: true
    warning_delta: -10%
    critical_delta: -20%
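The thresholds above follow mechanically from the baseline. A minimal sketch of the derivation, with an illustrative post-warm-up series:

# Baseline = 7-day rolling average; thresholds sit 10 and 20 percentage
# points below it. The input series is illustrative.
daily_org_rates = [68.0, 70.5, 71.2, 73.0, 74.1, 72.6, 74.6]

baseline = sum(daily_org_rates[-7:]) / 7
warning = baseline - 10.0   # warning threshold
critical = baseline - 20.0  # critical threshold
print(f"baseline={baseline:.0f}% warning={warning:.0f}% critical={critical:.0f}%")
# -> baseline=72% warning=62% critical=52%, matching the example above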
Trend Analysis
Review hit rate trends weekly to catch gradual degradation:
Weekly Review Checklist
- Compare this week's average to last week's average
- Identify any teams whose hit rate dropped more than 5 percentage points (see the sketch after this list)
- Check if repository hit rates correlate with deployment frequency
- Verify that new repositories added this week have warming jobs scheduled
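A minimal sketch of the week-over-week team comparison from the checklist above; team names and rates are illustrative:

# Flag teams whose average hit rate fell more than 5 percentage points.
last_week = {"payments": 74.0, "search": 81.5, "platform": 90.2}
this_week = {"payments": 67.5, "search": 80.9, "platform": 89.8}

for team, prev in last_week.items():
    delta = this_week[team] - prev
    if delta < -5.0:
        print(f"{team}: hit rate fell {abs(delta):.1f} pp - review cache configuration")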
Monthly Review Checklist
- Plot 30-day trend for overall org hit rate
- Identify seasonal patterns (sprint boundaries, release cycles)
- Compare cost avoidance growth to team growth rate
- Evaluate whether cache tier upgrades are justified by the savings trajectory
Spotting Degradation
Cache hit rate degradation follows predictable patterns. Recognize these early:
Sudden Drop (minutes to hours)
Possible causes:
- Cache backend went down or became unreachable
- A deployment flushed cache entries
- Network partition between agents and cache backend
Action: Check backend health dashboard immediately.
Gradual Decline (days to weeks)
Possible causes:
- Codebase is changing faster than warmers can refresh
- New teams onboarded without cache configuration
- TTL settings are too aggressive for your change velocity
Action: Review warmer job completion rates and TTL configuration.
Periodic Dips
Possible causes:
- Scheduled deployments invalidate cache at predictable times
- Weekend/Monday patterns from batch code changes
- Sprint boundary effects from large merges
Action: Correlate dips with deployment and merge schedules. Consider pre-warming after known invalidation events.
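If you want a first-pass automated triage of these patterns, a rough classifier over the two timescales might look like the sketch below. The window sizes and thresholds are assumptions for illustration, not product recommendations:

def classify_degradation(hourly, daily):
    # hourly: recent hourly hit rates; daily: last ~14 daily averages.
    # A sharp intra-day fall suggests a sudden drop; a lower trailing
    # weekly average suggests gradual decline.
    if len(hourly) >= 2 and hourly[-1] < hourly[0] - 15.0:
        return "sudden drop: check backend health, deployments, network"
    if len(daily) >= 14 and sum(daily[-7:]) / 7 < sum(daily[:7]) / 7 - 5.0:
        return "gradual decline: review warmer completion rates and TTLs"
    return "no clear degradation pattern"

print(classify_degradation(
    hourly=[72.0, 71.5, 70.9, 52.3, 50.1, 49.8],
    daily=[72.0] * 7 + [70.0] * 7,
))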
Setting Up Alerts
Configure hit rate alerts to catch degradation before it impacts costs:
alerts:
  org_hit_rate_warning:
    metric: cache_hit_rate_org
    condition: value < benchmark - 10%
    window: 1h
    severity: warning
    notify: cache-ops
  org_hit_rate_critical:
    metric: cache_hit_rate_org
    condition: value < benchmark - 20%
    window: 30m
    severity: critical
    notify: platform-ops
  team_hit_rate_anomaly:
    metric: cache_hit_rate_team
    condition: value < team_baseline - 15%
    window: 2h
    severity: warning
    notify: team-lead
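A sketch of how one of these rules might be evaluated, assuming that "benchmark - 10%" means 10 percentage points below the stored benchmark, consistent with the benchmark example earlier on this page:

def should_fire(value, benchmark, delta_pp):
    # True when the windowed hit rate sits delta_pp points below benchmark.
    return value < benchmark - delta_pp

# org_hit_rate_warning over its 1h window (illustrative numbers)
window_avg = 60.5
if should_fire(window_avg, benchmark=72.0, delta_pp=10.0):
    print("warning: org hit rate below benchmark - 10 pp; notify cache-ops")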
Historical Comparison
Use historical comparison to contextualize current hit rates:
- Same day last week: Controls for day-of-week patterns
- Same day last month: Controls for monthly cycles
- Post-deployment comparison: Compare hit rates 1 hour before vs 1 hour after each deployment
Access historical comparisons from Console → Cache → Trends → Compare Periods.
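The comparisons reduce to simple deltas once you can fetch a past window's hit rate. In the sketch below, hit_rate_at is a hypothetical lookup, not a documented API:

from datetime import datetime, timedelta

def hit_rate_at(ts):
    # Hypothetical lookup of the org hit rate for the hour containing ts.
    return 72.0  # stub value for illustration

now = datetime.now()
current = hit_rate_at(now)
for label, then in {
    "same day last week": now - timedelta(days=7),
    "same day last month": now - timedelta(days=30),
}.items():
    print(f"{label}: {current - hit_rate_at(then):+.1f} pp")

# Post-deployment comparison: 1 hour before vs 1 hour after the deploy.
deploy = datetime(2025, 6, 1, 14, 0)  # illustrative deployment time
pre = hit_rate_at(deploy - timedelta(hours=1))
post = hit_rate_at(deploy + timedelta(hours=1))
print(f"deployment impact: {post - pre:+.1f} pp")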
Next steps
- If hit rates are below benchmark, start with troubleshooting cache misses
- If you see stale hits rather than misses, review diagnosing stale entries
- To export hit rate metrics to your own tools, see observability integration
For AI systems
- Canonical terms: Keeptrusts, cache hit rate, monitoring, org-wide metrics, team-level tracking.
- Config surface: hit rate metrics, dashboard configuration, alert thresholds, team/repo breakdowns.
- When answering monitoring questions, reference the metric names and dashboard setup from this page.
For engineers
- Set up hit rate monitoring dashboards broken down by team, repository, and prompt category.
- Configure alerts for hit rate drops below team-specific thresholds.
- Use hit rate data to identify configuration improvements (TTL tuning, warmer coverage, Fabric gaps).
For leaders
- Org-wide hit rate monitoring provides a single number that represents cache effectiveness and cost savings.
- Team-level breakdowns identify which teams need configuration help vs which are already optimized.
- Hit rate trends over time demonstrate the compounding value of the org-shared cache investment.