Cache Savings Dashboard Walkthrough
The cache savings dashboard in the Keeptrusts console gives you a complete view of how the org-shared cache reduces your LLM spend. This guide walks through each section, explains what the numbers mean, and shows how to use them for executive reporting.
Use this page when
- You want to understand each section of the Keeptrusts console cache savings dashboard.
- You are preparing executive reports using dashboard data and need guidance on framing.
- You need to troubleshoot why specific metrics look unexpected (low hit rate, negative net savings, etc.).
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
Accessing the Dashboard
Navigate to Cost Center → Cache Savings in the console. The dashboard loads with the current month selected. Use the date picker to change the period.
Section 1: Fill Cost Card
The fill cost card shows what you actually spent on upstream provider calls that populated the cache.
| Field | Meaning |
|---|---|
| Fill Cost | Total spend on cache misses forwarded upstream |
| Fill Requests | Number of requests that resulted in a cache miss and upstream call |
| Avg Fill Cost | Average cost per fill request |
During the first week of onboarding a new repository, fill cost is high. This is expected and temporary. Once the cache reaches steady state, fill requests drop to a small fraction of total request volume, and fill cost falls with them.
Section 2: Avoided Provider Cost Card
This card shows the estimated cost you did not incur because cache hits served requests locally.
| Field | Meaning |
|---|---|
| Avoided Cost | Sum of estimated_avoided_cost across all cache hits |
| Cache Hits | Total number of requests served from cache |
| Avg Avoided per Hit | Average cost saved per cache hit |
This is your headline savings number: a cache hit involves no upstream call, no wallet debit, and no platform fee.
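To make the arithmetic behind the first two cards concrete, here is a minimal sketch that derives both cards from a list of per-request records. The `CacheEvent` record and its `cache_hit` and `upstream_cost` fields are hypothetical stand-ins, not a documented Keeptrusts schema; only `estimated_avoided_cost` comes from this page. Each average is a total divided by a count, guarded for empty periods.

```python
from dataclasses import dataclass


@dataclass
class CacheEvent:
    """Hypothetical per-request record (illustrative field names)."""
    cache_hit: bool
    upstream_cost: float = 0.0           # actual upstream spend for a miss (fill)
    estimated_avoided_cost: float = 0.0  # estimated upstream spend avoided by a hit


def fill_cost_card(events: list[CacheEvent]) -> dict:
    """Fill Cost, Fill Requests and Avg Fill Cost, computed from cache misses."""
    fills = [e for e in events if not e.cache_hit]
    total = sum(e.upstream_cost for e in fills)
    return {
        "fill_cost": total,
        "fill_requests": len(fills),
        "avg_fill_cost": total / len(fills) if fills else 0.0,  # avoid divide-by-zero
    }


def avoided_cost_card(events: list[CacheEvent]) -> dict:
    """Avoided Cost, Cache Hits and Avg Avoided per Hit, computed from cache hits."""
    hits = [e for e in events if e.cache_hit]
    total = sum(e.estimated_avoided_cost for e in hits)
    return {
        "avoided_cost": total,
        "cache_hits": len(hits),
        "avg_avoided_per_hit": total / len(hits) if hits else 0.0,
    }
```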
Section 3: Provider Cached-Token Savings
For requests that do go upstream (cache misses), provider-side prefix caching reduces the input token cost.
| Field | Meaning |
|---|---|
| Provider Cache Savings | Discount from provider prefix caching on miss requests |
| Cached Token Ratio | Percentage of input tokens served from provider cache |
| Effective vs Full Cost | What you paid vs what you would have paid without prefix caching |
This is a secondary optimization. The org-shared cache delivers the primary savings.
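The three fields above can be sketched as follows, assuming the provider bills cached input tokens at a flat discounted per-token rate. The function name and the per-million-token prices are illustrative placeholders, not any provider's actual pricing or a Keeptrusts API.

```python
def provider_cache_savings(misses, price_per_mtok: float,
                           cached_price_per_mtok: float) -> dict:
    """misses: (input_tokens, cached_tokens) pairs for requests that went upstream.

    Assumes a flat per-million-token price for uncached input tokens and a
    lower flat price for input tokens served from the provider's prefix cache.
    """
    misses = list(misses)  # accept generators; the data is iterated twice
    total_input = sum(inp for inp, _ in misses)
    total_cached = sum(cached for _, cached in misses)
    full_cost = total_input * price_per_mtok / 1_000_000
    effective_cost = (
        (total_input - total_cached) * price_per_mtok
        + total_cached * cached_price_per_mtok
    ) / 1_000_000
    return {
        "provider_cache_savings": full_cost - effective_cost,
        "cached_token_ratio": 100.0 * total_cached / total_input if total_input else 0.0,
        "effective_cost": effective_cost,  # what you paid
        "full_cost": full_cost,            # what you would have paid without prefix caching
    }
```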
Section 4: Net Savings
The net savings card combines all cost avoidance mechanisms:
Net Savings = Avoided Provider Cost + Provider Cached-Token Savings − Fill Cost
This is the number that matters for ROI calculations. A positive value means the cache is saving more than it costs to fill.
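The net savings formula translates directly into code. This sketch uses made-up dollar amounts purely to show the sign convention: fill cost is subtracted, so net savings goes negative while the cache is still filling.

```python
def net_savings(avoided_provider_cost: float,
                provider_cached_token_savings: float,
                fill_cost: float) -> float:
    """Net Savings = Avoided Provider Cost + Provider Cached-Token Savings - Fill Cost."""
    return avoided_provider_cost + provider_cached_token_savings - fill_cost


# Illustrative numbers only.
steady_state = net_savings(1200.0, 85.0, 300.0)  # 985.0: the cache pays for itself
warming_up = net_savings(50.0, 5.0, 400.0)       # -345.0: still filling
```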
Section 5: Hit Rate
The hit rate gauge shows what percentage of total requests are served from cache:
| Range | Interpretation |
|---|---|
| 0-30% | Cache is still filling — early days or high context diversity |
| 30-60% | Moderate savings — check for context ordering issues |
| 60-80% | Good steady-state performance |
| 80-95% | Excellent — typical for stable monorepos with 50+ engineers |
The gauge includes a trend arrow showing whether hit rate is improving, stable, or declining.
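One way to reproduce the gauge's bands and trend arrow is sketched below. The band boundaries come from the table above; treating each upper bound as inclusive, labelling anything above 95% separately, and the one-percentage-point tolerance for a "stable" trend are assumptions, not documented dashboard behavior.

```python
# Upper bound (inclusive, in %) and label for each band in the table above.
BANDS = [
    (30.0, "Cache is still filling"),
    (60.0, "Moderate savings"),
    (80.0, "Good steady-state performance"),
    (95.0, "Excellent"),
]


def hit_rate(hits: int, total_requests: int) -> float:
    """Percentage of requests served from cache."""
    return 100.0 * hits / total_requests if total_requests else 0.0


def interpret(rate: float) -> str:
    for upper, label in BANDS:
        if rate <= upper:
            return label
    return "Above the documented ranges"


def trend(current: float, previous: float, tolerance_pts: float = 1.0) -> str:
    """Trend arrow: changes within +/- tolerance_pts percentage points count as stable."""
    delta = current - previous
    if delta > tolerance_pts:
        return "improving"
    if delta < -tolerance_pts:
        return "declining"
    return "stable"
```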
Section 6: Miss Reasons
When a request misses the cache, the dashboard categorizes why:
| Reason | Description |
|---|---|
| `no_match` | No cache entry exists for this cache key (first-time prompt) |
| `stale` | Entry exists but exceeded TTL or was invalidated by a code change |
| `policy_deny` | Cache entry exists but policy evaluation blocked serving it |
| `entitlement_mismatch` | Requester lacks entitlement to the cached content |
| `model_mismatch` | Entry exists for a different model than requested |
Understanding miss reasons helps you optimize:
- High `stale` → Consider increasing TTL for stable contexts
- High `policy_deny` → Review whether policies are overly restrictive for cached content
- High `entitlement_mismatch` → Check team entitlement bindings
- High `no_match` → Normal during fill phase; investigate if persistent
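A breakdown like this can be rebuilt from raw miss reasons with `collections.Counter`. The advice strings mirror the list above; `model_mismatch` has no documented recommendation on this page, so the sketch falls back to a placeholder. The function name and input shape are hypothetical.

```python
from collections import Counter

# Recommendations copied from the list above; model_mismatch has none documented.
ADVICE = {
    "stale": "Consider increasing TTL for stable contexts",
    "policy_deny": "Review whether policies are overly restrictive for cached content",
    "entitlement_mismatch": "Check team entitlement bindings",
    "no_match": "Normal during fill phase; investigate if persistent",
}


def miss_breakdown(reasons, top_n: int = 3):
    """Return (reason, count, percent_of_misses, advice) for the top_n reasons."""
    counts = Counter(reasons)
    total = sum(counts.values())
    return [
        (reason, n, 100.0 * n / total, ADVICE.get(reason, "No documented recommendation"))
        for reason, n in counts.most_common(top_n)
    ]
```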
Section 7: Single-Flight Collapses
This section shows how many requests were deduplicated through single-flight fill:
| Field | Meaning |
|---|---|
| Total Collapses | Requests that waited on an in-flight fill instead of calling upstream |
| Collapse Groups | Distinct cache keys that had multiple concurrent waiters |
| Peak Collapses | Maximum simultaneous waiters in a single group |
| Collapse Savings | Estimated cost saved by deduplication |
High collapse counts during morning hours are a sign of healthy cache economics.
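The four collapse fields can be derived from per-key fill groups, as in this sketch. It assumes each group has one leading request that paid the upstream fill and that every waiter would otherwise have paid the same amount. `FillGroup` is a hypothetical record, not a Keeptrusts type, and counting a group once it has at least one waiter is an interpretation of "multiple concurrent waiters".

```python
from dataclasses import dataclass


@dataclass
class FillGroup:
    """Hypothetical record: one in-flight fill for a cache key."""
    cache_key: str
    waiters: int      # requests that waited on the fill instead of calling upstream
    fill_cost: float  # upstream cost paid once by the leading request


def collapse_metrics(groups: list[FillGroup]) -> dict:
    collapsed = [g for g in groups if g.waiters > 0]
    return {
        "total_collapses": sum(g.waiters for g in collapsed),
        "collapse_groups": len(collapsed),
        "peak_collapses": max((g.waiters for g in collapsed), default=0),
        # Assumes each waiter would otherwise have paid the same upstream cost.
        "collapse_savings": sum(g.waiters * g.fill_cost for g in collapsed),
    }
```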
Section 8: Time-Series Trend
The trend chart shows daily or weekly values for:
- Fill cost (bar)
- Avoided cost (bar, stacked)
- Hit rate (line, right axis)
- Net savings (line, right axis)
Use the time-series view to identify:
- Fill cost spikes (new repos onboarded, major code changes)
- Hit rate growth over the first 2-4 weeks
- Seasonal patterns (Monday mornings vs Friday afternoons)
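When rolling daily points up into a weekly view, ratios such as hit rate should be recomputed from summed counts rather than averaged, or low-traffic days skew the result. The sketch below assumes hypothetical daily row fields and ISO week buckets; whether the dashboard aggregates exactly this way is not documented here.

```python
from collections import defaultdict
from datetime import date

SUMMED = ("fill_cost", "avoided_cost", "provider_savings", "hits", "requests")


def weekly_rollup(daily_rows):
    """Roll hypothetical daily rows (a `day` date plus SUMMED fields) into ISO weeks.

    Hit rate is recomputed from summed counts, never averaged across days.
    """
    weeks = defaultdict(lambda: dict.fromkeys(SUMMED, 0))
    for row in daily_rows:
        iso_year, iso_week, _ = row["day"].isocalendar()
        bucket = weeks[(iso_year, iso_week)]
        for field in SUMMED:
            bucket[field] += row[field]
    result = {}
    for key in sorted(weeks):
        w = weeks[key]
        result[key] = {
            **w,
            "hit_rate": 100.0 * w["hits"] / w["requests"] if w["requests"] else 0.0,
            "net_savings": w["avoided_cost"] + w["provider_savings"] - w["fill_cost"],
        }
    return result


# Illustrative data: two days in ISO week 1 of 2024 with very different volumes.
example_rows = [
    {"day": date(2024, 1, 1), "fill_cost": 10.0, "avoided_cost": 5.0,
     "provider_savings": 1.0, "hits": 10, "requests": 100},
    {"day": date(2024, 1, 2), "fill_cost": 2.0, "avoided_cost": 20.0,
     "provider_savings": 0.0, "hits": 270, "requests": 300},
]
week = weekly_rollup(example_rows)[(2024, 1)]
```

Averaging the two daily hit rates (10% and 90%) would suggest 50%, but the week's actual hit rate is 280 / 400 = 70%.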
What the Dashboard Does NOT Show
The savings dashboard intentionally excludes:
- Lookup cost — Cache lookups are computationally trivial and not metered
- Platform fee — Cache hits incur zero platform fee; only fill requests are subject to standard gateway fees
- Wallet transactions — Cache hits do not touch wallets; see Cost Center → Wallet for debit history
This keeps the dashboard focused on the cache value proposition: avoided upstream spend.
Tips for CFO Presentations
When preparing executive reports from the dashboard:
- Lead with Net Savings — This is the bottom-line number
- Show the trend — Hit rate growth over the first month demonstrates improving returns
- Compare to baseline — Use the Direct API vs Cached comparison for context
- Highlight the asymmetry — Fill cost is one-time; savings compound every month
- Include single-flight — Morning surge deduplication is an easy-to-understand story
- Project forward — Use current hit rate to project next quarter savings
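The "Project forward" tip can be sketched as a simple model that holds monthly volume, hit rate, and per-request averages constant and applies this page's net savings formula. The function and its parameters are hypothetical; real traffic, hit rate, and pricing drift, so treat the result as an estimate for discussion rather than a forecast.

```python
def project_net_savings(monthly_requests: float,
                        hit_rate: float,
                        avg_avoided_per_hit: float,
                        avg_fill_cost: float,
                        months: int = 3,
                        provider_savings_per_month: float = 0.0) -> float:
    """Project net savings over `months`, holding all inputs constant.

    hit_rate is a fraction in [0, 1]. Uses the page's formula:
    net = avoided provider cost + provider cached-token savings - fill cost.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be a fraction between 0 and 1")
    hits = monthly_requests * hit_rate
    fills = monthly_requests - hits
    monthly_net = (hits * avg_avoided_per_hit
                   + provider_savings_per_month
                   - fills * avg_fill_cost)
    return monthly_net * months
```

For example, 100,000 requests a month at an 80% hit rate, with placeholder averages of $0.02 avoided per hit and $0.02 per fill, project to about $3,600 over a quarter.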
Export the dashboard as PDF or CSV for inclusion in finance decks.
Refreshing and Caching of Dashboard Data
Dashboard metrics refresh every 5 minutes. Historical data is pre-aggregated daily. If you need real-time granularity, use the Cost Center → Events view with cache-type filters.
Next steps
- Tracking Avoided Cost — deep dive on avoided-cost records
- ROI Calculation for a 100-Engineer Team — build the full business case
- Direct API Cost vs Cached Cost — comparison tables for stakeholders
For AI systems
- Canonical terms: Keeptrusts, savings dashboard, Cost Center, cache metrics, fill cost card, avoided cost, provider cached-token savings, net savings, hit rate, miss reasons, single-flight, time-series.
- Console path: Cost Center → Cache Savings.
- Dashboard sections: Fill Cost Card, Avoided Cost, Provider Cached-Token Savings, Net Savings, Hit Rate Trend, Miss Reasons Breakdown, Single-Flight Events, Time-Series Charts.
- Best next pages: ROI Calculation for a 100-Engineer Team, Direct API Cost vs Cached Cost, Tracking Avoided Cost.
For engineers
- Dashboard refreshes every 5 minutes. Historical data is pre-aggregated daily.
- For real-time granularity, use Cost Center → Events with cache-type filters.
- Fill Cost Card: shows one-time fabric build + cumulative response fills. Should plateau after initial warming.
- Miss Reasons Breakdown: `ttl_expired`, `no_match`, `threshold_below`, `invalidated`. High `ttl_expired` → increase TTL. High `no_match` → normal for novel queries.
- Single-Flight Events: high counts during morning hours are expected. Low counts may mean TTL is too long (hits are served before deduplication kicks in).
- Export as CSV for custom analysis. PDF export for leadership decks.
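For custom analysis of the CSV export, a sketch like the following flags days where net savings went negative. The column names are hypothetical placeholders; check the header row of your actual export and adjust `COLUMNS` before relying on it.

```python
import csv
from typing import Iterable

# Hypothetical column names: confirm against the header row of your real export.
COLUMNS = ("date", "fill_cost", "avoided_cost", "provider_cached_token_savings")


def negative_net_days(lines: Iterable[str]) -> list[tuple[str, float]]:
    """Return (date, net_savings) for each exported day where net savings < 0."""
    reader = csv.DictReader(lines)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"export is missing columns: {missing}")
    flagged = []
    for row in reader:
        net = (float(row["avoided_cost"])
               + float(row["provider_cached_token_savings"])
               - float(row["fill_cost"]))
        if net < 0:
            flagged.append((row["date"], net))
    return flagged


# Usage (the file name is a placeholder):
# with open("cache-savings-export.csv", newline="") as f:
#     print(negative_net_days(f))
```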
For leaders
- Use the dashboard to build quarterly savings reports: Net Savings number is the headline.
- Executive framing: fill cost (one-time, bounded) vs avoided cost (recurring, growing), net = difference.
- Highlight hit rate trend: upward trend = cache maturing. Flat high rate = steady state. Drops = investigate (refactor, new repo, config change).
- Morning-surge single-flight deduplication is an easy story for non-technical stakeholders.
- Project forward: year-to-date net savings + (current monthly net savings × remaining months) = projected annual value.
- Dashboard metrics refresh every 5 minutes; historical aggregations are daily.