Cache Savings Dashboard Walkthrough

The cache savings dashboard in the Keeptrusts console gives you a complete view of how the org-shared cache reduces your LLM spend. This guide walks through each section, explains what the numbers mean, and shows how to use them for executive reporting.

Use this page when

  • You want to understand each section of the Keeptrusts console cache savings dashboard.
  • You are preparing executive reports using dashboard data and need guidance on framing.
  • You need to troubleshoot why specific metrics look unexpected (low hit rate, negative net savings, etc.).

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

Accessing the Dashboard

Navigate to Cost Center → Cache Savings in the console. The dashboard loads with the current month selected. Use the date picker to change the period.

Section 1: Fill Cost Card

The fill cost card shows what you actually spent on upstream provider calls that populated the cache.

| Field | Meaning |
| --- | --- |
| Fill Cost | Total spend on cache misses forwarded upstream |
| Fill Requests | Number of requests that resulted in a cache miss and upstream call |
| Avg Fill Cost | Average cost per fill request |

During the first week of onboarding a new repository, fill cost is high. This is expected and temporary. Once the cache reaches steady state, fill requests drop to a small fraction of total request volume, and fill cost falls with them.

Section 2: Avoided Provider Cost Card

This card shows the estimated cost you did not incur because cache hits served requests locally.

| Field | Meaning |
| --- | --- |
| Avoided Cost | Sum of estimated_avoided_cost across all cache hits |
| Cache Hits | Total number of requests served from cache |
| Avg Avoided per Hit | Average cost saved per cache hit |

This is your gross savings figure, before fill cost is subtracted. A cache hit involves no upstream call, no wallet debit, and no platform fee.
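As a rough illustration of how the fill cost and avoided cost cards relate to request-level data, the sketch below derives their fields from a list of cache events. The `CacheEvent` record and its `outcome` and `upstream_cost` fields are assumptions made for this example, not the console's actual schema; only `estimated_avoided_cost` is named in this guide.

```python
from dataclasses import dataclass


@dataclass
class CacheEvent:
    """Hypothetical per-request record; not the console's actual schema."""
    outcome: str                         # "hit" or "miss"
    upstream_cost: float = 0.0           # spend on the upstream call (misses only)
    estimated_avoided_cost: float = 0.0  # estimated cost not incurred (hits only)


def summarize(events):
    """Derive the fields of the fill cost and avoided cost cards."""
    misses = [e for e in events if e.outcome == "miss"]
    hits = [e for e in events if e.outcome == "hit"]
    fill_cost = sum(e.upstream_cost for e in misses)
    avoided_cost = sum(e.estimated_avoided_cost for e in hits)
    return {
        "fill_cost": fill_cost,
        "fill_requests": len(misses),
        "avg_fill_cost": fill_cost / len(misses) if misses else 0.0,
        "avoided_cost": avoided_cost,
        "cache_hits": len(hits),
        "avg_avoided_per_hit": avoided_cost / len(hits) if hits else 0.0,
    }


# Made-up amounts: one fill followed by three hits.
events = [
    CacheEvent("miss", upstream_cost=0.5),
    CacheEvent("hit", estimated_avoided_cost=0.5),
    CacheEvent("hit", estimated_avoided_cost=0.25),
    CacheEvent("hit", estimated_avoided_cost=0.75),
]
cards = summarize(events)
```

With these sample events, a single fill costing 0.5 is followed by hits that together avoid 1.5, which is the pattern the two cards are meant to surface.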

Section 3: Provider Cached-Token Savings

For requests that do go upstream (cache misses), provider-side prefix caching reduces the input token cost.

| Field | Meaning |
| --- | --- |
| Provider Cache Savings | Discount from provider prefix caching on miss requests |
| Cached Token Ratio | Percentage of input tokens served from provider cache |
| Effective vs Full Cost | What you paid vs what you would have paid without prefix caching |

This is a secondary optimization. The org-shared cache delivers the primary savings.
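The three fields on this card can be related with simple arithmetic. The sketch below is one plausible way to compute them; the function name, the per-million-token price, and the `cached_discount` parameter are all hypothetical, since this guide does not state how any provider prices cached tokens.

```python
def provider_cache_card(input_tokens, cached_tokens, price_per_million, cached_discount):
    """Relate the card's fields for a set of miss requests.

    cached_discount is the fraction taken off the price of cached input
    tokens (an assumed parameter; actual provider pricing varies).
    """
    full_cost = input_tokens / 1_000_000 * price_per_million
    savings = cached_tokens / 1_000_000 * price_per_million * cached_discount
    return {
        "cached_token_ratio": cached_tokens / input_tokens if input_tokens else 0.0,
        "full_cost": full_cost,               # cost without prefix caching
        "effective_cost": full_cost - savings,  # what was actually paid
        "provider_cache_savings": savings,
    }


# Made-up numbers: half of the input tokens were cached at a 50% discount.
card = provider_cache_card(
    input_tokens=2_000_000,
    cached_tokens=1_000_000,
    price_per_million=2.0,
    cached_discount=0.5,
)
```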

Section 4: Net Savings

The net savings card combines all cost avoidance mechanisms:

Net Savings = Avoided Provider Cost + Provider Cached-Token Savings − Fill Cost

This is the number that matters for ROI calculations. A positive net savings means the cache is saving more than it costs to fill.
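The formula above translates directly into code. The amounts below are invented to show both a positive result (a warm cache) and a negative one (a cache that is still filling).

```python
def net_savings(avoided_provider_cost, provider_cached_token_savings, fill_cost):
    """Net Savings = Avoided Provider Cost + Provider Cached-Token Savings - Fill Cost."""
    return avoided_provider_cost + provider_cached_token_savings - fill_cost


# Illustrative amounts only, in the same currency unit.
warm_month = net_savings(1200.0, 150.0, 400.0)  # 950.0: cache saves more than it costs
cold_month = net_savings(100.0, 20.0, 400.0)    # -280.0: fill cost still dominates
```

A negative value early in onboarding is consistent with the fill phase described in Section 1 and is not by itself a sign of misconfiguration.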

Section 5: Hit Rate

The hit rate gauge shows what percentage of total requests are served from cache:

| Range | Interpretation |
| --- | --- |
| 0-30% | Cache is still filling: early days or high context diversity |
| 30-60% | Moderate savings: check for context ordering issues |
| 60-80% | Good steady-state performance |
| 80-95% | Excellent: typical for stable monorepos with 50+ engineers |

The gauge includes a trend arrow showing whether hit rate is improving, stable, or declining.
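A minimal sketch of how the gauge's value, band, and trend arrow could be reproduced from raw counts. The function names, the handling of boundary values (each boundary is assigned to the higher band, and anything at or above 80% is treated as excellent), and the one-point trend tolerance are assumptions for illustration; the dashboard's own rules are not documented here.

```python
def hit_rate_pct(cache_hits, total_requests):
    """Percentage of total requests served from cache."""
    return 100 * cache_hits / total_requests if total_requests else 0.0


def interpret(rate_pct):
    """Map a hit rate to the interpretation bands in the table above."""
    if rate_pct < 30:
        return "still filling"
    if rate_pct < 60:
        return "moderate"
    if rate_pct < 80:
        return "good"
    return "excellent"


def trend(current_pct, previous_pct, tolerance=1.0):
    """Classify the change between two periods (tolerance is hypothetical)."""
    delta = current_pct - previous_pct
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "declining"
    return "stable"


rate = hit_rate_pct(cache_hits=720, total_requests=1000)  # 72.0
band = interpret(rate)                                    # "good"
arrow = trend(rate, previous_pct=65.0)                    # "improving"
```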

Section 6: Miss Reasons

When a request misses the cache, the dashboard categorizes why:

| Reason | Description |
| --- | --- |
| no_match | No cache entry exists for this cache key (first-time prompt) |
| stale | Entry exists but exceeded TTL or was invalidated by a code change |
| policy_deny | Cache entry exists but policy evaluation blocked serving it |
| entitlement_mismatch | Requester lacks entitlement to the cached content |
| model_mismatch | Entry exists for a different model than requested |

Understanding miss reasons helps you optimize:

  • High stale → Consider increasing TTL for stable contexts
  • High policy_deny → Review whether policies are overly restrictive for cached content
  • High entitlement_mismatch → Check team entitlement bindings
  • High no_match → Normal during fill phase; investigate if persistent
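If you export miss events for your own analysis, the guidance above can be applied mechanically. The sketch below tallies miss reasons and attaches the matching suggestion; the input is assumed to be a plain list of reason strings, and the `miss_breakdown` helper is hypothetical.

```python
from collections import Counter

# Suggestions copied from the list above; model_mismatch has none in this guide.
SUGGESTIONS = {
    "stale": "Consider increasing TTL for stable contexts",
    "policy_deny": "Review whether policies are overly restrictive for cached content",
    "entitlement_mismatch": "Check team entitlement bindings",
    "no_match": "Normal during fill phase; investigate if persistent",
}


def miss_breakdown(reasons):
    """Return (reason, count, share, suggestion) tuples, most frequent first."""
    counts = Counter(reasons)
    total = sum(counts.values())
    return [
        (reason, n, n / total, SUGGESTIONS.get(reason, "No guidance in this guide"))
        for reason, n in counts.most_common()
    ]


# Made-up sample: three stale misses and one first-time prompt.
breakdown = miss_breakdown(["stale", "no_match", "stale", "stale"])
top_reason, top_count, top_share, top_action = breakdown[0]
```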

Section 7: Single-Flight Collapses

This section shows how many requests were deduplicated through single-flight fill:

| Field | Meaning |
| --- | --- |
| Total Collapses | Requests that waited on an in-flight fill instead of calling upstream |
| Collapse Groups | Distinct cache keys that had multiple concurrent waiters |
| Peak Collapses | Maximum simultaneous waiters in a single group |
| Collapse Savings | Estimated cost saved by deduplication |

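High collapse counts during morning hours are a sign of healthy cache economics: each collapsed request waited on a fill that was already in flight, so it did not trigger a separate upstream call.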
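To make the single-flight idea concrete, here is a generic sketch of the pattern using Python's `asyncio`. It is not Keeptrusts' implementation; the `SingleFlight` class, its `collapses` counter, and the `fill` callback are hypothetical names chosen for this example. The first caller for a cache key starts the upstream fill, and any caller that arrives while that fill is in flight waits on the same task instead of starting another.

```python
import asyncio


class SingleFlight:
    """Generic single-flight sketch: one in-flight fill per cache key."""

    def __init__(self):
        self._inflight = {}   # cache key -> asyncio.Task running the fill
        self.collapses = 0    # requests that waited instead of calling upstream

    async def do(self, key, fill):
        task = self._inflight.get(key)
        if task is not None:
            # A fill for this key is already running: wait on it.
            self.collapses += 1
            return await task
        # First request for this key: start the fill and register it.
        task = asyncio.create_task(fill())
        self._inflight[key] = task
        try:
            return await task
        finally:
            self._inflight.pop(key, None)


async def demo():
    sf = SingleFlight()
    upstream_calls = 0

    async def fill():
        nonlocal upstream_calls
        upstream_calls += 1
        await asyncio.sleep(0.01)  # stands in for the upstream provider call
        return "response"

    # Five concurrent requests for the same cache key.
    results = await asyncio.gather(*(sf.do("key-1", fill) for _ in range(5)))
    return upstream_calls, sf.collapses, results


upstream_calls, collapses, results = asyncio.run(demo())
```

In this run, five concurrent requests produce one upstream call and four collapses, which corresponds to the Total Collapses field above. A production version would also need to handle cancellation and timeouts of the leading request, which this sketch does not.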

Section 8: Time-Series Trend

The trend chart shows daily or weekly values for:

  • Fill cost (bar)
  • Avoided cost (bar, stacked)
  • Hit rate (line, right axis)
  • Net savings (line, right axis)

Use the time-series view to identify:

  • Fill cost spikes (new repos onboarded, major code changes)
  • Hit rate growth over the first 2-4 weeks
  • Seasonal patterns (Monday mornings vs Friday afternoons)
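If you export the daily series, fill cost spikes can also be flagged programmatically. The sketch below marks any day whose fill cost exceeds a multiple of the trailing median; the seven-day window and the factor of 2 are arbitrary choices for illustration, not dashboard settings.

```python
from statistics import median


def fill_cost_spikes(daily_fill_cost, window=7, factor=2.0):
    """Return indexes of days whose fill cost exceeds factor x the trailing median."""
    spikes = []
    for i in range(window, len(daily_fill_cost)):
        baseline = median(daily_fill_cost[i - window:i])
        if baseline > 0 and daily_fill_cost[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Made-up series: a steady week, then a jump on day index 8
# (for example, a newly onboarded repository).
series = [10, 10, 10, 10, 10, 10, 10, 11, 45, 12]
spike_days = fill_cost_spikes(series)
```

Using a median rather than a mean keeps a single spike from inflating the baseline for the days that follow it.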

What the Dashboard Does NOT Show

The savings dashboard intentionally excludes:

  • Lookup cost — Cache lookups are computationally trivial and not metered
  • Platform fee — Cache hits incur zero platform fee; only fill requests are subject to standard gateway fees
  • Wallet transactions — Cache hits do not touch wallets; see Cost Center → Wallet for debit history

This keeps the dashboard focused on the cache value proposition: avoided upstream spend.

Tips for CFO Presentations

When preparing executive reports from the dashboard:

  1. Lead with Net Savings: this is the bottom-line number
  2. Show the trend: hit rate growth over the first month demonstrates improving returns
  3. Compare to baseline: use the Direct API vs Cached comparison for context
  4. Highlight the asymmetry: fill cost is front-loaded, while savings recur every month
  5. Include single-flight: morning surge deduplication is an easy-to-understand story
  6. Project forward: use the current hit rate to project next quarter's savings

Export the dashboard as PDF or CSV for inclusion in finance decks.
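For the forward projection in the tips above, one simple approach is to hold the current hit rate and per-request averages constant and apply them to expected request volume. The sketch below does exactly that. It is a simplification built on stated assumptions: it ignores provider cached-token savings, assumes the hit rate does not change, and all input figures are invented.

```python
def project_net_savings(expected_requests, hit_rate, avg_avoided_per_hit, avg_fill_cost):
    """Project net savings for a future period from current dashboard averages.

    hit_rate is a fraction between 0 and 1. Provider cached-token savings
    are left out, so this slightly understates the formula in Section 4.
    """
    hits = expected_requests * hit_rate
    misses = expected_requests - hits
    return hits * avg_avoided_per_hit - misses * avg_fill_cost


# Invented inputs: 300,000 requests next quarter at a 75% hit rate.
next_quarter = project_net_savings(
    expected_requests=300_000,
    hit_rate=0.75,
    avg_avoided_per_hit=0.04,
    avg_fill_cost=0.04,
)  # roughly 6000 in the dashboard's currency unit
```

Present a projection like this as a range rather than a point estimate, since refactors or newly onboarded repositories can lower the hit rate temporarily.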

Refreshing and Caching of Dashboard Data

Dashboard metrics refresh every 5 minutes. Historical data is pre-aggregated daily. If you need real-time granularity, use the Cost Center → Events view with cache-type filters.

Next steps

For AI systems

  • Canonical terms: Keeptrusts, savings dashboard, Cost Center, cache metrics, fill cost card, avoided cost, provider cached-token savings, net savings, hit rate, miss reasons, single-flight, time-series.
  • Console path: Cost Center → Cache Savings.
  • Dashboard sections: Fill Cost Card, Avoided Cost, Provider Cached-Token Savings, Net Savings, Hit Rate Trend, Miss Reasons Breakdown, Single-Flight Events, Time-Series Charts.
  • Best next pages: ROI Calculation for a 100-Engineer Team, Direct API Cost vs Cached Cost, Tracking Avoided Cost.

For engineers

  • Dashboard refreshes every 5 minutes. Historical data is pre-aggregated daily.
  • For real-time granularity, use Cost Center → Events with cache-type filters.
  • Fill Cost Card: shows one-time fabric build + cumulative response fills. Should plateau after initial warming.
  • Miss Reasons Breakdown: ttl_expired, no_match, threshold_below, invalidated. High ttl_expired → increase TTL. High no_match → normal for novel queries.
  • Single-Flight Events: high counts during morning hours = expected. Low counts may mean TTL is too long (hits serving before dedup kicks in).
  • Export as CSV for custom analysis. PDF export for leadership decks.

For leaders

  • Use the dashboard to build quarterly savings reports: Net Savings number is the headline.
  • Executive framing: fill cost (one-time, bounded) vs avoided cost (recurring, growing), net = difference.
  • Highlight hit rate trend: upward trend = cache maturing. Flat high rate = steady state. Drops = investigate (refactor, new repo, config change).
  • Morning-surge single-flight deduplication is an easy story for non-technical stakeholders.
  • Project forward: current monthly net savings × remaining months = projected net savings for the rest of the year. For an annualized figure, multiply by 12 instead.
  • Dashboard metrics refresh every 5 minutes; historical aggregations are daily.