Forecasting Monthly AI Spend with Caching

Accurate cost forecasting with org-shared caching requires a different model than traditional per-request billing. Your actual spend depends on hit rate, request volume, and average cost per miss. Use the savings dashboard trend data to build reliable monthly projections.

Use this page when

You need to build a monthly AI cost forecast incorporating hit rate, request volume, and fill cost amortization.
You are presenting budget projections to finance and need the formula and scenario tables.
You want to adjust forecasts for known upcoming events (refactors, team growth, new repos).

Primary audience

Primary: Technical Leaders
Secondary: Technical Engineers, AI Agents

The Forecasting Formula

Your monthly AI cost with caching follows this formula:

monthly_cost = (total_requests × (1 - hit_rate) × avg_cost_per_miss) + fill_cost_amortized

Where:

total_requests — Total AI requests across your organization per month.
hit_rate — Fraction of requests served from cache, expressed as a decimal (e.g., 0.80 for 80%).
avg_cost_per_miss — Average cost per request that goes upstream (provider tokens + platform fee).
fill_cost_amortized — Cost of populating new cache entries, spread over their useful life.
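
As a minimal sketch, the formula translates directly into a small Python function. The function name and the range check on hit_rate are illustrative choices for this page, not part of any Keeptrusts API.

```python
def monthly_cost(total_requests: float,
                 hit_rate: float,
                 avg_cost_per_miss: float,
                 fill_cost_amortized: float = 0.0) -> float:
    """Monthly spend: billable misses times cost per miss, plus amortized fill cost."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be a decimal between 0 and 1")
    # Only requests that miss the cache go upstream and are billed.
    billable_misses = total_requests * (1.0 - hit_rate)
    return billable_misses * avg_cost_per_miss + fill_cost_amortized


if __name__ == "__main__":
    # 150,000 requests, 80% hit rate, $0.03 per miss, fill cost left at zero.
    print(f"${monthly_cost(150_000, 0.80, 0.03):,.2f}")
```

With fill cost left at zero, 150,000 requests at an 80% hit rate and $0.03 per miss comes to about $900, which matches the 80% row of the scenario table.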

Scenario Table: 100 Engineers, 50 Requests/Day

Hit Rate	Monthly Requests	Billable Misses	Cost at $0.03/miss	Cost at $0.05/miss
50%	150,000	75,000	$2,250	$3,750
60%	150,000	60,000	$1,800	$3,000
70%	150,000	45,000	$1,350	$2,250
80%	150,000	30,000	$900	$1,500
85%	150,000	22,500	$675	$1,125
90%	150,000	15,000	$450	$750
95%	150,000	7,500	$225	$375

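Assumptions: 100 engineers × 50 requests/day × 30 days = 150,000 monthly requests. The cost columns show miss cost only (billable misses × cost per miss) and exclude fill cost amortization.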
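
To rebuild this table for your own headcount or price points, a short script along these lines may help. It is a sketch under the same assumptions as the table (30-day month, miss cost only, no fill cost); the constant and function names are hypothetical.

```python
ENGINEERS = 100
REQUESTS_PER_DAY = 50
DAYS_IN_MONTH = 30


def scenario_rows(hit_rates_pct, costs_per_miss):
    """Return one tuple per hit rate: (pct, monthly requests, misses, cost at each price)."""
    monthly_requests = ENGINEERS * REQUESTS_PER_DAY * DAYS_IN_MONTH
    rows = []
    for pct in hit_rates_pct:
        # Integer percentages keep the miss counts exact.
        misses = monthly_requests * (100 - pct) // 100
        costs = [round(misses * cost, 2) for cost in costs_per_miss]
        rows.append((pct, monthly_requests, misses, *costs))
    return rows


for row in scenario_rows([50, 60, 70, 80, 85, 90, 95], [0.03, 0.05]):
    print(row)
```

Changing the three constants at the top regenerates every row for a different team size or request rate.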

Estimating Your Hit Rate

If you are new to caching, estimate your initial hit rate based on team topology:

Team Structure	Expected Initial Hit Rate	After 30 Days
Monorepo, 50+ engineers	70–80%	85–95%
Monorepo, 10–50 engineers	60–70%	75–85%
Multi-repo, 10+ per repo	50–60%	65–75%
Multi-repo, 5–10 per repo	40–50%	55–65%
Multi-repo, <5 per repo	25–35%	40–55%

Hit rates improve over the first 30 days as the cache warms. Use the "after 30 days" column for steady-state forecasting.
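
If you want to feed these ranges into a forecast script, one option is a simple lookup. The sketch below copies the "after 30 days" column from the table; the dictionary keys and the use of a midpoint as a single planning value are assumptions made for illustration.

```python
# Steady-state ("after 30 days") hit-rate ranges, copied from the table above.
STEADY_STATE_HIT_RATE = {
    "monorepo_50_plus": (0.85, 0.95),
    "monorepo_10_to_50": (0.75, 0.85),
    "multirepo_10_plus_per_repo": (0.65, 0.75),
    "multirepo_5_to_10_per_repo": (0.55, 0.65),
    "multirepo_under_5_per_repo": (0.40, 0.55),
}


def steady_state_estimate(team_structure: str) -> dict:
    """Return conservative, optimistic, and midpoint hit rates for a team structure."""
    low, high = STEADY_STATE_HIT_RATE[team_structure]
    return {"conservative": low, "optimistic": high, "midpoint": (low + high) / 2}


print(steady_state_estimate("monorepo_10_to_50"))
```

Using the conservative end of the range for budgeting and the optimistic end for upside keeps the forecast honest until you have real dashboard data.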

Fill Cost Amortization

New cache entries incur a one-time fill cost (the upstream provider call that generates the response). This cost amortizes over the entry's lifetime:

fill_cost_amortized = new_entries_per_month × avg_cost_per_fill × (1 / avg_hits_per_entry)

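For most teams, fill cost represents 5–15% of total spend after the initial warming period. If you do not have per-entry data, a rule of thumb is to estimate it as roughly 10% of your miss cost:

fill_cost_amortized ≈ total_requests × (1 - hit_rate) × avg_cost_per_miss × 0.10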
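
When you do have per-entry data, the detailed amortization formula above can be computed directly. This is a minimal sketch; the input values in the example are hypothetical and chosen only to show the arithmetic.

```python
def fill_cost_amortized(new_entries_per_month: float,
                        avg_cost_per_fill: float,
                        avg_hits_per_entry: float) -> float:
    """Fill cost spread over the average number of hits each entry serves."""
    if avg_hits_per_entry <= 0:
        raise ValueError("avg_hits_per_entry must be positive")
    return new_entries_per_month * avg_cost_per_fill / avg_hits_per_entry


if __name__ == "__main__":
    # Hypothetical inputs: 10,000 new entries, $0.03 per fill, 4 hits per entry.
    print(f"${fill_cost_amortized(10_000, 0.03, 4):,.2f}")
```

With those hypothetical inputs the amortized fill cost is about $75 per month. The more hits each entry serves, the smaller this term becomes.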

Building a Monthly Forecast

Step 1: Gather Baseline Metrics

From the savings dashboard, collect:

Average daily requests over the past 14 days.
Current hit rate (trailing 7-day average).
Average cost per miss from wallet transaction history.
Hit rate trend — is it rising, stable, or declining?

Step 2: Project Request Volume

Estimate next month's total requests:

projected_requests = avg_daily_requests × days_in_month × (1 + growth_rate)

Account for team growth, new projects, or seasonal patterns. A typical growth rate for active engineering teams is 3–5% per month.
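
A small helper can look up the number of days in the target month for you. This sketch uses the standard library's calendar.monthrange; the function name and its parameters are illustrative.

```python
import calendar


def projected_requests(avg_daily_requests: float,
                       year: int,
                       month: int,
                       growth_rate: float) -> float:
    """Project next month's requests from a daily average and a monthly growth rate."""
    # monthrange returns (weekday of the first day, number of days in the month).
    days_in_month = calendar.monthrange(year, month)[1]
    return avg_daily_requests * days_in_month * (1 + growth_rate)


if __name__ == "__main__":
    # 4,800 requests/day, a 30-day month (June), 4% growth.
    print(round(projected_requests(4_800, 2025, 6, 0.04)))
```

Note that this treats every calendar day as equally busy. If your dashboard average already includes weekends, that is consistent; if it covers working days only, scale by working days instead.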

Step 3: Project Hit Rate

Use the trailing trend to project hit rate:

Rising trend (new cache, growing team): Add 2–5 percentage points.
Stable trend (mature cache): Use current rate unchanged.
Declining trend (major refactors, new repos): Subtract 2–5 percentage points.
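
These three rules can be expressed as a small function. This is a sketch only: the trend labels are hypothetical names, and the 3-point default is an arbitrary choice within the 2–5 point range given above.

```python
def project_hit_rate(current_hit_rate: float,
                     trend: str,
                     adjustment_points: float = 3.0) -> float:
    """Shift the current hit rate by a few percentage points based on its trend."""
    if trend == "rising":
        delta = adjustment_points
    elif trend == "stable":
        delta = 0.0
    elif trend == "declining":
        delta = -adjustment_points
    else:
        raise ValueError("trend must be 'rising', 'stable', or 'declining'")
    projected = current_hit_rate + delta / 100.0
    # A hit rate is a fraction, so keep the result between 0 and 1.
    return min(max(projected, 0.0), 1.0)


if __name__ == "__main__":
    print(f"{project_hit_rate(0.78, 'rising'):.2f}")
```

A 78% hit rate with a rising trend and a 3-point adjustment projects to 81%.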

Step 4: Calculate Projected Spend

projected_spend = projected_requests × (1 - projected_hit_rate) × avg_cost_per_miss × 1.10

The 1.10 multiplier accounts for fill cost amortization (~10% overhead).
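
Because fill cost is described as 5–15% of spend, it can be useful to see how sensitive the result is to that overhead. The sketch below makes the overhead a parameter (the name fill_overhead is an assumption) and prints the projection at 5%, 10%, and 15%.

```python
def projected_spend(projected_requests: float,
                    projected_hit_rate: float,
                    avg_cost_per_miss: float,
                    fill_overhead: float = 0.10) -> float:
    """Miss cost scaled up by a flat allowance for fill cost amortization."""
    miss_cost = projected_requests * (1.0 - projected_hit_rate) * avg_cost_per_miss
    return miss_cost * (1.0 + fill_overhead)


for overhead in (0.05, 0.10, 0.15):
    spend = projected_spend(149_760, 0.81, 0.032, overhead)
    print(f"{overhead:.0%} overhead: ${spend:,.2f}")
```

The spread between the 5% and 15% cases gives a feel for how much of the forecast depends on the fill cost assumption rather than on hit rate.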

Example Forecast

Current state:

100 engineers, 48 avg daily requests per engineer
14-day average: 4,800 daily requests
7-day hit rate: 78%, trending up 1.5 points/week
Average cost per miss: $0.032

Projection for next month (30 days):

projected_requests = 4,800 × 30 × 1.04 = 149,760
projected_hit_rate = 0.78 + 0.03 = 0.81  (conservative: +3 points over 30 days)
projected_misses = 149,760 × 0.19 = 28,454
projected_spend = 28,454 × $0.032 × 1.10 ≈ $1,002

Budget recommendation: Allocate $1,100 to include a 10% safety margin.
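
The whole example can be reproduced end to end with a short script, which makes it easy to rerun with your own baseline. This is a sketch: the Baseline class, the forecast function, and the parameter names are hypothetical, and the 10% fill overhead and 10% safety margin are the values used in this example.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    avg_daily_requests: float   # 14-day average from the dashboard
    hit_rate: float             # trailing 7-day hit rate, as a decimal
    avg_cost_per_miss: float    # from wallet transaction history


def forecast(baseline: Baseline, days: int, growth_rate: float,
             hit_rate_delta: float, fill_overhead: float = 0.10,
             safety_margin: float = 0.10) -> dict:
    """Run Steps 2 through 4 and add a safety margin for the budget figure."""
    requests = baseline.avg_daily_requests * days * (1 + growth_rate)
    hit_rate = min(max(baseline.hit_rate + hit_rate_delta, 0.0), 1.0)
    misses = requests * (1 - hit_rate)
    spend = misses * baseline.avg_cost_per_miss * (1 + fill_overhead)
    return {
        "requests": requests,
        "hit_rate": hit_rate,
        "misses": misses,
        "spend": spend,
        "budget": spend * (1 + safety_margin),
    }


result = forecast(Baseline(4_800, 0.78, 0.032), days=30,
                  growth_rate=0.04, hit_rate_delta=0.03)
print({key: round(value, 2) for key, value in result.items()})
```

Small differences from the hand calculation come from rounding the miss count before multiplying; the script carries full precision through to the end.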

Adjusting Forecasts for Events

Certain events can temporarily affect hit rates. Factor these into your projections:

Event	Hit Rate Impact	Duration
Major refactor or rewrite	-10 to -20 points	2–4 weeks
New team members onboarding	+2 to +5 points (more repeated queries)	1–2 weeks
New repository added	-5 to -10 points (cold cache)	2–3 weeks
Framework or language upgrade	-5 to -15 points	1–3 weeks
Holiday/reduced activity	Neutral (fewer requests, same hit rate)	Varies
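
One simple way to fold an event into a monthly projection is to weight its impact by the share of the month it covers. The sketch below assumes requests are spread evenly across the month and that the impact is constant for the event's duration; both are simplifications, and the function name is hypothetical.

```python
def event_adjusted_hit_rate(baseline_hit_rate: float,
                            impact_points: float,
                            event_days: int,
                            days_in_month: int = 30) -> float:
    """Blend an event's hit-rate impact into a monthly average hit rate."""
    # Clamp the event to the month so a long event cannot count more than once.
    affected_fraction = min(max(event_days, 0), days_in_month) / days_in_month
    adjusted = baseline_hit_rate + (impact_points / 100.0) * affected_fraction
    return min(max(adjusted, 0.0), 1.0)


if __name__ == "__main__":
    # An 81% baseline with a major refactor: -15 points for 3 weeks of a 30-day month.
    print(f"{event_adjusted_hit_rate(0.81, -15, 21):.3f}")
```

In that case the month's effective hit rate drops to about 70.5%, which you can then use in place of the projected hit rate when calculating spend.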

Using the Savings Dashboard for Forecasting

The savings dashboard provides the data inputs you need:

Trend charts — Visualize hit rate and request volume over time to identify patterns.
Estimated avoided cost — Shows what you would have spent without caching, useful for ROI justification.
Per-team breakdown — Forecast at the team level for more granular budget allocation.
Daily cost — Use the trailing daily cost to extrapolate monthly spend.

Quick Monthly Estimate

For a fast approximation, multiply your trailing 7-day average daily cost by 30:

quick_estimate = avg_daily_cost_last_7_days × 30

This captures current hit rate and request volume without manual calculation. Adjust upward if you expect team growth or downward if a cache-warming period is ending.
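
If you export daily cost figures from the dashboard, the quick estimate is a one-liner. This sketch assumes a plain list of daily costs ordered oldest to newest; the export format itself is not specified here.

```python
from statistics import mean


def quick_estimate(daily_costs: list, days: int = 30) -> float:
    """Average the most recent 7 daily costs and scale to a month."""
    last_week = daily_costs[-7:]
    if len(last_week) < 7:
        raise ValueError("need at least 7 days of cost data")
    return mean(last_week) * days


if __name__ == "__main__":
    # Hypothetical daily costs in dollars, oldest first.
    costs = [41.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0]
    print(f"${quick_estimate(costs):,.2f}")
```

Only the last seven values are used, so older days in the list do not affect the result.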

Communicating Forecasts to Stakeholders

When presenting AI cost forecasts to finance or leadership:

Lead with the effective cost per engineer per month (total spend ÷ headcount).
Show the budget multiplier (what you would spend without caching vs. actual spend).
Highlight that adding engineers improves hit rate, making per-engineer cost decrease over time.
Provide a range (optimistic hit rate, conservative hit rate) rather than a single point estimate.
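
The figures suggested above can be produced together from a conservative and an optimistic hit rate. This is a sketch with hypothetical names; it treats uncached spend as total requests × average cost per miss, which is an assumption and may differ from the dashboard's estimated avoided cost.

```python
def stakeholder_summary(requests: float, avg_cost_per_miss: float, headcount: int,
                        conservative_hit_rate: float, optimistic_hit_rate: float,
                        fill_overhead: float = 0.10) -> dict:
    """Spend range, per-engineer cost range, and budget multiplier range."""
    # Assumed uncached baseline: every request billed at the miss price.
    uncached = requests * avg_cost_per_miss

    def spend(hit_rate: float) -> float:
        return requests * (1 - hit_rate) * avg_cost_per_miss * (1 + fill_overhead)

    high = spend(conservative_hit_rate)  # lower hit rate means higher spend
    low = spend(optimistic_hit_rate)
    return {
        "spend_range": (low, high),
        "cost_per_engineer_range": (low / headcount, high / headcount),
        "budget_multiplier_range": (uncached / high, uncached / low),
    }


summary = stakeholder_summary(150_000, 0.03, 100, 0.80, 0.90)
for name, (lo, hi) in summary.items():
    print(f"{name}: {lo:,.2f} to {hi:,.2f}")
```

Presenting each figure as a low-to-high pair keeps the conversation on the range rather than on a single number that may not hold.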

For AI systems

Canonical terms: Keeptrusts, forecasting, monthly spend, budget projection, hit rate, fill cost amortization, savings dashboard, cost per miss.
Formula: monthly_cost = (total_requests × (1 - hit_rate) × avg_cost_per_miss) + fill_cost_amortized.
Console paths: Cost Center → Savings (trend charts, per-team breakdown, daily cost).
Best next pages: Direct API Cost vs Cached Cost, ROI Calculation for a 100-Engineer Team, Savings Dashboard Walkthrough.

For engineers

Gather from dashboard: avg daily requests (14-day), current hit rate (7-day trailing), avg cost per miss from wallet transactions.
Project request volume: avg_daily × days_in_month × (1 + growth_rate). Typical growth: 3–5%/month.
Project hit rate: add 2–5 points if trending up (new cache); keep flat if mature; subtract 2–5 if major refactors planned.
Quick monthly estimate: avg_daily_cost_last_7_days × 30.
Fill cost amortization is ~10% of total spend after initial warming period.

For leaders

Lead with the effective cost per engineer per month: total_spend ÷ headcount. This decreases as the team grows.
Show the budget multiplier: uncached spend vs actual spend (typically 4–5× difference at 80% hit rate).
Adding engineers improves hit rate, making per-engineer cost decrease over time — excellent scaling story for finance.
Provide a range (conservative/optimistic hit rate) rather than a single-point estimate for credibility.
Factor events into projections: major refactors temporarily reduce hit rate by 10–20 points for 2–4 weeks.

Next steps

Direct API Cost vs Cached Cost — per-request and monthly comparison tables
ROI Calculation for a 100-Engineer Team — full 12-month model
Savings Dashboard Walkthrough — where to find the data inputs
