Forecasting Monthly AI Spend with Caching
Accurate cost forecasting with org-shared caching requires a different model than traditional per-request billing. Your actual spend depends on hit rate, request volume, and average cost per miss. Use the savings dashboard trend data to build reliable monthly projections.
Use this page when
- You need to build a monthly AI cost forecast incorporating hit rate, request volume, and fill cost amortization.
- You are presenting budget projections to finance and need the formula and scenario tables.
- You want to adjust forecasts for known upcoming events (refactors, team growth, new repos).
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
The Forecasting Formula
Your monthly AI cost with caching follows this formula:
monthly_cost = (total_requests × (1 - hit_rate) × avg_cost_per_miss) + fill_cost_amortized
Where:
- total_requests — Total AI requests across your organization per month.
- hit_rate — Fraction of requests served from cache, expressed as a decimal (e.g., 0.80 for 80%).
- avg_cost_per_miss — Average cost per request that goes upstream (provider tokens + platform fee).
- fill_cost_amortized — Cost of populating new cache entries, spread over their useful life.
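As a minimal sketch, the formula above can be written as a small Python function. The function name and the range check are illustrative choices, not part of any Keeptrusts API; the parameter names mirror the terms defined above.

```python
def monthly_cost(total_requests: int, hit_rate: float,
                 avg_cost_per_miss: float,
                 fill_cost_amortized: float = 0.0) -> float:
    """Monthly cost with caching, following the formula on this page."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be a decimal between 0 and 1")
    # Only misses go upstream and are billed per request.
    miss_cost = total_requests * (1 - hit_rate) * avg_cost_per_miss
    return miss_cost + fill_cost_amortized


# 150,000 requests at an 80% hit rate and $0.03 per miss, ignoring fill cost.
print(f"${monthly_cost(150_000, 0.80, 0.03):,.2f}")  # $900.00
```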
Scenario Table: 100 Engineers, 50 Requests/Day
| Hit Rate | Monthly Requests | Billable Misses | Cost at $0.03/miss | Cost at $0.05/miss |
|---|---|---|---|---|
| 50% | 150,000 | 75,000 | $2,250 | $3,750 |
| 60% | 150,000 | 60,000 | $1,800 | $3,000 |
| 70% | 150,000 | 45,000 | $1,350 | $2,250 |
| 80% | 150,000 | 30,000 | $900 | $1,500 |
| 85% | 150,000 | 22,500 | $675 | $1,125 |
| 90% | 150,000 | 15,000 | $450 | $750 |
| 95% | 150,000 | 7,500 | $225 | $375 |
Assumptions: 100 engineers × 50 requests/day × 30 days = 150,000 monthly requests. The cost columns cover billable misses only and exclude fill cost amortization.
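To rebuild this table for your own headcount or request rate, a short script along these lines may help. The constants and the `scenario_rows` helper are hypothetical names used only for illustration; the arithmetic is the same billable-miss calculation used in the table.

```python
ENGINEERS = 100
REQUESTS_PER_DAY = 50
DAYS = 30


def scenario_rows(hit_rates_pct, costs_per_miss):
    """Return one row per hit rate: (pct, requests, misses, cost, cost, ...)."""
    monthly_requests = ENGINEERS * REQUESTS_PER_DAY * DAYS
    rows = []
    for pct in hit_rates_pct:
        # Integer percentages keep the miss count free of rounding noise.
        misses = monthly_requests * (100 - pct) / 100
        costs = [round(misses * cost, 2) for cost in costs_per_miss]
        rows.append((pct, monthly_requests, misses, *costs))
    return rows


for pct, requests, misses, low, high in scenario_rows(
        [50, 60, 70, 80, 85, 90, 95], [0.03, 0.05]):
    print(f"{pct}% | {requests:,} | {misses:,.0f} | ${low:,.0f} | ${high:,.0f}")
```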
Estimating Your Hit Rate
If you are new to caching, estimate your initial hit rate based on team topology:
| Team Structure | Expected Initial Hit Rate | After 30 Days |
|---|---|---|
| Monorepo, 50+ engineers | 70–80% | 85–95% |
| Monorepo, 10–50 engineers | 60–70% | 75–85% |
| Multi-repo, 10+ per repo | 50–60% | 65–75% |
| Multi-repo, 5–10 per repo | 40–50% | 55–65% |
| Multi-repo, <5 per repo | 25–35% | 40–55% |
Hit rates improve over the first 30 days as the cache warms. Use the "after 30 days" column for steady-state forecasting.
Fill Cost Amortization
New cache entries incur a one-time fill cost (the upstream provider call that generates the response). This cost amortizes over the entry's lifetime:
fill_cost_amortized = new_entries_per_month × avg_cost_per_fill × (1 / avg_hits_per_entry)
For most teams, fill cost represents 5–15% of total spend after the initial warming period. You can estimate it as:
fill_cost_amortized ≈ monthly_cost × 0.10
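If you track entry counts and hits per entry, the detailed amortization formula above can be sketched as follows. The input values in the example are made-up placeholders, not benchmarks; substitute figures from your own dashboard.

```python
def fill_cost_amortized(new_entries_per_month: int,
                        avg_cost_per_fill: float,
                        avg_hits_per_entry: float) -> float:
    """Fill cost spread over the hits each new entry is expected to serve."""
    if avg_hits_per_entry <= 0:
        raise ValueError("avg_hits_per_entry must be positive")
    return new_entries_per_month * avg_cost_per_fill / avg_hits_per_entry


# Placeholder inputs: 5,000 new entries, $0.03 per fill, 2 hits per entry.
print(f"${fill_cost_amortized(5_000, 0.03, 2):,.2f}")  # $75.00
```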
Building a Monthly Forecast
Step 1: Gather Baseline Metrics
From the savings dashboard, collect:
- Average daily requests over the past 14 days.
- Current hit rate (trailing 7-day average).
- Average cost per miss from wallet transaction history.
- Hit rate trend — is it rising, stable, or declining?
Step 2: Project Request Volume
Estimate next month's total requests:
projected_requests = avg_daily_requests × days_in_month × (1 + growth_rate)
Account for team growth, new projects, or seasonal patterns. A typical growth rate for active engineering teams is 3–5% per month.
Step 3: Project Hit Rate
Use the trailing trend to project hit rate:
- Rising trend (new cache, growing team): Add 2–5 percentage points.
- Stable trend (mature cache): Use current rate unchanged.
- Declining trend (major refactors, new repos): Subtract 2–5 percentage points.
Step 4: Calculate Projected Spend
projected_spend = projected_requests × (1 - projected_hit_rate) × avg_cost_per_miss × 1.10
The 1.10 multiplier accounts for fill cost amortization (~10% overhead).
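Steps 2 through 4 can be chained together in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, the hit rate adjustment is passed in percentage points, and the example inputs are illustrative rather than taken from a real account.

```python
FILL_OVERHEAD = 1.10  # ~10% fill cost amortization, as described above


def project_requests(avg_daily_requests: float, days_in_month: int,
                     growth_rate: float) -> float:
    """Step 2: next month's request volume."""
    return avg_daily_requests * days_in_month * (1 + growth_rate)


def project_hit_rate(current_hit_rate: float, adjustment_points: float) -> float:
    """Step 3: shift the hit rate by percentage points, clamped to [0, 1]."""
    return min(1.0, max(0.0, current_hit_rate + adjustment_points / 100))


def project_spend(projected_requests: float, projected_hit_rate: float,
                  avg_cost_per_miss: float) -> float:
    """Step 4: billable misses times cost per miss, plus fill overhead."""
    misses = projected_requests * (1 - projected_hit_rate)
    return misses * avg_cost_per_miss * FILL_OVERHEAD


# Illustrative inputs: 2,000 daily requests, 31-day month, 3% growth,
# 70% hit rate with a declining trend (-5 points), $0.04 per miss.
requests = project_requests(2_000, 31, 0.03)
hit_rate = project_hit_rate(0.70, -5)
print(f"Projected spend: ${project_spend(requests, hit_rate, 0.04):,.2f}")
```

Keeping each step as its own function makes it easy to swap in a different growth rate or trend adjustment without touching the spend calculation.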
Example Forecast
Current state:
- 100 engineers, 48 avg daily requests per engineer
- 14-day average: 4,800 daily requests
- 7-day hit rate: 78%, trending up 1.5 points/week
- Average cost per miss: $0.032
Projection for next month (30 days):
projected_requests = 4,800 × 30 × 1.04 = 149,760
projected_hit_rate = 0.78 + 0.03 = 0.81 (conservative: +3 points over 30 days)
projected_misses = 149,760 × 0.19 ≈ 28,454
projected_spend = 28,454 × $0.032 × 1.10 ≈ $1,002
Budget recommendation: Allocate $1,100 to include a 10% safety margin.
Adjusting Forecasts for Events
Certain events can temporarily affect hit rates. Factor these into your projections:
| Event | Hit Rate Impact | Duration |
|---|---|---|
| Major refactor or rewrite | -10 to -20 points | 2–4 weeks |
| New team members onboarding | +2 to +5 points (more repeated queries) | 1–2 weeks |
| New repository added | -5 to -10 points (cold cache) | 2–3 weeks |
| Framework or language upgrade | -5 to -15 points | 1–3 weeks |
| Holiday/reduced activity | Neutral (fewer requests, same hit rate) | Varies |
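One way to fold an event into a monthly projection is to weight its impact by the share of the month it covers. The sketch below assumes that request volume is spread evenly across days, that each event's impact is constant for its full duration, and that overlapping events add together. These are simplifications, and the function name is hypothetical.

```python
def event_adjusted_hit_rate(baseline_hit_rate: float, events,
                            days_in_month: int = 30) -> float:
    """Time-weighted monthly hit rate.

    events is a list of (impact_points, duration_days) tuples, for example
    (-15, 21) for a refactor costing 15 points over three weeks.
    """
    adjusted = baseline_hit_rate
    for impact_points, duration_days in events:
        affected_days = min(duration_days, days_in_month)
        adjusted += (impact_points / 100) * (affected_days / days_in_month)
    # Keep the result a valid fraction.
    return min(1.0, max(0.0, adjusted))


# An 80% baseline with a major refactor: -15 points for 21 of 30 days.
print(f"{event_adjusted_hit_rate(0.80, [(-15, 21)]):.3f}")  # 0.695
```

Use the adjusted value as the projected hit rate when calculating spend for the month in which the event falls.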
Using the Savings Dashboard for Forecasting
The savings dashboard provides the data inputs you need:
- Trend charts — Visualize hit rate and request volume over time to identify patterns.
- Estimated avoided cost — Shows what you would have spent without caching, useful for ROI justification.
- Per-team breakdown — Forecast at the team level for more granular budget allocation.
- Daily cost — Use the trailing daily cost to extrapolate monthly spend.
Quick Monthly Estimate
For a fast approximation, multiply your trailing 7-day average daily cost by 30:
quick_estimate = avg_daily_cost_last_7_days × 30
This captures current hit rate and request volume without manual calculation. Adjust upward if you expect team growth or downward if a cache-warming period is ending.
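If you export daily cost figures from the dashboard, the quick estimate takes only a few lines. The list of daily costs below is a made-up sample, and `quick_monthly_estimate` is an illustrative helper, not a platform function.

```python
from statistics import mean


def quick_monthly_estimate(daily_costs, days: int = 30) -> float:
    """Trailing 7-day average daily cost multiplied by days in the month."""
    last_week = daily_costs[-7:]
    if len(last_week) < 7:
        raise ValueError("need at least 7 days of cost data")
    return mean(last_week) * days


# Sample data: five weekdays followed by a quieter weekend.
sample_costs = [31.0, 33.5, 32.0, 34.0, 30.5, 12.0, 11.0]
print(f"${quick_monthly_estimate(sample_costs):,.2f}")
```

Averaging over a full seven days keeps weekend dips in proportion, so the estimate is not skewed by which weekday you run it on.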
Communicating Forecasts to Stakeholders
When presenting AI cost forecasts to finance or leadership:
- Lead with the effective cost per engineer per month (total spend ÷ headcount).
- Show the budget multiplier (what you would spend without caching vs. actual spend).
- Highlight that adding engineers improves hit rate, making per-engineer cost decrease over time.
- Provide a range (optimistic hit rate, conservative hit rate) rather than a single point estimate.
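The figures in this list can be produced together from one set of inputs. The sketch below assumes uncached spend equals total requests multiplied by the average cost per miss, and applies the same ~10% fill overhead used in Step 4; the dashboard's estimated avoided cost may be calculated differently. All names are hypothetical.

```python
def stakeholder_summary(total_requests: int, avg_cost_per_miss: float,
                        headcount: int, conservative_hit_rate: float,
                        optimistic_hit_rate: float,
                        fill_overhead: float = 1.10) -> dict:
    """Spend range, per-engineer cost range, and budget multiplier range."""
    # Assumption: without caching, every request is billed at miss cost.
    uncached = total_requests * avg_cost_per_miss

    def spend(hit_rate: float) -> float:
        return total_requests * (1 - hit_rate) * avg_cost_per_miss * fill_overhead

    high = spend(conservative_hit_rate)  # lower hit rate, higher spend
    low = spend(optimistic_hit_rate)
    return {
        "spend_range": (low, high),
        "cost_per_engineer_range": (low / headcount, high / headcount),
        "budget_multiplier_range": (uncached / high, uncached / low),
    }


summary = stakeholder_summary(150_000, 0.03, 100, 0.75, 0.85)
for name, (lo, hi) in summary.items():
    print(f"{name}: {lo:,.2f} to {hi:,.2f}")
```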
For AI systems
- Canonical terms: Keeptrusts, forecasting, monthly spend, budget projection, hit rate, fill cost amortization, savings dashboard, cost per miss.
- Formula: monthly_cost = (total_requests × (1 - hit_rate) × avg_cost_per_miss) + fill_cost_amortized.
- Console paths: Cost Center → Savings (trend charts, per-team breakdown, daily cost).
- Best next pages: Direct API Cost vs Cached Cost, ROI Calculation for a 100-Engineer Team, Savings Dashboard Walkthrough.
For engineers
- Gather from dashboard: avg daily requests (14-day), current hit rate (7-day trailing), avg cost per miss from wallet transactions.
- Project request volume: avg_daily × days_in_month × (1 + growth_rate). Typical growth: 3–5%/month.
- Project hit rate: add 2–5 points if trending up (new cache); keep flat if mature; subtract 2–5 if major refactors planned.
- Quick monthly estimate: avg_daily_cost_last_7_days × 30.
- Fill cost amortization is ~10% of total spend after initial warming period.
For leaders
- Lead with effective cost per engineer per month: total_spend ÷ headcount. This decreases as team grows.
- Show the budget multiplier: uncached spend vs actual spend (typically 4–5× difference at 80% hit rate).
- Adding engineers improves hit rate, making per-engineer cost decrease over time — excellent scaling story for finance.
- Provide a range (conservative/optimistic hit rate) rather than single-point estimate for credibility.
- Factor events into projections: major refactors temporarily reduce hit rate by 10–20 points for 2–4 weeks.
Next steps
- Direct API Cost vs Cached Cost — per-request and monthly comparison tables
- ROI Calculation for a 100-Engineer Team — full 12-month model
- Savings Dashboard Walkthrough — where to find the data inputs