Forecasting Monthly AI Spend with Caching

Accurate cost forecasting with org-shared caching requires a different model than traditional per-request billing. Your actual spend depends on hit rate, request volume, and average cost per miss. Use the savings dashboard trend data to build reliable monthly projections.

Use this page when

  • You need to build a monthly AI cost forecast incorporating hit rate, request volume, and fill cost amortization.
  • You are presenting budget projections to finance and need the formula and scenario tables.
  • You want to adjust forecasts for known upcoming events (refactors, team growth, new repos).

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

The Forecasting Formula

Your monthly AI cost with caching follows this formula:

monthly_cost = (total_requests × (1 - hit_rate) × avg_cost_per_miss) + fill_cost_amortized

Where:

  • total_requests — Total AI requests across your organization per month.
  • hit_rate — Percentage of requests served from cache (decimal, e.g., 0.80 for 80%).
  • avg_cost_per_miss — Average cost per request that goes upstream (provider tokens + platform fee).
  • fill_cost_amortized — Cost of populating new cache entries, spread over their useful life.
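A minimal Python sketch of the formula, which the other calculations on this page build on. The function and parameter names simply mirror the variables defined above. Nothing here calls a real billing or dashboard API.

```python
def monthly_cost(
    total_requests: float,
    hit_rate: float,
    avg_cost_per_miss: float,
    fill_cost_amortized: float = 0.0,
) -> float:
    """Monthly spend = billable misses x cost per miss + amortized fill cost."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be a decimal between 0 and 1")
    billable_misses = total_requests * (1 - hit_rate)
    return billable_misses * avg_cost_per_miss + fill_cost_amortized


# 150,000 requests, 80% hit rate, $0.03 per miss, fill cost ignored
print(f"${monthly_cost(150_000, 0.80, 0.03):,.2f}")  # $900.00
```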

Scenario Table: 100 Engineers, 50 Requests/Day

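| Hit Rate | Monthly Requests | Billable Misses | Cost at $0.03/miss | Cost at $0.05/miss |
|---|---|---|---|---|
| 50% | 150,000 | 75,000 | $2,250 | $3,750 |
| 60% | 150,000 | 60,000 | $1,800 | $3,000 |
| 70% | 150,000 | 45,000 | $1,350 | $2,250 |
| 80% | 150,000 | 30,000 | $900 | $1,500 |
| 85% | 150,000 | 22,500 | $675 | $1,125 |
| 90% | 150,000 | 15,000 | $450 | $750 |
| 95% | 150,000 | 7,500 | $225 | $375 |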

Assumptions: 100 engineers × 50 requests/day × 30 days = 150,000 monthly requests. Costs cover billable misses only and exclude fill cost amortization.
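A short Python sketch that regenerates the scenario table from those assumptions, so you can substitute your own headcount, request rate, or price points. The constant and function names are illustrative, and like the table it covers miss cost only.

```python
ENGINEERS = 100
REQUESTS_PER_ENGINEER_PER_DAY = 50
DAYS = 30
MONTHLY_REQUESTS = ENGINEERS * REQUESTS_PER_ENGINEER_PER_DAY * DAYS  # 150,000


def scenario_rows(hit_rates, costs_per_miss):
    """Return (hit_rate, requests, billable_misses, [cost at each price]) rows."""
    rows = []
    for hit_rate in hit_rates:
        misses = round(MONTHLY_REQUESTS * (1 - hit_rate))
        costs = [misses * price for price in costs_per_miss]
        rows.append((hit_rate, MONTHLY_REQUESTS, misses, costs))
    return rows


for hit_rate, requests, misses, costs in scenario_rows(
    [0.50, 0.60, 0.70, 0.80, 0.85, 0.90, 0.95], [0.03, 0.05]
):
    cost_cols = " | ".join(f"${c:,.0f}" for c in costs)
    print(f"{hit_rate:.0%} | {requests:,} | {misses:,} | {cost_cols}")
```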

Estimating Your Hit Rate

If you are new to caching, estimate your initial hit rate based on team topology:

| Team Structure | Expected Initial Hit Rate | After 30 Days |
|---|---|---|
| Monorepo, 50+ engineers | 70–80% | 85–95% |
| Monorepo, 10–50 engineers | 60–70% | 75–85% |
| Multi-repo, 10+ per repo | 50–60% | 65–75% |
| Multi-repo, 5–10 per repo | 40–50% | 55–65% |
| Multi-repo, <5 per repo | 25–35% | 40–55% |

Hit rates improve over the first 30 days as the cache warms. Use the "after 30 days" column for steady-state forecasting.
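The topology table can be encoded as a lookup, sketched below in Python. Two choices here are assumptions rather than guidance from the table: where adjacent rows share a boundary (50 engineers, 10 per repo) the larger bucket wins, and the midpoint of the "after 30 days" range is offered as a single steady-state figure. Monorepos under 10 engineers have no row, so the sketch returns None for them.

```python
from typing import Optional, Tuple

# (initial_low, initial_high, after_30_low, after_30_high), in percent, from the table above
HIT_RATE_TABLE = {
    "monorepo, 50+ engineers": (70, 80, 85, 95),
    "monorepo, 10-50 engineers": (60, 70, 75, 85),
    "multi-repo, 10+ per repo": (50, 60, 65, 75),
    "multi-repo, 5-10 per repo": (40, 50, 55, 65),
    "multi-repo, <5 per repo": (25, 35, 40, 55),
}


def team_row(monorepo: bool, engineers: int) -> Optional[str]:
    """Pick a table row. For multi-repo teams, `engineers` means engineers per repo."""
    if monorepo:
        if engineers >= 50:
            return "monorepo, 50+ engineers"
        if engineers >= 10:
            return "monorepo, 10-50 engineers"
        return None  # the table has no row for monorepos under 10 engineers
    if engineers >= 10:
        return "multi-repo, 10+ per repo"
    if engineers >= 5:
        return "multi-repo, 5-10 per repo"
    return "multi-repo, <5 per repo"


def steady_state_hit_rate(monorepo: bool, engineers: int) -> Optional[Tuple[float, float, float]]:
    """Return (low, midpoint, high) of the after-30-days range as decimals."""
    row = team_row(monorepo, engineers)
    if row is None:
        return None
    _, _, low, high = HIT_RATE_TABLE[row]
    return low / 100, (low + high) / 200, high / 100


print(steady_state_hit_rate(monorepo=True, engineers=100))  # (0.85, 0.9, 0.95)
```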

Fill Cost Amortization

New cache entries incur a one-time fill cost (the upstream provider call that generates the response). This cost amortizes over the entry's lifetime:

fill_cost_amortized = new_entries_per_month × avg_cost_per_fill × (1 / avg_hits_per_entry)

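For most teams, fill cost represents 5–15% of total spend after the initial warming period. Because monthly_cost already includes the fill cost, base the estimate on the miss cost instead to avoid a circular definition:

fill_cost_amortized ≈ total_requests × (1 - hit_rate) × avg_cost_per_miss × 0.10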
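Both fill-cost estimates, sketched in Python. The detailed form is transcribed as written above. The shortcut applies the 10% to the miss cost rather than to the full monthly cost, which is the same overhead the 1.10 multiplier in Step 4 adds. The example inputs (2,000 new entries, 4 hits per entry) are made up for illustration.

```python
def fill_cost_amortized(
    new_entries_per_month: float,
    avg_cost_per_fill: float,
    avg_hits_per_entry: float,
) -> float:
    """Detailed form, transcribed directly from the amortization formula."""
    if avg_hits_per_entry <= 0:
        raise ValueError("avg_hits_per_entry must be positive")
    return new_entries_per_month * avg_cost_per_fill * (1 / avg_hits_per_entry)


def fill_cost_shortcut(
    total_requests: float,
    hit_rate: float,
    avg_cost_per_miss: float,
    overhead: float = 0.10,
) -> float:
    """Rule-of-thumb form: about 10% of the monthly miss cost."""
    miss_cost = total_requests * (1 - hit_rate) * avg_cost_per_miss
    return miss_cost * overhead


print(f"${fill_cost_amortized(2_000, 0.03, 4):,.2f}")        # $15.00
print(f"${fill_cost_shortcut(150_000, 0.80, 0.03):,.2f}")    # $90.00
```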

Building a Monthly Forecast

Step 1: Gather Baseline Metrics

From the savings dashboard, collect:

  • Average daily requests over the past 14 days.
  • Current hit rate (trailing 7-day average).
  • Average cost per miss from wallet transaction history.
  • Hit rate trend — is it rising, stable, or declining?
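If you can export per-day totals, the Step 1 inputs reduce to a few sums. The Python sketch below assumes a hypothetical `DailyStats` record (requests, cache hits, and upstream miss cost per day); it is not an actual dashboard export format. Other assumptions: hit rate is request-weighted, cost per miss uses the full 14-day window, and a change of more than 1 point between the last two weeks counts as a trend.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DailyStats:
    """Hypothetical per-day export; field names are illustrative."""
    requests: int     # total AI requests that day
    hits: int         # requests served from cache
    miss_cost: float  # upstream spend on that day's misses (wallet transactions)


def baseline_metrics(days: List[DailyStats], trend_threshold: float = 0.01) -> dict:
    """Compute the Step 1 inputs from at least 14 days of history, oldest first."""
    if len(days) < 14:
        raise ValueError("need at least 14 days of history")
    last14, last7, prior7 = days[-14:], days[-7:], days[-14:-7]

    def hit_rate(window: List[DailyStats]) -> float:
        # Request-weighted, so busy days count more than quiet ones
        return sum(d.hits for d in window) / sum(d.requests for d in window)

    delta = hit_rate(last7) - hit_rate(prior7)
    if delta > trend_threshold:
        trend = "rising"
    elif delta < -trend_threshold:
        trend = "declining"
    else:
        trend = "stable"

    misses_14 = sum(d.requests - d.hits for d in last14)
    return {
        "avg_daily_requests": sum(d.requests for d in last14) / 14,
        "hit_rate_7d": hit_rate(last7),
        "avg_cost_per_miss": sum(d.miss_cost for d in last14) / misses_14,
        "trend": trend,
    }


# One week at 75% hit rate, then one week at 78%, both at $0.032 per miss
history = [DailyStats(4_800, 3_600, 38.4)] * 7 + [DailyStats(4_800, 3_744, 33.792)] * 7
print(baseline_metrics(history)["trend"])  # rising
```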

Step 2: Project Request Volume

Estimate next month's total requests:

projected_requests = avg_daily_requests × days_in_month × (1 + growth_rate)

Account for team growth, new projects, or seasonal patterns. A typical growth rate for active engineering teams is 3–5% per month.
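The projection is a single multiplication; the Python sketch below evaluates it across the 3–5% growth band for a team averaging 4,800 requests per day.

```python
def projected_requests(avg_daily_requests: float, days_in_month: int, growth_rate: float) -> float:
    """Next month's request volume, scaled by the expected monthly growth rate."""
    return avg_daily_requests * days_in_month * (1 + growth_rate)


# 4,800 requests/day over a 30-day month, across the typical growth band
for growth in (0.03, 0.04, 0.05):
    print(f"{growth:.0%}: {projected_requests(4_800, 30, growth):,.0f}")
```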

Step 3: Project Hit Rate

Use the trailing trend to project hit rate:

  • Rising trend (new cache, growing team): Add 2–5 percentage points.
  • Stable trend (mature cache): Use current rate unchanged.
  • Declining trend (major refactors, new repos): Subtract 2–5 percentage points.
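The trend rules can be kept as a small table of percentage-point adjustments. This Python sketch returns a (conservative, optimistic) pair, using the low and high ends of each range, and clamps the result to the 0–100% band; the pairing and the clamping are choices made for the sketch.

```python
from typing import Tuple

# Percentage-point adjustments from the list above: (conservative, optimistic)
TREND_ADJUSTMENT_POINTS = {
    "rising": (2, 5),
    "stable": (0, 0),
    "declining": (-5, -2),
}


def _clamp(rate: float) -> float:
    """Keep a hit rate within [0, 1]."""
    return min(max(rate, 0.0), 1.0)


def project_hit_rate(current_hit_rate: float, trend: str) -> Tuple[float, float]:
    """Return (conservative, optimistic) projected hit rates as decimals."""
    low_pts, high_pts = TREND_ADJUSTMENT_POINTS[trend]
    return (
        _clamp(current_hit_rate + low_pts / 100),
        _clamp(current_hit_rate + high_pts / 100),
    )


conservative, optimistic = project_hit_rate(0.78, "rising")
print(f"{conservative:.0%} {optimistic:.0%}")  # 80% 83%
```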

Step 4: Calculate Projected Spend

projected_spend = projected_requests × (1 - projected_hit_rate) × avg_cost_per_miss × 1.10

The 1.10 multiplier accounts for fill cost amortization (~10% overhead).
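A Python sketch of Step 4, with the fill overhead exposed as a parameter so you can try values elsewhere in the 5–15% range. The loop compares a conservative and an optimistic hit rate for the same volume and miss cost.

```python
def projected_spend(
    projected_requests: float,
    projected_hit_rate: float,
    avg_cost_per_miss: float,
    fill_overhead: float = 0.10,
) -> float:
    """Upstream misses x cost per miss, plus an overhead for fill cost amortization."""
    misses = projected_requests * (1 - projected_hit_rate)
    return misses * avg_cost_per_miss * (1 + fill_overhead)


for hit_rate in (0.80, 0.83):
    print(f"{hit_rate:.0%}: ${projected_spend(149_760, hit_rate, 0.032):,.2f}")
```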

Example Forecast

Current state:

  • 100 engineers, 48 avg daily requests per engineer
  • 14-day average: 4,800 daily requests
  • 7-day hit rate: 78%, trending up 1.5 points/week
  • Average cost per miss: $0.032

Projection for next month (30 days):

projected_requests = 4,800 × 30 × 1.04 = 149,760
projected_hit_rate = 0.78 + 0.03 = 0.81 (conservative: +3 points over 30 days)
projected_misses = 149,760 × 0.19 = 28,454
projected_spend = 28,454 × $0.032 × 1.10 ≈ $1,002

Budget recommendation: Allocate $1,100 to include a 10% safety margin.
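The whole worked example can be reproduced in one Python function, which makes it easy to rerun with your own inputs. The function name and the `safety_margin` parameter are illustrative; the arithmetic follows the steps above.

```python
def forecast(engineers, requests_per_engineer_per_day, days, growth_rate,
             hit_rate, hit_rate_gain, avg_cost_per_miss,
             fill_overhead=0.10, safety_margin=0.10):
    """Volume -> projected hit rate -> misses -> spend -> budget."""
    daily_requests = engineers * requests_per_engineer_per_day
    requests = daily_requests * days * (1 + growth_rate)
    projected_hit_rate = hit_rate + hit_rate_gain
    misses = requests * (1 - projected_hit_rate)
    spend = misses * avg_cost_per_miss * (1 + fill_overhead)
    return {
        "requests": requests,
        "hit_rate": projected_hit_rate,
        "misses": misses,
        "spend": spend,
        "budget": spend * (1 + safety_margin),
    }


result = forecast(100, 48, 30, 0.04, 0.78, 0.03, 0.032)
for key, value in result.items():
    print(f"{key}: {value:,.2f}")
```

Because the sketch keeps unrounded intermediate values (28,454.4 misses), spend comes out at about $1,001.59 and the budget at about $1,101.75; the small difference from the hand calculation comes from rounding the miss count.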

Adjusting Forecasts for Events

Certain events can temporarily affect hit rates. Factor these into your projections:

| Event | Hit Rate Impact | Duration |
|---|---|---|
| Major refactor or rewrite | -10 to -20 points | 2–4 weeks |
| New team members onboarding | +2 to +5 points (more repeated queries) | 1–2 weeks |
| New repository added | -5 to -10 points (cold cache) | 2–3 weeks |
| Framework or language upgrade | -5 to -15 points | 1–3 weeks |
| Holiday/reduced activity | Neutral (fewer requests, same hit rate) | Varies |
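One way to fold a time-limited event into a monthly projection is to weight its impact by the share of the month it covers. The Python sketch below does that; it assumes request volume is spread evenly across days and that overlapping events simply add. Both are simplifications, and `ForecastEvent` is a hypothetical record type, not part of any dashboard.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ForecastEvent:
    """Hypothetical event record; impact values come from the table above."""
    name: str
    impact_points: float  # e.g. -15 for a major refactor (midpoint of -10 to -20)
    affected_days: int    # days of the forecast month the event overlaps


def blended_hit_rate(base_hit_rate: float, events: Iterable[ForecastEvent],
                     days_in_month: int = 30) -> float:
    """Weight each event's impact by the fraction of the month it covers."""
    rate = base_hit_rate
    for event in events:
        days = min(max(event.affected_days, 0), days_in_month)
        rate += (event.impact_points / 100) * days / days_in_month
    return min(max(rate, 0.0), 1.0)


# A 3-week refactor at -15 points pulls an 81% month down to about 70.5%
refactor = ForecastEvent("major refactor", impact_points=-15, affected_days=21)
print(f"{blended_hit_rate(0.81, [refactor]):.1%}")
```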

Using the Savings Dashboard for Forecasting

The savings dashboard provides the data inputs you need:

  • Trend charts — Visualize hit rate and request volume over time to identify patterns.
  • Estimated avoided cost — Shows what you would have spent without caching, useful for ROI justification.
  • Per-team breakdown — Forecast at the team level for more granular budget allocation.
  • Daily cost — Use the trailing daily cost to extrapolate monthly spend.

Quick Monthly Estimate

For a fast approximation, multiply your trailing 7-day average daily cost by 30:

quick_estimate = avg_daily_cost_last_7_days × 30

This captures current hit rate and request volume without manual calculation. Adjust upward if you expect team growth or downward if a cache-warming period is ending.
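The quick estimate as a Python sketch, taking a list of daily costs ordered oldest to newest; the sample figures are invented for illustration.

```python
from typing import Sequence


def quick_estimate(daily_costs: Sequence[float], days: int = 30) -> float:
    """Trailing 7-day average daily cost, extrapolated to a month."""
    if len(daily_costs) < 7:
        raise ValueError("need at least 7 days of cost history")
    last_7 = daily_costs[-7:]
    return sum(last_7) / 7 * days


daily_costs = [31.2, 32.8, 33.0, 34.1, 33.5, 32.9, 33.3, 34.0]
print(f"${quick_estimate(daily_costs):,.2f}")
```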

Communicating Forecasts to Stakeholders

When presenting AI cost forecasts to finance or leadership:

  • Lead with the effective cost per engineer per month (total spend ÷ headcount).
  • Show the budget multiplier (what you would spend without caching vs. actual spend).
  • Highlight that adding engineers improves hit rate, making per-engineer cost decrease over time.
  • Provide a range (optimistic hit rate, conservative hit rate) rather than a single point estimate.
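The stakeholder figures can be derived from the same inputs as the forecast. This Python sketch assumes that, without caching, every request would cost `avg_cost_per_miss`; if your uncached price differs (for example, no platform fee), adjust `uncached_spend`. At an 80% hit rate with the 1.10 fill overhead the multiplier works out to about 4.5×.

```python
def stakeholder_summary(total_requests, headcount, avg_cost_per_miss,
                        conservative_hit_rate, optimistic_hit_rate,
                        fill_overhead=0.10):
    """Spend, cost per engineer, and budget multiplier as (low, high) ranges."""
    # Assumption: without caching, every request is billed at avg_cost_per_miss
    uncached_spend = total_requests * avg_cost_per_miss

    def spend(hit_rate):
        return total_requests * (1 - hit_rate) * avg_cost_per_miss * (1 + fill_overhead)

    high, low = spend(conservative_hit_rate), spend(optimistic_hit_rate)
    return {
        "spend_range": (low, high),
        "cost_per_engineer_range": (low / headcount, high / headcount),
        "budget_multiplier_range": (uncached_spend / high, uncached_spend / low),
    }


summary = stakeholder_summary(150_000, 100, 0.03, 0.80, 0.85)
low, high = summary["cost_per_engineer_range"]
print(f"Cost per engineer: ${low:,.2f} to ${high:,.2f} per month")
```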

For AI systems

For engineers

  • Gather from dashboard: avg daily requests (14-day), current hit rate (7-day trailing), avg cost per miss from wallet transactions.
  • Project request volume: avg_daily × days_in_month × (1 + growth_rate). Typical growth: 3–5%/month.
  • Project hit rate: add 2–5 points if trending up (new cache); keep flat if mature; subtract 2–5 if major refactors planned.
  • Quick monthly estimate: avg_daily_cost_last_7_days × 30.
  • Fill cost amortization is ~10% of total spend after initial warming period.

For leaders

  • Lead with effective cost per engineer per month: total_spend ÷ headcount. This decreases as team grows.
  • Show the budget multiplier: uncached spend vs actual spend (typically 4–5× difference at 80% hit rate).
  • Adding engineers improves hit rate, making per-engineer cost decrease over time — excellent scaling story for finance.
  • Provide a range (conservative/optimistic hit rate) rather than single-point estimate for credibility.
  • Factor events into projections: major refactors temporarily reduce hit rate by 10–20 points for 2–4 weeks.

Next steps