Direct API Cost vs Cached Cost
This page provides concrete cost comparisons between three scenarios: calling LLM providers directly, routing through Keeptrusts without org-shared cache, and routing through Keeptrusts with org-shared cache enabled. Use these numbers for procurement decisions and executive justification.
Use this page when
- You need concrete cost comparison tables (per-request, monthly, annual) for procurement decisions.
- You are building an executive justification showing direct API vs cached costs by team size.
- You want breakeven analysis showing when cache fill cost is recovered.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
Per-Request Cost Comparison
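Assumptions for a typical engineering prompt:
- 4,000 input tokens (codebase context + prompt)
- 1,000 output tokens (code suggestion + explanation)
- List prices per 1M tokens (input / output): GPT-4o $2.50 / $10, Claude 3.5 Sonnet $3 / $15, GPT-4o-mini $0.15 / $0.60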
OpenAI GPT-4o
| Scenario | Input Cost | Output Cost | Platform Fee | Total per Request |
|---|---|---|---|---|
| Direct API call | $0.0100 | $0.0100 | n/a | $0.0200 |
| Keeptrusts (cache miss) | $0.0100 | $0.0100 | $0.0000 | $0.0200 |
| Keeptrusts (cache hit) | $0.0000 | $0.0000 | $0.0000 | $0.0000 |
Anthropic Claude 3.5 Sonnet
| Scenario | Input Cost | Output Cost | Platform Fee | Total per Request |
|---|---|---|---|---|
| Direct API call | $0.0120 | $0.0150 | n/a | $0.0270 |
| Keeptrusts (cache miss) | $0.0120 | $0.0150 | $0.0000 | $0.0270 |
| Keeptrusts (cache hit) | $0.0000 | $0.0000 | $0.0000 | $0.0000 |
OpenAI GPT-4o-mini
| Scenario | Input Cost | Output Cost | Platform Fee | Total per Request |
|---|---|---|---|---|
| Direct API call | $0.0006 | $0.0006 | n/a | $0.0012 |
| Keeptrusts (cache miss) | $0.0006 | $0.0006 | $0.0000 | $0.0012 |
| Keeptrusts (cache hit) | $0.0000 | $0.0000 | $0.0000 | $0.0000 |
Cache hits incur zero cost. No provider call, no wallet debit, no platform fee.
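Each cost above is token count × list price per 1M tokens, and a hit skips the provider call entirely. A minimal sketch of that arithmetic, assuming the list prices given on this page (GPT-4o-mini's $0.15 / $0.60 is OpenAI's published list price). `PRICES` and `request_cost` are hypothetical names for illustration, not a Keeptrusts API.

```python
# Hypothetical helper for illustration; not part of Keeptrusts.
# List prices in USD per 1M tokens: (input, output).
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cache_hit: bool = False) -> float:
    """Return the USD cost of one request.

    A cache hit makes no provider call, so it costs nothing.
    """
    if cache_hit:
        return 0.0
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


for model in PRICES:
    miss = request_cost(model, 4_000, 1_000)
    hit = request_cost(model, 4_000, 1_000, cache_hit=True)
    print(f"{model}: miss ${miss:.4f}, hit ${hit:.4f}")
```

For cache misses this prints $0.0200 for GPT-4o, $0.0270 for Claude 3.5 Sonnet, and $0.0012 for GPT-4o-mini; every hit prints $0.0000.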
Monthly Cost by Team Size
Assumptions:
- 50 prompts per engineer per working day
- 22 working days per month
- GPT-4o pricing ($2.50/1M input, $10/1M output), which is $0.02 per uncached request at 4,000 input and 1,000 output tokens
- 80% cache hit rate at steady state
10 Engineers
| Scenario | Monthly Requests | Monthly Cost |
|---|---|---|
| Direct API (no Keeptrusts) | 11,000 | $220 |
| Keeptrusts, no cache | 11,000 | $220 |
| Keeptrusts, 80% hit rate | 2,200 misses | $44 |
| Monthly Savings | n/a | $176 |
50 Engineers
| Scenario | Monthly Requests | Monthly Cost |
|---|---|---|
| Direct API (no Keeptrusts) | 55,000 | $1,100 |
| Keeptrusts, no cache | 55,000 | $1,100 |
| Keeptrusts, 80% hit rate | 11,000 misses | $220 |
| Monthly Savings | n/a | $880 |
100 Engineers
| Scenario | Monthly Requests | Monthly Cost |
|---|---|---|
| Direct API (no Keeptrusts) | 110,000 | $2,200 |
| Keeptrusts, no cache | 110,000 | $2,200 |
| Keeptrusts, 80% hit rate | 22,000 misses | $440 |
| Monthly Savings | n/a | $1,760 |
200 Engineers
| Scenario | Monthly Requests | Monthly Cost |
|---|---|---|
| Direct API (no Keeptrusts) | 220,000 | $4,400 |
| Keeptrusts, no cache | 220,000 | $4,400 |
| Keeptrusts, 80% hit rate | 44,000 misses | $880 |
| Monthly Savings | n/a | $3,520 |
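The monthly tables follow from requests = engineers × 50 × 22, with only cache misses billed. A minimal sketch, assuming $0.02 per uncached GPT-4o request (4,000 input and 1,000 output tokens at $2.50 / $10 per 1M) and an 80% steady-state hit rate; `monthly_costs` is a hypothetical helper, not a Keeptrusts API.

```python
# Hypothetical sketch of the monthly model; not a Keeptrusts API.
# Assumptions from this page: 50 prompts per engineer per working day,
# 22 working days per month, 80% steady-state hit rate, and $0.02 per
# uncached GPT-4o request (4,000 input + 1,000 output tokens).
PROMPTS_PER_DAY = 50
WORKING_DAYS_PER_MONTH = 22
HIT_RATE_PERCENT = 80
COST_PER_MISS = 0.02  # USD; cache hits cost nothing


def monthly_costs(engineers: int) -> dict:
    """Return monthly request volume and USD cost with and without the cache."""
    requests = engineers * PROMPTS_PER_DAY * WORKING_DAYS_PER_MONTH
    # Only misses reach the provider; integer math keeps the count exact.
    misses = requests * (100 - HIT_RATE_PERCENT) // 100
    without_cache = requests * COST_PER_MISS
    with_cache = misses * COST_PER_MISS
    return {
        "requests": requests,
        "misses": misses,
        "without_cache": without_cache,
        "with_cache": with_cache,
        "savings": without_cache - with_cache,
    }


for team in (10, 50, 100, 200):
    m = monthly_costs(team)
    print(f"{team:>3} engineers: {m['requests']:,} requests, "
          f"${m['without_cache']:,.0f} -> ${m['with_cache']:,.0f}, "
          f"saves ${m['savings']:,.0f}")
```

Because every term is a fixed per-engineer rate, both cost and savings grow linearly with headcount under these assumptions.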
Breakeven Analysis
The cache has an initial fill cost during the first month. Breakeven is when cumulative savings exceed cumulative fill cost.
| Team Size | Estimated Fill Cost | Monthly Savings | Breakeven |
|---|---|---|---|
| 10 engineers | $80 | $176 | < 2 weeks |
| 50 engineers | $200 | $880 | < 1 week |
| 100 engineers | $400 | $1,760 | < 1 week |
| 200 engineers | $600 | $3,520 | < 1 week |
Larger teams reach breakeven faster because more engineers share the same filled cache entries. The fill cost scales sub-linearly with team size while savings scale linearly.
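Breakeven is the fill cost divided by the daily rate of savings. A sketch under the simplifying assumption that the steady-state 80% hit rate applies from day one and savings accrue evenly over an average calendar month; the fill costs are the estimates from the table, and the function names are hypothetical.

```python
# Hypothetical sketch; names are illustrative, not a Keeptrusts API.
# Simplifying assumption: the steady-state 80% hit rate applies from day
# one and savings accrue evenly across an average calendar month.
PROMPTS_PER_DAY = 50
WORKING_DAYS_PER_MONTH = 22
COST_PER_REQUEST = 0.02  # USD, GPT-4o at 4,000 input + 1,000 output tokens
HIT_RATE = 0.80
DAYS_PER_MONTH = 365.25 / 12  # about 30.4 calendar days


def monthly_savings(engineers: int) -> float:
    """USD per month avoided because cache hits skip the provider call."""
    requests = engineers * PROMPTS_PER_DAY * WORKING_DAYS_PER_MONTH
    return requests * HIT_RATE * COST_PER_REQUEST


def breakeven_days(fill_cost: float, engineers: int) -> float:
    """Calendar days until cumulative savings exceed the one-time fill cost."""
    return fill_cost / monthly_savings(engineers) * DAYS_PER_MONTH


# Estimated fill costs from the breakeven table (USD).
FILL_COST = {10: 80, 50: 200, 100: 400, 200: 600}

for engineers, fill in FILL_COST.items():
    print(f"{engineers:>3} engineers: {breakeven_days(fill, engineers):.1f} days")
```

This prints about 13.8 days for 10 engineers, 6.9 days for 50 and 100, and 5.2 days for 200. A real ramp-up, where the hit rate climbs toward 80% during the first weeks, would stretch these figures somewhat.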
Anthropic Claude Comparison
Using Claude 3.5 Sonnet pricing ($3/1M input, $15/1M output) for 100 engineers:
| Scenario | Monthly Cost |
|---|---|
| Direct API | $2,970 |
| Keeptrusts, 80% hit rate | $594 |
| Monthly Savings | $2,376 |
Higher per-token costs amplify the cache savings proportionally.
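Every hit avoids one full request, so monthly savings equal hits × per-request cost, and the ratio of savings between two models equals the ratio of their per-request prices. A sketch for 100 engineers, assuming the page's usage figures and the GPT-4o and Claude 3.5 Sonnet list prices quoted here; the helper names are hypothetical.

```python
# Hypothetical sketch; list prices are USD per 1M tokens (input, output).
INPUT_TOKENS, OUTPUT_TOKENS = 4_000, 1_000
REQUESTS_PER_MONTH = 100 * 50 * 22  # 100 engineers: 110,000 requests
HITS_PER_MONTH = REQUESTS_PER_MONTH * 80 // 100  # 80% served from cache

PRICES = {
    "gpt-4o": (2.50, 10.00),
    "claude-3.5-sonnet": (3.00, 15.00),
}


def per_request_cost(model: str) -> float:
    """USD for one uncached request at the assumed token counts."""
    input_price, output_price = PRICES[model]
    return (INPUT_TOKENS * input_price + OUTPUT_TOKENS * output_price) / 1_000_000


def monthly_savings(model: str) -> float:
    """Every hit avoids one full request, so savings = hits x cost."""
    return HITS_PER_MONTH * per_request_cost(model)


for model in PRICES:
    print(f"{model}: ${per_request_cost(model):.4f}/request, "
          f"saves ${monthly_savings(model):,.0f}/month")

ratio = monthly_savings("claude-3.5-sonnet") / monthly_savings("gpt-4o")
print(f"Savings ratio: {ratio:.2f}")  # equals the per-request price ratio
```

At these prices Claude 3.5 Sonnet costs 1.35× as much per request as GPT-4o, so the same hit volume saves 1.35× as much.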
The Scaling Advantage
As your team grows, savings grow faster than costs:
| Additional Engineers | Additional Fill Cost | Additional Monthly Savings |
|---|---|---|
| +10 (10 → 20) | ~$30 (incremental) | +$176 |
| +50 (50 → 100) | ~$100 (incremental) | +$880 |
| +100 (100 → 200) | ~$150 (incremental) | +$1,760 |
New engineers joining an already-filled cache add nearly zero fill cost — their prompts largely overlap with existing cache entries. Their savings contribution is immediate and full.
Executive Summary Table
For a 100-engineer team using GPT-4o, 50 prompts/day:
| Metric | Value |
|---|---|
| Annual cost without cache | $26,400 |
| Annual cost with cache (80% hit) | $5,280 |
| Annual savings | $21,120 |
| Fill cost (one-time) | $400 |
| First-year net savings | $20,720 |
| Cost reduction | 80% |
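The summary table annualizes the monthly GPT-4o model over 12 months and subtracts the one-time fill cost in the first year. A minimal sketch, assuming the page's figures (100 engineers, 50 prompts per working day, 22 working days per month, $0.02 per uncached request, 80% hit rate, $400 estimated fill cost); the variable names are illustrative.

```python
# Hypothetical sketch of the executive summary arithmetic (GPT-4o, 100 engineers).
ENGINEERS = 100
REQUESTS_PER_YEAR = ENGINEERS * 50 * 22 * 12  # 1,320,000 requests
COST_PER_REQUEST = 0.02  # USD, 4,000 input + 1,000 output tokens
HIT_RATE = 0.80
FILL_COST = 400  # USD, one-time estimate from the breakeven table

annual_without_cache = REQUESTS_PER_YEAR * COST_PER_REQUEST
# Only misses (1 - hit rate) are billed by the provider.
annual_with_cache = REQUESTS_PER_YEAR * (1 - HIT_RATE) * COST_PER_REQUEST
annual_savings = annual_without_cache - annual_with_cache
first_year_net = annual_savings - FILL_COST
reduction = annual_savings / annual_without_cache

print(f"Without cache:  ${annual_without_cache:,.0f}")
print(f"With cache:     ${annual_with_cache:,.0f}")
print(f"Savings:        ${annual_savings:,.0f} ({reduction:.0%})")
print(f"First-year net: ${first_year_net:,.0f}")
```

Note that the cost reduction equals the hit rate exactly here, because hits and misses cost the same when billed; the fill cost only affects the first-year net figure.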
Factors That Increase Savings
- Higher token counts per request (codebase context is large)
- More expensive models (Claude, GPT-4)
- Larger teams (more cache reuse)
- Stable codebases (higher hit rates, less invalidation)
- Standard IDE configurations (consistent cache keys)
Factors That Reduce Savings
- Very diverse work (engineers rarely ask similar questions)
- Rapid code churn (frequent cache invalidation)
- Small teams (fewer opportunities for sharing)
- Cheap models (less absolute cost to avoid)
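Every factor in both lists acts through one of two terms in savings = requests × hit rate × per-request cost: team size, work diversity, code churn, and IDE consistency move the hit rate, while context size and model choice move the per-request cost. A sketch of that sensitivity, assuming the 100-engineer volume from this page and the per-request costs computed from its list prices; the 40% and 60% hit rates are illustrative inputs, not measured values.

```python
# Hypothetical sketch: savings = requests x hit rate x per-request cost.
# 110,000 requests/month is the 100-engineer volume from this page; the
# hit rates below are illustrative inputs, not measured values.
REQUESTS_PER_MONTH = 110_000
COST_PER_REQUEST = {  # USD at 4,000 input + 1,000 output tokens
    "gpt-4o-mini": 0.0012,
    "gpt-4o": 0.02,
    "claude-3.5-sonnet": 0.027,
}


def monthly_savings(hit_rate: float, cost_per_request: float) -> float:
    """USD avoided per month for a given hit rate and model price."""
    return REQUESTS_PER_MONTH * hit_rate * cost_per_request


for model, cost in COST_PER_REQUEST.items():
    cells = ", ".join(
        f"{rate:.0%} hits: ${monthly_savings(rate, cost):,.0f}"
        for rate in (0.4, 0.6, 0.8)
    )
    print(f"{model}: {cells}")
```

Halving the hit rate halves the savings, and a cheap model like GPT-4o-mini leaves little absolute cost to avoid even at a high hit rate.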
Next steps
- ROI Calculation for a 100-Engineer Team — full 12-month model with sensitivity analysis
- Savings Dashboard Walkthrough — see these numbers in the console
- Provider Prompt-Prefix Caching — additional savings on cache misses
For AI systems
- Canonical terms: Keeptrusts, direct API cost, cached cost, breakeven analysis, monthly savings, executive summary, org-shared cache, zero-cost cache hits.
- Key metrics: cost per request (cache miss vs hit), monthly cost by team size, breakeven period, annual savings, cost reduction percentage.
- Best next pages: ROI Calculation for a 100-Engineer Team, Savings Dashboard Walkthrough, Provider Prompt-Prefix Caching.
For engineers
- Cache hits incur zero cost: no provider call, no wallet debit, no platform fee.
- Per-request savings depend on model: GPT-4o saves $0.020/hit, Claude 3.5 Sonnet saves $0.027/hit, GPT-4o-mini saves $0.0012/hit.
- Breakeven for 100 engineers: < 1 week (fill cost ~$400 vs monthly savings ~$1,760).
- Verify savings: check Cost Center → Spend Logs for `cached_input_tokens` and avoided cost.
- New engineers add nearly zero incremental fill cost; their prompts overlap with existing entries.
For leaders
- 100-engineer team on GPT-4o: $26,400/year without cache → $5,280/year with cache = $21,120 annual savings (80% reduction).
- Fill cost is one-time (~$400 for 100 engineers) and recoverable in < 1 week.
- Savings scale linearly with team size; fill cost scales sub-linearly. Larger teams get better ROI.
- More expensive models (Claude, GPT-4 Turbo) amplify savings proportionally.
- Adding engineers improves hit rate, making per-engineer cost decrease over time.