
Direct API Cost vs Cached Cost

This page provides concrete cost comparisons between three scenarios: calling LLM providers directly, routing through Keeptrusts without org-shared cache, and routing through Keeptrusts with org-shared cache enabled. Use these numbers for procurement decisions and executive justification.

Use this page when

  • You need concrete cost comparison tables (per-request, monthly, annual) for procurement decisions.
  • You are building an executive justification showing direct API vs cached costs by team size.
  • You want breakeven analysis showing when cache fill cost is recovered.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

Per-Request Cost Comparison

Assumptions for a typical engineering prompt:

  • 4,000 input tokens (codebase context + prompt)
  • 1,000 output tokens (code suggestion + explanation)

OpenAI GPT-4o

| Scenario | Input Cost | Output Cost | Platform Fee | Total per Request |
| --- | --- | --- | --- | --- |
| Direct API call | $0.0100 | $0.0100 | n/a | $0.0200 |
| Keeptrusts (cache miss) | $0.0100 | $0.0100 | $0.0000 | $0.0200 |
| Keeptrusts (cache hit) | $0.0000 | $0.0000 | $0.0000 | $0.0000 |

Anthropic Claude 3.5 Sonnet

| Scenario | Input Cost | Output Cost | Platform Fee | Total per Request |
| --- | --- | --- | --- | --- |
| Direct API call | $0.0120 | $0.0150 | n/a | $0.0270 |
| Keeptrusts (cache miss) | $0.0120 | $0.0150 | $0.0000 | $0.0270 |
| Keeptrusts (cache hit) | $0.0000 | $0.0000 | $0.0000 | $0.0000 |

OpenAI GPT-4o-mini

| Scenario | Input Cost | Output Cost | Platform Fee | Total per Request |
| --- | --- | --- | --- | --- |
| Direct API call | $0.0006 | $0.0006 | n/a | $0.0012 |
| Keeptrusts (cache miss) | $0.0006 | $0.0006 | $0.0000 | $0.0012 |
| Keeptrusts (cache hit) | $0.0000 | $0.0000 | $0.0000 | $0.0000 |

Cache hits incur zero cost. No provider call, no wallet debit, no platform fee.
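
The per-request figures above can be reproduced with a few lines of Python. This is an illustrative sketch, not a Keeptrusts API: the rate table encodes this page's pricing assumptions per million tokens, so substitute your provider's current price sheet before relying on it.

```python
# Per-request cost under the three scenarios, using this page's assumed
# (input, output) rates in dollars per 1M tokens.
RATES = {
    "gpt-4o": (2.50, 10.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cache_hit: bool = False) -> float:
    """Cost of one request; a cache hit skips the provider call entirely."""
    if cache_hit:
        return 0.0  # no provider call, no wallet debit, no platform fee
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# The typical engineering prompt from this page: 4,000 in / 1,000 out.
print(round(request_cost("gpt-4o", 4000, 1000), 4))             # 0.02
print(round(request_cost("claude-3.5-sonnet", 4000, 1000), 4))  # 0.027
```

Because a hit returns before any rate lookup, cache-hit cost is exactly zero for every model.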

Monthly Cost by Team Size

Assumptions:

  • 50 prompts per engineer per working day
  • 22 working days per month
  • GPT-4o pricing ($2.50/1M input tokens, $10/1M output tokens)
  • 80% cache hit rate at steady state

10 Engineers

| Scenario | Monthly Requests | Monthly Cost |
| --- | --- | --- |
| Direct API (no Keeptrusts) | 11,000 | $220 |
| Keeptrusts, no cache | 11,000 | $220 |
| Keeptrusts, 80% hit rate | 2,200 misses | $44 |
| Monthly Savings | | $176 |

50 Engineers

| Scenario | Monthly Requests | Monthly Cost |
| --- | --- | --- |
| Direct API (no Keeptrusts) | 55,000 | $1,100 |
| Keeptrusts, no cache | 55,000 | $1,100 |
| Keeptrusts, 80% hit rate | 11,000 misses | $220 |
| Monthly Savings | | $880 |

100 Engineers

| Scenario | Monthly Requests | Monthly Cost |
| --- | --- | --- |
| Direct API (no Keeptrusts) | 110,000 | $2,200 |
| Keeptrusts, no cache | 110,000 | $2,200 |
| Keeptrusts, 80% hit rate | 22,000 misses | $440 |
| Monthly Savings | | $1,760 |

200 Engineers

| Scenario | Monthly Requests | Monthly Cost |
| --- | --- | --- |
| Direct API (no Keeptrusts) | 220,000 | $4,400 |
| Keeptrusts, no cache | 220,000 | $4,400 |
| Keeptrusts, 80% hit rate | 44,000 misses | $880 |
| Monthly Savings | | $3,520 |
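
Every monthly table above comes from one formula: requests times cost per request, with only the 20% of misses billed. A minimal sketch under this page's assumptions (the function name and structure are illustrative, not a Keeptrusts API):

```python
# Reproduces the monthly tables above under this page's assumptions.
PROMPTS_PER_DAY = 50      # prompts per engineer per working day
WORKING_DAYS = 22         # working days per month
COST_PER_REQUEST = 0.02   # GPT-4o: 4,000 in + 1,000 out at $2.50/$10 per 1M
HIT_RATE = 0.80           # steady-state org-shared cache hit rate

def monthly_costs(engineers: int) -> dict:
    requests = engineers * PROMPTS_PER_DAY * WORKING_DAYS
    misses = round(requests * (1 - HIT_RATE))  # only misses are billed
    direct = requests * COST_PER_REQUEST       # no cache: every call billed
    cached = misses * COST_PER_REQUEST         # cache hits cost $0
    return {"requests": requests, "misses": misses,
            "direct": direct, "cached": cached,
            "savings": direct - cached}
```

For example, `monthly_costs(100)` yields 110,000 requests, about $2,200 direct, $440 with the cache, and $1,760 in monthly savings.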

Breakeven Analysis

The cache has an initial fill cost during the first month. Breakeven is when cumulative savings exceed cumulative fill cost.

| Team Size | Estimated Fill Cost | Monthly Savings | Breakeven |
| --- | --- | --- | --- |
| 10 engineers | $80 | $176 | < 2 weeks |
| 50 engineers | $200 | $880 | < 1 week |
| 100 engineers | $400 | $1,760 | < 1 week |
| 200 engineers | $600 | $3,520 | < 4 days |

Larger teams reach breakeven faster because more engineers share the same filled cache entries. The fill cost scales sub-linearly with team size while savings scale linearly.
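The breakeven column is just fill cost divided by savings per working day. A minimal sketch, using this page's estimated fill costs (not measured values):

```python
# Working days needed to recover the one-time cache fill cost, assuming
# savings accrue evenly over 22 working days per month.
WORKING_DAYS = 22

def breakeven_days(fill_cost: float, monthly_savings: float) -> float:
    return fill_cost / (monthly_savings / WORKING_DAYS)

print(round(breakeven_days(80, 176), 1))    # 10 engineers: 10.0 working days
print(round(breakeven_days(400, 1760), 1))  # 100 engineers: 5.0 working days
```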

Anthropic Claude Comparison

Using Claude 3.5 Sonnet pricing ($3/1M input, $15/1M output) for 100 engineers:

| Scenario | Monthly Cost |
| --- | --- |
| Direct API | $2,970 |
| Keeptrusts, 80% hit rate | $594 |
| Monthly Savings | $2,376 |

Higher per-token costs amplify the cache savings proportionally.

The Scaling Advantage

As your team grows, savings grow faster than costs:

| Additional Engineers | Additional Fill Cost | Additional Monthly Savings |
| --- | --- | --- |
| +10 (10 → 20) | ~$30 (incremental) | +$176 |
| +50 (50 → 100) | ~$100 (incremental) | +$880 |
| +100 (100 → 200) | ~$150 (incremental) | +$1,760 |

New engineers joining an already-filled cache add nearly zero fill cost — their prompts largely overlap with existing cache entries. Their savings contribution is immediate and full.

Executive Summary Table

For a 100-engineer team using GPT-4o, 50 prompts/day:

| Metric | Value |
| --- | --- |
| Annual cost without cache | $26,400 |
| Annual cost with cache (80% hit) | $5,280 |
| Annual savings | $21,120 |
| Fill cost (one-time) | $400 |
| First-year net savings | $20,720 |
| Cost reduction | 80% |
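
The executive figures are a direct roll-up of the 100-engineer monthly numbers; a minimal sketch:

```python
# Annual roll-up for a 100-engineer team on GPT-4o, from the monthly table.
monthly_direct = 2200.0   # monthly cost without cache
monthly_cached = 440.0    # monthly cost at 80% hit rate
fill_cost = 400.0         # one-time fill cost (this page's estimate)

annual_without = monthly_direct * 12           # 26400.0
annual_with = monthly_cached * 12              # 5280.0
annual_savings = annual_without - annual_with  # 21120.0
first_year_net = annual_savings - fill_cost    # 20720.0
reduction = annual_savings / annual_without    # 0.8
```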

Factors That Increase Savings

  • Higher token counts per request (codebase context is large)
  • More expensive models (Claude, GPT-4)
  • Larger teams (more cache reuse)
  • Stable codebases (higher hit rates, less invalidation)
  • Standard IDE configurations (consistent cache keys)

Factors That Reduce Savings

  • Very diverse work (engineers rarely ask similar questions)
  • Rapid code churn (frequent cache invalidation)
  • Small teams (fewer opportunities for sharing)
  • Cheap models (less absolute cost to avoid)

Next steps

For engineers

  • Cache hits incur zero cost: no provider call, no wallet debit, no platform fee.
  • Per-request savings depend on model: GPT-4o saves $0.020/hit, Claude 3.5 Sonnet saves $0.027/hit, GPT-4o-mini saves $0.0012/hit.
  • Breakeven for 100 engineers: < 1 week (fill cost ~$400 vs monthly savings ~$1,760).
  • Verify savings: check Cost Center → Spend Logs for cached_input_tokens and avoided cost.
  • New engineers add nearly zero incremental fill cost — their prompts overlap with existing entries.

For leaders

  • 100-engineer team on GPT-4o: $26,400/year without cache → $5,280/year with cache = $21,120 annual savings (80% reduction).
  • Fill cost is one-time (~$400 for 100 engineers) and recoverable in < 1 week.
  • Savings scale linearly with team size; fill cost scales sub-linearly. Larger teams get better ROI.
  • More expensive models (Claude, GPT-4 Turbo) amplify savings proportionally.
  • Adding engineers improves hit rate, making per-engineer cost decrease over time.