
Direct API Cost vs Cached Cost

This page provides concrete cost comparisons between three scenarios: calling LLM providers directly, routing through Keeptrusts without org-shared cache, and routing through Keeptrusts with org-shared cache enabled. Use these numbers for procurement decisions and executive justification.

Use this page when

  • You need concrete cost comparison tables (per-request, monthly, annual) for procurement decisions.
  • You are building an executive justification showing direct API vs cached costs by team size.
  • You want breakeven analysis showing when cache fill cost is recovered.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

Per-Request Cost Comparison

Assumptions for a typical engineering prompt:

  • 4,000 input tokens (codebase context + prompt)
  • 1,000 output tokens (code suggestion + explanation)

OpenAI GPT-4o

| Scenario | Input Cost | Output Cost | Platform Fee | Total per Request |
| --- | --- | --- | --- | --- |
| Direct API call | $0.0100 | $0.0100 | n/a | $0.0200 |
| Keeptrusts (cache miss) | $0.0100 | $0.0100 | $0.0000 | $0.0200 |
| Keeptrusts (cache hit) | $0.0000 | $0.0000 | $0.0000 | $0.0000 |

Anthropic Claude 3.5 Sonnet

| Scenario | Input Cost | Output Cost | Platform Fee | Total per Request |
| --- | --- | --- | --- | --- |
| Direct API call | $0.0120 | $0.0150 | n/a | $0.0270 |
| Keeptrusts (cache miss) | $0.0120 | $0.0150 | $0.0000 | $0.0270 |
| Keeptrusts (cache hit) | $0.0000 | $0.0000 | $0.0000 | $0.0000 |

OpenAI GPT-4o-mini

| Scenario | Input Cost | Output Cost | Platform Fee | Total per Request |
| --- | --- | --- | --- | --- |
| Direct API call | $0.0006 | $0.0006 | n/a | $0.0012 |
| Keeptrusts (cache miss) | $0.0006 | $0.0006 | $0.0000 | $0.0012 |
| Keeptrusts (cache hit) | $0.0000 | $0.0000 | $0.0000 | $0.0000 |

Cache hits incur zero cost. No provider call, no wallet debit, no platform fee.
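
The per-request figures above can be reproduced with a few lines of Python. This is an illustrative sketch, not a Keeptrusts API: the rate table encodes this page's pricing assumptions per million tokens, so substitute your provider's current price sheet before relying on it.

```python
# Per-request cost under the three scenarios, using this page's assumed
# (input, output) rates in dollars per 1M tokens.
RATES = {
    "gpt-4o": (2.50, 10.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cache_hit: bool = False) -> float:
    """Cost of one request; a cache hit skips the provider call entirely."""
    if cache_hit:
        return 0.0  # no provider call, no wallet debit, no platform fee
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# The typical engineering prompt from this page: 4,000 in / 1,000 out.
print(round(request_cost("gpt-4o", 4000, 1000), 4))             # 0.02
print(round(request_cost("claude-3.5-sonnet", 4000, 1000), 4))  # 0.027
```

Because a hit returns before any rate lookup, cache-hit cost is exactly zero for every model.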

Monthly Cost by Team Size

Assumptions:

  • 50 prompts per engineer per working day
  • 22 working days per month
  • GPT-4o pricing ($2.50/1M input tokens, $10/1M output tokens)
  • 80% cache hit rate at steady state

10 Engineers

| Scenario | Monthly Requests | Monthly Cost |
| --- | --- | --- |
| Direct API (no Keeptrusts) | 11,000 | $220 |
| Keeptrusts, no cache | 11,000 | $220 |
| Keeptrusts, 80% hit rate | 2,200 misses | $44 |
| Monthly Savings | | $176 |

50 Engineers

| Scenario | Monthly Requests | Monthly Cost |
| --- | --- | --- |
| Direct API (no Keeptrusts) | 55,000 | $1,100 |
| Keeptrusts, no cache | 55,000 | $1,100 |
| Keeptrusts, 80% hit rate | 11,000 misses | $220 |
| Monthly Savings | | $880 |

100 Engineers

| Scenario | Monthly Requests | Monthly Cost |
| --- | --- | --- |
| Direct API (no Keeptrusts) | 110,000 | $2,200 |
| Keeptrusts, no cache | 110,000 | $2,200 |
| Keeptrusts, 80% hit rate | 22,000 misses | $440 |
| Monthly Savings | | $1,760 |

200 Engineers

| Scenario | Monthly Requests | Monthly Cost |
| --- | --- | --- |
| Direct API (no Keeptrusts) | 220,000 | $4,400 |
| Keeptrusts, no cache | 220,000 | $4,400 |
| Keeptrusts, 80% hit rate | 44,000 misses | $880 |
| Monthly Savings | | $3,520 |
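
Every monthly table above comes from one formula: requests times cost per request, with only the 20% of misses billed. A minimal sketch under this page's assumptions (the function name and structure are illustrative, not a Keeptrusts API):

```python
# Reproduces the monthly tables above under this page's assumptions.
PROMPTS_PER_DAY = 50      # prompts per engineer per working day
WORKING_DAYS = 22         # working days per month
COST_PER_REQUEST = 0.02   # GPT-4o: 4,000 in + 1,000 out at $2.50/$10 per 1M
HIT_RATE = 0.80           # steady-state org-shared cache hit rate

def monthly_costs(engineers: int) -> dict:
    requests = engineers * PROMPTS_PER_DAY * WORKING_DAYS
    misses = round(requests * (1 - HIT_RATE))  # only misses are billed
    direct = requests * COST_PER_REQUEST       # no cache: every call billed
    cached = misses * COST_PER_REQUEST         # cache hits cost $0
    return {"requests": requests, "misses": misses,
            "direct": direct, "cached": cached,
            "savings": direct - cached}
```

For example, `monthly_costs(100)` yields 110,000 requests, about $2,200 direct, $440 with the cache, and $1,760 in monthly savings.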

Breakeven Analysis

The cache has an initial fill cost during the first month. Breakeven is when cumulative savings exceed cumulative fill cost.

| Team Size | Estimated Fill Cost | Monthly Savings | Breakeven |
| --- | --- | --- | --- |
| 10 engineers | $80 | $176 | < 2 weeks |
| 50 engineers | $200 | $880 | < 1 week |
| 100 engineers | $400 | $1,760 | < 1 week |
| 200 engineers | $600 | $3,520 | < 4 days |

Larger teams reach breakeven faster because more engineers share the same filled cache entries. The fill cost scales sub-linearly with team size while savings scale linearly.
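The breakeven column is just fill cost divided by savings per working day. A minimal sketch, using this page's estimated fill costs (not measured values):

```python
# Working days needed to recover the one-time cache fill cost, assuming
# savings accrue evenly over 22 working days per month.
WORKING_DAYS = 22

def breakeven_days(fill_cost: float, monthly_savings: float) -> float:
    return fill_cost / (monthly_savings / WORKING_DAYS)

print(round(breakeven_days(80, 176), 1))    # 10 engineers: 10.0 working days
print(round(breakeven_days(400, 1760), 1))  # 100 engineers: 5.0 working days
```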

Anthropic Claude Comparison

Using Claude 3.5 Sonnet pricing ($3/1M input, $15/1M output) for 100 engineers:

| Scenario | Monthly Cost |
| --- | --- |
| Direct API | $2,970 |
| Keeptrusts, 80% hit rate | $594 |
| Monthly Savings | $2,376 |

Higher per-token costs amplify the cache savings proportionally.

The Scaling Advantage

As your team grows, savings grow faster than costs:

| Additional Engineers | Additional Fill Cost | Additional Monthly Savings |
| --- | --- | --- |
| +10 (10 → 20) | ~$30 (incremental) | +$176 |
| +50 (50 → 100) | ~$100 (incremental) | +$880 |
| +100 (100 → 200) | ~$150 (incremental) | +$1,760 |

New engineers joining an already-filled cache add nearly zero fill cost — their prompts largely overlap with existing cache entries. Their savings contribution is immediate and full.

Executive Summary Table

For a 100-engineer team using GPT-4o, 50 prompts/day:

| Metric | Value |
| --- | --- |
| Annual cost without cache | $26,400 |
| Annual cost with cache (80% hit) | $5,280 |
| Annual savings | $21,120 |
| Fill cost (one-time) | $400 |
| First-year net savings | $20,720 |
| Cost reduction | 80% |
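
The executive figures are a direct roll-up of the 100-engineer monthly numbers; a minimal sketch:

```python
# Annual roll-up for a 100-engineer team on GPT-4o, from the monthly table.
monthly_direct = 2200.0   # monthly cost without cache
monthly_cached = 440.0    # monthly cost at 80% hit rate
fill_cost = 400.0         # one-time fill cost (this page's estimate)

annual_without = monthly_direct * 12           # 26400.0
annual_with = monthly_cached * 12              # 5280.0
annual_savings = annual_without - annual_with  # 21120.0
first_year_net = annual_savings - fill_cost    # 20720.0
reduction = annual_savings / annual_without    # 0.8
```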

Factors That Increase Savings

  • Higher token counts per request (codebase context is large)
  • More expensive models (Claude, GPT-4)
  • Larger teams (more cache reuse)
  • Stable codebases (higher hit rates, less invalidation)
  • Standard IDE configurations (consistent cache keys)

Factors That Reduce Savings

  • Very diverse work (engineers rarely ask similar questions)
  • Rapid code churn (frequent cache invalidation)
  • Small teams (fewer opportunities for sharing)
  • Cheap models (less absolute cost to avoid)

Next steps

For engineers

  • Cache hits incur zero cost: no provider call, no wallet debit, no platform fee.
  • Per-request savings depend on model: GPT-4o saves $0.020/hit, Claude 3.5 Sonnet saves $0.027/hit, GPT-4o-mini saves $0.0012/hit.
  • Breakeven for 100 engineers: < 1 week (fill cost ~$400 vs monthly savings ~$1,760).
  • Verify savings: check Cost Center → Spend Logs for cached_input_tokens and avoided cost.
  • New engineers add nearly zero incremental fill cost — their prompts overlap with existing entries.

For leaders

  • 100-engineer team on GPT-4o: $26,400/year without cache → $5,280/year with cache = $21,120 annual savings (80% reduction).
  • Fill cost is one-time (~$400 for 100 engineers) and recoverable in < 1 week.
  • Savings scale linearly with team size; fill cost scales sub-linearly. Larger teams get better ROI.
  • More expensive models (Claude, GPT-4 Turbo) amplify savings proportionally.
  • Adding engineers improves hit rate, making per-engineer cost decrease over time.