
Zero-Cost Cache Hits: The No-Fee Policy

Cache hits are completely free. When a request resolves from the org-shared cache, three costs are eliminated simultaneously: no wallet debit, no platform fee, and no upstream provider call. You only pay for cache fills (first-time queries that populate the cache) and cache misses.

Use this page when

  • You want to understand the no-fee policy: why cache hits cost zero (no wallet debit, no platform fee, no upstream call).
  • You are building a cost model or budget plan and need to understand the fill-only pricing model.
  • You need to verify that cache hits are not debiting your wallet.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

The Three-Zero Guarantee

Every cache hit delivers three zeros:

| Cost Component | Cache Miss | Cache Hit |
| --- | --- | --- |
| Upstream provider call | Charged (tokens × rate) | $0.00 |
| Wallet balance debit | Reserved and settled | $0.00 |
| Platform fee | Applied per request | $0.00 |

This is not a discount or a reduced rate — it is a complete elimination of all costs for that request.

Why Cache Hits Are Free

The economic logic is straightforward:

  1. No upstream call: The LLM provider is never contacted, so no token-based charges accrue.
  2. No inference compute consumed: Serving a cached response requires only a lookup and a similarity comparison, not model inference.
  3. No platform fee: Keeptrusts does not charge for responses that avoid upstream providers. The platform fee applies only to requests that transit through to a provider.

The only cost associated with a cached response is the original fill request that populated the cache entry — and that cost was already paid by whoever triggered the initial query.

What Gets Recorded

Although cache hits cost nothing, Keeptrusts records metadata for observability:

  • estimated_avoided_cost — The dollar amount that would have been charged if the request had gone upstream. This powers savings dashboards and ROI reporting.
  • cache_hit: true — Marks the event as a cache-served response.
  • original_fill_timestamp — When the cached response was first generated.
  • similarity_score — How closely the request matched the cached entry.

No billing event, no wallet transaction, and no provider usage record is created.
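
As an illustration, the metadata for a cache hit might be modeled like the sketch below. Only the four field names come from this page; the `CacheHitEvent` class, the field types, and the sample values are assumptions made for the example, not the actual Keeptrusts event schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class CacheHitEvent:
    """Hypothetical container; only the field names are taken from the docs."""
    cache_hit: bool
    estimated_avoided_cost: float       # USD an upstream call would have cost
    original_fill_timestamp: datetime   # when the cached response was generated
    similarity_score: float             # how closely the request matched


# Sample values are invented for illustration.
event = CacheHitEvent(
    cache_hit=True,
    estimated_avoided_cost=0.03,
    original_fill_timestamp=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    similarity_score=0.97,
)

record = asdict(event)
# The record carries observability fields only: nothing billing-related.
billing_fields = {"wallet_transaction_id", "platform_fee"} & record.keys()
```

In this sketch `billing_fields` is empty, which mirrors the statement that a cache hit creates no billing event or wallet transaction.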

Cost Model: You Only Pay for Fills and Misses

Your effective monthly cost follows a simple formula:

monthly_cost = total_requests × (1 - hit_rate) × cost_per_request

All costs concentrate on the minority of requests that miss the cache:

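| Total Monthly Requests | Hit Rate | Billable Requests | Cost at $0.03/req |
| --- | --- | --- | --- |
| 100,000 | 60% | 40,000 | $1,200 |
| 100,000 | 70% | 30,000 | $900 |
| 100,000 | 80% | 20,000 | $600 |
| 100,000 | 90% | 10,000 | $300 |
| 100,000 | 95% | 5,000 | $150 |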

A team generating 100,000 AI requests per month at a 90% hit rate pays for only 10,000 requests — the other 90,000 are served at zero cost.
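
The formula and table above can be reproduced with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and it assumes a flat cost per billable request, as the table does.

```python
def billable_requests(total_requests: int, hit_rate: float) -> int:
    """Requests that miss the cache and are therefore billed."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return round(total_requests * (1 - hit_rate))


def monthly_cost(total_requests: int, hit_rate: float,
                 cost_per_request: float) -> float:
    """monthly_cost = total_requests x (1 - hit_rate) x cost_per_request."""
    return round(billable_requests(total_requests, hit_rate) * cost_per_request, 2)


# Rebuild the table rows for 100,000 requests at $0.03 per billable request.
rows = {
    rate: (billable_requests(100_000, rate), monthly_cost(100_000, rate, 0.03))
    for rate in (0.60, 0.70, 0.80, 0.90, 0.95)
}
for rate, (billable, cost) in rows.items():
    print(f"{rate:.0%} hit rate: {billable:,} billable, ${cost:,.2f}")
```

Real per-request costs vary with token counts, so treat `cost_per_request` as an average when applying this to your own traffic.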

Key Differentiator

Most AI platforms charge for every request regardless of whether the response was previously generated. Keeptrusts takes a different approach:

  • Other platforms: Pay per request, even for duplicate queries across your team.
  • Keeptrusts with caching: Pay once when the response is first generated. Every subsequent match across your entire organization is free.

This transforms your cost model from linear (cost scales with headcount) to sublinear (cost scales with unique query diversity, not team size). Adding the 101st engineer to your team adds nearly zero incremental cost if they ask questions already answered for the first 100.
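
The difference between the two models can be sketched by counting requests versus unique queries. The example below is a simplification: it uses exact string matching as a stand-in for the similarity-based matching the cache actually performs, and the query strings and function names are invented for illustration.

```python
def cost_without_cache(queries: list[str], cost_per_request: float) -> float:
    """Every request is billed, duplicates included."""
    return round(len(queries) * cost_per_request, 2)


def cost_with_shared_cache(queries: list[str], cost_per_request: float) -> float:
    """Only the first occurrence of each query (the fill) is billed."""
    return round(len(set(queries)) * cost_per_request, 2)


# Five requests from across a team, but only two distinct questions.
team_queries = [
    "how do I rotate API keys",
    "explain the retry policy",
    "how do I rotate API keys",
    "how do I rotate API keys",
    "explain the retry policy",
]

linear = cost_without_cache(team_queries, 0.03)      # 5 billed requests
shared = cost_with_shared_cache(team_queries, 0.03)  # 2 billed fills
```

Adding another engineer who repeats either question raises `linear` by one request but leaves `shared` unchanged.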

Fill Cost Amortization

The cost of a cache fill is amortized across all future hits on that entry:

effective_cost_per_response = fill_cost / (1 + number_of_hits_on_entry)

A single fill at $0.03 that serves 50 subsequent hits has an effective cost of:

$0.03 / 51 = $0.000588 per response

High-traffic cache entries that serve hundreds of hits approach zero effective cost per response.
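
The amortization formula translates directly into code. In this minimal sketch the function name is hypothetical and the fill cost is the same illustrative $0.03 used above.

```python
def effective_cost_per_response(fill_cost: float, hits_on_entry: int) -> float:
    """Spread one fill's cost over the fill itself plus every later hit."""
    if hits_on_entry < 0:
        raise ValueError("hits_on_entry cannot be negative")
    return fill_cost / (1 + hits_on_entry)


# One $0.03 fill followed by 0, 50, and 500 hits.
for hits in (0, 50, 500):
    cost = effective_cost_per_response(0.03, hits)
    print(f"{hits:>3} hits -> ${cost:.6f} per response")
```

With zero hits the effective cost equals the fill cost, and it falls toward zero as the hit count grows.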

Verifying Zero-Cost Behavior

You can confirm that cache hits are not debiting your wallet:

  1. Check the savings dashboard — it shows estimated_avoided_cost accumulating without corresponding wallet debits.
  2. Review wallet transaction history — cache hits produce no transaction entries.
  3. Inspect event metadata — events with cache_hit: true have no associated wallet_transaction_id.
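
If you can export events as a list of records, the third check can be automated along the lines below. This is a sketch under assumptions: the dict-based event shape and the sample values are hypothetical, and only the `cache_hit`, `estimated_avoided_cost`, and `wallet_transaction_id` names come from this page.

```python
def find_billed_cache_hits(events: list[dict]) -> list[dict]:
    """Return cache-hit events that unexpectedly reference a wallet transaction."""
    return [
        e for e in events
        if e.get("cache_hit") and e.get("wallet_transaction_id")
    ]


def total_avoided_cost(events: list[dict]) -> float:
    """Sum estimated_avoided_cost over cache-hit events."""
    return round(
        sum(e.get("estimated_avoided_cost", 0.0) for e in events if e.get("cache_hit")),
        2,
    )


# Invented sample export: two hits and one billed miss.
events = [
    {"cache_hit": True, "estimated_avoided_cost": 0.03},
    {"cache_hit": True, "estimated_avoided_cost": 0.02},
    {"cache_hit": False, "wallet_transaction_id": "txn_123"},
]

violations = find_billed_cache_hits(events)  # expected to be empty
avoided = total_avoided_cost(events)
```

An empty `violations` list is the expected result; any entry in it would be worth raising with support.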

Implications for Budget Planning

The no-fee policy on cache hits changes how you plan AI budgets:

  • Budget for misses only: Allocate wallet balance based on expected miss volume, not total request volume.
  • Invest in hit rate: Every percentage point of hit rate improvement directly reduces spend. Improving from 80% to 90% halves the miss rate (from 20% to 10% of requests) and therefore cuts costs in half.
  • Scale teams without scaling cost: When new engineers mostly repeat queries that are already cached, adding them increases hits more than misses, so per-engineer cost decreases as teams grow.
  • Front-load cache warming: Early investment in cache fills pays dividends as the team grows.
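
To see how much a hit rate improvement is worth, the cost formula from earlier on this page can be rearranged into a relative saving. The sketch below uses a hypothetical function name and assumes the same flat cost per billable request.

```python
def spend_reduction(old_hit_rate: float, new_hit_rate: float) -> float:
    """Fraction of monthly spend removed when the hit rate rises.

    Spend is proportional to the miss rate (1 - hit_rate), so the saving is
    1 - (new miss rate / old miss rate).
    """
    if not (0.0 <= old_hit_rate < 1.0 and 0.0 <= new_hit_rate <= 1.0):
        raise ValueError("hit rates must be in [0, 1], and the old rate below 1")
    return 1 - (1 - new_hit_rate) / (1 - old_hit_rate)


halved = spend_reduction(0.80, 0.90)     # miss rate 20% -> 10%
one_point = spend_reduction(0.60, 0.61)  # miss rate 40% -> 39%
print(f"80% -> 90%: {halved:.1%} less spend")
print(f"60% -> 61%: {one_point:.1%} less spend")
```

Because the saving is relative to the remaining miss rate, each additional point of hit rate is worth proportionally more at high hit rates than at low ones.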

When Costs Do Apply

To be clear, costs apply in these scenarios:

  • Cache miss — Request has no matching cache entry. Full upstream call, wallet reserve/settle, and platform fee apply.
  • Cache fill — A miss that successfully populates the cache. Same cost as a miss, but future matches benefit.
  • TTL expiry and re-fill — When a cached entry expires and must be regenerated. Set TTLs appropriately to balance freshness with cost.
  • Similarity below threshold — When a request is semantically close but below the configured similarity threshold, it is treated as a miss.
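
The scenarios above can be summarized as a small decision function. This is an illustrative sketch only: the page does not document the actual matching logic, so the `classify` function, its parameters, and the order of the checks are assumptions.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    HIT = "cache_hit"    # free: no wallet debit, platform fee, or upstream call
    MISS = "cache_miss"  # billed: upstream call, wallet settle, platform fee


def classify(best_similarity: Optional[float], threshold: float,
             entry_age_seconds: float, ttl_seconds: float) -> Outcome:
    """Decide whether a request is served for free or billed as a miss."""
    if best_similarity is None:
        return Outcome.MISS   # no matching cache entry at all
    if entry_age_seconds >= ttl_seconds:
        return Outcome.MISS   # entry expired, so it must be re-filled
    if best_similarity < threshold:
        return Outcome.MISS   # semantically close, but below the threshold
    return Outcome.HIT


fresh_match = classify(0.97, threshold=0.90, entry_age_seconds=60, ttl_seconds=3600)
near_miss = classify(0.85, threshold=0.90, entry_age_seconds=60, ttl_seconds=3600)
expired = classify(0.97, threshold=0.90, entry_age_seconds=7200, ttl_seconds=3600)
```

Only `fresh_match` is a hit; the other two are billed as misses, and a successful miss also acts as a fill for future requests.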

For AI systems

  • Canonical terms: Keeptrusts, zero-cost cache hit, no-fee policy, estimated_avoided_cost, fill cost amortization, cost model, cache hit free.
  • Exact feature/config names: cache_hit: true event flag, estimated_avoided_cost field, wallet_debited: false, platform_fee: 0, original_fill_timestamp, similarity_score.
  • Best next pages: Wallet Integration with Cache Hits, Tracking Avoided Cost, Single-Flight Fill.

For engineers

  • Verify zero-cost behavior: check savings dashboard for estimated_avoided_cost accumulating without wallet debits.
  • Confirm in wallet transaction history — cache hits produce no transaction entries.
  • Inspect event metadata: events with cache_hit: true have no associated wallet_transaction_id.
  • Costs apply on: cache miss, cache fill, TTL expiry re-fill, and similarity below configured threshold.

For leaders

  • Cache hits are completely free — no wallet debit, no platform fee, no upstream provider call. This is an elimination, not a discount.
  • Budget planning shifts from total request volume to miss volume: monthly_cost = total_requests × (1 - hit_rate) × cost_per_request.
  • Adding engineers increases hits more than misses, so per-engineer cost decreases as teams grow — cost scales with query diversity, not headcount.
  • Front-load cache warming investment: early fill cost pays compound dividends as team size and usage grow.
