Zero-Cost Cache Hits: The No-Fee Policy
Cache hits are completely free. When a request resolves from the org-shared cache, three costs are eliminated simultaneously: no wallet debit, no platform fee, and no upstream provider call. You pay only for cache misses, including the first-time queries (cache fills) that populate the cache.
Use this page when
- You want to understand the no-fee policy: why cache hits cost zero (no wallet debit, no platform fee, no upstream call).
- You are building a cost model or budget plan and need to understand the fill-only pricing model.
- You need to verify that cache hits are not debiting your wallet.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
The Three-Zero Guarantee
Every cache hit delivers three zeros:
| Cost Component | Cache Miss | Cache Hit |
|---|---|---|
| Upstream provider call | Charged (tokens × rate) | $0.00 |
| Wallet balance debit | Reserved and settled | $0.00 |
| Platform fee | Applied per request | $0.00 |
This is not a discount or a reduced rate — it is a complete elimination of all costs for that request.
Why Cache Hits Are Free
The economic logic is straightforward:
- No upstream call — The LLM provider is never contacted, so no token-based charges accrue.
- No compute consumed — Serving a cached response requires only a lookup and similarity comparison, not inference.
- No platform fee — Keeptrusts does not charge for responses that avoid upstream providers. The platform fee applies only to requests that transit through to a provider.
The only cost associated with a cached response is the original fill request that populated the cache entry — and that cost was already paid by whoever triggered the initial query.
What Gets Recorded
Although cache hits cost nothing, Keeptrusts records metadata for observability:
- `estimated_avoided_cost`: The dollar amount that would have been charged if the request had gone upstream. This powers savings dashboards and ROI reporting.
- `cache_hit: true`: Marks the event as a cache-served response.
- `original_fill_timestamp`: When the cached response was first generated.
- `similarity_score`: How closely the request matched the cached entry.
No billing event, no wallet transaction, and no provider usage record is created.
Cost Model: You Only Pay for Fills and Misses
Your effective monthly cost follows a simple formula:
monthly_cost = total_requests × (1 - hit_rate) × cost_per_request
All costs concentrate on the minority of requests that miss the cache:
| Total Monthly Requests | Hit Rate | Billable Requests | Cost at $0.03/req |
|---|---|---|---|
| 100,000 | 60% | 40,000 | $1,200 |
| 100,000 | 70% | 30,000 | $900 |
| 100,000 | 80% | 20,000 | $600 |
| 100,000 | 90% | 10,000 | $300 |
| 100,000 | 95% | 5,000 | $150 |
A team generating 100,000 AI requests per month at a 90% hit rate pays for only 10,000 requests — the other 90,000 are served at zero cost.
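The formula above can be sketched in a few lines of Python for use in a budget spreadsheet or planning script. The function name and parameter names are illustrative, not part of any Keeptrusts API, and the $0.03 per-request cost is simply the figure used in the table.

```python
def monthly_cost(total_requests: int, hit_rate: float, cost_per_request: float) -> float:
    """Estimate monthly spend when only cache misses (including fills) are billed."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Only the share of requests that miss the cache is billable.
    billable_requests = total_requests * (1.0 - hit_rate)
    return billable_requests * cost_per_request


# Reproduce the rows of the table above (100,000 requests at $0.03 each).
for rate in (0.60, 0.70, 0.80, 0.90, 0.95):
    print(f"{rate:.0%} hit rate: ${monthly_cost(100_000, rate, 0.03):,.2f}")
```

The model treats every miss as costing the same amount. Real per-request cost varies with token counts, so an average observed cost per billable request is a reasonable input.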
Key Differentiator
Most AI platforms charge for every request regardless of whether the response was previously generated. Keeptrusts takes a different approach:
- Other platforms: Pay per request, even for duplicate queries across your team.
- Keeptrusts with caching: Pay once when the response is first generated. Every subsequent match across your entire organization is free.
This transforms your cost model from linear (cost scales with headcount) to sublinear (cost scales with unique query diversity, not team size). Adding the 101st engineer to your team adds nearly zero incremental cost if they ask questions already answered for the first 100.
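A toy illustration of that scaling claim, under two simplifying assumptions that are not in the policy itself: matching is exact-string rather than similarity-based, and entries never expire. Under those assumptions, the number of billable fills equals the number of distinct queries, regardless of how many engineers ask them. The function name is hypothetical.

```python
def billable_fills(queries: list[str]) -> int:
    """Count billable requests, assuming exact-match caching and no TTL expiry."""
    # Each distinct query is paid for once; every repeat is a free hit.
    return len(set(queries))


known_questions = [f"question-{i}" for i in range(20)]

# 100 engineers each ask the same 20 questions.
team_of_100 = [q for _ in range(100) for q in known_questions]
# A 101st engineer asks only questions that are already cached.
team_of_101 = team_of_100 + known_questions

print(billable_fills(team_of_100))  # 20
print(billable_fills(team_of_101))  # 20
```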
Fill Cost Amortization
The cost of a cache fill is amortized across all future hits on that entry:
effective_cost_per_response = fill_cost / (1 + number_of_hits_on_entry)
A single fill at $0.03 that serves 50 subsequent hits has an effective cost of:
$0.03 / 51 = $0.000588 per response
High-traffic cache entries that serve hundreds of hits approach zero effective cost per response.
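The amortization formula can be checked with a short Python sketch. The function name is illustrative only. The `1 +` in the denominator counts the original fill response alongside the hits it later serves.

```python
def effective_cost_per_response(fill_cost: float, hits_on_entry: int) -> float:
    """Spread one fill's cost across the fill response plus every later hit."""
    if hits_on_entry < 0:
        raise ValueError("hits_on_entry must be non-negative")
    return fill_cost / (1 + hits_on_entry)


# The worked example above: a $0.03 fill followed by 50 hits.
print(f"${effective_cost_per_response(0.03, 50):.6f} per response")
# An entry with no hits yet still costs the full fill price.
print(f"${effective_cost_per_response(0.03, 0):.6f} per response")
```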
Verifying Zero-Cost Behavior
You can confirm that cache hits are not debiting your wallet:
- Check the savings dashboard: it shows `estimated_avoided_cost` accumulating without corresponding wallet debits.
- Review wallet transaction history: cache hits produce no transaction entries.
- Inspect event metadata: events with `cache_hit: true` have no associated `wallet_transaction_id`.
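If you export event metadata for auditing, the third check can be automated. The sketch below assumes events are available as plain dictionaries keyed by the field names mentioned on this page. The export mechanism, the dictionary shape, and the sample values are assumptions for illustration, not a documented Keeptrusts format.

```python
from typing import Any, Iterable


def find_billed_cache_hits(events: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return cache-hit events that unexpectedly reference a wallet transaction."""
    return [
        event
        for event in events
        if event.get("cache_hit") is True
        and event.get("wallet_transaction_id") is not None
    ]


# Hypothetical sample data: one cache hit and one billed miss.
sample_events = [
    {"cache_hit": True, "estimated_avoided_cost": 0.03},
    {"cache_hit": False, "wallet_transaction_id": "txn-example-1"},
]

violations = find_billed_cache_hits(sample_events)
print(f"{len(violations)} cache hits with a wallet transaction")
```

An empty result is the expected outcome. Any returned event would be worth raising with support, since it contradicts the no-fee policy.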
Implications for Budget Planning
The no-fee policy on cache hits changes how you plan AI budgets:
- Budget for misses only — Allocate wallet balance based on expected miss volume, not total request volume.
- Invest in hit rate — Every percentage point of hit rate improvement directly reduces spend. Improving from 80% to 90% cuts costs in half.
- Scale teams without scaling cost — Adding engineers increases hits more than misses, so per-engineer cost decreases as teams grow.
- Front-load cache warming — Early investment in cache fills pays dividends as the team grows.
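The effect of a hit rate improvement on spend follows from the miss share: at fixed volume and cost per request, spend is proportional to `1 - hit_rate`. A small sketch, with an illustrative function name, shows why the same ten-point gain matters more at higher hit rates.

```python
def spend_ratio(old_hit_rate: float, new_hit_rate: float) -> float:
    """Ratio of new spend to old spend at fixed volume and cost per request."""
    if not (0.0 <= old_hit_rate < 1.0 and 0.0 <= new_hit_rate <= 1.0):
        raise ValueError("hit rates must be in [0, 1], and old_hit_rate below 1")
    # Spend is proportional to the share of requests that miss the cache.
    return (1.0 - new_hit_rate) / (1.0 - old_hit_rate)


print(f"80% -> 90%: spend falls to {spend_ratio(0.80, 0.90):.0%} of its previous level")
print(f"60% -> 70%: spend falls to {spend_ratio(0.60, 0.70):.0%} of its previous level")
```

Moving from 80% to 90% halves the miss share (20% to 10%), while moving from 60% to 70% only reduces it by a quarter (40% to 30%).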
When Costs Do Apply
To be clear, costs apply in these scenarios:
- Cache miss — Request has no matching cache entry. Full upstream call, wallet reserve/settle, and platform fee apply.
- Cache fill — A miss that successfully populates the cache. Same cost as a miss, but future matches benefit.
- TTL expiry and re-fill — When a cached entry expires and must be regenerated. Set TTLs appropriately to balance freshness with cost.
- Similarity below threshold — When a request is semantically close but below the configured similarity threshold, it is treated as a miss.
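The scenarios above amount to a simple decision rule, sketched here in Python. The `CacheLookup` type, its fields, and the 0.90 threshold are hypothetical and exist only for illustration. In particular, treating a score exactly equal to the threshold as a hit is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheLookup:
    best_similarity: Optional[float]  # None when no candidate entry exists
    entry_expired: bool               # True when the entry's TTL has elapsed


def is_billable(lookup: CacheLookup, similarity_threshold: float) -> bool:
    """Return True when the request goes upstream and is charged."""
    if lookup.best_similarity is None:
        return True  # cache miss: nothing to match against
    if lookup.entry_expired:
        return True  # TTL expiry: the entry must be re-filled
    if lookup.best_similarity < similarity_threshold:
        return True  # close, but below the configured threshold
    return False     # cache hit: no debit, no fee, no upstream call


print(is_billable(CacheLookup(0.95, entry_expired=False), 0.90))  # False
print(is_billable(CacheLookup(0.85, entry_expired=False), 0.90))  # True
```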
For AI systems
- Canonical terms: Keeptrusts, zero-cost cache hit, no-fee policy, estimated_avoided_cost, fill cost amortization, cost model, cache hit free.
- Exact feature/config names: `cache_hit: true` event flag, `estimated_avoided_cost` field, `wallet_debited: false`, `platform_fee: 0`, `original_fill_timestamp`, `similarity_score`.
- Best next pages: Wallet Integration with Cache Hits, Tracking Avoided Cost, Single-Flight Fill.
For engineers
- Verify zero-cost behavior: check savings dashboard for `estimated_avoided_cost` accumulating without wallet debits.
- Confirm in wallet transaction history: cache hits produce no transaction entries.
- Inspect event metadata: events with `cache_hit: true` have no associated `wallet_transaction_id`.
- Costs apply on: cache miss, cache fill, TTL expiry re-fill, and similarity below configured threshold.
For leaders
- Cache hits are completely free — no wallet debit, no platform fee, no upstream provider call. This is an elimination, not a discount.
- Budget planning shifts from total request volume to miss volume: `monthly_cost = total_requests × (1 - hit_rate) × cost_per_request`.
- Adding engineers increases hits more than misses, so per-engineer cost decreases as teams grow; cost scales with query diversity, not headcount.
- Front-load cache warming investment: early fill cost pays compound dividends as team size and usage grow.
Next steps
- Wallet Integration with Cache Hits — how zero-cost hits extend your wallet budget
- Tracking Avoided Cost in the Console — quantify and report savings from zero-cost hits
- The Cache Fill-Then-Save Model — understand fill cost amortization economics