Wallet Integration with Cache Hits
Cache hits skip the wallet reserve/settle cycle entirely. When a request resolves from the org-shared cache, no wallet balance is debited, no upstream provider is called, and no platform fee applies. Because only cache misses draw on the wallet, an 80% hit rate makes the same budget last about 5× as long (1 / (1 − 0.80) = 5).
Use this page when
- You want to understand how cache hits bypass the wallet reserve/settle cycle.
- You are planning team wallet allocations and need to account for cache hit rates.
- You need to explain the budget multiplier effect to stakeholders or finance.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
The Reserve/Settle Cycle
For cache misses, the normal wallet flow applies:
- Reserve — The gateway estimates the request cost and reserves that amount from the effective wallet scope (user → team → org cascade).
- Upstream call — The request is forwarded to the LLM provider.
- Cache write — The response is written to the org-shared cache for future reuse.
- Settle — The reservation is settled to the actual cost based on tokens consumed. The wallet balance is updated.
Cache Miss Flow:
Request → Reserve wallet → Call provider → Write to cache → Settle wallet
Cache Hit: The Short Circuit
When a request matches an existing cache entry, the entire reserve/settle cycle is bypassed:
Cache Hit Flow:
Request → Cache lookup → Match found → Return cached response
No wallet interaction occurs. The response is served directly from cache with:
- Zero wallet debit — Your balance remains unchanged.
- Zero upstream call — The provider is never contacted.
- Zero platform fee — Keeptrusts does not charge for cached responses.
- Only metadata recorded: An estimated_avoided_cost field is logged for reporting.
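The two flows can be sketched as a toy in-memory gateway. This is a minimal illustration, not the Keeptrusts implementation: the `Wallet` and `Gateway` classes, the `handle` method, and `fake_provider` are hypothetical names, and the cache key is simply the prompt string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Wallet:
    """Hypothetical wallet with a reserve/settle cycle."""
    balance: float
    reserved: float = 0.0

    def reserve(self, amount: float) -> bool:
        # A reservation succeeds only if unreserved balance covers the estimate.
        if self.balance - self.reserved < amount:
            return False
        self.reserved += amount
        return True

    def settle(self, reserved_amount: float, actual_cost: float) -> None:
        # Release the reservation and debit the actual cost.
        self.reserved -= reserved_amount
        self.balance -= actual_cost


@dataclass
class Gateway:
    wallet: Wallet
    cache: Dict[str, str] = field(default_factory=dict)
    avoided_cost_log: List[dict] = field(default_factory=list)

    def handle(self, prompt: str, estimate: float,
               call_provider: Callable[[str], Tuple[str, float]]) -> str:
        # Cache hit: no wallet interaction, only metadata is recorded.
        if prompt in self.cache:
            self.avoided_cost_log.append({"estimated_avoided_cost": estimate})
            return self.cache[prompt]
        # Cache miss: reserve -> call provider -> write cache -> settle.
        if not self.wallet.reserve(estimate):
            raise RuntimeError("insufficient wallet balance for reservation")
        response, actual_cost = call_provider(prompt)
        self.cache[prompt] = response
        self.wallet.settle(estimate, actual_cost)
        return response


def fake_provider(prompt: str) -> Tuple[str, float]:
    # Stand-in for an upstream LLM call; returns (response, actual cost).
    return f"answer to {prompt}", 0.25
```

In this sketch, calling `handle` twice with the same prompt debits the wallet once. The second call returns the cached response and only appends an `estimated_avoided_cost` record.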
Budget Extension by Hit Rate
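Only cache misses draw on the wallet, so a hit rate h multiplies the budget by 1 / (1 − h) and reduces the spend needed for the same output to (1 − h) of the uncached baseline ($3,000 in this table):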
| Hit Rate | Budget Multiplier | Monthly Budget Required for Same Output |
|---|---|---|
| 0% (no cache) | 1× | $3,000 |
| 50% | 2× | $1,500 |
| 60% | 2.5× | $1,200 |
| 70% | 3.3× | $900 |
| 80% | 5× | $600 |
| 90% | 10× | $300 |
| 95% | 20× | $150 |
A team budgeted for $3,000/month of AI usage that achieves an 80% cache hit rate effectively gets $15,000 worth of AI responses for $3,000 in actual spend.
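The multiplier in the table above follows from the fact that only misses are billed. The sketch below computes it under one assumption: every request costs about the same on a miss. The function names are illustrative and not part of any Keeptrusts API.

```python
def budget_multiplier(hit_rate: float) -> float:
    """Budget extension factor when only cache misses are billed."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return 1.0 / (1.0 - hit_rate)


def required_budget(baseline: float, hit_rate: float) -> float:
    """Spend needed to produce the same output as `baseline` with no cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return baseline * (1.0 - hit_rate)


if __name__ == "__main__":
    for h in (0.0, 0.5, 0.8, 0.9, 0.95):
        print(f"{h:.0%}: {budget_multiplier(h):.1f}x, "
              f"${required_budget(3000, h):,.0f}")
```

For an 80% hit rate this gives a 5× multiplier and a required budget of $600 against the $3,000 baseline.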
Wallet Scope Cascade and Caching
The wallet reserve/settle system uses a scope cascade: user wallet → team wallet → org wallet. Cache hits bypass this cascade entirely, which means:
- User wallets drain slower — Individual engineers preserve their allocated balance.
- Team wallets last longer — Shared team budgets extend across more engineers.
- Org wallet acts as backstop less often — Fewer requests fall through to the org-level reserve.
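The fall-through behavior of the cascade can be sketched as follows. This assumes a scope is skipped when its balance cannot cover the estimate. The page implies that behavior but does not spell it out, so treat the selection rule and the `reserve_from_cascade` name as hypothetical.

```python
from typing import Dict, Optional

# Order in which scopes are tried, per the cascade described above.
CASCADE = ("user", "team", "org")


def reserve_from_cascade(wallets: Dict[str, float],
                         estimate: float) -> Optional[str]:
    """Reserve `estimate` from the first scope that can cover it.

    Returns the scope name that was debited, or None if no scope has
    enough balance. Assumption: a scope with insufficient balance is
    skipped rather than partially drained.
    """
    for scope in CASCADE:
        if wallets.get(scope, 0.0) >= estimate:
            wallets[scope] -= estimate
            return scope
    return None
```

A cache hit never calls a function like this, which is why every scope in the cascade drains more slowly as the hit rate rises.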
Example: Team of 20 Engineers
Without caching:
- 20 engineers × 50 requests/day × $0.03/request = $30/day → $900/month team wallet drain
With 80% cache hit rate:
- 20 engineers × 50 requests/day × 20% miss rate × $0.03/request = $6/day → $180/month team wallet drain
The team wallet lasts 5× longer with no change to engineer behavior or request volume.
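The arithmetic in this example can be reproduced with a short helper. It assumes a 30-day month, which is what the $30/day → $900/month figure implies, and a flat per-request cost. The function name is illustrative.

```python
def monthly_wallet_drain(engineers: int, requests_per_day: int,
                         cost_per_request: float, hit_rate: float,
                         days_per_month: int = 30) -> float:
    """Monthly team wallet drain when only cache misses are billed."""
    misses_per_day = engineers * requests_per_day * (1.0 - hit_rate)
    return misses_per_day * cost_per_request * days_per_month


if __name__ == "__main__":
    print(round(monthly_wallet_drain(20, 50, 0.03, 0.0), 2))  # no caching
    print(round(monthly_wallet_drain(20, 50, 0.03, 0.8), 2))  # 80% hit rate
```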
Monitoring Wallet Impact
The savings dashboard surfaces wallet-specific metrics:
- Avoided wallet debits: Total dollar amount that would have been reserved and settled without caching.
- Effective budget multiplier: Your actual budget extension ratio based on observed hit rate.
- Wallet runway projection: Estimated date when current wallet balance will be exhausted at current miss rates.
- Per-team wallet savings: Breakdown by team showing which groups benefit most.
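To show how a runway projection of this kind could be derived, here is a minimal sketch. It assumes a constant request volume, miss rate, and average cost per miss. The dashboard's actual calculation is not documented on this page, and `wallet_runway` is a hypothetical name.

```python
import math
from datetime import date, timedelta
from typing import Optional


def wallet_runway(balance: float, requests_per_day: float, miss_rate: float,
                  avg_cost_per_miss: float, today: date) -> Optional[date]:
    """Estimate the date the wallet is exhausted at the current miss rate.

    Returns None when nothing is being spent (for example, a 100% hit rate).
    """
    daily_burn = requests_per_day * miss_rate * avg_cost_per_miss
    if daily_burn <= 0:
        return None
    whole_days_left = math.floor(balance / daily_burn)
    return today + timedelta(days=whole_days_left)
```

Because the burn rate scales with the miss rate rather than raw volume, halving the miss rate roughly doubles the projected runway.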
Configuration
No special configuration is required for wallet-cache integration. When caching is enabled, the gateway automatically skips wallet reserve/settle on cache hits:
cache:
  enabled: true
  fabric_scope: org
  ttl_seconds: 86400
The wallet system and cache system are integrated at the gateway level. You do not need to configure them separately or wire them together.
Handling Low-Balance Scenarios
When wallet balance is low, caching becomes even more valuable:
- Cache hits are never blocked — Even with zero wallet balance, cached responses are served. Cache hits do not require available balance.
- Cache misses are still subject to wallet checks — If the wallet cannot reserve the estimated cost, the request is held or rejected per your policy.
- Fill requests require balance — Writing new entries to cache requires a successful upstream call, which requires wallet balance.
This means a team that exhausts its wallet budget still receives responses for previously cached queries — providing continuity for common operations while budget is replenished.
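The three rules above reduce to a small admission decision. The sketch below is illustrative only: the `Decision` enum and `admit` function are hypothetical names, and whether a rejected miss is held or refused depends on your policy.

```python
from enum import Enum


class Decision(Enum):
    SERVE_CACHED = "serve_cached"
    FORWARD_UPSTREAM = "forward_upstream"
    HOLD_OR_REJECT = "hold_or_reject"


def admit(cache_hit: bool, available_balance: float,
          estimated_cost: float) -> Decision:
    """Decide how to handle a request given cache state and wallet balance."""
    # Hits are served regardless of balance; no reservation is attempted.
    if cache_hit:
        return Decision.SERVE_CACHED
    # Misses (which also fill the cache) need balance to cover the estimate.
    if available_balance >= estimated_cost:
        return Decision.FORWARD_UPSTREAM
    return Decision.HOLD_OR_REJECT
```

For example, `admit(True, 0.0, 0.03)` yields `Decision.SERVE_CACHED`, while `admit(False, 0.0, 0.03)` yields `Decision.HOLD_OR_REJECT`.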
Best Practices
- Monitor your effective budget multiplier weekly to understand actual savings.
- Set wallet alerts based on miss-adjusted burn rate, not raw request volume.
- Use the wallet runway projection to plan budget allocations quarterly.
- Share hit rate metrics with finance teams to demonstrate ROI on the caching investment.
- Consider increasing wallet allocations for teams with low hit rates (new repos, rapidly changing code) while reducing allocations for teams with high hit rates.
For AI systems
- Canonical terms: Keeptrusts, wallet reserve/settle, cache hit bypass, budget multiplier, wallet runway, effective wallet scope, zero debit cache hit.
- Exact feature/config names: workflow_cache.enabled, fabric_scope: org, wallet scope cascade (user → team → org), wallet runway projection, effective budget multiplier metric.
- Best next pages: Zero-Cost Cache Hits, Tracking Avoided Cost, The Cache Fill-Then-Save Model.
For engineers
- Enable caching with cache: { enabled: true, fabric_scope: org, ttl_seconds: 86400 }; no separate wallet-cache wiring is needed.
- Cache hits never block on wallet balance; even zero-balance teams receive cached responses for previously filled entries.
- Cache misses still require sufficient wallet balance for reserve/settle; fill requests also require balance.
- Monitor your effective budget multiplier weekly at Cost Center → Savings to verify expected wallet extension.
For leaders
- An 80% cache hit rate extends your AI budget 5× without increasing wallet allocations — $3,000 allocated delivers $15,000 in effective AI output.
- Teams that exhaust wallet budget still receive cached responses, ensuring continuity for common operations during replenishment.
- Set wallet alerts based on miss-adjusted burn rate rather than raw request volume — this reflects actual remaining runway.
- Consider reducing allocations for teams with high hit rates (mature repos) while increasing allocations for teams with low hit rates (new repos, rapid change).
Next steps
- Zero-Cost Cache Hits: The No-Fee Policy — understand why cache hits cost nothing
- Tracking Avoided Cost in the Console — quantify wallet savings in the dashboard
- The Cache Fill-Then-Save Model — understand the economic phases that drive wallet extension