
Wallet Integration with Cache Hits

Cache hits skip the wallet reserve/settle cycle entirely. When a request resolves from the org-shared cache, no wallet balance is debited, no upstream provider is called, and no platform fee applies. Because only cache misses draw on the wallet, an 80% hit rate means the same budget lasts approximately 5× longer (1 / (1 − 0.8) = 5).

Use this page when

  • You want to understand how cache hits bypass the wallet reserve/settle cycle.
  • You are planning team wallet allocations and need to account for cache hit rates.
  • You need to explain the budget multiplier effect to stakeholders or finance.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

The Reserve/Settle Cycle

For cache misses, the normal wallet flow applies:

  1. Reserve — The gateway estimates the request cost and reserves that amount from the effective wallet scope (user → team → org cascade).
  2. Upstream call — The request is forwarded to the LLM provider.
  3. Cache write — The response is written to the org-shared cache for future reuse.
  4. Settle — The reservation is settled to the actual cost based on tokens consumed. The wallet balance is updated.
Cache Miss Flow:
Request → Reserve wallet → Call provider → Write to cache → Settle wallet

Cache Hit: The Short Circuit

When a request matches an existing cache entry, the entire reserve/settle cycle is bypassed:

Cache Hit Flow:
Request → Cache lookup → Match found → Return cached response

No wallet interaction occurs. The response is served directly from cache with:

  • Zero wallet debit — Your balance remains unchanged.
  • Zero upstream call — The provider is never contacted.
  • Zero platform fee — Keeptrusts does not charge for cached responses.
  • Only metadata recorded — An estimated_avoided_cost field is logged for reporting.
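
The two flows can be sketched side by side. This is a minimal illustration, not the gateway's actual implementation: the Wallet and Gateway classes, the call_provider stub, and the exact-match dictionary cache are all hypothetical stand-ins. Only the ordering of steps and the estimated_avoided_cost field come from this page.

```python
from dataclasses import dataclass, field


@dataclass
class Wallet:
    """Hypothetical wallet with a balance and an outstanding hold."""
    balance: float
    reserved: float = 0.0

    def reserve(self, amount: float) -> None:
        if amount > self.balance - self.reserved:
            raise RuntimeError("insufficient wallet balance")
        self.reserved += amount

    def settle(self, reserved_amount: float, actual_cost: float) -> None:
        # Release the hold and debit only what the request actually cost.
        self.reserved -= reserved_amount
        self.balance -= actual_cost


def call_provider(prompt: str) -> tuple[str, float]:
    # Stand-in for the upstream LLM call; returns (response, actual cost).
    return f"answer to: {prompt}", 0.03


@dataclass
class Gateway:
    wallet: Wallet
    cache: dict = field(default_factory=dict)
    avoided_cost_log: list = field(default_factory=list)

    def handle(self, prompt: str, estimated_cost: float) -> str:
        if prompt in self.cache:
            # Cache hit: no reserve, no upstream call, no settle.
            # Only reporting metadata is recorded.
            self.avoided_cost_log.append({"estimated_avoided_cost": estimated_cost})
            return self.cache[prompt]
        # Cache miss: reserve -> upstream call -> cache write -> settle.
        self.wallet.reserve(estimated_cost)
        response, actual_cost = call_provider(prompt)
        self.cache[prompt] = response
        self.wallet.settle(estimated_cost, actual_cost)
        return response
```

In this sketch the first call for a prompt debits the wallet by the actual cost, and every repeat of the same prompt leaves the balance untouched, even if the balance has since dropped to zero.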

Budget Extension by Hit Rate

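Budget extension follows directly from the hit rate: the multiplier is 1 / (1 − hit rate), and the budget required for the same output is the uncached budget × (1 − hit rate). The table below uses a $3,000/month uncached baseline: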

Hit Rate         Budget Multiplier   Monthly Budget Required for Same Output
0% (no cache)    1×                  $3,000
50%              2×                  $1,500
60%              2.5×                $1,200
70%              3.3×                $900
80%              5×                  $600
90%              10×                 $300
95%              20×                 $150

A team budgeted for $3,000/month of AI usage that achieves an 80% cache hit rate effectively gets $15,000 worth of AI responses for $3,000 in actual spend.
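
The arithmetic behind these figures can be reproduced in a few lines. The function names below are illustrative, and the sketch assumes that every request costs the same on average, so spend scales linearly with the miss rate.

```python
def budget_multiplier(hit_rate: float) -> float:
    """How many times further a budget goes when only misses are billed."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return 1.0 / (1.0 - hit_rate)


def required_budget(uncached_budget: float, hit_rate: float) -> float:
    """Spend needed for the same output once hits are served for free."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return uncached_budget * (1.0 - hit_rate)


if __name__ == "__main__":
    for rate in (0.0, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95):
        print(
            f"{rate:.0%}: {budget_multiplier(rate):.1f}x, "
            f"${required_budget(3000, rate):,.0f}/month"
        )
```

The multiplier grows quickly at high hit rates, so moving from 90% to 95% doubles the effective budget while moving from 50% to 55% barely changes it.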

Wallet Scope Cascade and Caching

The wallet reserve/settle system uses a scope cascade: user wallet → team wallet → org wallet. Cache hits bypass this cascade entirely, which means:

  • User wallets drain slower — Individual engineers preserve their allocated balance.
  • Team wallets last longer — Shared team budgets extend across more engineers.
  • Org wallet acts as backstop less often — Fewer requests fall through to the org-level reserve.
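
As a rough illustration of why fewer requests reach the org wallet, the cascade on a cache miss can be modeled as below. This is a sketch under a stated assumption: a scope is skipped whenever it cannot cover the full estimate. The page does not specify the real fall-through rule, and the function name and dictionary layout are hypothetical.

```python
from typing import Optional

CASCADE = ("user", "team", "org")


def reserve_from_cascade(
    available: dict[str, float], estimated_cost: float
) -> Optional[str]:
    """Hold estimated_cost against the first scope that can cover it.

    Returns the scope that funded the reservation, or None if no scope
    has enough available balance. Only called on cache misses; cache
    hits never reach this function.
    """
    for scope in CASCADE:
        if available.get(scope, 0.0) >= estimated_cost:
            available[scope] -= estimated_cost
            return scope
    return None
```

For example, with available balances of {"user": 0.01, "team": 5.0, "org": 100.0} and an estimate of 0.03, the reservation falls through the user wallet and lands on the team wallet.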

Example: Team of 20 Engineers

Without caching:

  • 20 engineers × 50 requests/day × $0.03/request = $30/day → $900/month team wallet drain

With 80% cache hit rate:

  • 20 engineers × 50 requests/day × 20% miss rate × $0.03/request = $6/day → $180/month team wallet drain

The team wallet lasts 5× longer with no change to engineer behavior or request volume.

Monitoring Wallet Impact

The savings dashboard surfaces wallet-specific metrics:

  • Avoided wallet debits: Total dollar amount that would have been reserved and settled without caching.
  • Effective budget multiplier: Your actual budget extension ratio based on observed hit rate.
  • Wallet runway projection: Estimated date when current wallet balance will be exhausted at current miss rates.
  • Per-team wallet savings: Breakdown by team showing which groups benefit most.
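
A wallet runway projection of this kind can be approximated from the miss-adjusted burn rate. The sketch below is not the dashboard's formula; it assumes a constant request volume, a constant hit rate, and a fixed average cost per miss, and the function names are illustrative.

```python
from datetime import date, timedelta
from typing import Optional


def runway_days(
    balance: float,
    requests_per_day: float,
    hit_rate: float,
    cost_per_miss: float,
) -> Optional[float]:
    """Days until the wallet is exhausted, counting only cache misses."""
    daily_burn = requests_per_day * (1.0 - hit_rate) * cost_per_miss
    if daily_burn <= 0:
        return None  # nothing is being debited, so no exhaustion date
    return balance / daily_burn


def runway_date(
    today: date,
    balance: float,
    requests_per_day: float,
    hit_rate: float,
    cost_per_miss: float,
) -> Optional[date]:
    """Projected exhaustion date, rounded down to whole days."""
    days = runway_days(balance, requests_per_day, hit_rate, cost_per_miss)
    if days is None:
        return None
    return today + timedelta(days=int(days))
```

Using the team example on this page (1,000 requests/day at $0.03 per miss), a $900 wallet lasts about 30 days with no caching and about 150 days at an 80% hit rate.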

Configuration

No special configuration is required for wallet-cache integration. When caching is enabled, the gateway automatically skips wallet reserve/settle on cache hits:

cache:
  enabled: true
  fabric_scope: org
  ttl_seconds: 86400

The wallet system and cache system are integrated at the gateway level. You do not need to configure them separately or wire them together.

Handling Low-Balance Scenarios

When wallet balance is low, caching becomes even more valuable:

  • Cache hits are never blocked — Even with zero wallet balance, cached responses are served. Cache hits do not require available balance.
  • Cache misses are still subject to wallet checks — If the wallet cannot reserve the estimated cost, the request is held or rejected per your policy.
  • Fill requests require balance — Writing new entries to cache requires a successful upstream call, which requires wallet balance.

This means a team that exhausts its wallet budget still receives responses for previously cached queries — providing continuity for common operations while budget is replenished.

Best Practices

  • Monitor your effective budget multiplier weekly to understand actual savings.
  • Set wallet alerts based on miss-adjusted burn rate, not raw request volume.
  • Use the wallet runway projection to plan budget allocations quarterly.
  • Share hit rate metrics with finance teams to demonstrate ROI on the caching investment.
  • Consider increasing wallet allocations for teams with low hit rates (new repos, rapidly changing code) while reducing allocations for teams with high hit rates.

For AI systems

  • Canonical terms: Keeptrusts, wallet reserve/settle, cache hit bypass, budget multiplier, wallet runway, effective wallet scope, zero debit cache hit.
  • Exact feature/config names: workflow_cache.enabled, fabric_scope: org, wallet scope cascade (user → team → org), wallet runway projection, effective budget multiplier metric.
  • Best next pages: Zero-Cost Cache Hits, Tracking Avoided Cost, The Cache Fill-Then-Save Model.

For engineers

  • Enable caching with cache: { enabled: true, fabric_scope: org, ttl_seconds: 86400 } — no separate wallet-cache wiring needed.
  • Cache hits never block on wallet balance — even zero-balance teams receive cached responses for previously filled entries.
  • Cache misses still require sufficient wallet balance for reserve/settle; fill requests also require balance.
  • Monitor your effective budget multiplier weekly at Cost Center → Savings to verify expected wallet extension.

For leaders

  • An 80% cache hit rate extends your AI budget 5× without increasing wallet allocations — $3,000 allocated delivers $15,000 in effective AI output.
  • Teams that exhaust wallet budget still receive cached responses, ensuring continuity for common operations during replenishment.
  • Set wallet alerts based on miss-adjusted burn rate rather than raw request volume — this reflects actual remaining runway.
  • Consider reducing allocations for teams with high hit rates (mature repos) while increasing allocations for teams with low hit rates (new repos, rapid change).

Next steps