Wallet Integration with Cache Hits

Cache hits skip the wallet reserve/settle cycle entirely. When a request resolves from the org-shared cache, no wallet balance is debited, no upstream provider is called, and no platform fee applies. Because only cache misses draw on the wallet, the same budget covers more requests: at an 80% hit rate, it lasts approximately 5× longer.

Use this page when

You want to understand how cache hits bypass the wallet reserve/settle cycle.
You are planning team wallet allocations and need to account for cache hit rates.
You need to explain the budget multiplier effect to stakeholders or finance.

Primary audience

Primary: Technical Leaders
Secondary: Technical Engineers, AI Agents

The Reserve/Settle Cycle

For cache misses, the normal wallet flow applies:

Reserve — The gateway estimates the request cost and reserves that amount from the effective wallet scope (user → team → org cascade).
Upstream call — The request is forwarded to the LLM provider.
Cache write — The response is written to the org-shared cache for future reuse.
Settle — The reservation is settled to the actual cost based on tokens consumed. The wallet balance is updated.

Cache Miss Flow:
  Request → Reserve wallet → Call provider → Write to cache → Settle wallet

Cache Hit: The Short Circuit

When a request matches an existing cache entry, the entire reserve/settle cycle is bypassed:

Cache Hit Flow:
  Request → Cache lookup → Match found → Return cached response

No wallet interaction occurs. The response is served directly from cache with:

Zero wallet debit — Your balance remains unchanged.
Zero upstream call — The provider is never contacted.
Zero platform fee — Keeptrusts does not charge for cached responses.
Only metadata recorded — An estimated_avoided_cost field is logged for reporting.
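The two flows above can be sketched in a few lines of Python. This is an illustrative model only, not the gateway's implementation: the Wallet and Gateway classes, the call_provider stand-in, and the use of the raw prompt as the cache key are all assumptions made for the example. Only the ordering of steps and the estimated_avoided_cost field name come from this page.

```python
from dataclasses import dataclass, field


@dataclass
class Wallet:
    """Hypothetical wallet with a simple reserve/settle model."""
    balance: float
    reserved: float = 0.0

    def reserve(self, amount: float) -> None:
        # Hold the estimated cost; fail if the wallet cannot cover it.
        if amount > self.balance - self.reserved:
            raise RuntimeError("insufficient wallet balance")
        self.reserved += amount

    def settle(self, reserved_amount: float, actual_cost: float) -> None:
        # Release the hold and debit only what the request actually cost.
        self.reserved -= reserved_amount
        self.balance -= actual_cost


def call_provider(prompt: str) -> tuple[str, float]:
    # Stand-in for the upstream LLM call; returns (response, actual cost).
    return f"response to {prompt!r}", 0.025


@dataclass
class Gateway:
    """Hypothetical gateway; the cache key is the raw prompt for simplicity."""
    wallet: Wallet
    cache: dict = field(default_factory=dict)
    metadata_log: list = field(default_factory=list)

    def handle(self, prompt: str, estimated_cost: float) -> str:
        if prompt in self.cache:
            # Cache hit: no reserve, no provider call, no settle.
            self.metadata_log.append({"estimated_avoided_cost": estimated_cost})
            return self.cache[prompt]
        # Cache miss: reserve -> call provider -> write cache -> settle.
        self.wallet.reserve(estimated_cost)
        response, actual_cost = call_provider(prompt)
        self.cache[prompt] = response
        self.wallet.settle(estimated_cost, actual_cost)
        return response
```

Calling handle twice with the same prompt debits the wallet once. The second call returns the cached response and only appends a metadata record.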

Budget Extension by Hit Rate

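Because only cache misses are billed, the budget multiplier is 1 / (1 − hit rate), and the budget required for the same output is the baseline budget × (1 − hit rate). Both figures assume that hits and misses have the same average cost per request: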

Hit Rate	Budget Multiplier	Monthly Budget Required for Same Output
0% (no cache)	1×	$3,000
50%	2×	$1,500
60%	2.5×	$1,200
70%	3.3×	$900
80%	5×	$600
90%	10×	$300
95%	20×	$150

A team budgeted for $3,000/month of AI usage that achieves an 80% cache hit rate effectively gets $15,000 worth of AI responses for $3,000 in actual spend.
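To estimate these figures for your own hit rate, the arithmetic can be written as two small functions. This is a minimal sketch with hypothetical function names. It assumes that cached and uncached requests have the same average cost, which may not hold for your traffic.

```python
def budget_multiplier(hit_rate: float) -> float:
    """How many times further a budget goes when only misses are billed."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return 1.0 / (1.0 - hit_rate)


def required_budget(baseline_budget: float, hit_rate: float) -> float:
    """Spend needed to get the same output as baseline_budget with no cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return baseline_budget * (1.0 - hit_rate)


if __name__ == "__main__":
    rate = 0.80
    print(f"multiplier: {budget_multiplier(rate):.1f}x")
    print(f"required:   ${required_budget(3000, rate):,.0f}")
```

The multiplier grows without bound as the hit rate approaches 100%, so small gains at high hit rates matter more than the same gains at low hit rates.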

Wallet Scope Cascade and Caching

The wallet reserve/settle system uses a scope cascade: user wallet → team wallet → org wallet. Cache hits bypass this cascade entirely, which means:

User wallets drain slower — Individual engineers preserve their allocated balance.
Team wallets last longer — Shared team budgets extend across more engineers.
Org wallet acts as backstop less often — Fewer requests fall through to the org-level reserve.
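One way to picture the cascade is a lookup that tries each scope in order and stops at the first wallet able to cover the estimated cost. The sketch below makes that assumption explicit. The fall-through rule, the pick_scope name, and the dictionary of balances are hypothetical and may differ from how the gateway resolves the effective wallet scope.

```python
from typing import Optional

# Order taken from this page: user wallet, then team wallet, then org wallet.
CASCADE = ("user", "team", "org")


def pick_scope(balances: dict[str, float], estimated_cost: float) -> Optional[str]:
    """Return the first scope that can cover the cost, or None if none can.

    Assumption: a scope is skipped when its balance is below the estimate.
    """
    for scope in CASCADE:
        if balances.get(scope, 0.0) >= estimated_cost:
            return scope
    return None
```

On a cache hit this lookup is never run, which is why every scope in the cascade drains more slowly as the hit rate rises.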

Example: Team of 20 Engineers

Without caching:

20 engineers × 50 requests/day × $0.03/request = $30/day → $900/month team wallet drain

With 80% cache hit rate:

20 engineers × 50 requests/day × 20% miss rate × $0.03/request = $6/day → $180/month team wallet drain

The team wallet lasts 5× longer with no change to engineer behavior or request volume.
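The same calculation can be parameterized for other team sizes and hit rates. This is a rough sketch with a hypothetical function name. It assumes a flat cost per request and a 30-day month, as the figures above do.

```python
def monthly_wallet_drain(
    engineers: int,
    requests_per_day: int,
    cost_per_request: float,
    hit_rate: float = 0.0,
    days_per_month: int = 30,
) -> float:
    """Estimated monthly team wallet drain; only cache misses are billed."""
    billable_requests_per_day = engineers * requests_per_day * (1.0 - hit_rate)
    return billable_requests_per_day * cost_per_request * days_per_month


if __name__ == "__main__":
    without_cache = monthly_wallet_drain(20, 50, 0.03)
    with_cache = monthly_wallet_drain(20, 50, 0.03, hit_rate=0.80)
    print(f"without caching: ${without_cache:,.0f}/month")
    print(f"with 80% hits:   ${with_cache:,.0f}/month")
```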

Monitoring Wallet Impact

The savings dashboard surfaces wallet-specific metrics:

Avoided wallet debits: Total dollar amount that would have been reserved and settled without caching.
Effective budget multiplier: Your actual budget extension ratio based on observed hit rate.
Wallet runway projection: Estimated date when current wallet balance will be exhausted at current miss rates.
Per-team wallet savings: Breakdown by team showing which groups benefit most.
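A runway projection of this kind can be approximated by dividing the current balance by the miss-adjusted daily burn. The sketch below shows one plausible calculation, not the dashboard's actual method. The function name, its parameters, and the assumption of a steady request volume and hit rate are illustrative.

```python
import math
from datetime import date, timedelta
from typing import Optional


def wallet_runway(
    balance: float,
    requests_per_day: float,
    avg_cost_per_miss: float,
    hit_rate: float,
    today: date,
) -> Optional[date]:
    """Estimate the date the wallet runs out at the current miss rate.

    Returns None when nothing is being billed (for example, a 100% hit rate).
    """
    daily_burn = requests_per_day * (1.0 - hit_rate) * avg_cost_per_miss
    if daily_burn <= 0:
        return None
    whole_days_left = math.floor(balance / daily_burn)
    return today + timedelta(days=whole_days_left)
```

Because the burn rate is computed from misses only, a rise in hit rate pushes the projected date out even when raw request volume is unchanged.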

Configuration

No special configuration is required for wallet-cache integration. When caching is enabled, the gateway automatically skips wallet reserve/settle on cache hits:

cache:
  enabled: true
  fabric_scope: org
  ttl_seconds: 86400

The wallet system and cache system are integrated at the gateway level. You do not need to configure them separately or wire them together.

Handling Low-Balance Scenarios

When wallet balance is low, caching becomes even more valuable:

Cache hits are never blocked — Even with zero wallet balance, cached responses are served. Cache hits do not require available balance.
Cache misses are still subject to wallet checks — If the wallet cannot reserve the estimated cost, the request is held or rejected per your policy.
Fill requests require balance — Writing new entries to cache requires a successful upstream call, which requires wallet balance.

This means a team that exhausts its wallet budget still receives responses for previously cached queries — providing continuity for common operations while budget is replenished.
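The three rules above reduce to a small decision function. This is a simplified sketch with hypothetical names. Whether a miss with insufficient balance is held or rejected depends on your policy, so the sketch returns a single combined outcome for that case.

```python
from enum import Enum


class Outcome(Enum):
    SERVE_FROM_CACHE = "serve_from_cache"
    FORWARD_UPSTREAM = "forward_upstream"
    HOLD_OR_REJECT = "hold_or_reject"


def admit(cache_hit: bool, available_balance: float, estimated_cost: float) -> Outcome:
    """Decide how a request proceeds given cache state and wallet balance."""
    if cache_hit:
        # Hits never consult the wallet, so a zero balance does not block them.
        return Outcome.SERVE_FROM_CACHE
    if available_balance >= estimated_cost:
        # A miss that can reserve goes upstream and fills the cache.
        return Outcome.FORWARD_UPSTREAM
    # A miss that cannot reserve is held or rejected per policy.
    return Outcome.HOLD_OR_REJECT
```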

Best Practices

Monitor your effective budget multiplier weekly to understand actual savings.
Set wallet alerts based on miss-adjusted burn rate, not raw request volume.
Use the wallet runway projection to plan budget allocations quarterly.
Share hit rate metrics with finance teams to demonstrate ROI on the caching investment.
Consider increasing wallet allocations for teams with low hit rates (new repos, rapidly changing code) while reducing allocations for teams with high hit rates.

For AI systems

Canonical terms: Keeptrusts, wallet reserve/settle, cache hit bypass, budget multiplier, wallet runway, effective wallet scope, zero debit cache hit.
Exact feature/config names: cache.enabled, fabric_scope: org, ttl_seconds, wallet scope cascade (user → team → org), wallet runway projection, effective budget multiplier metric.
Best next pages: Zero-Cost Cache Hits, Tracking Avoided Cost, The Cache Fill-Then-Save Model.

For engineers

Enable caching with cache: { enabled: true, fabric_scope: org, ttl_seconds: 86400 } — no separate wallet-cache wiring needed.
Cache hits never block on wallet balance — even zero-balance teams receive cached responses for previously filled entries.
Cache misses still require sufficient wallet balance for reserve/settle; fill requests also require balance.
Monitor your effective budget multiplier weekly at Cost Center → Savings to verify expected wallet extension.

For leaders

An 80% cache hit rate extends your AI budget 5× without increasing wallet allocations — $3,000 allocated delivers $15,000 in effective AI output.
Teams that exhaust wallet budget still receive cached responses, ensuring continuity for common operations during replenishment.
Set wallet alerts based on miss-adjusted burn rate rather than raw request volume — this reflects actual remaining runway.
Consider reducing allocations for teams with high hit rates (mature repos) while increasing allocations for teams with low hit rates (new repos, rapid change).

Next steps

Zero-Cost Cache Hits: The No-Fee Policy — understand why cache hits cost nothing
Tracking Avoided Cost in the Console — quantify wallet savings in the dashboard
The Cache Fill-Then-Save Model — understand the economic phases that drive wallet extension
