Wallet Architecture: How Reserve/Settle Prevents Cost Overruns
When teams talk about AI cost control, they often focus on monthly budgets, provider discounts, or dashboard visibility. Those matter, but they do not answer the core runtime question: what stops an over-budget request from being sent in the first place? In Keeptrusts, the answer is the wallet architecture. The gateway estimates cost before dispatch, reserves against the effective wallet scope, forwards only when that reserve succeeds, and settles to actual cost after the provider responds. That reserve-and-settle loop is what turns a budget from a reporting target into an enforceable control.
Use this page when
- You need to explain how Keeptrusts prevents overspend under concurrent real-world traffic.
- You are designing team or org wallet allocations and want to understand how the cascade behaves.
- You want a technical explanation of why reserve/settle is safer than after-the-fact billing reconciliation.
Primary audience
- Primary: Technical Engineers
- Secondary: Technical Leaders, FinOps owners
The problem
Simple budget accounting breaks down fast when LLM traffic is live.
The first issue is concurrency. Ten requests can arrive at nearly the same time from the same team. If the system only checks spend after the provider responds, each request assumes the remaining budget is still available. By the time the ledger updates, the team has already crossed the limit.
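A minimal sketch of that failure mode, assuming a hypothetical check that reads the balance once and only updates the ledger after responses return. The function name and the cent-based amounts are illustrative, not part of Keeptrusts.

```python
from typing import List, Tuple


def dispatch_with_post_hoc_check(
    remaining_cents: int, request_costs_cents: List[int]
) -> Tuple[int, int]:
    """Simulate a burst where the ledger updates only after responses return.

    Returns (requests_dispatched, overspend_cents).
    """
    # No response has come back yet, so every request in the burst
    # compares itself against the same stale balance snapshot.
    snapshot = remaining_cents
    dispatched = [cost for cost in request_costs_cents if cost <= snapshot]

    # The ledger catches up only now, after all of them were sent.
    spent = sum(dispatched)
    overspend = max(0, spent - remaining_cents)
    return len(dispatched), overspend


# Ten $1.00 requests arrive together with $6.00 remaining.
sent, over = dispatch_with_post_hoc_check(600, [100] * 10)
print(sent, over)
```

Each request passes the check on its own, yet together the burst spends more than the wallet holds.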
The second issue is estimation drift. A pre-dispatch estimate is necessary because the gateway needs to know whether it can afford to send the request at all. But the estimate is not the final number. Actual prompt tokens, completion tokens, and provider pricing determine the real charge. If the platform cannot reconcile estimate and actual cleanly, wallet balances become untrustworthy.
The third issue is allocation ambiguity. Many organizations need more than one budget scope. An individual might have a small personal allowance, a team might have a larger operating budget, and the organization might maintain a shared backstop wallet. Without a defined cascade, nobody knows which budget is funding the request.
The fourth issue is exception handling. Legitimate work still appears after a team has exhausted its wallet. If the system has no controlled exception path, people bypass governance entirely by creating direct provider keys or using unmanaged apps.
The solution
Keeptrusts solves those problems with a synchronous reserve-and-settle architecture.
Before forwarding any governed request, the gateway computes an estimated cost. It then resolves the effective wallet scope using the documented cascade: user wallet first, team wallet second, organization wallet third. The first scope with enough available balance becomes the funding source for that request.
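The cascade can be sketched as a short ordered lookup. This is an illustration under assumptions, not the gateway's implementation: the Wallet type, its field names, and resolve_wallet are hypothetical, and amounts are kept in integer cents.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Wallet:
    scope: str                # "user", "team", or "org"
    balance_cents: int        # funds allocated to this wallet
    reserved_cents: int = 0   # amount held by in-flight requests

    @property
    def available_cents(self) -> int:
        return self.balance_cents - self.reserved_cents


# Documented cascade order: user first, team second, organization third.
CASCADE_ORDER = ("user", "team", "org")


def resolve_wallet(wallets: Dict[str, Wallet], estimate_cents: int) -> Optional[Wallet]:
    """Return the first wallet in cascade order that can cover the estimate."""
    for scope in CASCADE_ORDER:
        wallet = wallets.get(scope)
        if wallet is not None and wallet.available_cents >= estimate_cents:
            return wallet
    return None  # no eligible wallet: the request must not be forwarded


wallets = {
    "user": Wallet("user", balance_cents=50),
    "team": Wallet("team", balance_cents=400_000),
    "org": Wallet("org", balance_cents=2_000_000),
}
funding = resolve_wallet(wallets, estimate_cents=120)
print(funding.scope if funding else "none")
```

In this example the user wallet cannot cover the estimate, so the team wallet funds the request.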
Once the scope is chosen, the gateway reserves the estimated amount. That reservation is what protects against concurrent overspend. Other requests arriving at the same time see the updated available balance, not a stale number that ignores in-flight traffic.
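The key property is that the balance check and the hold happen as one atomic step. The sketch below illustrates this with an in-process lock. WalletLedger, try_reserve, and run_burst are hypothetical names, and a real gateway would more likely rely on its datastore's atomic operations than on a Python lock.

```python
import threading
from typing import List


class WalletLedger:
    def __init__(self, balance_cents: int) -> None:
        self.balance_cents = balance_cents
        self.reserved_cents = 0
        self._lock = threading.Lock()

    def try_reserve(self, estimate_cents: int) -> bool:
        """Check and hold in one critical section so no request sees a stale balance."""
        with self._lock:
            available = self.balance_cents - self.reserved_cents
            if available < estimate_cents:
                return False
            self.reserved_cents += estimate_cents
            return True


def run_burst(ledger: WalletLedger, estimate_cents: int, request_count: int) -> int:
    """Fire request_count concurrent reservations; return how many succeeded."""
    outcomes: List[bool] = []

    def worker() -> None:
        outcomes.append(ledger.try_reserve(estimate_cents))

    threads = [threading.Thread(target=worker) for _ in range(request_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(outcomes)


ledger = WalletLedger(balance_cents=500)
admitted = run_burst(ledger, estimate_cents=100, request_count=10)
print(admitted, ledger.reserved_cents)
```

However the threads interleave, the reserved total never exceeds the balance, because each reservation is visible to the next check.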
After the provider responds, Keeptrusts settles the reservation to the actual cost. If the request cost less than the estimate, the surplus is released. If the actual charge is higher, the wallet ledger is adjusted to the real amount. The important point is that settlement uses the actual provider cost, not the pre-dispatch estimate.
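Settlement can be pictured as two ledger movements: release the hold, then charge the actual cost. The following is a sketch under assumptions. WalletState and settle are hypothetical names, and how the gateway handles an actual cost that exceeds the remaining balance is not covered here.

```python
from dataclasses import dataclass


@dataclass
class WalletState:
    balance_cents: int
    reserved_cents: int = 0


def settle(wallet: WalletState, reserved_estimate_cents: int, actual_cents: int) -> int:
    """Replace a reservation with the actual charge.

    Returns the drift (actual minus estimate): negative means surplus was
    released, positive means the request cost more than estimated.
    """
    wallet.reserved_cents -= reserved_estimate_cents  # release the hold
    wallet.balance_cents -= actual_cents              # charge what the provider billed
    return actual_cents - reserved_estimate_cents


# Estimated $1.20, actually cost $0.90: 30 cents return to the available balance.
wallet = WalletState(balance_cents=1000, reserved_cents=120)
drift = settle(wallet, reserved_estimate_cents=120, actual_cents=90)
print(wallet.balance_cents, wallet.reserved_cents, drift)
```

Returning the drift keeps the gap between estimate and actual visible, which is useful for tuning the estimator.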
If no eligible wallet has enough balance, the gateway does not forward the request. Instead, it issues a cost ticket and holds the request until the balance is replenished or the ticket is approved. That creates an exception path without turning budget enforcement into a soft suggestion.
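The hold-and-ticket path can be sketched as a small decision function. Everything here is illustrative: CostTicket, Decision, admit, may_release, and the status values are assumptions made for the example, not the Keeptrusts data model.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CostTicket:
    request_id: str
    estimate_cents: int
    status: str = "pending"  # hypothetical states: "pending" or "approved"
    ticket_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class Decision:
    forward: bool
    ticket: Optional[CostTicket] = None


def admit(request_id: str, estimate_cents: int, available_cents: int) -> Decision:
    """Forward when the balance covers the estimate; otherwise hold with a ticket."""
    if available_cents >= estimate_cents:
        return Decision(forward=True)
    return Decision(forward=False, ticket=CostTicket(request_id, estimate_cents))


def may_release(ticket: CostTicket, available_cents: int) -> bool:
    """A held request proceeds once the ticket is approved or funds are replenished."""
    return ticket.status == "approved" or available_cents >= ticket.estimate_cents


decision = admit("req-1", estimate_cents=200, available_cents=150)
print(decision.forward, decision.ticket.status)
```

The request is never forwarded on the insufficient-balance branch, so the exception path stays inside the governed flow.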
Implementation
The reserve-and-settle model starts with an explicit mapping from traffic identity to wallet scope.
pack:
  name: wallet-architecture
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-main
      provider: openai
      model: gpt-5.4-mini-mini
      secret_key_ref:
        env: OPENAI_API_KEY
consumer_groups:
  - name: support
    api_key: kt_cg_support_prod
    wallet_team_id: team_support
  - name: product
    api_key: kt_cg_product_prod
    wallet_team_id: team_product
cost_tracking:
  enabled: true
  wallet_enforcement: true
  budget_alerts:
    - threshold_percent: 50
      action: notify
    - threshold_percent: 80
      action: notify
    - threshold_percent: 95
      action: notify
    - threshold_percent: 100
      action: block
api:
  url: http://localhost:41002
  token_env: KEEPTRUSTS_API_TOKEN
That configuration tells the gateway where to charge traffic and when to stop it. The consumer group binds the request stream to a team wallet, and wallet_enforcement: true ensures the request does not leave the gateway when balance is insufficient.
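To make the budget_alerts thresholds concrete, here is a sketch of how a wallet's spend could map to the highest crossed threshold. The alert_for function is hypothetical, and it assumes thresholds are evaluated against spend as a share of the allocation. Whether the gateway also counts open reservations toward that percentage is not specified here.

```python
from typing import List, Optional, Tuple

# Mirrors the budget_alerts entries in the configuration: (threshold_percent, action).
BUDGET_ALERTS: List[Tuple[int, str]] = [
    (50, "notify"),
    (80, "notify"),
    (95, "notify"),
    (100, "block"),
]


def alert_for(
    spent_cents: int,
    allocation_cents: int,
    alerts: List[Tuple[int, str]] = BUDGET_ALERTS,
) -> Optional[Tuple[int, str]]:
    """Return the highest threshold crossed, or None if spend is below all of them."""
    # Integer comparison avoids floating-point rounding at the boundaries.
    crossed = [
        (threshold, action)
        for threshold, action in alerts
        if spent_cents * 100 >= threshold * allocation_cents
    ]
    return max(crossed) if crossed else None


# A $4,000 wallet with $3,900 spent has crossed the 95 percent threshold.
print(alert_for(390_000, 400_000))
```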
At runtime, operators usually pair that config with wallet checks and allocations through the API or CLI. GET /v1/wallets/balance shows the current cascade summary, while POST /v1/wallets/allocate replenishes or reallocates funds. For teams rolling this out for the first time, the most important operational habit is watching both remaining balance and reserve-to-settle accuracy, not just month-end totals.
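One way to track reserve-to-settle accuracy is to compare total settled cost against total reserved estimates over a window. The sketch below assumes you can export (estimate, actual) pairs in cents. The function names and the ratio definition are illustrative, not a built-in Keeptrusts metric.

```python
from typing import List, Tuple


def reserve_to_settle_ratio(pairs: List[Tuple[int, int]]) -> float:
    """Total actual cost divided by total estimated cost.

    Below 1.0 means estimates run high (funds are held unnecessarily);
    above 1.0 means estimates run low (settlements exceed reservations).
    """
    total_estimate = sum(estimate for estimate, _ in pairs)
    total_actual = sum(actual for _, actual in pairs)
    if total_estimate == 0:
        raise ValueError("no reserved estimates in this window")
    return total_actual / total_estimate


def underestimated_count(pairs: List[Tuple[int, int]]) -> int:
    """Number of requests whose actual cost exceeded the reserved estimate."""
    return sum(1 for estimate, actual in pairs if actual > estimate)


# (estimate_cents, actual_cents) for three settled requests
window = [(120, 90), (100, 110), (80, 80)]
print(round(reserve_to_settle_ratio(window), 3), underestimated_count(window))
```

A ratio that drifts well away from 1.0 suggests the pre-dispatch estimator needs tuning, even when the remaining balance looks healthy.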
Results and impact
Reserve-and-settle prevents a class of cost incidents that dashboards alone cannot stop.
Imagine a support team with a $4,000 monthly wallet and several automations running at the same time. Without a reservation step, bursts of concurrent requests can all assume budget is still available and push the team beyond its cap before the first response returns. With Keeptrusts, the available balance is reduced at reservation time, so the sixth or seventh request may be held before the overspend happens.
It also improves attribution quality. Because the gateway knows which wallet funded the request and later settles to actual cost, finance gets a clean ledger tied to real usage rather than a rough post-hoc allocation model. That matters for chargeback, monthly reallocation, and executive reporting.
The final benefit is behavioral. Teams learn that budgets are real but not arbitrary. If they hit a limit, they see a cost ticket and a defined remediation path rather than silent failure or uncontrolled spend. That makes it easier to scale governed AI adoption without driving people back to unmanaged provider keys.
Key takeaways
- Reserve protects against concurrent overspend because balance is checked and held before dispatch.
- Settle preserves accounting accuracy because the wallet is updated to actual provider cost after the response arrives.
- The wallet cascade supports layered funding models without changing application integrations.
- Cost tickets create a governed exception path when balance runs out.
- Wallet architecture is the difference between budget reporting and budget enforcement.