Capacity Management & Cost Allocation
AI workloads can generate unpredictable costs. Keeptrusts provides wallet-based budgeting, cost center controls, and spend alerting so platform teams can allocate capacity across teams without surprise bills.
Use this page when
- You need to set up wallet-based team budgets or cost center controls for AI workloads
- You are designing a chargeback or showback model for multi-team AI usage
- You want to configure spend alerting, resource quotas, or PayPal self-service top-ups
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Wallet Architecture
Keeptrusts wallets operate on a reserve-settle model. When the gateway routes an LLM request, it reserves the estimated cost against the effective wallet scope before forwarding upstream. On provider response, the reservation settles to the actual cost.
The wallet cascade evaluates scopes in order:
- User wallet — individual contributor budget
- Team wallet — shared team allocation
- Organization wallet — top-level fallback
If no scope has sufficient balance, a cost ticket is queued and the request is held until balance is replenished or the ticket is denied.
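The reserve-settle flow and the scope cascade described above can be sketched in a few lines of Python. This is an illustrative model only, not the Keeptrusts implementation: the `Wallet` and `Reservation` classes and the `reserve` and `settle` functions are hypothetical names, and the sketch assumes the cascade picks the first scope whose balance covers the full estimate.

```python
from dataclasses import dataclass


@dataclass
class Wallet:
    scope: str             # "user", "team", or "organization"
    balance: float         # funds that are neither reserved nor spent
    reserved: float = 0.0  # funds held for in-flight requests


@dataclass
class Reservation:
    wallet: Wallet
    amount: float


def reserve(cascade, estimated_cost):
    """Hold the estimate against the first scope that can cover it.

    Returns None when no scope has enough balance, which is the point
    where the gateway would queue a cost ticket (assumption).
    """
    for wallet in cascade:
        if wallet.balance >= estimated_cost:
            wallet.balance -= estimated_cost
            wallet.reserved += estimated_cost
            return Reservation(wallet, estimated_cost)
    return None


def settle(reservation, actual_cost):
    """Release the hold and charge only the actual cost."""
    wallet = reservation.wallet
    wallet.reserved -= reservation.amount
    wallet.balance += reservation.amount - actual_cost


# Cascade order: user, then team, then organization.
cascade = [Wallet("user", 0.50), Wallet("team", 100.0), Wallet("organization", 1000.0)]
reservation = reserve(cascade, 2.00)  # user wallet is too small, so the team wallet is used
settle(reservation, 1.25)             # the provider reported a lower actual cost
```

In this sketch the difference between the estimate and the actual cost flows back to the same wallet at settlement, so an over-estimate only reduces the available balance while the request is in flight.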
Allocating Team Budgets
Use the wallet API to allocate credits to a team:
curl -X POST https://api.example.com/v1/wallets/allocate \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "team_engineering",
    "amount": 5000.00,
    "currency": "USD",
    "note": "Q2 2026 AI budget allocation"
  }'
Verify the allocation:
# Quote the URL so the shell does not try to expand the "?"
curl "https://api.example.com/v1/wallets/balance?team_id=team_engineering" \
  -H "Authorization: Bearer $ADMIN_TOKEN"
Spend and wallet surfaces
The Usage and wallet funding surfaces provide a real-time view of:
- Active reservations — in-flight requests with estimated costs
- Settled transactions — completed requests with actual costs
- Wallet balances — remaining budget per scope
- Burn rate — projected depletion date based on trailing usage
Platform administrators can drill into per-team and per-user breakdowns to identify cost drivers.
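The burn-rate projection can be approximated as the remaining balance divided by the average daily spend over a trailing window. The sketch below assumes a simple arithmetic mean over the window; the function name `projected_depletion` and the sample figures are hypothetical, and the product may use a different window or weighting.

```python
from datetime import date, timedelta


def projected_depletion(balance, daily_spend, today):
    """Project the depletion date from trailing daily spend.

    Returns None when there is no spend history or no spend at all,
    because no depletion date can be projected in that case.
    """
    if not daily_spend:
        return None
    average = sum(daily_spend) / len(daily_spend)
    if average <= 0:
        return None
    full_days_left = int(balance // average)
    return today + timedelta(days=full_days_left)


# Hypothetical trailing week of settled spend, in USD per day.
trailing_week = [40.0, 55.0, 60.0, 45.0, 50.0, 52.0, 48.0]
eta = projected_depletion(1000.0, trailing_week, date(2026, 4, 1))
```

With an average of 50 USD per day and 1,000 USD remaining, the projection lands 20 days after the reference date.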
Chargeback Models
Keeptrusts supports three chargeback approaches:
Direct Allocation
Each team receives a fixed credit balance. Spend is deducted in real time. When the balance reaches zero, requests are queued or rejected based on policy.
Proportional Sharing
A shared organizational wallet covers all costs. Monthly reports break down usage by team for internal billing. Configure this with:
# policy-config.yaml
cost_policy:
  model: proportional
  reporting_interval: monthly
  shared_wallet: org_default
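A monthly showback report under proportional sharing amounts to summing settled costs per team and expressing each total as a share of the organization's spend. The following is a minimal sketch of that aggregation; the `proportional_shares` function and the `(team, cost)` tuple format are assumptions for illustration, not the report format Keeptrusts produces.

```python
from collections import defaultdict


def proportional_shares(transactions):
    """Return {team: (total_cost, percent_of_org_spend)} for settled transactions."""
    totals = defaultdict(float)
    for team, cost in transactions:
        totals[team] += cost
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        team: (cost, round(100 * cost / grand_total, 1))
        for team, cost in totals.items()
    }


# Hypothetical settled transactions for one reporting interval.
settled = [
    ("team_engineering", 300.0),
    ("team_research", 150.0),
    ("team_engineering", 50.0),
]
report = proportional_shares(settled)
```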
Tiered Quotas
Teams receive a base allocation with burst capacity at a higher internal rate:
cost_policy:
  model: tiered
  tiers:
    - limit: 1000
      rate: 1.0
    - limit: 2000
      rate: 1.5
    - limit: unlimited
      rate: 2.0
      requires_approval: true
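To see how tiered rates affect the internal charge, the sketch below bills usage band by band. It assumes each `limit` is a cumulative threshold (the first 1,000 units at 1.0, the next 1,000 at 1.5, everything beyond 2,000 at 2.0); the configuration above does not state this, so treat the interpretation, the `TIERS` constant, and the `tiered_charge` function as hypothetical.

```python
import math

# (cumulative limit, rate multiplier), mirroring the tiers in the config above.
TIERS = [(1000, 1.0), (2000, 1.5), (math.inf, 2.0)]


def tiered_charge(spend):
    """Internal charge for `spend` units of raw usage, billed band by band."""
    charge = 0.0
    floor = 0.0  # lower edge of the current band
    for limit, rate in TIERS:
        if spend <= floor:
            break
        band = min(spend, limit) - floor
        charge += band * rate
        floor = limit
    return charge
```

Under these assumptions, 2,500 units of usage are charged as 1,000 × 1.0 + 1,000 × 1.5 + 500 × 2.0 = 3,500.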
Resource Quotas
Beyond cost controls, configure request-level quotas to prevent a single team from consuming disproportionate gateway capacity:
| Quota Type | Scope | Example |
|---|---|---|
| Requests per minute | Team | 500 RPM |
| Tokens per hour | User | 100,000 TPH |
| Concurrent requests | Team | 50 |
| Max tokens per request | Global | 8,192 |
Set quotas through the API:
curl -X PUT https://api.example.com/v1/quotas/team_engineering \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "requests_per_minute": 500,
    "tokens_per_hour": 100000,
    "concurrent_requests": 50
  }'
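A requests-per-minute quota is commonly enforced with a sliding window over recent request timestamps. The sketch below shows that general technique; it is not a description of how the Keeptrusts gateway enforces quotas, and the `SlidingWindowLimiter` class is a hypothetical name.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any `window_seconds` span."""

    def __init__(self, limit, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self.stamps = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now):
        # Drop timestamps that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True


# A tiny limit of 3 requests per minute keeps the example readable.
limiter = SlidingWindowLimiter(limit=3)
results = [limiter.allow(t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
```

The fourth request is refused because three requests already fall inside the window; by the fifth, the two oldest have aged out and capacity is available again.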
Spend Alerting
Configure alerts to notify teams before budgets are exhausted:
curl -X POST https://api.example.com/v1/wallets/alerts \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "team_engineering",
    "thresholds": [
      { "percent_remaining": 50, "channel": "slack", "target": "#platform-alerts" },
      { "percent_remaining": 20, "channel": "email", "target": "platform-leads@example.com" },
      { "percent_remaining": 5, "channel": "pagerduty", "target": "ai-platform-oncall" }
    ]
  }'
Alerts fire once per threshold crossing per billing period.
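The once-per-crossing behavior can be modeled by remembering which thresholds have already fired in the current billing period. This is a sketch under stated assumptions: the `ThresholdAlerts` class is hypothetical, and it assumes that a single large drop which skips past several thresholds fires all of them at once.

```python
class ThresholdAlerts:
    """Track which percent-remaining thresholds fired this billing period."""

    def __init__(self, thresholds):
        self.thresholds = sorted(thresholds, reverse=True)  # e.g. [50, 20, 5]
        self.fired = set()

    def check(self, allocated, remaining):
        """Return the thresholds newly crossed by the current balance."""
        percent_remaining = 100 * remaining / allocated
        newly_crossed = [
            t for t in self.thresholds
            if percent_remaining <= t and t not in self.fired
        ]
        self.fired.update(newly_crossed)
        return newly_crossed

    def reset(self):
        """Call at the start of a new billing period."""
        self.fired.clear()


alerts = ThresholdAlerts([50, 20, 5])
first = alerts.check(5000, 3000)   # 60% remaining, nothing crossed
second = alerts.check(5000, 2400)  # 48% remaining, crosses 50
third = alerts.check(5000, 2300)   # still below 50, but it already fired
fourth = alerts.check(5000, 200)   # 4% remaining, crosses 20 and 5 together
```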
PayPal Top-Up Integration
For self-service budget replenishment, Keeptrusts integrates with PayPal checkout:
- The console or chat creates a checkout order through a server-side BFF route
- The user approves the PayPal order
- The BFF captures the order against /v1/payments/capture-order
- The webhook at /v1/payments/webhook reconciles the final state
Platform administrators configure PayPal integration through:
# Enable PayPal payments
curl -X PUT https://api.example.com/v1/admin/payments/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "provider": "paypal", "enabled": true }'

# Configure payment settings
curl -X PUT https://api.example.com/v1/payments/settings \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "min_topup": 10.00, "max_topup": 10000.00 }'
Model Pricing
Accurate cost tracking depends on up-to-date model pricing. Seed pricing data for your environment:
scripts/seed-model-pricing.sh "$API_URL" "$ADMIN_TOKEN"
Verify with:
curl https://api.example.com/v1/model-pricing \
-H "Authorization: Bearer $ADMIN_TOKEN"
Custom model pricing can be added for private or fine-tuned models.
Monitoring Cost Metrics
Export cost data for external analysis:
kt events export --format csv --filter 'cost > 0' --output costs.csv
Key metrics to track:
- Cost per request — average and p99 by model and team
- Budget utilization — percentage consumed vs. allocated
- Reservation-to-settlement ratio — estimation accuracy
- Denied requests — volume blocked due to insufficient balance
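Once events are exported, the first and third metrics above reduce to simple aggregates. The sketch below works on in-memory records rather than the CSV itself, because the export's column names are not documented here; the `reserved_cost` and `settled_cost` fields, the sample values, and the nearest-rank percentile method are all assumptions.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def cost_metrics(events):
    """Summarize settled cost and estimation accuracy for a batch of events."""
    settled = [e["settled_cost"] for e in events]
    reserved = [e["reserved_cost"] for e in events]
    return {
        "avg_cost": sum(settled) / len(settled),
        "p99_cost": nearest_rank_percentile(settled, 99),
        # A ratio above 1.0 means estimates ran higher than actual costs.
        "reserve_to_settle": sum(reserved) / sum(settled),
    }


# Hypothetical events; field names are illustrative only.
events = [
    {"reserved_cost": 0.50, "settled_cost": 0.25},
    {"reserved_cost": 0.50, "settled_cost": 0.50},
    {"reserved_cost": 1.00, "settled_cost": 0.75},
    {"reserved_cost": 2.00, "settled_cost": 1.50},
]
metrics = cost_metrics(events)
```

Grouping the input by model or team before calling `cost_metrics` would give the per-model and per-team breakdowns.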
Next steps
- Set up Monitoring & Alerting for cost metric dashboards
- Configure Secret Management for provider API keys
- Review Multi-Tenant Gateway for team isolation patterns
For AI systems
- Canonical terms: wallet, reserve-settle, cost center, spend alerting, chargeback, resource quotas, PayPal top-up, model pricing
- Key API endpoints: POST /v1/wallets/allocate, GET /v1/wallets/balance, PUT /v1/quotas/{team}, POST /v1/wallets/alerts, PUT /v1/admin/payments/config, GET /v1/model-pricing
- CLI commands: kt events export --filter 'cost > 0', scripts/seed-model-pricing.sh
- Related pages: Monitoring & Alerting, Multi-Tenant Gateway
For engineers
- Prerequisites: Running API with KEEPTRUSTS_SECRET_ENCRYPTION_KEY set, admin bearer token, test Postgres
- Allocate team budgets with POST /v1/wallets/allocate and verify with GET /v1/wallets/balance?team_id=...
- Seed model pricing via scripts/seed-model-pricing.sh $API_URL $ADMIN_TOKEN before cost tracking works
- Configure PayPal with PUT /v1/admin/payments/config then PUT /v1/payments/settings
- Validate: send a request through the gateway and confirm the wallet balance decreases by the settled cost
For leaders
- Wallet budgets enforce hard spending limits per team — requests are queued or rejected when exhausted
- Chargeback models (direct allocation, proportional, tiered) map to different organizational cost governance styles
- Spend alerting with escalating thresholds (Slack → email → PagerDuty) prevents surprise bills
- PayPal self-service top-up removes platform team as a bottleneck for budget replenishment
- Model pricing accuracy directly affects cost attribution — seed and maintain pricing data quarterly