
Capacity Management & Cost Allocation

AI workloads can generate unpredictable costs. Keeptrusts provides wallet-based budgeting, cost center controls, and spend alerting so platform teams can allocate capacity across teams without surprise bills.

Use this page when

  • You need to set up wallet-based team budgets or cost center controls for AI workloads
  • You are designing a chargeback or showback model for multi-team AI usage
  • You want to configure spend alerting, resource quotas, or PayPal self-service top-ups

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Wallet Architecture

Keeptrusts wallets operate on a reserve-settle model. When the gateway routes an LLM request, it reserves the estimated cost against the effective wallet scope before forwarding upstream. On provider response, the reservation settles to the actual cost.

The wallet cascade evaluates scopes in order:

  1. User wallet — individual contributor budget
  2. Team wallet — shared team allocation
  3. Organization wallet — top-level fallback

If no scope has sufficient balance, a cost ticket is queued and the request is held until balance is replenished or the ticket is denied.
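
The reserve-settle flow and the scope cascade can be sketched in a few lines of Python. This is a minimal illustration, not the gateway's implementation: the `Wallet`, `Reservation`, `reserve`, and `settle` names are hypothetical, and the sketch assumes the whole estimate is reserved against the first scope that can cover it rather than split across scopes.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Wallet:
    scope: str                       # "user", "team", or "organization"
    balance: Decimal
    reserved: Decimal = Decimal("0")

    @property
    def available(self) -> Decimal:
        # Funds not already held by in-flight reservations.
        return self.balance - self.reserved


@dataclass
class Reservation:
    wallet: Wallet
    amount: Decimal


def reserve(cascade, estimate: Decimal) -> Optional[Reservation]:
    """Hold the estimate against the first scope that can cover it."""
    for wallet in cascade:           # user -> team -> organization
        if wallet.available >= estimate:
            wallet.reserved += estimate
            return Reservation(wallet, estimate)
    return None                      # assumption: caller queues a cost ticket here


def settle(reservation: Reservation, actual: Decimal) -> None:
    """Release the hold and deduct the actual cost."""
    wallet = reservation.wallet
    wallet.reserved -= reservation.amount
    wallet.balance -= actual


cascade = [
    Wallet("user", Decimal("0.50")),
    Wallet("team", Decimal("100.00")),
    Wallet("organization", Decimal("1000.00")),
]
hold = reserve(cascade, Decimal("2.00"))   # user wallet is too small, team covers it
settle(hold, Decimal("1.40"))              # actual cost came in under the estimate
```

In this sketch the team wallet ends at 98.60 with nothing left reserved, and the user wallet is untouched.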

Allocating Team Budgets

Use the wallet API to allocate credits to a team:

curl -X POST https://api.example.com/v1/wallets/allocate \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "team_engineering",
    "amount": 5000.00,
    "currency": "USD",
    "note": "Q2 2026 AI budget allocation"
  }'

Verify the allocation:

# Quote the URL so the shell does not treat "?" as a glob character
curl "https://api.example.com/v1/wallets/balance?team_id=team_engineering" \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Spend and wallet surfaces

The Usage and wallet funding surfaces provide a real-time view of:

  • Active reservations — in-flight requests with estimated costs
  • Settled transactions — completed requests with actual costs
  • Wallet balances — remaining budget per scope
  • Burn rate — projected depletion date based on trailing usage

Platform administrators can drill into per-team and per-user breakdowns to identify cost drivers.
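
As an illustration of the burn-rate projection, the sketch below averages trailing daily spend and divides the remaining balance by it. The function name, the seven-day window, and the simple-average method are assumptions; the page does not specify how the projection is actually computed.

```python
from datetime import date, timedelta
from decimal import Decimal


def projected_depletion(balance, daily_spend, today):
    """Return the date the balance would reach zero, or None if there is no spend."""
    if not daily_spend:
        return None
    burn_rate = sum(daily_spend) / len(daily_spend)   # average spend per day
    if burn_rate <= 0:
        return None
    days_left = int(balance / burn_rate)              # whole days of runway
    return today + timedelta(days=days_left)


# Hypothetical trailing week of settled spend, in USD per day.
trailing_week = [Decimal(v) for v in ("40", "55", "45", "60", "50", "50", "50")]
depletion = projected_depletion(Decimal("1000"), trailing_week, date(2026, 4, 1))
```

With an average of 50 per day and 1,000 remaining, the sketch projects 20 days of runway.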

Chargeback Models

Keeptrusts supports three chargeback approaches:

Direct Allocation

Each team receives a fixed credit balance. Spend is deducted in real time. When the balance reaches zero, requests are queued or rejected based on policy.

Proportional Sharing

A shared organizational wallet covers all costs. Monthly reports break down usage by team for internal billing. Configure this with:

# policy-config.yaml
cost_policy:
  model: proportional
  reporting_interval: monthly
  shared_wallet: org_default
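
A per-team breakdown for internal billing could be derived from settled transactions roughly as follows. This is a sketch only: the `(team_id, cost)` tuple shape and the `team_research` identifier are hypothetical, and the real monthly report format is not described here.

```python
from collections import defaultdict
from decimal import Decimal


def usage_by_team(transactions):
    """Sum settled cost per team and compute each team's share of the total."""
    totals = defaultdict(Decimal)               # Decimal() starts each team at 0
    for team_id, cost in transactions:
        totals[team_id] += cost
    grand_total = sum(totals.values(), Decimal("0"))
    return {
        team: {
            "cost": cost,
            "share": cost / grand_total if grand_total else Decimal("0"),
        }
        for team, cost in totals.items()
    }


# Hypothetical settled transactions drawn from the shared wallet.
settled = [
    ("team_engineering", Decimal("300.00")),
    ("team_research", Decimal("150.00")),
    ("team_engineering", Decimal("50.00")),
]
report = usage_by_team(settled)
```

Here `team_engineering` accounts for 350.00 of 500.00, a 70% share, which is the figure an internal invoice would use.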

Tiered Quotas

Teams receive a base allocation with burst capacity at a higher internal rate:

cost_policy:
  model: tiered
  tiers:
    - limit: 1000
      rate: 1.0
    - limit: 2000
      rate: 1.5
    - limit: unlimited
      rate: 2.0
      requires_approval: true
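
One plausible reading of the tiers above is that each `limit` is a cumulative spend ceiling and each `rate` is a multiplier applied to the slice of spend that falls inside that tier. The sketch below computes an internal charge under that assumption; the semantics and the `internal_charge` helper are illustrative, not confirmed behavior.

```python
from decimal import Decimal

# Mirrors the tiers in the YAML above; None stands in for "unlimited".
TIERS = [
    (Decimal("1000"), Decimal("1.0")),
    (Decimal("2000"), Decimal("1.5")),
    (None, Decimal("2.0")),
]


def internal_charge(raw_spend, tiers=TIERS):
    """Bill each slice of spend at the rate of the tier it falls into."""
    charge = Decimal("0")
    floor = Decimal("0")                         # lower bound of the current tier
    for limit, rate in tiers:
        ceiling = raw_spend if limit is None else min(raw_spend, limit)
        if ceiling > floor:
            charge += (ceiling - floor) * rate
        if limit is None or raw_spend <= limit:
            break                                # spend does not reach the next tier
        floor = limit
    return charge


base_only = internal_charge(Decimal("500"))
with_burst = internal_charge(Decimal("1500"))
top_tier = internal_charge(Decimal("2500"))
```

Under these assumptions, 1,500 of raw spend is charged as 1,000 at 1.0 plus 500 at 1.5, or 1,750 internally.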

Resource Quotas

Beyond cost controls, configure request-level quotas to prevent a single team from consuming disproportionate gateway capacity:

Quota Type | Scope | Example
Requests per minute | Team | 500 RPM
Tokens per hour | User | 100,000 TPH
Concurrent requests | Team | 50
Max tokens per request | Global | 8,192

Set quotas through the API:

curl -X PUT https://api.example.com/v1/quotas/team_engineering \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "requests_per_minute": 500,
    "tokens_per_hour": 100000,
    "concurrent_requests": 50
  }'
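
To make the requests-per-minute quota concrete, here is a sliding-window limiter of the kind commonly used for such limits. How the gateway actually enforces quotas is not documented on this page, so treat the `SlidingWindowLimiter` class as a generic illustration of the technique rather than a description of Keeptrusts internals.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any `window` seconds."""

    def __init__(self, limit, window=60.0):
        self.limit = limit
        self.window = window
        self.stamps = deque()        # timestamps of accepted requests, oldest first

    def allow(self, now):
        # Drop timestamps that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            return False             # quota exhausted for this window
        self.stamps.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3)          # 3 RPM keeps the example small
results = [limiter.allow(t) for t in (0, 10, 20, 30, 61)]
```

The fourth request is refused because three are already inside the window; by the 61-second mark the first has expired and capacity is available again.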

Spend Alerting

Configure alerts to notify teams before budgets are exhausted:

curl -X POST https://api.example.com/v1/wallets/alerts \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "team_engineering",
    "thresholds": [
      { "percent_remaining": 50, "channel": "slack", "target": "#platform-alerts" },
      { "percent_remaining": 20, "channel": "email", "target": "platform-leads@example.com" },
      { "percent_remaining": 5, "channel": "pagerduty", "target": "ai-platform-oncall" }
    ]
  }'

Alerts fire once per threshold crossing per billing period.
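
The once-per-crossing behavior can be pictured as follows. This sketch assumes a threshold counts as crossed when the remaining percentage is at or below it, and that the set of fired thresholds is cleared at the start of each billing period; both details, and the `crossed_thresholds` helper, are assumptions.

```python
THRESHOLDS = (50, 20, 5)   # percent_remaining values from the request above


def crossed_thresholds(allocated, remaining, already_fired, thresholds=THRESHOLDS):
    """Return thresholds newly crossed, recording them so each fires only once."""
    percent_remaining = remaining / allocated * 100
    newly_crossed = [
        t for t in sorted(thresholds, reverse=True)
        if percent_remaining <= t and t not in already_fired
    ]
    already_fired.update(newly_crossed)
    return newly_crossed


fired = set()                                    # reset at the start of each billing period
first = crossed_thresholds(5000, 2400, fired)    # 48% remaining
second = crossed_thresholds(5000, 2300, fired)   # 46% remaining, nothing new
third = crossed_thresholds(5000, 200, fired)     # 4% remaining
```

A sharp drop from 46% to 4% crosses two thresholds at once, so both the 20% and 5% alerts fire in the same evaluation.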

PayPal Top-Up Integration

For self-service budget replenishment, Keeptrusts integrates with PayPal checkout:

  1. The console or chat creates a checkout order through a server-side backend-for-frontend (BFF) route
  2. The user approves the PayPal order
  3. The BFF captures the order against /v1/payments/capture-order
  4. The webhook at /v1/payments/webhook reconciles the final state
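
Because both the capture step and the webhook can report that an order completed, the wallet credit needs to be idempotent. The sketch below shows one way to guarantee a single credit per order. The `TopUpLedger` class, its status values, and the order ID are hypothetical and do not reflect the actual payment service.

```python
from decimal import Decimal


class TopUpLedger:
    """Credit a wallet at most once per order, whichever signal arrives first."""

    def __init__(self):
        self.balance = Decimal("0")
        self.orders = {}             # order_id -> {"amount": Decimal, "status": str}

    def create_order(self, order_id, amount):
        self.orders[order_id] = {"amount": amount, "status": "created"}

    def mark_completed(self, order_id):
        """Called from both the capture step and the webhook handler."""
        order = self.orders[order_id]
        if order["status"] == "completed":
            return False             # already credited, ignore the duplicate
        order["status"] = "completed"
        self.balance += order["amount"]
        return True


ledger = TopUpLedger()
ledger.create_order("ORDER-1", Decimal("50.00"))
from_capture = ledger.mark_completed("ORDER-1")   # capture succeeds first
from_webhook = ledger.mark_completed("ORDER-1")   # webhook arrives afterwards
```

The second call is a no-op, so the balance reflects a single 50.00 credit regardless of arrival order.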

Platform administrators configure PayPal integration through:

# Enable PayPal payments
curl -X PUT https://api.example.com/v1/admin/payments/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "provider": "paypal", "enabled": true }'

# Configure payment settings
curl -X PUT https://api.example.com/v1/payments/settings \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "min_topup": 10.00, "max_topup": 10000.00 }'

Model Pricing

Accurate cost tracking depends on up-to-date model pricing. Seed pricing data for your environment:

scripts/seed-model-pricing.sh "$API_URL" "$ADMIN_TOKEN"

Verify with:

curl https://api.example.com/v1/model-pricing \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Custom model pricing can be added for private or fine-tuned models.
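
To show why pricing accuracy matters, the following sketch prices a single request from its token counts. It assumes pricing is expressed in USD per one million input and output tokens, which is a common convention but not confirmed for this API; the model name and the `request_cost` helper are hypothetical.

```python
from decimal import Decimal

# Hypothetical pricing table: USD per one million tokens.
PRICING = {
    "example-private-model": {"input": Decimal("2.00"), "output": Decimal("8.00")},
}
TOKENS_PER_UNIT = Decimal("1000000")


def request_cost(model, input_tokens, output_tokens, pricing=PRICING):
    """Price a request from its token counts; unknown models raise KeyError."""
    rates = pricing[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / TOKENS_PER_UNIT


cost = request_cost("example-private-model", 1500, 500)
```

A model missing from the table raises `KeyError` in this sketch, which mirrors the point that cost tracking cannot attribute spend to a model without pricing data.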

Monitoring Cost Metrics

Export cost data for external analysis:

kt events export --format csv --filter 'cost > 0' --output costs.csv

Key metrics to track:

  • Cost per request — average and p99 by model and team
  • Budget utilization — percentage consumed vs. allocated
  • Reservation-to-settlement ratio — estimation accuracy
  • Denied requests — volume blocked due to insufficient balance
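
Several of these metrics can be computed directly from an exported file. The sketch below works on an inline sample; the column names `team`, `estimated_cost`, and `actual_cost` are assumptions about the CSV layout, so adjust them to match the real export.

```python
import csv
import io
import math
from decimal import Decimal

# Hypothetical sample in the shape costs.csv might take.
SAMPLE = """team,estimated_cost,actual_cost
team_engineering,0.10,0.08
team_engineering,0.20,0.22
team_research,0.30,0.30
team_research,0.40,0.20
"""


def summarize(csv_text):
    """Compute average cost, p99 cost, and the reservation-to-settlement ratio."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    actual = sorted(Decimal(r["actual_cost"]) for r in rows)
    estimated = sum(Decimal(r["estimated_cost"]) for r in rows)
    settled = sum(actual)
    p99_index = math.ceil(0.99 * len(actual)) - 1    # nearest-rank percentile
    return {
        "avg_cost": settled / len(actual),
        "p99_cost": actual[p99_index],
        "reservation_to_settlement": estimated / settled,
    }


metrics = summarize(SAMPLE)
```

A ratio above 1 means estimates ran higher than settled costs on aggregate, so more balance was held in reservation than was ultimately spent.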

Next steps

For AI systems

  • Canonical terms: wallet, reserve-settle, cost center, spend alerting, chargeback, resource quotas, PayPal top-up, model pricing
  • Key API endpoints: POST /v1/wallets/allocate, GET /v1/wallets/balance, PUT /v1/quotas/{team}, POST /v1/wallets/alerts, PUT /v1/admin/payments/config, GET /v1/model-pricing
  • CLI commands: kt events export --filter 'cost > 0', scripts/seed-model-pricing.sh
  • Related pages: Monitoring & Alerting, Multi-Tenant Gateway

For engineers

  • Prerequisites: a running API with KEEPTRUSTS_SECRET_ENCRYPTION_KEY set, an admin bearer token, and a test Postgres database
  • Allocate team budgets with POST /v1/wallets/allocate and verify with GET /v1/wallets/balance?team_id=...
  • Seed model pricing via scripts/seed-model-pricing.sh $API_URL $ADMIN_TOKEN before cost tracking works
  • Configure PayPal with PUT /v1/admin/payments/config then PUT /v1/payments/settings
  • Validate: send a request through the gateway and confirm the wallet balance decreases by the settled cost

For leaders

  • Wallet budgets enforce hard spending limits per team — requests are queued or rejected when exhausted
  • Chargeback models (direct allocation, proportional, tiered) map to different organizational cost governance styles
  • Spend alerting with escalating thresholds (Slack → email → PagerDuty) prevents surprise bills
  • PayPal self-service top-up removes the platform team as a bottleneck for budget replenishment
  • Model pricing accuracy directly affects cost attribution — seed and maintain pricing data quarterly