Capacity Management & Cost Allocation

AI workloads can generate unpredictable costs. Keeptrusts provides wallet-based budgeting, cost center controls, and spend alerting so platform teams can allocate capacity across teams without surprise bills.

Use this page when

You need to set up wallet-based team budgets or cost center controls for AI workloads
You are designing a chargeback or showback model for multi-team AI usage
You want to configure spend alerting, resource quotas, or PayPal self-service top-ups

Primary audience

Primary: Technical Engineers
Secondary: AI Agents, Technical Leaders

Wallet Architecture

Keeptrusts wallets operate on a reserve-settle model. When the gateway routes an LLM request, it reserves the estimated cost against the effective wallet scope before forwarding upstream. On provider response, the reservation settles to the actual cost.
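
The reserve-settle flow can be illustrated with a minimal in-memory sketch. The `Wallet` class, its field names, and the amounts are assumptions made for illustration; they are not the Keeptrusts data model or API.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Wallet:
    """Hypothetical in-memory wallet, for illustration only."""
    balance: Decimal
    reserved: dict = field(default_factory=dict)  # request_id -> held amount

    def available(self) -> Decimal:
        # Funds not currently held by in-flight reservations.
        return self.balance - sum(self.reserved.values(), Decimal("0"))

    def reserve(self, request_id: str, estimate: Decimal) -> bool:
        # Hold the estimated cost before the request goes upstream.
        if estimate > self.available():
            return False
        self.reserved[request_id] = estimate
        return True

    def settle(self, request_id: str, actual: Decimal) -> None:
        # Release the hold and charge the actual cost instead.
        self.reserved.pop(request_id)
        self.balance -= actual


wallet = Wallet(balance=Decimal("10.00"))
assert wallet.reserve("req-1", Decimal("0.40"))
print(wallet.available())  # 9.60
wallet.settle("req-1", Decimal("0.25"))
print(wallet.available())  # 9.75
```

Because the estimate (0.40) exceeded the actual cost (0.25), settlement returns the difference to the available balance.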

The wallet cascade evaluates scopes in order:

User wallet — individual contributor budget
Team wallet — shared team allocation
Organization wallet — top-level fallback

If no scope has sufficient balance, a cost ticket is queued and the request is held until balance is replenished or the ticket is denied.
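
As a rough sketch of the cascade, the function below walks the three scopes in order and returns the first one that can cover the estimate. The function name, the dictionary shape, and the balances are hypothetical; the real gateway's selection logic may differ.

```python
from decimal import Decimal
from typing import Optional

# Cascade order from the list above: user, then team, then organization.
CASCADE = ("user", "team", "organization")


def pick_scope(available: dict, estimate: Decimal) -> Optional[str]:
    """Return the first scope that can cover the estimate, or None."""
    for scope in CASCADE:
        if available.get(scope, Decimal("0")) >= estimate:
            return scope
    return None  # the caller would queue a cost ticket and hold the request


balances = {
    "user": Decimal("0.10"),
    "team": Decimal("25.00"),
    "organization": Decimal("900.00"),
}
print(pick_scope(balances, Decimal("0.50")))     # team
print(pick_scope(balances, Decimal("5000.00")))  # None
```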

Allocating Team Budgets

Use the wallet API to allocate credits to a team:

curl -X POST https://api.example.com/v1/wallets/allocate \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "team_engineering",
    "amount": 5000.00,
    "currency": "USD",
    "note": "Q2 2026 AI budget allocation"
  }'

Verify the allocation:

curl "https://api.example.com/v1/wallets/balance?team_id=team_engineering" \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Spend and wallet surfaces

The current Usage and wallet funding surfaces provide a real-time view of:

Active reservations — in-flight requests with estimated costs
Settled transactions — completed requests with actual costs
Wallet balances — remaining budget per scope
Burn rate — projected depletion date based on trailing usage

Platform administrators can drill into per-team and per-user breakdowns to identify cost drivers.
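
One plausible way to derive a projected depletion date is to divide the remaining balance by the mean of trailing daily spend. The sketch below assumes that method; the source does not specify how Keeptrusts computes burn rate, and the function name and sample figures are illustrative.

```python
from datetime import date, timedelta
from typing import Optional


def projected_depletion(balance: float, daily_spend: list, today: date) -> Optional[date]:
    """Project the depletion date from the mean of trailing daily spend."""
    if not daily_spend:
        return None
    burn_rate = sum(daily_spend) / len(daily_spend)
    if burn_rate <= 0:
        return None  # no spend means no projected depletion
    days_left = int(balance // burn_rate)
    return today + timedelta(days=days_left)


print(projected_depletion(1200.0, [38.0, 42.0, 40.0], date(2026, 4, 1)))
# 2026-05-01
```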

Chargeback Models

Keeptrusts supports three chargeback approaches:

Direct Allocation

Each team receives a fixed credit balance. Spend is deducted in real time. When the balance reaches zero, requests are queued or rejected based on policy.

Proportional Sharing

A shared organizational wallet covers all costs. Monthly reports break down usage by team for internal billing. Configure this with:

# policy-config.yaml
cost_policy:
  model: proportional
  reporting_interval: monthly
  shared_wallet: org_default
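
To make the proportional model concrete, here is a small sketch of the kind of monthly breakdown a report might contain: settled cost per team and each team's share of the shared wallet's spend. The transaction tuples and the function name are assumptions for illustration, not the format of the actual report.

```python
from collections import defaultdict
from decimal import Decimal


def monthly_breakdown(transactions):
    """Sum settled cost per team and report each team's share of the total."""
    totals = defaultdict(Decimal)
    for team, cost in transactions:
        totals[team] += cost
    grand_total = sum(totals.values(), Decimal("0"))
    report = {}
    for team, cost in totals.items():
        share = cost / grand_total * 100 if grand_total else Decimal("0")
        report[team] = (cost, share)
    return report


settled = [
    ("team_engineering", Decimal("60.00")),
    ("team_research", Decimal("25.00")),
    ("team_engineering", Decimal("15.00")),
]
for team, (cost, share) in monthly_breakdown(settled).items():
    print(f"{team}: {cost} USD ({share:.1f}%)")
# team_engineering: 75.00 USD (75.0%)
# team_research: 25.00 USD (25.0%)
```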

Tiered Quotas

Teams receive a base allocation with burst capacity at a higher internal rate:

cost_policy:
  model: tiered
  tiers:
    - limit: 1000
      rate: 1.0
    - limit: 2000
      rate: 1.5
    - limit: unlimited
      rate: 2.0
      requires_approval: true
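
The arithmetic behind tiered rates can be sketched as follows. This assumes each `limit` is a cumulative usage ceiling and each `rate` is a multiplier applied to the usage that falls inside that band; the source does not define these semantics, so treat the interpretation and the function name as hypothetical.

```python
# Mirrors the tiers in the config above; None stands in for "unlimited".
TIERS = [(1000.0, 1.0), (2000.0, 1.5), (None, 2.0)]


def tiered_charge(usage: float, tiers=TIERS) -> float:
    """Charge each band of usage at its own internal rate."""
    charge = 0.0
    floor = 0.0  # usage already charged by lower tiers
    for limit, rate in tiers:
        ceiling = usage if limit is None else min(usage, limit)
        if ceiling > floor:
            charge += (ceiling - floor) * rate
            floor = ceiling
    return charge


print(tiered_charge(800.0))   # 800.0
print(tiered_charge(2500.0))  # 3500.0
```

Under this reading, 2,500 units of usage cost 1,000 at the base rate, 1,500 for the next 1,000 units, and 1,000 for the final 500 units, which would also require approval.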

Resource Quotas

Beyond cost controls, configure request-level quotas to prevent a single team from consuming disproportionate gateway capacity:

Quota Type	Scope	Example
Requests per minute	Team	500 RPM
Tokens per hour	User	100,000 TPH
Concurrent requests	Team	50
Max tokens per request	Global	8,192

Set quotas through the API:

curl -X PUT https://api.example.com/v1/quotas/team_engineering \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "requests_per_minute": 500,
    "tokens_per_hour": 100000,
    "concurrent_requests": 50
  }'
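
For intuition about how a requests-per-minute quota behaves, the sketch below uses a sliding-window counter. This is a common rate-limiting technique chosen for illustration; the source does not say which algorithm the gateway uses, and the class name is hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.stamps = deque()  # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3)
print([limiter.allow(t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)])
# [True, True, True, False, True]
```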

Spend Alerting

Configure alerts to notify teams before budgets are exhausted:

curl -X POST https://api.example.com/v1/wallets/alerts \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "team_engineering",
    "thresholds": [
      { "percent_remaining": 50, "channel": "slack", "target": "#platform-alerts" },
      { "percent_remaining": 20, "channel": "email", "target": "platform-leads@example.com" },
      { "percent_remaining": 5, "channel": "pagerduty", "target": "ai-platform-oncall" }
    ]
  }'

Alerts fire once per threshold crossing per billing period.
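
The once-per-crossing behavior can be sketched with a tracker that remembers which thresholds have already fired in the current billing period. The `AlertTracker` class and its method names are assumptions for illustration, not part of the Keeptrusts API.

```python
THRESHOLDS = (50, 20, 5)  # percent_remaining values from the request above


class AlertTracker:
    """Report each threshold at most once per billing period."""

    def __init__(self, thresholds=THRESHOLDS):
        self.thresholds = sorted(thresholds, reverse=True)
        self.fired = set()

    def check(self, allocated: float, remaining: float) -> list:
        percent = remaining / allocated * 100
        newly = [t for t in self.thresholds if percent <= t and t not in self.fired]
        self.fired.update(newly)
        return newly

    def start_new_period(self) -> None:
        # A new billing period re-arms every threshold.
        self.fired.clear()


tracker = AlertTracker()
print(tracker.check(5000, 2400))  # [50]
print(tracker.check(5000, 2300))  # []
print(tracker.check(5000, 200))   # [20, 5]
```

A sharp drop can cross several thresholds at once, as in the last call; in that case each crossed threshold is reported once.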

PayPal Top-Up Integration

For self-service budget replenishment, Keeptrusts integrates with PayPal checkout:

The console or chat creates a checkout order through a server-side BFF route
The user approves the PayPal order
The BFF captures the order against /v1/payments/capture-order
The webhook at /v1/payments/webhook reconciles the final state
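
The four steps above can be modeled as a simple order lifecycle. The state names below are hypothetical, one per step, and are not PayPal or Keeptrusts status values; the sketch only shows why a handler should tolerate duplicate notifications and reject out-of-order ones.

```python
# Hypothetical states, one per step in the list above.
NEXT_STATE = {
    "created": "approved",     # step 2: the user approves the order
    "approved": "captured",    # step 3: the BFF captures the order
    "captured": "reconciled",  # step 4: the webhook confirms the final state
}


def advance(current: str, incoming: str) -> str:
    """Apply an event, tolerating duplicates and rejecting skipped steps."""
    if incoming == current:
        return current  # duplicate delivery, nothing to do
    if NEXT_STATE.get(current) != incoming:
        raise ValueError(f"cannot move from {current!r} to {incoming!r}")
    return incoming


state = "created"
for event in ("approved", "captured", "captured", "reconciled"):
    state = advance(state, event)
print(state)  # reconciled
```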

Platform administrators configure PayPal integration through:

# Enable PayPal payments
curl -X PUT https://api.example.com/v1/admin/payments/config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "provider": "paypal", "enabled": true }'

# Configure payment settings
curl -X PUT https://api.example.com/v1/payments/settings \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "min_topup": 10.00, "max_topup": 10000.00 }'

Model Pricing

Accurate cost tracking depends on up-to-date model pricing. Seed pricing data for your environment:

scripts/seed-model-pricing.sh "$API_URL" "$ADMIN_TOKEN"

Verify with:

curl https://api.example.com/v1/model-pricing \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Custom model pricing can be added for private or fine-tuned models.

Monitoring Cost Metrics

Export cost data for external analysis:

kt events export --format csv --filter 'cost > 0' --output costs.csv

Key metrics to track:

Cost per request — average and p99 by model and team
Budget utilization — percentage consumed vs. allocated
Reservation-to-settlement ratio — estimation accuracy
Denied requests — volume blocked due to insufficient balance
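
Several of these metrics can be derived from an exported CSV. The sketch below assumes columns named `team`, `reserved_cost`, and `settled_cost`; the real export's column names are not documented here, so adjust them to match the actual file. It also assumes at least one row with a nonzero settled cost.

```python
import csv
import io

# Hypothetical sample standing in for the contents of costs.csv.
SAMPLE_EXPORT = """team,reserved_cost,settled_cost
team_engineering,0.40,0.25
team_engineering,0.20,0.25
team_research,0.50,0.50
"""


def cost_metrics(csv_text: str, allocated: float) -> dict:
    """Compute average cost, estimation accuracy, and budget utilization."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    reserved = sum(float(r["reserved_cost"]) for r in rows)
    settled = sum(float(r["settled_cost"]) for r in rows)
    return {
        "avg_cost_per_request": settled / len(rows),
        "reservation_to_settlement": reserved / settled,
        "budget_utilization_pct": settled / allocated * 100,
    }


metrics = cost_metrics(SAMPLE_EXPORT, allocated=50.0)
print({name: round(value, 3) for name, value in metrics.items()})
# {'avg_cost_per_request': 0.333, 'reservation_to_settlement': 1.1, 'budget_utilization_pct': 2.0}
```

A reservation-to-settlement ratio above 1.0 means estimates run higher than actual costs, which holds more balance in reserve than necessary.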

Next steps

Set up Monitoring & Alerting for cost metric dashboards
Configure Secret Management for provider API keys
Review Multi-Tenant Gateway for team isolation patterns

For AI systems

Canonical terms: wallet, reserve-settle, cost center, spend alerting, chargeback, resource quotas, PayPal top-up, model pricing
Key API endpoints: POST /v1/wallets/allocate, GET /v1/wallets/balance, PUT /v1/quotas/{team}, POST /v1/wallets/alerts, PUT /v1/admin/payments/config, GET /v1/model-pricing
CLI commands: kt events export --filter 'cost > 0', scripts/seed-model-pricing.sh
Related pages: Monitoring & Alerting, Multi-Tenant Gateway

For engineers

Prerequisites: Running API with KEEPTRUSTS_SECRET_ENCRYPTION_KEY set, admin bearer token, test Postgres
Allocate team budgets with POST /v1/wallets/allocate and verify with GET /v1/wallets/balance?team_id=...
Seed model pricing via scripts/seed-model-pricing.sh $API_URL $ADMIN_TOKEN before cost tracking works
Configure PayPal with PUT /v1/admin/payments/config then PUT /v1/payments/settings
Validate: send a request through the gateway and confirm the wallet balance decreases by the settled cost

For leaders

Wallet budgets enforce hard spending limits per team — requests are queued or rejected when exhausted
Chargeback models (direct allocation, proportional, tiered) map to different organizational cost governance styles
Spend alerting with escalating thresholds (Slack → email → PagerDuty) prevents surprise bills
PayPal self-service top-up removes the platform team as a bottleneck for budget replenishment
Model pricing accuracy directly affects cost attribution — seed and maintain pricing data quarterly
