
Cost and Spend

Keeptrusts automatically tracks token usage and cost for every request that passes through the gateway. The Spend surface and associated settings give you end-to-end visibility into where money is going and controls to cap it before it becomes a problem.

Use this page when

  • You need to configure per-request cost tracking for your gateway targets.
  • You want to understand how the wallet reserve/settle mechanism controls AI spend.
  • You are setting up pricing blocks in policy-config.yaml for accurate cost attribution.
  • You need to use max_price routing to prevent expensive requests from reaching high-cost providers.
  • You want to understand how cost tickets (HTTP 402) work when wallet balance is insufficient.

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

How spend tracking works

When the gateway receives a response from an upstream provider, it reads the usage object in the response body (e.g., prompt_tokens, completion_tokens, cached_tokens). It then computes a cost breakdown from the first pricing source available, in this order:

  1. Provider-supplied cost fields — if the upstream response includes cost fields directly, those values are used as-is.
  2. Declarative config pricing — if your policy-config.yaml declares pricing for the provider target (providers.targets[].pricing), those rates are applied to the token counts.
  3. No pricing declared — if neither an upstream cost field nor a declarative pricing block is available, token counts are still tracked but all cost fields in the spend log are recorded as zero.
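The precedence above can be sketched as a small Python function. This is a hypothetical helper, not the gateway's implementation: `resolve_cost` and its parameters are invented for illustration, and it ignores cached tokens and multipliers to stay short. The documentation does not name a pricing_source value for the "no pricing" case, so the sketch returns None there.

```python
from typing import Optional


def resolve_cost(
    upstream_cost: Optional[float],
    config_pricing: Optional[dict],
    input_tokens: int,
    output_tokens: int,
) -> tuple[float, Optional[str]]:
    """Return (cost_usd, pricing_source) using the documented precedence (hypothetical helper)."""
    # 1. Provider-supplied cost fields are used as-is.
    if upstream_cost is not None:
        return upstream_cost, "upstream"
    # 2. Declarative rates (USD per 1M tokens) are applied to the token counts.
    if config_pricing is not None:
        cost = (
            input_tokens / 1_000_000 * config_pricing.get("input_price_per_million", 0.0)
            + output_tokens / 1_000_000 * config_pricing.get("output_price_per_million", 0.0)
        )
        return cost, "config_declared"
    # 3. No pricing at all: tokens are still tracked, but every cost field is zero.
    #    The real pricing_source for this case is not documented, so None is used here.
    return 0.0, None
```

Note that an upstream cost wins even when a pricing block is also declared.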

After computing the cost, the gateway sends a spend log to POST /v1/spend/log in a fire-and-forget manner, so logging adds no latency to the client request.

Pricing recommendation

Explicit pricing blocks on your targets are preferred because they reflect your exact contractual rates. Without them, token counts are still tracked, but cost fields are zero unless the upstream response carries its own cost fields.

Declaring pricing in the declarative config

The gateway computes per-request costs from the pricing block on each providers.targets[] entry in your policy-config.yaml. Without this block, token counts are still tracked, but all cost fields in spend logs are zero unless the upstream provider supplies cost fields.

Pricing fields

All rates are USD per 1 million tokens.

| Field | Description |
|---|---|
| input_price_per_million | Cost per 1M input (prompt) tokens |
| cached_input_price_per_million | Cost per 1M cached input tokens, typically lower than the full input rate |
| output_price_per_million | Cost per 1M output (completion) tokens |
| input_multiplier | Scale factor applied to the input token count before billing (default 1.0) |
| cached_input_multiplier | Scale factor for cached input tokens (default 1.0) |
| output_multiplier | Scale factor for output tokens (default 1.0) |

Legacy field names (prompt / completion) are also accepted and map to input_price_per_million and output_price_per_million respectively. Prefer the canonical names for new configs.

Minimal example

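pack:
  name: cost-and-spend-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-gpt4o
      provider: openai
      model: gpt-4o
      pricing:
        # Example rates only; replace with your contracted rates.
        input_price_per_million: 2.50
        cached_input_price_per_million: 1.25
        output_price_per_million: 10.00
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true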

Multipliers

Use multipliers when a provider bills at a non-standard token ratio. For example, a provider that bills audio at 4× the standard rate:

pricing:
  input_price_per_million: 0.006
  output_price_per_million: 0.024
  input_multiplier: 4.0 # 1 audio token billed as 4 text tokens

The gateway computes cost as:

cost = (tokens * multiplier / 1,000,000) * rate_per_million
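Applied to all three token buckets, the formula can be sketched in Python as below. The `Pricing` dataclass mirrors the documented pricing fields; `line_cost` and `request_cost` are hypothetical helper names. The sketch assumes the input, cached-input, and output counts are disjoint buckets (input excludes cached tokens); the page does not say how the gateway splits them.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # All rates are USD per 1 million tokens, mirroring the pricing block fields.
    input_price_per_million: float = 0.0
    cached_input_price_per_million: float = 0.0
    output_price_per_million: float = 0.0
    input_multiplier: float = 1.0
    cached_input_multiplier: float = 1.0
    output_multiplier: float = 1.0


def line_cost(tokens: int, multiplier: float, rate_per_million: float) -> float:
    # cost = (tokens * multiplier / 1,000,000) * rate_per_million
    return (tokens * multiplier / 1_000_000) * rate_per_million


def request_cost(p: Pricing, input_tokens: int, cached_input_tokens: int, output_tokens: int) -> float:
    # Assumption: the three counts do not overlap (input_tokens excludes cached tokens).
    return (
        line_cost(input_tokens, p.input_multiplier, p.input_price_per_million)
        + line_cost(cached_input_tokens, p.cached_input_multiplier, p.cached_input_price_per_million)
        + line_cost(output_tokens, p.output_multiplier, p.output_price_per_million)
    )
```

With the audio pricing above, 1,000,000 audio input tokens are billed as 4,000,000 text tokens at 0.006 per million, giving 0.024 USD.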

Per-model pricing

When a single provider target serves multiple models, declare per-model pricing in the models array. The gateway matches the requested model against model_id and aliases, then uses that model's pricing block (falling back to the target-level pricing if no match is found).

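pack:
  name: cost-and-spend-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai
      provider: openai
      # Example rates only; replace with your contracted rates.
      pricing: # target-level fallback when no models[] entry matches
        input_price_per_million: 2.50
        output_price_per_million: 10.00
      models:
        - model_id: gpt-4o
          pricing:
            input_price_per_million: 2.50
            output_price_per_million: 10.00
        - model_id: gpt-4o-mini
          aliases: [gpt-4o-mini-2024-07-18]
          pricing:
            input_price_per_million: 0.15
            output_price_per_million: 0.60
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true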
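The per-model lookup described above can be sketched as follows, treating one providers.targets[] entry as a plain dict. `resolve_model_pricing` is a hypothetical name, and the case of a matched model entry that has no pricing block of its own is an assumption (the sketch lets it inherit the target-level block).

```python
from typing import Optional


def resolve_model_pricing(target: dict, requested_model: str) -> Optional[dict]:
    """Return the pricing block that applies to requested_model on one target (sketch)."""
    for model in target.get("models", []):
        # A models[] entry matches on its model_id or on any of its aliases.
        if requested_model == model.get("model_id") or requested_model in model.get("aliases", []):
            # Assumption: a matched entry without its own pricing inherits the target's.
            return model.get("pricing", target.get("pricing"))
    # No entry matched: fall back to the target-level pricing block (None if absent).
    return target.get("pricing")
```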

Cost-based routing with max_price

Once pricing is declared on at least one target, you can add cost ceilings to the routing section. The gateway evaluates the estimated prompt cost using the token count of the incoming request and skips any provider whose projected cost would exceed the ceiling.

providers:
  routing:
    strategy: ordered
    allow_fallbacks: true
    max_price:
      prompt: 0.01
      completion: 0.04
      request: 0.05
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      # Ceilings only apply if this target also declares a pricing block (omitted for brevity).

| Field | What it caps |
|---|---|
| max_price.prompt | Prompt (input) token cost alone |
| max_price.completion | Completion (output) token cost alone |
| max_price.request | Total per-request cost |

Set max_price.request alongside a budget to get two-layer protection: per-request routing skips expensive providers before the request is made, and a budget hard-stops spend at the aggregate level.

Policy linter

If a max_price rule is present but no target in providers.targets has a pricing block, kt policy lint emits a warning. Cost ceilings are silently ignored for un-priced targets on the routing path.
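A rough sketch of ordered routing with cost ceilings is shown below; `estimate_costs` and `pick_target` are hypothetical names. Several assumptions are not confirmed by this page: the completion estimate uses the request's maximum completion tokens, allow_fallbacks is reduced to "try the next target", and an unpriced target is never skipped because ceilings are ignored for it.

```python
from typing import Optional


def estimate_costs(pricing: dict, prompt_tokens: int, max_completion_tokens: int) -> dict:
    # Projected USD cost per max_price key, from per-1M-token rates.
    prompt = prompt_tokens / 1_000_000 * pricing.get("input_price_per_million", 0.0)
    completion = max_completion_tokens / 1_000_000 * pricing.get("output_price_per_million", 0.0)
    return {"prompt": prompt, "completion": completion, "request": prompt + completion}


def pick_target(targets: list[dict], max_price: dict,
                prompt_tokens: int, max_completion_tokens: int) -> Optional[dict]:
    # "ordered" strategy: walk targets in declaration order, first eligible one wins.
    for target in targets:
        pricing = target.get("pricing")
        if pricing is None:
            # Ceilings are silently ignored for unpriced targets.
            return target
        projected = estimate_costs(pricing, prompt_tokens, max_completion_tokens)
        if all(projected[key] <= limit for key, limit in max_price.items()):
            return target
    return None  # every target would exceed at least one ceiling
```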


Spend page

Navigate to Spend in the console sidebar to see a real-time breakdown of your AI usage costs.

Summary cards

The top of the page shows three summary cards for the selected date range:

| Card | What it shows |
|---|---|
| Total cost | Sum of all total_cost values in the selected window |
| Total tokens | Sum of all total_tokens values in the selected window |
| Top provider | The provider with the highest total cost |
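The three cards are simple aggregates over spend log entries. A minimal sketch, assuming each entry is a dict carrying the provider, total_cost, and total_tokens fields named above; `summarize` is a hypothetical helper, not a console API.

```python
from collections import defaultdict


def summarize(logs: list[dict]) -> dict:
    # Total cost and total tokens across every log entry in the window.
    total_cost = sum(log["total_cost"] for log in logs)
    total_tokens = sum(log["total_tokens"] for log in logs)
    # Top provider: the provider whose summed total_cost is highest.
    cost_by_provider: dict = defaultdict(float)
    for log in logs:
        cost_by_provider[log["provider"]] += log["total_cost"]
    top_provider = max(cost_by_provider, key=cost_by_provider.get) if cost_by_provider else None
    return {"total_cost": total_cost, "total_tokens": total_tokens, "top_provider": top_provider}
```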

View modes

Use the toggle in the filter row to switch between:

  • Summary — cost broken down by provider, with request count and total tokens per row.
  • Logs — a paginated table of individual spend log entries, each showing provider, model, pricing source, token counts, cost, and timestamp.

Filters

| Filter | Description |
|---|---|
| Provider | Filter by provider name (e.g. openai, anthropic). |
| From / To | Date range using ISO 8601 date strings. |

The Logs view is paginated with Previous and Next controls. The default page size is 50 entries; the API accepts up to 200 per page.


Cost attribution

Spend logs capture multiple attribution dimensions in a single record:

| Field | Description |
|---|---|
| provider | Provider name (e.g. openai) |
| model | Actual model used |
| requested_model | Model name the client originally requested |
| requested_provider | Provider name the client originally requested |
| key_id | Gateway key used, if any |
| user_id | User identity, if passed via X-User-Id |
| team_id | Team identity, if passed via X-Team-Id |
| provider_target_id | Declarative config target ID |
| pricing_source | upstream, config_declared, or wallet_catalog (for wallet-settled requests) |
| metadata | Arbitrary key-value pairs from the request |

Filtering on GET /v1/spend/logs supports any combination of key_id, user_id, team_id, provider, and a date range.
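A query against GET /v1/spend/logs can be assembled as below. This is a sketch under assumptions: `spend_logs_url` is hypothetical, the base URL is a placeholder, and the from/to parameter names for the date range are guesses; only key_id, user_id, team_id, and provider are named on this page.

```python
from typing import Optional
from urllib.parse import urlencode


def spend_logs_url(base_url: str, *, key_id: Optional[str] = None, user_id: Optional[str] = None,
                   team_id: Optional[str] = None, provider: Optional[str] = None,
                   date_from: Optional[str] = None, date_to: Optional[str] = None) -> str:
    params = {
        "key_id": key_id,
        "user_id": user_id,
        "team_id": team_id,
        "provider": provider,
        # Hypothetical names for the date-range parameters (ISO 8601 strings).
        "from": date_from,
        "to": date_to,
    }
    # Send only the filters the caller actually set.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url.rstrip('/')}/v1/spend/logs"
    return f"{url}?{query}" if query else url
```

For example, `spend_logs_url("https://api.example.com", team_id="t-1", provider="openai")` yields a URL filtering one team's OpenAI spend.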


Wallet & Credits

Keeptrusts includes a credit wallet system that works alongside spend tracking to give your organization prepaid financial control over every LLM request.

How balance is checked before each request

When the gateway receives an LLM request, it makes a synchronous reserve call to the control-plane API (POST /v1/gateway/wallets/reserve) before forwarding the request to the upstream provider. Each reserve/settle cycle:

  1. Estimates the request cost from the model_pricing catalog: input tokens plus maximum completion tokens, each multiplied by its per-token rate, converted to the org's currency at the current exchange rate, and increased by a configurable buffer percentage.
  2. Walks the wallet cascade bottom-up: user wallet → team wallet → organization wallet.
  3. Holds the estimated amount in the first wallet with sufficient balance and returns a reservation_id.
  4. When the upstream response arrives, settles the reservation to the actual cost, releasing any surplus (for example, from cached tokens or a shorter completion) back to the wallet.

This synchronous reserve keeps the wallet balance authoritative at all times — there is no gateway-side credit cache. The ~5–15 ms added latency is negligible relative to typical provider response times (500 ms–5 s).
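The reserve/settle cycle can be modeled in memory to show how the cascade and the surplus release interact. Everything here is a hypothetical sketch: the `Wallet` class, the helper names, and the per-token rate arguments are illustrative and do not reflect the control-plane API's actual request or response shapes.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Wallet:
    scope: str                                # "user", "team", or "organization"
    balance: float                            # available balance, in org currency
    held: dict = field(default_factory=dict)  # reservation_id -> amount on hold


def estimate_hold(input_tokens: int, max_completion_tokens: int, input_rate: float,
                  output_rate: float, fx_rate: float, buffer_pct: float) -> float:
    # (input + max-completion tokens x per-token USD rates), to org currency, plus buffer.
    usd = input_tokens * input_rate + max_completion_tokens * output_rate
    return usd * fx_rate * (1 + buffer_pct / 100)


def reserve(cascade: list[Wallet], amount: float) -> Optional[tuple[Wallet, str]]:
    # Walk user -> team -> organization; the first wallet that can cover the hold wins.
    for wallet in cascade:
        if wallet.balance >= amount:
            reservation_id = str(uuid.uuid4())
            wallet.balance -= amount
            wallet.held[reservation_id] = amount
            return wallet, reservation_id
    return None  # no wallet can cover it; the gateway would answer HTTP 402


def settle(wallet: Wallet, reservation_id: str, actual_cost: float) -> None:
    # Replace the hold with the actual cost, returning any surplus to the balance.
    held = wallet.held.pop(reservation_id)
    wallet.balance += held - actual_cost
```

Holding the estimate up front and reconciling afterwards is what lets a buffer absorb estimation error without ever letting concurrent requests overdraw a wallet.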

Relationship to spend logs

The wallet system and spend logs are parallel records. Every settled wallet transaction is also recorded in spend_logs (with pricing_source: wallet_catalog). Wallet debits are the financial source of truth; spend logs are the audit record.

Cost ticket flow

When no wallet in the cascade can cover the estimated cost:

  1. The gateway returns HTTP 402 with a cost_ticket payload in the response body.
  2. The cost ticket captures the estimated cost, provider, model, a SHA-256 hash of the original request body, and a 24-hour expiry.
  3. The chat client or API caller can display a "top up needed" message to the user.
  4. After the user or an admin replenishes the balance, the caller resends the original request with the ticket ID in the X-Cost-Ticket header.
  5. The gateway calls POST /v1/gateway/wallets/redeem-ticket, which skips re-estimation and reserves using the ticket's frozen cost. The request is then forwarded to the upstream provider as normal.

Tickets expire after 24 hours. If a ticket is expired or the request body hash does not match, the gateway issues a fresh estimate.
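The redemption check can be sketched as below. The `CostTicket` fields follow the list above, but the class, the helper names, and the choice of a hex-encoded SHA-256 digest are assumptions for illustration; the real ticket payload format is not shown on this page.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

TICKET_TTL = timedelta(hours=24)


@dataclass
class CostTicket:
    ticket_id: str
    estimated_cost: float
    provider: str
    model: str
    body_sha256: str      # assumption: hex-encoded digest of the original request body
    expires_at: datetime


def issue_ticket(ticket_id: str, cost: float, provider: str, model: str,
                 body: bytes, now: datetime) -> CostTicket:
    return CostTicket(ticket_id, cost, provider, model,
                      hashlib.sha256(body).hexdigest(), now + TICKET_TTL)


def cost_for_retry(ticket: Optional[CostTicket], body: bytes, now: datetime) -> Optional[float]:
    """Frozen cost if the ticket is redeemable; None means 'estimate afresh'."""
    if ticket is None or now >= ticket.expires_at:
        return None  # missing or expired ticket
    if hashlib.sha256(body).hexdigest() != ticket.body_sha256:
        return None  # resent body differs from the one the ticket was issued for
    return ticket.estimated_cost
```

Binding the ticket to a hash of the body stops a caller from reusing a cheap ticket for a different, more expensive request.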

Fail mode when the wallet API is unreachable

If the wallet control-plane API is unreachable, the gateway behaves according to the org's wallet_fail_mode:

  • closed (default) — the request is rejected immediately. This is the safe default for organizations where every request must be financially authorized.
  • open — the request is forwarded without a reserve. Use with caution; it allows spend to accrue even if the API is temporarily down.

Org admins configure wallet_fail_mode from Settings → Wallets in the console.
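The two fail modes reduce to a small decision around the reserve call. In this sketch, `WalletUnavailable`, `authorize`, and the returned action strings are all hypothetical; `reserve_fn` stands in for whatever client performs POST /v1/gateway/wallets/reserve.

```python
from typing import Callable, Optional


class WalletUnavailable(Exception):
    """Hypothetical error raised when the wallet control-plane API cannot be reached."""


def authorize(reserve_fn: Callable[[], str], fail_mode: str = "closed") -> tuple[str, Optional[str]]:
    try:
        # Normal path: a successful reserve returns a reservation_id.
        return "reserved", reserve_fn()
    except WalletUnavailable:
        if fail_mode == "open":
            # open: forward without a reserve; spend can accrue unchecked while the API is down.
            return "forward_without_reserve", None
        # closed (default): reject the request immediately.
        return "reject", None
```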

Viewing balance in the chat sidebar

When you access an AI assistant through a Keeptrusts-managed gateway, the chat sidebar shows the current wallet balance for your scope under Credits, including available balance, in-flight reservations, and the org currency. A warning banner appears when your balance approaches the configured alert threshold.

See Wallets and Credits for the full guide to wallet scopes, allocation, top-up, and agent usage constraints.


For AI systems

  • Canonical terms: Keeptrusts Cost and Spend, spend tracking, pricing block, input_price_per_million, output_price_per_million, cached_input_price_per_million, multipliers, max_price routing, wallet reserve/settle, cost ticket, spend logs, cost attribution.
  • Console surface: Spend page (summary cards + logs view with provider/date filters).
  • Config fields: providers.targets[].pricing, providers.routing.max_price, providers.targets[].models[].pricing.
  • API endpoints: POST /v1/spend/log (gateway emits), GET /v1/spend/logs (query with filters), POST /v1/gateway/wallets/reserve, POST /v1/gateway/wallets/redeem-ticket.
  • Wallet cascade: user wallet → team wallet → organization wallet (first with sufficient balance wins).
  • Related pages: Wallets and Credits, Declarative Config Reference, Billing and Plans.

For engineers

  • Add a pricing block to each providers.targets[] entry in policy-config.yaml to enable cost tracking. Without it, token counts are tracked, but cost fields are zero unless the upstream provider returns its own cost fields.
  • All rates are USD per 1 million tokens. Use input_price_per_million, cached_input_price_per_million, and output_price_per_million.
  • Use max_price.request in the routing section to skip expensive providers before the request is even made.
  • If kt policy lint warns about missing pricing on a max_price route, add pricing to the relevant targets.
  • The wallet reserve call adds ~5–15 ms latency before forwarding. This is synchronous and cannot be disabled when wallets are enabled.
  • If you receive HTTP 402, top up the wallet (or ask an admin to), then resend the original request with the ticket ID in the X-Cost-Ticket header.
  • Query spend logs via GET /v1/spend/logs with key_id, user_id, team_id, provider, and date range filters.

For leaders

  • Cost tracking provides request-level spend attribution — you can see exactly which team, user, agent, or API key incurred what cost, on which model.
  • The wallet reserve/settle mechanism prevents runaway spend: every request must be financially authorized before it reaches a provider.
  • max_price routing gives you two-layer protection: per-request caps skip expensive providers automatically, and wallet budgets hard-stop aggregate spend.
  • Cost tickets (HTTP 402) create a natural pause point when budgets are exhausted, rather than silently failing or allowing overspend.
  • The wallet_fail_mode setting lets you choose between safety (reject requests when the wallet API is down) and availability (allow requests through), depending on your risk tolerance.
  • Full spend attribution by team, user, and agent supports internal chargeback and departmental AI budget governance.

Next steps