Cost and Spend

Keeptrusts automatically tracks token usage and cost for every request that passes through the gateway. The Spend surface and associated settings give you end-to-end visibility into where money is going and controls to cap it before it becomes a problem.

Use this page when

You need to configure per-request cost tracking for your gateway targets.
You want to understand how the wallet reserve/settle mechanism controls AI spend.
You are setting up pricing blocks in policy-config.yaml for accurate cost attribution.
You need to use max_price routing to prevent expensive requests from reaching high-cost providers.
You want to understand how cost tickets (HTTP 402) work when wallet balance is insufficient.

Primary audience

Primary: Technical Engineers
Secondary: AI Agents, Technical Leaders

How spend tracking works

When the gateway receives a response from an upstream provider, it reads the usage object in the response body (e.g., prompt_tokens, completion_tokens, cached_tokens). It then computes a cost breakdown using whichever source of pricing is available:

Provider-supplied cost fields — if the upstream response includes cost fields directly, those values are used as-is.
Declarative config pricing — if your policy-config.yaml declares pricing for the provider target (providers.targets[].pricing), those rates are applied to the token counts.
No pricing declared — if neither an upstream cost field nor a declarative pricing block is available, token counts are still tracked but all cost fields in the spend log are recorded as zero.
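The sources above can be tried in the order listed, as in this Python sketch. It is a minimal illustration, not the gateway's code. Several things here are assumptions: the function name, the `upstream_cost` argument, and the `"none"` label for the unpriced case are hypothetical. The sketch also assumes `cached_tokens` is reported as a subset of `prompt_tokens` (the OpenAI convention).

```python
from typing import Optional, Tuple

PER_MILLION = 1_000_000


def resolve_cost(usage: dict, upstream_cost: Optional[float],
                 pricing: Optional[dict]) -> Tuple[float, str]:
    """Return (total_cost_usd, pricing_source) for one upstream response."""
    # 1. Provider-supplied cost fields are used as-is.
    if upstream_cost is not None:
        return upstream_cost, "upstream"

    # 2. A declarative pricing block applies per-million rates to token counts.
    if pricing is not None:
        prompt = usage.get("prompt_tokens", 0)
        cached = usage.get("cached_tokens", 0)
        completion = usage.get("completion_tokens", 0)
        uncached = prompt - cached  # assumption: cached is a subset of prompt
        input_rate = pricing["input_price_per_million"]
        # Assumption: without a cached rate, cached tokens bill at the input rate.
        cached_rate = pricing.get("cached_input_price_per_million", input_rate)
        output_rate = pricing["output_price_per_million"]
        cost = (uncached * input_rate + cached * cached_rate
                + completion * output_rate) / PER_MILLION
        return cost, "config_declared"

    # 3. No pricing: token counts are still logged, but cost is zero.
    return 0.0, "none"  # hypothetical label; the real value is not documented here
```

Here is an example using rates of $2.50 input, $1.25 cached, and $10.00 output per 1M tokens. A request with 1,000 prompt tokens (200 of them cached) and 500 completion tokens costs (800 × 2.50 + 200 × 1.25 + 500 × 10.00) / 1,000,000 = $0.00725.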

After computing the cost, the gateway emits a spend log to POST /v1/spend/log in a fire-and-forget manner, so logging adds no latency to the proxied request.

Pricing recommendation

Explicit pricing blocks on your targets are preferred because they reflect your exact contractual rates. Without them, token counts are tracked but cost fields will be zero.

Declaring pricing in the declarative config

The gateway computes per-request costs from the pricing block on each providers.targets[] entry in your policy-config.yaml. Without this block, token counts are tracked but all cost fields in spend logs are zero.

Pricing fields

All rates are USD per 1 million tokens.

Field	Description
`input_price_per_million`	Cost per 1M input (prompt) tokens
`cached_input_price_per_million`	Cost per 1M cached input tokens — typically lower than the full input rate
`output_price_per_million`	Cost per 1M output (completion) tokens
`input_multiplier`	Scale factor applied to input token count before billing (default `1.0`)
`cached_input_multiplier`	Scale factor for cached input tokens (default `1.0`)
`output_multiplier`	Scale factor for output tokens (default `1.0`)

Legacy field names (prompt / completion) are also accepted and map to input_price_per_million and output_price_per_million respectively. Prefer the canonical names for new configs.

Minimal example

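pack:
  name: cost-and-spend-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
  - id: openai-gpt4o
    provider: openai
    model: gpt-4o
    pricing:                  # example rates, USD per 1M tokens; use your contracted rates
      input_price_per_million: 2.50
      cached_input_price_per_million: 1.25
      output_price_per_million: 10.00
policies:
  chain:
  - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true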

Multipliers

Use multipliers when a provider bills at a non-standard token ratio. For example, a provider that bills audio at 4× the standard rate:

pricing:
  input_price_per_million: 0.006
  output_price_per_million: 0.024
  input_multiplier: 4.0     # 1 audio token billed as 4 text tokens

The gateway computes cost as:

cost = (tokens * multiplier / 1,000,000) * rate_per_million
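The following worked example applies the formula to each token category and sums the results. It reuses the audio pricing fragment above. The token counts are made-up inputs for illustration, and summing per-category costs into a total is an assumption consistent with the pricing fields table.

```python
def category_cost(tokens: int, rate_per_million: float,
                  multiplier: float = 1.0) -> float:
    """cost = (tokens * multiplier / 1,000,000) * rate_per_million"""
    return (tokens * multiplier / 1_000_000) * rate_per_million


# The audio pricing fragment from above, as a dict.
pricing = {
    "input_price_per_million": 0.006,
    "output_price_per_million": 0.024,
    "input_multiplier": 4.0,
}

# Example request: 250,000 audio input tokens and 10,000 output tokens.
input_cost = category_cost(250_000, pricing["input_price_per_million"],
                           pricing.get("input_multiplier", 1.0))
output_cost = category_cost(10_000, pricing["output_price_per_million"],
                            pricing.get("output_multiplier", 1.0))
total_cost = input_cost + output_cost
print(f"input={input_cost:.6f} output={output_cost:.6f} total={total_cost:.6f}")
```

The multiplier turns 250,000 audio tokens into 1,000,000 billed tokens, so input costs $0.006 instead of $0.0015. Output has no multiplier and costs $0.00024, for a total of $0.00624.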

Per-model pricing

When a single provider target serves multiple models, declare per-model pricing in the models array. The gateway matches the requested model against model_id and aliases, then uses that model's pricing block (falling back to the target-level pricing if no match is found).

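pack:
  name: cost-and-spend-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
  - id: openai
    provider: openai
    pricing:                  # fallback when no models[] entry matches
      input_price_per_million: 2.50
      output_price_per_million: 10.00
    models:
    - model_id: gpt-4o
      aliases: [gpt-4o-2024-08-06]
      pricing:                # example rates, USD per 1M tokens
        input_price_per_million: 2.50
        cached_input_price_per_million: 1.25
        output_price_per_million: 10.00
    - model_id: gpt-4o-mini
      pricing:
        input_price_per_million: 0.15
        output_price_per_million: 0.60
policies:
  chain:
  - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true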
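The model lookup described above can be sketched as follows. The sketch assumes exact, case-sensitive matching. It also assumes that a matching entry with no pricing of its own falls back to the target level. The gateway's actual matching rules may differ. The `target` dict is shaped like a `providers.targets[]` entry with a `models` array.

```python
from typing import Optional


def select_pricing(target: dict, requested_model: str) -> Optional[dict]:
    """Return the pricing block to use for requested_model on one target."""
    for entry in target.get("models", []):
        names = [entry.get("model_id"), *entry.get("aliases", [])]
        if requested_model in names and "pricing" in entry:
            return entry["pricing"]
    # No per-model match: use target-level pricing (None if not declared).
    return target.get("pricing")


target = {
    "id": "openai",
    "pricing": {"input_price_per_million": 2.50, "output_price_per_million": 10.00},
    "models": [
        {"model_id": "gpt-4o", "aliases": ["gpt-4o-2024-08-06"],
         "pricing": {"input_price_per_million": 2.50, "output_price_per_million": 10.00}},
        {"model_id": "gpt-4o-mini",
         "pricing": {"input_price_per_million": 0.15, "output_price_per_million": 0.60}},
    ],
}
```

With this target, a request for `gpt-4o-2024-08-06` resolves through the alias to the `gpt-4o` entry. A request for an unlisted model falls back to the target-level block.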

Cost-based routing with `max_price`

Once pricing is declared on at least one target, you can add cost ceilings to the routing section. The gateway evaluates the estimated prompt cost using the token count of the incoming request and skips any provider whose projected cost would exceed the ceiling.

providers:
  routing:
    strategy: ordered
    allow_fallbacks: true
    max_price:
      prompt: 0.01
      completion: 0.04
      request: 0.05
  targets:
  - id: openai-primary
    provider: openai
    model: gpt-4o-mini
    secret_key_ref:
      env: OPENAI_API_KEY
    pricing:                  # example rates, USD per 1M tokens
      input_price_per_million: 0.15
      output_price_per_million: 0.60

Field	What it caps
`max_price.prompt`	Prompt (input) token cost alone
`max_price.completion`	Completion (output) token cost alone
`max_price.request`	Total per-request cost
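The sketch below shows how ceilings like these could filter an ordered target list. It makes two assumptions that go beyond this page. First, completion cost is projected from a caller-supplied maximum output token count. Second, each ceiling is compared against the projected cost in USD. The page itself only states that prompt cost is estimated from the incoming request's token count. Unpriced targets are never skipped, since ceilings are ignored for them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Target:
    id: str
    input_price_per_million: Optional[float] = None
    output_price_per_million: Optional[float] = None


def eligible_targets(targets: List[Target], prompt_tokens: int,
                     max_output_tokens: int,
                     max_price: Dict[str, float]) -> List[Target]:
    """Keep targets (in order) whose projected costs stay within every ceiling."""
    eligible = []
    for t in targets:
        if t.input_price_per_million is None or t.output_price_per_million is None:
            # Ceilings cannot be evaluated for unpriced targets, so they are not skipped.
            eligible.append(t)
            continue
        prompt_cost = prompt_tokens * t.input_price_per_million / 1_000_000
        # Assumption: completion cost is projected from the max output tokens.
        completion_cost = max_output_tokens * t.output_price_per_million / 1_000_000
        projected = {
            "prompt": prompt_cost,
            "completion": completion_cost,
            "request": prompt_cost + completion_cost,
        }
        if all(cost <= max_price[key]
               for key, cost in projected.items() if key in max_price):
            eligible.append(t)
    return eligible
```

Consider a 2,000-token prompt and a target priced at $15 per 1M input tokens. The projected prompt cost is $0.03, which exceeds a `prompt` ceiling of 0.01, so that target is skipped. A `gpt-4o-mini`-priced target at $0.0003 passes.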

Set max_price.request alongside a budget to get two-layer protection: per-request routing skips expensive providers before the request is made, and a budget hard-stops spend at the aggregate level.

Policy linter

If a max_price rule is present but no target in providers.targets has a pricing block, kt policy lint emits a warning. Cost ceilings are silently ignored for unpriced targets on the routing path, so those targets are never skipped on cost grounds.

Spend page

Navigate to Spend in the console sidebar to see a real-time breakdown of your AI usage costs.

Summary cards

The top of the page shows three summary cards for the selected date range:

Card	What it shows
Total cost	Sum of all `total_cost` values in the selected window
Total tokens	Sum of all `total_tokens` values in the selected window
Top provider	The provider with the highest total cost
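The three cards are simple aggregations over spend log entries, as in this sketch. It assumes the entries are already filtered to the selected date range and carry the `provider`, `total_cost`, and `total_tokens` fields named on this page. Tie-breaking for the top provider is an arbitrary choice here.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def summary_cards(entries: List[dict]) -> Dict[str, object]:
    """Compute Total cost, Total tokens, and Top provider for a date window."""
    total_cost = sum(e["total_cost"] for e in entries)
    total_tokens = sum(e["total_tokens"] for e in entries)

    cost_by_provider: Dict[str, float] = defaultdict(float)
    for e in entries:
        cost_by_provider[e["provider"]] += e["total_cost"]

    # Ties resolve to the provider seen first; no entries means no top provider.
    top_provider: Optional[str] = (
        max(cost_by_provider, key=cost_by_provider.__getitem__)
        if cost_by_provider else None
    )
    return {"total_cost": total_cost, "total_tokens": total_tokens,
            "top_provider": top_provider}
```

The top provider is chosen by summed cost, not by request count. A provider with fewer but more expensive requests can therefore win.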

View modes

Use the toggle in the filter row to switch between:

Summary — cost broken down by provider, with request count and total tokens per row.
Logs — a paginated table of individual spend log entries, each showing provider, model, pricing source, token counts, cost, and timestamp.

Filters

Filter	Description
Provider	Filter by provider name (e.g. `openai`, `anthropic`).
From / To	Date range using ISO 8601 date strings.

The Logs view paginates with Previous and Next controls. The default page size is 50 entries; through the API you can request up to 200 entries per page.

Cost attribution

Spend logs capture multiple attribution dimensions in a single record:

Field	Description
`provider`	Provider name (e.g. `openai`)
`model`	Actual model used
`requested_model`	Model name the client originally requested
`requested_provider`	Provider name the client originally requested
`key_id`	Gateway key used, if any
`user_id`	User identity, if passed via `X-User-Id`
`team_id`	Team identity, if passed via `X-Team-Id`
`provider_target_id`	Declarative config target ID
`pricing_source`	`upstream`, `config_declared`, or `wallet_catalog` (wallet-settled requests)
`metadata`	Arbitrary key-value pairs from the request

Filtering on GET /v1/spend/logs supports any combination of key_id, user_id, team_id, provider, and a date range.
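This sketch composes these filters into a query URL using Python's standard library. The base URL is a placeholder. The `from` / `to` parameter names are hypothetical, borrowed from the console filter labels; check the API reference for the exact date-range parameter names. Authentication is omitted.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# key_id, user_id, team_id and provider are named on this page;
# "from" and "to" are hypothetical names for the date-range bounds.
SUPPORTED_FILTERS = ("key_id", "user_id", "team_id", "provider", "from", "to")


def spend_logs_url(base_url: str, filters: Dict[str, Optional[str]]) -> str:
    """Build a GET /v1/spend/logs URL from any combination of filters."""
    unknown = set(filters) - set(SUPPORTED_FILTERS)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    # Drop unset filters so they do not appear as empty query parameters.
    params = {k: v for k, v in filters.items() if v is not None}
    url = base_url.rstrip("/") + "/v1/spend/logs"
    return f"{url}?{urlencode(params)}" if params else url
```

Passing filters as a dict rather than keyword arguments sidesteps a Python limitation: `from` is a reserved word and cannot be used as a keyword argument name.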

Wallet & Credits

Keeptrusts includes a credit wallet system that works alongside spend tracking to give your organization prepaid financial control over every LLM request.

How balance is checked before each request

When the gateway receives an LLM request, it makes a synchronous reserve call to the control-plane API (POST /v1/gateway/wallets/reserve) before forwarding the request to the upstream provider. The reserve/settle flow works as follows:

Estimates the request cost from the model_pricing catalog: input tokens × input rate plus maximum completion tokens × output rate, converted to the org's currency at the current exchange rate, plus a configurable buffer percentage.
Walks the wallet cascade bottom-up: user wallet → team wallet → organization wallet.
Holds the estimated amount in the first wallet with sufficient balance and returns a reservation_id.
On upstream response, settles the reservation to the actual cost (releasing any surplus from cached tokens or shorter completions back to the wallet).
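The reserve/settle steps can be modeled in a few lines. This is an in-memory sketch of the semantics described above, not the control-plane implementation. The class and function names are hypothetical, and rates are per token. The sketch does not handle an actual cost that exceeds the held amount.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Wallet:
    scope: str          # "user", "team", or "org"
    balance: float      # funds in the org currency
    held: float = 0.0   # in-flight reservations


def estimate_cost(input_tokens: int, max_completion_tokens: int,
                  input_rate: float, output_rate: float,
                  fx_rate: float, buffer_pct: float) -> float:
    """Per-token rates, converted to org currency, plus a safety buffer."""
    base = input_tokens * input_rate + max_completion_tokens * output_rate
    return base * fx_rate * (1 + buffer_pct / 100)


class WalletCascade:
    def __init__(self, user: Optional[Wallet], team: Optional[Wallet], org: Wallet):
        # Bottom-up order: user -> team -> org; missing scopes are skipped.
        self.chain = [w for w in (user, team, org) if w is not None]
        self.reservations: Dict[str, Tuple[Wallet, float]] = {}

    def reserve(self, amount: float) -> Optional[str]:
        """Hold `amount` in the first wallet that can cover it."""
        for wallet in self.chain:
            if wallet.balance - wallet.held >= amount:
                wallet.held += amount
                reservation_id = str(uuid.uuid4())
                self.reservations[reservation_id] = (wallet, amount)
                return reservation_id
        return None  # every wallet is insufficient: the gateway returns HTTP 402

    def settle(self, reservation_id: str, actual_cost: float) -> None:
        """Debit the actual cost and release the rest of the hold."""
        wallet, held_amount = self.reservations.pop(reservation_id)
        wallet.held -= held_amount
        wallet.balance -= actual_cost
```

Reserving against the maximum completion length, rather than the expected length, is what makes the hold safe. The settle step then returns the unused part, such as the surplus from a shorter completion.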

This synchronous reserve keeps the wallet balance authoritative at all times — there is no gateway-side credit cache. The ~5–15 ms added latency is negligible relative to typical provider response times (500 ms–5 s).

Relationship to spend logs

The wallet system and spend logs are parallel records. Every settled wallet transaction is also recorded in spend_logs (with pricing_source: wallet_catalog). Wallet debits are the financial source of truth; spend logs are the audit record.

Cost ticket flow

When all wallets in the cascade are insufficient to cover the estimated cost, the gateway:

Returns HTTP 402 with a cost_ticket payload in the response body.
The cost ticket captures the estimated cost, provider, model, a SHA-256 hash of the original request body, and a 24-hour expiry.
The chat client or API caller can display a "top up needed" message to the user.
After the user or admin replenishes balance, the caller resends the original request with the ticket ID in the X-Cost-Ticket header.
The gateway calls POST /v1/gateway/wallets/redeem-ticket, which skips re-estimation and reserves using the ticket's frozen cost. The request is then forwarded to the upstream provider as normal.

Tickets expire after 24 hours. If a ticket is expired or the request body hash does not match, the gateway issues a fresh estimate.
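The redemption check can be sketched with the standard library. The ticket field names (`id`, `body_sha256`, `expires_at`, and so on) are hypothetical. Hashing the raw body bytes into a hex digest is also an assumption about how the gateway computes the hash.

```python
import hashlib
from datetime import datetime, timedelta
from typing import Optional

TICKET_TTL = timedelta(hours=24)


def body_sha256(body: bytes) -> str:
    return hashlib.sha256(body).hexdigest()


def issue_ticket(ticket_id: str, estimated_cost: float, provider: str,
                 model: str, body: bytes, now: datetime) -> dict:
    """Freeze the estimate for this exact request body for 24 hours."""
    return {
        "id": ticket_id,
        "estimated_cost": estimated_cost,
        "provider": provider,
        "model": model,
        "body_sha256": body_sha256(body),
        "expires_at": now + TICKET_TTL,
    }


def redeemable_cost(ticket: dict, body: bytes, now: datetime) -> Optional[float]:
    """Return the frozen cost, or None when a fresh estimate is required."""
    if now >= ticket["expires_at"]:
        return None  # expired
    if body_sha256(body) != ticket["body_sha256"]:
        return None  # request body changed since the ticket was issued
    return ticket["estimated_cost"]
```

On the client side, keep the exact bytes of the original request and resend them unchanged. Under this assumption, re-serializing the JSON with different whitespace or key order would change the hash. The request would then fall back to a fresh estimate.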

Fail mode when the wallet API is unreachable

If the wallet control-plane API is unreachable, the gateway behaves according to the org's wallet_fail_mode:

closed (default) — the request is rejected immediately. This is the safe default for organizations where every request must be financially authorized.
open — the request is forwarded without a reserve. Use with caution; it allows spend to accrue even if the API is temporarily down.

Org admins configure wallet_fail_mode from Settings → Wallets in the console.

Viewing balance in the chat sidebar

When you access an AI assistant through a Keeptrusts-managed gateway, the current wallet balance for your scope is visible in the chat sidebar under Credits. It shows available balance, in-flight reservations, and the org currency. A warning banner appears when your balance drops near the configured alert threshold.

See Wallets and Credits for the full guide to wallet scopes, allocation, top-up, and agent usage constraints.

For AI systems

Canonical terms: Keeptrusts Cost and Spend, spend tracking, pricing block, input_price_per_million, output_price_per_million, cached_input_price_per_million, multipliers, max_price routing, wallet reserve/settle, cost ticket, spend logs, cost attribution.
Console surface: Spend page (summary cards + logs view with provider/date filters).
Config fields: providers.targets[].pricing, providers.routing.max_price, providers.targets[].models[].pricing.
API endpoints: POST /v1/spend/log (gateway emits), GET /v1/spend/logs (query with filters), POST /v1/gateway/wallets/reserve, POST /v1/gateway/wallets/redeem-ticket.
Wallet cascade: user wallet → team wallet → organization wallet (first with sufficient balance wins).
Related pages: Wallets and Credits, Declarative Config Reference, Billing and Plans.

For engineers

Add a pricing block to each providers.targets[] entry in policy-config.yaml to enable cost tracking. Without it, token counts are tracked but cost fields are zero.
All rates are USD per 1 million tokens. Use input_price_per_million, cached_input_price_per_million, and output_price_per_million.
Use max_price.request in the routing section to skip expensive providers before the request is even made.
If kt policy lint warns about missing pricing on a max_price route, add pricing to the relevant targets.
The wallet reserve call adds ~5–15 ms latency before forwarding. This is synchronous and cannot be disabled when wallets are enabled.
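If you receive HTTP 402, the wallet must first be topped up by the user or an admin. Then resend the original request, with an identical body, and put the ticket ID in the X-Cost-Ticket header. A changed body or an expired ticket triggers a fresh estimate instead.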
Query spend logs via GET /v1/spend/logs with key_id, user_id, team_id, provider, and date range filters.

For leaders

Cost tracking provides request-level spend attribution — you can see exactly which team, user, agent, or API key incurred what cost, on which model.
The wallet reserve/settle mechanism prevents runaway spend: every request must be financially authorized before it reaches a provider.
max_price routing gives you two-layer protection: per-request caps skip expensive providers automatically, and wallet budgets hard-stop aggregate spend.
Cost tickets (HTTP 402) create a natural pause point when budgets are exhausted, rather than silently failing or allowing overspend.
The wallet_fail_mode setting lets you choose between safety (reject requests when the wallet API is down) and availability (allow requests through), depending on your risk tolerance.
Full spend attribution by team, user, and agent supports internal chargeback and departmental AI budget governance.

Next steps

Wallets and Credits — wallet scopes, cascade, allocation, top-up, and agent usage constraints
Declarative Config Reference — pricing declaration and max-price routing
Advanced: Rate Limiting — token-per-minute and request-per-minute limits per key
