Cost and Spend
Keeptrusts automatically tracks token usage and cost for every request that passes through the gateway. The Spend surface and associated settings give you end-to-end visibility into where money is going and controls to cap it before it becomes a problem.
Use this page when
- You need to configure per-request cost tracking for your gateway targets.
- You want to understand how the wallet reserve/settle mechanism controls AI spend.
- You are setting up pricing blocks in policy-config.yaml for accurate cost attribution.
- You need to use max_price routing to prevent expensive requests from reaching high-cost providers.
- You want to understand how cost tickets (HTTP 402) work when wallet balance is insufficient.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
How spend tracking works
When the gateway receives a response from an upstream provider, it reads the usage object in the response body (e.g., prompt_tokens, completion_tokens, cached_tokens). It then computes a cost breakdown using whichever source of pricing is available:
- Provider-supplied cost fields: if the upstream response includes cost fields directly, those values are used as-is.
- Declarative config pricing: if your policy-config.yaml declares pricing for the provider target (providers.targets[].pricing), those rates are applied to the token counts.
- No pricing declared: if neither an upstream cost field nor a declarative pricing block is available, token counts are still tracked but all cost fields in the spend log are recorded as zero.
After computing the cost, the gateway sends a spend log to POST /v1/spend/log as a fire-and-forget call, so request latency is not affected.
Explicit pricing blocks on your targets are preferred because they reflect your exact contractual rates. Without them, token counts are tracked but cost fields will be zero.
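The precedence described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the gateway's implementation: the function name `resolve_cost`, the shape of its arguments, and the `None` pricing source for un-priced requests are hypothetical. The `upstream` and `config_declared` labels are the `pricing_source` values listed under Cost attribution.

```python
from typing import Optional


def resolve_cost(usage: dict, upstream_cost: Optional[float],
                 pricing: Optional[dict]) -> dict:
    """Pick a pricing source in the order described above (hypothetical helper)."""
    if upstream_cost is not None:
        # 1. A provider-supplied cost is used as-is.
        return {"total_cost": upstream_cost, "pricing_source": "upstream"}
    if pricing is not None:
        # 2. Declared rates (USD per 1M tokens) are applied to the token counts.
        cost = (
            usage.get("prompt_tokens", 0) / 1_000_000
            * pricing.get("input_price_per_million", 0.0)
            + usage.get("completion_tokens", 0) / 1_000_000
            * pricing.get("output_price_per_million", 0.0)
        )
        return {"total_cost": cost, "pricing_source": "config_declared"}
    # 3. No pricing: tokens are still tracked, but cost is recorded as zero.
    # Assumption: the source label for this case is not documented here.
    return {"total_cost": 0.0, "pricing_source": None}
```

The sketch ignores cached tokens and multipliers to keep the precedence logic visible.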
Declaring pricing in the declarative config
The gateway computes per-request costs from the pricing block on each providers.targets[] entry in your policy-config.yaml. Without this block, token counts are tracked but all cost fields in spend logs are zero.
Pricing fields
All rates are USD per 1 million tokens.
| Field | Description |
|---|---|
| input_price_per_million | Cost per 1M input (prompt) tokens |
| cached_input_price_per_million | Cost per 1M cached input tokens (typically lower than the full input rate) |
| output_price_per_million | Cost per 1M output (completion) tokens |
| input_multiplier | Scale factor applied to input token count before billing (default 1.0) |
| cached_input_multiplier | Scale factor for cached input tokens (default 1.0) |
| output_multiplier | Scale factor for output tokens (default 1.0) |
Legacy field names (prompt / completion) are also accepted and map to input_price_per_million and output_price_per_million respectively. Prefer the canonical names for new configs.
Minimal example
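pack:
  name: cost-and-spend-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-gpt4o
      provider: openai
      model: gpt-4o
      pricing:
        # Illustrative rates (USD per 1M tokens); substitute your contracted rates.
        input_price_per_million: 2.50
        cached_input_price_per_million: 1.25
        output_price_per_million: 10.00
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true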
Multipliers
Use multipliers when a provider bills at a non-standard token ratio. For example, a provider that bills audio at 4× the standard rate:
pricing:
  input_price_per_million: 0.006
  output_price_per_million: 0.024
  input_multiplier: 4.0  # 1 audio token billed as 4 text tokens
The gateway computes cost as:
cost = (tokens * multiplier / 1,000,000) * rate_per_million
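The formula above can be worked through in a short Python sketch. The field names mirror the pricing table; the `Pricing` class, the helper names, and the assumption that cached tokens are counted separately from regular input tokens are illustrative and not taken from the gateway source.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Rates are USD per 1 million tokens; names follow the pricing fields table.
    input_price_per_million: float = 0.0
    cached_input_price_per_million: float = 0.0
    output_price_per_million: float = 0.0
    input_multiplier: float = 1.0
    cached_input_multiplier: float = 1.0
    output_multiplier: float = 1.0


def line_cost(tokens: int, multiplier: float, rate_per_million: float) -> float:
    # cost = (tokens * multiplier / 1,000,000) * rate_per_million
    return (tokens * multiplier / 1_000_000) * rate_per_million


def cost_breakdown(p: Pricing, input_tokens: int, cached_input_tokens: int,
                   output_tokens: int) -> dict:
    # Assumption: cached tokens are passed separately and are not also
    # included in input_tokens.
    input_cost = line_cost(input_tokens, p.input_multiplier,
                           p.input_price_per_million)
    cached_cost = line_cost(cached_input_tokens, p.cached_input_multiplier,
                            p.cached_input_price_per_million)
    output_cost = line_cost(output_tokens, p.output_multiplier,
                            p.output_price_per_million)
    return {
        "input_cost": input_cost,
        "cached_input_cost": cached_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + cached_cost + output_cost,
    }
```

With the audio example above (`input_multiplier: 4.0`, input rate 0.006), one million input tokens are billed as four million, so the input cost comes to roughly 0.024 USD.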
Per-model pricing
When a single provider target serves multiple models, declare per-model pricing in the models array. The gateway matches the requested model against model_id and aliases, then uses that model's pricing block (falling back to the target-level pricing if no match is found).
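pack:
  name: cost-and-spend-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai
      provider: openai
      # Rates below are illustrative (USD per 1M tokens); substitute your own.
      pricing:  # target-level fallback
        input_price_per_million: 2.50
        output_price_per_million: 10.00
      models:
        - model_id: gpt-4o-mini
          aliases:
            - gpt-4o-mini-2024-07-18
          pricing:
            input_price_per_million: 0.15
            output_price_per_million: 0.60
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true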
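The matching rule for per-model pricing can be sketched as follows. This is an illustration only: `resolve_pricing` is a hypothetical helper, the target is modeled as a plain dict shaped like the config, and the behavior for a matched model that has no pricing block of its own is an assumption.

```python
from typing import Optional


def resolve_pricing(target: dict, requested_model: str) -> Optional[dict]:
    """Return the pricing block for requested_model, else target-level pricing."""
    for model in target.get("models", []):
        names = [model.get("model_id"), *model.get("aliases", [])]
        if requested_model in names:
            # Assumption: a matched model without its own pricing block
            # also falls back to the target-level block.
            return model.get("pricing") or target.get("pricing")
    # No model matched: use the target-level pricing, if any.
    return target.get("pricing")
```

A request for an alias resolves to the same pricing block as its `model_id`, and an unknown model name picks up the target-level rates.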
Cost-based routing with max_price
Once pricing is declared on at least one target, you can add cost ceilings to the routing section. The gateway evaluates the estimated prompt cost using the token count of the incoming request and skips any provider whose projected cost would exceed the ceiling.
providers:
  routing:
    strategy: ordered
    allow_fallbacks: true
    max_price:
      prompt: 0.01
      completion: 0.04
      request: 0.05
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
| Field | What it caps |
|---|---|
| max_price.prompt | Prompt (input) token cost alone |
| max_price.completion | Completion (output) token cost alone |
| max_price.request | Total per-request cost |
Use max_price.request alongside a budget to get two-layer protection: per-request routing skips expensive providers before the request is made, and a budget hard-stops spend at the aggregate level.
If a max_price rule is present but no target in providers.targets has a pricing block, kt policy lint emits a warning. On the routing path, cost ceilings are silently ignored for un-priced targets.
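To make the skip behavior concrete, here is a minimal Python sketch of an ordered strategy with a prompt ceiling. It is not the gateway's routing code: `pick_target` is a hypothetical name, targets are modeled as dicts, only `max_price.prompt` is modeled, and the ceiling is assumed to be in the same currency as the declared rates.

```python
from typing import Optional


def pick_target(targets: list, prompt_tokens: int,
                max_prompt_price: float) -> Optional[dict]:
    """Return the first target, in order, whose prompt cost fits the ceiling."""
    for target in targets:
        pricing = target.get("pricing")
        if pricing is None:
            # Un-priced targets cannot be evaluated, so the ceiling is ignored.
            return target
        estimated = prompt_tokens / 1_000_000 * pricing["input_price_per_million"]
        if estimated <= max_prompt_price:
            return target
        # Projected cost exceeds the ceiling: skip and try the next target.
    return None
```

Returning `None` here stands for "no eligible provider"; how the gateway reports that case is not covered on this page.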
Spend page
Navigate to Spend in the console sidebar to see a real-time breakdown of your AI usage costs.
Summary cards
The top of the page shows three summary cards for the selected date range:
| Card | What it shows |
|---|---|
| Total cost | Sum of all total_cost values in the selected window |
| Total tokens | Sum of all total_tokens values in the selected window |
| Top provider | The provider with the highest total cost |
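The three cards are simple aggregates over the spend logs in the selected window. A sketch, assuming each log entry is a dict carrying the `provider`, `total_cost`, and `total_tokens` fields named above (the `summarize` helper itself is hypothetical):

```python
from collections import defaultdict


def summarize(logs: list) -> dict:
    """Aggregate spend log entries into the three summary card values."""
    cost_by_provider = defaultdict(float)
    total_tokens = 0
    for entry in logs:
        cost_by_provider[entry["provider"]] += entry["total_cost"]
        total_tokens += entry["total_tokens"]
    # Top provider is the one with the highest summed cost, if any logs exist.
    top = max(cost_by_provider, key=cost_by_provider.get) if cost_by_provider else None
    return {
        "total_cost": sum(cost_by_provider.values()),
        "total_tokens": total_tokens,
        "top_provider": top,
    }
```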
View modes
Use the toggle in the filter row to switch between:
- Summary — cost broken down by provider, with request count and total tokens per row.
- Logs — a paginated table of individual spend log entries, each showing provider, model, pricing source, token counts, cost, and timestamp.
Filters
| Filter | Description |
|---|---|
| Provider | Filter by provider name (e.g. openai, anthropic). |
| From / To | Date range using ISO 8601 date strings. |
Pagination is available in the Logs view with Previous and Next controls. The default page size is 50 entries, up to 200 per page via the API.
Cost attribution
Spend logs capture multiple attribution dimensions in a single record:
| Field | Description |
|---|---|
| provider | Provider name (e.g. openai) |
| model | Actual model used |
| requested_model | Model name the client originally requested |
| requested_provider | Provider name the client originally requested |
| key_id | Gateway key used, if any |
| user_id | User identity, if passed via X-User-Id |
| team_id | Team identity, if passed via X-Team-Id |
| provider_target_id | Declarative config target ID |
| pricing_source | upstream, config_declared, or wallet_catalog (for settled wallet transactions) |
| metadata | Arbitrary key-value pairs from the request |
Filtering on GET /v1/spend/logs supports any combination of key_id, user_id, team_id, provider, and a date range.
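As an illustration, a query URL combining several of these filters could be built like this. The base URL is a placeholder, and the `from` / `to` parameter names for the date range are an assumption; this page names the filters but not their exact query-string spelling.

```python
from urllib.parse import urlencode


def spend_logs_url(base_url: str, key_id=None, user_id=None, team_id=None,
                   provider=None, date_from=None, date_to=None) -> str:
    """Build a GET /v1/spend/logs URL from any combination of filters."""
    params = {
        "key_id": key_id,
        "user_id": user_id,
        "team_id": team_id,
        "provider": provider,
        # Assumption: the date range is sent as `from` / `to` ISO 8601 dates.
        "from": date_from,
        "to": date_to,
    }
    # Drop unset filters so only the requested combination is sent.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}/v1/spend/logs" + (f"?{query}" if query else "")
```

For example, `spend_logs_url("https://api.example.com", team_id="ml", provider="openai")` yields a URL that filters by team and provider only.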
Wallet & Credits
Keeptrusts includes a credit wallet system that works alongside spend tracking to give your organization prepaid financial control over every LLM request.
How balance is checked before each request
When the gateway receives an LLM request, it makes a synchronous reserve call to the control-plane API (POST /v1/gateway/wallets/reserve) before forwarding the request to the upstream provider. The reserve:
- Estimates the request cost using the model_pricing catalog (input + max-completion tokens × per-token rates, converted to the org's currency at the current exchange rate, plus a configurable buffer percentage).
- Walks the wallet cascade bottom-up: user wallet → team wallet → organization wallet.
- Holds the estimated amount in the first wallet with sufficient balance and returns a reservation_id.
- On upstream response, settles the reservation to the actual cost (releasing any surplus from cached tokens or shorter completions back to the wallet).
This synchronous reserve keeps the wallet balance authoritative at all times — there is no gateway-side credit cache. The ~5–15 ms added latency is negligible relative to typical provider response times (500 ms–5 s).
The wallet system and spend logs are parallel records. Every settled wallet transaction is also recorded in spend_logs (with pricing_source: wallet_catalog). Wallet debits are the financial source of truth; spend logs are the audit record.
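The reserve/settle cycle can be modeled in a few lines of Python. This is an in-memory sketch, not the control-plane implementation: the `Wallet` and `Reservation` classes, the per-token rate arguments, and the handling of balances as floats are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Wallet:
    scope: str          # "user", "team", or "organization"
    balance: float      # settled balance in the org currency
    held: float = 0.0   # amount held by in-flight reservations

    @property
    def available(self) -> float:
        return self.balance - self.held


@dataclass
class Reservation:
    reservation_id: str
    wallet: Wallet
    amount: float


def estimate(input_tokens: int, max_completion_tokens: int, input_rate: float,
             output_rate: float, fx_rate: float, buffer_pct: float) -> float:
    # (input + max-completion tokens x per-token rates), converted to the
    # org currency, plus a buffer percentage.
    base = input_tokens * input_rate + max_completion_tokens * output_rate
    return base * fx_rate * (1 + buffer_pct / 100)


def reserve(cascade: list, amount: float) -> Optional[Reservation]:
    # cascade is ordered bottom-up: user, team, organization.
    for wallet in cascade:
        if wallet.available >= amount:
            wallet.held += amount
            return Reservation(str(uuid.uuid4()), wallet, amount)
    return None  # every wallet is short; the gateway would answer HTTP 402


def settle(res: Reservation, actual_cost: float) -> None:
    # Release the hold, then debit only what the request actually cost.
    res.wallet.held -= res.amount
    res.wallet.balance -= actual_cost
```

Because the hold is released in full and only the actual cost is debited, any surplus from the estimate returns to the wallet's available balance at settle time.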
Cost ticket flow
When all wallets in the cascade are insufficient to cover the estimated cost, the following happens:
- The gateway returns HTTP 402 with a cost_ticket payload in the response body.
- The cost ticket captures the estimated cost, provider, model, a SHA-256 hash of the original request body, and a 24-hour expiry.
- The chat client or API caller can display a "top up needed" message to the user.
- After the user or admin replenishes balance, the caller resends the original request with the ticket ID in the X-Cost-Ticket header.
- The gateway calls POST /v1/gateway/wallets/redeem-ticket, which skips re-estimation and reserves using the ticket's frozen cost. The request is then forwarded to the upstream provider as normal.
Tickets expire after 24 hours. If a ticket is expired or the request body hash does not match, the gateway issues a fresh estimate.
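The two validity checks (expiry and body hash) can be sketched in Python. The `CostTicket` fields follow the description above, but the class, the function names, and the use of a UUID as ticket ID are assumptions for illustration rather than the actual payload schema.

```python
import hashlib
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

TICKET_TTL = timedelta(hours=24)


@dataclass
class CostTicket:
    ticket_id: str
    estimated_cost: float
    provider: str
    model: str
    body_sha256: str
    expires_at: datetime


def issue_ticket(body: bytes, estimated_cost: float, provider: str,
                 model: str, now: datetime = None) -> CostTicket:
    now = now or datetime.now(timezone.utc)
    return CostTicket(
        ticket_id=str(uuid.uuid4()),
        estimated_cost=estimated_cost,
        provider=provider,
        model=model,
        body_sha256=hashlib.sha256(body).hexdigest(),
        expires_at=now + TICKET_TTL,
    )


def ticket_is_redeemable(ticket: CostTicket, body: bytes,
                         now: datetime = None) -> bool:
    now = now or datetime.now(timezone.utc)
    if now >= ticket.expires_at:
        return False  # expired: a fresh estimate is needed
    # The resent body must hash to the value frozen in the ticket.
    return hashlib.sha256(body).hexdigest() == ticket.body_sha256
```

Hashing the request body ties the frozen cost to one specific request, so a ticket cannot be reused for a different, possibly more expensive, prompt.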
Fail mode when the wallet API is unreachable
If the wallet control-plane API is unreachable, the gateway behaves according to the org's wallet_fail_mode:
- closed (default): the request is rejected immediately. This is the safe default for organizations where every request must be financially authorized.
- open: the request is forwarded without a reserve. Use with caution; it allows spend to accrue even if the API is temporarily down.
Org admins configure wallet_fail_mode from Settings → Wallets in the console.
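The two modes amount to a small branch around the reserve call. In this sketch, `reserve_or_fail` is a hypothetical helper, the reserve call is passed in as a callable, and an unreachable wallet API is modeled as a `ConnectionError`; the real gateway's error handling is not described on this page.

```python
from typing import Callable, Optional, Tuple


def reserve_or_fail(reserve_call: Callable[[], str],
                    wallet_fail_mode: str = "closed") -> Tuple[bool, Optional[str]]:
    """Return (forward_request, reservation_id) for one request."""
    try:
        return True, reserve_call()
    except ConnectionError:
        if wallet_fail_mode == "open":
            # Forward without a reserve; spend accrues unreserved.
            return True, None
        # closed (default): reject the request immediately.
        return False, None
```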
Viewing balance in the chat sidebar
When accessing an AI assistant through a Keeptrusts-managed gateway, the current wallet balance for your scope is visible in the chat sidebar under Credits. It shows available balance, in-flight reservations, and the org currency. A warning banner appears when your balance drops near the configured alert threshold.
See Wallets and Credits for the full guide to wallet scopes, allocation, top-up, and agent usage constraints.
For AI systems
- Canonical terms: Keeptrusts Cost and Spend, spend tracking, pricing block, input_price_per_million, output_price_per_million, cached_input_price_per_million, multipliers, max_price routing, wallet reserve/settle, cost ticket, spend logs, cost attribution.
- Console surface: Spend page (summary cards + logs view with provider/date filters).
- Config fields: providers.targets[].pricing, providers.routing.max_price, providers.targets[].models[].pricing.
- API endpoints: POST /v1/spend/log (gateway emits), GET /v1/spend/logs (query with filters), POST /v1/gateway/wallets/reserve, POST /v1/gateway/wallets/redeem-ticket.
- Wallet cascade: user wallet → team wallet → organization wallet (first with sufficient balance wins).
- Related pages: Wallets and Credits, Declarative Config Reference, Billing and Plans.
For engineers
- Add a pricing block to each providers.targets[] entry in policy-config.yaml to enable cost tracking. Without it, token counts are tracked but cost fields are zero.
- All rates are USD per 1 million tokens. Use input_price_per_million, cached_input_price_per_million, and output_price_per_million.
- Use max_price.request in the routing section to skip expensive providers before the request is even made.
- If kt policy lint warns about missing pricing on a max_price route, add pricing to the relevant targets.
- The wallet reserve call adds ~5–15 ms latency before forwarding. This is synchronous and cannot be disabled when wallets are enabled.
- If you receive HTTP 402, the caller must top up the wallet and resend with the ticket ID in the X-Cost-Ticket header.
- Query spend logs via GET /v1/spend/logs with key_id, user_id, team_id, provider, and date range filters.
For leaders
- Cost tracking provides request-level spend attribution — you can see exactly which team, user, agent, or API key incurred what cost, on which model.
- The wallet reserve/settle mechanism prevents runaway spend: every request must be financially authorized before it reaches a provider.
- max_price routing gives you two-layer protection: per-request caps skip expensive providers automatically, and wallet budgets hard-stop aggregate spend.
- Cost tickets (HTTP 402) create a natural pause point when budgets are exhausted, rather than silently failing or allowing overspend.
- The wallet_fail_mode setting lets you choose between safety (reject requests when the wallet API is down) and availability (allow requests through), depending on your risk tolerance.
- Full spend attribution by team, user, and agent supports internal chargeback and departmental AI budget governance.
Next steps
- Wallets and Credits — wallet scopes, cascade, allocation, top-up, and agent usage constraints
- Declarative Config Reference — pricing declaration and max-price routing
- Advanced: Rate Limiting — token-per-minute and request-per-minute limits per key