Estimate vs Actual Cost Reconciliation

After the LLM provider returns a response, Keeptrusts reconciles the pre-dispatch estimate with the actual cost. This reconciliation drives wallet settlement, accuracy reporting, and continuous improvement of future estimates.

Use this page when

You need to understand how Keeptrusts reconciles pre-dispatch estimates with actual provider-reported costs.
You are configuring reconciliation alerts, feedback loops, or wallet settlement behavior.
You want to use the Cost Center reconciliation view to identify estimation drift or adapter miscalibration.

Primary audience

Primary: Technical Engineers
Secondary: AI Agents, Technical Leaders

The Reconciliation Flow

When a response arrives from the provider, the gateway performs these steps:

Extract actual usage — The gateway reads token usage from the provider's response headers or response metadata (e.g., usage.prompt_tokens and usage.completion_tokens in OpenAI responses).
Calculate actual cost — Using the model-pricing catalog rates and the actual token counts, the gateway computes the real cost.
Compare with estimate — The gateway calculates the variance between estimated and actual costs.
Settle the wallet — The wallet balance is debited by the actual cost, not the estimate. Any reserved amount from the pre-dispatch phase is adjusted.
Log reconciliation data — The full estimate-vs-actual comparison is recorded for reporting and feedback.
Trigger alerts if needed — If variance exceeds the configured threshold, a reconciliation alert is logged.
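Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not Keeptrusts code. It assumes an OpenAI-style response body with a `usage` object, and it assumes catalog rates are expressed per million tokens. `extract_usage`, `actual_cost`, and `PER_MILLION` are hypothetical names.

```python
from decimal import Decimal
from typing import Any, Mapping, Tuple

# Assumption: catalog rates are quoted per one million tokens.
PER_MILLION = Decimal(1_000_000)


def extract_usage(response: Mapping[str, Any]) -> Tuple[int, int]:
    """Step 1: read provider-reported token counts from an OpenAI-style body."""
    usage = response.get("usage")
    if not usage:
        raise ValueError("provider response carries no usage metadata")
    return int(usage["prompt_tokens"]), int(usage["completion_tokens"])


def actual_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_m: Decimal,
    output_rate_per_m: Decimal,
) -> Tuple[Decimal, Decimal, Decimal]:
    """Step 2: price the actual counts; returns (input, output, total) cost."""
    input_cost = input_tokens * input_rate_per_m / PER_MILLION
    output_cost = output_tokens * output_rate_per_m / PER_MILLION
    return input_cost, output_cost, input_cost + output_cost
```

Using `Decimal` instead of floats keeps sub-cent amounts exact. This matters when many small debits are summed into a wallet balance.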

Reconciliation Data Fields

Each reconciled request produces the following data:

Field	Type	Description
`actual_input_tokens`	integer	Provider-reported input token count
`actual_output_tokens`	integer	Provider-reported output token count
`actual_input_cost`	decimal	Real cost for input tokens
`actual_output_cost`	decimal	Real cost for output tokens
`actual_total_cost`	decimal	Total actual cost charged to the wallet
`estimate_variance`	decimal	Percentage difference between estimated and actual total cost
`cache_actual_savings`	decimal	Actual cost saved through cache hits

The estimate_variance is calculated as:

estimate_variance = ((estimated_total_cost - actual_total_cost) / actual_total_cost) * 100

A positive variance means the estimate was too high (the actual cost was lower than estimated). A negative variance means the estimate was too low (the actual cost was higher than estimated).
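The formula translates directly into code. The function name is hypothetical. Because the formula divides by the actual cost, the sketch rejects a zero actual cost. How the gateway handles zero-cost responses is not specified here.

```python
from decimal import Decimal


def estimate_variance(estimated_total_cost: Decimal, actual_total_cost: Decimal) -> Decimal:
    """Percentage variance: positive = overestimate, negative = underestimate."""
    if actual_total_cost == 0:
        # The formula is undefined when nothing was actually charged.
        raise ValueError("variance is undefined when actual_total_cost is 0")
    return (estimated_total_cost - actual_total_cost) / actual_total_cost * 100
```

Two worked cases follow. An estimate of 0.012 against an actual cost of 0.010 gives +20 (overestimated by 20%). An estimate of 0.008 against the same actual cost gives -20 (underestimated by 20%).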

Wallet Settlement

Wallet settlement always uses the actual cost from the provider, never the estimate. The settlement flow works as follows:

Pre-dispatch reservation — When the estimate is calculated, the gateway reserves the estimated amount from your wallet balance. This prevents over-spending during concurrent requests.
Response arrival — Once the actual cost is known, the reservation is released.
Actual debit — The wallet is debited by the actual cost.
Balance update — Your available balance reflects the actual spend.

If the actual cost exceeds the reserved amount, the wallet absorbs the difference. If your wallet has insufficient balance for the overage, the request still completes (the response is not withheld), but a balance-exceeded event is recorded.
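The sketch below shows one way to model the reserve, release, and debit sequence in memory. `Wallet`, its methods, and the `balance_exceeded:<id>` event string are all hypothetical. The sketch also assumes that an overage is allowed to push the balance below zero, because the response is not withheld.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List


@dataclass
class Wallet:
    """Hypothetical in-memory wallet illustrating reserve/settle semantics."""
    balance: Decimal
    reservations: Dict[str, Decimal] = field(default_factory=dict)
    events: List[str] = field(default_factory=list)

    @property
    def available(self) -> Decimal:
        # Reserved funds are held back so concurrent requests cannot overspend.
        return self.balance - sum(self.reservations.values(), Decimal("0"))

    def reserve(self, request_id: str, estimated_cost: Decimal) -> None:
        # Pre-dispatch reservation: hold the estimated amount.
        self.reservations[request_id] = estimated_cost

    def settle(self, request_id: str, actual_cost: Decimal) -> None:
        # Response arrival: release the hold first.
        self.reservations.pop(request_id, None)
        if actual_cost > self.available:
            # Insufficient balance: record an event, but still debit (assumption).
            self.events.append(f"balance_exceeded:{request_id}")
        # Actual debit: the wallet is charged the real cost, never the estimate.
        self.balance -= actual_cost
```

Releasing the reservation before the debit means the check compares the actual cost against the full available balance. This is equivalent to asking whether the balance can cover the overage beyond the reserved amount.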

Reconciliation Alerts

You can configure alerts that fire when a single estimate diverges sharply from the actual cost or when estimates drift from actuals persistently. The alert behavior is controlled by:

reconciliation:
  alert_threshold_percent: 20
  alert_window_requests: 10
  alert_channel: "cost-alerts"

Setting	Default	Description
`alert_threshold_percent`	`20`	Variance percentage that triggers an alert for a single request
`alert_window_requests`	`10`	Number of recent requests to evaluate for persistent drift
`alert_channel`	none	Notification channel for reconciliation alerts

An alert fires when:

A single request exceeds the alert_threshold_percent variance, OR
The average variance across the last alert_window_requests requests exceeds half the threshold (indicating persistent drift rather than a one-off spike).
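The two alert conditions can be sketched as a small rolling-window monitor. `DriftMonitor` is a hypothetical name, and two details are assumptions rather than documented behavior. First, the sketch compares absolute variance, so overestimates and underestimates do not cancel out in the average. Second, it evaluates drift only once the window holds `alert_window_requests` samples.

```python
from collections import deque
from decimal import Decimal
from typing import Deque, List


class DriftMonitor:
    """Sketch of the single-request and persistent-drift alert conditions."""

    def __init__(self, threshold_percent: Decimal = Decimal("20"), window: int = 10) -> None:
        self.threshold = threshold_percent
        # Keeps only the most recent `window` absolute variances.
        self.recent: Deque[Decimal] = deque(maxlen=window)

    def observe(self, variance_percent: Decimal) -> List[str]:
        self.recent.append(abs(variance_percent))
        reasons = []
        if abs(variance_percent) > self.threshold:
            reasons.append("single_request")
        if len(self.recent) == self.recent.maxlen:
            average = sum(self.recent, Decimal("0")) / len(self.recent)
            # Persistent drift: window average above half the threshold.
            if average > self.threshold / 2:
                reasons.append("persistent_drift")
        return reasons
```

With a threshold of 20 and a window of 3, the variances 25, -12, and 11 behave as follows. The first variance alone triggers a single-request alert. The third completes the window with an average absolute variance of about 16, which exceeds 10 and triggers a drift alert.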

Viewing Reconciliation Data in the Console

The Cost Center in the console provides a reconciliation view:

Navigate to Cost Center in the console sidebar.
Select the Reconciliation tab.
Filter by model, team, time range, or variance threshold.

The reconciliation view shows:

Per-request breakdown — Estimated vs. actual cost for each request, with variance highlighted.
Model accuracy trends — A time-series chart showing average variance per model over your selected period.
Drift alerts — Active alerts for models or adapters with persistent estimation drift.
Aggregate statistics — Total estimated spend vs. actual spend, overall accuracy percentage, and cumulative savings from improved estimation.

Setting Up Alerts for Persistent Estimation Drift

To receive notifications when estimation accuracy degrades:

Navigate to Settings > Notifications in the console.
Create a new alert rule with the trigger type Cost Estimation Drift.
Configure the threshold percentage and evaluation window.
Select your notification channel (webhook, email, or Slack).

Alert payloads include the model ID, average variance, sample request IDs, and a suggested remediation action (typically updating the tokenizer family or output multiplier).

The Feedback Loop

Reconciliation data feeds back into the estimation model to improve future accuracy:

How It Works

Data collection — Each reconciled request stores the model ID, context pattern (prompt length bucket, presence of KB context, fabric usage), estimated tokens, and actual tokens.
Pattern matching — When a new request matches a previously seen model+context pattern, the gateway checks historical accuracy for that pattern.
Multiplier adjustment — If historical data shows the output multiplier consistently over- or under-estimates for a pattern, the gateway applies a learned correction factor.
Confidence update — As more data accumulates for a model, the confidence level may increase from low to medium or from medium to high.
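The data collection and pattern matching steps can be sketched as a store keyed by model and context pattern. All names are hypothetical, and so are the prompt-length bucket edges. The learned correction shown here is the ratio of total actual to total estimated output tokens for a pattern. The bounds described under Feedback Loop Boundaries are deliberately left out of this sketch.

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple, Tuple


class PatternKey(NamedTuple):
    """Hypothetical model+context pattern key."""
    model_id: str
    length_bucket: str
    has_kb_context: bool
    uses_fabric: bool


def length_bucket(prompt_tokens: int) -> str:
    # Bucket names and edges are illustrative assumptions.
    if prompt_tokens < 1_000:
        return "short"
    if prompt_tokens < 8_000:
        return "medium"
    return "long"


class FeedbackStore:
    """Collects (estimated, actual) output-token pairs per pattern."""

    def __init__(self) -> None:
        self.samples: DefaultDict[PatternKey, List[Tuple[int, int]]] = defaultdict(list)

    def record(self, key: PatternKey, estimated_tokens: int, actual_tokens: int) -> None:
        self.samples[key].append((estimated_tokens, actual_tokens))

    def correction_factor(self, key: PatternKey) -> float:
        # 1.0 means "leave the base estimate unchanged" (no usable history).
        pairs = self.samples.get(key, [])
        estimated = sum(e for e, _ in pairs)
        if estimated == 0:
            return 1.0
        return sum(a for _, a in pairs) / estimated
```

A corrected estimate would then be `base_estimate * store.correction_factor(key)`. For example, two samples estimated at 400 and 600 tokens that actually produced 300 and 450 tokens yield a factor of 0.75.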

Feedback Loop Boundaries

The feedback loop operates within these constraints:

Corrections are bounded to ±30% of the base estimate to prevent runaway adjustments.
A minimum of 20 reconciled requests for a model+pattern combination is required before corrections are applied.
Corrections decay over time (30-day half-life) to adapt to model behavior changes after provider updates.
You can reset the feedback data for a model through the declarative reconciliation configuration.
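The sketch below applies the three numeric bounds to a pattern's history. Its function name and sample layout are hypothetical. The source states only a 30-day half-life, so modeling decay as per-sample exponential weighting is an assumption. Shrinking the correction itself toward zero over time would be an equally plausible reading. The sketch also assumes the minimum-sample check counts raw samples regardless of their age.

```python
from typing import Sequence, Tuple


def bounded_correction(
    samples: Sequence[Tuple[float, int, int]],
    min_samples: int = 20,
    max_correction_percent: float = 30.0,
    half_life_days: float = 30.0,
) -> float:
    """Return a multiplier for the base estimate.

    Each sample is (age_days, estimated_tokens, actual_tokens).
    """
    if len(samples) < min_samples:
        return 1.0  # not enough history: no correction applied
    weighted_estimated = 0.0
    weighted_actual = 0.0
    for age_days, estimated, actual in samples:
        # Exponential decay: a sample's weight halves every half_life_days.
        weight = 0.5 ** (age_days / half_life_days)
        weighted_estimated += weight * estimated
        weighted_actual += weight * actual
    if weighted_estimated == 0:
        return 1.0
    factor = weighted_actual / weighted_estimated
    # Clamp to +/- max_correction_percent of the base estimate.
    bound = max_correction_percent / 100
    return min(max(factor, 1 - bound), 1 + bound)
```

Consider ten fresh samples with no error alongside ten 30-day-old samples that were 30% under. The older samples count at half weight, giving a factor of 1.1 rather than the unweighted 1.15. Any raw factor outside the range 0.7 to 1.3 is clamped to that range.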

Configuration Reference

reconciliation:
  enabled: true
  alert_threshold_percent: 20
  alert_window_requests: 10
  alert_channel: "cost-alerts"
  feedback_loop:
    enabled: true
    min_samples: 20
    max_correction_percent: 30
    decay_half_life_days: 30
  store_reconciliation_data: true
  retention_days: 90

Setting	Default	Description
`enabled`	`true`	Enable reconciliation processing
`store_reconciliation_data`	`true`	Persist per-request reconciliation records
`retention_days`	`90`	How long to retain reconciliation records
`feedback_loop.enabled`	`true`	Enable automatic estimation corrections
`feedback_loop.min_samples`	`20`	Minimum requests before applying corrections
`feedback_loop.max_correction_percent`	`30`	Maximum correction factor applied
`feedback_loop.decay_half_life_days`	`30`	Half-life for correction decay
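The following sketch mirrors the YAML keys and the defaults from the tables as typed Python objects. It could be useful for tooling that reads or checks this section. The class and function names are hypothetical, and the positivity checks are illustrative assumptions rather than documented validation rules. The function takes an already-parsed mapping, because reading the YAML itself would need a parser such as PyYAML (not shown).

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class FeedbackLoopConfig:
    enabled: bool = True
    min_samples: int = 20
    max_correction_percent: int = 30
    decay_half_life_days: int = 30


@dataclass(frozen=True)
class ReconciliationConfig:
    enabled: bool = True
    alert_threshold_percent: int = 20
    alert_window_requests: int = 10
    alert_channel: Optional[str] = None  # "none" default from the alerts table
    feedback_loop: FeedbackLoopConfig = field(default_factory=FeedbackLoopConfig)
    store_reconciliation_data: bool = True
    retention_days: int = 90


def load_reconciliation(raw: Mapping[str, Any]) -> ReconciliationConfig:
    """Build a config from a parsed document; unknown keys raise TypeError."""
    section = dict(raw.get("reconciliation", {}))
    feedback = FeedbackLoopConfig(**section.pop("feedback_loop", {}))
    config = ReconciliationConfig(feedback_loop=feedback, **section)
    # Illustrative sanity checks (assumed, not documented constraints).
    checks = {
        "alert_window_requests": config.alert_window_requests,
        "feedback_loop.min_samples": config.feedback_loop.min_samples,
        "feedback_loop.decay_half_life_days": config.feedback_loop.decay_half_life_days,
        "retention_days": config.retention_days,
    }
    for name, value in checks.items():
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    return config
```

Any key omitted from the YAML falls back to the documented default. An override such as `feedback_loop.min_samples: 50` therefore changes only that one value.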

Next steps

Pre-Dispatch Prompt Cost Estimates — understand how estimates are calculated before dispatch.
Token Estimation Across Providers — learn how token counts are estimated for different models.
Cache and Fabric Cost Adjustments — see how cache and fabric affect both estimates and actuals.

For AI systems

Canonical terms: Keeptrusts, cost reconciliation, estimate variance, wallet settlement, actual cost, feedback loop, reconciliation alerts, estimation drift, Cost Center reconciliation view.
Feature/config names: reconciliation.enabled, reconciliation.alert_threshold_percent, reconciliation.alert_window_requests, reconciliation.alert_channel, reconciliation.feedback_loop.enabled, reconciliation.feedback_loop.min_samples, reconciliation.feedback_loop.max_correction_percent, reconciliation.feedback_loop.decay_half_life_days, reconciliation.store_reconciliation_data, reconciliation.retention_days, estimate_variance, actual_total_cost.
Best next pages: Pre-Dispatch Prompt Cost Estimates, Token Estimation Across Providers, Cache and Fabric Cost Adjustments.

For engineers

Wallet settlement always uses actual provider cost, not the estimate. Reservations are adjusted on response arrival.
Configure alerts: the defaults alert_threshold_percent: 20 and alert_window_requests: 10 flag both single-request spikes and persistent estimation drift. Set alert_channel to route alerts to a notification channel.
View reconciliation data in the console: Cost Center → Reconciliation tab → filter by model and time range.
Feedback loop: after 20+ reconciled requests per model+pattern, the system auto-corrects estimates within ±30%. Reset via Cost Center settings if corrections become stale.

For leaders

Reconciliation ensures wallet balances reflect real spend, not estimates — no hidden cost overruns from estimation inaccuracy.
Persistent drift alerts surface adapter miscalibration early, before it impacts budget accuracy.
The feedback loop continuously improves estimation accuracy with zero manual intervention (bounded corrections, automatic decay).
Retention of reconciliation data (default 90 days) supports financial reporting and audit requirements.
