
Estimate vs Actual Cost Reconciliation

After the LLM provider returns a response, Keeptrusts reconciles the pre-dispatch estimate with the actual cost. This reconciliation drives wallet settlement, accuracy reporting, and continuous improvement of future estimates.

Use this page when

  • You need to understand how Keeptrusts reconciles pre-dispatch estimates with actual provider-reported costs.
  • You are configuring reconciliation alerts, feedback loops, or wallet settlement behavior.
  • You want to use the Cost Center reconciliation view to identify estimation drift or adapter miscalibration.

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

The Reconciliation Flow

When a response arrives from the provider, the gateway performs these steps:

  1. Extract actual usage — The gateway reads token usage from the provider's response headers or response metadata (e.g., usage.prompt_tokens and usage.completion_tokens in OpenAI responses).
  2. Calculate actual cost — Using the model-pricing catalog rates and the actual token counts, the gateway computes the real cost.
  3. Compare with estimate — The gateway calculates the variance between estimated and actual costs.
  4. Settle the wallet — The wallet balance is debited by the actual cost, not the estimate. Any reserved amount from the pre-dispatch phase is adjusted.
  5. Log reconciliation data — The full estimate-vs-actual comparison is recorded for reporting and feedback.
  6. Trigger alerts if needed — If variance exceeds the configured threshold, a reconciliation alert is logged.
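The steps above can be sketched roughly as follows. This is a minimal illustration, not the gateway's implementation: `ModelRates`, `reconcile`, the per-1,000-token rate format, and the `alert` flag are hypothetical names and assumptions introduced here. Only the OpenAI-style `usage` fields and the reconciliation field names come from this page.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class ModelRates:
    """Hypothetical pricing-catalog entry, expressed per 1,000 tokens."""
    input_per_1k: Decimal
    output_per_1k: Decimal


def reconcile(usage: dict, rates: ModelRates, estimated_total_cost: Decimal,
              alert_threshold_percent: Decimal = Decimal("20")) -> dict:
    # Step 1: extract actual usage (OpenAI-style field names).
    input_tokens = int(usage["prompt_tokens"])
    output_tokens = int(usage["completion_tokens"])

    # Step 2: price the actual token counts with the catalog rates.
    input_cost = rates.input_per_1k * input_tokens / 1000
    output_cost = rates.output_per_1k * output_tokens / 1000
    total_cost = input_cost + output_cost

    # Step 3: variance relative to actual cost; undefined when actual is zero.
    variance: Optional[Decimal] = None
    if total_cost != 0:
        variance = (estimated_total_cost - total_cost) / total_cost * 100

    # Step 6: flag the record when the variance magnitude exceeds the threshold
    # (comparing the absolute value is an assumption of this sketch).
    alert = variance is not None and abs(variance) > alert_threshold_percent

    # Steps 4 and 5 (wallet debit, persistence) are left to the caller.
    return {
        "actual_input_tokens": input_tokens,
        "actual_output_tokens": output_tokens,
        "actual_input_cost": input_cost,
        "actual_output_cost": output_cost,
        "actual_total_cost": total_cost,
        "estimate_variance": variance,
        "alert": alert,
    }
```

Using `Decimal` rather than floats keeps small per-token amounts exact, which matters when the result is debited from a wallet.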

Reconciliation Data Fields

Each reconciled request produces the following data:

| Field | Type | Description |
| --- | --- | --- |
| actual_input_tokens | integer | Provider-reported input token count |
| actual_output_tokens | integer | Provider-reported output token count |
| actual_input_cost | decimal | Real cost for input tokens |
| actual_output_cost | decimal | Real cost for output tokens |
| actual_total_cost | decimal | Total actual cost charged to the wallet |
| estimate_variance | decimal | Percentage difference between estimated and actual total cost |
| cache_actual_savings | decimal | Actual cost saved through cache hits |

The estimate_variance is calculated as:

estimate_variance = ((estimated_total_cost - actual_total_cost) / actual_total_cost) * 100

A positive variance means you overestimated (the actual cost was lower). A negative variance means you underestimated (the actual cost was higher).
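To make the sign convention concrete, here is a small worked example of the formula. The function name and the dollar amounts are illustrative only.

```python
from decimal import Decimal


def estimate_variance(estimated_total_cost: Decimal,
                      actual_total_cost: Decimal) -> Decimal:
    """Percentage variance, relative to the actual cost."""
    return (estimated_total_cost - actual_total_cost) / actual_total_cost * 100


# Estimated 0.012, actual 0.010: the estimate was too high.
overestimate = estimate_variance(Decimal("0.012"), Decimal("0.010"))

# Estimated 0.008, actual 0.010: the estimate was too low.
underestimate = estimate_variance(Decimal("0.008"), Decimal("0.010"))

print(overestimate, underestimate)
```

The first call yields a variance of +20 percent and the second a variance of -20 percent. Because the denominator is the actual cost, a request with an actual cost of zero has no defined variance and needs separate handling.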

Wallet Settlement

Wallet settlement always uses the actual cost from the provider, never the estimate. The settlement flow works as follows:

  1. Pre-dispatch reservation — When the estimate is calculated, the gateway reserves the estimated amount from your wallet balance. This prevents over-spending during concurrent requests.
  2. Response arrival — Once the actual cost is known, the reservation is released.
  3. Actual debit — The wallet is debited by the actual cost.
  4. Balance update — Your available balance reflects the actual spend.

If the actual cost exceeds the reserved amount, the wallet is still debited the full actual cost. If the balance cannot cover the overage, the request still completes (the response is not withheld), but a balance-exceeded event is recorded.
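The reserve, release, and debit sequence can be modeled with a toy in-memory wallet. This is a sketch under stated assumptions: the `Wallet` class and its method names are hypothetical, and letting the balance go negative on an overage is a choice made here for illustration, since this page does not say how an uncovered overage is carried.

```python
from decimal import Decimal


class Wallet:
    """Toy in-memory wallet; not the Keeptrusts implementation."""

    def __init__(self, balance: Decimal) -> None:
        self.balance = balance
        self.reserved = Decimal("0")
        self.events: list[str] = []

    @property
    def available(self) -> Decimal:
        # Reserved funds are not spendable by concurrent requests.
        return self.balance - self.reserved

    def reserve(self, estimated_cost: Decimal) -> None:
        # Pre-dispatch reservation of the estimated amount.
        self.reserved += estimated_cost

    def settle(self, reserved_amount: Decimal, actual_cost: Decimal) -> None:
        # Response arrival: release the reservation.
        self.reserved -= reserved_amount
        # Record an event if the balance cannot cover the actual cost.
        if actual_cost > self.available:
            self.events.append("balance-exceeded")
        # Actual debit: always the actual cost, never the estimate.
        # Assumption: the balance is allowed to go negative here.
        self.balance -= actual_cost


wallet = Wallet(Decimal("1.00"))
wallet.reserve(Decimal("0.30"))
wallet.settle(Decimal("0.30"), Decimal("0.25"))
```

After this run the reservation is fully released and the balance is reduced by 0.25, the actual cost, rather than by the 0.30 estimate.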

Reconciliation Alerts

You can configure alerts for when estimates consistently diverge from actuals. The alert threshold is controlled by:

reconciliation:
  alert_threshold_percent: 20
  alert_window_requests: 10
  alert_channel: "cost-alerts"

| Setting | Default | Description |
| --- | --- | --- |
| alert_threshold_percent | 20 | Variance percentage that triggers an alert for a single request |
| alert_window_requests | 10 | Number of recent requests to evaluate for persistent drift |
| alert_channel | none | Notification channel for reconciliation alerts |

An alert fires when:

  • A single request exceeds the alert_threshold_percent variance, OR
  • The average variance across the last alert_window_requests requests exceeds half the threshold (indicating persistent drift rather than a one-off spike).
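The two firing conditions can be expressed as a small rolling-window check. This is a sketch with several assumptions that this page does not confirm: the `DriftMonitor` class is hypothetical, variance magnitudes are compared as absolute values, the window average is a signed mean (so alternating over- and underestimates cancel out), and the drift rule is evaluated only once the window is full.

```python
from collections import deque


class DriftMonitor:
    """Hypothetical rolling check mirroring the two alert conditions."""

    def __init__(self, alert_threshold_percent: float = 20.0,
                 alert_window_requests: int = 10) -> None:
        self.threshold = alert_threshold_percent
        # deque with maxlen drops the oldest variance automatically.
        self.window: deque[float] = deque(maxlen=alert_window_requests)

    def observe(self, variance_percent: float) -> bool:
        """Record one request's variance; return True if an alert fires."""
        self.window.append(variance_percent)

        # Condition 1: a single request exceeds the threshold.
        single_spike = abs(variance_percent) > self.threshold

        # Condition 2: the windowed average exceeds half the threshold.
        window_full = len(self.window) == self.window.maxlen
        average = sum(self.window) / len(self.window)
        persistent_drift = window_full and abs(average) > self.threshold / 2

        return single_spike or persistent_drift


monitor = DriftMonitor(alert_threshold_percent=20.0, alert_window_requests=3)
results = [monitor.observe(v) for v in (12.0, 11.0, 12.0)]
```

With a three-request window, none of these variances exceeds 20 percent on its own, but the third observation fills the window with an average above 10 percent, so only the last call reports an alert.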

Viewing Reconciliation Data in the Console

The Cost Center in the console provides a reconciliation view:

  1. Navigate to Cost Center in the console sidebar.
  2. Select the Reconciliation tab.
  3. Filter by model, team, time range, or variance threshold.

The reconciliation view shows:

  • Per-request breakdown — Estimated vs. actual cost for each request, with variance highlighted.
  • Model accuracy trends — A time-series chart showing average variance per model over your selected period.
  • Drift alerts — Active alerts for models or adapters with persistent estimation drift.
  • Aggregate statistics — Total estimated spend vs. actual spend, overall accuracy percentage, and cumulative savings from improved estimation.

Setting Up Alerts for Persistent Estimation Drift

To receive notifications when estimation accuracy degrades:

  1. Navigate to Settings > Notifications in the console.
  2. Create a new alert rule with the trigger type Cost Estimation Drift.
  3. Configure the threshold percentage and evaluation window.
  4. Select your notification channel (webhook, email, or Slack).

Alert payloads include the model ID, average variance, sample request IDs, and a suggested remediation action (typically updating the tokenizer family or output multiplier).

The Feedback Loop

Reconciliation data feeds back into the estimation model to improve future accuracy:

How It Works

  1. Data collection — Each reconciled request stores the model ID, context pattern (prompt length bucket, presence of KB context, fabric usage), estimated tokens, and actual tokens.
  2. Pattern matching — When a new request matches a previously seen model+context pattern, the gateway checks historical accuracy for that pattern.
  3. Multiplier adjustment — If historical data shows the output multiplier consistently over- or under-estimates for a pattern, the gateway applies a learned correction factor.
  4. Confidence update — As more data accumulates for a model, the confidence level may increase from low to medium or from medium to high.

Feedback Loop Boundaries

The feedback loop operates within these constraints:

  • Corrections are bounded to ±30% of the base estimate to prevent runaway adjustments.
  • A minimum of 20 reconciled requests for a model+pattern combination is required before corrections are applied.
  • Corrections decay over time (30-day half-life) to adapt to model behavior changes after provider updates.
  • You can reset the feedback data for a model through the declarative reconciliation configuration.
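One way to picture how these constraints interact is the sketch below. It is not the gateway's algorithm: the `correction_factor` function, the sample tuple layout, and the use of an actual-to-estimated token ratio are hypothetical, and the 30-day half-life is interpreted here as an exponential weight on older samples, which is only one plausible reading of "corrections decay over time".

```python
def correction_factor(samples: list[tuple[float, int, int]],
                      min_samples: int = 20,
                      max_correction_percent: float = 30.0,
                      decay_half_life_days: float = 30.0) -> float:
    """Return a multiplier for the base estimate of one model+pattern.

    Each sample is (age_days, estimated_tokens, actual_tokens).
    """
    # Ignore samples with no usable estimate.
    usable = [s for s in samples if s[1] > 0]

    # Boundary: too little data means no correction is applied.
    if len(usable) < min_samples:
        return 1.0

    weighted_sum = 0.0
    weight_total = 0.0
    for age_days, estimated, actual in usable:
        # Assumed decay: a sample's weight halves every half-life.
        weight = 0.5 ** (age_days / decay_half_life_days)
        weighted_sum += weight * (actual / estimated)
        weight_total += weight

    raw = weighted_sum / weight_total

    # Boundary: clamp the correction to +/- max_correction_percent.
    bound = max_correction_percent / 100.0
    return min(1.0 + bound, max(1.0 - bound, raw))


# 20 fresh samples where actual usage ran 10% above the estimate.
samples = [(0.0, 1000, 1100)] * 20
factor = correction_factor(samples)
```

Here `factor` comes out at roughly 1.1, so the next estimate for this pattern would be scaled up by about 10 percent. With only 19 samples the function returns 1.0, and a pattern whose actual usage is double the estimate is capped at 1.3.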

Configuration Reference

reconciliation:
  enabled: true
  alert_threshold_percent: 20
  alert_window_requests: 10
  alert_channel: "cost-alerts"
  feedback_loop:
    enabled: true
    min_samples: 20
    max_correction_percent: 30
    decay_half_life_days: 30
  store_reconciliation_data: true
  retention_days: 90

| Setting | Default | Description |
| --- | --- | --- |
| enabled | true | Enable reconciliation processing |
| store_reconciliation_data | true | Persist per-request reconciliation records |
| retention_days | 90 | How long to retain reconciliation records |
| feedback_loop.enabled | true | Enable automatic estimation corrections |
| feedback_loop.min_samples | 20 | Minimum requests before applying corrections |
| feedback_loop.max_correction_percent | 30 | Maximum correction factor applied |
| feedback_loop.decay_half_life_days | 30 | Half-life for correction decay |

Next steps

For AI systems

  • Canonical terms: Keeptrusts, cost reconciliation, estimate variance, wallet settlement, actual cost, feedback loop, reconciliation alerts, estimation drift, Cost Center reconciliation view.
  • Feature/config names: reconciliation.enabled, reconciliation.alert_threshold_percent, reconciliation.alert_window_requests, reconciliation.alert_channel, reconciliation.feedback_loop.enabled, reconciliation.feedback_loop.min_samples, reconciliation.feedback_loop.max_correction_percent, reconciliation.feedback_loop.decay_half_life_days, reconciliation.retention_days, estimate_variance, actual_total_cost.
  • Best next pages: Pre-Dispatch Prompt Cost Estimates, Token Estimation Across Providers, Cache and Fabric Cost Adjustments.

For engineers

  • Wallet settlement always uses actual provider cost, not the estimate. Reservations are adjusted on response arrival.
  • Configure alerts: set alert_threshold_percent: 20 and alert_window_requests: 10 to detect persistent estimation drift.
  • View reconciliation data in the console: Cost Center → Reconciliation tab → filter by model and time range.
  • Feedback loop: after 20+ reconciled requests per model+pattern, the system auto-corrects estimates within ±30%. Reset the feedback data through the declarative reconciliation configuration if corrections become stale.

For leaders

  • Reconciliation ensures wallet balances reflect real spend, not estimates — no hidden cost overruns from estimation inaccuracy.
  • Persistent drift alerts surface adapter miscalibration early, before it impacts budget accuracy.
  • The feedback loop continuously improves estimation accuracy with zero manual intervention (bounded corrections, automatic decay).
  • Retention of reconciliation data (default 90 days) supports financial reporting and audit requirements.