Estimate vs Actual Cost Reconciliation
After the LLM provider returns a response, Keeptrusts reconciles the pre-dispatch estimate with the actual cost. This reconciliation drives wallet settlement, accuracy reporting, and continuous improvement of future estimates.
Use this page when
- You need to understand how Keeptrusts reconciles pre-dispatch estimates with actual provider-reported costs.
- You are configuring reconciliation alerts, feedback loops, or wallet settlement behavior.
- You want to use the Cost Center reconciliation view to identify estimation drift or adapter miscalibration.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
The Reconciliation Flow
When a response arrives from the provider, the gateway performs these steps:
- Extract actual usage: The gateway reads token usage from the provider's response headers or response metadata (e.g., usage.prompt_tokens and usage.completion_tokens in OpenAI responses).
- Calculate actual cost: Using the model-pricing catalog rates and the actual token counts, the gateway computes the real cost.
- Compare with estimate: The gateway calculates the variance between estimated and actual costs.
- Settle the wallet: The wallet balance is debited by the actual cost, not the estimate. Any reserved amount from the pre-dispatch phase is adjusted.
- Log reconciliation data: The full estimate-vs-actual comparison is recorded for reporting and feedback.
- Trigger alerts if needed: If variance exceeds the configured threshold, a reconciliation alert is logged.
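The first three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the gateway's implementation: the `PRICING` table, its per-million-token rates, the model name, and the `reconcile` function are all hypothetical. Only the OpenAI-style `usage` fields and the record field names come from this page.

```python
from decimal import Decimal

# Hypothetical pricing catalog in USD per 1M tokens; the rates are made up.
PRICING = {"example-model": {"input": Decimal("2.00"), "output": Decimal("8.00")}}
PER_MILLION = Decimal(1_000_000)


def reconcile(model: str, response: dict, estimated_total_cost: Decimal) -> dict:
    """Build a reconciliation record from an OpenAI-style response body."""
    usage = response["usage"]  # step 1: provider-reported usage
    rates = PRICING[model]

    # Step 2: actual cost from catalog rates and actual token counts.
    input_cost = Decimal(usage["prompt_tokens"]) * rates["input"] / PER_MILLION
    output_cost = Decimal(usage["completion_tokens"]) * rates["output"] / PER_MILLION
    total_cost = input_cost + output_cost

    # Step 3: variance as a percentage of the actual cost.
    # Assumption: a zero actual cost yields no variance rather than a division error.
    variance = None
    if total_cost != 0:
        variance = (estimated_total_cost - total_cost) / total_cost * 100

    return {
        "actual_input_tokens": usage["prompt_tokens"],
        "actual_output_tokens": usage["completion_tokens"],
        "actual_input_cost": input_cost,
        "actual_output_cost": output_cost,
        "actual_total_cost": total_cost,
        "estimate_variance": variance,
    }


record = reconcile(
    "example-model",
    {"usage": {"prompt_tokens": 1000, "completion_tokens": 500}},
    estimated_total_cost=Decimal("0.0072"),
)
print(record["actual_total_cost"], record["estimate_variance"])
```

The sketch uses `Decimal` instead of floats so that small per-request costs do not accumulate rounding error when they are summed for settlement.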
Reconciliation Data Fields
Each reconciled request produces the following data:
| Field | Type | Description |
|---|---|---|
| actual_input_tokens | integer | Provider-reported input token count |
| actual_output_tokens | integer | Provider-reported output token count |
| actual_input_cost | decimal | Real cost for input tokens |
| actual_output_cost | decimal | Real cost for output tokens |
| actual_total_cost | decimal | Total actual cost charged to the wallet |
| estimate_variance | decimal | Percentage difference between estimated and actual total cost |
| cache_actual_savings | decimal | Actual cost saved through cache hits |
The estimate_variance is calculated as:
estimate_variance = ((estimated_total_cost - actual_total_cost) / actual_total_cost) * 100
A positive variance means you overestimated (the actual cost was lower). A negative variance means you underestimated (the actual cost was higher).
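As a worked example of the formula and its sign convention, the snippet below evaluates two cases. The function names and cost figures are illustrative assumptions; only the formula itself is taken from this page.

```python
from decimal import Decimal


def estimate_variance(estimated_total_cost: Decimal, actual_total_cost: Decimal) -> Decimal:
    """Percentage variance, relative to the actual cost."""
    return (estimated_total_cost - actual_total_cost) / actual_total_cost * 100


def describe(variance: Decimal) -> str:
    """Translate the sign of the variance into plain language."""
    if variance > 0:
        return "overestimated"
    if variance < 0:
        return "underestimated"
    return "exact"


# Estimate 0.012 against an actual cost of 0.010: variance is +20.
over = estimate_variance(Decimal("0.012"), Decimal("0.010"))
# Estimate 0.008 against an actual cost of 0.010: variance is -20.
under = estimate_variance(Decimal("0.008"), Decimal("0.010"))

print(over, describe(over))
print(under, describe(under))
```

Because the denominator is the actual cost, the scale is not symmetric: an estimate of twice the actual cost is +100, while an estimate of half the actual cost is -50.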
Wallet Settlement
Wallet settlement always uses the actual cost from the provider, never the estimate. The settlement flow works as follows:
- Pre-dispatch reservation — When the estimate is calculated, the gateway reserves the estimated amount from your wallet balance. This prevents over-spending during concurrent requests.
- Response arrival — Once the actual cost is known, the reservation is released.
- Actual debit — The wallet is debited by the actual cost.
- Balance update — Your available balance reflects the actual spend.
If the actual cost exceeds the reserved amount, the wallet absorbs the difference. If your wallet has insufficient balance for the overage, the request still completes (the response is not withheld), but a balance-exceeded event is recorded.
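The reserve, release, and debit sequence can be sketched with a small in-memory model. Everything here is hypothetical (the `Wallet` class, its fields, and the event string), and two behaviors are assumptions this page does not confirm: that "insufficient balance" is judged against the balance left after other reservations, and that the balance is allowed to go negative on an overage.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Wallet:
    """Hypothetical in-memory wallet; not the Keeptrusts data model."""
    balance: Decimal
    reserved: Decimal = Decimal("0")
    events: list = field(default_factory=list)

    @property
    def available(self) -> Decimal:
        # Balance not held by in-flight reservations.
        return self.balance - self.reserved

    def reserve(self, estimate: Decimal) -> None:
        # Pre-dispatch: hold the estimated amount.
        self.reserved += estimate

    def settle(self, estimate: Decimal, actual: Decimal) -> None:
        # Response arrival: release the reservation, then debit the actual cost.
        self.reserved -= estimate
        if actual > self.available:
            # Assumption: record the event but still debit (balance may go negative).
            self.events.append("balance-exceeded")
        self.balance -= actual


wallet = Wallet(balance=Decimal("1.00"))

wallet.reserve(Decimal("0.40"))
held = wallet.available  # 0.60 while the request is in flight
wallet.settle(Decimal("0.40"), Decimal("0.45"))
after_first = wallet.balance  # 0.55: debited by the actual cost, not the estimate

wallet.reserve(Decimal("0.50"))
wallet.settle(Decimal("0.50"), Decimal("0.60"))  # overage exceeds what is left
```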
Reconciliation Alerts
You can configure alerts for when estimates consistently diverge from actuals. Alert behavior is controlled by these settings:
reconciliation:
  alert_threshold_percent: 20
  alert_window_requests: 10
  alert_channel: "cost-alerts"
| Setting | Default | Description |
|---|---|---|
| alert_threshold_percent | 20 | Variance percentage that triggers an alert for a single request |
| alert_window_requests | 10 | Number of recent requests to evaluate for persistent drift |
| alert_channel | none | Notification channel for reconciliation alerts |
An alert fires when:
- A single request exceeds the alert_threshold_percent variance, OR
- The average variance across the last alert_window_requests requests exceeds half the threshold (indicating persistent drift rather than a one-off spike).
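The two alert conditions can be expressed as a short check. This is a sketch, not the gateway's logic, and it makes three assumptions this page does not state: variance is compared by absolute value, the window average is signed (so over- and underestimates cancel out), and the drift check only runs once the window is full. The `should_alert` function is hypothetical.

```python
def should_alert(
    variances: list[float],
    threshold_percent: float = 20.0,
    window_requests: int = 10,
) -> bool:
    """variances: recent estimate_variance values, newest last."""
    if not variances:
        return False

    # Condition 1: the latest request alone exceeds the threshold.
    if abs(variances[-1]) > threshold_percent:
        return True

    # Condition 2: the window average exceeds half the threshold.
    window = variances[-window_requests:]
    if len(window) < window_requests:
        return False  # assumption: wait for a full window
    average = sum(window) / len(window)
    return abs(average) > threshold_percent / 2


spike = should_alert([25.0])                 # single-request spike
drift = should_alert([12.0] * 10)            # persistent 12% overestimate
noise = should_alert([15.0, -15.0] * 5)      # swings that average out
print(spike, drift, noise)
```

With a signed average, alternating over- and underestimates of the same size do not count as drift. Averaging absolute values instead would flag noisy but unbiased estimates as well.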
Viewing Reconciliation Data in the Console
The Cost Center in the console provides a reconciliation view:
- Navigate to Cost Center in the console sidebar.
- Select the Reconciliation tab.
- Filter by model, team, time range, or variance threshold.
The reconciliation view shows:
- Per-request breakdown — Estimated vs. actual cost for each request, with variance highlighted.
- Model accuracy trends — A time-series chart showing average variance per model over your selected period.
- Drift alerts — Active alerts for models or adapters with persistent estimation drift.
- Aggregate statistics — Total estimated spend vs. actual spend, overall accuracy percentage, and cumulative savings from improved estimation.
Setting Up Alerts for Persistent Estimation Drift
To receive notifications when estimation accuracy degrades:
- Navigate to Settings > Notifications in the console.
- Create a new alert rule with the trigger type Cost Estimation Drift.
- Configure the threshold percentage and evaluation window.
- Select your notification channel (webhook, email, or Slack).
Alert payloads include the model ID, average variance, sample request IDs, and a suggested remediation action (typically updating the tokenizer family or output multiplier).
The Feedback Loop
Reconciliation data feeds back into the estimation model to improve future accuracy.
How It Works
- Data collection: Each reconciled request stores the model ID, context pattern (prompt length bucket, presence of KB context, fabric usage), estimated tokens, and actual tokens.
- Pattern matching: When a new request matches a previously seen model+context pattern, the gateway checks historical accuracy for that pattern.
- Multiplier adjustment: If historical data shows the output multiplier consistently over- or under-estimates for a pattern, the gateway applies a learned correction factor.
- Confidence update: As more data accumulates for a model, the confidence level may increase from low to medium or from medium to high.
Feedback Loop Boundaries
The feedback loop operates within these constraints:
- Corrections are bounded to ±30% of the base estimate to prevent runaway adjustments.
- A minimum of 20 reconciled requests for a model+pattern combination is required before corrections are applied.
- Corrections decay over time (30-day half-life) to adapt to model behavior changes after provider updates.
- You can reset the feedback data for a model through the declarative reconciliation configuration.
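One way the three numeric constraints above could combine is sketched below. The page does not say how the correction is computed, so this is an assumption throughout: the `correction_factor` function, its `observed_ratio` input (historical actual tokens divided by estimated tokens for a pattern), and the order of operations (clamp first, then decay by data age) are all hypothetical.

```python
def correction_factor(
    observed_ratio: float,
    samples: int,
    age_days: float,
    min_samples: int = 20,
    max_correction_percent: float = 30.0,
    decay_half_life_days: float = 30.0,
) -> float:
    """Return a multiplier to apply to the base estimate."""
    # Too little data for this model+pattern: leave the estimate unchanged.
    if samples < min_samples:
        return 1.0

    # Raw correction, e.g. a ratio of 1.5 asks for +50%.
    raw = observed_ratio - 1.0

    # Bound the correction to +/- max_correction_percent of the base estimate.
    bound = max_correction_percent / 100
    bounded = max(-bound, min(bound, raw))

    # Exponential decay: the correction halves every half-life.
    decay = 0.5 ** (age_days / decay_half_life_days)
    return 1.0 + bounded * decay


fresh = correction_factor(observed_ratio=1.5, samples=50, age_days=0)    # clamped to +30%
aged = correction_factor(observed_ratio=1.5, samples=50, age_days=30)    # one half-life later
sparse = correction_factor(observed_ratio=1.5, samples=10, age_days=0)   # below min_samples
print(fresh, aged, sparse)
```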
Configuration Reference
reconciliation:
  enabled: true
  alert_threshold_percent: 20
  alert_window_requests: 10
  alert_channel: "cost-alerts"
  feedback_loop:
    enabled: true
    min_samples: 20
    max_correction_percent: 30
    decay_half_life_days: 30
  store_reconciliation_data: true
  retention_days: 90
| Setting | Default | Description |
|---|---|---|
| enabled | true | Enable reconciliation processing |
| store_reconciliation_data | true | Persist per-request reconciliation records |
| retention_days | 90 | How long to retain reconciliation records |
| feedback_loop.enabled | true | Enable automatic estimation corrections |
| feedback_loop.min_samples | 20 | Minimum requests before applying corrections |
| feedback_loop.max_correction_percent | 30 | Maximum correction factor applied |
| feedback_loop.decay_half_life_days | 30 | Half-life for correction decay |
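To show how the documented defaults and a partial override might combine, here is a small sketch. The default values are taken from the tables on this page; the `resolve_config` helper and its rejection of unknown keys are hypothetical and say nothing about how the gateway actually parses its configuration.

```python
import copy

# Defaults as documented on this page (alert_channel has no default).
DEFAULTS = {
    "enabled": True,
    "alert_threshold_percent": 20,
    "alert_window_requests": 10,
    "alert_channel": None,
    "feedback_loop": {
        "enabled": True,
        "min_samples": 20,
        "max_correction_percent": 30,
        "decay_half_life_days": 30,
    },
    "store_reconciliation_data": True,
    "retention_days": 90,
}


def resolve_config(overrides: dict) -> dict:
    """Merge user overrides onto the defaults, rejecting unknown keys."""
    resolved = copy.deepcopy(DEFAULTS)
    for key, value in overrides.items():
        if key not in resolved:
            raise KeyError(f"unknown reconciliation setting: {key}")
        if isinstance(resolved[key], dict):
            unknown = set(value) - set(resolved[key])
            if unknown:
                raise KeyError(f"unknown {key} setting(s): {sorted(unknown)}")
            resolved[key].update(value)
        else:
            resolved[key] = value
    return resolved


config = resolve_config({"retention_days": 30, "feedback_loop": {"min_samples": 50}})
print(config["retention_days"], config["feedback_loop"])
```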
Next steps
- Pre-Dispatch Prompt Cost Estimates — understand how estimates are calculated before dispatch.
- Token Estimation Across Providers — learn how token counts are estimated for different models.
- Cache and Fabric Cost Adjustments — see how cache and fabric affect both estimates and actuals.
For AI systems
- Canonical terms: Keeptrusts, cost reconciliation, estimate variance, wallet settlement, actual cost, feedback loop, reconciliation alerts, estimation drift, Cost Center reconciliation view.
- Feature/config names: reconciliation.enabled, reconciliation.alert_threshold_percent, reconciliation.alert_window_requests, reconciliation.alert_channel, reconciliation.feedback_loop.enabled, reconciliation.feedback_loop.min_samples, reconciliation.feedback_loop.max_correction_percent, reconciliation.feedback_loop.decay_half_life_days, reconciliation.retention_days, estimate_variance, actual_total_cost.
- Best next pages: Pre-Dispatch Prompt Cost Estimates, Token Estimation Across Providers, Cache and Fabric Cost Adjustments.
For engineers
- Wallet settlement always uses actual provider cost, not the estimate. Reservations are adjusted on response arrival.
- Configure alerts: set alert_threshold_percent: 20 and alert_window_requests: 10 to detect persistent estimation drift.
- View reconciliation data in the console: Cost Center → Reconciliation tab → filter by model and time range.
- Feedback loop: after 20+ reconciled requests per model+pattern, the system auto-corrects estimates within ±30%. Reset the feedback data through the declarative reconciliation configuration if corrections become stale.
For leaders
- Reconciliation ensures wallet balances reflect real spend, not estimates — no hidden cost overruns from estimation inaccuracy.
- Persistent drift alerts surface adapter miscalibration early, before it impacts budget accuracy.
- The feedback loop continuously improves estimation accuracy with zero manual intervention (bounded corrections, automatic decay).
- Retention of reconciliation data (default 90 days) supports financial reporting and audit requirements.