
Tracking Chat Costs in Real-Time

This tutorial shows you how to monitor costs as you chat in the Keeptrusts workbench. You will learn where Keeptrusts shows cost after a request runs, how wallet-backed budget checks affect chat behavior, and how to respond when spend or balance signals need attention.

Use this page when

  • You need to understand the response-level cost indicators and wallet-backed budget signals in the chat workbench.
  • You want to monitor your wallet balance and respond to budget warnings before running out of credits.
  • You are looking for cost-optimization strategies like model switching and prompt conciseness.

Primary audience

  • Primary: Technical Engineers (individual cost awareness)
  • Secondary: Technical Leaders (budget oversight), All chat workbench users

Prerequisites

  • Authenticated access to the Keeptrusts chat workbench
  • Model pricing configured by your Keeptrusts administrator
  • A wallet with allocated credits (user, team, or organization level)
  • Familiarity with the first conversation tutorial

Step 1: Understand the Cost Display

The chat workbench displays cost information after a run has started or completed:

| Level | Location | Shows |
| --- | --- | --- |
| Per-message | Below each model response | Cost of that individual generation |
| Wallet state | Balance and retry/top-up affordances | Whether the effective wallet can keep funding future requests |

All costs are displayed in your organization's configured currency (typically USD).

The current chat workbench does not show an inline pre-dispatch estimate in the composer. Cost awareness starts with the response-level usage record and any wallet or retry guidance surfaced after dispatch.

Step 2: Read Per-Message Cost Indicators

After each model response, a cost indicator appears below the message:

Response cost: $0.0032 (847 input tokens + 312 output tokens)

This indicator shows:

  • Total cost for the message based on the model's pricing.
  • Token breakdown — input tokens (your prompt + conversation history) and output tokens (model's response).
  • Model name — which model generated the response (visible on hover).

Note: Per-message costs include the full context window sent to the model. As your conversation grows longer, input token counts increase for each subsequent message because the conversation history is included in every request.
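
A per-message figure such as "$0.0032" can be reproduced from the token breakdown and per-token rates. The sketch below is illustrative only; the rates are placeholder values chosen to match the example indicator, not Keeptrusts' actual pricing, which your administrator configures.

```python
# Sketch: deriving a per-message cost from token counts and
# per-1K-token rates. The rates are illustrative placeholders.

def message_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Return the cost of one model response in the wallet currency."""
    return (input_tokens / 1000) * input_rate_per_1k \
         + (output_tokens / 1000) * output_rate_per_1k

# Using the indicator shown above (rates chosen to reproduce it):
cost = message_cost(847, 312, input_rate_per_1k=0.002, output_rate_per_1k=0.0048)
print(f"Response cost: ${cost:.4f}")  # Response cost: $0.0032
```

Because input and output tokens are usually priced differently, a long prompt with a short answer can cost more than the reverse.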

Step 3: Check Your Wallet Balance

The wallet balance indicator shows your remaining credits. The Keeptrusts wallet system uses a cascading scope:

  1. User wallet — checked first if you have a personal allocation.
  2. Team wallet — used if no user wallet exists.
  3. Organization wallet — fallback if no team wallet exists.

The balance displayed reflects the effective wallet for your current scope.
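
The cascade above can be sketched as a first-match lookup. This is a minimal illustration with hypothetical wallet objects; the real resolution happens server-side in the Keeptrusts gateway.

```python
# Sketch of the cascading wallet-scope lookup: user, then team,
# then organization. Wallet objects here are hypothetical.

def effective_wallet(user_wallet, team_wallet, org_wallet):
    """Return the first allocated wallet in the user -> team -> org cascade."""
    for wallet in (user_wallet, team_wallet, org_wallet):
        if wallet is not None:
            return wallet
    raise ValueError("No wallet allocated at any scope")

# A user with no personal or team allocation falls through to the org wallet:
print(effective_wallet(None, None, {"scope": "org", "balance": 125.0}))
```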

Reserve and Settle Flow

When you send a message:

  1. The gateway reserves the estimated cost against your wallet.
  2. The request is forwarded to the model provider.
  3. On response, the reservation is settled to the actual cost.
  4. The wallet balance updates to reflect the settled amount.

If the actual cost differs from the estimate (common with variable-length responses), the difference is credited back or debited automatically.
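
The four steps above can be sketched with an in-memory wallet. This is a simplified model under stated assumptions; the actual gateway performs reservation and settlement transactionally.

```python
# Minimal sketch of the reserve-and-settle flow against a wallet.

class Wallet:
    def __init__(self, balance: float):
        self.balance = balance
        self.reserved = 0.0

    def reserve(self, estimate: float) -> None:
        """Hold the estimated cost before the request is dispatched."""
        if self.balance - self.reserved < estimate:
            raise RuntimeError("Insufficient balance")
        self.reserved += estimate

    def settle(self, estimate: float, actual: float) -> None:
        """Release the reservation and debit the actual cost; the
        difference from the estimate is credited/debited implicitly."""
        self.reserved -= estimate
        self.balance -= actual

w = Wallet(balance=1.00)
w.reserve(0.0050)           # estimated cost held at dispatch
w.settle(0.0050, 0.0032)    # settled to the actual cost on response
print(round(w.balance, 4))  # 0.9968
```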

Step 4: Respond to Budget Warnings

The chat workbench displays warnings as your wallet balance decreases:

Warning Thresholds

| Threshold | Indicator | Behavior |
| --- | --- | --- |
| 50% remaining | Yellow balance indicator | Informational; no action required |
| 20% remaining | Orange balance indicator with warning icon | Consider switching to a cheaper model |
| 5% remaining | Red balance indicator with alert | Urgent: top up or contact your admin |
| 0% remaining | Message send disabled | Cannot send messages until balance is replenished |
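
The thresholds can be read as a simple tiered lookup on the remaining-balance fraction. The fractions are the documented ones; the function itself is an illustrative sketch, not Keeptrusts code.

```python
# Sketch mapping remaining balance (as a fraction of the allocation)
# to the documented warning tiers: 50% / 20% / 5% / 0%.

def warning_level(remaining_fraction: float) -> str:
    if remaining_fraction <= 0:
        return "blocked"   # message send disabled
    if remaining_fraction <= 0.05:
        return "red"       # urgent: top up or contact your admin
    if remaining_fraction <= 0.20:
        return "orange"    # consider switching to a cheaper model
    if remaining_fraction <= 0.50:
        return "yellow"    # informational; no action required
    return "ok"

print(warning_level(0.18))  # orange
```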

When Balance Reaches Zero

If your wallet balance reaches zero:

  1. The chat input field is disabled with a message: "Insufficient balance."
  2. A prompt to top up or contact your administrator appears.
  3. Existing conversation history remains accessible for reading and export.
  4. No further model invocations are possible until the balance is replenished.

Tip: If your organization has enabled PayPal wallet top-ups, you can add credits directly from the chat workbench without leaving the conversation. Click the Top Up button next to the balance indicator.

Step 5: Use Cost-Optimized Model Suggestions

When cost awareness is important, the chat workbench can suggest cheaper alternatives.

How Suggestions Appear

After a message exchange, if a less expensive model could have handled the request with comparable quality, a suggestion appears:

Cost tip: This response cost $0.0045 with GPT-4. A similar response
from GPT-3.5 Turbo would cost approximately $0.0004 (90% savings).
Switch model →
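
The savings figure in a cost tip is a straightforward percentage comparison. A minimal sketch, assuming the two per-response costs are known:

```python
# Sketch: percentage saved by switching from the current model
# to a cheaper alternative for a comparable response.

def savings_pct(current_cost: float, alternative_cost: float) -> float:
    return 100 * (current_cost - alternative_cost) / current_cost

# The tip above compares $0.0045 (GPT-4) with $0.0004 (GPT-3.5 Turbo):
print(f"{savings_pct(0.0045, 0.0004):.0f}% savings")  # ~91%, shown as "90%" in the tip
```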

When to Consider Switching

| Task Type | Recommended Model Tier | Reason |
| --- | --- | --- |
| Simple Q&A | Standard (e.g., GPT-3.5) | Low complexity, high token savings |
| Code generation | Advanced (e.g., GPT-4) | Better accuracy, fewer iterations |
| Document summary | Standard | Extractive tasks work well on cheaper models |
| Complex reasoning | Advanced | Requires stronger reasoning capabilities |

Switching Models Mid-Conversation

  1. Click the model selector in the toolbar.
  2. Select the suggested model.
  3. Continue the conversation — the context history is preserved.

Note: Switching models mid-conversation does not affect previous messages. Only new messages use the selected model. Token pricing changes take effect immediately.

Step 6: Review Cost History

To see cost details for past conversations:

  1. Open the conversation list (sidebar).
  2. Each conversation displays its total cost next to the title.
  3. Click a conversation to see per-message cost breakdowns.

For organization-wide cost analytics, visit the Analytics section in the Keeptrusts console. See the analytics tutorial for detailed reporting.

Step 7: Optimize Costs Proactively

Write Concise Prompts

Token costs scale with prompt length. Shorter, focused prompts reduce input token costs:

  • Remove unnecessary context from prompts.
  • Use system prompts for persistent instructions instead of repeating them.
  • Reference previous messages instead of re-stating information.

Manage Conversation Length

As conversations grow, each new message includes the full history in the context window:

  • Start new conversations for unrelated topics.
  • Summarize long conversations before continuing.
  • Use the context management tutorial techniques.
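
The points above follow from how context windows work: each request re-sends the full history, so input tokens are cumulative. A small sketch of that growth, with hypothetical per-message token counts:

```python
# Sketch of why input tokens grow with conversation length: every
# request includes all prior messages as context.

def input_tokens_per_turn(message_tokens: list[int]) -> list[int]:
    """Tokens sent as input on each turn, assuming the full history
    is included in every request."""
    history = 0
    per_turn = []
    for tokens in message_tokens:
        per_turn.append(history + tokens)
        history += tokens
    return per_turn

# Four 200-token exchanges: the fourth turn already sends 4x the input.
print(input_tokens_per_turn([200, 200, 200, 200]))  # [200, 400, 600, 800]
```

Starting a fresh conversation resets `history` to zero, which is why unrelated topics are cheaper in their own threads.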

Choose the Right Model

Match model capability to task complexity. Reserve advanced models for tasks that genuinely require them.

Troubleshooting

| Issue | Cause | Fix |
| --- | --- | --- |
| No cost indicators visible | Model pricing not configured | Contact your admin to set up model pricing |
| Balance shows $0 but messages still work | Your team wallet is empty but the org fallback is active | No action needed; requests are funded by the organization wallet |
| Cost seems too high for a short response | Long conversation history inflating input tokens | Start a new conversation to reset the context window |
| "Insufficient balance" but admin says funds exist | Wallet scope mismatch | Verify which wallet scope (user/team/org) your session resolves to |

Summary

You now understand how to track and optimize chat costs:

  • Read per-message costs and follow wallet-backed retry guidance after each run
  • Monitor your wallet balance and respond to budget warnings
  • Use cost-optimized model suggestions to reduce spending
  • Review cost history across conversations
  • Apply proactive strategies to minimize token consumption

For AI systems

  • Canonical terms: Keeptrusts chat workbench, per-message cost, session total, wallet balance, reserve-and-settle, budget warning, cost-optimized model suggestions, PayPal wallet top-up, wallet cascade (user → team → org).
  • UI elements: cost indicator below messages, balance indicator, Top Up button, retry affordance, model selector, cost tip suggestion.
  • Best next pages: Chat Analytics, Context Management, Model Selection.

For engineers

  • Prerequisites: model pricing configured by admin; a wallet with allocated credits at user, team, or org level.
  • Validation: Send a message → verify per-message cost appears below the response. Check the wallet balance decreases after settlement. Drain wallet to 5% → verify red alert appears.
  • Key detail: costs include full context window tokens — longer conversations have higher per-message input costs even for short prompts.

For leaders

  • Real-time cost visibility empowers users to self-manage spend without admin intervention.
  • Warning thresholds (50%, 20%, 5%) provide progressive alerts before productivity-blocking zero-balance events.
  • Cost-optimized model suggestions can reduce spend by 80–90% for routine tasks without quality loss.
  • Wallet cascade (user → team → org) allows layered budget delegation.

Next steps