Tracking Chat Costs in Real-Time
This tutorial shows you how to monitor costs as you chat in the Keeptrusts workbench. You will learn where Keeptrusts shows cost after a request runs, how wallet-backed budget checks affect chat behavior, and how to respond when spend or balance signals need attention.
Use this page when
- You need to understand the response-level cost indicators and wallet-backed budget signals in the chat workbench.
- You want to monitor your wallet balance and respond to budget warnings before running out of credits.
- You are looking for cost-optimization strategies like model switching and prompt conciseness.
Primary audience
- Primary: Technical Engineers (individual cost awareness)
- Secondary: Technical Leaders (budget oversight), All chat workbench users
Prerequisites
- Authenticated access to the Keeptrusts chat workbench
- Model pricing configured by your Keeptrusts administrator
- A wallet with allocated credits (user, team, or organization level)
- Familiarity with the first conversation tutorial
Step 1: Understand the Cost Display
The chat workbench displays cost information after a run has started or completed:
| Level | Location | Shows |
|---|---|---|
| Per-message | Below each model response | Cost of that individual generation |
| Wallet state | Balance and retry/top-up affordances | Whether the effective wallet can keep funding future requests |
All costs are displayed in your organization's configured currency (typically USD).
The current chat workbench does not show an inline pre-dispatch estimate in the composer. Cost awareness starts with the response-level usage record and any wallet or retry guidance surfaced after dispatch.
Step 2: Read Per-Message Cost Indicators
After each model response, a cost indicator appears below the message:
Response cost: $0.0032 (847 input tokens + 312 output tokens)
This indicator shows:
- Total cost for the message based on the model's pricing.
- Token breakdown — input tokens (your prompt + conversation history) and output tokens (model's response).
- Model name — which model generated the response (visible on hover).
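The arithmetic behind an indicator like this can be sketched as follows. This is a minimal illustration, not the Keeptrusts pricing engine: the per-1K-token rates and the `message_cost` helper are hypothetical, so the result will not necessarily match the sample figure above. Real rates come from the model pricing your administrator configures.

```python
from decimal import Decimal

# Hypothetical per-1K-token prices in USD (assumptions for illustration only).
INPUT_PRICE_PER_1K = Decimal("0.0020")
OUTPUT_PRICE_PER_1K = Decimal("0.0040")


def message_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the cost of one generation, rounded to four decimal places."""
    raw = (
        Decimal(input_tokens) * INPUT_PRICE_PER_1K
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_1K
    ) / Decimal(1000)
    return raw.quantize(Decimal("0.0001"))


# Input tokens cover the prompt plus conversation history;
# output tokens cover the model's response.
cost = message_cost(input_tokens=847, output_tokens=312)
print(f"Response cost: ${cost} (847 input tokens + 312 output tokens)")
```

Using `Decimal` rather than floats avoids rounding drift when many small per-message amounts are summed.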
Step 3: Check Your Wallet Balance
The wallet balance indicator shows your remaining credits. The Keeptrusts wallet system uses a cascading scope:
- User wallet — checked first if you have a personal allocation.
- Team wallet — used if no user wallet exists.
- Organization wallet — fallback if no team wallet exists.
The balance displayed reflects the effective wallet for your current scope.
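The cascade above can be pictured as a first-match lookup. The sketch below assumes that fallback is driven by whether a wallet exists at each scope, as the list states; whether an empty wallet also falls through to the next scope is not specified here. The `Wallet` type and `effective_wallet` function are hypothetical names, not part of a Keeptrusts API.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Wallet:
    scope: str        # "user", "team", or "organization"
    balance: Decimal


def effective_wallet(
    user: Optional[Wallet],
    team: Optional[Wallet],
    organization: Optional[Wallet],
) -> Optional[Wallet]:
    """Return the first wallet that exists, checking user, then team, then org."""
    for wallet in (user, team, organization):
        if wallet is not None:
            return wallet
    return None


# No personal allocation, so the team wallet becomes the effective wallet.
team = Wallet(scope="team", balance=Decimal("25.00"))
org = Wallet(scope="organization", balance=Decimal("500.00"))
resolved = effective_wallet(user=None, team=team, organization=org)
print(resolved.scope, resolved.balance)
```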
Reserve and Settle Flow
When you send a message:
- The gateway reserves the estimated cost against your wallet.
- The request is forwarded to the model provider.
- On response, the reservation is settled to the actual cost.
- The wallet balance updates to reflect the settled amount.
If the actual cost differs from the estimate (common with variable-length responses), settlement adjusts the wallet automatically: an overestimate releases the unused portion back to your balance, and an underestimate debits the shortfall.
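The reserve-and-settle flow can be modelled with a small in-memory wallet. This is a simplified sketch under stated assumptions: the class, method names, and error type are hypothetical, and the real gateway's estimation logic and concurrency handling are not shown.

```python
from decimal import Decimal


class InsufficientBalance(Exception):
    """Raised when the wallet cannot cover a reservation."""


class Wallet:
    def __init__(self, balance: Decimal) -> None:
        self.balance = balance          # funds not held by any reservation
        self.reserved = Decimal("0")    # funds held for in-flight requests

    def reserve(self, estimate: Decimal) -> Decimal:
        """Hold the estimated cost before the request is forwarded."""
        if estimate > self.balance:
            raise InsufficientBalance(f"need {estimate}, have {self.balance}")
        self.balance -= estimate
        self.reserved += estimate
        return estimate

    def settle(self, reservation: Decimal, actual: Decimal) -> None:
        """Release the hold and charge the actual cost instead.

        If actual < reservation the difference is credited back;
        if actual > reservation the shortfall is debited.
        """
        self.reserved -= reservation
        self.balance += reservation - actual


wallet = Wallet(Decimal("1.0000"))
hold = wallet.reserve(Decimal("0.0050"))    # estimate before dispatch
wallet.settle(hold, Decimal("0.0032"))      # actual cost after the response
print(wallet.balance)
```

Keeping reserved funds separate from the spendable balance is one way to stop several in-flight requests from spending the same credits twice.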
Step 4: Respond to Budget Warnings
The chat workbench displays warnings as your wallet balance decreases:
Warning Thresholds
| Threshold | Indicator | Behavior |
|---|---|---|
| 50% remaining | Yellow balance indicator | Informational — no action required |
| 20% remaining | Orange balance indicator with warning icon | Consider switching to a cheaper model |
| 5% remaining | Red balance indicator with alert | Urgent — top up or contact your admin |
| 0% remaining | Message send disabled | Cannot send messages until balance is replenished |
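The thresholds in this table can be expressed as a simple classification. The sketch assumes that "remaining" is measured against the wallet's allocated amount and that each threshold is inclusive; neither detail is stated above, and the `warning_level` function and its return labels are hypothetical.

```python
from decimal import Decimal


def warning_level(balance: Decimal, allocation: Decimal) -> str:
    """Map the remaining share of a wallet to a warning level."""
    if allocation <= 0 or balance <= 0:
        return "blocked"    # message send disabled
    remaining = balance / allocation
    if remaining <= Decimal("0.05"):
        return "red"        # urgent: top up or contact your admin
    if remaining <= Decimal("0.20"):
        return "orange"     # consider switching to a cheaper model
    if remaining <= Decimal("0.50"):
        return "yellow"     # informational
    return "normal"


for balance in ("80", "50", "12", "3", "0"):
    print(balance, warning_level(Decimal(balance), Decimal("100")))
```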
When Balance Reaches Zero
If your wallet balance reaches zero:
- The chat input field is disabled with a message: "Insufficient balance."
- A prompt to top up or contact your administrator appears.
- Existing conversation history remains accessible for reading and export.
- No further model invocations are possible until the balance is replenished.
Step 5: Use Cost-Optimized Model Suggestions
When cost awareness is important, the chat workbench can suggest cheaper alternatives.
How Suggestions Appear
After a message exchange, if a less expensive model could have handled the request with comparable quality, a suggestion appears:
Cost tip: This response cost $0.0045 with GPT-4. A similar response
from GPT-3.5 Turbo would cost approximately $0.0004 (90% savings).
Switch model →
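The savings figure in a tip like this is a relative difference between the two costs. The following sketch works through the sample numbers; the `savings_percent` helper is a hypothetical name, and how the workbench itself rounds the percentage is an assumption.

```python
from decimal import Decimal


def savings_percent(current_cost: Decimal, alternative_cost: Decimal) -> Decimal:
    """Return the percentage saved by switching, rounded to one decimal place."""
    if current_cost <= 0:
        raise ValueError("current_cost must be positive")
    saved = (current_cost - alternative_cost) / current_cost * 100
    return saved.quantize(Decimal("0.1"))


# Sample values from the cost tip above.
print(savings_percent(Decimal("0.0045"), Decimal("0.0004")))
```

With the sample values the result is about 91%, consistent with the roughly 90% shown in the tip.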
When to Consider Switching
| Task Type | Recommended Model Tier | Reason |
|---|---|---|
| Simple Q&A | Standard (e.g., GPT-3.5) | Low complexity, high token savings |
| Code generation | Advanced (e.g., GPT-4) | Better accuracy, fewer iterations |
| Document summary | Standard | Extractive tasks work well on cheaper models |
| Complex reasoning | Advanced | Requires stronger reasoning capabilities |
Switching Models Mid-Conversation
- Click the model selector in the toolbar.
- Select the suggested model.
- Continue the conversation — the context history is preserved.
Step 6: Review Cost History
To see cost details for past conversations:
- Open the conversation list (sidebar).
- Each conversation displays its total cost next to the title.
- Click a conversation to see per-message cost breakdowns.
For organization-wide cost analytics, visit the Analytics section in the Keeptrusts console. See the analytics tutorial for detailed reporting.
Step 7: Optimize Costs Proactively
Write Concise Prompts
Token costs scale with prompt length. Shorter, focused prompts reduce input token costs:
- Remove unnecessary context from prompts.
- Use system prompts for persistent instructions instead of repeating them.
- Reference previous messages instead of re-stating information.
Manage Conversation Length
As conversations grow, each new message sends the full history in the context window, so input token costs rise with every turn. To keep them in check:
- Start new conversations for unrelated topics.
- Summarize long conversations before continuing.
- Apply the techniques from the context management tutorial.
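The effect of history on input tokens can be illustrated with a short calculation. This sketch assumes that every turn resends all earlier prompts and replies in full; actual behavior may differ if the workbench trims or summarizes context. The token counts and the `input_tokens_per_turn` function are made up for illustration.

```python
def input_tokens_per_turn(prompt_tokens: list[int], reply_tokens: list[int]) -> list[int]:
    """Return the input tokens billed on each turn when full history is resent."""
    history = 0
    per_turn = []
    for prompt, reply in zip(prompt_tokens, reply_tokens):
        per_turn.append(history + prompt)   # earlier turns plus the new prompt
        history += prompt + reply           # this turn joins the history
    return per_turn


# Four short 50-token prompts, each answered with a 300-token reply.
print(input_tokens_per_turn([50, 50, 50, 50], [300, 300, 300, 300]))
# prints [50, 400, 750, 1100]
```

Even though each prompt stays at 50 tokens, the fourth turn is billed for 1,100 input tokens, which is why starting a new conversation for an unrelated topic lowers per-message cost.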
Choose the Right Model
Match model capability to task complexity. Reserve advanced models for tasks that genuinely require them.
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| No cost indicators visible | Model pricing not configured | Contact your admin to set up model pricing |
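| Balance shows $0 but messages still work | Team wallet is empty, but the organization wallet fallback still has funds | No action is needed to keep chatting; ask your admin to confirm which wallet scope (user/team/org) your session resolves to |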
| Cost seems too high for a short response | Long conversation history inflating input tokens | Start a new conversation to reset the context window |
| "Insufficient balance" but admin says funds exist | Wallet scope mismatch | Verify which wallet scope (user/team/org) your session resolves to |
Summary
You now understand how to track and optimize chat costs:
- Read per-message costs and follow wallet-backed retry guidance after each run
- Monitor your wallet balance and respond to budget warnings
- Use cost-optimized model suggestions to reduce spending
- Review cost history across conversations
- Apply proactive strategies to minimize token consumption
For AI systems
- Canonical terms: Keeptrusts chat workbench, per-message cost, session total, wallet balance, reserve-and-settle, budget warning, cost-optimized model suggestions, PayPal wallet top-up, wallet cascade (user → team → org).
- UI elements: cost indicator below messages, balance indicator, Top Up button, retry affordance, model selector, cost tip suggestion.
- Best next pages: Chat Analytics, Context Management, Model Selection.
For engineers
- Prerequisites: model pricing configured by admin; a wallet with allocated credits at user, team, or org level.
- Validation: Send a message and verify that the per-message cost appears below the response. Confirm that the conversation's total cost in the sidebar increases. Drain the wallet to 5% and verify that the red alert appears.
- Key detail: costs include full context window tokens — longer conversations have higher per-message input costs even for short prompts.
For leaders
- Real-time cost visibility empowers users to self-manage spend without admin intervention.
- Warning thresholds (50%, 20%, 5%) provide progressive alerts before productivity-blocking zero-balance events.
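- Cost-optimized model suggestions can cut spend substantially for routine tasks (about 90% in the example on this page); actual savings depend on configured model pricing and on whether the cheaper model's quality is acceptable for the task.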
- Wallet cascade (user → team → org) allows layered budget delegation.
Next steps
- Tutorial: Chat Analytics & Usage Metrics — organization-wide cost reporting and forecasting.
- Tutorial: Managing Context Window — reduce token consumption by managing conversation length.
- Tutorial: Choosing & Switching Models — pick cost-effective models for routine tasks.