
Tracking Chat Costs in Real-Time

This tutorial shows you how to monitor costs as you chat in the Keeptrusts workbench. You will learn where Keeptrusts shows cost after a request runs, how wallet-backed budget checks affect chat behavior, and how to respond when spend or balance signals need attention.

Use this page when

  • You need to understand the response-level cost indicators and wallet-backed budget signals in the chat workbench.
  • You want to monitor your wallet balance and respond to budget warnings before running out of credits.
  • You are looking for cost-optimization strategies like model switching and prompt conciseness.

Primary audience

  • Primary: Technical Engineers (individual cost awareness)
  • Secondary: Technical Leaders (budget oversight), All chat workbench users

Prerequisites

  • Authenticated access to the Keeptrusts chat workbench
  • Model pricing configured by your Keeptrusts administrator
  • A wallet with allocated credits (user, team, or organization level)
  • Familiarity with the first conversation tutorial

Step 1: Understand the Cost Display

The chat workbench displays cost information after a run has started or completed:

| Level | Location | Shows |
| --- | --- | --- |
| Per-message | Below each model response | Cost of that individual generation |
| Wallet state | Balance and retry/top-up affordances | Whether the effective wallet can keep funding future requests |

All costs are displayed in your organization's configured currency (typically USD).

The current chat workbench does not show an inline pre-dispatch estimate in the composer. Cost awareness starts with the response-level usage record and any wallet or retry guidance surfaced after dispatch.

Step 2: Read Per-Message Cost Indicators

After each model response, a cost indicator appears below the message:

Response cost: $0.0032 (847 input tokens + 312 output tokens)

This indicator shows:

  • Total cost for the message based on the model's pricing.
  • Token breakdown — input tokens (your prompt + conversation history) and output tokens (model's response).
  • Model name — which model generated the response (visible on hover).

Note: Per-message costs include the full context window sent to the model. As your conversation grows longer, input token counts increase for each subsequent message because the conversation history is included in every request.
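
A per-message figure such as "$0.0032" can be reproduced from the token breakdown and per-token rates. The sketch below is illustrative only; the rates are placeholder values chosen to match the example indicator, not Keeptrusts' actual pricing, which your administrator configures.

```python
# Sketch: deriving a per-message cost from token counts and
# per-1K-token rates. The rates are illustrative placeholders.

def message_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Return the cost of one model response in the wallet currency."""
    return (input_tokens / 1000) * input_rate_per_1k \
         + (output_tokens / 1000) * output_rate_per_1k

# Using the indicator shown above (rates chosen to reproduce it):
cost = message_cost(847, 312, input_rate_per_1k=0.002, output_rate_per_1k=0.0048)
print(f"Response cost: ${cost:.4f}")  # Response cost: $0.0032
```

Because input and output tokens are usually priced differently, a long prompt with a short answer can cost more than the reverse.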

Step 3: Check Your Wallet Balance

The wallet balance indicator shows your remaining credits. The Keeptrusts wallet system uses a cascading scope:

  1. User wallet — checked first if you have a personal allocation.
  2. Team wallet — used if no user wallet exists.
  3. Organization wallet — fallback if no team wallet exists.

The balance displayed reflects the effective wallet for your current scope.
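
The cascade above can be sketched as a first-match lookup. This is a minimal illustration with hypothetical wallet objects; the real resolution happens server-side in the Keeptrusts gateway.

```python
# Sketch of the cascading wallet-scope lookup: user, then team,
# then organization. Wallet objects here are hypothetical.

def effective_wallet(user_wallet, team_wallet, org_wallet):
    """Return the first allocated wallet in the user -> team -> org cascade."""
    for wallet in (user_wallet, team_wallet, org_wallet):
        if wallet is not None:
            return wallet
    raise ValueError("No wallet allocated at any scope")

# A user with no personal or team allocation falls through to the org wallet:
print(effective_wallet(None, None, {"scope": "org", "balance": 125.0}))
```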

Reserve and Settle Flow

When you send a message:

  1. The gateway reserves the estimated cost against your wallet.
  2. The request is forwarded to the model provider.
  3. On response, the reservation is settled to the actual cost.
  4. The wallet balance updates to reflect the settled amount.

If the actual cost differs from the estimate (common with variable-length responses), the difference is credited back or debited automatically.
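
The four steps above can be sketched with an in-memory wallet. This is a simplified model under stated assumptions; the actual gateway performs reservation and settlement transactionally.

```python
# Minimal sketch of the reserve-and-settle flow against a wallet.

class Wallet:
    def __init__(self, balance: float):
        self.balance = balance
        self.reserved = 0.0

    def reserve(self, estimate: float) -> None:
        """Hold the estimated cost before the request is dispatched."""
        if self.balance - self.reserved < estimate:
            raise RuntimeError("Insufficient balance")
        self.reserved += estimate

    def settle(self, estimate: float, actual: float) -> None:
        """Release the reservation and debit the actual cost; the
        difference from the estimate is credited/debited implicitly."""
        self.reserved -= estimate
        self.balance -= actual

w = Wallet(balance=1.00)
w.reserve(0.0050)           # estimated cost held at dispatch
w.settle(0.0050, 0.0032)    # settled to the actual cost on response
print(round(w.balance, 4))  # 0.9968
```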

Step 4: Respond to Budget Warnings

The chat workbench displays warnings as your wallet balance decreases:

Warning Thresholds

| Threshold | Indicator | Behavior |
| --- | --- | --- |
| 50% remaining | Yellow balance indicator | Informational; no action required |
| 20% remaining | Orange balance indicator with warning icon | Consider switching to a cheaper model |
| 5% remaining | Red balance indicator with alert | Urgent: top up or contact your admin |
| 0% remaining | Message send disabled | Cannot send messages until balance is replenished |
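
The thresholds can be read as a simple tiered lookup on the remaining-balance fraction. The fractions are the documented ones; the function itself is an illustrative sketch, not Keeptrusts code.

```python
# Sketch mapping remaining balance (as a fraction of the allocation)
# to the documented warning tiers: 50% / 20% / 5% / 0%.

def warning_level(remaining_fraction: float) -> str:
    if remaining_fraction <= 0:
        return "blocked"   # message send disabled
    if remaining_fraction <= 0.05:
        return "red"       # urgent: top up or contact your admin
    if remaining_fraction <= 0.20:
        return "orange"    # consider switching to a cheaper model
    if remaining_fraction <= 0.50:
        return "yellow"    # informational; no action required
    return "ok"

print(warning_level(0.18))  # orange
```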

When Balance Reaches Zero

If your wallet balance reaches zero:

  1. The chat input field is disabled with a message: "Insufficient balance."
  2. A prompt to top up or contact your administrator appears.
  3. Existing conversation history remains accessible for reading and export.
  4. No further model invocations are possible until the balance is replenished.

Tip: If your organization has enabled PayPal wallet top-ups, you can add credits directly from the chat workbench without leaving the conversation. Click the Top Up button next to the balance indicator.

Step 5: Use Cost-Optimized Model Suggestions

When cost awareness is important, the chat workbench can suggest cheaper alternatives.

How Suggestions Appear

After a message exchange, if a less expensive model could have handled the request with comparable quality, a suggestion appears:

Cost tip: This response cost $0.0045 with GPT-4. A similar response
from GPT-3.5 Turbo would cost approximately $0.0004 (90% savings).
Switch model →
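
The savings figure in a cost tip is a straightforward percentage comparison. A minimal sketch, assuming the two per-response costs are known:

```python
# Sketch: percentage saved by switching from the current model
# to a cheaper alternative for a comparable response.

def savings_pct(current_cost: float, alternative_cost: float) -> float:
    return 100 * (current_cost - alternative_cost) / current_cost

# The tip above compares $0.0045 (GPT-4) with $0.0004 (GPT-3.5 Turbo):
print(f"{savings_pct(0.0045, 0.0004):.0f}% savings")  # ~91%, shown as "90%" in the tip
```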

When to Consider Switching

| Task Type | Recommended Model Tier | Reason |
| --- | --- | --- |
| Simple Q&A | Standard (e.g., GPT-3.5) | Low complexity, high token savings |
| Code generation | Advanced (e.g., GPT-4) | Better accuracy, fewer iterations |
| Document summary | Standard | Extractive tasks work well on cheaper models |
| Complex reasoning | Advanced | Requires stronger reasoning capabilities |

Switching Models Mid-Conversation

  1. Click the model selector in the toolbar.
  2. Select the suggested model.
  3. Continue the conversation — the context history is preserved.

Note: Switching models mid-conversation does not affect previous messages. Only new messages use the selected model. Token pricing changes take effect immediately.

Step 6: Review Cost History

To see cost details for past conversations:

  1. Open the conversation list (sidebar).
  2. Each conversation displays its total cost next to the title.
  3. Click a conversation to see per-message cost breakdowns.

For organization-wide cost analytics, visit the Analytics section in the Keeptrusts console. See the analytics tutorial for detailed reporting.

Step 7: Optimize Costs Proactively

Write Concise Prompts

Token costs scale with prompt length. Shorter, focused prompts reduce input token costs:

  • Remove unnecessary context from prompts.
  • Use system prompts for persistent instructions instead of repeating them.
  • Reference previous messages instead of re-stating information.

Manage Conversation Length

As conversations grow, each new message includes the full history in the context window:

  • Start new conversations for unrelated topics.
  • Summarize long conversations before continuing.
  • Use the context management tutorial techniques.
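
The points above follow from how context windows work: each request re-sends the full history, so input tokens are cumulative. A small sketch of that growth, with hypothetical per-message token counts:

```python
# Sketch of why input tokens grow with conversation length: every
# request includes all prior messages as context.

def input_tokens_per_turn(message_tokens: list[int]) -> list[int]:
    """Tokens sent as input on each turn, assuming the full history
    is included in every request."""
    history = 0
    per_turn = []
    for tokens in message_tokens:
        per_turn.append(history + tokens)
        history += tokens
    return per_turn

# Four 200-token exchanges: the fourth turn already sends 4x the input.
print(input_tokens_per_turn([200, 200, 200, 200]))  # [200, 400, 600, 800]
```

Starting a fresh conversation resets `history` to zero, which is why unrelated topics are cheaper in their own threads.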

Choose the Right Model

Match model capability to task complexity. Reserve advanced models for tasks that genuinely require them.

Troubleshooting

| Issue | Cause | Fix |
| --- | --- | --- |
| No cost indicators visible | Model pricing not configured | Contact your admin to set up model pricing |
| Balance shows $0 but messages still work | Your team wallet is empty but the org fallback is active | No action needed; requests are funded by the organization wallet |
| Cost seems too high for a short response | Long conversation history inflating input tokens | Start a new conversation to reset the context window |
| "Insufficient balance" but admin says funds exist | Wallet scope mismatch | Verify which wallet scope (user/team/org) your session resolves to |

Summary

You now understand how to track and optimize chat costs:

  • Read per-message costs and follow wallet-backed retry guidance after each run
  • Monitor your wallet balance and respond to budget warnings
  • Use cost-optimized model suggestions to reduce spending
  • Review cost history across conversations
  • Apply proactive strategies to minimize token consumption

For AI systems

  • Canonical terms: Keeptrusts chat workbench, per-message cost, session total, wallet balance, reserve-and-settle, budget warning, cost-optimized model suggestions, PayPal wallet top-up, wallet cascade (user → team → org).
  • UI elements: cost indicator below messages, balance indicator, Top Up button, retry affordance, model selector, cost tip suggestion.
  • Best next pages: Chat Analytics, Context Management, Model Selection.

For engineers

  • Prerequisites: model pricing configured by admin; a wallet with allocated credits at user, team, or org level.
  • Validation: Send a message → verify per-message cost appears below the response. Check the wallet balance decreases after settlement. Drain wallet to 5% → verify red alert appears.
  • Key detail: costs include full context window tokens — longer conversations have higher per-message input costs even for short prompts.

For leaders

  • Real-time cost visibility empowers users to self-manage spend without admin intervention.
  • Warning thresholds (50%, 20%, 5%) provide progressive alerts before productivity-blocking zero-balance events.
  • Cost-optimized model suggestions can reduce spend by 80–90% for routine tasks without quality loss.
  • Wallet cascade (user → team → org) allows layered budget delegation.

Next steps