Prompt Evaluations Live Mode

Prompt & Workflow Evaluation includes a live mode that runs the evaluation case against the configured runtime instead of simulating assertions from static fixtures. Use it when you need governed evidence about tool calls, approvals, latency, cost, and prompt behavior before rollout.

Use this page when

You need to validate a prompt or workflow against a live provider path instead of offline fixtures.
You want to understand what evidence live mode records and how budget limits work.
You are deciding when to use live mode versus a standard evaluation run.

Primary audience

Primary: Technical Engineers
Secondary: AI Agents, Technical Leaders

What live mode does

When you enable live mode, Keeptrusts sends the evaluation run through the real governed execution path and records runtime evidence alongside the normal evaluation result.

Live mode captures evidence such as:

tool-call activity
approval requests
latency measured against its threshold
cost measured against its ceiling
final assertion outcomes tied to the execution trace
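
As a rough illustration, the evidence categories listed above could be modeled as one record per live run. This is only a sketch: the class name `LiveEvidence` and every field name are hypothetical and are not taken from the Keeptrusts schema.

```python
from dataclasses import dataclass, field


@dataclass
class LiveEvidence:
    """Hypothetical container for the evidence a live run might record."""

    tool_calls: list[str] = field(default_factory=list)         # names of tools invoked
    approval_requests: list[str] = field(default_factory=list)  # approvals requested during the run
    latency_ms: float = 0.0                                     # observed latency
    cost_usd: float = 0.0                                       # observed spend
    assertion_outcomes: dict[str, bool] = field(default_factory=dict)  # assertion name -> passed?
    trace_id: str = ""                                          # link back to the execution trace

    def summary(self) -> str:
        """One-line overview suitable for a review note."""
        passed = sum(self.assertion_outcomes.values())
        total = len(self.assertion_outcomes)
        return (
            f"{passed}/{total} assertions passed, "
            f"{len(self.tool_calls)} tool calls, "
            f"{len(self.approval_requests)} approval requests"
        )


evidence = LiveEvidence(
    tool_calls=["search"],
    assertion_outcomes={"cost": True, "latency": False},
)
print(evidence.summary())
```

Keeping the assertion outcomes and the trace identifier in the same record mirrors the idea that each outcome is tied to the execution trace that produced it.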

The console workbench exposes the live-mode controls directly in the evaluation form, including the execution mode selector and the live budget field.

Budget and guardrails

Live mode is intended for controlled verification, not open-ended experimentation.

Set a live budget before running the evaluation.
Keep the case input small and targeted.
Use the recorded evidence to decide whether the prompt is safe to promote.
Treat failed cost or latency assertions as release blockers until you understand the cause.
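
The last guideline above can be expressed as a small gate in a release script. The following is a minimal sketch under stated assumptions: `AssertionResult`, `BLOCKING_KINDS`, and `release_blockers` are hypothetical names, and the assertion kinds `"cost"` and `"latency"` are illustrative labels rather than documented Keeptrusts values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssertionResult:
    """Hypothetical shape of one assertion outcome from a live run."""

    kind: str     # assumed labels, e.g. "cost", "latency", "tool_call"
    passed: bool


# Assertion kinds treated as release blockers when they fail.
BLOCKING_KINDS = frozenset({"cost", "latency"})


def release_blockers(results: list[AssertionResult]) -> list[str]:
    """Return the kinds of failed assertions that should block promotion."""
    return [r.kind for r in results if r.kind in BLOCKING_KINDS and not r.passed]


results = [
    AssertionResult("cost", passed=False),
    AssertionResult("latency", passed=True),
    AssertionResult("tool_call", passed=False),
]
blockers = release_blockers(results)
if blockers:
    print("Do not promote yet; investigate:", ", ".join(blockers))
```

Other failed assertions still need review; this gate only encodes the rule that cost and latency failures stop a rollout until the cause is understood.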

Choosing live mode vs. standard evaluation

Mode	Use it when	Output
Standard evaluation	You want deterministic checks against stored inputs and expected outputs	Assertion result only
Live mode	You need runtime proof about provider behavior, tool use, approvals, latency, or cost	Assertion result plus live evidence

Operator workflow

Open Prompt & Workflow Evaluation in the console.
Select the prompt, workflow, or case you want to validate.
Change the execution mode to live mode.
Set the live budget limit.
Run the evaluation and review the assertion evidence before rollout.
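
If the same steps were driven from a script rather than the console, the settings might be gathered as shown below. This is a hedged sketch, not a documented API: `execution_mode` and `live_budget_usd` are the feature names this page lists, but the value `"live"`, the `case_id` field, and the function `build_live_run_request` are assumptions made for illustration.

```python
import json


def build_live_run_request(case_id: str, live_budget_usd: float) -> dict:
    """Assemble hypothetical settings for one live-mode evaluation run."""
    if live_budget_usd <= 0:
        # A live run should never start without a positive budget.
        raise ValueError("live_budget_usd must be a positive amount")
    return {
        "case_id": case_id,                  # hypothetical field name
        "execution_mode": "live",            # the value "live" is an assumption
        "live_budget_usd": live_budget_usd,  # feature name taken from this page
    }


request = build_live_run_request("example-case", 2.50)
print(json.dumps(request, indent=2))
```

Rejecting a missing or non-positive budget before anything is sent keeps the run consistent with the guidance to set a live budget first.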

For AI systems

Canonical terms: Keeptrusts Prompt & Workflow Evaluation, live mode, execution mode, live budget, assertion evidence.
Exact feature names: execution_mode, live_budget_usd, Prompt & Workflow Evaluation.
Best next pages: Regulated Execution.

For engineers

Use live mode only when the provider path, tools, and approvals are configured in the target environment.
Keep assertions focused on observable runtime signals such as tool calls, approval requests, latency, and cost.
If a live run fails, inspect the linked evidence before changing the prompt or workflow.

For leaders

Live mode gives you governed runtime proof before rollout, which reduces the chance of promoting prompts that only looked correct in offline evaluation.
Budget controls keep verification spend bounded while still producing auditable evidence for high-risk workflows.

Next steps

Regulated Execution
