
Prompt Evaluations Live Mode

Prompt & Workflow Evaluation includes a live mode that runs the evaluation case against the configured runtime instead of simulating assertions from static fixtures. Use it when you need governed evidence about tool calls, approvals, latency, cost, and prompt behavior before rollout.

Use this page when

  • You need to validate a prompt or workflow against a live provider path instead of offline fixtures.
  • You want to understand what evidence live mode records and how budget limits work.
  • You are deciding when to use live mode versus a standard evaluation run.

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

What live mode does

When you enable live mode, Keeptrusts sends the evaluation run through the real governed execution path and records runtime evidence alongside the normal evaluation result.

Live mode captures evidence such as:

  • tool-call activity
  • approval requests
  • latency measured against configured thresholds
  • cost measured against configured ceilings
  • final assertion outcomes tied to the execution trace
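To make the evidence list concrete, here is a minimal sketch of what one live run's record could hold. The `LiveEvidence` class, its field names, and the sample values are hypothetical and chosen for illustration. They are not the Keeptrusts evidence schema.

```python
from dataclasses import dataclass, field


@dataclass
class LiveEvidence:
    """Hypothetical shape for the evidence a live run records (not the Keeptrusts schema)."""

    trace_id: str  # execution trace that the assertion outcomes are tied to
    tool_calls: list[str] = field(default_factory=list)         # names of tools invoked
    approval_requests: list[str] = field(default_factory=list)  # approvals that were requested
    latency_ms: float = 0.0  # observed end-to-end latency
    cost_usd: float = 0.0    # observed spend for the run
    assertions: dict[str, bool] = field(default_factory=dict)   # assertion name -> passed?

    def failed_assertions(self) -> list[str]:
        """Return the names of assertions that did not pass, in insertion order."""
        return [name for name, passed in self.assertions.items() if not passed]


evidence = LiveEvidence(
    trace_id="trace-123",
    tool_calls=["search_policies"],
    approval_requests=["refund_over_limit"],
    latency_ms=840.0,
    cost_usd=0.012,
    assertions={"calls_search_tool": True, "within_latency": True, "within_cost": False},
)
print(evidence.failed_assertions())  # ['within_cost']
```

Keeping the trace identifier on the same record as the assertion outcomes reflects the point above: each pass or fail result can be traced back to the run that produced it.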

The console workbench exposes the live-mode controls directly in the evaluation form, including the execution mode selector and the live budget field.
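The two form controls correspond to the documented feature names `execution_mode` and `live_budget_usd`. The sketch below models them as a request object that refuses a live run without a positive budget. The `EvaluationRequest` class, the `case_id` field, and the mode values `"standard"` and `"live"` are assumptions for illustration; the values the product actually accepts may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values for execution_mode; the real accepted values may differ.
STANDARD = "standard"
LIVE = "live"


@dataclass
class EvaluationRequest:
    """Hypothetical request mirroring the execution_mode and live_budget_usd form fields."""

    case_id: str
    execution_mode: str = STANDARD
    live_budget_usd: Optional[float] = None

    def validate(self) -> None:
        """Reject unknown modes and live runs that have no positive budget."""
        if self.execution_mode not in (STANDARD, LIVE):
            raise ValueError(f"unknown execution_mode: {self.execution_mode!r}")
        if self.execution_mode == LIVE:
            if self.live_budget_usd is None or self.live_budget_usd <= 0:
                raise ValueError("live mode requires a positive live_budget_usd")


req = EvaluationRequest(case_id="refund-policy-01", execution_mode=LIVE, live_budget_usd=0.50)
req.validate()  # passes: live mode with a budget set

try:
    EvaluationRequest(case_id="refund-policy-01", execution_mode=LIVE).validate()
except ValueError as err:
    print(err)  # live mode requires a positive live_budget_usd
```

Failing before the run starts, rather than after spend has occurred, matches the guidance to set a live budget before running the evaluation.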

Budget and guardrails

Live mode is intended for controlled verification, not open-ended experimentation.

  • Set a live budget before running the evaluation.
  • Keep the case input small and targeted.
  • Use the recorded evidence to decide whether the prompt is safe to promote.
  • Treat failed cost or latency assertions as release blockers until you understand the cause.
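The last guardrail can be expressed as a simple promotion gate: any failed cost or latency check produces a blocker, and an empty list means nothing blocked promotion. This is a minimal sketch. `LiveRunResult`, `release_blockers`, and all field names are hypothetical, and the thresholds would come from your own assertions.

```python
from dataclasses import dataclass


@dataclass
class LiveRunResult:
    """Hypothetical summary of one live run; every field name here is illustrative."""

    cost_usd: float              # observed spend for the run
    cost_ceiling_usd: float      # cost assertion threshold
    latency_ms: float            # observed latency
    latency_threshold_ms: float  # latency assertion threshold
    other_assertions_passed: bool


def release_blockers(run: LiveRunResult) -> list[str]:
    """Return reasons to hold the prompt back; an empty list means no blocker was found."""
    blockers: list[str] = []
    # Failed cost or latency assertions block release until the cause is understood.
    if run.cost_usd > run.cost_ceiling_usd:
        blockers.append(f"cost {run.cost_usd:.3f} USD exceeds ceiling {run.cost_ceiling_usd:.3f} USD")
    if run.latency_ms > run.latency_threshold_ms:
        blockers.append(f"latency {run.latency_ms:.0f} ms exceeds threshold {run.latency_threshold_ms:.0f} ms")
    if not run.other_assertions_passed:
        blockers.append("one or more other assertions failed")
    return blockers


run = LiveRunResult(cost_usd=0.020, cost_ceiling_usd=0.015, latency_ms=900,
                    latency_threshold_ms=1200, other_assertions_passed=True)
print(release_blockers(run))  # ['cost 0.020 USD exceeds ceiling 0.015 USD']
```

Returning reasons instead of a bare boolean gives the reviewer a starting point for investigating the cause before the blocker is cleared.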

Choosing live mode vs. standard evaluation

| Mode | Use it when | Output |
| --- | --- | --- |
| Standard evaluation | You want deterministic checks against stored inputs and expected outputs | Assertion result only |
| Live mode | You need runtime proof about provider behavior, tool use, approvals, latency, or cost | Assertion result plus live evidence |

Operator workflow

  1. Open Prompt & Workflow Evaluation in the console.
  2. Select the prompt, workflow, or case you want to validate.
  3. Change the execution mode to live mode.
  4. Set the live budget limit.
  5. Run the evaluation and review the assertion evidence before rollout.

For AI systems

  • Canonical terms: Keeptrusts Prompt & Workflow Evaluation, live mode, execution mode, live budget, assertion evidence.
  • Exact feature names: execution_mode, live_budget_usd, Prompt & Workflow Evaluation.
  • Best next pages: Regulated Execution.

For engineers

  • Use live mode only when the provider path, tools, and approvals are configured in the target environment.
  • Keep assertions focused on observable runtime signals such as tool calls, approval requests, latency, and cost.
  • If a live run fails, inspect the linked evidence before changing the prompt or workflow.
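Assertions that target observable runtime signals can be written as small, independent checks over a run's evidence. The sketch below assumes a plain dictionary of evidence with the hypothetical keys `tool_calls`, `approval_requests`, `latency_ms`, and `cost_usd`; the helper names and sample tool names are also illustrative and not part of the Keeptrusts product.

```python
from typing import Any, Callable

# Hypothetical evidence shape: a plain dict whose keys are chosen for illustration only.
Evidence = dict[str, Any]
Check = Callable[[Evidence], bool]


def called_tool(name: str) -> Check:
    """Pass when the run invoked the named tool at least once."""
    return lambda ev: name in ev.get("tool_calls", [])


def requested_approval(kind: str) -> Check:
    """Pass when the run raised an approval request of the given kind."""
    return lambda ev: kind in ev.get("approval_requests", [])


def latency_at_most(ms: float) -> Check:
    """Pass when observed latency stays within the threshold."""
    return lambda ev: ev["latency_ms"] <= ms


def cost_at_most(usd: float) -> Check:
    """Pass when observed cost stays within the ceiling."""
    return lambda ev: ev["cost_usd"] <= usd


def evaluate(evidence: Evidence, checks: dict[str, Check]) -> dict[str, bool]:
    """Run each named check against one run's evidence."""
    return {name: check(evidence) for name, check in checks.items()}


evidence = {"tool_calls": ["lookup_order"], "approval_requests": [], "latency_ms": 640, "cost_usd": 0.004}
results = evaluate(evidence, {
    "uses_lookup": called_tool("lookup_order"),
    "asks_approval": requested_approval("refund"),
    "fast_enough": latency_at_most(1000),
    "cheap_enough": cost_at_most(0.01),
})
print(results)
# {'uses_lookup': True, 'asks_approval': False, 'fast_enough': True, 'cheap_enough': True}
```

Because each check reads one signal, a failing result such as `asks_approval` points directly at the part of the evidence to inspect before you change the prompt or workflow.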

For leaders

  • Live mode gives you governed runtime proof before rollout, which reduces the chance of promoting prompts that only looked correct in offline evaluation.
  • Budget controls keep verification spend bounded while still producing auditable evidence for high-risk workflows.

Next steps