Tasks & Orchestration
Tasks & Orchestration is the Keeptrusts feature set for turning one-off agent work into durable, reusable automation. It gives teams task definitions, task runs, live execution visibility, collaboration, and run comparison so engineers can operate governed workflows without losing chat context or auditability.
Use this page when
- You need to define reusable automations with explicit steps, permissions, and runtime configuration.
- You want to follow task progress in real time from the console or CLI.
- You are composing sub-tasks, branches, or parallel task flows instead of a single linear run.
- You need team collaboration, handoff, and regression-aware run history around the same task.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Main content
Task Definitions and Runs
A task definition is the reusable automation template. It packages the task prompt, step sequence, permissions, bindings, and configuration needed to run the workflow again.
A task run is a single execution of that definition. Each run stores durable state for status, current step, step results, reports, approvals, and event history so operators can see exactly what happened at each stage.
For platform-target execution, the API-native worker_task binary claims and advances runs inside the control plane. That means Keeptrusts can execute task steps without requiring a separately deployed gateway just to orchestrate the run lifecycle.
Hosted Gateway Relay Dispatch
Hosted gateways can also receive task work over the API relay when the gateway has an active relay tunnel.
- The API queues a relay dispatch lease for the selected hosted gateway.
- The gateway receives a task_dispatch frame over the internal machine-authenticated websocket channel.
- The gateway sends task_dispatch_ack when it accepts the dispatch.
- The task run continues reporting progress through the normal task-run APIs and task timeline surfaces after the dispatch is accepted.
This delivery model is at-least-once with deduplication, not exactly-once. Keeptrusts treats duplicate relay dispatch as a normal recovery condition:
- relay dispatches carry a stable dispatch_id
- the gateway re-acknowledges repeated dispatch_id values
- duplicate deliveries are not supposed to cause duplicate execution while the gateway still has that dispatch in its dedup window
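The deduplication rules above can be illustrated with a small in-memory sketch. This is not the gateway's actual implementation: the class name, the `ttl_seconds` window length, and the `(ack, should_execute)` return shape are all assumptions made for illustration.

```python
import time
from collections import OrderedDict


class DispatchDedupWindow:
    """Hypothetical dedup window keyed by dispatch_id (illustrative only)."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds      # assumed window length, not a documented default
        self._clock = clock
        self._seen = OrderedDict()   # dispatch_id -> time of first delivery

    def _evict_expired(self, now):
        # Entries stay ordered by first delivery, so stop at the first live one.
        while self._seen:
            dispatch_id, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self._ttl:
                break
            self._seen.popitem(last=False)

    def accept(self, dispatch_id):
        """Return (ack, should_execute) for an incoming task_dispatch frame."""
        now = self._clock()
        self._evict_expired(now)
        if dispatch_id in self._seen:
            # Duplicate inside the window: re-acknowledge, but do not run again.
            return True, False
        self._seen[dispatch_id] = now
        # First delivery: acknowledge and execute.
        return True, True
```

Once an entry ages out of the window, a redelivery with the same `dispatch_id` would be treated as new in this sketch, which is why the guarantee is scoped to the dedup window rather than being exactly-once.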
If relay push dispatch is not active for the target gateway, the existing gateway polling path remains the compatibility lane for hosted task execution.
Relay Policy and Fallback
Relay is the default transport for console chat, model lookup, and hosted gateway task dispatch when the gateway is running in connected mode.
Use declarative config to change that behavior per gateway:
hosted_gateway:
  relay:
    enabled: false
    legacy_fallback_allowed: false
This configuration disables relay entirely for that gateway. In current rollout modes, relay-dependent console actions fail closed instead of quietly using the gateway's advertised runtime URL.
If you need a temporary compatibility lane for direct console transport, make that fallback explicit:
hosted_gateway:
  relay:
    enabled: false
    legacy_fallback_allowed: true
Use legacy fallback sparingly:
- In relay_preferred, compatible console chat and model lookup paths may use the direct runtime URL when the gateway is policy-disabled for relay.
- In relay_required, relay-disabled and relay-unavailable gateways stay out of relay-dependent console paths even if a runtime URL exists.
- Hosted task polling remains the rollback lane when relay push dispatch is not active or when KEEPTRUSTS_GATEWAY_TASK_POLL_FALLBACK=true is set on the gateway runtime.
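One way to read the console-path rules above is as a small decision function. The sketch below is an interpretation, not Keeptrusts code: the function name, the `RelayPolicy` dataclass, and the return labels are hypothetical, and the handling of a relay-enabled but currently unavailable gateway in relay_preferred is an assumption (it fails closed here).

```python
from dataclasses import dataclass


@dataclass
class RelayPolicy:
    # Mirrors the hosted_gateway.relay config keys shown earlier.
    enabled: bool = True
    legacy_fallback_allowed: bool = False


def console_transport(rollout_mode, policy, relay_available, has_runtime_url):
    """Return 'relay', 'direct', or 'fail_closed' for a relay-dependent console action."""
    if policy.enabled and relay_available:
        return "relay"
    # From here on, relay is either policy-disabled or unavailable.
    if rollout_mode == "relay_required":
        return "fail_closed"
    if (
        rollout_mode == "relay_preferred"
        and not policy.enabled
        and policy.legacy_fallback_allowed
        and has_runtime_url
    ):
        # Explicit legacy fallback: use the gateway's direct runtime URL.
        return "direct"
    # Assumption: anything else fails closed rather than falling back quietly.
    return "fail_closed"
```

Hosted task polling is a separate rollback lane and is deliberately left out of this sketch, which covers only console chat and model lookup.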
Streaming Task Progress (SSE)
Task run progress can stream over Server-Sent Events (SSE) so operators do not need to refresh the page to see step-by-step updates.
Common lifecycle events include:
| Event | Meaning |
|---|---|
| run.started | The run was claimed and started. |
| run.tool_requested | A tool, connector, or delegated action was requested. |
| run.tool_completed | A requested tool or external action finished. |
| run.awaiting_approval | The run paused for a human approval decision. |
| run.step_completed | A step finished and the run advanced. |
| run.completed | The run reached a successful terminal state. |
| run.failed | The run stopped because of an execution failure. |
| run.cancelled | The run was cancelled directly or through a parent cascade. |
The console connects through the BFF proxy stream route at /api/tasks/[id]/runs/[rid]/stream, which forwards the upstream task-event stream to the browser with no direct browser token handling.
Key operator behaviors:
- The run view prefers live SSE when the stream is available.
- The console degrades gracefully to polling when streaming is unavailable or reconnect attempts are exhausted.
- Stream cursors support reconnect behavior, so the UI can continue after transient disconnects.
- The stream auto-terminates when the run reaches a terminal state.
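The cursor and auto-termination behaviors above can be sketched with a generic SSE parser. The wire details here are assumptions: the sketch presumes the stream uses the standard SSE `id`, `event`, and `data` fields, that the event id serves as the reconnect cursor, and that run.completed, run.failed, and run.cancelled are the terminal events. The function names are hypothetical.

```python
# Assumed terminal events, taken from the lifecycle table above.
TERMINAL_EVENTS = {"run.completed", "run.failed", "run.cancelled"}


def parse_sse(lines):
    """Yield (event_id, event_name, data) tuples from an iterable of SSE text lines."""
    event_id, event_name, data = None, "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it carries data.
            if data:
                yield event_id, event_name, "\n".join(data)
            event_name, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "id":
            event_id = value  # persists across events, as in standard SSE
        elif field == "event":
            event_name = value
        elif field == "data":
            data.append(value)


def follow_run(lines, cursor=None):
    """Consume events until a terminal one; return (event_names, last_cursor)."""
    seen = []
    for event_id, name, _data in parse_sse(lines):
        if event_id is not None:
            cursor = event_id  # remember where to resume after a disconnect
        seen.append(name)
        if name in TERMINAL_EVENTS:
            break
    return seen, cursor
```

A client that loses the connection could pass the returned cursor back on reconnect (standard SSE uses the `Last-Event-ID` header for this) and fall back to polling once its retry budget is spent.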
Composable Sub-Tasks (DAG Execution)
Task steps support DAG-style composition instead of only sequential execution.
| Action type | What it does |
|---|---|
| SubTask | Starts another task definition as a child run and resumes the parent when the child finishes. |
| Conditional | Evaluates a condition and branches to the appropriate target step. |
| Parallel | Starts multiple child branches concurrently and merges the results back into the parent run. |
This model lets teams compose reusable workflows without flattening everything into one large definition.
Important orchestration rules:
- Definition validation checks for sub-task cycles before the task is saved.
- Parent runs pause while child runs are in awaiting_subtask or awaiting_parallel states.
- Cancel requests cascade from parent runs to active child runs.
- The console renders a visual DAG so operators can inspect branches, joins, and waiting states.
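The cycle check in the first rule above can be pictured as a depth-first search over sub-task references. The sketch below shows that general technique only; the function name and the input shape (a mapping from task definition id to the ids it references) are hypothetical, and the real validator may work differently.

```python
def find_subtask_cycle(references):
    """Return a list of task ids forming a cycle, or None if the graph is acyclic.

    references: dict mapping a task definition id to the ids it references
    through SubTask steps or Parallel branches (hypothetical input shape).
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for child in references.get(node, ()):
            state = color.get(child, WHITE)
            if state == GRAY:
                # Back edge: the child is already on the current path.
                return path[path.index(child):] + [child]
            if state == WHITE:
                cycle = visit(child)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for task_id in references:
        if color.get(task_id, WHITE) == WHITE:
            cycle = visit(task_id)
            if cycle:
                return cycle
    return None
```

Returning the offending path, rather than a plain boolean, makes it possible to tell the author which reference closes the loop before the definition is saved.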
Collaborative Task Threads
Each task has a collaborative thread so teams can coordinate around the run instead of moving notes into separate tools.
Task threads support:
- server-persisted notes attached to the task
- @mentions that route notifications to team members
- typing presence indicators so collaborators know who is responding
- explicit handoff between team members when ownership changes
Use threads for run triage, approval context, operator notes, and teammate handoff without losing the task timeline.
Chat-to-Task Context Preservation
Tasks created from chat can keep the conversation that produced them.
When a task is promoted from chat, Keeptrusts stores source context such as:
- source_session_id
- source_conversation_id
- source_message_id
- conversation_context
That preserved context gives operators bidirectional navigation:
- chat to task, so a conversation can promote work into a durable task
- task to chat, so reviewers can jump back to the originating discussion
The stored source context can also be injected into execution-time reasoning so follow-up runs continue from the original intent instead of starting from an empty prompt.
Task Run Comparison and Regression Detection
Tasks & Orchestration includes run comparison and health evaluation so teams can tell whether automation quality is improving or drifting.
Operators can:
- compare two runs side by side across duration, tokens, cost, step count, and step-level deltas
- review historical metrics over configurable windows when analyzing trends
- receive regression alerts when latency, token use, cost, or consecutive failures cross thresholds
- track task health transitions from healthy → degraded → regressed
In the API data model, that health flow maps to healthy, warning, and regression states. Thresholds are configurable per task definition through regression settings, so each workflow can use tighter or looser guardrails depending on the job.
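As a rough illustration of side-by-side comparison, the sketch below computes per-metric deltas between two runs and lists the metrics whose increase exceeds a configured percentage. The field names, the percentage-based threshold shape, and both function names are assumptions. The sketch deliberately does not model how breaches map to the healthy, warning, and regression states, because that mapping is not described here.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    # Hypothetical field names for the compared dimensions.
    duration_ms: float
    tokens: int
    cost_usd: float
    step_count: int


def compare_runs(baseline, candidate):
    """Return {metric: (absolute_delta, percent_change)} from baseline to candidate."""
    deltas = {}
    for name in ("duration_ms", "tokens", "cost_usd", "step_count"):
        base, cand = getattr(baseline, name), getattr(candidate, name)
        # Percent change is undefined when the baseline value is zero.
        pct = None if base == 0 else (cand - base) / base * 100.0
        deltas[name] = (cand - base, pct)
    return deltas


def breached(deltas, max_increase_pct):
    """List metrics whose percent increase exceeds the configured limit."""
    return sorted(
        name
        for name, (_delta, pct) in deltas.items()
        if name in max_increase_pct and pct is not None and pct > max_increase_pct[name]
    )
```

For example, comparing a 1000 ms baseline against a 1500 ms candidate with a 20 percent duration limit would report `duration_ms` as breached.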
Task Worker Architecture
The API-native task worker is designed for safe claim, execution, and recovery behavior.
- worker_task claims pending work with FOR UPDATE SKIP LOCKED so multiple worker instances do not double-claim the same run.
- The worker writes heartbeats during active execution and treats stale runs as crash-recovery candidates.
- The default heartbeat interval is 30 seconds, which supports worker crash recovery without requiring manual cleanup.
- Shutdown is graceful: on SIGTERM, the worker stops claiming new runs and exits through the normal shutdown path after in-flight work finishes cleanly.
- Docker Compose deploys the split-topology worker as keeptrusts-worker-task under the split profile.
- Hosted gateways can receive relay push dispatch without the console needing direct network reachability to the gateway runtime.
- Relay push is preferred when the hosted gateway is connected; polling remains the compatibility and rollback lane.
If you want API-native orchestration in local development or a split deployment, start the worker with the split profile instead of relying on a gateway deployment for run coordination.
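The claim and heartbeat behavior described in this section follows a common queue-worker pattern, sketched below. The table name, column names, status values, and the three-missed-beats tolerance are all hypothetical. Only the 30-second heartbeat interval and the use of FOR UPDATE SKIP LOCKED come from this page.

```python
from datetime import datetime, timedelta, timezone

# Illustrative PostgreSQL claim query; the schema is hypothetical.
# SKIP LOCKED lets concurrent workers pass over rows another worker has locked.
CLAIM_SQL = """
UPDATE task_runs
   SET status = 'running', last_heartbeat_at = now()
 WHERE id = (
       SELECT id FROM task_runs
        WHERE status = 'pending'
        ORDER BY created_at
        LIMIT 1
        FOR UPDATE SKIP LOCKED
 )
RETURNING id;
"""

HEARTBEAT_INTERVAL = timedelta(seconds=30)  # default interval stated above
MISSED_BEATS_BEFORE_STALE = 3               # assumed tolerance, not a documented value


def stale_runs(runs, now):
    """Return ids of running runs whose last heartbeat is older than the cutoff.

    runs: dict mapping run id to (status, last_heartbeat) with aware datetimes.
    """
    cutoff = now - HEARTBEAT_INTERVAL * MISSED_BEATS_BEFORE_STALE
    return [
        run_id
        for run_id, (status, last_beat) in runs.items()
        if status == "running" and last_beat < cutoff
    ]
```

Tolerating a few missed beats before declaring a run stale trades slower crash detection for fewer false recoveries when a worker is briefly slow.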
CLI Cross-References
Use the CLI when you want terminal-first creation, execution, or troubleshooting.
- kt task covers task definition management, DAG composition, run execution, approvals, and streaming task status.
- kt chat covers the terminal chat workflow that can originate tasks and preserve chat context.
Console Pages
The console exposes three main task surfaces:
| Surface | Purpose |
|---|---|
| /tasks | Browse task definitions, dispatch runs, and enter the tasks workbench. |
| /tasks/[id] | Review a single task definition with its collaborators, configuration, and run history. |
| /tasks/runs/[runId] | Inspect one run with live streaming, DAG status, collaborative thread, and comparison signals. |
In the current console workbench, task detail is commonly deep-linked from /tasks (for example /tasks?task=<id>), and the run detail experience is exposed inside the task thread view.
For AI systems
- Canonical terms: task definition, task run, worker_task, DAG execution, SubTask, Conditional, Parallel, collaborative task thread, source conversation, regression alert.
- Exact feature, config, command, or page names: /tasks, /api/tasks/[id]/runs/[rid]/stream, /v1/tasks/:id/runs/:rid/events/stream, kt task, kt chat.
- Health terminology: user-facing health transitions are healthy, degraded, regressed; API states are healthy, warning, regression.
- Best next pages for deeper detail: kt task, kt chat, Streaming & SSE, Regulated Execution.
For engineers
Prerequisites
- A Keeptrusts environment with task permissions (tasks:read and tasks:write) for the relevant users or service identities.
- A task definition with config.steps and a hosted gateway available for execution.
- If you want hosted task dispatch over relay, the hosted gateway must also be running in connected mode with an active relay tunnel.
- The split worker deployment when you want the API-native orchestrator running separately.
- Console access for live run observation, or CLI access for terminal-first execution.
Quick start
# start the split-profile services, including the keeptrusts-worker-task worker
docker compose --profile split up -d
# inspect available task definitions
kt task list-definitions
# run a task and follow live progress
kt task run <definition-id> --stream
# use terminal chat when you want to originate task work from a conversation
kt chat --gateway http://localhost:41002
A minimal orchestration shape looks like this:
steps:
  - index: 0
    action: llm_call
    label: Draft plan
    parameters:
      prompt: Summarize the request and propose the next action.
  - index: 1
    action: conditional
    label: Needs follow-up?
    parameters:
      condition: needs_follow_up == true
      then_step_index: 2
      else_step_index: 3
  - index: 2
    action: sub_task
    label: Open follow-up task
    parameters:
      referenced_task_id: <task-definition-id>
  - index: 3
    action: parallel
    label: Fan out verification
    parameters:
      branches:
        - referenced_task_id: <qa-task-id>
        - steps:
            - index: 0
              action: llm_call
              label: Inline security review
              parameters:
                prompt: Review the findings for security issues.
Validation
- In the console, open Tasks, dispatch a run, and confirm the run timeline updates live without a manual refresh.
- If SSE is unavailable, confirm the run still advances through the polling fallback instead of appearing stalled.
- Use kt task run <definition-id> --stream to verify live CLI streaming for the same definition.
- For hosted gateways, confirm a relay-connected gateway accepts the run and that duplicate relay delivery does not create duplicate execution.
- Compare two recent runs to confirm deltas, regression alerts, and health status changes look correct for the task's thresholds.
For leaders
Tasks & Orchestration turns repeated operator work into durable automation with clearer ownership, stronger audit trails, and better failure visibility.
- Operational consistency: task definitions reduce one-off manual execution and make approved workflows repeatable.
- Governance: approval pauses, task threads, and chat-to-task provenance preserve why work happened, not just that it happened.
- Reliability: crash recovery, heartbeat monitoring, and cancel cascades reduce the chance of orphaned or duplicated work.
- Continuous improvement: run comparison and regression alerts make it easier to catch performance drift before a workflow becomes noisy, slow, or expensive.
- Rollout model: the split worker profile lets teams scale orchestration separately from the rest of the control plane when task volume increases.
Next steps
- kt task: create, compose, run, and approve tasks from the CLI
- kt chat: start task work from governed terminal chat sessions
- Streaming & SSE: understand Keeptrusts streaming behavior in more detail
- Regulated Execution: pair orchestration with approval and compliance controls
- Prompt Evaluations Live Mode: compare governed runtime evidence before promotion