Shell Command Risk Classification

When a task on a hosted gateway requests to execute a shell command, Keeptrusts classifies the command into one of four risk levels before deciding whether to proceed, require approval, or deny execution outright.

Use this page when

  • You need to understand how the gateway classifies shell commands into risk levels (safe, moderate, destructive, critical).
  • You are tuning classification rules to reduce false positives for your workflow.
  • You want to add custom classification patterns for domain-specific tooling.

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Risk Levels

Safe

Safe commands read state without modification. They execute immediately without approval.

Examples:

  • git status, git log, git diff
  • ls, find, cat, head, tail, wc
  • echo, date, whoami, pwd
  • cargo check, npm run typecheck
  • grep, rg, ag (search tools)

Safe commands produce no side effects on the file system, network, or system state.

Moderate

Moderate commands may modify local state, but the changes are recoverable or scoped to the task workspace.

Examples:

  • git add, git commit (local only)
  • mkdir, touch, cp (within allowed roots)
  • npm install (within project directory)
  • cargo build, make

Whether moderate commands require approval depends on your approval_required_risk_levels configuration. By default, moderate commands execute without approval.

Destructive

Destructive commands delete data, overwrite files, change permissions, mutate network state, or interact with credentials. They require approval by default.

Categories of destructive commands:

| Category             | Examples                                                |
|----------------------|---------------------------------------------------------|
| Delete operations    | rm, rm -rf, rmdir, shred                                |
| Overwrite operations | mv (overwriting target), dd, truncate                   |
| Permission changes   | chmod, chown, chgrp, setfacl                            |
| Package installation | apt install, brew install, pip install -g               |
| Network mutation     | iptables, ufw, curl -X POST/PUT/DELETE                  |
| Credential operations| ssh-keygen, gpg --delete-key, passwd                    |
| Database mutation    | psql -c "DROP", mysql -e "DELETE", redis-cli FLUSHALL   |
| Process control      | kill, killall, pkill, systemctl stop                    |

Critical

Critical commands affect system-level resources, can compromise host integrity, or have irreversible consequences at scale.

Examples:

  • rm -rf / or broad recursive deletions
  • mkfs, fdisk, mount (filesystem operations)
  • reboot, shutdown, init
  • docker rm, docker system prune
  • iptables -F (flush all firewall rules)
  • Anything running as root or with sudo

Critical commands always require approval regardless of configuration.

Classification Process

When a task requests command execution, the gateway classifies the command through this sequence:

1. Parse command → extract binary name and arguments
2. Check blocked_commands → DENY if matched (no approval possible)
3. Check allowed_commands → DENY if an allowlist is configured and the command is not matched (no approval possible)
4. Classify risk level → safe / moderate / destructive / critical
5. Check approval_required_risk_levels → EXECUTE immediately or PAUSE for approval
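The ordering of these checks matters: deny lists are consulted before any risk level is computed, and approval is only considered for commands that survive both lists. The following Python function is a minimal sketch of that ordering, not the gateway's implementation. The function name decide, its parameters, the exact-match comparison on the executable name, and the string return values are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional


def decide(
    command: str,
    classify: Callable[[str], str],
    blocked: Iterable[str] = (),
    allowed: Optional[Iterable[str]] = None,
    approval_levels: Iterable[str] = ("destructive", "critical"),
) -> str:
    """Return "DENY", "PAUSE", or "EXECUTE" for one non-empty command string."""
    # Step 1 (simplified): treat the first word as the executable name.
    binary = command.split()[0]
    # Step 2: blocked commands are denied; approval cannot override this.
    if binary in set(blocked):
        return "DENY"
    # Step 3: when an allowlist is configured, anything outside it is denied.
    if allowed is not None and binary not in set(allowed):
        return "DENY"
    # Step 4: delegate to a classifier that returns one of the four levels.
    level = classify(command)
    # Step 5: critical always pauses; other levels pause only if configured.
    if level == "critical" or level in set(approval_levels):
        return "PAUSE"
    return "EXECUTE"
```

With a toy classifier that labels anything starting with rm as destructive, decide("rm -rf tmp/", toy) returns "PAUSE", while the same call with blocked=["rm"] returns "DENY" because the block check runs first.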

Step 1: Parse Command

The gateway parses the command string to extract the executable name, arguments, flags, and any pipe chains. Each segment of a piped command is classified independently, and the overall command inherits the highest risk level.
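As a rough illustration of the parsing step, the sketch below uses Python's standard shlex module to split a command line into segments at |, &&, ||, and ;. The function name split_segments and the (executable, args) tuple shape are hypothetical; the gateway's actual parser is not documented here. The sketch also has a known limitation: a quoted operator on its own, such as "|", is indistinguishable from a real one after quote removal.

```python
import shlex

# Tokens that separate one segment from the next.
OPERATORS = {"|", "&&", "||", ";"}


def split_segments(command):
    """Return a list of (executable, args) pairs, one per segment."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # split words on whitespace, as a shell would
    segments, current = [], []
    for token in lexer:
        if token in OPERATORS:
            # An operator closes the current segment, if there is one.
            if current:
                segments.append((current[0], current[1:]))
            current = []
        else:
            current.append(token)
    if current:
        segments.append((current[0], current[1:]))
    return segments
```

For git log --oneline | head -5 && rm -rf tmp/ this yields three segments whose executables are git, head, and rm. Each segment can then be classified on its own.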

Step 2: Check Blocked Commands

If the command matches any entry in blocked_commands, execution is denied immediately. No approval flow is triggered — the command simply cannot run.

Step 3: Check Allowed Commands

If allowed_commands is configured and the command does not match any entry, execution is denied immediately. The allowed list acts as a positive-security allowlist.

Step 4: Classify Risk Level

The gateway analyzes the command against built-in classification rules. Classification considers:

  • The executable name and its known behavior
  • Flags and arguments that modify behavior (e.g., -f for force, -r for recursive)
  • Target paths (system paths increase risk)
  • Piped destinations (piping to rm elevates risk)
  • Compound commands (;, &&, || chains inherit the highest risk)
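To show how the executable, its flags, and its target path can combine into one risk level, here is a deliberately small rule-based sketch for a single segment. The rule tables are taken from the examples on this page, but the function names, the table contents, and especially the fallback level for unknown executables are assumptions; the built-in rules are not published in this document.

```python
# Illustrative rule tables drawn from the examples on this page.
SAFE = {"ls", "cat", "head", "tail", "wc", "echo", "date", "whoami", "pwd", "grep"}
DESTRUCTIVE = {"rm", "rmdir", "shred", "dd", "chmod", "chown", "kill"}
CRITICAL = {"mkfs", "fdisk", "mount", "reboot", "shutdown"}


def has_recursive_flag(args):
    """True for --recursive, -r, -R, or combined short flags such as -rf."""
    for arg in args:
        if arg == "--recursive":
            return True
        if arg.startswith("-") and not arg.startswith("--") and "r" in arg.lower():
            return True
    return False


def classify_segment(executable, args):
    """Classify one segment as safe, moderate, destructive, or critical."""
    # Privileged execution is critical regardless of the wrapped command.
    if executable == "sudo" or executable in CRITICAL:
        return "critical"
    # Flags plus target path: recursive deletion of the root is critical.
    if executable == "rm" and has_recursive_flag(args) and "/" in args:
        return "critical"
    if executable in DESTRUCTIVE:
        return "destructive"
    if executable in SAFE:
        return "safe"
    # Placeholder for everything else; the real fallback is not documented.
    return "moderate"
```

Under these assumed rules, rm file.txt is destructive while rm -rf / is critical, which mirrors the point that arguments, not just the binary name, drive the result.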

Step 5: Check Approval Requirements

The gateway checks whether the classified risk level appears in approval_required_risk_levels. By default, destructive and critical commands require approval. You can customize the list, although critical commands require approval regardless of its contents:

hosted_gateway:
  shell:
    approval_required_risk_levels:
      - destructive
      - critical
      # Add "moderate" here if you want approval for moderate commands too
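One way to reason about this setting is as an effective set of levels: the configured list (or the default when the key is absent) plus critical, which always requires approval. The Python sketch below assumes the YAML has already been loaded into a plain dictionary; the function names are hypothetical and this is not the gateway's own code.

```python
# Documented default when approval_required_risk_levels is not set.
DEFAULT_LEVELS = ("destructive", "critical")


def effective_approval_levels(config):
    """Return the set of risk levels that pause for approval."""
    shell = config.get("hosted_gateway", {}).get("shell", {})
    configured = shell.get("approval_required_risk_levels", DEFAULT_LEVELS)
    # Critical requires approval regardless of configuration.
    return set(configured) | {"critical"}


def requires_approval(level, config):
    """True if a command classified at this level must wait for approval."""
    return level in effective_approval_levels(config)
```

For example, a configuration that lists only moderate would, under this reading, pause moderate and critical commands while letting destructive ones run, so removing destructive from the list is a choice to make deliberately.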

Classification Examples

| Command                                         | Risk Level  | Reason                       |
|-------------------------------------------------|-------------|------------------------------|
| git status                                      | safe        | Read-only state query        |
| ls -la /srv/project                             | safe        | Directory listing            |
| cat README.md                                   | safe        | File read                    |
| npm run build                                   | moderate    | Modifies local build output  |
| git commit -m "fix"                             | moderate    | Local repository mutation    |
| rm -rf node_modules                             | destructive | Recursive deletion           |
| chmod 777 deploy.sh                             | destructive | Permission change            |
| pip install requests                            | destructive | System package installation  |
| curl -X DELETE https://api.example.com/resource | destructive | Network mutation             |
| docker system prune -af                         | critical    | Irreversible system cleanup  |
| sudo rm -rf /var/log                            | critical    | Privileged recursive deletion|

Compound Commands

When a command uses pipes, semicolons, or logical operators, each segment is classified independently:

# Overall: destructive (highest of safe + destructive)
git log --oneline | head -5 && rm -rf tmp/

# Overall: safe (all segments are safe)
grep -r "TODO" src/ | wc -l

# Overall: critical (sudo elevates to critical)
npm run build && sudo systemctl restart app
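The "highest risk wins" rule in these examples amounts to taking a maximum over an ordered list of levels. A minimal Python sketch, assuming the segments have already been classified individually (the names ORDER and overall_risk are illustrative):

```python
# Risk levels from least to most severe.
ORDER = ["safe", "moderate", "destructive", "critical"]


def overall_risk(segment_levels):
    """Return the most severe level among a non-empty list of segment levels."""
    return max(segment_levels, key=ORDER.index)


# The three examples above, expressed as per-segment levels:
print(overall_risk(["safe", "safe", "destructive"]))  # destructive
print(overall_risk(["safe", "safe"]))                 # safe
print(overall_risk(["moderate", "critical"]))         # critical
```

Because the result is a maximum, adding a safe segment to a chain never lowers its overall level.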

Customizing Classification

You influence classification through configuration rather than modifying built-in rules:

  • Allowed commands — only permit commands you expect tasks to need
  • Blocked commands — explicitly deny commands that should never run
  • Approval levels — choose which risk levels require human approval
  • Command timeout — limit how long any command can run

The built-in classification rules are not user-editable. They represent a security baseline that your allow/deny lists and approval policies build upon.

What Happens After Classification

  • Safe/Moderate (no approval required): Command executes immediately. Result is returned to the task. Audit event is emitted.
  • Destructive/Critical (approval required): Task pauses. A pending action is created. An approval card appears in chat. The command waits for explicit user approval before executing.
  • Denied (blocked or not allowed): Command never executes. Task receives a denial response. Audit event records the denial with the reason.

Security Considerations

  • Classification runs on the gateway, not in the task or agent context
  • The gateway never executes a command before classification completes
  • Blocked commands cannot be circumvented through approval — they are unconditionally denied
  • Command arguments are analyzed, not just the binary name — rm file.txt and rm -rf / receive different classifications
  • Shell expansion and variable substitution are evaluated before classification where possible

For AI systems

  • Canonical terms: Keeptrusts, shell command risk classification, safe, moderate, destructive, critical, risk levels, pipe chain analysis, hosted gateway.
  • Exact feature/config names: risk levels (safe, moderate, destructive, critical), hosted_gateway.shell.approval_required_risk_levels, allowed_commands, blocked_commands, command_timeout_seconds.
  • Best next pages: Shell Allow and Deny Lists, Destructive Action Approval, Pending Action Lifecycle.

For engineers

  • Four risk levels: safe (read-only, executes immediately), moderate (recoverable local changes), destructive (deletes/overwrites/permissions), critical (system-level, irreversible).
  • Classification order: parse command → check blocked_commands → check allowed_commands → classify risk → check approval requirements.
  • Piped and compound commands (|, &&, ||, ;) are classified segment by segment, and the overall command inherits the highest risk level among its segments.
  • Arguments and flags are analyzed — rm file.txt differs from rm -rf /; -f (force) and -r (recursive) elevate risk.
  • You cannot modify built-in classification rules — use allow/deny lists and approval levels to tune behavior.

For leaders

  • Risk classification runs on the gateway before execution — agents cannot bypass or override the classification engine.
  • The four-level model provides granular control: allow safe commands to run unimpeded while ensuring destructive/critical operations require human review.
  • Critical commands (system-level, sudo, filesystem formatting) always require approval regardless of configuration — this is a non-overridable safety baseline.
  • Combined with allow/deny lists, classification provides defense in depth: blocked commands never reach classification, and classified commands still require appropriate approval.
