Shell Command Risk Classification
When a task on a hosted gateway requests to execute a shell command, Keeptrusts classifies the command into one of four risk levels before deciding whether to proceed, require approval, or deny execution outright.
Use this page when
- You need to understand how the gateway classifies shell commands into risk categories (safe, moderate, destructive, critical).
- You are tuning classification rules to reduce false positives for your workflow.
- You want to add custom classification patterns for domain-specific tooling.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Risk Levels
Safe
Safe commands read state without modification. They execute immediately without approval.
Examples:
- git status, git log, git diff
- ls, find, cat, head, tail, wc
- echo, date, whoami, pwd
- cargo check, npm run typecheck
- grep, rg, ag (search tools)
Safe commands produce no side effects on the file system, network, or system state.
Moderate
Moderate commands may modify local state but the changes are recoverable or scoped to the task workspace.
Examples:
- git add, git commit (local only)
- mkdir, touch, cp (within allowed roots)
- npm install (within project directory)
- cargo build, make
Whether a moderate command requires approval depends on your approval_required_risk_levels configuration. By default, moderate commands execute without approval.
Destructive
Destructive commands delete data, overwrite files, change permissions, mutate network state, or interact with credentials. They require approval by default.
Categories of destructive commands:
| Category | Examples |
|---|---|
| Delete operations | rm, rm -rf, rmdir, shred |
| Overwrite operations | mv (overwriting target), dd, truncate |
| Permission changes | chmod, chown, chgrp, setfacl |
| Package installation | apt install, brew install, pip install |
| Network mutation | iptables, ufw, curl -X POST/PUT/DELETE |
| Credential operations | ssh-keygen, gpg --delete-key, passwd |
| Database mutation | psql -c "DROP", mysql -e "DELETE", redis-cli FLUSHALL |
| Process control | kill, killall, pkill, systemctl stop |
Critical
Critical commands affect system-level resources, can compromise host integrity, or have irreversible consequences at scale.
Examples:
- rm -rf / or broad recursive deletions
- mkfs, fdisk, mount (filesystem operations)
- reboot, shutdown, init
- docker rm, docker system prune
- iptables -F (flush all firewall rules)
- Anything running as root or with sudo
Critical commands always require approval regardless of configuration.
Classification Process
When a task requests command execution, the gateway classifies the command through this sequence:
1. Parse command → extract binary name and arguments
2. Check blocked_commands → DENY if matched (no approval possible)
3. Check allowed_commands → DENY if not matched (no approval possible)
4. Classify risk level → safe / moderate / destructive / critical
5. Check approval_required_risk_levels → EXECUTE or PAUSE for approval
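The ordering of these steps can be sketched in a few lines of Python. This is a minimal illustration, not the gateway's implementation: the function name `decide`, the exact-match comparison on the binary name, and the string outcomes are all assumptions made for the example. The actual matching semantics of blocked_commands and allowed_commands are described on the Shell Allow and Deny Lists page.

```python
from typing import Callable, Optional


def decide(
    binary: str,
    classify: Callable[[str], str],
    blocked: set[str],
    allowed: Optional[set[str]],
    approval_levels: set[str],
) -> str:
    """Return "deny", "pause", or "execute" (hypothetical outcome names)."""
    # Step 2: a blocked command is denied before it is ever classified.
    if binary in blocked:
        return "deny"
    # Step 3: if an allowlist is configured, anything outside it is denied.
    if allowed is not None and binary not in allowed:
        return "deny"
    # Step 4: only now is the risk level computed.
    level = classify(binary)
    # Step 5: critical always pauses; other levels pause only if configured.
    if level == "critical" or level in approval_levels:
        return "pause"
    return "execute"
```

Because the deny checks come first, the `classify` callable is never invoked for a blocked or non-allowed command, which mirrors the statement that denied commands have no approval path.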
Step 1: Parse Command
The gateway parses the command string to extract the executable name, arguments, flags, and any pipe chains. Each segment of a piped command is classified independently, and the overall command inherits the highest risk level.
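To make the idea of segments concrete, here is a rough sketch that splits a command string at pipes and chain operators using Python's standard `shlex` module. The helper name `split_segments` is hypothetical, and the gateway's real parser is not documented here; this version also leaves redirections and subshells untouched inside a segment.

```python
import shlex

# Operators that separate independently classified segments.
SEPARATORS = {"|", "||", "&&", ";"}


def split_segments(command: str) -> list[list[str]]:
    """Split a command string into argv-style segments at shell operators."""
    # punctuation_chars=True makes shlex emit |, &&, ||, and ; as their own tokens.
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # keep words such as URLs and paths intact
    segments: list[list[str]] = []
    current: list[str] = []
    for token in lexer:
        if token in SEPARATORS:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(token)
    if current:
        segments.append(current)
    return segments
```

Each inner list starts with the executable name followed by its arguments and flags, which is the shape a per-segment classifier would need.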
Step 2: Check Blocked Commands
If the command matches any entry in blocked_commands, execution is denied immediately. No approval flow is triggered — the command simply cannot run.
Step 3: Check Allowed Commands
If allowed_commands is configured and the command does not match any entry, execution is denied immediately. The allowed list acts as a positive-security allowlist.
Step 4: Classify Risk Level
The gateway analyzes the command against built-in classification rules. Classification considers:
- The executable name and its known behavior
- Flags and arguments that modify behavior (e.g., -f for force, -r for recursive)
- Target paths (system paths increase risk)
- Piped destinations (piping to rm elevates risk)
- Compound commands (;, &&, || chains inherit the highest risk)
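The factors above can be illustrated with a small heuristic classifier for a single command segment. Everything in this sketch is an assumption for illustration: the rule tables are seeded from examples on this page, the list of system roots is invented, and the conservative default for unknown binaries is a choice made here. The built-in rules are not published and are certainly broader than this.

```python
from enum import IntEnum


class Risk(IntEnum):
    SAFE = 0
    MODERATE = 1
    DESTRUCTIVE = 2
    CRITICAL = 3


# Hypothetical rule tables, seeded from examples on this page.
SAFE_BINARIES = {"ls", "cat", "head", "tail", "wc", "echo", "date", "whoami", "pwd", "grep", "rg"}
DESTRUCTIVE_BINARIES = {"rm", "rmdir", "shred", "dd", "truncate", "chmod", "chown", "chgrp"}
CRITICAL_BINARIES = {"mkfs", "fdisk", "mount", "reboot", "shutdown", "init"}
SYSTEM_ROOTS = ("/etc", "/var", "/usr", "/boot", "/bin", "/sbin")  # assumed list


def is_system_path(path: str) -> bool:
    normalized = path.rstrip("/") or "/"
    return normalized == "/" or any(
        normalized == root or normalized.startswith(root + "/") for root in SYSTEM_ROOTS
    )


def has_short_flag(args: list[str], letter: str) -> bool:
    # Matches both "-r" and bundled forms such as "-rf".
    return any(a.startswith("-") and not a.startswith("--") and letter in a[1:] for a in args)


def classify_segment(argv: list[str]) -> Risk:
    binary, args = argv[0], argv[1:]
    if binary == "sudo" or binary in CRITICAL_BINARIES:
        return Risk.CRITICAL
    if binary in DESTRUCTIVE_BINARIES:
        targets = [a for a in args if not a.startswith("-")]
        recursive = has_short_flag(args, "r") or has_short_flag(args, "R") or "--recursive" in args
        # A recursive operation aimed at a system path is elevated to critical.
        if recursive and any(is_system_path(t) for t in targets):
            return Risk.CRITICAL
        return Risk.DESTRUCTIVE
    if binary in SAFE_BINARIES:
        return Risk.SAFE
    # Assumption: unknown binaries are treated conservatively.
    return Risk.DESTRUCTIVE
```

With these tables, `["rm", "file.txt"]` stays destructive while `["rm", "-rf", "/"]` is elevated to critical, showing how flags and target paths change the result for the same binary.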
Step 5: Check Approval Requirements
The gateway checks whether the classified risk level appears in approval_required_risk_levels. By default, destructive and critical require approval, and critical commands require approval even if the list omits them. You can customize the list:
hosted_gateway:
  shell:
    approval_required_risk_levels:
      - destructive
      - critical
      # Add "moderate" here if you want approval for moderate commands too
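As a rough sketch of how this setting could be resolved, the snippet below reads the list from a dictionary shaped like the YAML above, falls back to the documented default, and treats critical as always requiring approval. The function name `requires_approval` and the dictionary-based config are assumptions for illustration; how the gateway loads its configuration is not covered on this page.

```python
from typing import Optional

# Documented default: destructive and critical require approval.
DEFAULT_APPROVAL_LEVELS = ("destructive", "critical")


def requires_approval(level: str, config: Optional[dict] = None) -> bool:
    """Decide whether a classified risk level must pause for approval."""
    shell = (config or {}).get("hosted_gateway", {}).get("shell", {})
    configured = shell.get("approval_required_risk_levels", DEFAULT_APPROVAL_LEVELS)
    # Critical always requires approval, regardless of configuration.
    return level == "critical" or level in configured


# Example: a config that asks for approval on moderate commands only.
config = {"hosted_gateway": {"shell": {"approval_required_risk_levels": ["moderate"]}}}
print(requires_approval("moderate", config))  # moderate is listed
print(requires_approval("critical", config))  # critical is not listed but still pauses
```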
Classification Examples
| Command | Risk Level | Reason |
|---|---|---|
| git status | safe | Read-only state query |
| ls -la /srv/project | safe | Directory listing |
| cat README.md | safe | File read |
| npm run build | moderate | Modifies local build output |
| git commit -m "fix" | moderate | Local repository mutation |
| rm -rf node_modules | destructive | Recursive deletion |
| chmod 777 deploy.sh | destructive | Permission change |
| pip install requests | destructive | System package installation |
| curl -X DELETE https://api.example.com/resource | destructive | Network mutation |
| docker system prune -af | critical | Irreversible system cleanup |
| sudo rm -rf /var/log | critical | Privileged recursive deletion |
Compound Commands
When a command uses pipes, semicolons, or logical operators, each segment is classified independently:
# Overall: destructive (highest of safe + destructive)
git log --oneline | head -5 && rm -rf tmp/
# Overall: safe (all segments are safe)
grep -r "TODO" src/ | wc -l
# Overall: critical (sudo elevates to critical)
npm run build && sudo systemctl restart app
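The "highest risk wins" rule amounts to taking a maximum over ordered risk levels. The sketch below assumes an ordering of safe, moderate, destructive, critical and uses hand-assigned per-segment levels copied from the three examples above; the `Risk` enum and `overall_risk` helper are illustrative names, not part of the product.

```python
from enum import IntEnum


class Risk(IntEnum):
    SAFE = 0
    MODERATE = 1
    DESTRUCTIVE = 2
    CRITICAL = 3


def overall_risk(segment_risks: list[Risk]) -> Risk:
    """The whole command inherits the highest level among its segments."""
    if not segment_risks:
        raise ValueError("a command must contain at least one segment")
    return max(segment_risks)


# Per-segment levels are taken from the examples above, not computed here.
examples = {
    "git log --oneline | head -5 && rm -rf tmp/": [Risk.SAFE, Risk.SAFE, Risk.DESTRUCTIVE],
    'grep -r "TODO" src/ | wc -l': [Risk.SAFE, Risk.SAFE],
    "npm run build && sudo systemctl restart app": [Risk.MODERATE, Risk.CRITICAL],
}
for command, risks in examples.items():
    print(f"{overall_risk(risks).name.lower():<12} {command}")
```

Using an integer-backed enum keeps the comparison explicit: a single destructive or critical segment is enough to raise the level of the entire chain.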
Customizing Classification
You influence classification through configuration rather than modifying built-in rules:
- Allowed commands — only permit commands you expect tasks to need
- Blocked commands — explicitly deny commands that should never run
- Approval levels — choose which risk levels require human approval
- Command timeout — limit how long any command can run
The built-in classification rules are not user-editable. They represent a security baseline that your allow/deny lists and approval policies build upon.
What Happens After Classification
- Safe/Moderate (no approval required): Command executes immediately. Result is returned to the task. Audit event is emitted.
- Destructive/Critical (approval required): Task pauses. A pending action is created. An approval card appears in chat. The command waits for explicit user approval before executing.
- Denied (blocked or not allowed): Command never executes. Task receives a denial response. Audit event records the denial with the reason.
Security Considerations
- Classification runs on the gateway, not in the task or agent context
- The gateway never executes a command before classification completes
- Blocked commands cannot be circumvented through approval — they are unconditionally denied
- Command arguments are analyzed, not just the binary name: rm file.txt and rm -rf / receive different classifications
- Shell expansion and variable substitution are evaluated before classification where possible
For AI systems
- Canonical terms: Keeptrusts, shell command risk classification, safe, moderate, destructive, critical, risk levels, pipe chain analysis, hosted gateway.
- Exact feature/config names: risk levels (safe, moderate, destructive, critical), hosted_gateway.shell.approval_required_risk_levels, allowed_commands, blocked_commands, command_timeout_seconds.
- Best next pages: Shell Allow and Deny Lists, Destructive Action Approval, Pending Action Lifecycle.
For engineers
- Four risk levels: safe (read-only, executes immediately), moderate (recoverable local changes), destructive (deletes/overwrites/permissions), critical (system-level, irreversible).
- Classification order: parse command → check blocked_commands → check allowed_commands → classify risk → check approval requirements.
- Piped commands inherit the highest risk level across all pipe segments; compound commands (&&, ||, ;) inherit the highest risk.
- Arguments and flags are analyzed: rm file.txt differs from rm -rf /; -f (force) and -r (recursive) elevate risk.
- You cannot modify built-in classification rules; use allow/deny lists and approval levels to tune behavior.
For leaders
- Risk classification runs on the gateway before execution — agents cannot bypass or override the classification engine.
- The four-level model provides granular control: allow safe commands to run unimpeded while ensuring destructive/critical operations require human review.
- Critical commands (system-level, sudo, filesystem formatting) always require approval regardless of configuration — this is a non-overridable safety baseline.
- Combined with allow/deny lists, classification provides defense in depth: blocked commands never reach classification, and classified commands still require appropriate approval.
Next steps
- Shell Allow and Deny Lists — pre-filter commands before classification
- Destructive Action Approval — the approval flow for classified destructive/critical commands
- Pending Action Lifecycle — what happens after a command pauses for approval