Agent Firewall: Complete Guide to Governing Autonomous AI Agent Actions

Autonomous agents become operationally serious the moment they can do more than answer questions. If they can search, query, export, modify, or transact, then governance has to move from content moderation to action control. In Keeptrusts, the main action-control surface is Agent Firewall. It is not a marketing label. It is a tool-phase policy with specific behavior: exact-match allow and deny checks, action caps, transaction thresholds, role-aware rules, and kill switches that can block or escalate unsafe behavior before the action continues.

Use this page when

  • You are designing or reviewing autonomous agent actions that touch data, tools, or money.
  • You need a precise understanding of what the current agent-firewall evaluator does.
  • You want a complete baseline for governing tool actions without inventing platform behavior.

Primary audience

  • Primary: Platform engineers, agent-system owners, and security engineers
  • Secondary: Technical leaders overseeing autonomous workflow risk

The problem

Autonomous AI failures are often described as “the agent did something surprising.” That description is too vague to engineer against. The real questions are narrower. Which exact tool name was invoked? How many actions happened in one request? Was the action allowed for the current role? Did the request contain suspicious patterns? Did a transaction threshold require approval? Those are the questions a real action-control layer must answer.

Without an action firewall, teams tend to rely on either prompt defenses or broad access control. Both help, but neither is enough. Prompt Injection Detection can stop a hostile request. RBAC can constrain identity and role context. But once a valid request reaches the tool phase, you still need a control dedicated to governing the actions themselves.

The solution

Agent Firewall is that control. It evaluates extracted tool actions in the tool phase and can return allow, block, or escalate. The current evaluator supports exact-match allowed_tools and blocked_tools, action caps for the current evaluation window, a session cap proxy, per-action rate limits, transaction thresholds, role-aware allow and deny lists, suspicious-pattern blocking, and optional PII checks in action content.
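
The checks listed above can be pictured with a small Python sketch. It is an illustration only, not the Keeptrusts evaluator: the FirewallConfig class, the evaluate function, the order of the checks, and the reason strings are all assumptions. Only the key names (allowed_tools, blocked_tools, max_actions_per_window, rate_limits) come from the policy.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class FirewallConfig:
    # Hypothetical container; field names mirror the policy keys.
    allowed_tools: Set[str] = field(default_factory=set)
    blocked_tools: Set[str] = field(default_factory=set)
    max_actions_per_window: Optional[int] = None
    rate_limits: Dict[str, int] = field(default_factory=dict)


def evaluate(tool_names: List[str], cfg: FirewallConfig) -> Tuple[str, str]:
    """Return (decision, reason) for the tool actions extracted from one request."""
    # Window cap: too many actions in one evaluation blocks the request.
    if (cfg.max_actions_per_window is not None
            and len(tool_names) > cfg.max_actions_per_window):
        return "block", "max_actions_per_window exceeded"

    # Count how often each tool name appears in this evaluation.
    for name, count in Counter(tool_names).items():
        # Exact string membership; no wildcard or prefix logic.
        if name in cfg.blocked_tools:
            return "block", f"{name} is in blocked_tools"
        if cfg.allowed_tools and name not in cfg.allowed_tools:
            return "block", f"{name} is not in allowed_tools"
        limit = cfg.rate_limits.get(name)
        if limit is not None and count > limit:
            return "block", f"rate limit exceeded for {name}"
    return "allow", "all checks passed"


cfg = FirewallConfig(
    allowed_tools={"read_database", "export_csv", "knowledge_lookup"},
    blocked_tools={"delete_database", "shell_command"},
    max_actions_per_window=3,
    rate_limits={"export_csv": 1},
)
print(evaluate(["read_database", "export_csv"], cfg))
print(evaluate(["export_csv", "export_csv"], cfg))
```

The sketch leaves out escalation, role rules, the session cap, and the kill switches. It only shows how a single request's actions might be reduced to one decision.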

Two implementation details matter for safe design. First, matching is exact string matching, not wildcard or glob matching. If your tool emits database_query_v2 and your allowlist contains database_query, that is not the same action. Second, escalation is narrow: the policy returns escalate when a detected amount meets or exceeds transaction_limits.require_approval_above. That means human review for financial or high-risk actions should be designed intentionally around those thresholds, not assumed to happen for any surprising action.
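
Both details are easy to see in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and the glob helper is included only to contrast with a behavior the evaluator does not have.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Set


def is_allowed_exact(tool_name: str, allowed_tools: Set[str]) -> bool:
    # Exact string comparison, as described above.
    return tool_name in allowed_tools


def would_match_glob(tool_name: str, patterns: Iterable[str]) -> bool:
    # For contrast only: glob semantics that the evaluator does NOT apply.
    return any(fnmatchcase(tool_name, pattern) for pattern in patterns)


def needs_approval(amount: float, require_approval_above: float) -> bool:
    # "Meets or exceeds" means the threshold value itself escalates.
    return amount >= require_approval_above


# database_query_v2 is a different action from database_query.
print(is_allowed_exact("database_query_v2", {"database_query"}))
# A wildcard entry would have matched it, which is why the distinction matters.
print(would_match_glob("database_query_v2", ["database_query*"]))
# An amount equal to the threshold escalates.
print(needs_approval(1000.0, 1000.0))
print(needs_approval(999.99, 1000.0))
```

Two practical consequences follow. A tool renamed or versioned after the allowlist was written stops matching until the list is updated, and an amount exactly at the threshold goes to review rather than passing through.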

The strongest pattern is to place Prompt Injection Detection, Tool Validation, and Tool Security ahead of the firewall. That way the firewall is not asked to compensate for hostile prompts, undeclared tools, or obviously dangerous serialized requests.

Implementation

This example shows a realistic baseline for an autonomous agent that can read data and export reports, but not delete systems or run a shell. It also escalates high-value actions.

pack:
  name: autonomous-agent-firewall
  version: 1.0.0
  enabled: true

policies:
  chain:
    - prompt-injection
    - tool-validation
    - tool-security
    - agent-firewall
    - audit-logger

  policy:
    prompt-injection:
      use_embedding: true
      detection:
        embedding_threshold: 0.8
      encoding:
        decode_base64: true
        normalize_unicode: true
        detect_homoglyphs: true
      boundaries:
        enforce_delimiters: true
        reject_fake_boundaries: true

    tool-validation:
      declared_tools:
        - read_database
        - export_csv
        - knowledge_lookup
      allow_undeclared: false

    tool-security:
      analysis_mode: local
      blocked_patterns:
        - rm -rf
        - drop table
        - file://
      blocked_entity_types:
        - jwt
        - private_key

    agent-firewall:
      allowed_tools:
        - read_database
        - export_csv
        - knowledge_lookup
      blocked_tools:
        - delete_database
        - shell_command
      max_actions_per_window: 3
      max_actions_per_session: 10
      rate_limits:
        export_csv: 1
      transaction_limits:
        max_single_transaction: 5000.0
        max_daily_total: 20000.0
        require_approval_above: 1000.0
      tools:
        roles:
          analyst:
            allowed:
              - read_database
              - export_csv
            denied:
              - delete_database
      kill_switches:
        halt_on_suspicious_pattern: true
        halt_on_pii_in_action: true

    audit-logger: {}
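
Because matching is exact, a typo or a renamed tool in one list silently changes what the policy permits. The Python sketch below shows one way a team might cross-check the tool names in the example before deploying it. The function find_name_mismatches and its rules are hypothetical and not part of Keeptrusts or the kt CLI. The values are copied by hand from the example rather than parsed from the file.

```python
from typing import Dict, List


def find_name_mismatches(
    declared: List[str],
    allowed: List[str],
    blocked: List[str],
    roles: Dict[str, Dict[str, List[str]]],
) -> List[str]:
    """Return human-readable problems found by exact string comparison."""
    problems = []
    # A tool allowed by the firewall but never declared to tool-validation.
    for name in sorted(set(allowed) - set(declared)):
        problems.append(f"allowed but not declared: {name}")
    # A tool that appears in both lists is ambiguous to a reader.
    for name in sorted(set(allowed) & set(blocked)):
        problems.append(f"both allowed and blocked: {name}")
    # A role that allows a tool the firewall-wide allowlist does not contain.
    for role, rules in sorted(roles.items()):
        for name in sorted(set(rules.get("allowed", [])) - set(allowed)):
            problems.append(f"role {role} allows a tool missing from allowed_tools: {name}")
    return problems


# Values copied from the example policy.
declared_tools = ["read_database", "export_csv", "knowledge_lookup"]
allowed_tools = ["read_database", "export_csv", "knowledge_lookup"]
blocked_tools = ["delete_database", "shell_command"]
roles = {
    "analyst": {
        "allowed": ["read_database", "export_csv"],
        "denied": ["delete_database"],
    }
}

print(find_name_mismatches(declared_tools, allowed_tools, blocked_tools, roles))  # prints []
```

A check like this complements linting rather than replacing it. It says nothing about whether the policy file is valid, only whether the names agree with each other.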

Validate the policy and then review the decision stream around agent activity:

kt policy lint --file autonomous-agent-firewall.yaml
kt events tail --since 15m --json

When a detected amount meets or exceeds require_approval_above, the policy returns escalate and the item needs a reviewer. Reviewers can work it through How To: Resolve an Escalation and the broader evidence workflow in Reviewing Alerts and Evidence.

Results and impact

An agent firewall changes agent governance from implicit trust to explicit permissioning. The agent can still be useful, but it has to operate inside a defined action envelope. That is the difference between an assistant and an unsupervised executor.

It also improves engineering discipline. Because the evaluator is precise and exact-match, teams are forced to decide what the agent may do in concrete terms. That usually reveals risky assumptions early, especially around shell access, destructive tools, and export volume.

Key takeaways

  • Agent Firewall is the tool-phase control for governing autonomous actions.
  • Matching is exact string matching; treat tool names as security-relevant identifiers.
  • Escalation is driven by transaction_limits.require_approval_above, so high-value approvals should be configured deliberately.
  • Pair the firewall with Tool Validation, Tool Security, and Prompt Injection Detection.
  • Use escalation and evidence workflows so risky actions are not only blocked or escalated, but also reviewable later.

Next steps