Agent Firewall: Governing Tool Access for AI Agents
Keeptrusts governs tool access for AI agents through agent-firewall, a tool-phase policy that evaluates extracted tool actions before they are executed. It can allow or deny exact tool names, cap actions per request or session, apply role-specific permissions, and escalate risky transactions instead of letting autonomous tool use run unchecked.
Use this page when
- You are deploying agents that call tools and need a clear approval boundary for what those agents may do.
- You want a concrete explanation of the current agent-firewall behavior rather than a vague “agent governance” promise.
- You need to pair tool access control with prompt-injection, RBAC, or audit logging.
Primary audience
- Primary: Technical Engineers
- Secondary: Technical Leaders, agent platform owners
The problem
The risk in agent systems is rarely the prompt alone. It is the action surface behind the prompt. An agent that can search documentation is one thing. An agent that can execute shell commands, export records, or initiate transfers is another. Once tool calls are available, the security boundary shifts from “what can the model say?” to “what can the system do because the model said it?”
Many teams try to control this inside the agent framework with ad hoc allowlists. That helps, but it leaves governance fragmented across SDKs, prompts, and application repositories. It also makes review harder because the operative policy is hidden in code instead of declared at the gateway boundary.
There is also a reliability problem. Unsafe behavior is not always malicious. Agents loop. They repeat the same action too many times. They send arguments containing sensitive data. They invoke the right tool under the wrong role. Without a control point at the request boundary, those patterns are hard to stop consistently.
The solution
agent-firewall is the gateway control for that surface.
The evaluator, as currently documented, is explicit about its scope. It performs exact string matches on action names, supports allow and deny lists, limits action counts, applies role-aware allow and deny rules, checks simple transaction thresholds, and offers kill-switch behavior for suspicious patterns or PII in tool-call arguments.
That directness matters. You do not need to guess what the policy means.
- allowed_tools defines the permitted action names.
- blocked_tools defines action names that are always denied.
- max_actions_per_window limits extracted actions in the current evaluation.
- max_actions_per_session applies a session cap using the current evaluator's session proxy.
- tools.roles lets you express role-specific allow and deny rules.
- transaction_limits.require_approval_above can return escalate for higher-value actions.
- kill_switches can hard-stop the flow when suspicious patterns or PII are detected.
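The kill_switches entry is the least self-explanatory item in that list. As a rough illustration only, the sketch below shows what a halt_on_pii_in_action style check could look like: it walks every string value in a tool call's arguments and tests it against two simple patterns (an email address and a US SSN-like number). The ToolCall type, the patterns, and the function names are hypothetical; the real evaluator's PII detection is not described here, and halt_on_suspicious_pattern is not modeled at all.

```python
import re
from dataclasses import dataclass, field


# Hypothetical shape of one extracted tool action (not the Keeptrusts data model).
@dataclass
class ToolCall:
    name: str
    arguments: dict = field(default_factory=dict)


# Illustrative patterns only; a production PII detector would be far broader.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def iter_strings(value):
    """Yield every string value nested inside dicts and lists of arguments."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def pii_kill_switch(call: ToolCall, halt_on_pii_in_action: bool = True):
    """Return (halt, sorted names of matched patterns) for one tool call."""
    if not halt_on_pii_in_action:
        return False, []
    hits = sorted({
        label
        for text in iter_strings(call.arguments)
        for label, pattern in PII_PATTERNS.items()
        if pattern.search(text)
    })
    return bool(hits), hits


if __name__ == "__main__":
    call = ToolCall("export_csv", {"filter": {"owner": "jane.doe@example.com"}})
    print(pii_kill_switch(call))  # (True, ['email'])
```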
The important nuance is that the current evaluator uses exact matches and current-request safety caps. That is helpful because it keeps the policy predictable. If your agent emits export_csv, you should allow export_csv, not a guessed wildcard.
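To make the exact-match point concrete, here is a minimal Python sketch (not the Keeptrusts implementation) of allow and deny checks that compare action names with plain string equality. The function name, the decision strings, and two behaviors are assumptions: that a blocked name wins over an allowed one, and that names absent from allowed_tools are denied. The key takeaway is that an entry such as export_* is treated as a literal name and never expands to export_csv.

```python
def check_tool(action: str, allowed_tools: list[str], blocked_tools: list[str]) -> str:
    """Exact-match allow/deny check; returns 'allow' or 'deny'."""
    # In this sketch a blocked name always wins, even if it is also allowed.
    if action in blocked_tools:
        return "deny"
    # Membership is plain string equality: no globbing, regex, or prefix logic.
    if action in allowed_tools:
        return "allow"
    # Assumption: anything not explicitly allowed is denied.
    return "deny"


if __name__ == "__main__":
    allowed = ["read_database", "export_csv"]
    blocked = ["delete_database"]
    print(check_tool("export_csv", allowed, blocked))       # allow
    print(check_tool("export_csv", ["export_*"], blocked))  # deny: no wildcard expansion
    print(check_tool("delete_database", allowed, blocked))  # deny
```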
Implementation
Start with a minimal chain that makes tool access explicit and keeps the evidence path visible:
```yaml
pack:
  name: agent-firewall-example
  version: 1.0.0
  enabled: true

policies:
  chain:
    - rbac
    - agent-firewall
    - prompt-injection
    - audit-logger

  policy:
    rbac:
      require_auth: true
      deny_if_missing:
        - role
        - team

    agent-firewall:
      allowed_tools:
        - read_database
        - export_csv
      blocked_tools:
        - delete_database
      max_actions_per_window: 3
      max_actions_per_session: 10
      rate_limits:
        export_csv: 1
      transaction_limits:
        max_single_transaction: 5000.0
        max_daily_total: 20000.0
        require_approval_above: 1000.0
      tools:
        roles:
          analyst:
            allowed:
              - read_database
              - export_csv
            denied:
              - delete_database
      kill_switches:
        halt_on_suspicious_pattern: true
        halt_on_pii_in_action: true

    prompt-injection:
      response:
        action: block

    audit-logger: {}
```
Validate before rollout:
```bash
kt policy lint --file policy-config.yaml
kt gateway run --policy-config policy-config.yaml --listen 0.0.0.0:41002
```
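kt policy lint is the supported validation step. As an optional extra, a team might keep a small offline check for contradictions between fields, such as a tool listed as both allowed and blocked. The sketch below works on the agent-firewall section written out as a plain Python dict (parsing YAML would need a third-party package). The function name and every rule in it are illustrative assumptions, not checks the gateway is documented to perform; the transaction rule, for example, assumes amounts above max_single_transaction are rejected outright, which would make a higher approval threshold unreachable.

```python
# The agent-firewall section from the example pack, as a plain dict.
AGENT_FIREWALL = {
    "allowed_tools": ["read_database", "export_csv"],
    "blocked_tools": ["delete_database"],
    "max_actions_per_window": 3,
    "max_actions_per_session": 10,
    "rate_limits": {"export_csv": 1},
    "transaction_limits": {
        "max_single_transaction": 5000.0,
        "max_daily_total": 20000.0,
        "require_approval_above": 1000.0,
    },
    "tools": {"roles": {"analyst": {
        "allowed": ["read_database", "export_csv"],
        "denied": ["delete_database"],
    }}},
}


def find_conflicts(cfg: dict) -> list[str]:
    """Return human-readable descriptions of suspicious field combinations."""
    problems = []
    allowed = set(cfg.get("allowed_tools", []))
    blocked = set(cfg.get("blocked_tools", []))
    for name in sorted(allowed & blocked):
        problems.append(f"{name} is both allowed and blocked")
    for role, rules in cfg.get("tools", {}).get("roles", {}).items():
        role_allowed = set(rules.get("allowed", []))
        role_denied = set(rules.get("denied", []))
        for name in sorted(role_allowed & role_denied):
            problems.append(f"role {role}: {name} is both allowed and denied")
        for name in sorted(role_allowed & blocked):
            problems.append(f"role {role}: {name} is allowed but globally blocked")
    for name in sorted(set(cfg.get("rate_limits", {})) - allowed):
        problems.append(f"rate limit set for {name}, which is not in allowed_tools")
    window = cfg.get("max_actions_per_window")
    session = cfg.get("max_actions_per_session")
    if window is not None and session is not None and window > session:
        problems.append("max_actions_per_window exceeds max_actions_per_session")
    limits = cfg.get("transaction_limits", {})
    approve = limits.get("require_approval_above")
    single = limits.get("max_single_transaction")
    daily = limits.get("max_daily_total")
    if approve is not None and single is not None and approve >= single:
        problems.append("require_approval_above is not below max_single_transaction")
    if single is not None and daily is not None and single > daily:
        problems.append("max_single_transaction exceeds max_daily_total")
    return problems


if __name__ == "__main__":
    print(find_conflicts(AGENT_FIREWALL))  # []
```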
The policy configuration above does a few useful things immediately.
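It requires identity through rbac. With require_auth: true and role and team listed under deny_if_missing, unattributed agent calls are blocked early, at the first step in the chain.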
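A minimal sketch of the identity gate that require_auth and deny_if_missing describe, assuming the gateway sees an authentication flag plus a dict of identity attributes for each request. The Identity type, the rbac_gate function, and the reason strings are hypothetical.

```python
from dataclasses import dataclass, field


# Hypothetical view of the caller identity attached to a request.
@dataclass
class Identity:
    authenticated: bool
    attributes: dict = field(default_factory=dict)


def rbac_gate(identity: Identity, require_auth: bool = True,
              deny_if_missing: tuple = ("role", "team")):
    """Return (allowed, reason) before any tool evaluation happens."""
    if require_auth and not identity.authenticated:
        return False, "unauthenticated"
    # Treat absent or empty attributes as missing.
    missing = [key for key in deny_if_missing if not identity.attributes.get(key)]
    if missing:
        return False, "missing: " + ", ".join(missing)
    return True, "ok"


if __name__ == "__main__":
    print(rbac_gate(Identity(True, {"role": "analyst", "team": "finance"})))  # (True, 'ok')
    print(rbac_gate(Identity(True, {"role": "analyst"})))  # (False, 'missing: team')
```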
The configuration also gives the analyst role an explicit tool set through tools.roles instead of relying on agent prompt discipline.
It denies obviously unsafe actions such as delete_database through blocked_tools, and the analyst role lists it under denied as well, so the call is refused even if the model attempts it.
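The global lists and the role rules can be pictured as layered checks. The precedence here is an assumption, not documented behavior: a global block or a role denial always wins, and an allowed action must appear in allowed_tools and, when the caller's role has an entry, in that role's allowed list as well. tools.roles is flattened to a roles key for brevity, and the decide function is hypothetical; adjust the order to match the evaluator you actually run.

```python
from typing import Optional

POLICY = {
    "allowed_tools": ["read_database", "export_csv"],
    "blocked_tools": ["delete_database"],
    # Mirrors tools.roles from the example pack.
    "roles": {
        "analyst": {"allowed": ["read_database", "export_csv"], "denied": ["delete_database"]},
    },
}


def decide(action: str, role: Optional[str], policy: dict = POLICY) -> tuple[str, str]:
    """Return (decision, reason) using exact-match, layered allow/deny rules."""
    if action in policy["blocked_tools"]:
        return "deny", "blocked_tools"
    rules = policy["roles"].get(role) if role else None
    if rules and action in rules.get("denied", []):
        return "deny", f"role {role} denies {action}"
    if action not in policy["allowed_tools"]:
        return "deny", "not in allowed_tools"
    # Assumption: a role entry narrows the global allow list further.
    if rules is not None and action not in rules.get("allowed", []):
        return "deny", f"role {role} does not allow {action}"
    return "allow", "ok"


if __name__ == "__main__":
    print(decide("export_csv", "analyst"))       # ('allow', 'ok')
    print(decide("delete_database", "analyst"))  # ('deny', 'blocked_tools')
```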
The action caps (max_actions_per_window: 3 and max_actions_per_session: 10) limit repetitive behavior, such as an agent looping on the same call.
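The two caps can be pictured as counters over extracted actions. This sketch treats max_actions_per_window as a limit on the actions extracted in a single evaluation and keeps a running per-session total in memory. The ActionCaps class, the in-memory store, and the session key are assumptions; the evaluator itself is only described as using a session proxy. rate_limits is left out because its window is not described here.

```python
from collections import defaultdict


class ActionCaps:
    """Illustrative per-evaluation and per-session action caps (in memory only)."""

    def __init__(self, max_actions_per_window: int, max_actions_per_session: int):
        self.max_window = max_actions_per_window
        self.max_session = max_actions_per_session
        # session id -> number of actions allowed so far
        self.session_totals = defaultdict(int)

    def evaluate(self, session_id: str, actions: list[str]) -> tuple[str, str]:
        if len(actions) > self.max_window:
            return "deny", f"{len(actions)} actions exceed window cap {self.max_window}"
        projected = self.session_totals[session_id] + len(actions)
        if projected > self.max_session:
            return "deny", f"session would reach {projected}, cap {self.max_session}"
        # Only allowed actions count toward the session total in this sketch.
        self.session_totals[session_id] = projected
        return "allow", "ok"


if __name__ == "__main__":
    caps = ActionCaps(3, 10)
    for _ in range(4):
        # Totals reach 3, 6, 9; the fourth request would reach 12 and is denied.
        print(caps.evaluate("session-1", ["read_database"] * 3))
```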
Finally, transaction_limits turns high-value actions into review events: above require_approval_above (1000.0 here), the firewall returns escalate instead of allowing automatic execution.
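The thresholds in transaction_limits can be read as a three-way decision. Only the escalate-above-require_approval_above behavior comes from the text; treating max_single_transaction and max_daily_total as hard denials is an assumption made for this sketch, and the caller is assumed to supply the amount already spent today. With the example values, a 750.0 transaction is allowed, a 2500.0 transaction escalates, and a 6000.0 transaction is denied.

```python
from dataclasses import dataclass


# Defaults mirror the example pack's transaction_limits.
@dataclass
class TransactionLimits:
    max_single_transaction: float = 5000.0
    max_daily_total: float = 20000.0
    require_approval_above: float = 1000.0


def check_transaction(amount: float, spent_today: float, limits: TransactionLimits) -> str:
    """Return 'allow', 'escalate', or 'deny' for one transaction amount."""
    # Assumption: amounts over the single-transaction cap are rejected outright.
    if amount > limits.max_single_transaction:
        return "deny"
    # Assumption: the daily total is a hard cap on the running sum.
    if spent_today + amount > limits.max_daily_total:
        return "deny"
    # Documented behavior: higher-value actions are escalated for review.
    if amount > limits.require_approval_above:
        return "escalate"
    return "allow"


if __name__ == "__main__":
    limits = TransactionLimits()
    for amount in (750.0, 2500.0, 6000.0):
        print(amount, check_transaction(amount, 0.0, limits))  # allow, escalate, deny
```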
In a real rollout, keep the first tool set small. Teams usually want to start by allowing read-only operations such as search, retrieval, or export, then widen access only when the agent behavior is understood. The gateway is a good place to enforce that because the same rules can apply across different agent frameworks.
Also pair this with adjacent controls rather than treating agent-firewall as the whole story. Prompt Injection Detection helps catch hostile instructions before they shape tool behavior. RBAC ensures the request has meaningful identity. Tool Validation and Tool Security help you reason about the boundary from the tool side, not only the model side.
If cost is a concern, use team wallets alongside the firewall. The gateway can enforce who may call which tool, while wallet controls enforce how much the overall agent workload may spend.
Results and impact
The first impact is a smaller blast radius. Agents stop being “anything the framework can call” and start becoming “only the actions the platform explicitly permits.” That is a meaningful operational difference.
The second impact is faster review. Security, platform, and application teams can review a single declarative tool-access contract instead of tracing tool permissions through prompts and code.
The third impact is better incident handling. When a tool call is blocked or escalated, the reason is part of the governed request path rather than an unstructured application log. That makes it easier to explain what happened and decide whether the rule or the agent needs adjustment.
There is also a delivery benefit. Teams can move faster with agents when the dangerous edge is controlled centrally. The presence of a hard boundary often reduces pressure to over-constrain prompts because the runtime has a real enforcement layer behind them.
Key takeaways
- agent-firewall governs tool actions, not just model text.
- Use exact action names in allowed_tools and blocked_tools; the current evaluator does exact matching.
- Pair tool access rules with rbac, prompt-injection defenses, and audit logging.
- Treat action caps and approval thresholds as practical runtime controls, not just documentation.
- Start with a narrow read-only tool set and widen only when the behavior is understood.