Agent Firewall: Governing Tool Access for AI Agents

Keeptrusts governs tool access for AI agents through agent-firewall, a tool-phase policy that evaluates extracted tool actions before they are executed. It can allow or deny exact tool names, cap actions per request or session, apply role-specific permissions, and escalate risky transactions instead of letting autonomous tool use run unchecked.

Use this page when

You are deploying agents that call tools and need a clear approval boundary for what those agents may do.
You want a concrete explanation of the current agent-firewall behavior rather than a vague “agent governance” promise.
You need to pair tool access control with prompt-injection, RBAC, or audit logging.

Primary audience

Primary: Technical Engineers
Secondary: Technical Leaders, agent platform owners

The problem

The risk in agent systems is rarely the prompt alone. It is the action surface behind the prompt. An agent that can search documentation is one thing. An agent that can execute shell commands, export records, or initiate transfers is another. Once tool calls are available, the security boundary shifts from “what can the model say?” to “what can the system do because the model said it?”

Many teams try to control this inside the agent framework with ad hoc allowlists. That helps, but it leaves governance fragmented across SDKs, prompts, and application repositories. It also makes review harder because the operative policy is hidden in code instead of declared at the gateway boundary.

There is also a reliability problem. Unsafe behavior is not always malicious. Agents loop. They repeat the same action too many times. They send arguments containing sensitive data. They invoke the right tool under the wrong role. Without a control point at the request boundary, those patterns are hard to stop consistently.

The solution

agent-firewall is the gateway control for that surface.

The current documented evaluator is explicit about what it does today. It performs exact string matches on action names, supports allow and deny lists, limits action counts, can apply role-aware allow and deny rules, checks simple transaction thresholds, and offers kill-switch behavior for suspicious patterns or PII in tool-call arguments.

That directness matters. You do not need to guess what the policy means.

allowed_tools defines the permitted action names.
blocked_tools defines action names that are always denied.
max_actions_per_window limits extracted actions in the current evaluation.
max_actions_per_session applies a session cap using the current evaluator's session proxy.
tools.roles lets you express role-specific allow and deny rules.
transaction_limits.require_approval_above can return escalate for higher-value actions.
kill_switches can hard-stop the flow when suspicious patterns or PII are detected.

The important nuance is that the current evaluator uses exact matches and current-request safety caps. That keeps the policy predictable. If your agent emits export_csv, list export_csv itself; a guessed wildcard such as export_* will not match it.
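The exact-match behavior can be illustrated with a short Python sketch. This is not the Keeptrusts implementation: the function name evaluate_tool and the "allow"/"deny" return strings are hypothetical, and the sketch assumes that a name missing from allowed_tools is denied.

```python
def evaluate_tool(action, allowed_tools, blocked_tools):
    """Return "allow" or "deny" for one extracted action name.

    Illustrative only: plain string equality, with blocked_tools checked
    first because blocked names are always denied.
    """
    if action in blocked_tools:
        return "deny"
    # Assumption: anything not listed in allowed_tools is denied.
    if action not in allowed_tools:
        return "deny"
    return "allow"


allowed = {"read_database", "export_csv"}
blocked = {"delete_database"}

print(evaluate_tool("export_csv", allowed, blocked))       # allow
print(evaluate_tool("export_csv_v2", allowed, blocked))    # deny: no prefix matching
print(evaluate_tool("delete_database", allowed, blocked))  # deny: blocked
```

The point of the sketch is the middle case: a near-miss name is simply a different string, so it never inherits the permission of the name it resembles.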

Implementation

Start with a minimal chain that makes tool access explicit and keeps the evidence path visible:

pack:
  name: agent-firewall-example
  version: 1.0.0
  enabled: true

policies:
  chain:
    - rbac
    - agent-firewall
    - prompt-injection
    - audit-logger

policy:
  rbac:
    require_auth: true
    deny_if_missing:
      - role
      - team

  agent-firewall:
    allowed_tools:
      - read_database
      - export_csv
    blocked_tools:
      - delete_database
    max_actions_per_window: 3
    max_actions_per_session: 10
    rate_limits:
      export_csv: 1
    transaction_limits:
      max_single_transaction: 5000.0
      max_daily_total: 20000.0
      require_approval_above: 1000.0
    tools:
      roles:
        analyst:
          allowed:
            - read_database
            - export_csv
          denied:
            - delete_database
    kill_switches:
      halt_on_suspicious_pattern: true
      halt_on_pii_in_action: true

  prompt-injection:
    response:
      action: block

  audit-logger: {}

Lint the file, then run a gateway with it to validate the behavior before rollout:

kt policy lint --file policy-config.yaml
kt gateway run --policy-config policy-config.yaml --listen 0.0.0.0:41002
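Because matching is exact, a misspelled tool name in the policy never matches anything, and a misspelled entry in blocked_tools silently blocks nothing. A minimal pre-rollout check, sketched here in Python, compares the names in the policy with the tool names your agent actually registers. The function unmatched_tool_names and the registered-tool list are hypothetical and independent of the kt CLI; the dictionary mirrors the policy section of the YAML above.

```python
def unmatched_tool_names(policy, registered_tools):
    """Return policy tool names that no registered tool matches exactly."""
    fw = policy.get("agent-firewall", {})
    names = set(fw.get("allowed_tools", [])) | set(fw.get("blocked_tools", []))
    # Collect the role-specific allow and deny lists as well.
    for rules in fw.get("tools", {}).get("roles", {}).values():
        names |= set(rules.get("allowed", [])) | set(rules.get("denied", []))
    return sorted(names - set(registered_tools))


policy = {
    "agent-firewall": {
        "allowed_tools": ["read_database", "export_csv"],
        "blocked_tools": ["delete_database"],
        "tools": {
            "roles": {
                "analyst": {
                    "allowed": ["read_database", "export_csv"],
                    "denied": ["delete_database"],
                }
            }
        },
    }
}

# Hypothetical agent that registers the export tool under a different name.
registered = ["read_database", "export_to_csv", "delete_database"]
print(unmatched_tool_names(policy, registered))  # ['export_csv']
```

An empty result means every name in the policy corresponds to a real tool name; anything else is worth fixing before the gateway is exposed to traffic.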

This configuration does a few useful things immediately.

It requires identity through rbac, so unattributed agent calls are blocked early.

It gives the analyst role an explicit tool set instead of relying on agent prompt discipline.

It prevents obviously unsafe actions such as delete_database even if the model attempts them.

It limits repetitive behavior with action caps: at most 3 extracted actions per evaluation and 10 per session.

Finally, it can return escalate for actions above require_approval_above (1000.0 in this example), so high-value actions become review events instead of executing automatically.
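The cap and approval-threshold behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the evaluator itself: the function evaluate_request, the (tool_name, amount) pair shape, and the strictly-greater-than comparison are all hypothetical.

```python
def evaluate_request(actions, max_actions_per_window, require_approval_above):
    """Evaluate the actions extracted from one request.

    actions is a list of (tool_name, amount) pairs; amount is None when the
    action carries no transaction value. Hypothetical shape for illustration.
    """
    # Too many actions in one evaluation: stop the whole request.
    if len(actions) > max_actions_per_window:
        return "deny"
    # Assumption: "above" means strictly greater than the threshold.
    for _name, amount in actions:
        if amount is not None and amount > require_approval_above:
            return "escalate"
    return "allow"


# Values taken from the example configuration.
print(evaluate_request([("read_database", None), ("export_csv", None)], 3, 1000.0))  # allow
print(evaluate_request([("read_database", None)] * 4, 3, 1000.0))                    # deny
print(evaluate_request([("export_csv", 2500.0)], 3, 1000.0))                         # escalate
```

Keeping "escalate" as a separate outcome from "deny" is what lets a high-value action wait for review rather than fail outright.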

In a real rollout, keep the first tool set small. Teams usually want to start by allowing read-only operations such as search, retrieval, or export, then widen access only when the agent behavior is understood. The gateway is a good place to enforce that because the same rules can apply across different agent frameworks.

Also pair this with adjacent controls rather than treating agent-firewall as the whole story. Prompt Injection Detection helps catch hostile instructions before they shape tool behavior. RBAC ensures the request has meaningful identity. Tool Validation and Tool Security help you reason about the boundary from the tool side, not only the model side.

If cost is a concern, use team wallets alongside the firewall. The gateway can enforce who may call which tool, while wallet controls enforce how much the overall agent workload may spend.

Results and impact

The first impact is a smaller blast radius. Agents stop being “anything the framework can call” and start becoming “only the actions the platform explicitly permits.” That is a meaningful operational difference.

The second impact is faster review. Security, platform, and application teams can review a single declarative tool-access contract instead of tracing tool permissions through prompts and code.

The third impact is better incident handling. When a tool call is blocked or escalated, the reason is part of the governed request path rather than an unstructured application log. That makes it easier to explain what happened and decide whether the rule or the agent needs adjustment.

There is also a delivery benefit. Teams can move faster with agents when the dangerous edge is controlled centrally. The presence of a hard boundary often reduces pressure to over-constrain prompts because the runtime has a real enforcement layer behind them.

Key takeaways

agent-firewall governs tool actions, not just model text.
Use exact action names in allowed_tools and blocked_tools; the current evaluator does exact matching.
Pair tool access rules with rbac, prompt-injection defenses, and audit logging.
Treat action caps and approval thresholds as practical runtime controls, not just documentation.
Start with a narrow read-only tool set and widen only when the behavior is understood.
