Tool Security: Governing What AI Agents Can Execute

The moment an AI agent can call tools, your risk model changes. You are no longer governing text alone. You are governing action. That means “tool security” cannot be one switch. You need to know whether the tool was declared, whether the request itself looks unsafe, and whether the action should be allowed for this session at all. Keeptrusts splits those responsibilities across Tool Validation, Tool Security, and Agent Firewall, and that separation is what makes the control story workable.

Use this page when

  • You are adding or reviewing agent tool execution paths in a governed AI system.
  • You want a precise model for how Keeptrusts handles declared tools, dangerous request content, and action limits.
  • You need an execution-governance baseline that can be tested and audited.

Primary audience

  • Primary: Platform engineers, security engineers, and agent-platform owners
  • Secondary: Technical leaders designing safe agent workflows

The problem

Teams often say an agent is “safe” because it only has a small tool list. That is incomplete. A small tool list does not tell you whether undeclared tools can still appear in the request, whether the tool payload contains dangerous patterns, or whether a valid tool can be abused through repetition or a high-risk action sequence.

There is also a naming problem. “Tool security” gets used as shorthand for the whole execution boundary, but the boundary has at least three distinct questions.

  • Was the requested tool declared and is the tool schema at least structurally valid?
  • Does the serialized tool request itself contain obvious dangerous patterns or blocked entity types?
  • Even if the tool is valid and the request is clean, should this action be allowed, denied, capped, or escalated right now?

Trying to answer all three with one feature usually leads to blind spots.

The solution

Use the controls in the order the execution boundary actually needs them.

Tool Validation comes first. It extracts the requested tool names and blocks undeclared tools when allow_undeclared: false. It also verifies that the configured JSON Schemas compile. One implementation detail matters here: on its own, this policy does not validate actual tool-call arguments against those schemas. If you need meaning-level review, pair it with the optional external semantic validator or with the other tool controls.
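To make the declared-tool check concrete, here is a minimal Python sketch of the idea, not the Keeptrusts implementation. The request shape (a tool_calls list of objects with name and arguments) and the helper names are assumptions made for illustration only.

```python
# Hypothetical sketch: block tool names that were not declared.
# The request shape and function names are assumptions, not Keeptrusts APIs.
DECLARED_TOOLS = {"web_search", "knowledge_lookup", "database_query"}


def requested_tool_names(request: dict) -> list[str]:
    """Collect tool names from a request shaped like {"tool_calls": [{"name": ...}]}."""
    return [call["name"] for call in request.get("tool_calls", [])]


def undeclared_tools(request: dict, declared: set[str], allow_undeclared: bool = False) -> list[str]:
    """Return the tool names that should be blocked; an empty list means the request passes."""
    if allow_undeclared:
        return []
    return [name for name in requested_tool_names(request) if name not in declared]


request = {
    "tool_calls": [
        # Declared tool with empty arguments: a name check alone lets this through.
        {"name": "database_query", "arguments": {}},
        {"name": "shell_command", "arguments": {"cmd": "ls"}},
    ]
}
print(undeclared_tools(request, DECLARED_TOOLS))  # ['shell_command']
```

Note that the database_query call passes even though its arguments omit table. That mirrors the caveat above: a name allowlist plus schema compilation does not check real arguments.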

Tool Security is the content scanner for the serialized tool request. In local mode it checks for fixed dangerous substrings, such as path-traversal sequences, cloud-metadata access, drop table, rm -rf, or file://, and it can also block detected entities such as JWTs or private keys. If you need an external firewall verdict instead, the policy can call an external endpoint and fail closed on errors.
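The local scan described above can be pictured roughly as follows. This is a hedged sketch under stated assumptions: the case-insensitive substring match, the two regular expressions, and the function name are all illustrative choices, and the real policy's detection rules may differ.

```python
import json
import re

# Substrings taken from the examples in the text; matching is lowercased here as an assumption.
BLOCKED_PATTERNS = ["../", "rm -rf", "drop table", "file://"]

# Rough illustrative detectors; real entity detection is likely more careful.
ENTITY_DETECTORS = {
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_tool_request(tool_call: dict) -> list[str]:
    """Return findings for one tool call; an empty list means nothing matched."""
    serialized = json.dumps(tool_call)  # scan the serialized request, not individual fields
    lowered = serialized.lower()
    findings = [f"pattern:{p}" for p in BLOCKED_PATTERNS if p in lowered]
    findings += [
        f"entity:{name}"
        for name, detector in ENTITY_DETECTORS.items()
        if detector.search(serialized)
    ]
    return findings


call = {"name": "database_query", "arguments": {"table": "users; DROP TABLE users"}}
print(scan_tool_request(call))  # ['pattern:drop table']
```

Scanning the serialized form means a dangerous string is caught regardless of which argument carries it, at the cost of occasional false positives on harmless text that happens to contain a blocked substring.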

Agent Firewall then governs the action itself. This is where exact-match allow and deny lists, per-request action caps, per-session caps, rate limits by action, transaction thresholds, role-aware allow and deny lists, and kill switches live. That is the real action-control layer.
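As a rough illustration of action-level control, the sketch below combines an exact-match deny list, an allow list, a per-session cap, and a kill switch. The class name, field names, and verdict strings are hypothetical and chosen only for this example. Rate limits, transaction thresholds, and role-aware lists are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class FirewallSession:
    """Hypothetical per-session state for an action firewall (illustrative only)."""

    allowed_tools: set[str]
    blocked_tools: set[str]
    max_actions_per_session: int
    actions_taken: int = 0
    halted: bool = False  # stands in for a tripped kill switch

    def decide(self, tool_name: str) -> str:
        """Return "allow" or a "deny:<reason>" string for one requested action."""
        if self.halted:
            return "deny:kill_switch"
        if tool_name in self.blocked_tools:  # deny list wins over allow list
            return "deny:blocked_tool"
        if tool_name not in self.allowed_tools:
            return "deny:not_allowed"
        if self.actions_taken >= self.max_actions_per_session:
            return "deny:session_cap"
        self.actions_taken += 1  # only allowed actions count toward the cap
        return "allow"


session = FirewallSession(
    allowed_tools={"web_search", "database_query"},
    blocked_tools={"shell_command"},
    max_actions_per_session=2,
)
for name in ["web_search", "shell_command", "database_query", "web_search"]:
    print(name, "->", session.decide(name))
# web_search -> allow
# shell_command -> deny:blocked_tool
# database_query -> allow
# web_search -> deny:session_cap
```

The last call is the interesting one: web_search is declared, clean, and allowed, yet it is denied because the session has already used its budget. That is the kind of decision the first two layers cannot make.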

Implementation

A safe default is to combine all three controls and place Prompt Injection Detection ahead of them so the tool path never starts from obviously hostile input.

pack:
  name: governed-tool-execution
  version: 1.0.0
  enabled: true

policies:
  chain:
    - prompt-injection
    - tool-validation
    - tool-security
    - agent-firewall
    - audit-logger

  policy:
    prompt-injection:
      use_embedding: true
      detection:
        embedding_threshold: 0.8
      encoding:
        decode_base64: true
        normalize_unicode: true
        detect_homoglyphs: true
      boundaries:
        enforce_delimiters: true
        reject_fake_boundaries: true

    tool-validation:
      declared_tools:
        - web_search
        - knowledge_lookup
        - database_query
      schemas:
        database_query:
          type: object
          properties:
            table:
              type: string
          required:
            - table
      allow_undeclared: false

    tool-security:
      analysis_mode: local
      blocked_patterns:
        - ../
        - rm -rf
        - drop table
        - file://
      blocked_entity_types:
        - jwt
        - private_key
        - ssn

    agent-firewall:
      allowed_tools:
        - web_search
        - knowledge_lookup
        - database_query
      blocked_tools:
        - shell_command
        - delete_database
      max_actions_per_window: 3
      max_actions_per_session: 10
      kill_switches:
        halt_on_suspicious_pattern: true
        halt_on_pii_in_action: true

    audit-logger: {}

The validation loop is straightforward:

kt policy lint --file governed-tool-execution.yaml
kt events tail --since 30m --verdict blocked --json

If the blocked population shows undeclared tools, tune declared_tools. If it shows dangerous serialized requests, tighten blocked_patterns or entity coverage. If it shows valid tools performing too much work too quickly, tighten agent-firewall caps. That is the practical advantage of keeping the layers separate.
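The triage described above can be sketched as a small script that counts blocked events per policy. This is an illustration under loud assumptions: it supposes the events are available as one JSON object per line and that each event carries a policy field naming the control that fired. Neither detail is confirmed by this page, so check the actual event format before relying on it.

```python
import json
from collections import Counter

# Which setting to review for each policy, following the paragraph above.
TUNING_HINTS = {
    "tool-validation": "review declared_tools",
    "tool-security": "review blocked_patterns and blocked_entity_types",
    "agent-firewall": "review action caps",
}


def summarize_blocked(lines):
    """Count blocked events per policy from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # "policy" is an assumed field name, not a documented one.
        counts[event.get("policy", "unknown")] += 1
    return counts


# Hypothetical sample events, invented for this example.
sample = [
    '{"policy": "tool-validation", "verdict": "blocked"}',
    '{"policy": "agent-firewall", "verdict": "blocked"}',
    '{"policy": "agent-firewall", "verdict": "blocked"}',
]
for policy, count in summarize_blocked(sample).most_common():
    print(f"{policy}: {count} blocked -> {TUNING_HINTS.get(policy, 'inspect manually')}")
```

Grouping by the control that fired keeps the tuning decision local: a spike in one bucket points at one setting rather than at the whole pack.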

Results and impact

The first payoff of this layered model is decision speed. Instead of arguing about whether a tool path is “safe,” teams can point to exactly which part of the tool boundary is enforced and what evidence proves it.

The second payoff is incident clarity. When an agent attempts something unsafe, the event stream is easier to interpret because the control that fired has a clear job. That reduces noisy remediation and prevents the common failure where engineers weaken the wrong control just to quiet an alert.

Key takeaways

  • Tool security in Keeptrusts is a three-layer story: Tool Validation, Tool Security, and Agent Firewall.
  • tool-validation allowlists tool names and checks schema compilation, but does not by itself validate real arguments against those schemas.
  • tool-security scans serialized requests for dangerous patterns and blocked entities.
  • agent-firewall governs exact action names, limits, and escalation thresholds.
  • Put Prompt Injection Detection ahead of the tool boundary so hostile prompts do not reach it in the first place.

Next steps