Tool Security: Governing What AI Agents Can Execute
The moment an AI agent can call tools, your risk model changes. You are no longer governing text alone. You are governing action. That means “tool security” cannot be one switch. You need to know whether the tool was declared, whether the request itself looks unsafe, and whether the action should be allowed for this session at all. Keeptrusts splits those responsibilities across Tool Validation, Tool Security, and Agent Firewall, and that separation is what makes the control story workable.
Use this page when
- You are adding or reviewing agent tool execution paths in a governed AI system.
- You want a precise model for how Keeptrusts handles declared tools, dangerous request content, and action limits.
- You need an execution-governance baseline that can be tested and audited.
Primary audience
- Primary: Platform engineers, security engineers, and agent-platform owners
- Secondary: Technical leaders designing safe agent workflows
The problem
Teams often say an agent is “safe” because it only has a small tool list. That is incomplete. A small tool list does not tell you whether undeclared tools can still appear in the request, whether the tool payload contains dangerous patterns, or whether a valid tool can be abused through repetition or a high-risk action sequence.
There is also a naming problem. “Tool security” gets used as shorthand for the whole execution boundary, but the boundary has at least three distinct questions.
- Was the requested tool declared and is the tool schema at least structurally valid?
- Does the serialized tool request itself contain obvious dangerous patterns or blocked entity types?
- Even if the tool is valid and the request is clean, should this action be allowed, denied, capped, or escalated right now?
Trying to answer all three with one feature usually leads to blind spots.
The solution
Use the controls in the order the execution boundary actually needs them.
Tool Validation comes first. It extracts requested tool names and blocks undeclared tools when allow_undeclared is set to false. It also verifies that configured JSON Schemas compile. One implementation detail matters here: this policy does not, by itself, validate actual tool-call arguments against the schema. If you need meaning-level review, pair it with the optional external semantic validator or with other tool controls.
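The name check, and the gap it leaves, can be sketched in a few lines of Python. This is an illustration only, not Keeptrusts code: the request shape (a tool_calls list whose entries carry name and arguments) and the function names are assumptions made for the example.

```python
from __future__ import annotations

# Illustrative sketch only; not the Keeptrusts implementation.
# Assumed request shape: {"tool_calls": [{"name": str, "arguments": dict}, ...]}

DECLARED_TOOLS = {"web_search", "knowledge_lookup", "database_query"}


def requested_tool_names(request: dict) -> list[str]:
    """Extract the tool names a request asks for (hypothetical request shape)."""
    return [call.get("name", "") for call in request.get("tool_calls", [])]


def validate_tools(request: dict, declared: set[str],
                   allow_undeclared: bool = False) -> dict:
    """Block when an undeclared tool is requested and allow_undeclared is False.

    Deliberately NOT checked: the arguments of each call are never compared
    against a schema, which mirrors the caveat described in the text.
    """
    unknown = [n for n in requested_tool_names(request) if n not in declared]
    if unknown and not allow_undeclared:
        return {"verdict": "blocked", "undeclared": unknown}
    return {"verdict": "allowed", "undeclared": unknown}


# A declared tool with a wrongly typed argument still passes the name check.
declared_but_bad_args = {
    "tool_calls": [{"name": "database_query", "arguments": {"table": 42}}]
}
# An undeclared tool is rejected regardless of its arguments.
undeclared_call = {
    "tool_calls": [{"name": "shell_command", "arguments": {"cmd": "ls"}}]
}

print(validate_tools(declared_but_bad_args, DECLARED_TOOLS)["verdict"])
print(validate_tools(undeclared_call, DECLARED_TOOLS)["verdict"])
```

The first request is the case to keep in mind: the tool name is declared, so a name-only check lets it through even though table is not a string. Catching that requires argument-level validation from another control.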
Tool Security is the content scanner for the serialized tool request. In local mode it checks for fixed dangerous substrings such as path traversal, cloud-metadata access, drop table, rm -rf, or file://, and it can also block detected entities such as JWTs or private keys. If you need an external firewall verdict, the policy can call an external endpoint and fail closed on errors.
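A local-mode scan of this kind can be approximated as a substring pass over the serialized request plus a few entity detectors. The sketch below is a rough illustration under stated assumptions, not the product's scanner: the case-insensitive matching, the two regular expressions, and the result shape are all hypothetical.

```python
import json
import re

# Illustrative sketch only; pattern list mirrors the examples in the text.
BLOCKED_PATTERNS = ["../", "rm -rf", "drop table", "file://"]

# Rough, hypothetical detectors; real entity detection is likely stricter.
ENTITY_PATTERNS = {
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_tool_request(request: dict) -> dict:
    """Scan the serialized request for blocked substrings and entities."""
    serialized = json.dumps(request, sort_keys=True)
    lowered = serialized.lower()  # assumption: substring matching ignores case
    pattern_hits = [p for p in BLOCKED_PATTERNS if p in lowered]
    entity_hits = [name for name, rx in ENTITY_PATTERNS.items()
                   if rx.search(serialized)]
    verdict = "blocked" if (pattern_hits or entity_hits) else "allowed"
    return {"verdict": verdict, "patterns": pattern_hits, "entities": entity_hits}


sql_request = {
    "name": "database_query",
    "arguments": {"table": "users; DROP TABLE users"},
}
print(scan_tool_request(sql_request))
```

Because the check works on the serialized request, it is indifferent to which tool is being called. That is what distinguishes it from the name-based check before it and the action-based check after it.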
Agent Firewall then governs the action itself. This is where exact-match allow and deny lists, per-request action caps, per-session caps, rate limits by action, transaction thresholds, role-aware allow and deny lists, and kill switches live. That is the real action-control layer.
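To make the action layer concrete, here is a minimal sketch of exact-match allow and deny lists combined with a sliding-window cap and a session cap. It is not the Agent Firewall implementation: the class name, the 60-second window, and the verdict strings are assumptions chosen for illustration.

```python
import time
from collections import deque


class ActionFirewall:
    """Hypothetical sketch: exact-match lists plus window and session caps."""

    def __init__(self, allowed, blocked, max_per_window, max_per_session,
                 window_seconds=60.0):
        self.allowed = set(allowed)
        self.blocked = set(blocked)
        self.max_per_window = max_per_window
        self.max_per_session = max_per_session
        self.window_seconds = window_seconds  # assumed window length
        self.session_count = 0
        self.recent = deque()  # timestamps of allowed actions in the window

    def check(self, tool, now=None):
        """Return a verdict string for one requested action."""
        now = time.monotonic() if now is None else now
        if tool in self.blocked:
            return "denied: blocked tool"
        if tool not in self.allowed:
            return "denied: not in allow list"
        if self.session_count >= self.max_per_session:
            return "denied: session cap"
        # Drop timestamps that have aged out of the sliding window.
        while self.recent and now - self.recent[0] >= self.window_seconds:
            self.recent.popleft()
        if len(self.recent) >= self.max_per_window:
            return "denied: window cap"
        self.recent.append(now)
        self.session_count += 1
        return "allowed"


firewall = ActionFirewall(
    allowed=["web_search", "knowledge_lookup", "database_query"],
    blocked=["shell_command", "delete_database"],
    max_per_window=3,
    max_per_session=10,
)
for second in range(4):
    print(second, firewall.check("web_search", now=float(second)))
```

In this sketch the fourth call is refused even though web_search is on the allow list and nothing about the request content is suspicious. That is the case the two earlier layers cannot catch on their own.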
Implementation
The safest default is to combine all three controls and place prompt defense ahead of them so the tool path never starts from obviously hostile input.
pack:
  name: governed-tool-execution
  version: 1.0.0
  enabled: true
policies:
  chain:
    - prompt-injection
    - tool-validation
    - tool-security
    - agent-firewall
    - audit-logger
  policy:
    prompt-injection:
      use_embedding: true
      detection:
        embedding_threshold: 0.8
      encoding:
        decode_base64: true
        normalize_unicode: true
        detect_homoglyphs: true
      boundaries:
        enforce_delimiters: true
        reject_fake_boundaries: true
    tool-validation:
      declared_tools:
        - web_search
        - knowledge_lookup
        - database_query
      schemas:
        database_query:
          type: object
          properties:
            table:
              type: string
          required:
            - table
      allow_undeclared: false
    tool-security:
      analysis_mode: local
      blocked_patterns:
        - ../
        - rm -rf
        - drop table
        - file://
      blocked_entity_types:
        - jwt
        - private_key
        - ssn
    agent-firewall:
      allowed_tools:
        - web_search
        - knowledge_lookup
        - database_query
      blocked_tools:
        - shell_command
        - delete_database
      max_actions_per_window: 3
      max_actions_per_session: 10
      kill_switches:
        halt_on_suspicious_pattern: true
        halt_on_pii_in_action: true
    audit-logger: {}
The validation loop is straightforward:
kt policy lint --file governed-tool-execution.yaml
kt events tail --since 30m --verdict blocked --json
If the blocked events show undeclared tools, tune declared_tools. If they show dangerous serialized requests, tighten blocked_patterns or entity coverage. If they show valid tools performing too much work too quickly, tighten the agent-firewall caps. That is the practical advantage of keeping the layers separate: each symptom points to one control.
Results and impact
Teams that adopt this layered model can stop arguing about whether a tool path is “safe” in the abstract. They can point to exactly which part of the tool boundary is enforced and what evidence proves it.
The second payoff is incident clarity. When an agent attempts something unsafe, the event stream is easier to interpret because the control that fired has a clear job. That reduces noisy remediation and prevents the common failure where engineers weaken the wrong control just to quiet an alert.
Key takeaways
- Tool security in Keeptrusts is a three-layer story: Tool Validation, Tool Security, and Agent Firewall.
- tool-validation allowlists tool names and checks schema compilation, but does not by itself validate real arguments against those schemas.
- tool-security scans serialized requests for dangerous patterns and blocked entities.
- agent-firewall governs exact action names, limits, and escalation thresholds.
- Put Prompt Injection Detection ahead of the tool boundary so hostile prompts do not reach it in the first place.