Tool Budget
The tool-budget policy enforces token and cost limits on tool calls made by AI agents within a single session. It prevents runaway agent loops, excessive API spend, and resource exhaustion by tracking cumulative usage per tool and blocking calls that would exceed configured budgets.
Use this page when
- You need to prevent runaway agent loops and excessive API spend by capping per-tool token or cost budgets.
- You are configuring per-session limits on tool calls to control resource exhaustion.
- You want to cap both token consumption and USD spend independently per tool within a single session.
When an agent's tool call would push cumulative token consumption or USD spend past the configured threshold, the gateway returns a policy-violation response and logs the blocked call as an event.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Configuration
pack:
  name: tool-budget-example-1
  version: 1.0.0
  enabled: true
policies:
  chain:
    - tool-budget
policy:
  tool-budget:
    budgets:
      web_search:
        max_tokens: 50000
        max_cost_usd: 0.5
      code_generation:
        max_tokens: 100000
        max_cost_usd: 2.0
      image_generation:
        max_tokens: 10000
        max_cost_usd: 5.0
      database_query:
        max_tokens: 30000
        max_cost_usd: 0.25
Fields
Top-level
| Property | Type | Default | Description |
|---|---|---|---|
| budgets | object | {} | Map of tool names to budget limit objects. Each key is the exact tool/function name as it appears in the LLM's tool-call payload. Unlisted tools have no budget enforcement. |
Budget entry (per tool)
Each value inside budgets is an object with the following properties:
| Property | Type | Constraint | Default | Description |
|---|---|---|---|---|
| max_tokens | integer | >= 1 | none | Maximum total tokens that may be consumed by this tool across all calls within a single session. Includes both input tokens (the tool-call arguments) and output tokens (the tool response). Once the cumulative token count reaches or exceeds this value, subsequent calls to the tool are blocked. |
| max_cost_usd | number | >= 0 | none | Maximum USD spend permitted for this tool within a single session. Costs are calculated from the model's per-token pricing and accumulated across calls. When the cumulative cost reaches or exceeds this value, subsequent calls are blocked. |
At least one of max_tokens or max_cost_usd must be specified per tool entry. If both are specified, the call is blocked when either limit is reached.
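The constraints above can be checked before a config is deployed. The following is a minimal sketch, assuming the budgets map has already been parsed into a Python dict; validate_budgets is a hypothetical helper written for illustration and is not part of the gateway or of kt policy lint.

```python
from numbers import Real


def validate_budgets(budgets: dict) -> list[str]:
    """Return a list of problems found; an empty list means the map is valid."""
    problems = []
    for tool, entry in budgets.items():
        max_tokens = entry.get("max_tokens")
        max_cost = entry.get("max_cost_usd")
        # Rule from the docs: at least one limit per tool entry.
        if max_tokens is None and max_cost is None:
            problems.append(f"{tool}: set max_tokens, max_cost_usd, or both")
        if max_tokens is not None:
            # bool is a subclass of int in Python, so reject it explicitly.
            if (isinstance(max_tokens, bool)
                    or not isinstance(max_tokens, int)
                    or max_tokens < 1):
                problems.append(f"{tool}: max_tokens must be an integer >= 1")
        if max_cost is not None:
            if (isinstance(max_cost, bool)
                    or not isinstance(max_cost, Real)
                    or max_cost < 0):
                problems.append(f"{tool}: max_cost_usd must be a number >= 0")
    return problems
```

An entry such as {"web_search": {}} is reported because it sets neither limit, while an entry that sets only one of the two limits passes.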
Use Cases
1. Token budget for search tools
Prevent recursive search loops where an agent repeatedly calls a search tool without making progress:
pack:
  name: tool-budget-example-2
  version: 1.0.0
  enabled: true
policies:
  chain:
    - tool-budget
policy:
  tool-budget:
    budgets:
      web_search:
        max_tokens: 50000
      knowledge_base_search:
        max_tokens: 30000
2. Cost budget for expensive operations
Cap spend on high-cost tools like code generation or image generation:
pack:
  name: tool-budget-example-3
  version: 1.0.0
  enabled: true
policies:
  chain:
    - tool-budget
policy:
  tool-budget:
    budgets:
      code_generation:
        max_cost_usd: 2.0
      image_generation:
        max_cost_usd: 5.0
      video_generation:
        max_cost_usd: 10.0
3. Combined token and cost limits
Apply both token and cost ceilings for defense-in-depth. The call is blocked when either limit is hit:
pack:
  name: tool-budget-example-4
  version: 1.0.0
  enabled: true
policies:
  chain:
    - tool-budget
policy:
  tool-budget:
    budgets:
      code_generation:
        max_tokens: 100000
        max_cost_usd: 3.0
      data_analysis:
        max_tokens: 80000
        max_cost_usd: 1.5
4. Per-tool differentiated budgets
Give cheap, fast tools generous limits while tightly constraining expensive ones:
pack:
  name: tool-budget-example-5
  version: 1.0.0
  enabled: true
policies:
  chain:
    - tool-budget
policy:
  tool-budget:
    budgets:
      calculator:
        max_tokens: 200000
      string_formatter:
        max_tokens: 200000
      web_search:
        max_tokens: 50000
        max_cost_usd: 0.5
      database_query:
        max_tokens: 40000
        max_cost_usd: 0.3
      code_generation:
        max_tokens: 50000
        max_cost_usd: 2.0
      image_generation:
        max_tokens: 10000
        max_cost_usd: 5.0
5. Agent loop prevention
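Give every tool in the agent's loop its own tight token budget, so that an infinite-loop agent hits a limit no matter which tool it keeps calling. Budgets are tracked per tool; there is no single session-wide total: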
pack:
  name: tool-budget-example-6
  version: 1.0.0
  enabled: true
policies:
  chain:
    - tool-budget
policy:
  tool-budget:
    budgets:
      web_search:
        max_tokens: 20000
      fetch_page:
        max_tokens: 30000
      summarize:
        max_tokens: 15000
      plan_next_step:
        max_tokens: 10000
How It Works
- Session tracking: The gateway maintains a per-session, per-tool counter for tokens consumed and USD spent. A session corresponds to a single top-level request or conversation turn, depending on the upstream provider's session semantics.
- Pre-call check: Before forwarding a tool call to the model or external service, the gateway checks whether the tool has a budget entry. If it does, the gateway estimates the token count of the outgoing call arguments and verifies the cumulative total (existing usage + estimated new usage) against both max_tokens and max_cost_usd.
- Blocking: If either limit would be exceeded, the gateway returns a policy-violation error to the model instead of executing the tool. The violation is recorded as a decision event with action blocked, reason tool_budget_exceeded, and metadata including the tool name, limit type (tokens or cost), current usage, and configured limit.
- Post-call accounting: After a tool call completes, the gateway records the actual token count and computed cost against the session's running totals.
- Session reset: Counters reset at the start of each new session. There is no cross-session accumulation.
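The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the gateway's actual implementation: SessionBudgetTracker, Usage, the flat usd_per_token price, and the exact event key names are all hypothetical, and real pricing usually differs by model and by input versus output tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Usage:
    tokens: int = 0
    cost_usd: float = 0.0


@dataclass
class SessionBudgetTracker:
    """Illustrative per-session tracker; create one instance per session."""

    budgets: dict          # tool name -> {"max_tokens": int, "max_cost_usd": float}
    usd_per_token: float   # assumption: one flat price for all tokens
    usage: dict = field(default_factory=dict)  # tool name -> Usage

    def pre_call_check(self, tool, estimated_tokens):
        """Return None to allow the call, or a blocked-event dict."""
        budget = self.budgets.get(tool)
        if budget is None:
            return None  # unlisted tools have no budget enforcement
        used = self.usage.get(tool, Usage())
        estimated_cost = estimated_tokens * self.usd_per_token
        checks = (
            ("tokens", used.tokens, estimated_tokens, budget.get("max_tokens")),
            ("cost", used.cost_usd, estimated_cost, budget.get("max_cost_usd")),
        )
        for limit_type, current, extra, limit in checks:
            # Block when existing usage plus the estimate would exceed the limit.
            if limit is not None and current + extra > limit:
                return {
                    "action": "blocked",
                    "reason": "tool_budget_exceeded",
                    "tool": tool,
                    "limit_type": limit_type,
                    "current_usage": current,
                    "limit": limit,
                }
        return None

    def record(self, tool, actual_tokens):
        """Post-call accounting using the actual token count."""
        used = self.usage.setdefault(tool, Usage())
        used.tokens += actual_tokens
        used.cost_usd += actual_tokens * self.usd_per_token


tracker = SessionBudgetTracker(
    budgets={"web_search": {"max_tokens": 100}}, usd_per_token=0.0
)
first = tracker.pre_call_check("web_search", 60)   # allowed: 0 + 60 <= 100
tracker.record("web_search", 60)
second = tracker.pre_call_check("web_search", 60)  # blocked: 60 + 60 > 100
```

Session reset falls out of the design: a new session gets a new tracker instance, so nothing accumulates across sessions.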
Combining With Other Policies
| Combined with | Effect |
|---|---|
| tool-validation | Validate tool arguments against JSON Schema before budget accounting, so malformed calls don't consume budget. |
| tool-security | Run injection and traversal checks before budget checks, so attacks are caught without spending budget. |
| agent-firewall | The agent firewall provides broad intent-level blocking; tool-budget provides fine-grained resource limits per tool. |
| rate-limiter | Rate-limiter caps requests per time window; tool-budget caps cumulative resource consumption per session. Use both for layered protection. |
| content-filter | Content filtering applies to tool outputs. Budget limits apply regardless of content filtering results. |
Recommended evaluation order: tool-security → tool-validation → tool-budget → tool execution → content-filter.
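To see why this order keeps rejected calls from consuming budget, consider the following sketch. It assumes each pre-execution policy can be modeled as a function that returns None to pass or a reason string to reject; run_chain and the stub checks are hypothetical and do not reflect how the gateway wires policies internally.

```python
def run_chain(call, pre_checks, execute, post_filter):
    """Run pre-execution checks in order and stop at the first rejection."""
    for policy_name, check in pre_checks:
        reason = check(call)
        if reason is not None:
            # Later policies, including tool-budget, never see this call.
            return {"action": "blocked", "policy": policy_name, "reason": reason}
    # All checks passed: execute the tool, then filter its output.
    return {"action": "allowed", "output": post_filter(execute(call))}


budget_seen = []  # records every call that reaches the budget check

pre_checks = [
    ("tool-security", lambda call: None),
    ("tool-validation",
     lambda call: None if "query" in call else "schema_mismatch"),
    ("tool-budget", lambda call: budget_seen.append(call)),  # append returns None
]

malformed = run_chain({}, pre_checks, lambda call: "result", str.upper)
valid = run_chain({"query": "x"}, pre_checks, lambda call: "result", str.upper)
```

The malformed call is rejected by the validation stub and never reaches the budget stub, so only the valid call is counted.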
Best Practices
- Start with token budgets. Token limits are easier to reason about than cost limits because they don't depend on per-model pricing. Add cost limits once you have baseline usage data.
- Set budgets based on observed usage. Run your agent pipeline without budgets, collect token/cost metrics from events, then set budgets at 2–3× the 95th-percentile observed usage.
- Budget every tool the agent can call. Unbudgeted tools have no enforcement. If a tool is in the agent's tool list, give it a budget.
- Use tight budgets on recursive-capable tools. Tools like web_search, fetch_page, or plan_next_step are the most common sources of runaway loops. Keep their budgets conservative.
- Combine with tool-validation. Schema validation rejects malformed calls before they consume budget, keeping your budget headroom for legitimate calls.
- Monitor blocked events. A high rate of tool_budget_exceeded events indicates either a budget set too low or an agent that needs prompt engineering to reduce tool-call volume. Review event logs regularly.
- Avoid setting max_cost_usd: 0. A zero cost budget effectively disables the tool. Use tool-validation with declared_tools to explicitly block tools instead.
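The advice to set budgets at 2–3× the 95th-percentile observed usage can be turned into a small calculation. The sketch below assumes you have already exported one total token count per session for a given tool; percentile and suggest_max_tokens are hypothetical helpers, and nearest-rank is only one of several common percentile definitions.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def suggest_max_tokens(session_totals, multiplier=2.0):
    """Suggest a max_tokens value as multiplier x the 95th percentile."""
    return math.ceil(multiplier * percentile(session_totals, 95))


# Hypothetical per-session token totals for one tool: 1, 2, ..., 100.
observed = list(range(1, 101))
suggested = suggest_max_tokens(observed, multiplier=2.0)
```

With these made-up totals the 95th percentile is 95, so a 2× multiplier suggests a max_tokens of 190. The same approach applies to max_cost_usd using per-session cost totals.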
For AI systems
- Canonical terms: Keeptrusts, tool-budget, budgets, max_tokens, max_cost_usd, per-session, tool call, runaway loop, resource exhaustion
- Config/command names: tool-budget policy, budgets.<tool_name>.max_tokens, budgets.<tool_name>.max_cost_usd
- Best next pages: Tool Validation, Tool Security, Config Rate Limits
For engineers
- Prerequisites: Know the exact tool/function names as they appear in your LLM's tool-call payloads. Baseline usage data from monitoring to set appropriate limits.
- Validation: Set a low budget, make repeated tool calls, and verify blocking when the budget is exceeded. Check tool_budget_exceeded events in the console or kt events tail.
- Key commands: kt policy lint, kt gateway run, kt events tail
For leaders
- Governance: Tool budgets prevent AI agents from consuming unbounded resources. They provide a hard ceiling on per-session spend — critical for cost governance in agentic AI deployments.
- Cost: Each tool call's tokens and cost are tracked against the budget. Without budgets, a single agent loop can exhaust your entire provider quota. Set budgets at 2–3× observed 95th-percentile usage.
- Rollout: Start with token budgets on recursive-capable tools (search, fetch, plan). Add cost budgets once you have baseline spend data. Monitor tool_budget_exceeded event rates to tune limits.
Next steps
- Tool Validation — Schema-level tool access control
- Tool Security — Injection protection for tool arguments
- Config Rate Limits — Request-level rate limiting
- Agent Firewall — Intent-level tool blocking