Tool Budget

The tool-budget policy enforces token and cost limits on tool calls made by AI agents within a single session. It prevents runaway agent loops, excessive API spend, and resource exhaustion by tracking cumulative usage per tool and blocking calls that would exceed configured budgets.

Use this page when

You need to prevent runaway agent loops and excessive API spend by capping per-tool token or cost budgets.
You are configuring per-session limits on tool calls to control resource exhaustion.
You want to cap both token consumption and USD spend independently per tool within a single session.

When an agent's tool call would push cumulative token consumption or USD spend past the configured threshold, the gateway returns a policy-violation response and logs the blocked call as an event.

Primary audience

Primary: AI Agents, Technical Engineers
Secondary: Technical Leaders

Configuration

pack:
  name: tool-budget-example-1
  version: 1.0.0
  enabled: true
policies:
  chain:
  - tool-budget
policy:
  tool-budget:
    budgets:
      web_search:
        max_tokens: 50000
        max_cost_usd: 0.5
      code_generation:
        max_tokens: 100000
        max_cost_usd: 2.0
      image_generation:
        max_tokens: 10000
        max_cost_usd: 5.0
      database_query:
        max_tokens: 30000
        max_cost_usd: 0.25

Fields

Top-level

Property	Type	Default	Description
`budgets`	`object`	`{}`	Map of tool names to budget limit objects. Each key is the exact tool/function name as it appears in the LLM's tool-call payload. Unlisted tools have no budget enforcement.

Budget entry (per tool)

Each value inside budgets is an object with the following properties:

Property	Type	Constraint	Default	Description
`max_tokens`	`integer`	`>= 1`	—	Maximum total tokens that may be consumed by this tool across all calls within a single session. Includes both input tokens (the tool-call arguments) and output tokens (the tool response). Once the cumulative token count reaches or exceeds this value, subsequent calls to the tool are blocked.
`max_cost_usd`	`number`	`>= 0`	—	Maximum USD spend permitted for this tool within a single session. Costs are calculated from the model's per-token pricing and accumulated across calls. When the cumulative cost reaches or exceeds this value, subsequent calls are blocked.

At least one of max_tokens or max_cost_usd must be specified per tool entry. If both are specified, the call is blocked when either limit is reached.
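The field constraints above can be checked before a pack is deployed. The following is a minimal sketch, assuming the budgets map has already been parsed from YAML into a Python dict. `validate_budgets` is a hypothetical helper written for illustration; it is not part of the gateway and does not describe what kt policy lint does internally.

```python
from numbers import Real


def validate_budgets(budgets: dict) -> list[str]:
    """Return a list of problems found; an empty list means the map is valid."""
    problems = []
    for tool, entry in budgets.items():
        max_tokens = entry.get("max_tokens")
        max_cost = entry.get("max_cost_usd")

        # At least one of the two limits must be present per tool entry.
        if max_tokens is None and max_cost is None:
            problems.append(f"{tool}: set max_tokens, max_cost_usd, or both")

        # max_tokens: integer >= 1 (bool is a subclass of int, so reject it).
        if max_tokens is not None and (
            isinstance(max_tokens, bool)
            or not isinstance(max_tokens, int)
            or max_tokens < 1
        ):
            problems.append(f"{tool}: max_tokens must be an integer >= 1")

        # max_cost_usd: number >= 0.
        if max_cost is not None and (
            isinstance(max_cost, bool)
            or not isinstance(max_cost, Real)
            or max_cost < 0
        ):
            problems.append(f"{tool}: max_cost_usd must be a number >= 0")
    return problems


example = {
    "web_search": {"max_tokens": 50000, "max_cost_usd": 0.5},
    "calculator": {},                 # neither limit set
    "summarize": {"max_tokens": 0},   # below the minimum of 1
}
problems = validate_budgets(example)
```

Here `problems` holds one message for calculator and one for summarize, while web_search passes.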

Use Cases

1. Token budget for search tools

Prevent recursive search loops where an agent repeatedly calls a search tool without making progress:

pack:
  name: tool-budget-example-2
  version: 1.0.0
  enabled: true
policies:
  chain:
  - tool-budget
policy:
  tool-budget:
    budgets:
      web_search:
        max_tokens: 50000
      knowledge_base_search:
        max_tokens: 30000

2. Cost budget for expensive operations

Cap spend on high-cost tools like code generation or image generation:

pack:
  name: tool-budget-example-3
  version: 1.0.0
  enabled: true
policies:
  chain:
  - tool-budget
policy:
  tool-budget:
    budgets:
      code_generation:
        max_cost_usd: 2.0
      image_generation:
        max_cost_usd: 5.0
      video_generation:
        max_cost_usd: 10.0

3. Combined token and cost limits

Apply both token and cost ceilings for defense-in-depth. The call is blocked when either limit is hit:

pack:
  name: tool-budget-example-4
  version: 1.0.0
  enabled: true
policies:
  chain:
  - tool-budget
policy:
  tool-budget:
    budgets:
      code_generation:
        max_tokens: 100000
        max_cost_usd: 3.0
      data_analysis:
        max_tokens: 80000
        max_cost_usd: 1.5

4. Per-tool differentiated budgets

Give cheap, fast tools generous limits while tightly constraining expensive ones:

pack:
  name: tool-budget-example-5
  version: 1.0.0
  enabled: true
policies:
  chain:
  - tool-budget
policy:
  tool-budget:
    budgets:
      calculator:
        max_tokens: 200000
      string_formatter:
        max_tokens: 200000
      web_search:
        max_tokens: 50000
        max_cost_usd: 0.5
      database_query:
        max_tokens: 40000
        max_cost_usd: 0.3
      code_generation:
        max_tokens: 50000
        max_cost_usd: 2.0
      image_generation:
        max_tokens: 10000
        max_cost_usd: 5.0

5. Agent loop prevention

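Give every tool in the agent's loop its own token budget so that no single tool can be called indefinitely. Budgets are tracked per tool rather than pooled, so the worst-case session total is the sum of the individual limits (75,000 tokens in this example):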

pack:
  name: tool-budget-example-6
  version: 1.0.0
  enabled: true
policies:
  chain:
  - tool-budget
policy:
  tool-budget:
    budgets:
      web_search:
        max_tokens: 20000
      fetch_page:
        max_tokens: 30000
      summarize:
        max_tokens: 15000
      plan_next_step:
        max_tokens: 10000

How It Works

Session tracking — The gateway maintains a per-session, per-tool counter for tokens consumed and USD spent. A session corresponds to a single top-level request or conversation turn, depending on the upstream provider's session semantics.
Pre-call check — Before forwarding a tool call to the model or external service, the gateway checks whether the tool has a budget entry. If it does, the gateway estimates the token count of the outgoing call arguments and verifies the cumulative total (existing usage + estimated new usage) against both max_tokens and max_cost_usd.
Blocking — If either limit would be exceeded, the gateway returns a policy-violation error to the model instead of executing the tool. The violation is recorded as a decision event with action blocked, reason tool_budget_exceeded, and metadata including the tool name, limit type (tokens or cost), current usage, and configured limit.
Post-call accounting — After a tool call completes, the gateway records the actual token count and computed cost against the session's running totals.
Session reset — Counters reset at the start of each new session. There is no cross-session accumulation.
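The steps above can be sketched as a small in-memory tracker. This is an illustration only, not the gateway's implementation: the class and method names, the event key names, and the exact comparison rule (block once usage has reached a limit, or when the estimated call would push usage past it) are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Budget:
    max_tokens: Optional[int] = None
    max_cost_usd: Optional[float] = None


@dataclass
class Usage:
    tokens: int = 0
    cost_usd: float = 0.0


@dataclass
class SessionBudgetTracker:
    """Hypothetical per-session tracker; create a new one for each session."""

    budgets: dict[str, Budget]
    usage: dict[str, Usage] = field(default_factory=dict)
    events: list[dict] = field(default_factory=list)

    def check(self, tool: str, est_tokens: int, est_cost_usd: float = 0.0) -> bool:
        """Pre-call check: True if the call may proceed, False if blocked."""
        budget = self.budgets.get(tool)
        if budget is None:
            return True  # Unlisted tools have no budget enforcement.
        used = self.usage.setdefault(tool, Usage())
        candidates = [
            ("tokens", used.tokens, est_tokens, budget.max_tokens),
            ("cost", used.cost_usd, est_cost_usd, budget.max_cost_usd),
        ]
        for limit_type, current, estimate, limit in candidates:
            if limit is None:
                continue  # This limit is not configured for the tool.
            # Assumed rule: usage already at the limit, or projected past it.
            if current >= limit or current + estimate > limit:
                self.events.append({
                    "action": "blocked",
                    "reason": "tool_budget_exceeded",
                    "tool": tool,
                    "limit_type": limit_type,
                    "current_usage": current,
                    "limit": limit,
                })
                return False
        return True

    def record(self, tool: str, actual_tokens: int, actual_cost_usd: float = 0.0) -> None:
        """Post-call accounting: add actual usage to the running totals."""
        used = self.usage.setdefault(tool, Usage())
        used.tokens += actual_tokens
        used.cost_usd += actual_cost_usd


tracker = SessionBudgetTracker({"web_search": Budget(max_tokens=100)})
first = tracker.check("web_search", est_tokens=60)    # allowed: 0 + 60 <= 100
tracker.record("web_search", actual_tokens=60)
second = tracker.check("web_search", est_tokens=60)   # blocked: 60 + 60 > 100
```

Creating a fresh tracker for each session mirrors the session reset step: no usage carries over between sessions.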

Combining With Other Policies

Combined with	Effect
`tool-validation`	Validate tool arguments against JSON Schema before budget accounting, so malformed calls don't consume budget.
`tool-security`	Run injection and traversal checks before budget checks, so attacks are caught without spending budget.
`agent-firewall`	The agent firewall provides broad intent-level blocking; tool-budget provides fine-grained resource limits per tool.
`rate-limiter`	Rate-limiter caps requests per time window; tool-budget caps cumulative resource consumption per session. Use both for layered protection.
`content-filter`	Content filtering applies to tool outputs. Budget limits apply regardless of content filtering results.

Recommended evaluation order: tool-security → tool-validation → tool-budget → tool execution → content-filter.
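To see why this order keeps rejected calls from consuming budget, consider a sketch of a short-circuiting chain. Everything here is hypothetical: `run_chain` and the three stand-in checks only imitate the ordering described above and are not the gateway's policy engine or the real policies' logic.

```python
from typing import Callable, Optional

# A check returns None to pass, or a short reason string to block.
Check = Callable[[dict], Optional[str]]


def run_chain(call: dict, checks: list[tuple[str, Check]]) -> Optional[tuple[str, str]]:
    """Run checks in order; return (policy, reason) for the first block, else None."""
    for name, check in checks:
        reason = check(call)
        if reason is not None:
            return name, reason  # Later checks never see this call.
    return None


budget_seen = []  # Records which calls reached the budget stage.


def tool_security(call: dict) -> Optional[str]:
    path = call["arguments"].get("path")
    return "path_traversal" if isinstance(path, str) and ".." in path else None


def tool_validation(call: dict) -> Optional[str]:
    return None if isinstance(call["arguments"].get("path"), str) else "schema_mismatch"


def tool_budget(call: dict) -> Optional[str]:
    budget_seen.append(call["tool"])  # Stand-in for budget accounting.
    return None


chain = [
    ("tool-security", tool_security),
    ("tool-validation", tool_validation),
    ("tool-budget", tool_budget),
]

blocked = run_chain({"tool": "fetch_page", "arguments": {"path": "../etc/passwd"}}, chain)
reached_after_blocked = list(budget_seen)
allowed = run_chain({"tool": "fetch_page", "arguments": {"path": "docs/index.html"}}, chain)
```

The first call is stopped at tool-security and never reaches the budget stage; only the second, well-formed call is counted.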

Best Practices

Start with token budgets. Token limits are easier to reason about than cost limits because they don't depend on per-model pricing. Add cost limits once you have baseline usage data.
Set budgets based on observed usage. Run your agent pipeline without budgets, collect token/cost metrics from events, then set budgets at 2–3× the 95th-percentile observed usage.
Budget every tool the agent can call. Unbudgeted tools have no enforcement. If a tool is in the agent's tool list, give it a budget.
Use tight budgets on recursive-capable tools. Tools like web_search, fetch_page, or plan_next_step are the most common sources of runaway loops. Keep their budgets conservative.
Combine with tool-validation. Schema validation rejects malformed calls before they consume budget, keeping your budget headroom for legitimate calls.
Monitor blocked events. A high rate of tool_budget_exceeded events indicates either a budget set too low or an agent that needs prompt engineering to reduce tool-call volume. Review event logs regularly.
Avoid setting max_cost_usd: 0. A zero cost budget effectively disables the tool. Use tool-validation with declared_tools to explicitly block tools instead.
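The "2–3× the 95th-percentile" guideline above can be turned into a starting value with a few lines of code. This is a rough sketch, assuming you have already exported per-session token totals for one tool from your event data; `suggest_budget` is a hypothetical helper, and the nearest-rank percentile method and the 2.5 default multiplier are choices made for the example.

```python
import math


def suggest_budget(observed_tokens: list[int], multiplier: float = 2.5) -> int:
    """Suggest max_tokens as multiplier x the 95th-percentile per-session usage."""
    if not observed_tokens:
        raise ValueError("need at least one observation")
    ordered = sorted(observed_tokens)
    n = len(ordered)
    # Nearest-rank percentile: smallest value with at least 95% of the
    # samples at or below it. Integer arithmetic avoids float rounding.
    rank = (95 * n + 99) // 100
    p95 = ordered[rank - 1]
    return math.ceil(p95 * multiplier)


# Twenty sessions using 1,000 to 20,000 tokens each for one tool.
observed = list(range(1000, 21000, 1000))
suggested = suggest_budget(observed)  # p95 is 19,000, so 19,000 x 2.5 = 47,500
```

Treat the result as a first setting to refine while monitoring tool_budget_exceeded events, not as a fixed rule.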

For AI systems

Canonical terms: Keeptrusts, tool-budget, budgets, max_tokens, max_cost_usd, per-session, tool call, runaway loop, resource exhaustion
Config/command names: tool-budget policy, budgets.<tool_name>.max_tokens, budgets.<tool_name>.max_cost_usd
Best next pages: Tool Validation, Tool Security, Config Rate Limits

For engineers

Prerequisites: Know the exact tool/function names as they appear in your LLM's tool-call payloads. Baseline usage data from monitoring to set appropriate limits.
Validation: Set a low budget, make repeated tool calls, and verify blocking when the budget is exceeded. Check tool_budget_exceeded events in the console or kt events tail.
Key commands: kt policy lint, kt gateway run, kt events tail

For leaders

Governance: Tool budgets prevent AI agents from consuming unbounded resources. They provide a hard ceiling on per-session spend — critical for cost governance in agentic AI deployments.
Cost: Each tool call's tokens and cost are tracked against the budget. Without budgets, a single agent loop can exhaust your entire provider quota. Set budgets at 2–3× observed 95th-percentile usage.
Rollout: Start with token budgets on recursive-capable tools (search, fetch, plan). Add cost budgets once you have baseline spend data. Monitor tool_budget_exceeded event rates to tune limits.

Next steps

Tool Validation — Schema-level tool access control
Tool Security — Injection protection for tool arguments
Config Rate Limits — Request-level rate limiting
Agent Firewall — Intent-level tool blocking
