Per-Agent Cache Policies

Not all agents should interact with the cache in the same way. A code explanation agent benefits from aggressive caching: the same question about the same code should return the same explanation. A code-writing agent, however, should generate fresh output every time to account for the full conversation context and avoid replaying stale suggestions. Per-agent cache policies let you control this behavior precisely.

Use this page when

  • You need to configure different cache behavior per agent type (explanation vs code-writing vs security).
  • You want to control which agents can use semantic replay and which always generate fresh responses.
  • You are deciding max_staleness_hours and artifact_types access for each agent in your fleet.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Why Per-Agent Policies Matter

Different agent types have different accuracy and freshness requirements:

| Agent Type | Cache Benefit | Risk of Replay |
|---|---|---|
| Code explanation | High (deterministic answers) | Low (explanations rarely go stale) |
| Documentation generation | High (stable structure) | Low (format is consistent) |
| Code review | Medium (pattern detection) | Medium (context may differ) |
| Code writing | Low (highly contextual) | High (may replay incorrect code) |
| Debugging assistance | Low (unique per session) | High (stale diagnosis is harmful) |
| Security analysis | Medium (known pattern matching) | Medium (new vulnerabilities emerge) |

Configuring Agent Cache Policies

Define cache policies per agent in your declarative configuration:

agents:
  - name: code-explainer
    model: gpt-4o
    cache_policy:
      semantic_replay: true
      max_staleness_hours: 168
      artifact_types:
        - file_summary
        - symbol_index
        - dependency_graph

  - name: code-writer
    model: gpt-4o
    cache_policy:
      semantic_replay: false
      read_only: true
      artifact_types:
        - repo_map
        - dependency_graph
        - test_map

  - name: security-reviewer
    model: gpt-4o
    cache_policy:
      semantic_replay: true
      max_staleness_hours: 24
      artifact_types:
        - dependency_graph
        - api_inventory
        - symbol_index
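
Before deploying a configuration like the one above, it can help to sanity-check each cache_policy. The following is a minimal sketch, not the product's validator: the function name validate_cache_policy is hypothetical, and the set of artifact types is simply the types named on this page, which may be incomplete.

```python
from typing import List

# Assumption: only the artifact types mentioned on this page; the real list may be longer.
KNOWN_ARTIFACT_TYPES = {
    "repo_map", "dependency_graph", "test_map", "symbol_index",
    "file_summary", "api_inventory", "embedding_index",
}


def validate_cache_policy(policy: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for field in ("semantic_replay", "read_only"):
        if field in policy and not isinstance(policy[field], bool):
            problems.append(f"{field} must be true or false")
    hours = policy.get("max_staleness_hours", 0)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(hours, bool) or not isinstance(hours, (int, float)) or hours < 0:
        problems.append("max_staleness_hours must be a non-negative number")
    for artifact_type in policy.get("artifact_types") or []:
        if artifact_type not in KNOWN_ARTIFACT_TYPES:
            problems.append(f"unknown artifact type: {artifact_type}")
    return problems
```

Running a check like this in CI catches typos such as a misspelled artifact type before the policy reaches the gateway.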

Policy Fields

semantic_replay

Controls whether the agent can receive semantically equivalent cached responses for similar queries.

  • true — The agent receives cached responses when a sufficiently similar query has been answered before for the same codebase state.
  • false — The agent always generates a fresh response, even if an identical query was recently processed.
cache_policy:
  semantic_replay: true # Allow cached response replay
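
To make the two settings concrete, here is a toy sketch of a replay lookup. Everything in it is an assumption made for illustration: the CachedResponse type, the word-overlap similarity, the 0.8 threshold, and the use of a single string as the codebase state. The actual similarity measure and freshness checks are not described on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CachedResponse:
    query: str
    codebase_state: str  # hypothetical identifier, for example a commit hash
    response: str


def similarity(a: str, b: str) -> float:
    """Toy word-overlap (Jaccard) score; a real system would likely use embeddings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if (wa | wb) else 1.0


def lookup_replay(query: str, codebase_state: str, cache: List[CachedResponse],
                  semantic_replay: bool, threshold: float = 0.8) -> Optional[str]:
    """Return a replayable response, or None when the agent must generate fresh output."""
    if not semantic_replay:
        return None  # replay disabled: never serve a cached response
    best, best_score = None, threshold
    for entry in cache:
        if entry.codebase_state != codebase_state:
            continue  # only replay answers produced for the same codebase state
        score = similarity(query, entry.query)
        if score >= best_score:
            best, best_score = entry, score
    return best.response if best else None
```

With semantic_replay set to false the lookup returns None even for an identical query, which matches the behavior described for the false setting.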

read_only

Controls whether the agent can read from the cache without contributing new entries.

  • true — The agent reads cached artifacts (repo_map, dependency_graph, etc.) but its responses are not cached for replay.
  • false — The agent both reads from and writes to the cache.
cache_policy:
  read_only: true # Read cached context, but don't cache responses
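
One way to picture read_only is as a wrapper around a shared store that permits reads for every agent but drops writes for read-only agents. This is a conceptual sketch only; AgentCacheView and its methods are hypothetical names, not part of the product.

```python
class AgentCacheView:
    """Per-agent view over a shared cache store (illustrative only)."""

    def __init__(self, store: dict, read_only: bool):
        self._store = store
        self.read_only = read_only

    def get(self, key):
        # Reads are always allowed, regardless of read_only.
        return self._store.get(key)

    def put(self, key, value) -> bool:
        # Read-only agents never contribute new entries.
        if self.read_only:
            return False
        self._store[key] = value
        return True
```

For example, a code-writer view created with read_only=True can still fetch repo_map from the shared store, while its put calls return False and leave the store unchanged.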

max_staleness_hours

The maximum age (in hours) of a cached entry that this agent will accept. Entries older than this threshold are treated as misses, even if they pass freshness signal checks.

cache_policy:
  max_staleness_hours: 24 # Only accept entries created in the last 24 hours

Set to 0 to accept entries of any age (respecting only code-aware freshness signals).
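
The age rule, including the special meaning of 0, can be sketched as a small predicate. The function name within_staleness is hypothetical, and the sketch covers only the age check; the code-aware freshness signals mentioned above would be evaluated separately.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def within_staleness(created_at: datetime, max_staleness_hours: float,
                     now: Optional[datetime] = None) -> bool:
    """Return True if an entry's age is acceptable under max_staleness_hours."""
    if max_staleness_hours == 0:
        return True  # 0 disables the age limit entirely
    now = now or datetime.now(timezone.utc)
    # Entries strictly older than the threshold are treated as misses.
    return now - created_at <= timedelta(hours=max_staleness_hours)
```

An entry created 30 hours ago is therefore a miss for an agent with max_staleness_hours: 24, but acceptable for an agent configured with 48 or 0.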

artifact_types

Limits which cached artifact types this agent can access. Agents only receive cache entries matching the listed types.

cache_policy:
  artifact_types:
    - repo_map
    - dependency_graph
    # This agent cannot access file_summary or embedding_index

Omit this field to grant access to all artifact types. The examples on this page also use an empty list (artifact_types: []) to mean all types.
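
The filtering rule can be sketched as follows. The function name visible_artifacts is hypothetical, and treating an empty list the same as an omitted field is an assumption based on the "All types" comments in this page's examples.

```python
from typing import Iterable, Optional


def visible_artifacts(entries: dict, artifact_types: Optional[Iterable[str]]) -> dict:
    """Return only the cached artifacts an agent is allowed to see."""
    # Assumption: None (field omitted) and [] both mean "all types".
    if not artifact_types:
        return dict(entries)
    allowed = set(artifact_types)
    return {name: value for name, value in entries.items() if name in allowed}
```

Given entries for repo_map, dependency_graph, and file_summary, an agent restricted to repo_map and dependency_graph would not see file_summary.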

Common Agent Configurations

Aggressive Caching (Explanation Agents)

Explanation and documentation agents are expected to return the same answer for the same input, so a cached response is as good as a fresh one. Cache aggressively to reduce latency and LLM costs:

agents:
  - name: code-explainer
    cache_policy:
      semantic_replay: true
      read_only: false
      max_staleness_hours: 0 # Accept any age if code-fresh
      artifact_types: [] # All types available

No Replay (Code-Writing Agents)

Code-writing agents must generate fresh output to account for conversation context, recent edits, and user intent. Disable semantic replay entirely:

agents:
  - name: code-writer
    cache_policy:
      semantic_replay: false
      read_only: true
      max_staleness_hours: 0
      artifact_types:
        - repo_map
        - dependency_graph
        - test_map
        - symbol_index

The agent still reads structural cache entries (repo_map, dependency_graph) for context, but never receives replayed responses.

Balanced (Review Agents)

Code review agents benefit from pattern caching but need relatively fresh data to account for recent changes:

agents:
  - name: code-reviewer
    cache_policy:
      semantic_replay: true
      read_only: false
      max_staleness_hours: 48
      artifact_types:
        - repo_map
        - dependency_graph
        - api_inventory
        - file_summary

Strict Freshness (Security Agents)

Security-focused agents need the most current data to detect newly introduced vulnerabilities:

agents:
  - name: security-scanner
    cache_policy:
      semantic_replay: true
      read_only: false
      max_staleness_hours: 12
      artifact_types:
        - dependency_graph
        - api_inventory
        - symbol_index

Overriding Policies at Runtime

You can override the cache policy for a specific request using the X-Keeptrusts-Cache-Policy gateway header:

# Force fresh generation for this request (bypass semantic replay)
curl -H "X-Keeptrusts-Cache-Policy: no-replay" ...

# Force cache-only mode (fail if no cache hit)
curl -H "X-Keeptrusts-Cache-Policy: cache-only" ...

Runtime overrides take precedence over declarative agent configuration for the individual request.
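
The precedence rule can be sketched as a small resolver. This is an illustration under stated assumptions, not the gateway's implementation: RequestCacheMode and resolve_request_mode are hypothetical names, unknown header values are assumed to be ignored, and cache-only is assumed to allow replay regardless of the agent's configured setting. Only the header name and its two values come from this page.

```python
from dataclasses import dataclass
from typing import Mapping

HEADER = "x-keeptrusts-cache-policy"  # HTTP header names are case-insensitive


@dataclass(frozen=True)
class RequestCacheMode:
    semantic_replay: bool  # may this request be served from a replayed response?
    cache_only: bool       # should this request fail when there is no cache hit?


def resolve_request_mode(agent_policy: Mapping, headers: Mapping[str, str]) -> RequestCacheMode:
    """Combine the declarative agent policy with a per-request header override."""
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    override = normalized.get(HEADER)
    if override == "no-replay":
        return RequestCacheMode(semantic_replay=False, cache_only=False)
    if override == "cache-only":
        # Assumption: cache-only implies replay is permitted for this request.
        return RequestCacheMode(semantic_replay=True, cache_only=True)
    # No (or unrecognized) override: fall back to the agent's configured policy.
    return RequestCacheMode(bool(agent_policy.get("semantic_replay", False)), False)
```

The override applies only to the request that carries the header; later requests without it fall back to the agent's declarative policy.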

Viewing Policy Effectiveness

Monitor per-agent cache behavior in the console under Settings → Engineering Cache → Agents:

  • Hit rate per agent: Percentage of requests served from cache.
  • Replay rate: Percentage of responses that were semantic replays.
  • Freshness violations: Requests where staleness exceeded max_staleness_hours.
  • Cost savings: Estimated LLM cost avoided through caching per agent.
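
If you export request logs, the first two metrics can be approximated offline. The sketch below assumes a hypothetical record shape with agent, cache_hit, and semantic_replay fields; the console's actual data model and exact definitions are not described on this page.

```python
from collections import defaultdict
from typing import Dict, Iterable


def per_agent_rates(records: Iterable[dict]) -> Dict[str, dict]:
    """Compute hit rate and replay rate (as percentages) per agent."""
    totals = defaultdict(lambda: {"requests": 0, "hits": 0, "replays": 0})
    for record in records:
        t = totals[record["agent"]]
        t["requests"] += 1
        t["hits"] += bool(record["cache_hit"])
        t["replays"] += bool(record["semantic_replay"])
    return {
        agent: {
            "hit_rate": 100 * t["hits"] / t["requests"],
            "replay_rate": 100 * t["replays"] / t["requests"],
        }
        for agent, t in totals.items()
    }
```

A code-writing agent configured with semantic_replay: false should show a replay rate of zero in a report like this; anything else suggests a misapplied policy.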

Inheritance and Defaults

If an agent does not specify a cache_policy, it inherits the organization default:

engineering_cache:
  default_agent_policy:
    semantic_replay: true
    read_only: false
    max_staleness_hours: 168
    artifact_types: [] # All types

The default shown above is permissive. Alternatively, you can set a restrictive organization default (for example, semantic_replay: false) and then selectively relax it for specific agents that benefit from caching.
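
The inheritance rule can be sketched as a whole-policy fallback. The function name resolve_policy is hypothetical, and the sketch assumes an agent's cache_policy replaces the default entirely; this page does not say whether individual missing fields are merged from the default.

```python
def resolve_policy(agent: dict, default_policy: dict) -> dict:
    """Return the agent's own cache_policy, or a copy of the organization default."""
    policy = agent.get("cache_policy")
    # Assumption: no field-by-field merge; the default applies only when
    # the agent has no cache_policy at all.
    return dict(default_policy) if policy is None else dict(policy)
```

Under this reading, an agent that sets only semantic_replay: false would not automatically pick up the default max_staleness_hours, so it is safest to spell out every field you rely on.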

Next steps

For engineers

  • Code-writing agents: set semantic_replay: false and read_only: true — reads structural context but never replays cached responses.
  • Explanation agents: set semantic_replay: true, max_staleness_hours: 0 (code-aware only) for maximum cache reuse.
  • Security agents: set max_staleness_hours: 12 for strict freshness on vulnerability-sensitive data.
  • Runtime override headers: X-Keeptrusts-Cache-Policy: no-replay (force fresh) or cache-only (fail if no hit).
  • Monitor per-agent effectiveness under Settings → Engineering Cache → Agents: hit rate, replay rate, freshness violations.
  • Agents without explicit cache_policy inherit engineering_cache.default_agent_policy.

For leaders

  • Per-agent policies prevent stale or incorrect cached code from being replayed to developers during active code-writing sessions.
  • Different agent types have different accuracy/cost trade-offs — explanation agents cache aggressively (high savings), code-writing agents always go fresh (high accuracy).
  • Security agents get strict freshness to detect newly introduced vulnerabilities without delay.
  • The default policy sets a safe baseline; selective relaxation for high-cache-benefit agents maximizes ROI.