Per-Agent Cache Policies
Not all agents should interact with the cache in the same way. A code explanation agent benefits from aggressive caching — the same question about the same code should return the same explanation. A code-writing agent, however, should generate fresh output every time to account for the full conversation context and avoid replaying stale suggestions.
Use this page when
- You need to configure different cache behavior per agent type (explanation vs code-writing vs security).
- You want to control which agents can use semantic replay and which always generate fresh responses.
- You are deciding max_staleness_hours and artifact_types access for each agent in your fleet.
Per-agent cache policies let you control precisely how each agent interacts with the cache.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Why Per-Agent Policies Matter
Different agent types have different accuracy and freshness requirements:
| Agent Type | Cache Benefit | Risk of Replay |
|---|---|---|
| Code explanation | High — deterministic answers | Low — explanations rarely go stale |
| Documentation generation | High — stable structure | Low — format is consistent |
| Code review | Medium — pattern detection | Medium — context may differ |
| Code writing | Low — highly contextual | High — may replay incorrect code |
| Debugging assistance | Low — unique per session | High — stale diagnosis is harmful |
| Security analysis | Medium — known pattern matching | Medium — new vulnerabilities emerge |
Configuring Agent Cache Policies
Define cache policies per agent in your declarative configuration:
agents:
  - name: code-explainer
    model: gpt-4o
    cache_policy:
      semantic_replay: true
      max_staleness_hours: 168
      artifact_types:
        - file_summary
        - symbol_index
        - dependency_graph
  - name: code-writer
    model: gpt-4o
    cache_policy:
      semantic_replay: false
      read_only: true
      artifact_types:
        - repo_map
        - dependency_graph
        - test_map
  - name: security-reviewer
    model: gpt-4o
    cache_policy:
      semantic_replay: true
      max_staleness_hours: 24
      artifact_types:
        - dependency_graph
        - api_inventory
        - symbol_index
Policy Fields
semantic_replay
Controls whether the agent can receive semantically equivalent cached responses for similar queries.
- true: The agent receives cached responses when a sufficiently similar query has been answered before for the same codebase state.
- false: The agent always generates a fresh response, even if an identical query was recently processed.
cache_policy:
  semantic_replay: true  # Allow cached response replay
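The replay rule described above can be sketched roughly as follows. This is an illustrative model only, not the gateway's implementation: the CachedResponse type, the token-overlap similarity function, and the 0.8 threshold are all assumptions made for the example, standing in for whatever semantic matching the product actually uses.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CachedResponse:
    query: str
    codebase_state: str  # hypothetical identifier, e.g. a commit hash
    response: str


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (a stand-in for semantic similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def find_replay(
    query: str,
    codebase_state: str,
    entries: Iterable[CachedResponse],
    semantic_replay: bool,
    threshold: float = 0.8,  # assumed value, not a documented default
) -> Optional[CachedResponse]:
    """Return the best matching cached response, or None to force fresh generation."""
    if not semantic_replay:
        return None  # replay disabled: always generate fresh output
    best, best_score = None, threshold
    for entry in entries:
        if entry.codebase_state != codebase_state:
            continue  # only replay answers produced for the same codebase state
        score = token_similarity(query, entry.query)
        if score >= best_score:
            best, best_score = entry, score
    return best
```

With semantic_replay set to false the function returns None even for an identical query, which mirrors the behavior described for the false setting.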
read_only
Controls whether the agent can read from the cache without contributing new entries.
- true: The agent reads cached artifacts (repo_map, dependency_graph, etc.) but its responses are not cached for replay.
- false: The agent both reads from and writes to the cache.
cache_policy:
  read_only: true  # Read cached context, but don't cache responses
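As a minimal sketch of what read_only implies on the write path, assuming a simple in-memory response store (the ResponseCache class and its method names are hypothetical, chosen only for illustration):

```python
class ResponseCache:
    """Toy in-memory store for agent responses, keyed by query text."""

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}

    def record(self, query: str, response: str, read_only: bool) -> bool:
        """Store a response unless the agent's policy is read-only.

        Returns True if the response was written to the cache.
        """
        if read_only:
            return False  # read-only agents never contribute entries
        self.entries[query] = response
        return True
```

A read-only agent can still look up entries written by other agents; only its own responses are kept out of the store.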
max_staleness_hours
The maximum age (in hours) of a cached entry that this agent will accept. Entries older than this threshold are treated as misses, even if they pass freshness signal checks.
cache_policy:
  max_staleness_hours: 24  # Only accept entries created in the last 24 hours
Set to 0 to accept entries of any age (respecting only code-aware freshness signals).
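The age check, including the special meaning of 0, can be expressed as a short sketch. The function name and the use of creation timestamps are assumptions for illustration; code-aware freshness signals are a separate check that is not modeled here.

```python
from datetime import datetime, timedelta, timezone


def within_staleness(created_at: datetime, now: datetime, max_staleness_hours: int) -> bool:
    """Return True if an entry's age is acceptable under max_staleness_hours.

    A value of 0 disables the age limit, so only the (unmodeled) code-aware
    freshness signals would decide whether the entry is usable.
    """
    if max_staleness_hours == 0:
        return True
    return now - created_at <= timedelta(hours=max_staleness_hours)


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
fresh = within_staleness(now - timedelta(hours=23), now, 24)    # inside the window
stale = within_staleness(now - timedelta(hours=25), now, 24)    # treated as a miss
any_age = within_staleness(now - timedelta(days=400), now, 0)   # 0 means no age limit
```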
artifact_types
Limits which cached artifact types this agent can access. Agents only receive cache entries matching the listed types.
cache_policy:
  artifact_types:
    - repo_map
    - dependency_graph
    # This agent cannot access file_summary or embedding_index
Omit this field to grant access to all artifact types.
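A rough sketch of the filtering this field implies is shown below. The artifact records are hypothetical dictionaries with a "type" key. Treating an empty list the same as an omitted field follows the `artifact_types: []` examples elsewhere on this page and should be read as an assumption about the actual behavior.

```python
from typing import Optional


def visible_artifacts(cached: list[dict], artifact_types: Optional[list[str]] = None) -> list[dict]:
    """Return the cached artifacts an agent may access under its artifact_types list."""
    if not artifact_types:
        # Field omitted (None) or empty list: all artifact types are available.
        return list(cached)
    allowed = set(artifact_types)
    return [artifact for artifact in cached if artifact["type"] in allowed]


cached = [
    {"type": "repo_map", "id": 1},
    {"type": "file_summary", "id": 2},
    {"type": "dependency_graph", "id": 3},
]
restricted = visible_artifacts(cached, ["repo_map", "dependency_graph"])
```

Here restricted contains the repo_map and dependency_graph entries only; the file_summary entry is filtered out.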
Common Agent Configurations
Aggressive Caching (Explanation Agents)
Explanation and documentation agents are expected to give the same answer for the same input, so replaying a cached answer loses little. Cache aggressively to reduce latency and LLM costs:
agents:
  - name: code-explainer
    cache_policy:
      semantic_replay: true
      read_only: false
      max_staleness_hours: 0  # Accept any age if code-fresh
      artifact_types: []  # All types available
No Replay (Code-Writing Agents)
Code-writing agents must generate fresh output to account for conversation context, recent edits, and user intent. Disable semantic replay entirely:
agents:
  - name: code-writer
    cache_policy:
      semantic_replay: false
      read_only: true
      max_staleness_hours: 0
      artifact_types:
        - repo_map
        - dependency_graph
        - test_map
        - symbol_index
The agent still reads structural cache entries (repo_map, dependency_graph) for context, but never receives replayed responses.
Balanced (Review Agents)
Code review agents benefit from pattern caching but need relatively fresh data to account for recent changes:
agents:
  - name: code-reviewer
    cache_policy:
      semantic_replay: true
      read_only: false
      max_staleness_hours: 48
      artifact_types:
        - repo_map
        - dependency_graph
        - api_inventory
        - file_summary
Strict Freshness (Security Agents)
Security-focused agents need the most current data to detect newly introduced vulnerabilities:
agents:
  - name: security-scanner
    cache_policy:
      semantic_replay: true
      read_only: false
      max_staleness_hours: 12
      artifact_types:
        - dependency_graph
        - api_inventory
        - symbol_index
Overriding Policies at Runtime
You can override cache policies for a specific request using gateway headers:
# Force fresh generation for this request (bypass semantic replay)
curl -H "X-Keeptrusts-Cache-Policy: no-replay" ...
# Force cache-only mode (fail if no cache hit)
curl -H "X-Keeptrusts-Cache-Policy: cache-only" ...
Runtime overrides take precedence over declarative agent configuration for the individual request.
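The precedence rule can be illustrated with a small sketch that layers a header value over an agent's declarative policy. The function name, the internal cache_only flag, and the choice to ignore unrecognized values are assumptions made for the example; only the two header values come from this page.

```python
from typing import Optional


def apply_override(policy: dict, header_value: Optional[str]) -> dict:
    """Return the effective policy for one request.

    header_value is the value of X-Keeptrusts-Cache-Policy, or None if absent.
    The declarative policy is copied, never mutated, because the override
    applies to the individual request only.
    """
    effective = dict(policy)
    effective["cache_only"] = False  # hypothetical flag: fail if no cache hit
    if header_value == "no-replay":
        effective["semantic_replay"] = False  # force fresh generation
    elif header_value == "cache-only":
        effective["cache_only"] = True
    return effective


declared = {"semantic_replay": True, "read_only": False, "max_staleness_hours": 48}
per_request = apply_override(declared, "no-replay")
```

After the call, per_request has semantic_replay set to False while declared is unchanged, so the next request without the header uses the configured policy again.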
Viewing Policy Effectiveness
Monitor per-agent cache behavior in the console under Settings → Engineering Cache → Agents:
- Hit rate per agent: Percentage of requests served from cache.
- Replay rate: Percentage of responses that were semantic replays.
- Freshness violations: Requests where staleness exceeded max_staleness_hours.
- Cost savings: Estimated LLM cost avoided through caching per agent.
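For readers who want to reproduce the first two percentages from their own request logs, a minimal sketch follows. The record shape (cache_hit and semantic_replay booleans per request) is hypothetical, and the console may define these metrics differently.

```python
def agent_cache_metrics(requests: list[dict]) -> dict:
    """Compute hit rate and replay rate as percentages of all requests."""
    total = len(requests)
    if total == 0:
        return {"hit_rate": 0.0, "replay_rate": 0.0}
    hits = sum(1 for r in requests if r["cache_hit"])
    replays = sum(1 for r in requests if r["semantic_replay"])
    return {
        "hit_rate": 100 * hits / total,
        "replay_rate": 100 * replays / total,
    }


log = [
    {"cache_hit": True, "semantic_replay": True},
    {"cache_hit": True, "semantic_replay": False},
    {"cache_hit": True, "semantic_replay": False},
    {"cache_hit": False, "semantic_replay": False},
]
metrics = agent_cache_metrics(log)  # hit_rate 75.0, replay_rate 25.0
```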
Inheritance and Defaults
If an agent does not specify a cache_policy, it inherits the organization default:
engineering_cache:
  default_agent_policy:
    semantic_replay: true
    read_only: false
    max_staleness_hours: 168
    artifact_types: []  # All types
You can set a restrictive organization default and then selectively relax it for specific agents that benefit from caching.
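The fallback can be sketched as below. The page states that an agent without a cache_policy inherits the default; it does not say whether a partially specified policy is merged field by field with the default, so this sketch deliberately models only the whole-policy fallback. The function and variable names are illustrative.

```python
DEFAULT_AGENT_POLICY = {
    "semantic_replay": True,
    "read_only": False,
    "max_staleness_hours": 168,
    "artifact_types": [],
}


def resolve_policy(agent: dict, default_policy: dict) -> dict:
    """Return the agent's own cache_policy, or a copy of the organization default."""
    policy = agent.get("cache_policy")
    if policy is None:
        return dict(default_policy)  # no policy declared: inherit the default
    return dict(policy)


inherited = resolve_policy({"name": "doc-generator"}, DEFAULT_AGENT_POLICY)
explicit = resolve_policy(
    {"name": "code-writer", "cache_policy": {"semantic_replay": False, "read_only": True}},
    DEFAULT_AGENT_POLICY,
)
```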
Next steps
- Cache Invalidation Strategies — Ensure stale entries are removed promptly.
- Environment-Specific Cache Configuration — Vary policies across environments.
- Configuring Cache TTL and Expiry — Control entry lifetime globally.
For AI systems
- Canonical terms: Keeptrusts, per-agent cache policy, semantic_replay, read_only, max_staleness_hours, artifact_types, agent configuration.
- Config keys: agents[].cache_policy.semantic_replay, agents[].cache_policy.read_only, agents[].cache_policy.max_staleness_hours, agents[].cache_policy.artifact_types, engineering_cache.default_agent_policy.
- Best next pages: Cache Invalidation Strategies, Controlling Semantic Replay by Scope, Setting Semantic Replay Thresholds.
For engineers
- Code-writing agents: set semantic_replay: false and read_only: true, so the agent reads structural context but never replays cached responses.
- Explanation agents: set semantic_replay: true, max_staleness_hours: 0 (code-aware only) for maximum cache reuse.
- Security agents: set max_staleness_hours: 12 for strict freshness on vulnerability-sensitive data.
- Runtime override headers: X-Keeptrusts-Cache-Policy: no-replay (force fresh) or cache-only (fail if no hit).
- Monitor per-agent effectiveness under Settings → Engineering Cache → Agents: hit rate, replay rate, freshness violations.
- Agents without explicit cache_policy inherit engineering_cache.default_agent_policy.
For leaders
- Per-agent policies prevent stale or incorrect cached code from being replayed to developers during active code-writing sessions.
- Different agent types have different accuracy/cost trade-offs — explanation agents cache aggressively (high savings), code-writing agents always go fresh (high accuracy).
- Security agents get strict freshness to detect newly introduced vulnerabilities without delay.
- The default policy sets a safe baseline; selective relaxation for high-cache-benefit agents maximizes ROI.