Per-Agent Cache Policies

Not all agents should interact with the cache in the same way. A code explanation agent benefits from aggressive caching: the same question about the same code should return the same explanation. A code-writing agent, however, should generate fresh output every time to account for the full conversation context and avoid replaying stale suggestions. Per-agent cache policies let you control this behavior precisely.

Use this page when

You need to configure different cache behavior per agent type (explanation vs. code-writing vs. security).
You want to control which agents can use semantic replay and which always generate fresh responses.
You are deciding max_staleness_hours and artifact_types access for each agent in your fleet.

Primary audience

Primary: AI Agents, Technical Engineers
Secondary: Technical Leaders

Why Per-Agent Policies Matter

Different agent types have different accuracy and freshness requirements:

| Agent Type | Cache Benefit | Risk of Replay |
| --- | --- | --- |
| Code explanation | High: deterministic answers | Low: explanations rarely go stale |
| Documentation generation | High: stable structure | Low: format is consistent |
| Code review | Medium: pattern detection | Medium: context may differ |
| Code writing | Low: highly contextual | High: may replay incorrect code |
| Debugging assistance | Low: unique per session | High: stale diagnosis is harmful |
| Security analysis | Medium: known pattern matching | Medium: new vulnerabilities emerge |

Configuring Agent Cache Policies

Define cache policies per agent in your declarative configuration:

agents:
  - name: code-explainer
    model: gpt-4o
    cache_policy:
      semantic_replay: true
      max_staleness_hours: 168
      artifact_types:
        - file_summary
        - symbol_index
        - dependency_graph

  - name: code-writer
    model: gpt-4o
    cache_policy:
      semantic_replay: false
      read_only: true
      artifact_types:
        - repo_map
        - dependency_graph
        - test_map

  - name: security-reviewer
    model: gpt-4o
    cache_policy:
      semantic_replay: true
      max_staleness_hours: 24
      artifact_types:
        - dependency_graph
        - api_inventory
        - symbol_index
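
Before deploying a configuration like the one above, it can help to lint each cache_policy mapping. The following is a minimal sketch of a hypothetical checker, not part of Keeptrusts. It assumes the YAML has already been parsed into Python dictionaries. The field names come from this page, but the validation rules (for example, that max_staleness_hours must be non-negative) are assumptions rather than documented behavior.

```python
# Hypothetical linter for cache_policy mappings (not a Keeptrusts API).
# Field names come from this page; the validation rules are assumptions.
EXPECTED_TYPES = {
    "semantic_replay": bool,
    "read_only": bool,
    "max_staleness_hours": int,
    "artifact_types": list,
}


def lint_cache_policy(agent_name: str, policy: dict) -> list[str]:
    """Return human-readable problems found in one agent's cache_policy."""
    problems = []
    for key, value in policy.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            problems.append(f"{agent_name}: unknown field '{key}'")
            continue
        # bool is a subclass of int in Python, so reject True/False as hours.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            problems.append(f"{agent_name}: '{key}' should be {expected.__name__}")
        elif key == "max_staleness_hours" and value < 0:
            problems.append(f"{agent_name}: 'max_staleness_hours' must be >= 0")
    return problems


code_writer = {
    "semantic_replay": False,
    "read_only": True,
    "artifact_types": ["repo_map", "dependency_graph", "test_map"],
}
print(lint_cache_policy("code-writer", code_writer))
print(lint_cache_policy("typo-agent", {"semantic_reply": True}))
```

A check like this catches misspelled keys such as semantic_reply before they reach the gateway.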

Policy Fields

`semantic_replay`

Controls whether the agent can receive semantically equivalent cached responses for similar queries.

true — The agent receives cached responses when a sufficiently similar query has been answered before for the same codebase state.
false — The agent always generates a fresh response, even if an identical query was recently processed.

cache_policy:
  semantic_replay: true  # Allow cached response replay
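
The true/false rule above can be pictured as a small decision function. This is only an illustrative sketch: the similarity score, the threshold, and the codebase-state identifiers are hypothetical stand-ins for whatever the gateway uses internally.

```python
def should_replay(
    semantic_replay: bool,
    similarity: float,
    threshold: float,
    entry_state: str,
    current_state: str,
) -> bool:
    """Decide whether a cached response may be replayed for a new query.

    similarity, threshold and the *_state identifiers are assumptions made
    for illustration; the real matching logic is not described on this page.
    """
    if not semantic_replay:
        # Replay disabled: always generate fresh, even for identical queries.
        return False
    # Replay only for sufficiently similar queries on the same codebase state.
    return similarity >= threshold and entry_state == current_state


# An explanation agent with replay enabled and a close match.
print(should_replay(True, 0.97, 0.90, "rev-a", "rev-a"))
# A code-writing agent with replay disabled, even for an identical query.
print(should_replay(False, 1.0, 0.90, "rev-a", "rev-a"))
```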

`read_only`

Controls whether the agent only reads from the cache or also contributes new entries to it.

true — The agent reads cached artifacts (repo_map, dependency_graph, etc.) but its responses are not cached for replay.
false — The agent both reads from and writes to the cache.

cache_policy:
  read_only: true  # Read cached context, but don't cache responses
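
Because read_only and semantic_replay are independent switches, it may help to see how they combine. The sketch below is a hypothetical summary helper, not a Keeptrusts API. It assumes every agent can read cached artifacts (subject to artifact_types) and that read_only affects only write-back of responses, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheAccess:
    reads_artifacts: bool    # may read cached artifacts such as repo_map
    receives_replays: bool   # may be served a cached response
    writes_responses: bool   # its own responses are stored for later replay


def cache_access(semantic_replay: bool, read_only: bool) -> CacheAccess:
    """Summarize what an agent may do with the cache (illustrative only)."""
    return CacheAccess(
        reads_artifacts=True,             # assumption: reading is always allowed
        receives_replays=semantic_replay,
        writes_responses=not read_only,
    )


# The code-writer policy from this page: no replay, read-only.
print(cache_access(semantic_replay=False, read_only=True))
```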

`max_staleness_hours`

The maximum age (in hours) of a cached entry that this agent will accept. Entries older than this threshold are treated as misses, even if they pass freshness signal checks.

cache_policy:
  max_staleness_hours: 24  # Only accept entries created in the last 24 hours

Set to 0 to disable the age limit: entries of any age are accepted as long as they pass the code-aware freshness signals.
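
The age check, including the special meaning of 0, can be sketched as follows. The function and parameter names are hypothetical. The only behavior taken from this page is that entries older than the threshold count as misses even when freshness signals pass, and that 0 means no age limit.

```python
from datetime import datetime, timedelta, timezone


def is_cache_hit(
    created_at: datetime,
    now: datetime,
    max_staleness_hours: int,
    passes_freshness_signals: bool,
) -> bool:
    """Return True if an entry is acceptable under the staleness policy."""
    if not passes_freshness_signals:
        # Code-aware freshness always applies, regardless of age.
        return False
    if max_staleness_hours == 0:
        # 0 disables the age limit.
        return True
    # Entries older than the threshold are treated as misses.
    return now - created_at <= timedelta(hours=max_staleness_hours)


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
two_days_old = now - timedelta(hours=48)
print(is_cache_hit(two_days_old, now, 24, True))   # too old for a 24-hour policy
print(is_cache_hit(two_days_old, now, 0, True))    # accepted: no age limit
```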

`artifact_types`

Limits which cached artifact types this agent can access. Agents only receive cache entries matching the listed types.

cache_policy:
  artifact_types:
    - repo_map
    - dependency_graph
    # This agent cannot access file_summary or embedding_index

Omit this field to grant access to all artifact types. The examples on this page also use an empty list (artifact_types: []) with the same meaning.
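
As a rough illustration of the filtering described above, the sketch below keeps only the cache entries whose type is listed. The entry shape (a dictionary with a "type" key) is hypothetical. An omitted or empty list is treated as "all types", matching how the examples on this page use artifact_types: [].

```python
from typing import Optional


def visible_entries(entries: list[dict], artifact_types: Optional[list[str]] = None) -> list[dict]:
    """Filter cache entries down to the artifact types an agent may access."""
    if not artifact_types:
        # Omitted (None) or empty list: no restriction.
        return list(entries)
    allowed = set(artifact_types)
    return [entry for entry in entries if entry["type"] in allowed]


# Hypothetical cache contents for illustration.
cache = [
    {"type": "repo_map", "key": "r1"},
    {"type": "file_summary", "key": "f1"},
    {"type": "dependency_graph", "key": "d1"},
]
print(visible_entries(cache, ["repo_map", "dependency_graph"]))
```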

Common Agent Configurations

Aggressive Caching (Explanation Agents)

Explanation and documentation agents are expected to give the same answer for the same input, so a cached response can be reused safely. Cache aggressively to reduce latency and LLM costs:

agents:
  - name: code-explainer
    cache_policy:
      semantic_replay: true
      read_only: false
      max_staleness_hours: 0  # Accept any age if code-fresh
      artifact_types: []  # All types available

No Replay (Code-Writing Agents)

Code-writing agents must generate fresh output to account for conversation context, recent edits, and user intent. Disable semantic replay entirely:

agents:
  - name: code-writer
    cache_policy:
      semantic_replay: false
      read_only: true
      max_staleness_hours: 0
      artifact_types:
        - repo_map
        - dependency_graph
        - test_map
        - symbol_index

The agent still reads structural cache entries (such as repo_map and dependency_graph) for context, but never receives replayed responses.

Balanced (Review Agents)

Code review agents benefit from pattern caching but need relatively fresh data to account for recent changes:

agents:
  - name: code-reviewer
    cache_policy:
      semantic_replay: true
      read_only: false
      max_staleness_hours: 48
      artifact_types:
        - repo_map
        - dependency_graph
        - api_inventory
        - file_summary

Strict Freshness (Security Agents)

Security-focused agents need the most current data to detect newly introduced vulnerabilities:

agents:
  - name: security-scanner
    cache_policy:
      semantic_replay: true
      read_only: false
      max_staleness_hours: 12
      artifact_types:
        - dependency_graph
        - api_inventory
        - symbol_index

Overriding Policies at Runtime

You can override cache policies for a specific request using gateway headers:

# Force fresh generation for this request (bypass semantic replay)
curl -H "X-Keeptrusts-Cache-Policy: no-replay" ...

# Force cache-only mode (fail if no cache hit)
curl -H "X-Keeptrusts-Cache-Policy: cache-only" ...

Runtime overrides take precedence over declarative agent configuration for the individual request.
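
The precedence rule can be sketched as a function that layers the header on top of the agent's declarative policy for a single request. This is an assumption-laden illustration: the fail_on_miss flag is a hypothetical name for cache-only behavior, and ignoring unrecognized header values is a guess rather than documented gateway behavior.

```python
OVERRIDE_HEADER = "x-keeptrusts-cache-policy"


def effective_policy(agent_policy: dict, headers: dict) -> dict:
    """Return the policy to apply to one request, without mutating the input."""
    policy = dict(agent_policy)
    policy.setdefault("fail_on_miss", False)  # hypothetical flag for cache-only mode
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    override = normalized.get(OVERRIDE_HEADER, "").strip()
    if override == "no-replay":
        policy["semantic_replay"] = False   # force fresh generation
    elif override == "cache-only":
        policy["fail_on_miss"] = True       # fail if there is no cache hit
    # Assumption: any other value leaves the declarative policy unchanged.
    return policy


declared = {"semantic_replay": True, "max_staleness_hours": 48}
print(effective_policy(declared, {"X-Keeptrusts-Cache-Policy": "no-replay"}))
print(declared)  # the declarative policy itself is untouched
```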

Viewing Policy Effectiveness

Monitor per-agent cache behavior in the console under Settings → Engineering Cache → Agents:

Hit rate per agent: Percentage of requests served from cache.
Replay rate: Percentage of responses that were semantic replays.
Freshness violations: Requests where the matching cached entry was older than max_staleness_hours.
Cost savings: Estimated LLM cost avoided through caching per agent.
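
To make the first two metrics concrete, here is a small sketch that derives them from per-request records. The record fields (cache_hit, semantic_replay) are hypothetical and do not reflect a documented export format. The percentages follow the definitions in the list above.

```python
def agent_cache_metrics(requests: list[dict]) -> dict:
    """Compute hit rate and replay rate (as percentages) for one agent."""
    total = len(requests)
    if total == 0:
        # Avoid division by zero for agents with no traffic.
        return {"hit_rate": 0.0, "replay_rate": 0.0}
    hits = sum(1 for r in requests if r["cache_hit"])
    replays = sum(1 for r in requests if r["semantic_replay"])
    return {
        "hit_rate": 100.0 * hits / total,
        "replay_rate": 100.0 * replays / total,
    }


# Hypothetical request log for a single agent.
log = [
    {"cache_hit": True, "semantic_replay": True},
    {"cache_hit": True, "semantic_replay": False},
    {"cache_hit": False, "semantic_replay": False},
    {"cache_hit": False, "semantic_replay": False},
]
print(agent_cache_metrics(log))
```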

Inheritance and Defaults

If an agent does not specify a cache_policy, it inherits the organization default:

engineering_cache:
  default_agent_policy:
    semantic_replay: true
    read_only: false
    max_staleness_hours: 168
    artifact_types: []  # All types

You can also set a restrictive organization default (for example, semantic_replay: false) and then selectively relax it for the specific agents that benefit most from caching.
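
The inheritance rule can be sketched as below. This page states only that an agent with no cache_policy inherits the organization default, so the sketch implements whole-policy inheritance. Whether a partially specified cache_policy is merged field by field with the default is not stated here, and the sketch does not assume it.

```python
# Mirrors engineering_cache.default_agent_policy as shown on this page.
DEFAULT_AGENT_POLICY = {
    "semantic_replay": True,
    "read_only": False,
    "max_staleness_hours": 168,
    "artifact_types": [],  # all types
}


def resolve_policy(agent: dict, default_policy: dict) -> dict:
    """Return the agent's own cache_policy, or a copy of the default if absent."""
    policy = agent.get("cache_policy")
    if policy is None:
        return dict(default_policy)
    # Assumption: an explicit policy is used as written, with no field-level merge.
    return dict(policy)


agents = [
    {"name": "doc-generator"},  # no cache_policy: inherits the default
    {"name": "code-writer", "cache_policy": {"semantic_replay": False, "read_only": True}},
]
for agent in agents:
    print(agent["name"], resolve_policy(agent, DEFAULT_AGENT_POLICY))
```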

Next steps

Cache Invalidation Strategies — Ensure stale entries are removed promptly.
Environment-Specific Cache Configuration — Vary policies across environments.
Configuring Cache TTL and Expiry — Control entry lifetime globally.

For AI systems

Canonical terms: Keeptrusts, per-agent cache policy, semantic_replay, read_only, max_staleness_hours, artifact_types, agent configuration.
Config keys: agents[].cache_policy.semantic_replay, agents[].cache_policy.read_only, agents[].cache_policy.max_staleness_hours, agents[].cache_policy.artifact_types, engineering_cache.default_agent_policy.
Best next pages: Cache Invalidation Strategies, Controlling Semantic Replay by Scope, Setting Semantic Replay Thresholds.

For engineers

Code-writing agents: set semantic_replay: false and read_only: true so the agent reads structural context but never receives replayed responses and never caches its own.
Explanation agents: set semantic_replay: true and max_staleness_hours: 0 (no age limit, code-aware freshness only) for maximum cache reuse.
Security agents: set max_staleness_hours: 12 for strict freshness on vulnerability-sensitive data.
Runtime override headers: X-Keeptrusts-Cache-Policy: no-replay (force fresh) or cache-only (fail if no hit).
Monitor per-agent effectiveness under Settings → Engineering Cache → Agents: hit rate, replay rate, freshness violations.
Agents without explicit cache_policy inherit engineering_cache.default_agent_policy.

For leaders

Per-agent policies prevent stale or incorrect cached code from being replayed to developers during active code-writing sessions.
Different agent types have different accuracy/cost trade-offs — explanation agents cache aggressively (high savings), code-writing agents always go fresh (high accuracy).
Security agents get strict freshness to detect newly introduced vulnerabilities without delay.
The default policy sets a safe baseline; selective relaxation for high-cache-benefit agents maximizes ROI.
