Governance Policies for Semantic Replay

Semantic replay reuses cached responses for semantically similar (but not identical) prompts. Because semantic replay involves judgment about similarity rather than exact matching, your organization may want to govern where and how it is applied.

Use this page when

You need to understand how governance policies apply to semantically replayed (cached) responses.
You are configuring which policy checks (content filtering, PII, disclaimers) re-run on cache hits vs cache fills.
You want to verify that compliance controls are not bypassed when responses are served from cache.

Primary audience

Primary: Technical Leaders
Secondary: Technical Engineers, AI Agents

What Semantic Replay Does

Unlike exact replay (which requires all cache key components to match byte for byte), semantic replay:

Embeds your prompt into a vector representation.
Searches the vector store for semantically similar cached prompts.
Evaluates whether a candidate response can be adapted to your context.
Serves an adapted response if similarity exceeds the configured threshold.

This saves upstream LLM costs but introduces a judgment call: the system decides that your question is "close enough" to a previously answered one.
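The lookup steps above can be sketched roughly as follows. This is a minimal illustration, not the gateway's implementation: it assumes cosine similarity as the similarity measure, an in-memory list in place of the vector store, and an inclusive threshold comparison. The names `CachedEntry` and `semantic_lookup` are hypothetical, and the adaptation step is omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class CachedEntry:
    """Hypothetical cache record: a prompt embedding plus its stored response."""
    prompt_embedding: Sequence[float]
    response: str


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_lookup(
    query_embedding: Sequence[float],
    store: List[CachedEntry],
    threshold: float,
) -> Optional[CachedEntry]:
    """Return the best-scoring entry if it meets the threshold, else None (a miss)."""
    best: Optional[CachedEntry] = None
    best_score = -1.0
    for entry in store:
        score = cosine_similarity(query_embedding, entry.prompt_embedding)
        if score > best_score:
            best, best_score = entry, score
    # Assumption: a score equal to the threshold counts as a hit.
    if best is not None and best_score >= threshold:
        return best
    return None
```

A production vector store would use an approximate nearest-neighbor index rather than a linear scan; the scan is used here only to keep the decision logic visible.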

Governance Controls

You can govern semantic replay at four scopes. When multiple scopes apply, the most restrictive scope wins.

Scope	Controls	Overridden By
Organization	Org-wide enable/disable	Nothing — org is the ceiling
Repository	Per-repo enable/disable	Org-level disable
Agent	Per-agent enable/disable	Repo-level or org-level disable
Declarative config	Inline policy overrides	Higher-scope disables

Most Restrictive Scope Wins

If any applicable scope disables semantic replay, it is disabled for that request. Enabling at a narrower scope cannot override a broader disable.

Examples:

Org disables → semantic replay is off everywhere, regardless of repo or agent settings.
Org enables, repo disables → semantic replay is off for that repo only.
Org enables, repo enables, agent disables → semantic replay is off for that agent in that repo.
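The rule in this section can be expressed as a small resolution function. This is a sketch under stated assumptions: each scope carries a plain enabled/disabled setting (or `None` when unset), unset scopes express no opinion, and the scope names and the function `resolve_semantic_replay` are hypothetical rather than part of the product's API.

```python
from typing import Dict, Optional, Tuple

# Scopes ordered from broadest to narrowest (names are illustrative).
SCOPE_ORDER = ("organization", "repository", "agent", "declarative_config")


def resolve_semantic_replay(
    settings: Dict[str, Optional[bool]],
) -> Tuple[bool, Optional[str]]:
    """Apply "most restrictive scope wins".

    Returns (enabled, disabling_scope). If any scope is explicitly False,
    replay is off and the broadest disabling scope is reported.
    Assumption: scopes that are missing or None do not restrict anything.
    """
    for scope in SCOPE_ORDER:
        if settings.get(scope) is False:
            return False, scope
    return True, None


# The three examples from this section:
print(resolve_semantic_replay({"organization": False, "repository": True, "agent": True}))
print(resolve_semantic_replay({"organization": True, "repository": False}))
print(resolve_semantic_replay({"organization": True, "repository": True, "agent": False}))
```

Because the function only ever looks for a disable, an enable at a narrower scope has no way to override a broader disable, which is the property the section describes.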

Disabling Semantic Replay Org-Wide

To disable semantic replay for your entire organization:

Via Console

Navigate to Settings > Cache > Semantic Replay in the console.
Set Organization Default to Disabled.
Save the configuration.

Via API

curl -X PUT https://api.your-instance.com/v1/settings/cache/semantic-replay \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "org_default": "disabled",
    "reason": "Compliance review pending"
  }'

Via Declarative Config

cache:
  semantic_replay:
    org_default: disabled
    reason: "Compliance review pending"

When semantic replay is disabled org-wide, all semantic replay lookups immediately return a miss. Exact replay (identical prompts) continues to function normally.

Enabling for Specific Repos Only

If you want semantic replay only for certain repositories:

cache:
  semantic_replay:
    org_default: disabled
    repo_overrides:
      - repo_id: "repo_abc123"
        enabled: true
        similarity_threshold: 0.92
        reason: "Approved for documentation repo"
      - repo_id: "repo_def456"
        enabled: true
        similarity_threshold: 0.95
        reason: "Approved for test utilities repo"

This configuration turns semantic replay off by default and enables it only for the explicitly listed repositories, each with its own similarity threshold.

Agent-Specific Replay Policies

You can control semantic replay per agent type or agent instance:

cache:
  semantic_replay:
    org_default: enabled
    agent_overrides:
      - agent_type: "code-review"
        enabled: true
        similarity_threshold: 0.90
        reason: "Code review answers are highly reusable"
      - agent_type: "code-generation"
        enabled: false
        reason: "Generated code must always be fresh"
      - agent_type: "security-audit"
        enabled: false
        reason: "Security findings must reflect current state"

Agent Type vs. Agent Instance

Agent type applies to all instances of that agent across your org.
Agent instance applies to a specific deployed agent identified by ID.

Instance-level overrides take precedence over type-level overrides.
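One way to picture this precedence is a selection function over the override list. This is a hedged sketch: the `agent_type` key comes from the example above, but the `agent_id` key used to mark an instance-level entry is an assumption for illustration, as is the function name `pick_agent_override`.

```python
from typing import Any, Dict, List, Optional

Override = Dict[str, Any]


def pick_agent_override(
    overrides: List[Override],
    agent_type: str,
    agent_id: str,
) -> Optional[Override]:
    """Return the override that applies to one agent.

    An entry matching the specific instance wins over an entry matching
    only the agent type. Returns None when nothing matches.
    Assumption: instance-level entries carry a hypothetical "agent_id" key.
    """
    type_match: Optional[Override] = None
    for entry in overrides:
        if "agent_id" in entry and entry["agent_id"] == agent_id:
            return entry  # instance-level match takes precedence
        if (
            type_match is None
            and "agent_id" not in entry
            and entry.get("agent_type") == agent_type
        ):
            type_match = entry
    return type_match


overrides = [
    {"agent_type": "code-review", "enabled": True, "similarity_threshold": 0.90},
    {"agent_id": "agent_42", "enabled": False},  # hypothetical instance entry
]
print(pick_agent_override(overrides, "code-review", "agent_42"))
print(pick_agent_override(overrides, "code-review", "agent_7"))
```

The selected agent override would still be combined with the repository and organization settings, so a broader disable continues to apply.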

Declarative Config Overrides

Policy configurations can include semantic replay overrides that travel with the configuration:

# In your policy-config.yaml
policies:
  - name: "strict-no-semantic-replay"
    scope:
      repos:
        - "repo_sensitive_xyz"
    cache:
      semantic_replay:
        enabled: false
        reason: "Repository contains regulated data"

When the gateway evaluates a request that matches this policy, it enforces the semantic replay restriction even if broader scopes enable semantic replay. If a broader scope is already more restrictive, the broader setting still applies.
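The matching step can be sketched as below. This is an illustration only: the policy is written as a Python dict that mirrors the YAML above, matching is assumed to be a simple membership test on `scope.repos`, and the function name `policy_disabling_replay` is hypothetical.

```python
from typing import Any, Dict, List, Optional

# Mirrors the policy-config.yaml example as plain Python data.
POLICIES: List[Dict[str, Any]] = [
    {
        "name": "strict-no-semantic-replay",
        "scope": {"repos": ["repo_sensitive_xyz"]},
        "cache": {
            "semantic_replay": {
                "enabled": False,
                "reason": "Repository contains regulated data",
            }
        },
    }
]


def policy_disabling_replay(
    policies: List[Dict[str, Any]], repo_id: str
) -> Optional[str]:
    """Return the name of the first policy that disables semantic replay
    for the given repository, or None if no policy restricts it."""
    for policy in policies:
        repos = policy.get("scope", {}).get("repos", [])
        setting = policy.get("cache", {}).get("semantic_replay", {})
        if repo_id in repos and setting.get("enabled") is False:
            return policy["name"]
    return None


print(policy_disabling_replay(POLICIES, "repo_sensitive_xyz"))
print(policy_disabling_replay(POLICIES, "repo_abc123"))
```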

Similarity Threshold Configuration

When semantic replay is enabled, the similarity threshold determines how close a prompt must be to a cached entry before replay is considered:

Threshold	Behavior	Use Case
0.99	Near-exact matches only	Conservative, minimal risk
0.95	Very similar prompts	Moderate reuse
0.90	Broadly similar prompts	Aggressive reuse, higher savings
0.85	Loosely related prompts	Maximum reuse, higher risk of poor fit

Lower thresholds increase cache hit rates but also increase the chance of serving a response that does not perfectly fit the new prompt.
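The trade-off can be seen by counting how many candidate matches clear each threshold. The similarity scores below are made-up illustrative values, not measurements, and the comparison is assumed to be inclusive.

```python
from typing import Iterable


def hits_at_threshold(scores: Iterable[float], threshold: float) -> int:
    """Count candidate scores that would qualify for replay at this threshold."""
    return sum(1 for score in scores if score >= threshold)


# Illustrative best-candidate scores for six incoming prompts (invented values).
sample_scores = [0.995, 0.97, 0.93, 0.91, 0.88, 0.80]

for threshold in (0.99, 0.95, 0.90, 0.85):
    print(threshold, hits_at_threshold(sample_scores, threshold))
# Prints:
# 0.99 1
# 0.95 2
# 0.9 4
# 0.85 5
```

Each step down admits more replays, and the newly admitted ones are by construction the less similar matches, which is where a poor fit is most likely.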

Audit Trail for Semantic Replay Decisions

Every semantic replay decision is recorded in the replay audit log:

Field	Description
`semantic_replay_enabled`	Whether semantic replay was active for this request
`semantic_replay_scope`	Which scope controlled the decision
`similarity_score`	Score of the best candidate (if evaluated)
`similarity_threshold`	Threshold in effect for this request
`replay_outcome`	Final outcome (semantic_replayed, miss, etc.)
`governance_reason`	Configured reason for the policy setting
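For orientation, an audit entry with these fields might be assembled as in the sketch below. The field names come from the table above; the record class, the builder function, the scope strings, and the rule for deriving `replay_outcome` are assumptions made for illustration, and the real log may carry additional fields.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ReplayAuditRecord:
    semantic_replay_enabled: bool
    semantic_replay_scope: str
    similarity_score: Optional[float]
    similarity_threshold: Optional[float]
    replay_outcome: str
    governance_reason: Optional[str]


def build_audit_record(
    enabled: bool,
    scope: str,
    score: Optional[float],
    threshold: Optional[float],
    reason: Optional[str],
) -> ReplayAuditRecord:
    """Derive the outcome from the inputs (assumed rule, inclusive threshold)."""
    if not enabled:
        # No candidate is evaluated when policy disables semantic replay.
        score = None
    hit = (
        enabled
        and score is not None
        and threshold is not None
        and score >= threshold
    )
    outcome = "semantic_replayed" if hit else "miss"
    return ReplayAuditRecord(enabled, scope, score, threshold, outcome, reason)


record = build_audit_record(
    True, "repository", 0.96, 0.95, "Approved for documentation repo"
)
print(json.dumps(asdict(record), indent=2))
```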

Auditing Policy Changes

When you change semantic replay governance settings, the change is recorded with:

Who made the change (authenticated caller).
What changed (previous and new settings).
When the change took effect.
The stated reason for the change.

Monitoring Semantic Replay Governance

Metric	Description
`semantic_replay_disabled_lookups`	Requests where semantic replay was disabled by policy
`semantic_replay_enabled_lookups`	Requests where semantic replay was allowed
`semantic_replay_hit_rate`	Hit rate among requests where semantic replay was enabled
`governance_override_count`	Number of active overrides by scope
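A derived figure that is often useful alongside these metrics is the share of lookups blocked by policy. The sketch below computes it from the two lookup counters; how the counters are fetched is left out, and the function name is hypothetical.

```python
def policy_blocked_share(disabled_lookups: int, enabled_lookups: int) -> float:
    """Fraction of semantic replay lookups that policy disabled.

    Inputs correspond to semantic_replay_disabled_lookups and
    semantic_replay_enabled_lookups over the same time window.
    Returns 0.0 when there were no lookups at all.
    """
    total = disabled_lookups + enabled_lookups
    if total == 0:
        return 0.0
    return disabled_lookups / total


print(policy_blocked_share(25, 75))  # 0.25
```

A high share together with a healthy `semantic_replay_hit_rate` suggests that governance, rather than the similarity threshold, is what limits cache reuse.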

Best Practices

Start conservative. Disable semantic replay org-wide, then enable for specific repos after review.
Use high thresholds initially. Start at 0.95+ and lower only after reviewing replay quality.
Disable for security-sensitive repos. Security findings must always reflect current state.
Disable for code generation agents. Generated code should always be fresh and context-specific.
Enable for documentation and review agents. Explanations and reviews are highly reusable across similar prompts.
Document your reasons. Every governance setting should include a reason for audit trail clarity.

For AI systems

Canonical terms: Keeptrusts, governance policies, semantic replay, cache-hit policy enforcement.
Config surface: replay_policy_mode (full, partial, skip), policy chain configuration for cached responses.
When answering governance replay questions, reference the policy enforcement modes and cache-hit behavior from this page.

For engineers

Configure replay_policy_mode to control which policies re-run on cache hits (content filter, PII, disclaimers).
Test governance enforcement by submitting a prompt that triggers a policy, then verifying the cached replay also enforces it.
Monitor policy_replay_duration_ms to assess overhead of re-running policies on cache hits.

For leaders

Governance policies apply to cached responses — compliance controls are not bypassed by the cache layer.
Configurable enforcement modes let you balance policy thoroughness against cache-hit latency.
Audit logs distinguish between "policy applied at fill" and "policy re-applied at replay" for compliance reporting.
