Governance Policies for Semantic Replay
Semantic replay reuses cached responses for semantically similar (but not identical) prompts. Because semantic replay involves judgment about similarity rather than exact matching, your organization may want to govern where and how it is applied.
Use this page when
- You need to understand how governance policies apply to semantically replayed (cached) responses.
- You are configuring which policy checks (content filtering, PII, disclaimers) re-run on cache hits vs cache fills.
- You want to verify that compliance controls are not bypassed when responses are served from cache.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
What Semantic Replay Does
Unlike exact replay (which requires all cache key components to match byte for byte), semantic replay:
- Embeds your prompt into a vector representation.
- Searches the vector store for semantically similar cached prompts.
- Evaluates whether a candidate response can be adapted to your context.
- Serves an adapted response if similarity exceeds the configured threshold.
This saves upstream LLM costs but introduces a judgment call: the system decides that your question is "close enough" to a previously answered one.
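The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the gateway's implementation: the toy embed function, the in-memory store, and the name semantic_lookup are all hypothetical, and the adaptation step is left out (the cached response is returned unchanged).

```python
import math

def embed(text):
    """Toy embedding (assumption): word counts over a small fixed vocabulary."""
    vocab = ["reset", "password", "rotate", "api", "key", "how"]
    words = text.lower().replace("?", "").split()
    return [float(words.count(term)) for term in vocab]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_lookup(prompt, store, threshold):
    """Return (response, score) on a hit, or (None, score) on a miss."""
    query = embed(prompt)
    best_response, best_score = None, 0.0
    for cached_prompt, response in store:
        score = cosine(query, embed(cached_prompt))
        if score > best_score:
            best_response, best_score = response, score
    # "Exceeds the threshold" is read here as strictly greater than.
    if best_score > threshold:
        return best_response, best_score
    return None, best_score

store = [("How do I reset my password?", "Use the account settings page.")]
hit, hit_score = semantic_lookup("how do i reset password", store, threshold=0.90)
miss, miss_score = semantic_lookup("how do i rotate api key", store, threshold=0.90)
```

The first lookup is served from the store because its wording overlaps almost entirely with the cached prompt; the second shares only one word and falls below the threshold, so it would go upstream.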
Governance Controls
You can govern semantic replay at four scopes. When multiple scopes apply, the most restrictive scope wins.
| Scope | Controls | Overridden By |
|---|---|---|
| Organization | Org-wide enable/disable | Nothing — org is the ceiling |
| Repository | Per-repo enable/disable | Org-level disable |
| Agent | Per-agent enable/disable | Repo-level or org-level disable |
| Declarative config | Inline policy overrides | Higher-scope disables |
Most Restrictive Scope Wins
If any applicable scope disables semantic replay, it is disabled for that request. Enabling at a narrower scope cannot override a broader disable.
Examples:
- Org disables → semantic replay is off everywhere, regardless of repo or agent settings.
- Org enables, repo disables → semantic replay is off for that repo only.
- Org enables, repo enables, agent disables → semantic replay is off for that agent in that repo.
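The rule can be expressed as a short resolution function. This is a sketch only: the function name, the scope labels, and the use of None for "no setting at this scope" are assumptions made for illustration.

```python
def effective_semantic_replay(org, repo=None, agent=None, policy=None):
    """Return (enabled, disabling_scope).

    Each argument is True, False, or None (no setting at that scope).
    Any explicit False disables semantic replay for the request.
    """
    scopes = [
        ("organization", org),
        ("repository", repo),
        ("agent", agent),
        ("declarative_config", policy),
    ]
    for name, setting in scopes:
        if setting is False:
            return False, name  # broadest disabling scope is reported
    return True, None

# The three examples above:
org_off = effective_semantic_replay(org=False, repo=True, agent=True)
repo_off = effective_semantic_replay(org=True, repo=False)
agent_off = effective_semantic_replay(org=True, repo=True, agent=False)
all_on = effective_semantic_replay(org=True, repo=True, agent=True)
```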
Disabling Semantic Replay Org-Wide
To disable semantic replay for your entire organization:
Via Console
- Navigate to Settings > Cache > Semantic Replay in the console.
- Set Organization Default to Disabled.
- Save the configuration.
Via API
curl -X PUT https://api.your-instance.com/v1/settings/cache/semantic-replay \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "org_default": "disabled",
    "reason": "Compliance review pending"
  }'
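The same request can be built from a script. The sketch below uses only the Python standard library and mirrors the curl call above; the helper name build_disable_request is hypothetical, and the request is constructed but not sent.

```python
import json
import os
import urllib.request

def build_disable_request(base_url, token, reason):
    """Build (but do not send) the PUT request that disables semantic replay org-wide."""
    body = json.dumps({"org_default": "disabled", "reason": reason}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/settings/cache/semantic-replay",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

request = build_disable_request(
    "https://api.your-instance.com",
    os.environ.get("TOKEN", ""),
    "Compliance review pending",
)
# To send it: urllib.request.urlopen(request)
```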
Via Declarative Config
cache:
  semantic_replay:
    org_default: disabled
    reason: "Compliance review pending"
When semantic replay is disabled org-wide, all semantic replay lookups immediately return a miss. Exact replay (identical prompts) continues to function normally.
Enabling for Specific Repos Only
If you want semantic replay only for certain repositories:
cache:
  semantic_replay:
    org_default: disabled
    repo_overrides:
      - repo_id: "repo_abc123"
        enabled: true
        similarity_threshold: 0.92
        reason: "Approved for documentation repo"
      - repo_id: "repo_def456"
        enabled: true
        similarity_threshold: 0.95
        reason: "Approved for test utilities repo"
This configuration disables semantic replay everywhere except the explicitly listed repositories.
Agent-Specific Replay Policies
You can control semantic replay per agent type or agent instance:
cache:
  semantic_replay:
    org_default: enabled
    agent_overrides:
      - agent_type: "code-review"
        enabled: true
        similarity_threshold: 0.90
        reason: "Code review answers are highly reusable"
      - agent_type: "code-generation"
        enabled: false
        reason: "Generated code must always be fresh"
      - agent_type: "security-audit"
        enabled: false
        reason: "Security findings must reflect current state"
Agent Type vs. Agent Instance
- Agent type applies to all instances of that agent across your org.
- Agent instance applies to a specific deployed agent identified by ID.
Instance-level overrides take precedence over type-level overrides.
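A lookup that honors this precedence might look like the sketch below. The agent_id key and the instance ID "agent_42" are assumptions: the configuration shown on this page only includes agent_type entries, so the real key for instance-level overrides may differ.

```python
def agent_override(overrides, agent_type, agent_id=None):
    """Pick the override for an agent: an instance match beats a type match."""
    if agent_id is not None:
        for entry in overrides:
            if entry.get("agent_id") == agent_id:  # hypothetical key
                return entry
    for entry in overrides:
        if entry.get("agent_type") == agent_type and "agent_id" not in entry:
            return entry
    return None  # no override; fall back to broader settings

overrides = [
    {"agent_type": "code-review", "enabled": True, "similarity_threshold": 0.90},
    {"agent_id": "agent_42", "agent_type": "code-review",
     "enabled": True, "similarity_threshold": 0.97},
]

for_instance = agent_override(overrides, "code-review", agent_id="agent_42")
for_type = agent_override(overrides, "code-review", agent_id="agent_7")
for_unknown = agent_override(overrides, "documentation")
```

Here the deployed agent "agent_42" gets the stricter 0.97 threshold, while every other code-review agent keeps the type-level 0.90.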
Declarative Config Overrides
Policy configurations can include semantic replay overrides that travel with the configuration:
# In your policy-config.yaml
policies:
  - name: "strict-no-semantic-replay"
    scope:
      repos:
        - "repo_sensitive_xyz"
    cache:
      semantic_replay:
        enabled: false
        reason: "Repository contains regulated data"
When the gateway evaluates a request matching this policy, it disables semantic replay for that request even if broader scopes enable it. If a broader scope already disables semantic replay, the policy changes nothing.
Similarity Threshold Configuration
When semantic replay is enabled, you can configure the similarity threshold that determines how close a prompt must be to a cached entry for replay to be considered:
| Threshold | Behavior | Use Case |
|---|---|---|
| 0.99 | Near-exact matches only | Conservative, minimal risk |
| 0.95 | Very similar prompts | Moderate reuse |
| 0.90 | Broadly similar prompts | Aggressive reuse, higher savings |
| 0.85 | Loosely related prompts | Maximum reuse, higher risk of poor fit |
Lower thresholds increase cache hit rates but also increase the chance of serving a response that does not perfectly fit the new prompt.
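To see the trade-off in numbers, the sketch below applies the four thresholds from the table to a set of best-candidate similarity scores. The scores are made-up sample data, and treating a hit as a score strictly above the threshold is an assumption.

```python
# Made-up best-candidate scores for six incoming prompts (illustrative only).
best_scores = [0.995, 0.97, 0.93, 0.91, 0.87, 0.80]

def hit_rate(scores, threshold):
    """Fraction of prompts whose best candidate exceeds the threshold."""
    hits = sum(1 for score in scores if score > threshold)
    return hits / len(scores)

rates = {t: hit_rate(best_scores, t) for t in (0.99, 0.95, 0.90, 0.85)}
for threshold, rate in rates.items():
    print(f"threshold {threshold:.2f}: {rate:.0%} of lookups served from cache")
```

With this sample, moving from 0.99 to 0.85 raises the share of cached answers from one prompt in six to five in six, and the extra hits are exactly the prompts with the weakest match.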
Audit Trail for Semantic Replay Decisions
Every semantic replay decision is recorded in the replay audit log:
| Field | Description |
|---|---|
| semantic_replay_enabled | Whether semantic replay was active for this request |
| semantic_replay_scope | Which scope controlled the decision |
| similarity_score | Score of the best candidate (if evaluated) |
| similarity_threshold | Threshold in effect for this request |
| replay_outcome | Final outcome (semantic_replayed, miss, etc.) |
| governance_reason | Configured reason for the policy setting |
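As an illustration of the record shape, the sketch below models the six fields as a dataclass and serializes one entry to JSON. The field types, the class name, the scope value "repository", and the JSON encoding are assumptions; only the field names and the outcome values semantic_replayed and miss come from this page.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class SemanticReplayAuditRecord:
    semantic_replay_enabled: bool
    semantic_replay_scope: str             # scope that controlled the decision
    similarity_score: Optional[float]      # None when no candidate was evaluated
    similarity_threshold: Optional[float]  # None when replay was disabled (assumption)
    replay_outcome: str                    # e.g. "semantic_replayed" or "miss"
    governance_reason: str

# A request blocked by a repository-level disable: no candidate is scored.
record = SemanticReplayAuditRecord(
    semantic_replay_enabled=False,
    semantic_replay_scope="repository",
    similarity_score=None,
    similarity_threshold=None,
    replay_outcome="miss",
    governance_reason="Repository contains regulated data",
)
line = json.dumps(asdict(record))
```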
Auditing Policy Changes
When you change semantic replay governance settings, the change is recorded with:
- Who made the change (authenticated caller).
- What changed (previous and new settings).
- When the change took effect.
- The stated reason for the change.
Monitoring Semantic Replay Governance
| Metric | Description |
|---|---|
| semantic_replay_disabled_lookups | Requests where semantic replay was disabled by policy |
| semantic_replay_enabled_lookups | Requests where semantic replay was allowed |
| semantic_replay_hit_rate | Hit rate among requests where semantic replay was enabled |
| governance_override_count | Number of active overrides by scope |
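The first three metrics relate to each other as shown in the sketch below, which derives them from a list of audit records. This is only an illustration of the definitions: the summarize helper is hypothetical, the records are made-up, and the gateway may compute these metrics differently.

```python
def summarize(records):
    """Derive the lookup counts and hit rate from audit-style records."""
    enabled = [r for r in records if r["semantic_replay_enabled"]]
    hits = sum(1 for r in enabled if r["replay_outcome"] == "semantic_replayed")
    return {
        "semantic_replay_disabled_lookups": len(records) - len(enabled),
        "semantic_replay_enabled_lookups": len(enabled),
        # Hit rate is measured only over requests where replay was allowed.
        "semantic_replay_hit_rate": hits / len(enabled) if enabled else 0.0,
    }

records = [
    {"semantic_replay_enabled": True, "replay_outcome": "semantic_replayed"},
    {"semantic_replay_enabled": True, "replay_outcome": "semantic_replayed"},
    {"semantic_replay_enabled": True, "replay_outcome": "miss"},
    {"semantic_replay_enabled": False, "replay_outcome": "miss"},
]
metrics = summarize(records)
```

Because the hit rate excludes policy-disabled lookups, tightening governance lowers semantic_replay_enabled_lookups without necessarily lowering the hit rate.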
Best Practices
- Start conservative. Disable semantic replay org-wide, then enable for specific repos after review.
- Use high thresholds initially. Start at 0.95+ and lower only after reviewing replay quality.
- Disable for security-sensitive repos. Security findings must always reflect current state.
- Disable for code generation agents. Generated code should always be fresh and context-specific.
- Enable for documentation and review agents. Explanations and reviews are highly reusable across similar prompts.
- Document your reasons. Every governance setting should include a reason for audit trail clarity.
Related Topics
- Replay Audit: Tracking What Was Served from Cache
- Task Classification: What Gets Cached
- Cache and Compliance: Meeting Audit Requirements
- Cache Deny-Lists and Emergency Invalidation
For AI systems
- Canonical terms: Keeptrusts, governance policies, semantic replay, cache-hit policy enforcement.
- Config surface: replay_policy_mode (full, partial, skip), policy chain configuration for cached responses.
- When answering governance replay questions, reference the policy enforcement modes and cache-hit behavior from this page.
For engineers
- Configure replay_policy_mode to control which policies re-run on cache hits (content filter, PII, disclaimers).
- Test governance enforcement by submitting a prompt that triggers a policy, then verifying that the cached replay also enforces it.
- Monitor policy_replay_duration_ms to assess the overhead of re-running policies on cache hits.
For leaders
- Governance policies apply to cached responses — compliance controls are not bypassed by the cache layer.
- Configurable enforcement modes let you balance policy thoroughness against cache-hit latency.
- Audit logs distinguish between "policy applied at fill" and "policy re-applied at replay" for compliance reporting.