Governance Policies for Semantic Replay

Semantic replay reuses cached responses for semantically similar (but not identical) prompts. Because semantic replay involves judgment about similarity rather than exact matching, your organization may want to govern where and how it is applied.

Use this page when

  • You need to understand how governance policies apply to semantically replayed (cached) responses.
  • You are configuring which policy checks (content filtering, PII, disclaimers) re-run on cache hits vs cache fills.
  • You want to verify that compliance controls are not bypassed when responses are served from cache.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

What Semantic Replay Does

Unlike exact replay (which requires all cache key components to match byte for byte), semantic replay:

  1. Embeds your prompt into a vector representation.
  2. Searches the vector store for semantically similar cached prompts.
  3. Evaluates whether a candidate response can be adapted to your context.
  4. Serves an adapted response if similarity exceeds the configured threshold.

This saves upstream LLM costs but introduces a judgment call — the system decides that your question is "close enough" to a previously answered one.
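The lookup steps above can be sketched as follows. This is a toy illustration, not the gateway's actual implementation: the bag-of-words `embed` function, the in-memory cache dict, and the cosine similarity stand in for a real embedding model and vector store.

```python
import math

def embed(text):
    # Toy bag-of-words embedding; a real system uses a learned embedding model.
    vec = {}
    for token in text.lower().split():
        vec[token] = vec.get(token, 0) + 1
    return vec

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_lookup(prompt, cache, threshold):
    """Replay the best candidate whose similarity meets the threshold, else miss."""
    query = embed(prompt)
    best_score, best_response = 0.0, None
    for cached_prompt, response in cache.items():
        score = cosine(query, embed(cached_prompt))
        if score > best_score:
            best_score, best_response = score, response
    if best_score >= threshold:
        return "semantic_replayed", best_response
    return "miss", None

cache = {"how do I disable semantic replay": "Use Settings > Cache > Semantic Replay."}
print(semantic_lookup("how do I disable semantic replay entirely", cache, 0.90))
```

Note that the same prompt pair can hit or miss depending on the configured threshold, which is exactly the judgment call governance policies constrain.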

Governance Controls

You can govern semantic replay at four scopes. When multiple scopes apply, the most restrictive scope wins.

Scope              | Controls                  | Overridden By
Organization       | Org-wide enable/disable   | Nothing — org is the ceiling
Repository         | Per-repo enable/disable   | Org-level disable
Agent              | Per-agent enable/disable  | Repo-level or org-level disable
Declarative config | Inline policy overrides   | Higher-scope disables

Most Restrictive Scope Wins

If any applicable scope disables semantic replay, it is disabled for that request. Enabling at a narrower scope cannot override a broader disable.

Examples:

  • Org disables → semantic replay is off everywhere, regardless of repo or agent settings.
  • Org enables, repo disables → semantic replay is off for that repo only.
  • Org enables, repo enables, agent disables → semantic replay is off for that agent in that repo.
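The rule reduces to a logical AND over all applicable scopes. A minimal sketch (the function name and signature are illustrative, not the product's API):

```python
def semantic_replay_allowed(org, repo=None, agent=None):
    """Most restrictive scope wins: any applicable disable turns replay off.

    Scopes that do not apply to a request are passed as None and ignored.
    """
    applicable = [s for s in (org, repo, agent) if s is not None]
    return all(applicable)

# Org disables: off everywhere, regardless of narrower settings.
assert semantic_replay_allowed(org=False, repo=True, agent=True) is False
# Org enables, repo disables: off for that repo.
assert semantic_replay_allowed(org=True, repo=False) is False
# All applicable scopes enable: on.
assert semantic_replay_allowed(org=True, repo=True, agent=True) is True
```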

Disabling Semantic Replay Org-Wide

To disable semantic replay for your entire organization:

Via Console

  1. Navigate to Settings > Cache > Semantic Replay in the console.
  2. Set Organization Default to Disabled.
  3. Save the configuration.

Via API

curl -X PUT https://api.your-instance.com/v1/settings/cache/semantic-replay \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "org_default": "disabled",
    "reason": "Compliance review pending"
  }'

Via Declarative Config

cache:
  semantic_replay:
    org_default: disabled
    reason: "Compliance review pending"

When disabled org-wide, all semantic replay lookups immediately return miss. Exact replay (identical prompts) continues to function normally.

Enabling for Specific Repos Only

If you want semantic replay only for certain repositories:

cache:
  semantic_replay:
    org_default: disabled
    repo_overrides:
      - repo_id: "repo_abc123"
        enabled: true
        similarity_threshold: 0.92
        reason: "Approved for documentation repo"
      - repo_id: "repo_def456"
        enabled: true
        similarity_threshold: 0.95
        reason: "Approved for test utilities repo"

This configuration disables semantic replay everywhere except the explicitly listed repositories.
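Resolution for this shape of config can be sketched as follows, assuming (as the example above implies) that an explicit repo override takes precedence over the org default. The helper name and dict layout are illustrative:

```python
def replay_enabled_for_repo(config, repo_id):
    """Effective setting for a repo: an explicit override wins over org_default."""
    sr = config["cache"]["semantic_replay"]
    for override in sr.get("repo_overrides", []):
        if override["repo_id"] == repo_id:
            return override["enabled"]
    return sr["org_default"] == "enabled"

config = {
    "cache": {
        "semantic_replay": {
            "org_default": "disabled",
            "repo_overrides": [
                {"repo_id": "repo_abc123", "enabled": True, "similarity_threshold": 0.92},
            ],
        }
    }
}

assert replay_enabled_for_repo(config, "repo_abc123") is True     # explicitly approved
assert replay_enabled_for_repo(config, "repo_unlisted") is False  # falls back to org default
```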

Agent-Specific Replay Policies

You can control semantic replay per agent type or agent instance:

cache:
  semantic_replay:
    org_default: enabled
    agent_overrides:
      - agent_type: "code-review"
        enabled: true
        similarity_threshold: 0.90
        reason: "Code review answers are highly reusable"
      - agent_type: "code-generation"
        enabled: false
        reason: "Generated code must always be fresh"
      - agent_type: "security-audit"
        enabled: false
        reason: "Security findings must reflect current state"

Agent Type vs. Agent Instance

  • Agent type applies to all instances of that agent across your org.
  • Agent instance applies to a specific deployed agent identified by ID.

Instance-level overrides take precedence over type-level overrides.
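The precedence rule can be sketched like this. The `agent_id` field name is an assumption based on "identified by ID" above, not a confirmed config key:

```python
def resolve_agent_override(overrides, agent_type, agent_id):
    """Instance-level overrides take precedence over type-level overrides.

    Returns the matching override's enabled flag, or None if no override applies.
    """
    instance = next((o["enabled"] for o in overrides if o.get("agent_id") == agent_id), None)
    if instance is not None:
        return instance
    return next((o["enabled"] for o in overrides if o.get("agent_type") == agent_type), None)

overrides = [
    {"agent_type": "code-review", "enabled": True},
    {"agent_id": "agent_42", "enabled": False},  # a specific deployed instance
]

assert resolve_agent_override(overrides, "code-review", "agent_42") is False  # instance wins
assert resolve_agent_override(overrides, "code-review", "agent_7") is True    # type applies
```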

Declarative Config Overrides

Policy configurations can include semantic replay overrides that travel with the configuration:

# In your policy-config.yaml
policies:
  - name: "strict-no-semantic-replay"
    scope:
      repos:
        - "repo_sensitive_xyz"
    cache:
      semantic_replay:
        enabled: false
        reason: "Repository contains regulated data"

When the gateway evaluates a request matching this policy, it enforces the semantic replay restriction even if broader scopes allow replay. If a broader scope is already more restrictive, that broader disable still applies.

Similarity Threshold Configuration

When semantic replay is enabled, you can configure the similarity threshold that determines how close a prompt must be to a cached entry for replay to be considered:

Threshold | Behavior                | Use Case
0.99      | Near-exact matches only | Conservative, minimal risk
0.95      | Very similar prompts    | Moderate reuse
0.90      | Broadly similar prompts | Aggressive reuse, higher savings
0.85      | Loosely related prompts | Maximum reuse, higher risk of poor fit

Lower thresholds increase cache hit rates but also increase the chance of serving a response that does not perfectly fit the new prompt.
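This tradeoff is easy to see with a handful of hypothetical similarity scores (the numbers below are invented for illustration):

```python
def hits_at_threshold(scores, threshold):
    """Count candidate matches that would qualify for replay at a given threshold."""
    return sum(1 for s in scores if s >= threshold)

# Hypothetical best-candidate similarity scores for six incoming prompts.
scores = [0.99, 0.96, 0.93, 0.91, 0.88, 0.86]

assert hits_at_threshold(scores, 0.99) == 1  # conservative: near-exact only
assert hits_at_threshold(scores, 0.95) == 2
assert hits_at_threshold(scores, 0.90) == 4
assert hits_at_threshold(scores, 0.85) == 6  # maximum reuse, highest risk of poor fit
```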

Audit Trail for Semantic Replay Decisions

Every semantic replay decision is recorded in the replay audit log:

Field                   | Description
semantic_replay_enabled | Whether semantic replay was active for this request
semantic_replay_scope   | Which scope controlled the decision
similarity_score        | Score of the best candidate (if evaluated)
similarity_threshold    | Threshold in effect for this request
replay_outcome          | Final outcome (semantic_replayed, miss, etc.)
governance_reason       | Configured reason for the policy setting

Auditing Policy Changes

When you change semantic replay governance settings, the change is recorded with:

  • Who made the change (authenticated caller).
  • What changed (previous and new settings).
  • When the change took effect.
  • The stated reason for the change.

Monitoring Semantic Replay Governance

Metric                           | Description
semantic_replay_disabled_lookups | Requests where semantic replay was disabled by policy
semantic_replay_enabled_lookups  | Requests where semantic replay was allowed
semantic_replay_hit_rate         | Hit rate among requests where semantic replay was enabled
governance_override_count        | Number of active overrides by scope

Best Practices

  1. Start conservative. Disable semantic replay org-wide, then enable for specific repos after review.
  2. Use high thresholds initially. Start at 0.95+ and lower only after reviewing replay quality.
  3. Disable for security-sensitive repos. Security findings must always reflect current state.
  4. Disable for code generation agents. Generated code should always be fresh and context-specific.
  5. Enable for documentation and review agents. Explanations and reviews are highly reusable across similar prompts.
  6. Document your reasons. Every governance setting should include a reason for audit trail clarity.

For AI systems

  • Canonical terms: Keeptrusts, governance policies, semantic replay, cache-hit policy enforcement.
  • Config surface: replay_policy_mode (full, partial, skip), policy chain configuration for cached responses.
  • When answering governance replay questions, reference the policy enforcement modes and cache-hit behavior from this page.

For engineers

  • Configure replay_policy_mode to control which policies re-run on cache hits (content filter, PII, disclaimers).
  • Test governance enforcement by submitting a prompt that triggers a policy, then verifying the cached replay also enforces it.
  • Monitor policy_replay_duration_ms to assess overhead of re-running policies on cache hits.

For leaders

  • Governance policies apply to cached responses — compliance controls are not bypassed by the cache layer.
  • Configurable enforcement modes let you balance policy thoroughness against cache-hit latency.
  • Audit logs distinguish between "policy applied at fill" and "policy re-applied at replay" for compliance reporting.
