Declarative Config for Workflow Cache
The workflow_cache section in your declarative policy config is the primary way to control caching behavior for your engineering organization. You define it once in your policy YAML and it applies across all gateways that load that config.
Use this page when
- You are writing or reviewing the workflow_cache section of your declarative policy YAML.
- You need the full field reference with types, defaults, and effects for every configuration option.
- You are validating that your deployed gateway has the correct cache settings active.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Full Configuration Reference
workflow_cache:
  enabled: true
  default_tier: org_shared
  org_shared_enabled: true
  direct_semantic_replay_enabled: true
  codebase_identity_mode: repository_isolated
  similarity_threshold: 0.95
  ttl_seconds: 86400
  max_entry_size_bytes: 524288
  excluded_models: []
  excluded_agents: []
Field Reference
enabled
Controls whether the workflow cache is active at all.
- Type: boolean
- Default: true
- Effect: When false, no cache lookups or writes occur. Requests pass through to the upstream provider unconditionally.
default_tier
Sets the default cache tier for all requests that do not have an explicit override.
- Type: enum (org_shared | private_edge)
- Default: org_shared
- Effect: Determines whether cache entries are shared across the organization or isolated to a single gateway instance.
org_shared_enabled
Enables or disables the organization-wide shared cache layer.
- Type: boolean
- Default: true
- Effect: When false, only private edge caching is available regardless of default_tier.
direct_semantic_replay_enabled
Enables direct semantic replay — returning a cached response when a new request is semantically similar to a previously cached one.
- Type: boolean
- Default: true
- Effect: When false, only exact-match cache lookups occur. Semantic similarity matching is skipped.
codebase_identity_mode
Determines how codebase identity is computed for cache key generation.
- Type: enum (repository_isolated | monorepo_group)
- Default: repository_isolated
- Effect: Controls whether each repository gets its own cache namespace or whether multiple repos share one.
similarity_threshold
The minimum cosine similarity score required for a semantic replay hit.
- Type: float (0.0–1.0)
- Default: 0.95
- Effect: Lower values increase hit rate but may return less accurate cached responses.
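As a rough illustration of how the threshold gates a semantic replay hit, the sketch below computes the cosine similarity of two embedding vectors and compares it with `similarity_threshold`. The function names and toy vectors are hypothetical, the gateway's embedding model and matching logic are not described on this page, and treating a score equal to the threshold as a hit is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def is_semantic_hit(request_vec, cached_vec, similarity_threshold=0.95):
    """Hypothetical check: does the cached entry qualify for replay?"""
    # Assumption: a score exactly equal to the threshold counts as a hit.
    return cosine_similarity(request_vec, cached_vec) >= similarity_threshold
```

With the toy vectors `[1.0, 0.0]` and `[0.9, 0.3]` the similarity is roughly 0.949, so the pair misses at the default 0.95 but would hit at 0.92.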
ttl_seconds
Time-to-live for cache entries in seconds.
- Type: integer
- Default: 86400 (24 hours)
- Effect: Entries older than this are evicted and not returned on lookup.
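A minimal sketch of the expiry check implied by `ttl_seconds`, assuming each entry records the Unix time at which it was written. The `created_at` parameter and the function name are illustrative only, and the boundary case (an entry exactly `ttl_seconds` old is still treated as fresh) is an assumption.

```python
import time


def is_fresh(created_at, ttl_seconds=86400, now=None):
    """Return True while the entry's age does not exceed ttl_seconds."""
    # Allow the caller to inject the current time so the check is testable.
    now = time.time() if now is None else now
    return (now - created_at) <= ttl_seconds
```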
max_entry_size_bytes
Maximum size of a single cache entry in bytes.
- Type: integer
- Default: 524288 (512 KB)
- Effect: Responses larger than this are not cached.
excluded_models
A list of model identifiers that bypass the cache entirely.
- Type: array of strings
- Default: []
- Effect: Requests targeting these models always go to the upstream provider.
excluded_agents
A list of agent identifiers that bypass the cache entirely.
- Type: array of strings
- Default: []
- Effect: Requests from these agents always go to the upstream provider.
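The bypass conditions in this reference (enabled, excluded_models, excluded_agents, and max_entry_size_bytes) can be pictured as a single eligibility check. This is a hedged sketch rather than the gateway's implementation: the function and its parameters are hypothetical, and exact string matching on model and agent identifiers is an assumption.

```python
def may_cache(config, model, agent, response_size_bytes):
    """Return True if a response is eligible to be written to the cache."""
    # Missing keys fall back to the defaults listed in the field reference.
    if not config.get("enabled", True):
        return False
    if model in config.get("excluded_models", []):
        return False
    if agent in config.get("excluded_agents", []):
        return False
    # Responses larger than the limit are not cached.
    return response_size_bytes <= config.get("max_entry_size_bytes", 524288)
```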
Custom Configuration Example
workflow_cache:
  enabled: true
  default_tier: org_shared
  org_shared_enabled: true
  direct_semantic_replay_enabled: true
  codebase_identity_mode: monorepo_group
  similarity_threshold: 0.92
  ttl_seconds: 172800
  max_entry_size_bytes: 1048576
  excluded_models:
    - "o1-preview"
    - "o1-mini"
  excluded_agents:
    - "security-scanner"
This example enables monorepo-group identity, relaxes the similarity threshold from 0.95 to 0.92, doubles the TTL to 48 hours, doubles the maximum entry size to 1048576 bytes (1 MB), and excludes two reasoning models and the security scanner agent from caching.
Minimal Configuration
If you only need the defaults with org-shared caching enabled:
workflow_cache:
  enabled: true
All other fields fall back to their defaults.
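To see what a minimal config resolves to, the sketch below overlays a user-supplied section on the defaults listed in the field reference. `DEFAULTS` and `resolve_config` are illustrative names, since the gateway's own loader is not shown on this page, and rejecting unknown keys is an assumption made here.

```python
import copy

# Default values as listed in the field reference above.
DEFAULTS = {
    "enabled": True,
    "default_tier": "org_shared",
    "org_shared_enabled": True,
    "direct_semantic_replay_enabled": True,
    "codebase_identity_mode": "repository_isolated",
    "similarity_threshold": 0.95,
    "ttl_seconds": 86400,
    "max_entry_size_bytes": 524288,
    "excluded_models": [],
    "excluded_agents": [],
}


def resolve_config(user_section):
    """Overlay the user's workflow_cache section on the documented defaults."""
    unknown = set(user_section) - set(DEFAULTS)
    if unknown:
        # Assumption: unknown keys are treated as errors in this sketch.
        raise KeyError(f"unknown workflow_cache keys: {sorted(unknown)}")
    resolved = copy.deepcopy(DEFAULTS)  # avoid sharing the default lists
    resolved.update(user_section)
    return resolved
```

Calling `resolve_config({"enabled": True})` returns a dict equal to `DEFAULTS`, which matches the minimal configuration shown above.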
Validation Steps
After deploying your config, verify the cache is active:
- Send a test request through the gateway.
- Check the response headers for x-keeptrusts-cache: miss on the first request.
- Send the same request again and verify x-keeptrusts-cache: hit.
- Check the spend log to confirm cached_input_tokens appears on subsequent requests.
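The header check in these steps can be scripted. The sketch below takes a caller-supplied `send` function that performs one request through the gateway and returns the response headers as a dict. That function is hypothetical and left to you, because no endpoint or client is specified on this page. Only the x-keeptrusts-cache header name and its miss and hit values come from the steps above.

```python
def verify_cache(send):
    """Send the same request twice and report the cache header each time."""

    def cache_status(headers):
        # HTTP header names are case-insensitive, so normalise before lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        return lowered.get("x-keeptrusts-cache")

    first = cache_status(send())
    second = cache_status(send())
    return {
        "first": first,
        "second": second,
        "ok": first == "miss" and second == "hit",
    }
```

If the first result is already a hit, the same request may have been cached by an earlier run, so retry with a request body the gateway has not seen before.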
If the cache is not activating, verify:
- The gateway loaded the latest config version (check gateway logs for the config reload event).
- The request model is not in excluded_models.
- The request agent is not in excluded_agents.
- The response size does not exceed max_entry_size_bytes.
Relationship to Other Scopes
The declarative config sets the baseline. Org settings, repo settings, and agent settings can override specific fields. When conflicts exist, the most restrictive value wins. See Controlling Direct Semantic Replay by Scope for the full precedence model.
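One possible reading of "most restrictive wins" is sketched below. Everything about the per-field rules is an assumption made for illustration: a false boolean beats true, the higher similarity_threshold wins, the lower ttl_seconds and max_entry_size_bytes win, and exclusion lists are unioned. The `most_restrictive` function is hypothetical, and the linked precedence page remains the authority on how scopes actually combine.

```python
def most_restrictive(*layers):
    """Merge config layers under one assumed reading of 'most restrictive'."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if key not in merged:
                merged[key] = value
            elif isinstance(value, bool):
                # Assumption: any layer that disables a feature wins.
                merged[key] = merged[key] and value
            elif key == "similarity_threshold":
                # Assumption: a stricter (higher) threshold wins.
                merged[key] = max(merged[key], value)
            elif key in ("ttl_seconds", "max_entry_size_bytes"):
                # Assumption: the smaller limit wins.
                merged[key] = min(merged[key], value)
            elif isinstance(value, list):
                # Assumption: exclusions accumulate across scopes.
                merged[key] = sorted(set(merged[key]) | set(value))
            elif merged[key] != value:
                # Enum fields have no obvious ordering, so refuse to guess.
                raise ValueError(f"conflicting values for {key!r}")
    return merged
```

Enum fields such as default_tier are deliberately left unresolved on conflict, because this page does not say which value counts as more restrictive.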
For AI systems
- Canonical terms: Keeptrusts, workflow_cache, declarative config, policy YAML, org_shared, private_edge, semantic replay, codebase_identity_mode, similarity_threshold.
- Config keys: workflow_cache.enabled, workflow_cache.default_tier, workflow_cache.org_shared_enabled, workflow_cache.direct_semantic_replay_enabled, workflow_cache.codebase_identity_mode, workflow_cache.similarity_threshold, workflow_cache.ttl_seconds, workflow_cache.max_entry_size_bytes, workflow_cache.excluded_models, workflow_cache.excluded_agents.
- Best next pages: Controlling Semantic Replay by Scope, Repository-Isolated vs Monorepo-Group, Per-Agent Cache Policies.
For engineers
- Minimal config: workflow_cache: { enabled: true }. All other fields fall back to defaults.
- Verify cache is active: send a test request, check for the x-keeptrusts-cache: miss header on the first request and hit on the second.
- If the cache doesn't activate: verify the gateway loaded the latest config (check the reload event in logs), the model is not in excluded_models, the agent is not in excluded_agents, and the response size is under max_entry_size_bytes.
- This config sets the baseline. Org, repo, and agent settings can override it; the most restrictive value wins.
For leaders
- The declarative config is your single source of truth for caching behavior across all gateways loading that policy.
- excluded_models lets you prevent caching for reasoning models (o1-preview) where response diversity is critical.
- similarity_threshold balances cost savings against response accuracy: lower values mean more hits, higher values mean more precision.
- max_entry_size_bytes prevents large responses from consuming disproportionate cache storage.
Next steps
- Controlling Semantic Replay by Scope — how org/repo/agent scopes override this baseline
- Repository-Isolated vs Monorepo-Group — choosing identity mode
- Setting Semantic Replay Thresholds — tuning the similarity value