Declarative Config for Workflow Cache

The workflow_cache section of your declarative policy config is the primary way to control caching behavior across your engineering organization. You define it once in your policy YAML, and it applies to every gateway that loads that config.

Use this page when

You are writing or reviewing the workflow_cache section of your declarative policy YAML.
You need the full field reference with types, defaults, and effects for every configuration option.
You are validating that your deployed gateway has the correct cache settings active.

Primary audience

Primary: AI Agents, Technical Engineers
Secondary: Technical Leaders

Full Configuration Reference

workflow_cache:
  enabled: true
  default_tier: org_shared
  org_shared_enabled: true
  direct_semantic_replay_enabled: true
  codebase_identity_mode: repository_isolated
  similarity_threshold: 0.95
  ttl_seconds: 86400
  max_entry_size_bytes: 524288
  excluded_models: []
  excluded_agents: []

Field Reference

enabled

Controls whether the workflow cache is active at all.

Type: boolean
Default: true
Effect: When false, no cache lookups or writes occur. Requests pass through to the upstream provider unconditionally.

default_tier

Sets the default cache tier for all requests that do not have an explicit override.

Type: enum (org_shared | private_edge)
Default: org_shared
Effect: Determines whether cache entries are shared across the organization or isolated to a single gateway instance.

org_shared_enabled

Enables or disables the organization-wide shared cache layer.

Type: boolean
Default: true
Effect: When false, only private edge caching is available regardless of default_tier.
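The interaction between default_tier and org_shared_enabled can be sketched as a small resolution function. This is a hypothetical helper (effective_tier is not a Keeptrusts API). It encodes only the two documented rules and ignores per-request overrides.

```python
VALID_TIERS = ("org_shared", "private_edge")


def effective_tier(default_tier: str, org_shared_enabled: bool) -> str:
    """Return the tier a request without an explicit override would use.

    Hypothetical helper: it mirrors the documented rule that disabling
    org_shared_enabled leaves only private edge caching available.
    """
    if default_tier not in VALID_TIERS:
        raise ValueError(f"unknown default_tier: {default_tier!r}")
    if not org_shared_enabled:
        # The shared layer is off, so org_shared cannot be honored.
        return "private_edge"
    return default_tier


print(effective_tier("org_shared", True))   # org_shared
print(effective_tier("org_shared", False))  # private_edge
```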

direct_semantic_replay_enabled

Enables direct semantic replay — returning a cached response when a new request is semantically similar to a previously cached one.

Type: boolean
Default: true
Effect: When false, only exact-match cache lookups occur. Semantic similarity matching is skipped.

codebase_identity_mode

Determines how codebase identity is computed for cache key generation.

Type: enum (repository_isolated | monorepo_group)
Default: repository_isolated
Effect: Controls whether each repository gets its own cache namespace or whether multiple repos share one.
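A minimal sketch of how the two identity modes could map to cache namespaces. The function name, the group identifier, and the SHA-256 prefix are illustrative assumptions. This page does not specify how Keeptrusts derives codebase identity or which repositories form a monorepo group.

```python
import hashlib
from typing import Optional


def cache_namespace(mode: str, repository: str, group: Optional[str] = None) -> str:
    """Derive a cache namespace from the codebase identity (hypothetical).

    repository_isolated: every repository gets its own namespace.
    monorepo_group: repositories that share a group id share a namespace.
    """
    if mode == "repository_isolated":
        identity = f"repo:{repository}"
    elif mode == "monorepo_group":
        if not group:
            raise ValueError("monorepo_group mode requires a group identifier")
        identity = f"group:{group}"
    else:
        raise ValueError(f"unknown codebase_identity_mode: {mode!r}")
    # Hash so the namespace is a fixed-length, opaque key prefix.
    return hashlib.sha256(identity.encode("utf-8")).hexdigest()[:16]


a = cache_namespace("repository_isolated", "acme/api")
b = cache_namespace("repository_isolated", "acme/web")
print(a != b)  # True: separate namespaces

c = cache_namespace("monorepo_group", "acme/api", group="acme-platform")
d = cache_namespace("monorepo_group", "acme/web", group="acme-platform")
print(c == d)  # True: one shared namespace
```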

similarity_threshold

The minimum cosine similarity score required for a semantic replay hit.

Type: float (0.0–1.0)
Default: 0.95
Effect: Lower values increase hit rate but may return less accurate cached responses.
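The lookup order implied by direct_semantic_replay_enabled and similarity_threshold can be sketched as follows: try an exact match, then, if semantic replay is on, accept the most similar cached entry only when its cosine similarity meets the threshold. The entry shape, the toy two-dimensional embeddings, and the lookup function are hypothetical. A real gateway would use embeddings from a model and an indexed search rather than a linear scan.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (prompt, embedding, cached response): a hypothetical entry shape.
Entry = Tuple[str, Sequence[float], str]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup(
    prompt: str,
    embedding: Sequence[float],
    entries: List[Entry],
    semantic_replay_enabled: bool = True,
    similarity_threshold: float = 0.95,
) -> Optional[str]:
    # Exact-match lookup always runs first.
    for cached_prompt, _, response in entries:
        if cached_prompt == prompt:
            return response
    # With direct_semantic_replay_enabled false, stop after exact match.
    if not semantic_replay_enabled:
        return None
    # Otherwise take the most similar entry, if it clears the threshold.
    best_score, best_response = -1.0, None
    for _, cached_embedding, response in entries:
        score = cosine_similarity(embedding, cached_embedding)
        if score > best_score:
            best_score, best_response = score, response
    if best_response is not None and best_score >= similarity_threshold:
        return best_response
    return None


entries = [("fix the null check in parser.py", [1.0, 0.0], "cached-answer")]
print(lookup("fix the null check in parser.py", [1.0, 0.0], entries, False))  # cached-answer
print(lookup("fix null check in parser.py", [0.99, 0.1], entries))  # cached-answer (score ~0.995)
print(lookup("fix null check in parser.py", [0.99, 0.1], entries, False))  # None
```

Raising similarity_threshold above the 0.995 score in this toy example (for instance to 0.999) turns the second call into a miss, which is the hit-rate versus accuracy trade-off described above.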

ttl_seconds

Time-to-live for cache entries in seconds.

Type: integer
Default: 86400 (24 hours)
Effect: Entries older than this are evicted and not returned on lookup.
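A toy in-memory store illustrating the ttl_seconds rule: an entry whose age exceeds the TTL is evicted and treated as a miss. TtlStore and its injectable clock are hypothetical. The clock parameter exists only to make the example deterministic.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TtlStore:
    """Toy in-memory store that applies ttl_seconds on lookup (hypothetical)."""

    def __init__(self, ttl_seconds: int = 86400, clock: Callable[[], float] = time.time):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)

    def get(self, key: str) -> Optional[Any]:
        item = self._entries.get(key)
        if item is None:
            return None
        created_at, value = item
        if self._clock() - created_at > self.ttl_seconds:
            # Older than the TTL: evict and report a miss.
            del self._entries[key]
            return None
        return value


now = [0.0]  # fake clock so the example is deterministic
store = TtlStore(ttl_seconds=86400, clock=lambda: now[0])
store.put("k", "cached response")
now[0] = 86400.0
print(store.get("k"))  # cached response (exactly 24 h old, not older)
now[0] = 86401.0
print(store.get("k"))  # None (evicted)
```

Treating an entry exactly ttl_seconds old as still valid follows the wording "older than this". The gateway's actual boundary behavior is not specified on this page.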

max_entry_size_bytes

Maximum size of a single cache entry in bytes.

Type: integer
Default: 524288 (512 KB)
Effect: Responses larger than this are not cached.
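A sketch of the size check. It assumes the limit is compared against the UTF-8 encoded response body. Whether the gateway also counts headers or metadata is not stated on this page, and fits_in_cache is a hypothetical name.

```python
from typing import Union


def fits_in_cache(response: Union[bytes, str], max_entry_size_bytes: int = 524288) -> bool:
    """Return True if a response is small enough to store (hypothetical check).

    Assumes the limit applies to the UTF-8 encoded response body.
    """
    data = response.encode("utf-8") if isinstance(response, str) else response
    return len(data) <= max_entry_size_bytes


print(fits_in_cache(b"x" * 524288))  # True: exactly 512 KB
print(fits_in_cache(b"x" * 524289))  # False: one byte over
print(fits_in_cache("é" * 300000))   # False: 600000 bytes once encoded
```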

excluded_models

A list of model identifiers that bypass the cache entirely.

Type: array of strings
Default: []
Effect: Requests targeting these models always go to the upstream provider.

excluded_agents

A list of agent identifiers that bypass the cache entirely.

Type: array of strings
Default: []
Effect: Requests from these agents always go to the upstream provider.
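The two exclusion lists, together with enabled, reduce to a single bypass check. This sketch assumes exact, case-sensitive matching of identifiers. The model and agent names other than o1-preview, o1-mini, and security-scanner are placeholders, and bypasses_cache is not a Keeptrusts function.

```python
from typing import Iterable, Optional


def bypasses_cache(
    model: str,
    agent: Optional[str],
    enabled: bool = True,
    excluded_models: Iterable[str] = (),
    excluded_agents: Iterable[str] = (),
) -> bool:
    """True if the request should go straight to the upstream provider."""
    if not enabled:
        return True  # cache disabled entirely: no lookups or writes
    if model in set(excluded_models):
        return True
    if agent is not None and agent in set(excluded_agents):
        return True
    return False


print(bypasses_cache("o1-mini", "code-review", excluded_models=["o1-preview", "o1-mini"]))  # True
print(bypasses_cache("gpt-4o", "security-scanner", excluded_agents=["security-scanner"]))  # True
print(bypasses_cache("gpt-4o", "code-review"))  # False
```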

Custom Configuration Example

workflow_cache:
  enabled: true
  default_tier: org_shared
  org_shared_enabled: true
  direct_semantic_replay_enabled: true
  codebase_identity_mode: monorepo_group
  similarity_threshold: 0.92
  ttl_seconds: 172800
  max_entry_size_bytes: 1048576
  excluded_models:
    - "o1-preview"
    - "o1-mini"
  excluded_agents:
    - "security-scanner"

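This example switches to monorepo-group identity, relaxes the similarity threshold to 0.92, doubles the TTL to 48 hours (172800 seconds), doubles the maximum entry size to 1 MB (1048576 bytes), and excludes two reasoning models (o1-preview and o1-mini) and the security-scanner agent from caching.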
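Before deploying a custom section like the one above, a local pre-check against the types and ranges in the field reference can catch typos. This is a hypothetical validator, not the gateway's own validation. Rejecting unknown fields and requiring ttl_seconds and max_entry_size_bytes to be positive are assumptions beyond what the field reference states.

```python
from typing import Any, Dict, List

BOOL_FIELDS = {"enabled", "org_shared_enabled", "direct_semantic_replay_enabled"}
ENUM_FIELDS = {
    "default_tier": {"org_shared", "private_edge"},
    "codebase_identity_mode": {"repository_isolated", "monorepo_group"},
}
INT_FIELDS = {"ttl_seconds", "max_entry_size_bytes"}
LIST_FIELDS = {"excluded_models", "excluded_agents"}
KNOWN = BOOL_FIELDS | set(ENUM_FIELDS) | INT_FIELDS | LIST_FIELDS | {"similarity_threshold"}


def validate_workflow_cache(section: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the section looks valid."""
    errors: List[str] = []
    for key, value in section.items():
        if key not in KNOWN:
            errors.append(f"unknown field: {key}")
        elif key in BOOL_FIELDS and not isinstance(value, bool):
            errors.append(f"{key}: expected boolean")
        elif key in ENUM_FIELDS and (not isinstance(value, str) or value not in ENUM_FIELDS[key]):
            errors.append(f"{key}: expected one of {sorted(ENUM_FIELDS[key])}")
        elif key in INT_FIELDS and (isinstance(value, bool) or not isinstance(value, int) or value <= 0):
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"{key}: expected a positive integer")
        elif key == "similarity_threshold" and (
            isinstance(value, bool) or not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0
        ):
            errors.append(f"{key}: expected a number between 0.0 and 1.0")
        elif key in LIST_FIELDS and (
            not isinstance(value, list) or not all(isinstance(v, str) for v in value)
        ):
            errors.append(f"{key}: expected a list of strings")
    return errors


# The custom configuration example, as a Python dict.
custom = {
    "enabled": True,
    "default_tier": "org_shared",
    "org_shared_enabled": True,
    "direct_semantic_replay_enabled": True,
    "codebase_identity_mode": "monorepo_group",
    "similarity_threshold": 0.92,
    "ttl_seconds": 172800,
    "max_entry_size_bytes": 1048576,
    "excluded_models": ["o1-preview", "o1-mini"],
    "excluded_agents": ["security-scanner"],
}
print(validate_workflow_cache(custom))  # []
print(validate_workflow_cache({"similarity_threshold": 1.5}))
```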

Minimal Configuration

If you only need the defaults with org-shared caching enabled:

workflow_cache:
  enabled: true

All other fields fall back to their defaults.
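The fallback behavior amounts to overlaying your section on the documented defaults. A minimal sketch, with the default values copied from the field reference and resolve_workflow_cache as a hypothetical name:

```python
import copy
from typing import Any, Dict, Optional

# Defaults as listed in the field reference on this page.
DEFAULTS: Dict[str, Any] = {
    "enabled": True,
    "default_tier": "org_shared",
    "org_shared_enabled": True,
    "direct_semantic_replay_enabled": True,
    "codebase_identity_mode": "repository_isolated",
    "similarity_threshold": 0.95,
    "ttl_seconds": 86400,
    "max_entry_size_bytes": 524288,
    "excluded_models": [],
    "excluded_agents": [],
}


def resolve_workflow_cache(section: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Overlay user-supplied fields on the defaults (hypothetical)."""
    resolved = copy.deepcopy(DEFAULTS)  # deep copy so the default lists are never shared
    resolved.update(section or {})
    return resolved


minimal = resolve_workflow_cache({"enabled": True})
print(minimal == DEFAULTS)  # True
print(resolve_workflow_cache({"ttl_seconds": 172800})["similarity_threshold"])  # 0.95
```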

Validation Steps

After deploying your config, verify the cache is active:

Send a test request through the gateway, using a prompt that has not been sent before so it cannot already be cached.
Check the response headers for x-keeptrusts-cache: miss on the first request.
Send the same request again and verify x-keeptrusts-cache: hit.
Check the spend log to confirm cached_input_tokens appears on subsequent requests.
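The header checks in the steps above can be scripted. This sketch relies only on the documented x-keeptrusts-cache header. make_sender, its URL and payload, and the offline dry run are hypothetical; you would need to supply the endpoint path, request body, and authentication your gateway expects. The spend-log check is not covered.

```python
import json
import urllib.request
from typing import Callable, Dict, Mapping, Optional

CACHE_HEADER = "x-keeptrusts-cache"


def cache_status(headers: Mapping[str, str]) -> Optional[str]:
    # HTTP header names are case-insensitive, so compare lowercased names.
    for name, value in headers.items():
        if name.lower() == CACHE_HEADER:
            return value.strip().lower()
    return None


def check_cache_roundtrip(send: Callable[[], Mapping[str, str]]) -> bool:
    """Send the same request twice; expect a miss, then a hit."""
    first = cache_status(send())
    second = cache_status(send())
    return first == "miss" and second == "hit"


def make_sender(url: str, payload: dict) -> Callable[[], Dict[str, str]]:
    """Build a sender that POSTs the same JSON body on every call.

    url and payload are placeholders: use your gateway's endpoint, the request
    shape it expects, and whatever authentication headers it requires.
    """
    body = json.dumps(payload).encode("utf-8")

    def send() -> Dict[str, str]:
        req = urllib.request.Request(
            url, data=body, method="POST", headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            return dict(resp.headers.items())

    return send


# Offline dry run with canned headers; swap in make_sender(...) for a real gateway.
canned = iter([{"X-Keeptrusts-Cache": "miss"}, {"x-keeptrusts-cache": "hit"}])
print(check_cache_roundtrip(lambda: next(canned)))  # True
```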

If the cache is not activating, verify:

The gateway loaded the latest config version (check gateway logs for the config reload event).
The request model is not in excluded_models.
The request agent is not in excluded_agents.
The response size does not exceed max_entry_size_bytes.

Relationship to Other Scopes

The declarative config sets the baseline. Org settings, repo settings, and agent settings can override specific fields. When conflicts exist, the most restrictive value wins. See Controlling Semantic Replay by Scope for the full precedence model.
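One plausible reading of "most restrictive wins", sketched per field type: any false disables a boolean feature, the higher similarity threshold wins, the shorter TTL and smaller size limit win, exclusion lists are unioned, and private_edge and repository_isolated are treated as the stricter enum values. These per-field rules and the merge_most_restrictive name are assumptions for illustration only; the precedence page referenced above is authoritative.

```python
from typing import Any, Dict, Iterable

BOOL_FIELDS = {"enabled", "org_shared_enabled", "direct_semantic_replay_enabled"}
# For enum fields, the value listed here is assumed to be the more restrictive one.
RESTRICTIVE_ENUM = {
    "default_tier": "private_edge",
    "codebase_identity_mode": "repository_isolated",
}
MIN_FIELDS = {"ttl_seconds", "max_entry_size_bytes"}
LIST_FIELDS = {"excluded_models", "excluded_agents"}
KNOWN = BOOL_FIELDS | set(RESTRICTIVE_ENUM) | MIN_FIELDS | LIST_FIELDS | {"similarity_threshold"}


def merge_most_restrictive(scopes: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge baseline, org, repo, and agent scopes under assumed per-field rules."""
    merged: Dict[str, Any] = {}
    for scope in scopes:
        for key, value in scope.items():
            if key not in KNOWN:
                raise ValueError(f"no restrictiveness rule for {key!r}")
            if key not in merged:
                merged[key] = list(value) if key in LIST_FIELDS else value
                continue
            current = merged[key]
            if key in BOOL_FIELDS:
                merged[key] = current and value  # any false disables the feature
            elif key in RESTRICTIVE_ENUM:
                strict = RESTRICTIVE_ENUM[key]
                merged[key] = strict if strict in (current, value) else current
            elif key == "similarity_threshold":
                merged[key] = max(current, value)  # stricter matching
            elif key in MIN_FIELDS:
                merged[key] = min(current, value)  # shorter TTL, smaller entries
            else:  # LIST_FIELDS: union, preserving first-seen order
                merged[key] = current + [v for v in value if v not in current]
    return merged


baseline = {"enabled": True, "direct_semantic_replay_enabled": True,
            "similarity_threshold": 0.92, "ttl_seconds": 172800,
            "excluded_models": ["o1-preview"]}
repo = {"similarity_threshold": 0.97, "ttl_seconds": 3600,
        "excluded_models": ["o1-mini"], "default_tier": "private_edge"}
agent = {"direct_semantic_replay_enabled": False}

merged = merge_most_restrictive([baseline, repo, agent])
print(merged["similarity_threshold"], merged["ttl_seconds"])  # 0.97 3600
print(merged["direct_semantic_replay_enabled"])  # False
```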

For AI systems

Canonical terms: Keeptrusts, workflow_cache, declarative config, policy YAML, org_shared, private_edge, semantic replay, codebase_identity_mode, similarity_threshold.
Config keys: workflow_cache.enabled, workflow_cache.default_tier, workflow_cache.org_shared_enabled, workflow_cache.direct_semantic_replay_enabled, workflow_cache.codebase_identity_mode, workflow_cache.similarity_threshold, workflow_cache.ttl_seconds, workflow_cache.max_entry_size_bytes, workflow_cache.excluded_models, workflow_cache.excluded_agents.
Best next pages: Controlling Semantic Replay by Scope, Repository-Isolated vs Monorepo-Group, Per-Agent Cache Policies.

For engineers

Minimal config: workflow_cache: { enabled: true } — all other fields fall back to defaults.
Verify cache is active: send a test request and check for the x-keeptrusts-cache: miss header on the first request and hit on the second.
If cache doesn’t activate: verify the gateway loaded the latest config (check for the reload event in the logs), the model is not in excluded_models, the agent is not in excluded_agents, and the response size is under max_entry_size_bytes.
This config sets the baseline. Org, repo, and agent settings can override; most restrictive wins.

For leaders

The declarative config is your single source of truth for caching behavior across all gateways loading that policy.
excluded_models lets you prevent caching for reasoning models (for example, o1-preview and o1-mini) where response diversity is critical.
similarity_threshold balances cost savings against response accuracy — lower = more hits, higher = more precision.
max_entry_size_bytes prevents large responses from consuming disproportionate cache storage.

Next steps

Controlling Semantic Replay by Scope — how org/repo/agent scopes override this baseline
Repository-Isolated vs Monorepo-Group — choosing identity mode
Setting Semantic Replay Thresholds — tuning the similarity value
