Declarative Config for Workflow Cache

The workflow_cache section in your declarative policy config is the primary way to control caching behavior for your engineering organization. You define it once in your policy YAML and it applies across all gateways that load that config.

Use this page when

  • You are writing or reviewing the workflow_cache section of your declarative policy YAML.
  • You need the full field reference with types, defaults, and effects for every configuration option.
  • You are validating that your deployed gateway has the correct cache settings active.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Full Configuration Reference

```yaml
workflow_cache:
  enabled: true
  default_tier: org_shared
  org_shared_enabled: true
  direct_semantic_replay_enabled: true
  codebase_identity_mode: repository_isolated
  similarity_threshold: 0.95
  ttl_seconds: 86400
  max_entry_size_bytes: 524288
  excluded_models: []
  excluded_agents: []
```

Field Reference

enabled

Controls whether the workflow cache is active at all.

  • Type: boolean
  • Default: true
  • Effect: When false, no cache lookups or writes occur. Requests pass through to the upstream provider unconditionally.
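A minimal illustrative fragment that switches the cache off for every gateway loading this policy, for example while troubleshooting. Because no lookups or writes occur when enabled is false, the values of the other fields have no effect and can be omitted.

```yaml
# Illustrative fragment: disable the workflow cache entirely.
workflow_cache:
  enabled: false   # no cache lookups or writes; requests pass through to the upstream provider
```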

default_tier

Sets the default cache tier for all requests that do not have an explicit override.

  • Type: enum (org_shared | private_edge)
  • Default: org_shared
  • Effect: Determines whether cache entries are shared across the organization or isolated to a single gateway instance.
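A sketch of a policy that keeps cache entries on the gateway instance that produced them instead of sharing them across the organization. The value comes from the enum above; whether private_edge fits your deployment depends on your isolation requirements, and fields not listed keep their defaults.

```yaml
# Illustrative fragment: isolate cache entries to a single gateway instance by default.
workflow_cache:
  enabled: true
  default_tier: private_edge   # requests without an explicit override use the gateway-local tier
```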

org_shared_enabled

Enables or disables the organization-wide shared cache layer.

  • Type: boolean
  • Default: true
  • Effect: When false, only private edge caching is available regardless of default_tier.
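The two tier fields interact. In the illustrative fragment below, default_tier still names org_shared, but because org_shared_enabled is false, only private edge caching is available, as described above. Setting default_tier: private_edge explicitly in this situation makes the effective behavior easier to see during review.

```yaml
# Illustrative fragment: organization-wide shared layer turned off.
workflow_cache:
  enabled: true
  default_tier: org_shared     # has no effect while the shared layer is disabled below
  org_shared_enabled: false    # only private edge caching is available
```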

direct_semantic_replay_enabled

Enables direct semantic replay — returning a cached response when a new request is semantically similar to a previously cached one.

  • Type: boolean
  • Default: true
  • Effect: When false, only exact-match cache lookups occur. Semantic similarity matching is skipped.
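For workloads where only exact repeats should be served from cache, a fragment like the following turns off similarity matching while keeping exact-match lookups. Since similarity_threshold governs semantic replay hits, it should have no effect while semantic replay is disabled.

```yaml
# Illustrative fragment: exact-match caching only.
workflow_cache:
  enabled: true
  direct_semantic_replay_enabled: false   # semantic similarity matching is skipped
```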

codebase_identity_mode

Determines how codebase identity is computed for cache key generation.

  • Type: enum (repository_isolated | monorepo_group)
  • Default: repository_isolated
  • Effect: Controls whether each repository gets its own cache namespace or whether multiple repos share one.

similarity_threshold

The minimum cosine similarity score required for a semantic replay hit.

  • Type: float (0.0–1.0)
  • Default: 0.95
  • Effect: Lower values increase hit rate but may return less accurate cached responses.
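An illustrative fragment that tightens the threshold above the 0.95 default, trading some hit rate for closer matches. The value 0.98 is only an example; adjusting in small steps and watching the x-keeptrusts-cache hit/miss headers is one way to find a suitable setting.

```yaml
# Illustrative fragment: require closer semantic matches than the default.
workflow_cache:
  enabled: true
  direct_semantic_replay_enabled: true
  similarity_threshold: 0.98   # must lie in 0.0–1.0; higher means fewer but closer hits
```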

ttl_seconds

Time-to-live for cache entries in seconds.

  • Type: integer
  • Default: 86400 (24 hours)
  • Effect: Entries older than this are evicted and not returned on lookup.
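Because ttl_seconds is a plain number of seconds, it helps to record the arithmetic in a comment. For example, an illustrative one-week TTL:

```yaml
# Illustrative fragment: keep entries for 7 days.
workflow_cache:
  enabled: true
  ttl_seconds: 604800   # 7 days * 24 h * 3600 s = 604800
```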

max_entry_size_bytes

Maximum size of a single cache entry in bytes.

  • Type: integer
  • Default: 524288 (512 KB)
  • Effect: Responses larger than this are not cached.
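The size limit is a raw byte count; the default 524288 is 512 * 1024. An illustrative fragment that halves it to 256 KB, keeping cache storage focused on smaller responses:

```yaml
# Illustrative fragment: cache only responses up to 256 KB.
workflow_cache:
  enabled: true
  max_entry_size_bytes: 262144   # 256 * 1024; larger responses are not cached
```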

excluded_models

A list of model identifiers that bypass the cache entirely.

  • Type: array of strings
  • Default: []
  • Effect: Requests targeting these models always go to the upstream provider.

excluded_agents

A list of agent identifiers that bypass the cache entirely.

  • Type: array of strings
  • Default: []
  • Effect: Requests from these agents always go to the upstream provider.

Custom Configuration Example

```yaml
workflow_cache:
  enabled: true
  default_tier: org_shared
  org_shared_enabled: true
  direct_semantic_replay_enabled: true
  codebase_identity_mode: monorepo_group
  similarity_threshold: 0.92
  ttl_seconds: 172800
  max_entry_size_bytes: 1048576
  excluded_models:
    - "o1-preview"
    - "o1-mini"
  excluded_agents:
    - "security-scanner"
```

This example enables monorepo-group identity, relaxes the similarity threshold to 0.92, doubles the TTL to 48 hours, doubles the maximum entry size to 1 MB (1048576 bytes), and excludes two reasoning models (o1-preview and o1-mini) and the security-scanner agent from caching.

Minimal Configuration

If you only need the defaults with org-shared caching enabled:

```yaml
workflow_cache:
  enabled: true
```

All other fields fall back to their defaults.

Validation Steps

After deploying your config, verify the cache is active:

  1. Send a test request through the gateway.
  2. Check the response headers for x-keeptrusts-cache: miss on the first request.
  3. Send the same request again and verify x-keeptrusts-cache: hit.
  4. Check the spend log to confirm cached_input_tokens appears on subsequent requests.

If the cache is not activating, verify:

  • The gateway loaded the latest config version (check gateway logs for the config reload event).
  • The request model is not in excluded_models.
  • The request agent is not in excluded_agents.
  • The response size does not exceed max_entry_size_bytes.

Relationship to Other Scopes

The declarative config sets the baseline. Org settings, repo settings, and agent settings can override specific fields. When conflicts exist, the most restrictive value wins. See Controlling Direct Semantic Replay by Scope for the full precedence model.

For AI systems

  • Canonical terms: Keeptrusts, workflow_cache, declarative config, policy YAML, org_shared, private_edge, semantic replay, codebase_identity_mode, similarity_threshold.
  • Config keys: workflow_cache.enabled, workflow_cache.default_tier, workflow_cache.org_shared_enabled, workflow_cache.direct_semantic_replay_enabled, workflow_cache.codebase_identity_mode, workflow_cache.similarity_threshold, workflow_cache.ttl_seconds, workflow_cache.max_entry_size_bytes, workflow_cache.excluded_models, workflow_cache.excluded_agents.
  • Best next pages: Controlling Semantic Replay by Scope, Repository-Isolated vs Monorepo-Group, Per-Agent Cache Policies.

For engineers

  • Minimal config: workflow_cache: { enabled: true } — all other fields fall back to defaults.
  • Verify cache is active: send a test request, check x-keeptrusts-cache: miss header on first request, hit on second.
  • If the cache doesn’t activate: verify the gateway loaded the latest config (check the reload event in logs), the model is not in excluded_models, the agent is not in excluded_agents, and the response size is under max_entry_size_bytes.
  • This config sets the baseline. Org, repo, and agent settings can override; most restrictive wins.

For leaders

  • The declarative config is your single source of truth for caching behavior across all gateways loading that policy.
  • excluded_models lets you prevent caching for reasoning models (o1-preview) where response diversity is critical.
  • similarity_threshold balances cost savings against response accuracy — lower = more hits, higher = more precision.
  • max_entry_size_bytes prevents large responses from consuming disproportionate cache storage.

Next steps