Cache Keys with Mixed Knowledge Base and Fabric Context
When your prompts include context from both the Knowledge Base and the Codebase Context Fabric, the org-shared cache must incorporate both sources into its cache keys. This ensures that responses remain accurate as KB assets evolve and fabric indexes refresh.
Use this page when
- You need to understand how cache keys are computed when both Knowledge Base assets and Fabric artifacts contribute to a prompt.
- You are debugging low cache hit rates on prompts that combine KB and Fabric context.
- You want to tune key composition to maximize sharing across engineers with similar contexts.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Why Mixed Context Complicates Caching
A cache key must uniquely identify the full input context that produced a cached response. With a single context source, the key is straightforward — hash the prompt and the source version. With two sources that evolve independently, the cache must detect staleness from either direction:
- A KB asset gets a new version promoted → old cached responses no longer reflect current policy.
- The fabric index refreshes with new code → old cached responses reference outdated implementations.
Keeptrusts handles both scenarios through composite cache keys and source-specific invalidation rules.
Cache Key Components
For prompts with mixed context, the cache key incorporates:
org_id + model + prompt_hash + kb_asset_ids_with_versions + fabric_cache_keys_with_timestamps
Breakdown
| Component | Description |
|---|---|
| `org_id` | Organization identifier — caches are org-scoped |
| `model` | Language model identifier (e.g., `gpt-4o`, `claude-sonnet`) |
| `prompt_hash` | SHA-256 hash of the user query text |
| `kb_asset_ids_with_versions` | Sorted list of `asset_id:version` pairs for all KB chunks included |
| `fabric_cache_keys_with_timestamps` | Sorted list of `cache_key:indexed_at` pairs for all fabric chunks included |
Example Key Construction
For a prompt that includes two KB assets and three fabric chunks:
```
org_id: org-uuid-123
model: gpt-4o
prompt_hash: sha256(user_query)
kb_assets: [asset-A:3, asset-B:1]
fabric_keys: [ws1:src/auth.ts:1714480200, ws1:src/middleware.ts:1714480200, ws1:src/types.ts:1714479600]
```
The final cache key is a hash of all these components combined:
```
cache_key = sha256(org-uuid-123 | gpt-4o | prompt_sha | asset-A:3 | asset-B:1 | ws1:src/auth.ts:1714480200 | ws1:src/middleware.ts:1714480200 | ws1:src/types.ts:1714479600)
```
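The construction above can be sketched in Python. This is a minimal sketch: the function name and the exact pipe-joined material format are illustrative assumptions, not Keeptrusts internals.

```python
import hashlib

def mixed_context_cache_key(org_id, model, prompt, kb_assets, fabric_chunks):
    """Illustrative composite key over org, model, prompt, and both context sources.

    kb_assets: (asset_id, version) pairs; fabric_chunks: (cache_key, indexed_at) pairs.
    """
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    # Sorting both lists makes the key independent of retrieval order.
    kb_part = [f"{aid}:{ver}" for aid, ver in sorted(kb_assets)]
    fabric_part = [f"{ck}:{ts}" for ck, ts in sorted(fabric_chunks)]
    material = "|".join([org_id, model, prompt_hash, *kb_part, *fabric_part])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

key = mixed_context_cache_key(
    "org-uuid-123", "gpt-4o", "How does auth middleware validate tokens?",
    kb_assets=[("asset-A", 3), ("asset-B", 1)],
    fabric_chunks=[
        ("ws1:src/auth.ts", 1714480200),
        ("ws1:src/middleware.ts", 1714480200),
        ("ws1:src/types.ts", 1714479600),
    ],
)
```

Because both source lists are sorted before hashing, two engineers whose prompts resolve to the same assets and chunks produce the same key regardless of the order in which context was retrieved.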
Knowledge Base Version Pinning
KB assets in cache keys include their version number. When a new version of an asset is promoted to active:
- The asset's version increments (e.g., from `3` to `4`).
- Any cache entry that includes `asset-A:3` no longer matches lookups that now resolve `asset-A:4`.
- The next identical query triggers a cache miss, assembles fresh context with version 4, and stores a new cache entry.
This ensures that cached responses always reflect the currently promoted KB content. You never serve a response based on superseded policy.
Version Pinning Behavior
| Event | Cache Effect |
|---|---|
| New KB version promoted | Old entries stop matching; new entry created on next query |
| KB asset archived | Asset excluded from context selection; old entries stop matching |
| KB asset binding removed | Asset no longer selected for this gateway; old entries stop matching |
| KB asset content unchanged, re-promoted | Version number still increments; old entries invalidated |
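Version pinning requires no explicit invalidation step: a version bump changes the composite key, so old entries simply stop matching. A minimal sketch (the pipe-joined hash format is an assumption, and the non-KB key components are held fixed here):

```python
import hashlib

def key_for(kb_assets):
    # Only the KB component varies in this sketch; org, model, and prompt are fixed.
    material = "|".join(["org-uuid-123", "gpt-4o", "prompt_sha"] + sorted(kb_assets))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

before = key_for(["asset-A:3", "asset-B:1"])
after = key_for(["asset-A:4", "asset-B:1"])  # asset-A promoted to version 4
assert before != after  # the old cached entry can never match the new lookup
```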
Fabric Staleness Threshold
Fabric context includes `indexed_at` timestamps that record when each code chunk was last indexed. Unlike KB version pinning (which is exact), fabric staleness uses a configurable threshold:

```yaml
cache:
  fabric_staleness_threshold_seconds: 300
```
How Staleness Checking Works
When a cache lookup finds a matching entry, the system compares the fabric timestamps in the cached key against the current fabric index:
- For each fabric chunk in the cached entry, check `current_indexed_at - cached_indexed_at`.
- If any chunk's difference exceeds `fabric_staleness_threshold_seconds`, the cache entry is considered stale.
- A stale entry is not served — the system assembles fresh context and creates a new cache entry.
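The steps above can be sketched as a per-chunk comparison. This is an illustrative sketch, assuming both sides are available as mappings from fabric `cache_key` to `indexed_at` (unix seconds):

```python
def is_stale(cached_timestamps, current_timestamps, threshold_seconds=300):
    """Return True if any fabric chunk was re-indexed past the threshold."""
    for chunk_key, cached_at in cached_timestamps.items():
        current_at = current_timestamps.get(chunk_key, cached_at)
        if current_at - cached_at > threshold_seconds:
            return True  # one chunk past the threshold marks the whole entry stale
    return False

cached = {"ws1:src/auth.ts": 1714480200, "ws1:src/types.ts": 1714479600}
current = {"ws1:src/auth.ts": 1714480700, "ws1:src/types.ts": 1714479600}
is_stale(cached, current)  # True: auth.ts was re-indexed 500s after caching
```

Note that a single re-indexed chunk is enough to invalidate the entry, which is why keys with many fabric chunks see lower hit rates.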
Threshold Tuning
| Threshold | Behavior |
|---|---|
| `60` (1 minute) | Very aggressive invalidation; cache hits only for rapid repeated queries |
| `300` (5 minutes) | Default; balances freshness with cache efficiency |
| `900` (15 minutes) | Relaxed; good for codebases with infrequent commits |
| `3600` (1 hour) | Very relaxed; suitable for stable codebases in maintenance mode |
Choose a threshold that matches your development velocity. Teams with high commit frequency benefit from shorter thresholds; teams with stable codebases can use longer thresholds for better cache hit rates.
TTL Interactions
The org-shared cache has an overall TTL (time-to-live) for entries:
```yaml
cache:
  ttl_seconds: 3600
  fabric_staleness_threshold_seconds: 300
```
Three mechanisms can invalidate a cache entry:
- TTL expiry — the entry exceeds its maximum age regardless of content freshness.
- KB version change — a KB asset in the entry has a new promoted version.
- Fabric staleness — a fabric chunk in the entry has been re-indexed beyond the threshold.
The first condition to trigger wins. This means:
- A cache entry with `ttl_seconds: 3600` can be invalidated after 5 minutes if the fabric refreshes.
- A cache entry can be invalidated immediately (before TTL) if a KB asset gets a new version promoted.
- If neither KB nor fabric changes, the entry lives until TTL expiry.
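The three checks can be combined into a single decision that also labels the reason, which is what the invalidation metrics later in this page report on. A sketch with illustrative field names (the entry layout is an assumption):

```python
def invalidation_reason(entry, current_kb_versions, current_fabric_ts,
                        ttl_seconds=3600, staleness_threshold=300, now=0):
    """Return which mechanism invalidates the entry, or None if still valid.

    entry: {"created_at": unix_seconds,
            "kb_versions": {asset_id: version},
            "fabric_ts": {fabric_cache_key: indexed_at}}
    """
    if now - entry["created_at"] > ttl_seconds:
        return "ttl"
    for asset_id, version in entry["kb_versions"].items():
        if current_kb_versions.get(asset_id) != version:
            return "kb_version"  # promoted, archived, or unbound asset
    for key, cached_at in entry["fabric_ts"].items():
        if current_fabric_ts.get(key, cached_at) - cached_at > staleness_threshold:
            return "fabric_stale"
    return None  # entry lives until one of the three conditions triggers

entry = {"created_at": 1714480000,
         "kb_versions": {"asset-A": 3},
         "fabric_ts": {"ws1:src/auth.ts": 1714480200}}
invalidation_reason(entry, {"asset-A": 4},
                    {"ws1:src/auth.ts": 1714480200}, now=1714480100)
# "kb_version": the promotion invalidates the entry long before TTL expiry
```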
Cache Efficiency With Mixed Context
Mixed-context prompts typically have lower cache hit rates than single-source prompts because there are more components that can trigger invalidation. To maximize cache efficiency:
Pin KB Asset Selection
Use explicit bindings to control which KB assets are selected. Fewer assets in the key means fewer version-change invalidation triggers.
Batch Fabric Indexing
If your fabric indexes on every commit, consider batching to index every N minutes. This reduces the frequency of timestamp changes in cache keys.
Separate High-Churn and Stable Context
If certain prompts mix highly stable KB content with rapidly changing fabric content, consider whether the fabric context is truly necessary for that prompt. Removing unnecessary fabric context improves cache hit rates.
Monitoring Cache Performance
Enable cache metrics to track hit rates and invalidation causes:
```yaml
cache:
  metrics:
    enabled: true
    report_invalidation_reason: true
```
The metrics endpoint reports:
| Metric | Description |
|---|---|
| `cache_hits_total` | Total cache hits |
| `cache_misses_total` | Total cache misses |
| `cache_invalidations_kb_version` | Invalidations due to KB version changes |
| `cache_invalidations_fabric_stale` | Invalidations due to fabric staleness |
| `cache_invalidations_ttl` | Invalidations due to TTL expiry |
| `cache_entry_size_tokens_avg` | Average token count in cached entries |
Use these metrics to tune your `fabric_staleness_threshold_seconds` and overall `ttl_seconds` for your workload.
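A small helper makes the tuning loop concrete: derive the hit rate and the invalidation mix from the counters, then adjust whichever threshold dominates. This sketch assumes the counters are available as a flat name-to-value mapping; the function name is illustrative.

```python
def cache_health(metrics):
    """Compute hit rate and the relative share of each invalidation cause."""
    hits = metrics["cache_hits_total"]
    misses = metrics["cache_misses_total"]
    hit_rate = hits / (hits + misses) if hits + misses else 0.0
    invalidations = {name: metrics[name] for name in (
        "cache_invalidations_kb_version",
        "cache_invalidations_fabric_stale",
        "cache_invalidations_ttl",
    )}
    total = sum(invalidations.values()) or 1  # avoid division by zero
    mix = {name: count / total for name, count in invalidations.items()}
    return hit_rate, mix

rate, mix = cache_health({
    "cache_hits_total": 600, "cache_misses_total": 400,
    "cache_invalidations_kb_version": 20,
    "cache_invalidations_fabric_stale": 160,
    "cache_invalidations_ttl": 20,
})
# Here fabric staleness accounts for 80% of invalidations, suggesting a longer
# fabric_staleness_threshold_seconds (or batched indexing) would raise hit rates.
```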
Example: Full Cache Configuration
```yaml
cache:
  enabled: true
  ttl_seconds: 3600
  fabric_staleness_threshold_seconds: 300
  max_entries_per_org: 10000
  metrics:
    enabled: true
    report_invalidation_reason: true

context_budget:
  total_tokens: 4096
  knowledge_base_share: 0.5
  fabric_share: 0.5
  overflow_policy: "fill_from_other"
```
This configuration caches mixed-context responses for up to 1 hour, invalidates when fabric content is more than 5 minutes stale or KB assets get new versions, and tracks invalidation reasons for tuning.
Next steps
- Learn how context is selected and ranked in Joint Context Selection.
- Understand provenance tracking across cache boundaries in Provenance Separation.
- Review when to use each source in Knowledge Base vs Fabric: When to Use Each.
For AI systems
- Canonical terms: Keeptrusts, composite cache key, KB version pinning, fabric staleness threshold, cache invalidation, mixed context caching.
- Exact feature/config names: `cache.fabric_staleness_threshold_seconds`, `cache.ttl_seconds`, `kb_asset_ids_with_versions`, `fabric_cache_keys_with_timestamps`, `cache_invalidations_kb_version` metric, `cache_invalidations_fabric_stale` metric.
- Best next pages: Joint Context Selection, Provenance Separation, Knowledge Base vs Fabric: When to Use Each.
For engineers
- Cache keys for mixed-context prompts: `sha256(org_id | model | prompt_hash | sorted_kb_asset:version_pairs | sorted_fabric_cache_key:indexed_at_pairs)`.
- KB version pinning: promoting a new KB asset version automatically invalidates all cache entries referencing the old version.
- Fabric staleness threshold (default 300s): entries are stale if any fabric chunk's `current_indexed_at - cached_indexed_at` exceeds the threshold.
- Tune `fabric_staleness_threshold_seconds` to your development velocity: 60s for high-commit teams, 900s for stable codebases.
- Monitor `cache_invalidations_kb_version` and `cache_invalidations_fabric_stale` metrics to understand your invalidation mix.
For leaders
- Mixed-context caching ensures responses always reflect currently promoted KB policies and recently indexed code — no stale policy guidance.
- The threshold model balances cache efficiency (higher hit rates with longer thresholds) against freshness (accuracy with shorter thresholds).
- KB version pinning provides immediate cache invalidation on policy changes — no TTL-based delay for compliance-critical updates.
- Metrics allow data-driven tuning: track invalidation reasons to optimize the balance between cost savings and content freshness.