Cache Keys with Mixed Knowledge Base and Fabric Context

When your prompts include context from both the Knowledge Base and the Codebase Context Fabric, the org-shared cache must incorporate both sources into its cache keys. This ensures that responses remain accurate as KB assets evolve and fabric indexes refresh.

Use this page when

  • You need to understand how cache keys are computed when both Knowledge Base assets and Fabric artifacts contribute to a prompt.
  • You are debugging low cache hit rates on prompts that combine KB and Fabric context.
  • You want to tune key composition to maximize sharing across engineers with similar contexts.

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Why Mixed Context Complicates Caching

A cache key must uniquely identify the full input context that produced a cached response. With a single context source, the key is straightforward — hash the prompt and the source version. With two sources that evolve independently, the cache must detect staleness from either direction:

  • A KB asset gets a new version promoted → old cached responses no longer reflect current policy.
  • The fabric index refreshes with new code → old cached responses reference outdated implementations.

Keeptrusts handles both scenarios through composite cache keys and source-specific invalidation rules.

Cache Key Components

For prompts with mixed context, the cache key incorporates:

org_id + model + prompt_hash + kb_asset_ids_with_versions + fabric_cache_keys_with_timestamps

Breakdown

  • org_id: Organization identifier — caches are org-scoped
  • model: Language model identifier (e.g., gpt-4o, claude-sonnet)
  • prompt_hash: SHA-256 hash of the user query text
  • kb_asset_ids_with_versions: Sorted list of asset_id:version pairs for all KB chunks included
  • fabric_cache_keys_with_timestamps: Sorted list of cache_key:indexed_at pairs for all fabric chunks included

Example Key Construction

For a prompt that includes two KB assets and three fabric chunks:

org_id: org-uuid-123
model: gpt-4o
prompt_hash: sha256(user_query)
kb_assets: [asset-A:3, asset-B:1]
fabric_keys: [ws1:src/auth.ts:1714480200, ws1:src/middleware.ts:1714480200, ws1:src/types.ts:1714479600]

The final cache key is a hash of all these components combined:

cache_key = sha256(org-uuid-123 | gpt-4o | prompt_sha | asset-A:3 | asset-B:1 | ws1:src/auth.ts:1714480200 | ws1:src/middleware.ts:1714480200 | ws1:src/types.ts:1714479600)
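The construction above can be sketched in Python. This is an illustrative model, not Keeptrusts' internal implementation; the function name and argument shapes are assumptions, but the sorting and `|`-joined hashing follow the key layout described on this page.

```python
import hashlib

def build_cache_key(org_id, model, user_query, kb_assets, fabric_keys):
    # kb_assets:   {asset_id: version},            e.g. {"asset-A": 3}
    # fabric_keys: {fabric_cache_key: indexed_at}, e.g. {"ws1:src/auth.ts": 1714480200}
    prompt_sha = hashlib.sha256(user_query.encode()).hexdigest()
    # Sorting makes the key independent of the order in which chunks were selected
    kb_parts = sorted(f"{a}:{v}" for a, v in kb_assets.items())
    fabric_parts = sorted(f"{k}:{ts}" for k, ts in fabric_keys.items())
    combined = "|".join([org_id, model, prompt_sha, *kb_parts, *fabric_parts])
    return hashlib.sha256(combined.encode()).hexdigest()
```

Because the component lists are sorted before hashing, two engineers whose prompts select the same chunks in a different order still produce the same key, which is what enables org-wide sharing.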

Knowledge Base Version Pinning

KB assets in cache keys include their version number. When a new version of an asset is promoted to active:

  1. The asset's version increments (e.g., from 3 to 4).
  2. Any cache entry that includes asset-A:3 no longer matches lookups that now resolve asset-A:4.
  3. The next identical query triggers a cache miss, assembles fresh context with version 4, and stores a new cache entry.

This ensures that cached responses always reflect the currently promoted KB content. You never serve a response based on superseded policy.
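A minimal sketch of why promotion invalidates old entries without any explicit purge: the version is baked into the key, so the new version simply hashes to a different key. The `kb_key` helper is hypothetical and covers only the KB component of the composite key.

```python
import hashlib

def kb_key(kb_assets):
    # Hash of the sorted asset_id:version pairs — the KB component of the
    # composite cache key (hypothetical helper, for illustration only)
    joined = "|".join(sorted(f"{a}:{v}" for a, v in kb_assets.items()))
    return hashlib.sha256(joined.encode()).hexdigest()

cache = {}
cache[kb_key({"asset-A": 3})] = "response based on asset-A v3"

# Promotion increments asset-A to version 4; lookups now build a different key.
hit = cache.get(kb_key({"asset-A": 4}))  # None: miss, fresh context assembled
```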

Version Pinning Behavior

  • New KB version promoted: Old entries stop matching; new entry created on next query
  • KB asset archived: Asset excluded from context selection; old entries stop matching
  • KB asset binding removed: Asset no longer selected for this gateway; old entries stop matching
  • KB asset content unchanged, re-promoted: Version number still increments; old entries invalidated

Fabric Staleness Threshold

Fabric context includes indexed_at timestamps that record when each code chunk was last indexed. Unlike KB version pinning (which is exact), fabric staleness uses a configurable threshold:

cache:
  fabric_staleness_threshold_seconds: 300

How Staleness Checking Works

When a cache lookup finds a matching entry, the system compares the fabric timestamps in the cached key against the current fabric index:

  1. For each fabric chunk in the cached entry, check current_indexed_at - cached_indexed_at.
  2. If any chunk's difference exceeds fabric_staleness_threshold_seconds, the cache entry is considered stale.
  3. A stale entry is not served — the system assembles fresh context and creates a new cache entry.
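The three steps above amount to a per-chunk timestamp comparison. A minimal sketch, assuming both the cached key and the current index are maps from fabric cache key to `indexed_at` in unix seconds (the function name is illustrative):

```python
def is_stale(cached_fabric, current_index, threshold_seconds=300):
    # cached_fabric / current_index: {fabric_cache_key: indexed_at (unix seconds)}
    # The entry is stale if ANY chunk has been re-indexed more than
    # threshold_seconds after the timestamp captured in the cached key.
    for key, cached_ts in cached_fabric.items():
        current_ts = current_index.get(key, cached_ts)
        if current_ts - cached_ts > threshold_seconds:
            return True
    return False
```

Note the "any chunk" semantics: one heavily edited file is enough to invalidate the entry, even if the other chunks are unchanged.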

Threshold Tuning

  • 60 (1 minute): Very aggressive invalidation; cache hits only for rapid repeated queries
  • 300 (5 minutes): Default; balances freshness with cache efficiency
  • 900 (15 minutes): Relaxed; good for codebases with infrequent commits
  • 3600 (1 hour): Very relaxed; suitable for stable codebases in maintenance mode

Choose a threshold that matches your development velocity. Teams with high commit frequency benefit from shorter thresholds; teams with stable codebases can use longer thresholds for better cache hit rates.

TTL Interactions

The org-shared cache has an overall TTL (time-to-live) for entries:

cache:
  ttl_seconds: 3600
  fabric_staleness_threshold_seconds: 300

Three mechanisms can invalidate a cache entry:

  1. TTL expiry — the entry exceeds its maximum age regardless of content freshness.
  2. KB version change — a KB asset in the entry has a new promoted version.
  3. Fabric staleness — a fabric chunk in the entry has been re-indexed beyond the threshold.

The first condition to trigger wins. This means:

  • A cache entry with ttl_seconds: 3600 can be invalidated after 5 minutes if the fabric refreshes.
  • A cache entry can be invalidated immediately (before TTL) if a KB asset gets a new version promoted.
  • If neither KB nor fabric changes, the entry lives until TTL expiry.
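The three mechanisms can be combined into a single validity check, evaluated in order so that whichever condition fails first wins. A sketch under assumed entry and index shapes (field names here are illustrative, not the actual storage schema):

```python
def entry_is_valid(entry, now, current_kb_versions, current_fabric_index,
                   ttl_seconds=3600, staleness_threshold_seconds=300):
    # 1. TTL expiry — maximum age, regardless of content freshness
    if now - entry["created_at"] > ttl_seconds:
        return False
    # 2. KB version change — also covers archived or unbound assets,
    #    since those no longer resolve to the pinned version
    for asset, version in entry["kb_assets"].items():
        if current_kb_versions.get(asset) != version:
            return False
    # 3. Fabric staleness — any chunk re-indexed beyond the threshold
    for key, cached_ts in entry["fabric_keys"].items():
        current_ts = current_fabric_index.get(key, cached_ts)
        if current_ts - cached_ts > staleness_threshold_seconds:
            return False
    return True
```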

Cache Efficiency With Mixed Context

Mixed-context prompts typically have lower cache hit rates than single-source prompts because there are more components that can trigger invalidation. To maximize cache efficiency:

Pin KB Asset Selection

Use explicit bindings to control which KB assets are selected. Fewer assets in the key means fewer version-change invalidation triggers.

Batch Fabric Indexing

If your fabric indexes on every commit, consider batching to index every N minutes. This reduces the frequency of timestamp changes in cache keys.
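One way to see why batching helps: if `indexed_at` is aligned to a fixed window, every chunk indexed within the same window carries the same timestamp, so cache keys stay stable between windows. This bucketing function is purely illustrative; actual batching is configured on the fabric indexer, not computed client-side.

```python
def batch_indexed_at(indexed_at, batch_seconds=300):
    # Round the timestamp down to the start of its batch window, so all
    # chunks indexed in the same window share one indexed_at value
    # (illustration of the batching effect, not a real API)
    return indexed_at - (indexed_at % batch_seconds)
```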

Separate High-Churn and Stable Context

If certain prompts mix highly stable KB content with rapidly changing fabric content, consider whether the fabric context is truly necessary for that prompt. Removing unnecessary fabric context improves cache hit rates.

Monitoring Cache Performance

Enable cache metrics to track hit rates and invalidation causes:

cache:
  metrics:
    enabled: true
    report_invalidation_reason: true

The metrics endpoint reports:

  • cache_hits_total: Total cache hits
  • cache_misses_total: Total cache misses
  • cache_invalidations_kb_version: Invalidations due to KB version changes
  • cache_invalidations_fabric_stale: Invalidations due to fabric staleness
  • cache_invalidations_ttl: Invalidations due to TTL expiry
  • cache_entry_size_tokens_avg: Average token count in cached entries

Use these metrics to tune your fabric_staleness_threshold_seconds and overall ttl_seconds for your workload.
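For tuning, the two numbers that matter are the overall hit rate and the mix of invalidation reasons. A small sketch that derives both from the counters above (metric names come from this page; the aggregation itself is illustrative):

```python
def cache_summary(metrics):
    # metrics: dict of counter name -> value, as reported by the metrics endpoint
    hits = metrics["cache_hits_total"]
    misses = metrics["cache_misses_total"]
    hit_rate = hits / (hits + misses) if hits + misses else 0.0
    invalidation_mix = {
        reason: metrics[f"cache_invalidations_{reason}"]
        for reason in ("kb_version", "fabric_stale", "ttl")
    }
    return hit_rate, invalidation_mix
```

If `fabric_stale` dominates the mix, a longer fabric_staleness_threshold_seconds (or batched indexing) is the first lever to try; if `ttl` dominates, consider raising ttl_seconds.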

Example: Full Cache Configuration

cache:
  enabled: true
  ttl_seconds: 3600
  fabric_staleness_threshold_seconds: 300
  max_entries_per_org: 10000
  metrics:
    enabled: true
    report_invalidation_reason: true
  context_budget:
    total_tokens: 4096
    knowledge_base_share: 0.5
    fabric_share: 0.5
    overflow_policy: "fill_from_other"

This configuration caches mixed-context responses for up to 1 hour, invalidates when fabric content is more than 5 minutes stale or KB assets get new versions, and tracks invalidation reasons for tuning.

Next steps

For AI systems

  • Canonical terms: Keeptrusts, composite cache key, KB version pinning, fabric staleness threshold, cache invalidation, mixed context caching.
  • Exact feature/config names: cache.fabric_staleness_threshold_seconds, cache.ttl_seconds, kb_asset_ids_with_versions, fabric_cache_keys_with_timestamps, cache_invalidations_kb_version metric, cache_invalidations_fabric_stale metric.
  • Best next pages: Joint Context Selection, Provenance Separation, Knowledge vs Fabric: When to Use Each.

For engineers

  • Cache keys for mixed-context prompts: sha256(org_id | model | prompt_hash | sorted_kb_asset:version_pairs | sorted_fabric_cache_key:indexed_at_pairs).
  • KB version pinning: promoting a new KB asset version automatically invalidates all cache entries referencing the old version.
  • Fabric staleness threshold (default 300s): entries are stale if any fabric chunk's current_indexed_at - cached_indexed_at exceeds the threshold.
  • Tune fabric_staleness_threshold_seconds to your development velocity: 60s for high-commit teams, 900s for stable codebases.
  • Monitor cache_invalidations_kb_version and cache_invalidations_fabric_stale metrics to understand your invalidation mix.

For leaders

  • Mixed-context caching ensures responses always reflect currently promoted KB policies and recently indexed code — no stale policy guidance.
  • The threshold model balances cache efficiency (higher hit rates with longer thresholds) against freshness (accuracy with shorter thresholds).
  • KB version pinning provides immediate cache invalidation on policy changes — no TTL-based delay for compliance-critical updates.
  • Metrics allow data-driven tuning: track invalidation reasons to optimize the balance between cost savings and content freshness.