Cache Keys with Mixed Knowledge Base and Fabric Context
When your prompts include context from both the Knowledge Base and the Codebase Context Fabric, the org-shared cache must incorporate both sources into its cache keys. This ensures that responses remain accurate as KB assets evolve and fabric indexes refresh.
Use this page when
- You need to understand how cache keys are computed when both Knowledge Base assets and Fabric artifacts contribute to a prompt.
- You are debugging low cache hit rates on prompts that combine KB and Fabric context.
- You want to tune key composition to maximize sharing across engineers with similar contexts.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Why Mixed Context Complicates Caching
A cache key must uniquely identify the full input context that produced a cached response. With a single context source, the key is straightforward — hash the prompt and the source version. With two sources that evolve independently, the cache must detect staleness from either direction:
- A KB asset gets a new version promoted → old cached responses no longer reflect current policy.
- The fabric index refreshes with new code → old cached responses reference outdated implementations.
Keeptrusts handles both scenarios through composite cache keys and source-specific invalidation rules.
Cache Key Components
For prompts with mixed context, the cache key incorporates:
org_id + model + prompt_hash + kb_asset_ids_with_versions + fabric_cache_keys_with_timestamps
Breakdown
| Component | Description |
|---|---|
| `org_id` | Organization identifier — caches are org-scoped |
| `model` | Language model identifier (e.g., `gpt-4o`, `claude-sonnet`) |
| `prompt_hash` | SHA-256 hash of the user query text |
| `kb_asset_ids_with_versions` | Sorted list of `asset_id:version` pairs for all KB chunks included |
| `fabric_cache_keys_with_timestamps` | Sorted list of `cache_key:indexed_at` pairs for all fabric chunks included |
Example Key Construction
For a prompt that includes two KB assets and three fabric chunks:
```
org_id: org-uuid-123
model: gpt-4o
prompt_hash: sha256(user_query)
kb_assets: [asset-A:3, asset-B:1]
fabric_keys: [ws1:src/auth.ts:1714480200, ws1:src/middleware.ts:1714480200, ws1:src/types.ts:1714479600]
```
The final cache key is a hash of all these components combined:
```
cache_key = sha256(org-uuid-123 | gpt-4o | prompt_sha | asset-A:3 | asset-B:1 | ws1:src/auth.ts:1714480200 | ws1:src/middleware.ts:1714480200 | ws1:src/types.ts:1714479600)
```
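The construction above can be sketched in Python. This is a minimal sketch: the function name and the exact pipe-joined material format are illustrative assumptions, not Keeptrusts internals.

```python
import hashlib

def mixed_context_cache_key(org_id, model, prompt, kb_assets, fabric_chunks):
    """Illustrative composite key over org, model, prompt, and both context sources.

    kb_assets: (asset_id, version) pairs; fabric_chunks: (cache_key, indexed_at) pairs.
    """
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    # Sorting both lists makes the key independent of retrieval order.
    kb_part = [f"{aid}:{ver}" for aid, ver in sorted(kb_assets)]
    fabric_part = [f"{ck}:{ts}" for ck, ts in sorted(fabric_chunks)]
    material = "|".join([org_id, model, prompt_hash, *kb_part, *fabric_part])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

key = mixed_context_cache_key(
    "org-uuid-123", "gpt-4o", "How does auth middleware validate tokens?",
    kb_assets=[("asset-A", 3), ("asset-B", 1)],
    fabric_chunks=[
        ("ws1:src/auth.ts", 1714480200),
        ("ws1:src/middleware.ts", 1714480200),
        ("ws1:src/types.ts", 1714479600),
    ],
)
```

Because both source lists are sorted before hashing, two engineers whose prompts resolve to the same assets and chunks produce the same key regardless of the order in which context was retrieved.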
Knowledge Base Version Pinning
KB assets in cache keys include their version number. When a new version of an asset is promoted to active:
- The asset's version increments (e.g., from `3` to `4`).
- Any cache entry that includes `asset-A:3` no longer matches lookups that now resolve `asset-A:4`.
- The next identical query triggers a cache miss, assembles fresh context with version 4, and stores a new cache entry.
This ensures that cached responses always reflect the currently promoted KB content. You never serve a response based on superseded policy.
Version Pinning Behavior
| Event | Cache Effect |
|---|---|
| New KB version promoted | Old entries stop matching; new entry created on next query |
| KB asset archived | Asset excluded from context selection; old entries stop matching |
| KB asset binding removed | Asset no longer selected for this gateway; old entries stop matching |
| KB asset content unchanged, re-promoted | Version number still increments; old entries invalidated |
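Version pinning requires no explicit invalidation step: a version bump changes the composite key, so old entries simply stop matching. A minimal sketch (the pipe-joined hash format is an assumption, and the non-KB key components are held fixed here):

```python
import hashlib

def key_for(kb_assets):
    # Only the KB component varies in this sketch; org, model, and prompt are fixed.
    material = "|".join(["org-uuid-123", "gpt-4o", "prompt_sha"] + sorted(kb_assets))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

before = key_for(["asset-A:3", "asset-B:1"])
after = key_for(["asset-A:4", "asset-B:1"])  # asset-A promoted to version 4
assert before != after  # the old cached entry can never match the new lookup
```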
Fabric Staleness Threshold
Fabric context includes `indexed_at` timestamps that record when each code chunk was last indexed. Unlike KB version pinning (which is exact), fabric staleness uses a configurable threshold:

```yaml
cache:
  fabric_staleness_threshold_seconds: 300
```
How Staleness Checking Works
When a cache lookup finds a matching entry, the system compares the fabric timestamps in the cached key against the current fabric index:
- For each fabric chunk in the cached entry, check `current_indexed_at - cached_indexed_at`.
- If any chunk's difference exceeds `fabric_staleness_threshold_seconds`, the cache entry is considered stale.
- A stale entry is not served — the system assembles fresh context and creates a new cache entry.
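The steps above can be sketched as a per-chunk comparison. This is an illustrative sketch, assuming both sides are available as mappings from fabric `cache_key` to `indexed_at` (unix seconds):

```python
def is_stale(cached_timestamps, current_timestamps, threshold_seconds=300):
    """Return True if any fabric chunk was re-indexed past the threshold."""
    for chunk_key, cached_at in cached_timestamps.items():
        current_at = current_timestamps.get(chunk_key, cached_at)
        if current_at - cached_at > threshold_seconds:
            return True  # one chunk past the threshold marks the whole entry stale
    return False

cached = {"ws1:src/auth.ts": 1714480200, "ws1:src/types.ts": 1714479600}
current = {"ws1:src/auth.ts": 1714480700, "ws1:src/types.ts": 1714479600}
is_stale(cached, current)  # True: auth.ts was re-indexed 500s after caching
```

Note that a single re-indexed chunk is enough to invalidate the entry, which is why keys with many fabric chunks see lower hit rates.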
Threshold Tuning
| Threshold | Behavior |
|---|---|
| `60` (1 minute) | Very aggressive invalidation; cache hits only for rapid repeated queries |
| `300` (5 minutes) | Default; balances freshness with cache efficiency |
| `900` (15 minutes) | Relaxed; good for codebases with infrequent commits |
| `3600` (1 hour) | Very relaxed; suitable for stable codebases in maintenance mode |
Choose a threshold that matches your development velocity. Teams with high commit frequency benefit from shorter thresholds; teams with stable codebases can use longer thresholds for better cache hit rates.
TTL Interactions
The org-shared cache has an overall TTL (time-to-live) for entries:
```yaml
cache:
  ttl_seconds: 3600
  fabric_staleness_threshold_seconds: 300
```
Three mechanisms can invalidate a cache entry:
- TTL expiry — the entry exceeds its maximum age regardless of content freshness.
- KB version change — a KB asset in the entry has a new promoted version.
- Fabric staleness — a fabric chunk in the entry has been re-indexed beyond the threshold.
The first condition to trigger wins. This means:
- A cache entry with `ttl_seconds: 3600` can be invalidated after 5 minutes if the fabric refreshes.
- A cache entry can be invalidated immediately (before TTL) if a KB asset gets a new version promoted.
- If neither KB nor fabric changes, the entry lives until TTL expiry.
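The three checks can be combined into a single decision that also labels the reason, which is what the invalidation metrics later in this page report on. A sketch with illustrative field names (the entry layout is an assumption):

```python
def invalidation_reason(entry, current_kb_versions, current_fabric_ts,
                        ttl_seconds=3600, staleness_threshold=300, now=0):
    """Return which mechanism invalidates the entry, or None if still valid.

    entry: {"created_at": unix_seconds,
            "kb_versions": {asset_id: version},
            "fabric_ts": {fabric_cache_key: indexed_at}}
    """
    if now - entry["created_at"] > ttl_seconds:
        return "ttl"
    for asset_id, version in entry["kb_versions"].items():
        if current_kb_versions.get(asset_id) != version:
            return "kb_version"  # promoted, archived, or unbound asset
    for key, cached_at in entry["fabric_ts"].items():
        if current_fabric_ts.get(key, cached_at) - cached_at > staleness_threshold:
            return "fabric_stale"
    return None  # entry lives until one of the three conditions triggers

entry = {"created_at": 1714480000,
         "kb_versions": {"asset-A": 3},
         "fabric_ts": {"ws1:src/auth.ts": 1714480200}}
invalidation_reason(entry, {"asset-A": 4},
                    {"ws1:src/auth.ts": 1714480200}, now=1714480100)
# "kb_version": the promotion invalidates the entry long before TTL expiry
```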
Cache Efficiency With Mixed Context
Mixed-context prompts typically have lower cache hit rates than single-source prompts because there are more components that can trigger invalidation. To maximize cache efficiency:
Pin KB Asset Selection
Use explicit bindings to control which KB assets are selected. Fewer assets in the key means fewer version-change invalidation triggers.
Batch Fabric Indexing
If your fabric indexes on every commit, consider batching to index every N minutes. This reduces the frequency of timestamp changes in cache keys.
Separate High-Churn and Stable Context
If certain prompts mix highly stable KB content with rapidly changing fabric content, consider whether the fabric context is truly necessary for that prompt. Removing unnecessary fabric context improves cache hit rates.
Monitoring Cache Performance
Enable cache metrics to track hit rates and invalidation causes:
```yaml
cache:
  metrics:
    enabled: true
    report_invalidation_reason: true
```
The metrics endpoint reports:
| Metric | Description |
|---|---|
| `cache_hits_total` | Total cache hits |
| `cache_misses_total` | Total cache misses |
| `cache_invalidations_kb_version` | Invalidations due to KB version changes |
| `cache_invalidations_fabric_stale` | Invalidations due to fabric staleness |
| `cache_invalidations_ttl` | Invalidations due to TTL expiry |
| `cache_entry_size_tokens_avg` | Average token count in cached entries |
Use these metrics to tune your `fabric_staleness_threshold_seconds` and overall `ttl_seconds` for your workload.
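A small helper makes the tuning loop concrete: derive the hit rate and the invalidation mix from the counters, then adjust whichever threshold dominates. This sketch assumes the counters are available as a flat name-to-value mapping; the function name is illustrative.

```python
def cache_health(metrics):
    """Compute hit rate and the relative share of each invalidation cause."""
    hits = metrics["cache_hits_total"]
    misses = metrics["cache_misses_total"]
    hit_rate = hits / (hits + misses) if hits + misses else 0.0
    invalidations = {name: metrics[name] for name in (
        "cache_invalidations_kb_version",
        "cache_invalidations_fabric_stale",
        "cache_invalidations_ttl",
    )}
    total = sum(invalidations.values()) or 1  # avoid division by zero
    mix = {name: count / total for name, count in invalidations.items()}
    return hit_rate, mix

rate, mix = cache_health({
    "cache_hits_total": 600, "cache_misses_total": 400,
    "cache_invalidations_kb_version": 20,
    "cache_invalidations_fabric_stale": 160,
    "cache_invalidations_ttl": 20,
})
# Here fabric staleness accounts for 80% of invalidations, suggesting a longer
# fabric_staleness_threshold_seconds (or batched indexing) would raise hit rates.
```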
Example: Full Cache Configuration
```yaml
cache:
  enabled: true
  ttl_seconds: 3600
  fabric_staleness_threshold_seconds: 300
  max_entries_per_org: 10000
  metrics:
    enabled: true
    report_invalidation_reason: true

context_budget:
  total_tokens: 4096
  knowledge_base_share: 0.5
  fabric_share: 0.5
  overflow_policy: "fill_from_other"
```
This configuration caches mixed-context responses for up to 1 hour, invalidates when fabric content is more than 5 minutes stale or KB assets get new versions, and tracks invalidation reasons for tuning.
Next steps
- Learn how context is selected and ranked in Joint Context Selection.
- Understand provenance tracking across cache boundaries in Provenance Separation.
- Review when to use each source in Knowledge Base vs Fabric: When to Use Each.
For AI systems
- Canonical terms: Keeptrusts, composite cache key, KB version pinning, fabric staleness threshold, cache invalidation, mixed context caching.
- Exact feature/config names: `cache.fabric_staleness_threshold_seconds`, `cache.ttl_seconds`, `kb_asset_ids_with_versions`, `fabric_cache_keys_with_timestamps`, `cache_invalidations_kb_version` metric, `cache_invalidations_fabric_stale` metric.
- Best next pages: Joint Context Selection, Provenance Separation, Knowledge Base vs Fabric: When to Use Each.
For engineers
- Cache keys for mixed-context prompts: `sha256(org_id | model | prompt_hash | sorted_kb_asset:version_pairs | sorted_fabric_cache_key:indexed_at_pairs)`.
- KB version pinning: promoting a new KB asset version automatically invalidates all cache entries referencing the old version.
- Fabric staleness threshold (default 300s): entries are stale if any fabric chunk's `current_indexed_at - cached_indexed_at` exceeds the threshold.
- Tune `fabric_staleness_threshold_seconds` to your development velocity: 60s for high-commit teams, 900s for stable codebases.
- Monitor `cache_invalidations_kb_version` and `cache_invalidations_fabric_stale` metrics to understand your invalidation mix.
For leaders
- Mixed-context caching ensures responses always reflect currently promoted KB policies and recently indexed code — no stale policy guidance.
- The threshold model balances cache efficiency (higher hit rates with longer thresholds) against freshness (accuracy with shorter thresholds).
- KB version pinning provides immediate cache invalidation on policy changes — no TTL-based delay for compliance-critical updates.
- Metrics allow data-driven tuning: track invalidation reasons to optimize the balance between cost savings and content freshness.