Diagnosing Stale Cache Entries
A stale cache entry is one that exists in the cache but no longer reflects the current state of the source material. Stale entries can be worse than misses: a miss costs only a fresh provider call, while a stale entry can serve outdated or incorrect information. This guide helps you identify the causes of staleness, measure their impact, and resolve them.
Use this page when
- You are diagnosing why cached responses contain outdated information after code changes.
- You need to identify whether staleness is caused by TTL misconfiguration, missed invalidation triggers, or Fabric artifact lag.
- You want a step-by-step diagnostic procedure for stale entry investigation.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
What Causes Staleness
Cache entries become stale when the source material changes but the cache is not yet refreshed. Three primary causes drive staleness in org-shared caches.
Code Changes
When developers push commits, merge pull requests, or deploy new versions, the code that generated the cached response no longer matches the current repository state. The cache entry's content hash diverges from the computed hash of the current source files.
Common triggers:
- Merged pull requests that modify implementation files
- Dependency updates that change behavior
- Configuration file changes that alter code generation patterns
- Branch switches in monitored repositories
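The hash divergence described above can be sketched as follows. This is a minimal illustration, assuming the content hash is a SHA-256 digest over file paths and contents; the actual hashing scheme is not specified on this page, and `content_hash` and `is_stale` are hypothetical helper names.

```python
import hashlib


def content_hash(files: dict[str, bytes]) -> str:
    """Hash file paths and contents in a stable (sorted) order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(files[path])
        digest.update(b"\0")
    return digest.hexdigest()


def is_stale(entry_hash: str, current_files: dict[str, bytes]) -> bool:
    """An entry is stale when its stored hash no longer matches the source."""
    return entry_hash != content_hash(current_files)


before = {"app/handler.py": b"def handle(): return 1\n"}
after = {"app/handler.py": b"def handle(): return 2\n"}  # a merged PR changed the file

entry_hash = content_hash(before)    # hash recorded when the entry was cached
print(is_stale(entry_hash, before))  # False
print(is_stale(entry_hash, after))   # True
```

Sorting the paths keeps the hash independent of file enumeration order, so only real content changes flip the result.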
Configuration Changes
Policy configuration changes can make previously valid cache entries invalid even when the underlying code has not changed. A stricter redaction policy, for example, means cached responses that passed the old policy may not pass the new one.
Common triggers:
- Policy updates that add new redaction rules
- Cache tier permission changes
- TTL adjustments that retroactively expire entries
- Sharing rule modifications that restrict previously accessible entries
Agent Version Changes
When the agent runtime is upgraded, its query formation may change subtly. The same logical request generates a slightly different cache key, causing the old entry to be orphaned while the new key produces a miss.
Common triggers:
- Agent version upgrades that change prompt templates
- Model version changes that alter tokenization
- SDK updates that modify request serialization
- Embedding model changes that shift vector representations
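To see why an upgrade orphans existing entries, consider a sketch in which the cache key is derived from the serialized request plus a prompt template version. The key layout, the `cache_key` helper, and the version labels are assumptions for illustration only; the real key derivation may differ.

```python
import hashlib
import json


def cache_key(request: dict, prompt_template_version: str) -> str:
    """Derive a key from the serialized request plus the template version."""
    payload = json.dumps(
        {"request": request, "template": prompt_template_version},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


request = {"repo": "acme/backend", "question": "How does auth work?"}

old_key = cache_key(request, "v1")
new_key = cache_key(request, "v2")  # same logical request after an upgrade

cache = {old_key: "cached answer"}
print(new_key in cache)  # False: the lookup misses and the old entry is orphaned
```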
Measuring Staleness with stale_miss Metrics
The stale_miss metric tracks how often a cache lookup finds an entry but determines it is stale. This differs from a clean miss (no entry found) and a hit (entry found and valid).
Key Metrics
| Metric | Description | Healthy Value |
|---|---|---|
| cache_stale_miss_total | Total stale misses across all scopes | < 5% of total lookups |
| cache_stale_miss_by_cause | Stale misses broken down by cause | Code changes dominate |
| cache_stale_entry_age | How old the stale entry was when detected | < 1 hour for active repos |
| cache_stale_ratio | Ratio of stale misses to total misses | < 30% of all misses |
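The two percentage thresholds in the table can be checked from raw lookup counters. The sketch below assumes that total misses are the sum of clean misses and stale misses, which this page implies but does not state outright; `LookupCounts` and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LookupCounts:
    hits: int
    clean_misses: int   # no entry found
    stale_misses: int   # entry found but stale


def stale_miss_rate(c: LookupCounts) -> float:
    """Stale misses as a fraction of all lookups (table threshold: < 5%)."""
    total = c.hits + c.clean_misses + c.stale_misses
    return c.stale_misses / total if total else 0.0


def stale_ratio(c: LookupCounts) -> float:
    """Stale misses as a fraction of all misses (table threshold: < 30%)."""
    misses = c.clean_misses + c.stale_misses
    return c.stale_misses / misses if misses else 0.0


counts = LookupCounts(hits=900, clean_misses=80, stale_misses=20)
rate = stale_miss_rate(counts)    # 20 / 1000 = 0.02
ratio = stale_ratio(counts)       # 20 / 100 = 0.20
print(rate < 0.05, ratio < 0.30)  # True True
```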
Reading Stale Miss Events
Each stale miss event includes diagnostic fields:
{
  "event_type": "cache_stale_miss",
  "cache_key": "org:acme/repo:backend/hash:a1b2c3",
  "stale_reason": "content_hash_mismatch",
  "entry_created_at": "2026-04-28T14:30:00Z",
  "source_last_modified": "2026-04-30T09:15:00Z",
  "staleness_duration": "42h45m",
  "repo": "acme/backend",
  "team": "platform"
}
The staleness_duration field tells you how long the entry was stale before detection. High durations indicate the warmer is not refreshing frequently enough.
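A stale miss event can be inspected programmatically. The sketch below parses a trimmed copy of the sample event, assuming the duration string uses only hour and minute components as in the sample; `parse_duration` and `parse_timestamp` are hypothetical helpers and the 30-minute threshold is an illustrative value. In the sample event the duration happens to equal the gap between entry_created_at and source_last_modified; whether that holds in general is not stated here.

```python
import json
import re
from datetime import datetime, timedelta

EVENT = """
{
  "event_type": "cache_stale_miss",
  "stale_reason": "content_hash_mismatch",
  "entry_created_at": "2026-04-28T14:30:00Z",
  "source_last_modified": "2026-04-30T09:15:00Z",
  "staleness_duration": "42h45m"
}
"""


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '42h45m' (hours and minutes only)."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", text)
    if not match or not text:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes)


def parse_timestamp(text: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp with a trailing 'Z'."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


event = json.loads(EVENT)
duration = parse_duration(event["staleness_duration"])
gap = parse_timestamp(event["source_last_modified"]) - parse_timestamp(event["entry_created_at"])

print(duration)                          # 1 day, 18:45:00
print(duration == gap)                   # True for this sample event
print(duration > timedelta(minutes=30))  # True: exceeds the illustrative threshold
```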
Identifying Stale Entries Proactively
Do not wait for stale misses to occur. Use these proactive checks:
Console Staleness View
Navigate to Console → Cache → Staleness Report to see:
- Entries older than their expected refresh interval
- Entries whose source repositories have commits newer than the entry
- Entries flagged by policy changes since creation
CLI Staleness Check
Run a staleness audit from the CLI:
kt cache audit --scope org --check staleness
This scans all cache entries and reports those whose content hashes no longer match the current source. The audit runs read-only and does not modify cache state.
Forcing a Cache Refresh
When you identify stale entries, force a refresh using one of these methods:
Single Entry Refresh
Invalidate and refresh a specific cache entry:
kt cache refresh --key "org:acme/repo:backend/hash:a1b2c3"
This marks the entry as invalidated and queues a warmer job to repopulate it.
Repository-Wide Refresh
Refresh all entries for a repository:
kt cache refresh --repo acme/backend
Use this after major refactors or large merges that affect many files.
Org-Wide Refresh
Refresh all entries across the organization:
kt cache refresh --scope org --confirm
Use this sparingly — it generates significant warmer load and temporary cache pressure.
Console-Based Refresh
From Console → Cache → Repository, select the affected repository and click Refresh All Entries. You can also select individual entries and click Refresh Selected.
How Warmers Prevent Staleness
Cache warmers are your primary defense against staleness. A properly configured warmer detects source changes and refreshes cache entries before agents encounter stale data.
Warmer Change Detection
The warmer monitors repositories for changes using:
- Webhook notifications for push events (immediate detection)
- Periodic polling as a fallback (configurable interval)
- Content hash comparison on each poll cycle
Warmer Refresh Priority
When the warmer detects changes, it prioritizes refresh jobs by:
- Hot entries first: Entries with high hit counts refresh before cold entries
- Recency: Recently accessed entries refresh before dormant ones
- Staleness age: Entries that have been stale longest get priority
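The three criteria above can be combined into a single sort key. The sketch below applies them in the order listed (hit count, then recency, then staleness age) as tie-breakers; that precedence is an assumption, since this page does not say how the warmer weighs them, and `StaleEntry` is a hypothetical record type.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StaleEntry:
    key: str
    hit_count: int
    last_accessed: datetime
    stale_since: datetime


def refresh_order(entries: list[StaleEntry]) -> list[StaleEntry]:
    """Sort by hit count (highest first), last access (newest first), stale_since (oldest first)."""
    return sorted(
        entries,
        key=lambda e: (-e.hit_count, -e.last_accessed.timestamp(), e.stale_since.timestamp()),
    )


now = datetime(2026, 4, 30, 12, 0, tzinfo=timezone.utc)
entries = [
    StaleEntry("cold", 3, now - timedelta(days=2), now - timedelta(hours=6)),
    StaleEntry("hot-old-access", 500, now - timedelta(hours=5), now - timedelta(hours=1)),
    StaleEntry("hot-recent", 500, now - timedelta(minutes=2), now - timedelta(hours=1)),
]

print([e.key for e in refresh_order(entries)])
# ['hot-recent', 'hot-old-access', 'cold']
```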
Warmer Configuration for Freshness
Tune warmer settings to minimize staleness windows:
warmer:
  poll_interval: 5m            # How often to check for changes
  webhook_enabled: true        # Enable immediate change detection
  refresh_priority: hot_first  # Prioritize high-traffic entries
  max_staleness: 30m           # Force refresh if older than this
Staleness Budget
Define a staleness budget for your organization — the maximum acceptable time between a source change and cache refresh. Typical budgets:
| Environment | Staleness Budget | Rationale |
|---|---|---|
| Production agents | 15 minutes | Agents serving users need fresh data |
| CI/CD pipelines | 30 minutes | Pipeline runs tolerate slight delay |
| Development agents | 1 hour | Developers can manually refresh if needed |
| Background tasks | 4 hours | Non-interactive tasks tolerate higher latency |
Configure your warmer poll interval and concurrency to stay within your staleness budget under normal load.
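One way to reason about whether a configuration fits a budget is to estimate the worst case for polling-only detection: a change lands just after a poll, waits one full interval to be noticed, and then waits for the refresh to complete. The sketch below encodes that simplified model using the budgets from the table; the model itself, the helper names, and the 12-minute refresh time are assumptions.

```python
from datetime import timedelta

# Budgets from the table above.
BUDGETS = {
    "production": timedelta(minutes=15),
    "ci_cd": timedelta(minutes=30),
    "development": timedelta(hours=1),
    "background": timedelta(hours=4),
}


def worst_case_staleness(poll_interval: timedelta, refresh_time: timedelta) -> timedelta:
    """Polling-only worst case: one full poll interval plus the refresh time."""
    return poll_interval + refresh_time


def within_budget(env: str, poll_interval: timedelta, refresh_time: timedelta) -> bool:
    """Compare the worst-case estimate against the environment's budget."""
    return worst_case_staleness(poll_interval, refresh_time) <= BUDGETS[env]


poll = timedelta(minutes=5)
refresh = timedelta(minutes=12)  # hypothetical time to drain the refresh queue

print(within_budget("production", poll, refresh))  # False: 17 min > 15 min
print(within_budget("ci_cd", poll, refresh))       # True: 17 min <= 30 min
```

With webhooks enabled, detection delay should shrink well below the poll interval, so this estimate is mainly useful for the polling fallback path.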
Preventing Staleness at Scale
For large organizations with many repositories:
- Enable webhooks for all monitored repositories to get instant change notifications
- Scale warmer concurrency to handle burst refreshes after major merge events
- Use tiered TTLs — shorter for active repositories, longer for stable ones
- Monitor cache_stale_entry_age percentiles to verify you stay within budget
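Percentiles matter because an average can hide a long tail of very stale entries. The following sketch computes p50 and p95 over hypothetical cache_stale_entry_age samples with Python's standard library; the sample values and the 30-minute budget are illustrative, and in practice a metrics backend would usually compute these for you.

```python
import statistics

# Hypothetical cache_stale_entry_age samples, in minutes.
ages_minutes = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 18, 20, 22, 25, 40, 95]


def percentile(samples: list[float], pct: int) -> float:
    """Return the pct-th percentile (1-99) using inclusive interpolation."""
    cut_points = statistics.quantiles(samples, n=100, method="inclusive")
    return cut_points[pct - 1]


budget_minutes = 30
p50 = percentile(ages_minutes, 50)
p95 = percentile(ages_minutes, 95)

print(p50 <= budget_minutes)  # True
print(p95 <= budget_minutes)  # False
```

Here the median looks healthy while the 95th percentile breaches the budget, which is the pattern to watch for after large merge events.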
Next steps
- If staleness is caused by warmer capacity, see Scaling Cache Warmers
- If entries are missing entirely rather than stale, see Troubleshooting Cache Misses
- To monitor staleness trends over time, see Monitoring Cache Hit Rates
For AI systems
- Canonical terms: Keeptrusts, stale cache entries, cache invalidation, cache refresh, stale_miss, content_hash_mismatch, staleness budget, cache warmer.
- Exact feature/config names: cache_stale_miss_total metric, cache_stale_miss_by_cause, cache_stale_entry_age, cache_stale_ratio, kt cache audit --scope org --check staleness, kt cache refresh --key, kt cache refresh --repo, Console → Cache → Staleness Report.
- Best next pages: Scaling Cache Warmers, Troubleshooting Cache Misses, Monitoring Cache Hit Rates.
For engineers
- Three staleness causes: code changes (content hash diverges), configuration changes (policy invalidates entries), agent version changes (query formation shifts).
- Key metric: cache_stale_miss_total should stay below 5% of total lookups, and cache_stale_ratio should stay below 30% of all misses.
- Proactive checks: Console → Cache → Staleness Report, or CLI kt cache audit --scope org --check staleness (read-only scan).
- Force refresh: single entry kt cache refresh --key <key>, by repo kt cache refresh --repo <repo>, or full rebuild kt cache warmer rebuild --scope org.
- Define a staleness budget per environment: production agents (15 min), CI/CD (30 min), development (1 hour), background tasks (4 hours).
For leaders
- Stale entries are worse than misses — they serve outdated/incorrect information rather than simply costing a fresh provider call.
- Monitor cache_stale_entry_age to verify your team stays within the defined staleness budget under normal load.
- High staleness duration indicates warmer capacity is insufficient; scale warmers or enable webhooks for instant change notifications.
- Staleness budget should align with risk tolerance: tighter for production-facing agents, relaxed for development and background tasks.