Troubleshooting Cache Misses

When your cache hit rate drops or specific queries consistently miss, you need a systematic approach to diagnosis. This guide walks you through the six most common causes of cache misses and provides step-by-step resolution for each.

Use this page when

  • You are investigating why cache hit rates are lower than expected.
  • You need to diagnose specific miss reasons (stale content, policy or entitlement denial, no match, backend errors).
  • You want a systematic troubleshooting process for identifying and fixing cache miss patterns.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Quick Diagnosis Flowchart

Start here when you see unexpected cache misses:

  1. Is the cache backend reachable? → If no, see Cache Backend Unreachable
  2. Is the warmer running? → If no, see Warmer Not Running
  3. Has the code changed recently? → If yes, see Stale Cache (Code Changed)
  4. Does the requesting agent have entitlements? → If no, see Entitlement Mismatch
  5. Does the policy allow cache access? → If no, see Policy Mismatch
  6. Is this a never-before-seen query? → If yes, see No Match (New Query Pattern)
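
The checks above are ordered, so the first one that fails decides which section to read. A minimal Python sketch of that ordering follows. The `Observations` fields and the `first_section` function are hypothetical names used for illustration and are not part of any Keeptrusts API.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    """Answers to the six flowchart questions (hypothetical structure)."""
    backend_reachable: bool
    warmer_running: bool
    code_changed_recently: bool
    agent_has_entitlements: bool
    policy_allows_cache: bool
    query_seen_before: bool


def first_section(obs: Observations) -> str:
    """Return the first section of this guide that applies, in flowchart order."""
    checks = [
        (not obs.backend_reachable, "Cache Backend Unreachable"),
        (not obs.warmer_running, "Warmer Not Running"),
        (obs.code_changed_recently, "Stale Cache (Code Changed)"),
        (not obs.agent_has_entitlements, "Entitlement Mismatch"),
        (not obs.policy_allows_cache, "Policy Mismatch"),
        (not obs.query_seen_before, "No Match (New Query Pattern)"),
    ]
    for applies, section in checks:
        if applies:
            return section
    return "No cause identified; inspect miss_reason on the miss event"
```

Because the list is evaluated top to bottom, an unreachable backend masks every later check, which is why the flowchart starts there.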

Stale Cache (Code Changed)

Symptoms: Queries that previously hit the cache now miss. The miss_reason field shows stale_content or hash_mismatch. This typically happens after merges, deployments, or major refactors.

Step-by-step diagnosis:

  1. Check the miss event in Console → Cache → Recent Misses
  2. Look at the miss_reason field — if it shows stale_content, the cache entry exists but its content hash no longer matches the current repository state
  3. Identify which repository changed by checking the repo field on the miss event
  4. Verify the warmer has a pending job for that repository: Console → Cache → Warmer Jobs
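
To make the stale_content case concrete, the sketch below shows one way a content-hash comparison could work. It assumes the hash is a SHA-256 digest over file paths and contents. The actual hashing scheme is not described on this page, and `content_hash` and `is_stale` are hypothetical helpers.

```python
import hashlib


def content_hash(files: dict[str, bytes]) -> str:
    """Hash file paths and contents in sorted order so the digest is stable."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(files[path])
        digest.update(b"\0")
    return digest.hexdigest()


def is_stale(stored_hash: str, current_files: dict[str, bytes]) -> bool:
    """An entry is stale when its stored hash differs from the current state."""
    return stored_hash != content_hash(current_files)


before = {"src/app.py": b"print('v1')\n"}
after = {"src/app.py": b"print('v2')\n"}
stored = content_hash(before)
print(is_stale(stored, before))  # False: nothing changed
print(is_stale(stored, after))   # True: the file content changed
```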

Resolution:

  • If the warmer has a pending job, wait for it to complete. Warmers automatically detect code changes and refresh affected entries.
  • If no warmer job exists, trigger a manual refresh: Console → Cache → Repository → Refresh Now
  • For frequent code changes, reduce the warmer poll interval for that repository

Policy Mismatch

Symptoms: The miss_reason field shows policy_denied or policy_mismatch. The cache entry exists and is fresh, but the requesting agent's policy does not permit reading from the cache tier where the entry lives.

Step-by-step diagnosis:

  1. Identify the requesting agent from the miss event
  2. Check the agent's assigned policy: Console → Agents → [Agent] → Policy
  3. Look at the cache_access section of the policy
  4. Compare the cache tier of the entry with the tiers the policy allows

Resolution:

  • Update the agent's policy to include the cache tier where the entry resides
  • If the entry should live in a tier the agent can access, adjust the cache placement rules
  • Verify that the policy change propagates by checking the agent's next cache lookup

Example policy that permits reads from the agent-local, team-shared, and org-shared tiers:

policy:
  cache_access:
    read_tiers:
      - agent-local
      - team-shared
      - org-shared
pack:
  name: troubleshooting-misses-example-1
  version: 1.0.0
  enabled: true
policies:
  chain:
    - cache_access
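
The tier comparison in diagnosis step 4 amounts to a membership test. The sketch below assumes the policy has already been loaded into a Python dict with the same shape as the example configuration. `can_read_tier` is a hypothetical helper, not a Keeptrusts function.

```python
def can_read_tier(policy: dict, entry_tier: str) -> bool:
    """Return True when the policy's read_tiers list includes the entry's tier."""
    read_tiers = policy.get("policy", {}).get("cache_access", {}).get("read_tiers", [])
    return entry_tier in read_tiers


# Policy limited to two tiers (illustrative values).
agent_policy = {
    "policy": {"cache_access": {"read_tiers": ["agent-local", "team-shared"]}}
}

print(can_read_tier(agent_policy, "team-shared"))  # True
print(can_read_tier(agent_policy, "org-shared"))   # False: expect a policy miss
```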

Entitlement Mismatch

Symptoms: The miss_reason field shows entitlement_denied. The cache entry exists but the requesting identity lacks the entitlement to access it. This commonly occurs when teams have strict data boundaries.

Step-by-step diagnosis:

  1. Check which team owns the cache entry (Console → Cache → Entry Details → Owner)
  2. Check the requesting agent's team membership
  3. Verify the org-shared cache sharing rules permit cross-team access for this content type
  4. Check if the entry was created with restricted sharing flags

Resolution:

  • If cross-team sharing is intended, update the sharing configuration for the owning team
  • If the requesting team should have their own entry, verify the warmer is configured to populate entries for that team
  • Review entitlement mirror settings if you use connector-based entitlements
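
The diagnosis steps reduce to three questions: does the requester belong to the owning team, is the entry restricted, and do the sharing rules cover this content type. A rough sketch under those assumptions follows. Every field name here (`owner_team`, `restricted`, `shared_content_types`) is hypothetical, and the real entitlement model may differ.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    owner_team: str
    content_type: str
    restricted: bool = False  # hypothetical "restricted sharing" flag


@dataclass
class SharingRules:
    # Content types the owning team shares across teams (hypothetical).
    shared_content_types: set[str] = field(default_factory=set)


def entitled(requester_team: str, entry: CacheEntry, rules: SharingRules) -> bool:
    """Same-team reads pass; cross-team reads need an unrestricted, shared type."""
    if requester_team == entry.owner_team:
        return True
    if entry.restricted:
        return False
    return entry.content_type in rules.shared_content_types
```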

No Match (New Query Pattern)

Symptoms: The miss_reason field shows no_match or not_found. No cache entry exists for the query. This is expected for genuinely new queries but problematic if it happens for queries that should be cached.

Step-by-step diagnosis:

  1. Check if the query pattern matches any existing cache keys (Console → Cache → Search)
  2. Verify the repository has been indexed by the warmer
  3. Check if the query uses a model or prompt format the cache recognizes
  4. Look for slight variations in the query that might prevent matching (whitespace, parameter ordering)

Resolution:

  • If the repository is not indexed, add it to the warmer configuration
  • If the query format is slightly different from cached entries, check your cache key normalization settings
  • For genuinely new patterns, this is expected behavior — the first request fills the cache for subsequent hits
  • Consider adding the query pattern to the warmer's seed list for proactive population
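
Whitespace and parameter ordering are the variations most likely to be normalized away before a cache key is computed. The sketch below shows one way to do that. It is an assumption about how normalization might work, not the cache's actual key algorithm, and `normalize_query` and `cache_key` are hypothetical names.

```python
import hashlib
import json


def normalize_query(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(text.split())


def cache_key(model: str, query: str, params: dict) -> str:
    """Build a key that ignores whitespace noise and parameter order."""
    payload = json.dumps(
        {"model": model, "query": normalize_query(query), "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = cache_key("model-a", "explain  the\tretry logic ", {"top_k": 5, "lang": "go"})
b = cache_key("model-a", "explain the retry logic", {"lang": "go", "top_k": 5})
print(a == b)  # True: both requests map to the same key
```

If two requests that look equivalent still produce different keys in your deployment, comparing their key components side by side in this way usually shows which field differs.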

Warmer Not Running

Symptoms: Cache entries are not being refreshed or created. The warmer job queue shows no recent completions. New repositories are not being indexed.

Step-by-step diagnosis:

  1. Check warmer process health: Console → Cache → Warmers → Status
  2. Look at the last successful job timestamp
  3. Check warmer logs for errors or crashes
  4. Verify the warmer has connectivity to both the cache backend and the source repositories

Resolution:

  • If the warmer process crashed, restart it. Check logs for the crash cause.
  • If the warmer is running but stalled, check queue depth. A queue backlog indicates the warmer cannot keep up with demand.
  • Scale warmer concurrency if queue depth exceeds your acceptable threshold:
      # Increase warmer parallelism
      export KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=8
  • If the warmer cannot reach repositories, check network connectivity and credentials
  • See Scaling Cache Warmers for detailed scaling guidance
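
The crashed, stalled, and backlogged cases can be told apart with a small check over values from the warmer status view: process liveness, the time since the last successful job, and the current queue depth. The thresholds and the `warmer_state` function below are illustrative assumptions; choose values that fit your own workload.

```python
from datetime import datetime, timedelta, timezone


def warmer_state(
    process_alive: bool,
    last_success: datetime,
    queue_depth: int,
    now: datetime,
    max_idle: timedelta = timedelta(minutes=30),  # assumed threshold
    max_queue_depth: int = 100,                   # assumed threshold
) -> str:
    """Classify warmer health from liveness, recency, and backlog."""
    if not process_alive:
        return "crashed: restart and check logs"
    if queue_depth > max_queue_depth:
        return "backlogged: consider raising concurrency"
    if now - last_success > max_idle and queue_depth > 0:
        return "stalled: jobs queued but none completing"
    return "healthy"


now = datetime.now(timezone.utc)
print(warmer_state(True, now - timedelta(hours=2), 12, now))
# stalled: jobs queued but none completing
```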

Cache Backend Unreachable

Symptoms: All cache lookups miss. The miss_reason field shows backend_error or timeout. Backend health checks show red status.

Step-by-step diagnosis:

  1. Check backend health: Console → Cache → Health Dashboard → Backend Status
  2. Verify network connectivity from the gateway to the cache backend
  3. Check if the backend process is running and accepting connections
  4. Look for resource exhaustion (memory, CPU, disk) on the backend host
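
Steps 2 and 3 can be checked together with a plain TCP connection attempt from the gateway host. This sketch uses only the Python standard library. The host and port are placeholders, and a successful connection shows only that something is listening, not that the backend is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder address; substitute your cache backend's host and port.
    print(can_connect("127.0.0.1", 6379))
```

A refused connection and a timeout both return False here. When diagnosing, it helps to distinguish them: a refusal usually means nothing is listening on that port, while a timeout more often points to a firewall or routing problem.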

Resolution by backend:

Redis/Valkey unreachable

  • Verify the Redis process is running: check container or service status
  • Test connectivity: attempt a PING from the gateway host
  • Check memory usage. Once Redis reaches its maxmemory limit, it evicts entries according to the configured eviction policy, which shows up as unexpected misses
  • Review connection pool settings if you see connection refused errors

S3/GCS unreachable

  • Verify credentials are valid and not expired
  • Check IAM permissions on the bucket
  • Test bucket access from the gateway host with a simple object list
  • Check for regional outages in your cloud provider's status page

Qdrant unreachable

  • Verify the Qdrant cluster is running and all nodes are healthy
  • Check disk space — Qdrant requires disk for index and WAL
  • Test collection access with a simple search query
  • Review cluster logs for shard rebalancing or recovery operations

Verifying Your Fix

After applying any resolution, verify the fix by:

  1. Waiting for the next cache lookup from the affected agent or query pattern
  2. Checking that the miss_reason changes or disappears
  3. Monitoring the hit rate for the affected scope over the next hour
  4. Confirming the fix persists across multiple lookup cycles
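
Steps 2 and 3 can be checked from exported lookup events. The sketch below assumes each event is a dict with a boolean `hit` field and an optional `miss_reason`. That event shape is an assumption for illustration, so adapt the field names to what your event export actually contains.

```python
from collections import Counter


def hit_rate(events: list[dict]) -> float:
    """Fraction of lookups that hit; 0.0 when there are no events."""
    if not events:
        return 0.0
    return sum(1 for e in events if e["hit"]) / len(events)


def miss_reasons(events: list[dict]) -> Counter:
    """Count misses by reason so a lingering reason stands out after the fix."""
    return Counter(e.get("miss_reason", "unknown") for e in events if not e["hit"])


before_fix = [
    {"hit": False, "miss_reason": "stale_content"},
    {"hit": False, "miss_reason": "stale_content"},
    {"hit": True},
    {"hit": True},
]
after_fix = [
    {"hit": True},
    {"hit": True},
    {"hit": True},
    {"hit": False, "miss_reason": "no_match"},
]

print(hit_rate(before_fix), hit_rate(after_fix))  # 0.5 0.75
print(miss_reasons(after_fix)["stale_content"])   # 0
```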

Next steps

For AI systems

  • Canonical terms: Keeptrusts, cache miss, troubleshooting, miss reason, key mismatch, TTL expiry.
  • Config surface: miss reason codes, cache key inspection, invalidation logs, deny-list audit.
  • When diagnosing cache misses, reference the miss reason taxonomy and diagnostic steps from this page.

For engineers

  • Check miss_reason in event logs to identify the specific cause: stale content, policy denial, entitlement denial, no match, or backend error.
  • Use cache key inspection to compare expected vs actual key components when hit rates are low.
  • Correlate miss spikes with recent code pushes, config changes, or Fabric artifact rebuilds.
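
The miss_reason values named on this page map onto its sections. The lookup table below restates that mapping in Python so it can be used to triage exported events. `SECTION_FOR_REASON` and `triage` are hypothetical names, and reason codes not mentioned on this page fall through to a default.

```python
SECTION_FOR_REASON = {
    "stale_content": "Stale Cache (Code Changed)",
    "hash_mismatch": "Stale Cache (Code Changed)",
    "policy_denied": "Policy Mismatch",
    "policy_mismatch": "Policy Mismatch",
    "entitlement_denied": "Entitlement Mismatch",
    "no_match": "No Match (New Query Pattern)",
    "not_found": "No Match (New Query Pattern)",
    "backend_error": "Cache Backend Unreachable",
    "timeout": "Cache Backend Unreachable",
}


def triage(miss_reason: str) -> str:
    """Return the section of this guide that covers a given miss_reason."""
    return SECTION_FOR_REASON.get(miss_reason, "Unlisted reason; check event details")
```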

For leaders

  • Systematic miss troubleshooting prevents unnecessary cost increases from undiagnosed cache degradation.
  • Miss reason tracking enables data-driven configuration improvements rather than guessing.
  • Low hit rates are a diagnosable operational issue, not an inherent platform limitation.