Troubleshooting Cache Misses
When your cache hit rate drops or specific queries consistently miss, you need a systematic approach to diagnosis. This guide walks you through the six most common causes of cache misses and provides step-by-step resolution for each.
Use this page when
- You are investigating why cache hit rates are lower than expected.
- You need to diagnose specific miss reasons (TTL expiry, key mismatch, invalidation, deny-list exclusion).
- You want a systematic troubleshooting process for identifying and fixing cache miss patterns.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Quick Diagnosis Flowchart
Start here when you see unexpected cache misses:
- Is the cache backend reachable? → If no, see Cache Backend Unreachable
- Is the warmer running? → If no, see Warmer Not Running
- Has the code changed recently? → If yes, see Stale Cache (Code Changed)
- Does the requesting agent have entitlements? → If no, see Entitlement Mismatch
- Does the policy allow cache access? → If no, see Policy Mismatch
- Is this a never-before-seen query? → If yes, see No Match (New Query Pattern)
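The decision flow above can be sketched as a dispatch table over the miss_reason codes this page covers. The event shape and the fallback label below are illustrative assumptions, not a documented schema:

```python
# Map miss_reason codes from this page to the section that covers them.
MISS_REASON_SECTIONS = {
    "backend_error": "Cache Backend Unreachable",
    "timeout": "Cache Backend Unreachable",
    "stale_content": "Stale Cache (Code Changed)",
    "hash_mismatch": "Stale Cache (Code Changed)",
    "entitlement_denied": "Entitlement Mismatch",
    "policy_denied": "Policy Mismatch",
    "policy_mismatch": "Policy Mismatch",
    "no_match": "No Match (New Query Pattern)",
    "not_found": "No Match (New Query Pattern)",
}

def section_for(miss_event: dict) -> str:
    """Return the troubleshooting section for a miss event's reason code."""
    return MISS_REASON_SECTIONS.get(
        miss_event.get("miss_reason", ""), "Warmer Not Running / manual triage"
    )

print(section_for({"miss_reason": "stale_content"}))
```

Unknown or missing reason codes fall through to manual triage, since an absent code usually means the warmer or the event pipeline itself needs attention.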
Stale Cache (Code Changed)
Symptoms: Queries that previously hit the cache now miss. The miss_reason field shows stale_content or hash_mismatch. This typically happens after merges, deployments, or major refactors.
Step-by-step diagnosis:
- Check the miss event in Console → Cache → Recent Misses
- Look at the miss_reason field — if it shows stale_content, the cache entry exists but its content hash no longer matches the current repository state
- Identify which repository changed by checking the repo field on the miss event
- Verify the warmer has a pending job for that repository: Console → Cache → Warmer Jobs
Resolution:
- If the warmer has a pending job, wait for it to complete. Warmers automatically detect code changes and refresh affected entries.
- If no warmer job exists, trigger a manual refresh: Console → Cache → Repository → Refresh Now
- For frequent code changes, reduce the warmer poll interval for that repository
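As a sketch of the last step, a per-repository poll interval override might look like this. The key names below are hypothetical, not a documented configuration schema:

```yaml
# Hypothetical warmer configuration; field names are illustrative.
warmer:
  repositories:
    - repo: my-org/high-churn-service   # placeholder repository name
      poll_interval: 60s                # lowered from a default such as 300s
```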
Policy Mismatch
Symptoms: The miss_reason field shows policy_denied or policy_mismatch. The cache entry exists and is fresh, but the requesting agent's policy does not permit reading from the cache tier where the entry lives.
Step-by-step diagnosis:
- Identify the requesting agent from the miss event
- Check the agent's assigned policy: Console → Agents → [Agent] → Policy
- Look at the cache_access section of the policy
- Compare the cache tier of the entry with the tiers the policy allows
Resolution:
- Update the agent's policy to include the cache tier where the entry resides
- If the entry should live in a tier the agent can access, adjust the cache placement rules
- Verify that the policy change propagates by checking the agent's next cache lookup
For example, a policy granting read access to all three cache tiers, packaged with a pack that chains it:

```yaml
policy:
  cache_access:
    read_tiers:
      - agent-local
      - team-shared
      - org-shared

pack:
  name: troubleshooting-misses-example-1
  version: 1.0.0
  enabled: true
  policies:
    chain:
      - cache_access
```
Entitlement Mismatch
Symptoms: The miss_reason field shows entitlement_denied. The cache entry exists but the requesting identity lacks the entitlement to access it. This commonly occurs when teams have strict data boundaries.
Step-by-step diagnosis:
- Check which team owns the cache entry (Console → Cache → Entry Details → Owner)
- Check the requesting agent's team membership
- Verify the org-shared cache sharing rules permit cross-team access for this content type
- Check if the entry was created with restricted sharing flags
Resolution:
- If cross-team sharing is intended, update the sharing configuration for the owning team
- If the requesting team should have their own entry, verify the warmer is configured to populate entries for that team
- Review entitlement mirror settings if you use connector-based entitlements
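As an illustration of the first resolution step, a sharing configuration for the owning team might look like this. All field names here are hypothetical assumptions, not a documented schema:

```yaml
# Hypothetical sharing configuration for the owning team.
sharing:
  org_shared:
    allow_cross_team: true     # permit other teams to read this team's entries
    content_types:
      - code_context           # placeholder content types
      - query_results
  restricted_flags: []         # clear restrictive flags if cross-team access is intended
```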
No Match (New Query Pattern)
Symptoms: The miss_reason field shows no_match or not_found. No cache entry exists for the query. This is expected for genuinely new queries but problematic if it happens for queries that should be cached.
Step-by-step diagnosis:
- Check if the query pattern matches any existing cache keys (Console → Cache → Search)
- Verify the repository has been indexed by the warmer
- Check if the query uses a model or prompt format the cache recognizes
- Look for slight variations in the query that might prevent matching (whitespace, parameter ordering)
Resolution:
- If the repository is not indexed, add it to the warmer configuration
- If the query format is slightly different from cached entries, check your cache key normalization settings
- For genuinely new patterns, this is expected behavior — the first request fills the cache for subsequent hits
- Consider adding the query pattern to the warmer's seed list for proactive population
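To illustrate why key normalization matters, here is a minimal sketch of a cache key builder that collapses whitespace and sorts parameters so trivially different requests map to the same key. The key format is an assumption, not the product's actual scheme:

```python
import hashlib
import json

def cache_key(query: str, params: dict) -> str:
    """Build a normalized cache key: collapse whitespace, sort parameters."""
    normalized_query = " ".join(query.split())              # collapse whitespace runs
    normalized_params = json.dumps(params, sort_keys=True)  # stable parameter order
    payload = f"{normalized_query}|{normalized_params}"
    return hashlib.sha256(payload.encode()).hexdigest()

# Two superficially different requests produce the same key:
a = cache_key("find  auth\tmiddleware", {"repo": "api", "lang": "go"})
b = cache_key("find auth middleware", {"lang": "go", "repo": "api"})
print(a == b)  # → True
```

Without this kind of normalization, a stray tab or reordered parameter dict produces a different hash and a spurious no_match miss.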
Warmer Not Running
Symptoms: Cache entries are not being refreshed or created. The warmer job queue shows no recent completions. New repositories are not being indexed.
Step-by-step diagnosis:
- Check warmer process health: Console → Cache → Warmers → Status
- Look at the last successful job timestamp
- Check warmer logs for errors or crashes
- Verify the warmer has connectivity to both the cache backend and the source repositories
Resolution:
- If the warmer process crashed, restart it. Check logs for the crash cause.
- If the warmer is running but stalled, check queue depth. A queue backlog indicates the warmer cannot keep up with demand.
- Scale warmer concurrency if queue depth exceeds your acceptable threshold:
```shell
# Increase warmer parallelism
export KEEPTRUSTS_CACHE_WARMER_CONCURRENCY=8
```
- If the warmer cannot reach repositories, check network connectivity and credentials
- See Scaling Cache Warmers for detailed scaling guidance
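A rough way to pick a concurrency value is to size workers against the backlog you want to drain. The throughput numbers below are placeholders; measure your own warmer's job rate:

```python
import math

def recommended_concurrency(queue_depth: int,
                            jobs_per_worker_per_min: float,
                            drain_minutes: float) -> int:
    """Workers needed to drain the backlog within drain_minutes (illustrative)."""
    return max(1, math.ceil(queue_depth / (jobs_per_worker_per_min * drain_minutes)))

# 960 queued jobs, 4 jobs/worker/minute, drain within 30 minutes:
print(recommended_concurrency(960, 4, 30))  # → 8
```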
Cache Backend Unreachable
Symptoms: All cache lookups miss. The miss_reason field shows backend_error or timeout. Backend health checks show red status.
Step-by-step diagnosis:
- Check backend health: Console → Cache → Health Dashboard → Backend Status
- Verify network connectivity from the gateway to the cache backend
- Check if the backend process is running and accepting connections
- Look for resource exhaustion (memory, CPU, disk) on the backend host
Resolution by backend:
Redis/Valkey unreachable
- Verify the Redis process is running: check container or service status
- Test connectivity: attempt a PING from the gateway host
- Check memory usage — Redis evicts entries when memory is full
- Review connection pool settings if you see connection refused errors
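A plain TCP probe (not a full Redis PING) can quickly separate network problems from Redis-level problems. The host and port below are placeholders:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe a local Redis before running a real PING via redis-cli.
print(can_connect("127.0.0.1", 6379))
```

If the probe succeeds but PING fails, the issue is at the Redis layer (auth, maxclients, memory) rather than the network.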
S3/GCS unreachable
- Verify credentials are valid and not expired
- Check IAM permissions on the bucket
- Test bucket access from the gateway host with a simple object list
- Check for regional outages in your cloud provider's status page
Qdrant unreachable
- Verify the Qdrant cluster is running and all nodes are healthy
- Check disk space — Qdrant requires disk for index and WAL
- Test collection access with a simple search query
- Review cluster logs for shard rebalancing or recovery operations
Verifying Your Fix
After applying any resolution, verify the fix by:
- Waiting for the next cache lookup from the affected agent or query pattern
- Checking that the miss_reason changes or disappears
- Monitoring the hit rate for the affected scope over the next hour
- Confirming the fix persists across multiple lookup cycles
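As a sketch, fix verification can be reduced to checking that the old miss_reason stops appearing in recent events. The event format is an assumption; substitute your actual log export:

```python
# Hypothetical verification helper: given recent miss events for the affected
# scope, confirm the old miss_reason no longer appears.
def fix_verified(recent_events: list, old_reason: str) -> bool:
    """True if no recent miss event still reports the reason being fixed."""
    return all(e.get("miss_reason") != old_reason for e in recent_events)

events = [{"miss_reason": "no_match"}, {"hit": True}]
print(fix_verified(events, "policy_denied"))  # → True
```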
Next steps
- For recurring stale misses, see Diagnosing Stale Cache Entries
- For systematic hit rate monitoring, see Monitoring Cache Hit Rates
- For backend-specific recovery, see Disaster Recovery
For AI systems
- Canonical terms: Keeptrusts, cache miss, troubleshooting, miss reason, key mismatch, TTL expiry.
- Config surface: miss reason codes, cache key inspection, invalidation logs, deny-list audit.
- When diagnosing cache misses, reference the miss reason taxonomy and diagnostic steps from this page.
For engineers
- Check miss_reason in event logs to identify the specific cause: TTL expiry, key mismatch, invalidation, or deny-list.
- Use cache key inspection to compare expected vs. actual key components when hit rates are low.
- Correlate miss spikes with recent code pushes, config changes, or Fabric artifact rebuilds.
For leaders
- Systematic miss troubleshooting prevents unnecessary cost increases from undiagnosed cache degradation.
- Miss reason tracking enables data-driven configuration improvements rather than guessing.
- Low hit rates are a diagnosable operational issue, not an inherent platform limitation.