Alerting on Cache Fill Cost Spikes
Cache fills have a real cost — each fill triggers one or more provider calls to generate the content that gets cached. Under normal operation, fill costs are predictable and proportional to code change velocity. A sudden spike in fill cost indicates something unexpected is happening: cache corruption, a warmer loop, a misconfiguration, or a legitimate but large-scale change.
Use this page when
- You need to configure alerts for unexpected spikes in cache fill costs.
- You are investigating a fill cost spike and want to determine if it's expected (cold start, invalidation) or anomalous.
- You want to set up cost-based alert thresholds tied to your org's normal fill patterns.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Normal Fill Patterns
Before setting up alerts, understand what normal fill activity looks like so you can distinguish expected spikes from problems.
New Repository Onboarded
When you add a new repository to the cache warmer, it generates fills for all indexable content. This creates a one-time spike proportional to the repository size.
Expected signature:
- Single burst of fill activity lasting minutes to hours
- Fill cost proportional to repository size
- No repeat after initial population
- Associated with a warmer configuration change
Major Refactor or Large Merge
When a large pull request merges or a major refactor lands, many cache entries become stale simultaneously. The warmer refreshes them in a burst.
Expected signature:
- Spike correlates with a specific merge event
- Fill count is proportional to files changed
- Duration is bounded (hours, not days)
- Hit rate recovers after the burst completes
Scheduled Full Refresh
Some organizations configure periodic full refreshes to ensure maximum freshness. These create predictable, recurring spikes.
Expected signature:
- Occurs at the same time on a regular schedule
- Duration and cost are consistent across occurrences
- No degradation between scheduled refreshes
Abnormal Fill Patterns
These patterns indicate problems that need investigation.
Cache Corruption
When cached data becomes corrupted (bit rot, incomplete writes, storage failures), the cache rejects entries on read and triggers re-fills.
Signature:
- Fill rate increases without corresponding code changes
- Same entries are filled repeatedly
- Backend error rates are elevated
- Fill cost rises but hit rate does not improve
Warmer Loop
A warmer loop occurs when the warmer repeatedly fills the same entries because it fails to recognize them as already cached. This typically stems from a hash computation mismatch between the warmer and the lookup path.
Signature:
- Specific cache keys appear repeatedly in fill logs
- Fill rate is constant and does not decrease over time
- Queue depth stays elevated despite active workers
- Fill cost grows linearly without bound
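The first item in this signature, repeated cache keys, can be checked mechanically. The following is a minimal sketch, assuming the cache keys have already been extracted from a window of fill logs into a list. The function name, the key format, and the repeat threshold of 3 are illustrative assumptions, not part of Keeptrusts.

```python
from collections import Counter

def find_repeated_fills(fill_keys, min_repeats=3):
    """Return (key, count) pairs for keys filled at least `min_repeats` times,
    most frequent first. The threshold is an assumption; tune it per window size."""
    counts = Counter(fill_keys)
    return [(key, n) for key, n in counts.most_common() if n >= min_repeats]

# Hypothetical keys pulled from one window of fill logs.
window = [
    "repo-a:f1", "repo-a:f2", "repo-a:f1",
    "repo-b:f9", "repo-a:f1", "repo-a:f2",
]
suspects = find_repeated_fills(window, min_repeats=3)
print(suspects)
```

A key that keeps reappearing across successive windows is a stronger loop indicator than a single window with repeats, since a large merge can also refill the same entry more than once.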
Configuration Error
A misconfiguration can cause the warmer to treat all entries as stale or to operate on a broader scope than intended.
Signature:
- Fill rate jumps immediately after a configuration change
- Entries that were previously stable are being re-filled
- The scope of fills is broader than expected (all repos instead of one)
- Reverting the configuration change stops the spike
Embedding Model Change
If the embedding model version changes, all existing vector embeddings become incompatible. The warmer regenerates all embeddings, causing a large spike.
Signature:
- Massive spike affecting all repositories simultaneously
- Correlates with a deployment that updated the embedding model
- One-time event that resolves after full regeneration
- Qdrant write latency increases during the spike
Setting Up Alert Thresholds
Baseline Calculation
Calculate your fill cost baseline over a 7-day window, excluding known events (onboarding, scheduled refreshes):
baseline_fill_rate = median(daily_fill_cost) over 7 days
baseline_hourly_peak = p95(hourly_fill_cost) over 7 days
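The two formulas above could be computed roughly as follows. This is a sketch under stated assumptions: cost samples are already available as plain dictionaries keyed by day and by (day, hour), known-event days are excluded by label, and p95 uses the nearest-rank method. The function names and data shapes are hypothetical.

```python
import math
from statistics import median

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def fill_cost_baseline(daily_costs, hourly_costs, excluded_days=frozenset()):
    """daily_costs: {day: cost}; hourly_costs: {(day, hour): cost}.
    Days in `excluded_days` (onboarding, scheduled refreshes) are skipped."""
    daily = [c for d, c in daily_costs.items() if d not in excluded_days]
    hourly = [c for (d, _h), c in hourly_costs.items() if d not in excluded_days]
    return {
        "baseline_fill_rate": median(daily),
        "baseline_hourly_peak": percentile(hourly, 95),
    }

# Illustrative 7-day window; day 4 is a hypothetical onboarding day.
daily_costs = {1: 40.0, 2: 42.0, 3: 38.0, 4: 300.0, 5: 41.0, 6: 39.0, 7: 43.0}
hourly_costs = {
    (d, h): (50.0 if d == 4 else float(h % 4 + 1))
    for d in range(1, 8) for h in range(24)
}
baseline = fill_cost_baseline(daily_costs, hourly_costs, excluded_days={4})
print(baseline)
```

Using the median for the daily figure keeps a single unexcluded outlier from shifting the baseline, while p95 for the hourly figure tolerates routine short bursts.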
Recommended Thresholds
| Alert Level | Condition | Response Time |
|---|---|---|
| Info | Fill rate > 1.5× baseline hourly peak | Next business day |
| Warning | Fill rate > 2× baseline hourly peak for 15 minutes | Within 2 hours |
| Critical | Fill rate > 5× baseline hourly peak for 10 minutes | Immediate |
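To make the table's logic concrete, here is a minimal sketch of how the three levels might be evaluated. It assumes one fill-rate sample per minute, oldest first, expressed in the same unit as the baseline hourly peak. The function names are illustrative, and this is not how the Keeptrusts alert engine is implemented.

```python
def sustained_minutes(samples, threshold):
    """Count trailing consecutive one-minute samples strictly above threshold."""
    count = 0
    for value in reversed(samples):
        if value <= threshold:
            break
        count += 1
    return count

def alert_level(samples, baseline_hourly_peak):
    """Return the highest matching level from the table, or None."""
    if sustained_minutes(samples, 5.0 * baseline_hourly_peak) >= 10:
        return "critical"
    if sustained_minutes(samples, 2.0 * baseline_hourly_peak) >= 15:
        return "warning"
    # Info has no duration requirement: the latest sample alone decides.
    if samples and samples[-1] > 1.5 * baseline_hourly_peak:
        return "info"
    return None

print(alert_level([60.0] * 10, baseline_hourly_peak=10.0))
print(alert_level([25.0] * 15, baseline_hourly_peak=10.0))
print(alert_level([25.0] * 14, baseline_hourly_peak=10.0))
```

Checking critical first means a spike that satisfies several rows is reported once, at its highest severity.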
Alert Configuration
alerts:
  fill_cost_info:
    metric: keeptrusts_cache_fill_cost_dollars
    condition: rate(15m) > baseline_hourly_peak * 1.5
    severity: info
    notify: cache-ops-channel
  fill_cost_warning:
    metric: keeptrusts_cache_fill_cost_dollars
    condition: rate(15m) > baseline_hourly_peak * 2.0
    for: 15m
    severity: warning
    notify: cache-ops
  fill_cost_critical:
    metric: keeptrusts_cache_fill_cost_dollars
    condition: rate(15m) > baseline_hourly_peak * 5.0
    for: 10m
    severity: critical
    notify: platform-ops
    action: page_oncall
Runbook: Fill Cost Spike Response
When a fill cost alert fires, follow this runbook:
Step 1: Assess Scope (2 minutes)
- Open Console → Cache → Fill Activity
- Identify whether the spike affects all repos or specific ones
- Check if a configuration change was deployed recently
- Note the start time of the spike
Step 2: Check for Known Causes (3 minutes)
- Was a new repository onboarded in the last hour?
- Did a large merge land in any monitored repository?
- Was a scheduled full refresh triggered?
- Was a deployment made to the warmer or cache service?
If any of these apply and the scope matches, the spike is likely normal. Monitor for resolution.
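The checklist in this step amounts to asking which known events fall in the hour before the spike began. A small sketch of that filter follows, assuming you can export recent onboarding, merge, refresh, and deployment events with timestamps. The record fields (`at`, `kind`, `scope`) and the function name are hypothetical.

```python
from datetime import datetime, timedelta

def recent_known_events(spike_start, events, lookback=timedelta(hours=1)):
    """Return events that occurred within `lookback` before the spike began."""
    window_start = spike_start - lookback
    return [e for e in events if window_start <= e["at"] <= spike_start]

# Hypothetical change-log entries; field names are illustrative.
events = [
    {"at": datetime(2024, 5, 1, 9, 40), "kind": "repo_onboarded", "scope": "repo-a"},
    {"at": datetime(2024, 5, 1, 6, 0), "kind": "scheduled_refresh", "scope": "all"},
    {"at": datetime(2024, 5, 1, 10, 30), "kind": "large_merge", "scope": "repo-b"},
]
spike_start = datetime(2024, 5, 1, 10, 0)
candidates = recent_known_events(spike_start, events)
for e in candidates:
    print(e["kind"], e["scope"])
```

A matching event only explains the spike if its scope also matches what you saw in Step 1, so compare the `scope` of each candidate against the affected repositories before closing the investigation.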
Step 3: Investigate Abnormal Patterns (5 minutes)
If no known cause explains the spike:
- Check fill logs for repeated cache keys (warmer loop indicator)
- Check backend error rates (corruption indicator)
- Review recent configuration changes (misconfiguration indicator)
- Check if the embedding model version changed (regeneration indicator)
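The four checks above map one-to-one onto the abnormal patterns described earlier. As a rough illustration, the mapping can be written down as a lookup, assuming each check has already been reduced to a yes/no answer. The class and function names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SpikeSignals:
    repeated_cache_keys: bool        # fill logs show the same keys again and again
    backend_errors_elevated: bool    # storage backend error rate is above normal
    recent_config_change: bool       # a configuration change preceded the spike
    embedding_model_changed: bool    # a deployment updated the embedding model

def candidate_causes(signals):
    """Return every cause whose indicator is present, in runbook order."""
    checks = [
        (signals.repeated_cache_keys, "warmer_loop"),
        (signals.backend_errors_elevated, "corruption"),
        (signals.recent_config_change, "configuration_error"),
        (signals.embedding_model_changed, "embedding_regeneration"),
    ]
    return [name for present, name in checks if present] or ["unknown"]

print(candidate_causes(SpikeSignals(True, False, False, False)))
print(candidate_causes(SpikeSignals(True, True, False, False)))
```

More than one cause can be returned on purpose: corruption also produces repeated fills of the same entries, so repeated keys together with elevated backend errors should be investigated as corruption before assuming a hash mismatch.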
Step 4: Mitigate (immediate)
Based on diagnosis:
- Warmer loop: Pause the warmer, investigate hash mismatch, fix and resume
- Corruption: Isolate the affected backend, allow fallback to provider, schedule repair
- Configuration error: Revert the configuration change
- Embedding regeneration: Allow to complete but throttle rate if budget is a concern
# Pause warmer (stops new fills)
kt cache warmer pause --reason "investigating fill spike"
# Throttle fill rate (slows but does not stop fills)
kt cache warmer throttle --max-fills-per-minute 10
# Resume after fix
kt cache warmer resume
Step 5: Verify Resolution (15 minutes)
- Confirm fill rate returns to baseline
- Verify hit rate is stable or recovering
- Check that no entries are still being re-filled unnecessarily
- Close the alert with a root cause annotation
Cost Guardrails
Set hard cost limits to prevent runaway fill spending:
cache:
  cost_guardrails:
    max_daily_fill_cost: 500.00    # USD
    max_hourly_fill_cost: 100.00   # USD
    action_on_limit: pause_warmer  # pause_warmer | alert_only | throttle
    notify_on_limit: platform-ops
When the guardrail triggers, the warmer pauses and sends a notification. Cache lookups continue to serve existing entries, but no new fills occur until an operator resumes the warmer.
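The decision the guardrail makes can be sketched as below. This is an illustration only: it assumes a limit counts as reached when spend is greater than or equal to the configured value, and that the configured action applies if either limit is reached. Neither detail is confirmed by this page, and the function name is hypothetical.

```python
ALLOWED_ACTIONS = {"pause_warmer", "alert_only", "throttle"}

def guardrail_action(hourly_cost, daily_cost,
                     max_hourly=100.00, max_daily=500.00,
                     action_on_limit="pause_warmer"):
    """Return (action, breached_limits). Action is None when no limit is reached.
    Assumption: a limit is breached when cost >= limit."""
    if action_on_limit not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action_on_limit: {action_on_limit}")
    breached = []
    if hourly_cost >= max_hourly:
        breached.append("max_hourly_fill_cost")
    if daily_cost >= max_daily:
        breached.append("max_daily_fill_cost")
    return (action_on_limit if breached else None, breached)

print(guardrail_action(hourly_cost=120.0, daily_cost=300.0))
print(guardrail_action(hourly_cost=50.0, daily_cost=100.0))
```

Returning the list of breached limits alongside the action makes the notification more useful, since an hourly breach usually points at a burst and a daily breach at sustained overspend.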
Next steps
- Export fill cost metrics to your monitoring stack with Observability Integration
- Understand what drives fill activity in Scaling Cache Warmers
- Plan for long-term fill cost growth with Capacity Planning
For AI systems
- Canonical terms: Keeptrusts, cache fill cost spike, alerting, warmer loop, cache corruption, cost guardrails, fill rate baseline.
- Exact feature/config names: cache.cost_guardrails.max_daily_fill_cost, cache.cost_guardrails.max_hourly_fill_cost, cache.cost_guardrails.action_on_limit (pause_warmer, alert_only, throttle), kt cache warmer pause, kt cache warmer resume, fill rate metrics.
- Best next pages: Observability Integration, Scaling Cache Warmers, Capacity Planning.
For engineers
- Calculate fill cost baseline: median(daily_fill_cost) over 7 days excluding known events (onboarding, scheduled refreshes).
- Set alert thresholds: info at 1.5× baseline hourly peak, warning at 2×, critical at 5× or absolute dollar cap.
- Distinguish normal spikes (new repo, large merge, scheduled refresh) from abnormal patterns (warmer loop, cache corruption, config error).
- Warmer loop signature: same cache keys filled repeatedly, queue depth stays elevated, fill rate constant without decrease.
- Configure cost guardrails in YAML: max_daily_fill_cost, max_hourly_fill_cost, and action_on_limit: pause_warmer to prevent runaway spend.
- Response runbook: identify pattern → pause warmer → investigate → fix root cause → resume → verify resolution.
For leaders
- Fill cost guardrails prevent runaway spending from warmer bugs or corruption — hard caps pause filling while cache lookups continue serving existing entries.
- Normal fill spikes are one-time investments (new repo, major refactor); abnormal spikes indicate operational issues that need investigation.
- Monthly fill cost should be a small fraction (5-15%) of total avoided cost — if not, the cache is not delivering expected ROI.
- Alert routing to platform-ops with clear thresholds and SLAs ensures fill cost anomalies are caught before they impact budget.