Alerting on Cache Fill Cost Spikes

Cache fills have a real cost: each fill triggers one or more provider calls to generate the content that gets cached. Under normal operation, fill costs are predictable and proportional to code change velocity. A sudden spike in fill cost means something unexpected is happening: cache corruption, a warmer loop, a misconfiguration, or a legitimate but large-scale change.

Use this page when

You need to configure alerts for unexpected spikes in cache fill costs.
You are investigating a fill cost spike and want to determine whether it is expected (cold start, invalidation) or anomalous.
You want to set up cost-based alert thresholds tied to your org's normal fill patterns.

Primary audience

Primary: AI Agents, Technical Engineers
Secondary: Technical Leaders

Normal Fill Patterns

Before setting up alerts, understand what normal fill activity looks like so you can distinguish expected spikes from problems.

New Repository Onboarded

When you add a new repository to the cache warmer, it generates fills for all indexable content. This creates a one-time spike proportional to the repository size.

Expected signature:

Single burst of fill activity lasting minutes to hours
Fill cost proportional to repository size
No repeat after initial population
Associated with a warmer configuration change

Major Refactor or Large Merge

When a large pull request merges or a major refactor lands, many cache entries become stale simultaneously. The warmer refreshes them in a burst.

Expected signature:

Spike correlates with a specific merge event
Fill count is proportional to files changed
Duration is bounded (hours, not days)
Hit rate recovers after the burst completes

Scheduled Full Refresh

Some organizations configure periodic full refreshes to ensure maximum freshness. These create predictable, recurring spikes.

Expected signature:

Occurs at the same time on a regular schedule
Duration and cost are consistent across occurrences
No degradation between scheduled refreshes

Abnormal Fill Patterns

These patterns indicate problems that need investigation.

Cache Corruption

When cached data becomes corrupted (bit rot, incomplete writes, storage failures), the cache rejects entries on read and triggers re-fills.

Signature:

Fill rate increases without corresponding code changes
Same entries are filled repeatedly
Backend error rates are elevated
Fill cost rises but hit rate does not improve

Warmer Loop

A warmer loop occurs when the warmer repeatedly fills the same entries because it fails to recognize them as already cached. This typically stems from a hash computation mismatch between the warmer and the lookup path.

Signature:

Specific cache keys appear repeatedly in fill logs
Fill rate is constant and does not decrease over time
Queue depth stays elevated despite active workers
Fill cost grows linearly without bound
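
The first signature (the same cache keys appearing repeatedly in fill logs) can be checked with a few lines of Python. This is a minimal sketch, not part of Keeptrusts: the function name, the `(cache_key, timestamp)` event shape, and the `min_repeats` cutoff are all assumptions chosen for illustration, and you would need to adapt the input to whatever your fill logs actually contain.

```python
from collections import Counter

def repeated_fill_keys(fill_events, min_repeats=3):
    """Return {cache_key: fill_count} for keys filled at least min_repeats times.

    fill_events is an iterable of (cache_key, timestamp) pairs taken from a
    bounded time window of fill logs (hypothetical shape).
    """
    counts = Counter(key for key, _timestamp in fill_events)
    return {key: n for key, n in counts.items() if n >= min_repeats}
```

In a healthy cache, a key is normally filled once per content change, so a non-empty result over a short window with no matching code changes is worth a closer look.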

Configuration Error

A misconfiguration can cause the warmer to treat all entries as stale or to operate on a broader scope than intended.

Signature:

Fill rate jumps immediately after a configuration change
Entries that were previously stable are being re-filled
The scope of fills is broader than expected (all repos instead of one)
Reverting the configuration change stops the spike

Embedding Model Change

If the embedding model version changes, all existing vector embeddings become incompatible. The warmer regenerates all embeddings, causing a large spike.

Signature:

Massive spike affecting all repositories simultaneously
Correlates with a deployment that updated the embedding model
One-time event that resolves after full regeneration
Qdrant write latency increases during the spike

Setting Up Alert Thresholds

Baseline Calculation

Calculate your fill cost baseline over a 7-day window, excluding known events (onboarding, scheduled refreshes):

baseline_fill_rate = median(daily_fill_cost) over 7 days
baseline_hourly_peak = p95(hourly_fill_cost) over 7 days
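
The two formulas above can be sketched in Python. This is an illustration under stated assumptions, not Keeptrusts code: the function names, the input shapes (a dict keyed by day and a dict keyed by `(day, hour)`), and the nearest-rank percentile method are all choices made for this example.

```python
from statistics import median

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct / 100 * n) using integer math
    return ordered[max(rank, 1) - 1]

def fill_cost_baseline(daily_cost, hourly_cost, excluded_days=frozenset()):
    """Return (baseline_fill_rate, baseline_hourly_peak) in the same units as the inputs.

    daily_cost:    {day: cost} for the 7-day window (hypothetical shape)
    hourly_cost:   {(day, hour): cost} for the same window (hypothetical shape)
    excluded_days: days with known events such as onboarding or scheduled refreshes
    """
    days = [cost for day, cost in daily_cost.items() if day not in excluded_days]
    hours = [cost for (day, _hour), cost in hourly_cost.items() if day not in excluded_days]
    if not days or not hours:
        raise ValueError("no data left after excluding known events")
    return median(days), percentile_nearest_rank(hours, 95)
```

Excluding whole days is the simplest way to drop known events. If onboarding or refreshes only affect a few hours, excluding at hourly granularity would keep more of the window.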

Recommended Thresholds

Alert Level	Condition	Response Time
Info	Fill rate > 1.5× baseline hourly peak	Next business day
Warning	Fill rate > 2× baseline hourly peak for 15 minutes	Within 2 hours
Critical	Fill rate > 5× baseline hourly peak for 10 minutes	Immediate
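
To make the table concrete, here is a small Python sketch of how the three levels could be evaluated. It assumes one fill-rate sample per minute, expressed in the same units as the baseline hourly peak. The function name and the sampling interval are assumptions for illustration, not how the alerting backend necessarily evaluates conditions.

```python
def classify_fill_rate(samples, baseline_hourly_peak):
    """Return 'critical', 'warning', 'info', or None for the most recent samples.

    samples: per-minute fill-rate readings, oldest first (assumed interval).
    """
    def sustained(multiplier, minutes):
        # True only if the last `minutes` samples all exceed the threshold.
        window = samples[-minutes:]
        return len(window) == minutes and all(
            s > multiplier * baseline_hourly_peak for s in window
        )

    if sustained(5.0, 10):
        return "critical"
    if sustained(2.0, 15):
        return "warning"
    if samples and samples[-1] > 1.5 * baseline_hourly_peak:
        return "info"
    return None
```

The checks run from most to least severe so that a sustained 5× spike is reported as critical rather than as a warning.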

Alert Configuration

alerts:
  fill_cost_info:
    metric: keeptrusts_cache_fill_cost_dollars
    condition: rate(15m) > baseline_hourly_peak * 1.5
    severity: info
    notify: cache-ops-channel

  fill_cost_warning:
    metric: keeptrusts_cache_fill_cost_dollars
    condition: rate(15m) > baseline_hourly_peak * 2.0
    for: 15m
    severity: warning
    notify: cache-ops

  fill_cost_critical:
    metric: keeptrusts_cache_fill_cost_dollars
    condition: rate(15m) > baseline_hourly_peak * 5.0
    for: 10m
    severity: critical
    notify: platform-ops
    action: page_oncall

Runbook: Fill Cost Spike Response

When a fill cost alert fires, follow this runbook:

Step 1: Assess Scope (2 minutes)

Open Console → Cache → Fill Activity
Identify whether the spike affects all repos or specific ones
Check if a configuration change was deployed recently
Note the start time of the spike

Step 2: Check for Known Causes (3 minutes)

Was a new repository onboarded in the last hour?
Did a large merge land in any monitored repository?
Was a scheduled full refresh triggered?
Was a deployment made to the warmer or cache service?

If any of these apply and the scope matches, the spike is likely normal. Monitor for resolution.

Step 3: Investigate Abnormal Patterns (5 minutes)

If no known cause explains the spike:

Check fill logs for repeated cache keys (warmer loop indicator)
Check backend error rates (corruption indicator)
Review recent configuration changes (misconfiguration indicator)
Check if the embedding model version changed (regeneration indicator)
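
The four checks above map indicators to candidate causes. A hypothetical Python helper, shown only to illustrate the mapping, might look like this. The function name, parameter names, and return labels are invented for the example.

```python
def diagnose_spike(repeated_keys, backend_errors_elevated,
                   recent_config_change, embedding_model_changed):
    """Return the list of candidate causes suggested by the Step 3 indicators."""
    findings = []
    if repeated_keys:
        findings.append("warmer_loop")
    if backend_errors_elevated:
        findings.append("corruption")
    if recent_config_change:
        findings.append("configuration_error")
    if embedding_model_changed:
        findings.append("embedding_regeneration")
    # More than one indicator can be present, so all matches are returned.
    return findings or ["unknown"]
```

The helper returns a list rather than a single label because indicators can overlap. For example, repeated fills of the same entries also appear in the cache corruption signature.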

Step 4: Mitigate (immediate)

Based on diagnosis:

Warmer loop: Pause the warmer, investigate hash mismatch, fix and resume
Corruption: Isolate the affected backend, allow fallback to provider, schedule repair
Configuration error: Revert the configuration change
Embedding regeneration: Allow to complete but throttle rate if budget is a concern

# Pause warmer (stops new fills)
kt cache warmer pause --reason "investigating fill spike"

# Throttle fill rate (slows but does not stop fills)
kt cache warmer throttle --max-fills-per-minute 10

# Resume after fix
kt cache warmer resume

Step 5: Verify Resolution (15 minutes)

Confirm fill rate returns to baseline
Verify hit rate is stable or recovering
Check that no entries are still being re-filled unnecessarily
Close the alert with a root cause annotation

Cost Guardrails

Set hard cost limits to prevent runaway fill spending:

cache:
  cost_guardrails:
    max_daily_fill_cost: 500.00      # USD
    max_hourly_fill_cost: 100.00     # USD
    action_on_limit: pause_warmer    # pause_warmer | alert_only | throttle
    notify_on_limit: platform-ops

When the guardrail triggers, the warmer pauses and sends a notification. Cache lookups continue to serve existing entries, but no new fills occur until an operator resumes the warmer.
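
The guardrail decision can be pictured as the following Python sketch. It is an illustration only: the function name is hypothetical, and treating the limit as reached when spend is greater than or equal to the cap is an assumption, since the exact comparison is not specified here.

```python
VALID_ACTIONS = {"pause_warmer", "alert_only", "throttle"}

def guardrail_action(hourly_spend, daily_spend,
                     max_hourly_fill_cost=100.00, max_daily_fill_cost=500.00,
                     action_on_limit="pause_warmer"):
    """Return the configured action if either cap is reached, else None.

    Spend and caps are in USD. Reaching the cap exactly counts as a breach
    (an assumption made for this sketch).
    """
    if action_on_limit not in VALID_ACTIONS:
        raise ValueError(f"unknown action_on_limit: {action_on_limit!r}")
    if hourly_spend >= max_hourly_fill_cost or daily_spend >= max_daily_fill_cost:
        return action_on_limit
    return None
```

Either cap can trip independently, so a short burst can hit the hourly limit well before the daily total is at risk.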

Next steps

Export fill cost metrics to your monitoring stack with Observability Integration
Understand what drives fill activity in Scaling Cache Warmers
Plan for long-term fill cost growth with Capacity Planning

For AI systems

Canonical terms: Keeptrusts, cache fill cost spike, alerting, warmer loop, cache corruption, cost guardrails, fill rate baseline.
Exact feature/config names: cache.cost_guardrails.max_daily_fill_cost, cache.cost_guardrails.max_hourly_fill_cost, cache.cost_guardrails.action_on_limit (pause_warmer, alert_only, throttle), kt cache warmer pause, kt cache warmer throttle, kt cache warmer resume, keeptrusts_cache_fill_cost_dollars.
Best next pages: Observability Integration, Scaling Cache Warmers, Capacity Planning.

For engineers

Calculate fill cost baseline: median(daily_fill_cost) over 7 days excluding known events (onboarding, scheduled refreshes).
Set alert thresholds: info at 1.5× baseline hourly peak, warning at 2× for 15 minutes, critical at 5× for 10 minutes or an absolute dollar cap.
Distinguish normal spikes (new repo, large merge, scheduled refresh) from abnormal patterns (warmer loop, cache corruption, config error).
Warmer loop signature: same cache keys filled repeatedly, queue depth stays elevated, fill rate constant without decrease.
Configure cost guardrails in YAML: max_daily_fill_cost, max_hourly_fill_cost, and action_on_limit: pause_warmer to prevent runaway spend.
Response runbook: identify pattern → pause warmer → investigate → fix root cause → resume → verify resolution.

For leaders

Fill cost guardrails prevent runaway spending from warmer bugs or corruption: hard caps pause filling while cache lookups continue serving existing entries.
Normal fill spikes are one-time investments (new repo, major refactor); abnormal spikes indicate operational issues that need investigation.
Monthly fill cost should be a small fraction (5-15%) of total avoided cost. If it is not, the cache is not delivering the expected ROI.
Alert routing to platform-ops with clear thresholds and SLAs ensures fill cost anomalies are caught before they impact budget.
