Alerting on Cache Fill Cost Spikes

Cache fills have a real cost: each fill triggers one or more provider calls to generate the content that gets cached. Under normal operation, fill costs are predictable and proportional to code change velocity. A sudden spike in fill cost means something has changed. The cause may be a problem (cache corruption, a warmer loop, a misconfiguration) or a legitimate but large-scale change.

Use this page when

  • You need to configure alerts for unexpected spikes in cache fill costs.
  • You are investigating a fill cost spike and want to determine if it's expected (cold start, invalidation) or anomalous.
  • You want to set up cost-based alert thresholds tied to your org's normal fill patterns.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Normal Fill Patterns

Before setting up alerts, understand what normal fill activity looks like so you can distinguish expected spikes from problems.

New Repository Onboarded

When you add a new repository to the cache warmer, it generates fills for all indexable content. This creates a one-time spike proportional to the repository size.

Expected signature:

  • Single burst of fill activity lasting minutes to hours
  • Fill cost proportional to repository size
  • No repeat after initial population
  • Associated with a warmer configuration change

Major Refactor or Large Merge

When a large pull request merges or a major refactor lands, many cache entries become stale simultaneously. The warmer refreshes them in a burst.

Expected signature:

  • Spike correlates with a specific merge event
  • Fill count is proportional to files changed
  • Duration is bounded (hours, not days)
  • Hit rate recovers after the burst completes

Scheduled Full Refresh

Some organizations configure periodic full refreshes to ensure maximum freshness. These create predictable, recurring spikes.

Expected signature:

  • Occurs at the same time on a regular schedule
  • Duration and cost are consistent across occurrences
  • No degradation between scheduled refreshes

Abnormal Fill Patterns

These patterns indicate problems that need investigation.

Cache Corruption

When cached data becomes corrupted (bit rot, incomplete writes, storage failures), the cache rejects entries on read and triggers re-fills.

Signature:

  • Fill rate increases without corresponding code changes
  • Same entries are filled repeatedly
  • Backend error rates are elevated
  • Fill cost rises but hit rate does not improve

Warmer Loop

A warmer loop occurs when the warmer repeatedly fills the same entries because it fails to recognize them as already cached. This typically stems from a hash computation mismatch between the warmer and the lookup path.

Signature:

  • Specific cache keys appear repeatedly in fill logs
  • Fill rate is constant and does not decrease over time
  • Queue depth stays elevated despite active workers
  • Fill cost grows linearly without bound
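
The repeated-key signature can be checked mechanically. The following is a minimal sketch, assuming you can export the cache keys from one window of fill logs as a list of strings. The function name `find_repeated_fills`, the sample keys, and the `min_repeats` threshold are hypothetical and not part of any Keeptrusts API or log format.

```python
from collections import Counter
from typing import Dict, Iterable


def find_repeated_fills(fill_keys: Iterable[str], min_repeats: int = 3) -> Dict[str, int]:
    """Return cache keys that were filled at least `min_repeats` times in the window."""
    counts = Counter(fill_keys)
    return {key: n for key, n in counts.items() if n >= min_repeats}


# Hypothetical sample: cache keys seen in one window of fill logs.
window = ["repo-a:f1", "repo-a:f2", "repo-a:f1", "repo-b:f9", "repo-a:f1"]
suspects = find_repeated_fills(window)
print(suspects)
```

In a healthy warmer, a key should rarely be filled more than once in a short window unless its source changed, so any key returned here is a candidate for the hash mismatch investigation.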

Configuration Error

A misconfiguration can cause the warmer to treat all entries as stale or to operate on a broader scope than intended.

Signature:

  • Fill rate jumps immediately after a configuration change
  • Entries that were previously stable are being re-filled
  • The scope of fills is broader than expected (all repos instead of one)
  • Reverting the configuration change stops the spike

Embedding Model Change

If the embedding model version changes, all existing vector embeddings become incompatible. The warmer regenerates all embeddings, causing a large spike.

Signature:

  • Massive spike affecting all repositories simultaneously
  • Correlates with a deployment that updated the embedding model
  • One-time event that resolves after full regeneration
  • Qdrant write latency increases during the spike

Setting Up Alert Thresholds

Baseline Calculation

Calculate your fill cost baseline over a 7-day window, excluding known events (onboarding, scheduled refreshes):

baseline_fill_rate = median(daily_fill_cost) over 7 days
baseline_hourly_peak = p95(hourly_fill_cost) over 7 days

Alert Level | Condition                                           | Response Time
Info        | Fill rate > 1.5× baseline hourly peak               | Next business day
Warning     | Fill rate > 2× baseline hourly peak for 15 minutes  | Within 2 hours
Critical    | Fill rate > 5× baseline hourly peak for 10 minutes  | Immediate
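
The baseline formulas and threshold multipliers above can be sketched in a few lines. This is an illustration only, assuming you have already exported daily and hourly fill costs (with known events removed) as plain lists of dollar amounts. The helper names and sample numbers are hypothetical, and the nearest-rank percentile used here is one of several common p95 definitions, so your metrics backend may report a slightly different value.

```python
import math
from statistics import median
from typing import Dict, List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def alert_thresholds(baseline_hourly_peak: float) -> Dict[str, float]:
    """Apply the 1.5x / 2x / 5x multipliers from the alert level table."""
    return {
        "info": baseline_hourly_peak * 1.5,
        "warning": baseline_hourly_peak * 2.0,
        "critical": baseline_hourly_peak * 5.0,
    }


# Hypothetical sample data (USD), already filtered for known events.
daily_fill_cost = [40.0, 42.0, 38.0, 45.0, 41.0, 39.0, 43.0]
hourly_fill_cost = [float(h) for h in range(1, 21)]

baseline_fill_rate = median(daily_fill_cost)
baseline_hourly_peak = percentile(hourly_fill_cost, 95)
thresholds = alert_thresholds(baseline_hourly_peak)
```

Recomputing the baseline on a rolling 7-day window keeps the thresholds aligned with your current change velocity rather than a stale snapshot.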

Alert Configuration

alerts:
  fill_cost_info:
    metric: keeptrusts_cache_fill_cost_dollars
    condition: rate(15m) > baseline_hourly_peak * 1.5
    severity: info
    notify: cache-ops-channel

  fill_cost_warning:
    metric: keeptrusts_cache_fill_cost_dollars
    condition: rate(15m) > baseline_hourly_peak * 2.0
    for: 15m
    severity: warning
    notify: cache-ops

  fill_cost_critical:
    metric: keeptrusts_cache_fill_cost_dollars
    condition: rate(15m) > baseline_hourly_peak * 5.0
    for: 10m
    severity: critical
    notify: platform-ops
    action: page_oncall
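
The `for:` field in the warning and critical rules implies that the condition must hold for the whole duration before the alert fires. How the alert engine evaluates this is not described on this page, so the sketch below assumes the common "every sample in the window breaches" semantics with one sample per minute. The function name and sample values are hypothetical.

```python
from typing import List


def sustained_breach(samples: List[float], threshold: float, for_samples: int) -> bool:
    """Return True if each of the most recent `for_samples` samples exceeds `threshold`."""
    if for_samples <= 0:
        raise ValueError("for_samples must be positive")
    if len(samples) < for_samples:
        return False  # not enough history to cover the `for` window yet
    return all(value > threshold for value in samples[-for_samples:])


# Hypothetical one-minute samples against a warning threshold of 38.0 (2 x 19.0).
steady_spike = [40.0] * 15
brief_dip = [40.0] * 7 + [30.0] + [40.0] * 7

fires = sustained_breach(steady_spike, threshold=38.0, for_samples=15)
suppressed = sustained_breach(brief_dip, threshold=38.0, for_samples=15)
```

Under this assumption, a single sample below the threshold resets the window, which is why short bursts from a large merge tend to raise only the info alert.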

Runbook: Fill Cost Spike Response

When a fill cost alert fires, follow this runbook:

Step 1: Assess Scope (2 minutes)

  1. Open Console → Cache → Fill Activity
  2. Identify whether the spike affects all repos or specific ones
  3. Check if a configuration change was deployed recently
  4. Note the start time of the spike

Step 2: Check for Known Causes (3 minutes)

  • Was a new repository onboarded in the last hour?
  • Did a large merge land in any monitored repository?
  • Was a scheduled full refresh triggered?
  • Was a deployment made to the warmer or cache service?

If any of these apply and the scope matches, the spike is likely normal. Monitor for resolution.

Step 3: Investigate Abnormal Patterns (5 minutes)

If no known cause explains the spike:

  1. Check fill logs for repeated cache keys (warmer loop indicator)
  2. Check backend error rates (corruption indicator)
  3. Review recent configuration changes (misconfiguration indicator)
  4. Check if the embedding model version changed (regeneration indicator)
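
The four checks above map directly onto the abnormal patterns described earlier. A minimal sketch of that mapping, assuming an operator or script has already answered each check as a yes/no value. The `SpikeSignals` type and the cause labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpikeSignals:
    repeated_keys: bool             # same cache keys appear repeatedly in fill logs
    backend_errors_elevated: bool   # backend error rates are above normal
    recent_config_change: bool      # a configuration change preceded the spike
    embedding_model_changed: bool   # the embedding model version changed


def diagnose(signals: SpikeSignals) -> List[str]:
    """Return every candidate cause whose indicator is present, in runbook order."""
    candidates = []
    if signals.repeated_keys:
        candidates.append("warmer_loop")
    if signals.backend_errors_elevated:
        candidates.append("corruption")
    if signals.recent_config_change:
        candidates.append("configuration_error")
    if signals.embedding_model_changed:
        candidates.append("embedding_regeneration")
    return candidates


# Repeated keys plus backend errors: corruption can also cause repeated fills,
# so both candidates are returned for the operator to separate.
result = diagnose(SpikeSignals(True, True, False, False))
```

The function returns a list rather than a single answer because the signatures overlap. An empty list means none of the four indicators matched and the spike needs deeper investigation.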

Step 4: Mitigate (immediate)

Based on diagnosis:

  • Warmer loop: Pause the warmer, investigate hash mismatch, fix and resume
  • Corruption: Isolate the affected backend, allow fallback to provider, schedule repair
  • Configuration error: Revert the configuration change
  • Embedding regeneration: Allow to complete but throttle rate if budget is a concern

# Pause warmer (stops new fills)
kt cache warmer pause --reason "investigating fill spike"

# Throttle fill rate (slows but does not stop fills)
kt cache warmer throttle --max-fills-per-minute 10

# Resume after fix
kt cache warmer resume

Step 5: Verify Resolution (15 minutes)

  1. Confirm fill rate returns to baseline
  2. Verify hit rate is stable or recovering
  3. Check that no entries are still being re-filled unnecessarily
  4. Close the alert with a root cause annotation

Cost Guardrails

Set hard cost limits to prevent runaway fill spending:

cache:
  cost_guardrails:
    max_daily_fill_cost: 500.00    # USD
    max_hourly_fill_cost: 100.00   # USD
    action_on_limit: pause_warmer  # pause_warmer | alert_only | throttle
    notify_on_limit: platform-ops

When the guardrail triggers, the warmer pauses and sends a notification. Cache lookups continue to serve existing entries, but no new fills occur until an operator resumes the warmer.
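
The guardrail decision can be illustrated with a short sketch that mirrors the configuration fields above. It assumes the limit triggers when spend reaches or exceeds either cap, which this page does not specify. The `CostGuardrails` type and `guardrail_action` function are hypothetical and do not represent the product's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostGuardrails:
    max_daily_fill_cost: float = 500.00   # USD
    max_hourly_fill_cost: float = 100.00  # USD
    action_on_limit: str = "pause_warmer"  # pause_warmer | alert_only | throttle


def guardrail_action(daily_cost: float, hourly_cost: float,
                     limits: CostGuardrails) -> Optional[str]:
    """Return the configured action if either cap is reached, otherwise None."""
    # Assumption: reaching the cap exactly counts as a trigger.
    if (daily_cost >= limits.max_daily_fill_cost
            or hourly_cost >= limits.max_hourly_fill_cost):
        return limits.action_on_limit
    return None


limits = CostGuardrails()
within_budget = guardrail_action(daily_cost=120.0, hourly_cost=30.0, limits=limits)
over_daily_cap = guardrail_action(daily_cost=510.0, hourly_cost=30.0, limits=limits)
```

Checking both caps independently means a short, intense burst can trip the hourly limit well before the daily total looks unusual.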

Next steps

For AI systems

  • Canonical terms: Keeptrusts, cache fill cost spike, alerting, warmer loop, cache corruption, cost guardrails, fill rate baseline.
  • Exact feature/config names: cache.cost_guardrails.max_daily_fill_cost, cache.cost_guardrails.max_hourly_fill_cost, cache.cost_guardrails.action_on_limit (pause_warmer, alert_only, throttle), kt cache warmer pause, kt cache warmer resume, fill rate metrics.
  • Best next pages: Observability Integration, Scaling Cache Warmers, Capacity Planning.

For engineers

  • Calculate fill cost baseline: median(daily_fill_cost) over 7 days excluding known events (onboarding, scheduled refreshes).
  • Set alert thresholds: info at 1.5× baseline hourly peak, warning at 2× sustained for 15 minutes, critical at 5× sustained for 10 minutes, with cost guardrails as an absolute dollar cap.
  • Distinguish normal spikes (new repo, large merge, scheduled refresh) from abnormal patterns (warmer loop, cache corruption, config error).
  • Warmer loop signature: same cache keys filled repeatedly, queue depth stays elevated, fill rate constant without decrease.
  • Configure cost guardrails in YAML: max_daily_fill_cost, max_hourly_fill_cost, and action_on_limit: pause_warmer to prevent runaway spend.
  • Response runbook: identify pattern → pause warmer → investigate → fix root cause → resume → verify resolution.

For leaders

  • Fill cost guardrails prevent runaway spending from warmer bugs or corruption — hard caps pause filling while cache lookups continue serving existing entries.
  • Normal fill spikes are one-time investments (new repo, major refactor); abnormal spikes indicate operational issues that need investigation.
  • Monthly fill cost should be a small fraction (5-15%) of total avoided cost — if not, the cache is not delivering expected ROI.
  • Alert routing to platform-ops with clear thresholds and SLAs ensures fill cost anomalies are caught before they impact budget.