
Setting Budget Alerts for Cache Fill Phases

When you onboard a new repository to the org-shared cache, the initial fill phase sends every unique request upstream. This is the most expensive period — and it is predictable, bounded, and one-time. Setting budget alerts ensures you stay informed and in control during fill.

Use this page when

  • You are onboarding a new repository and need to set spending alerts and limits for the initial cache fill phase.
  • You want to estimate fill cost before starting and communicate budget expectations to stakeholders.
  • You need recommended alert thresholds by team size and repository complexity.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

Understanding Fill Phase Economics

The fill phase follows a predictable pattern:

| Week | Hit Rate | Fill Cost (relative) | Description |
| --- | --- | --- | --- |
| Week 1 | 10-30% | High | Most requests are novel; cache is building |
| Week 2 | 40-60% | Moderate | Common patterns are cached; less common still fill |
| Week 3 | 60-80% | Low | Long-tail patterns filling; most traffic hits cache |
| Week 4+ | 75-90% | Minimal | Steady state; fills only on new/changed code |

After the fill phase completes, your ongoing fill cost is typically 5-15% of total request volume — only truly novel prompts or stale invalidations trigger upstream calls.
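To turn the hit-rate ranges above into a rough request count, the following minimal sketch computes how many requests would go upstream in a given week. The function name, the dictionary, and the assumption that weekly request volume stays constant are illustrative only; they are not part of the product.

```python
# Hit-rate ranges copied from the table above; week 4 stands in for "Week 4+".
HIT_RATE_BY_WEEK = {
    1: (0.10, 0.30),
    2: (0.40, 0.60),
    3: (0.60, 0.80),
    4: (0.75, 0.90),
}


def upstream_requests(weekly_requests: int, week: int) -> tuple[int, int]:
    """Return (best_case, worst_case) requests that miss the cache and go upstream.

    Hypothetical helper: assumes weekly request volume is constant.
    """
    if week < 1:
        raise ValueError("week must be 1 or greater")
    low_hit, high_hit = HIT_RATE_BY_WEEK[min(week, 4)]
    best_case = round(weekly_requests * (1 - high_hit))   # highest hit rate
    worst_case = round(weekly_requests * (1 - low_hit))   # lowest hit rate
    return best_case, worst_case


if __name__ == "__main__":
    for week in (1, 2, 3, 4):
        print(week, upstream_requests(10_000, week))
```

Multiplying either bound by your average cost per request gives a rough weekly fill cost for that scenario.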

Estimating Fill Cost Before Starting

Estimate your fill cost before onboarding a repository:

Formula:

Estimated Fill Cost = Unique Prompt Patterns × Avg Cost per Request

Rules of thumb by repository size:

| Repo Size | Unique Patterns (first month) | Avg Cost/Request | Estimated Fill Cost |
| --- | --- | --- | --- |
| Small (< 50K LOC) | 200-500 | $0.06 | $12-30 |
| Medium (50-200K LOC) | 500-2,000 | $0.08 | $40-160 |
| Large (200K-1M LOC) | 2,000-8,000 | $0.10 | $200-800 |
| Monorepo (1M+ LOC) | 8,000-25,000 | $0.12 | $960-3,000 |

These estimates assume GPT-4o-class models. Cheaper models reduce fill cost proportionally.
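The formula and rules of thumb above can be sketched as a small lookup. The class and function names are hypothetical; the pattern counts and per-request costs are taken directly from the table, so treat the result as a planning range rather than a quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoProfile:
    """Rule-of-thumb inputs for one repository size (values from the table above)."""
    min_patterns: int
    max_patterns: int
    avg_cost_per_request: float  # USD


# Keys are illustrative labels, not product identifiers.
PROFILES = {
    "small": RepoProfile(200, 500, 0.06),
    "medium": RepoProfile(500, 2_000, 0.08),
    "large": RepoProfile(2_000, 8_000, 0.10),
    "monorepo": RepoProfile(8_000, 25_000, 0.12),
}


def estimate_fill_cost(size: str) -> tuple[float, float]:
    """Estimated Fill Cost = Unique Prompt Patterns x Avg Cost per Request.

    Returns the (low, high) estimate in USD for the first month.
    """
    profile = PROFILES[size]
    low = round(profile.min_patterns * profile.avg_cost_per_request, 2)
    high = round(profile.max_patterns * profile.avg_cost_per_request, 2)
    return low, high


if __name__ == "__main__":
    for name in PROFILES:
        print(name, estimate_fill_cost(name))
```

If you use a cheaper model, lower `avg_cost_per_request` accordingly before computing the range.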

Setting Wallet Alerts

Configure alerts before starting a fill phase:

  1. Navigate to Settings → Wallets in the console
  2. Select the wallet assigned to the team onboarding the new repository
  3. Click Alerts → Add Alert
  4. Configure:
     | Setting | Recommended Value |
     | --- | --- |
     | Alert Type | Spending threshold |
     | Threshold | 50% of estimated fill cost |
     | Period | Weekly |
     | Channel | Email + Slack |
     | Action | Notify (do not block) |
  5. Add a second alert at 90% of estimated fill cost with the same period
  6. Save
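As a quick way to work out the two threshold values before entering them in the console, here is a minimal sketch. The dictionary keys are hypothetical and only mirror the settings listed above; they are not a documented API payload.

```python
def alert_thresholds(estimated_fill_cost: float,
                     fractions: tuple[float, ...] = (0.5, 0.9)) -> list[dict]:
    """Build one alert description per fraction of the estimated fill cost.

    Field names are illustrative and mirror the console settings only.
    """
    return [
        {
            "alert_type": "spending_threshold",
            "threshold_usd": round(estimated_fill_cost * fraction, 2),
            "period": "weekly",
            "channels": ["email", "slack"],
            "action": "notify",  # notify only, do not block
        }
        for fraction in fractions
    ]


if __name__ == "__main__":
    for alert in alert_thresholds(160.0):
        print(alert)
```

For a medium repository with an upper estimate of $160, this yields thresholds of $80 and $144.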

Setting Spending Limits

For hard cost control during fill, set a spending limit:

  1. Navigate to Settings → Wallets
  2. Select the target wallet
  3. Click Limits → Add Limit
  4. Configure:
     | Setting | Recommended Value |
     | --- | --- |
     | Limit Type | Daily maximum |
     | Amount | Estimated monthly fill ÷ 20 working days × 1.5 buffer |
     | Enforcement | Soft limit (warn) or Hard limit (block after threshold) |
     | Scope | Per-repository or per-team |

A soft limit notifies you but allows requests to continue. A hard limit blocks further upstream calls — subsequent requests return a cost-limit error until the next period.
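The daily limit formula and the difference between soft and hard enforcement can be sketched as follows. The names `daily_limit`, `Enforcement`, and `check_request` are hypothetical, and the return strings are illustrative; the real gateway's error codes and internals are not described here.

```python
from enum import Enum


class Enforcement(Enum):
    SOFT = "soft"  # warn, but let requests continue
    HARD = "hard"  # block upstream calls until the next period


def daily_limit(estimated_monthly_fill: float,
                working_days: int = 20,
                buffer: float = 1.5) -> float:
    """Daily limit = estimated monthly fill / working days x buffer (USD)."""
    return round(estimated_monthly_fill / working_days * buffer, 2)


def check_request(spent_today: float, limit: float,
                  enforcement: Enforcement) -> str:
    """Illustrative decision for one upstream call under a daily limit."""
    if spent_today < limit:
        return "allow"
    if enforcement is Enforcement.SOFT:
        return "allow_and_warn"
    return "block"  # the caller would see a cost-limit error


if __name__ == "__main__":
    limit = daily_limit(160.0)  # 160 / 20 * 1.5 = 12.0
    print(limit, check_request(12.5, limit, Enforcement.HARD))
```

With an estimated monthly fill of $160, the formula gives a daily limit of $12.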

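Recommended starting values by team size:

| Team Size | Daily Soft Limit | Daily Hard Limit | Weekly Alert |
| --- | --- | --- | --- |
| 10 engineers | $15 | $25 | $75 |
| 25 engineers | $35 | $60 | $175 |
| 50 engineers | $70 | $120 | $350 |
| 100 engineers | $140 | $240 | $700 |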

These assume a medium-to-large repository during peak fill phase. Adjust based on your estimated fill cost and risk tolerance.

Notification Channels

Budget alerts support multiple notification channels:

  • Email — Sent to wallet owner and configured recipients
  • Slack — Posts to a designated channel via webhook
  • Microsoft Teams — Posts via incoming webhook connector
  • Webhook — Generic HTTP POST for custom integrations
  • Console banner — In-app notification visible to all team members

Configure at least two channels to ensure visibility.
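If you route alerts through the generic webhook into your own integration, you may want to forward them to Slack yourself. The sketch below builds (but does not send) a request for a Slack incoming webhook, which accepts a JSON body with a `text` field. The alert fields `wallet`, `spent`, and `threshold` are assumptions for illustration; the actual alert payload format is not specified on this page.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, wallet: str,
                        spent: float, threshold: float) -> urllib.request.Request:
    """Build a POST request for a Slack incoming webhook.

    The wallet/spent/threshold inputs are hypothetical alert fields.
    """
    text = (
        f"Budget alert: wallet '{wallet}' has spent ${spent:.2f} "
        f"of its ${threshold:.2f} threshold."
    )
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL; substitute the webhook URL issued by your Slack workspace.
    req = build_slack_request("https://example.invalid/hook",
                              "platform-team", 82.50, 80.00)
    print(req.get_method(), req.data.decode("utf-8"))
```

To actually deliver the message, pass the request to `urllib.request.urlopen(req, timeout=10)` and handle any `urllib.error.URLError`.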

Monitoring Fill Progress

Track fill progress in real time:

  1. Navigate to Cost Center → Cache Performance
  2. Watch the Hit Rate metric climb over days
  3. Check that Fill Cost / Day is trending downward
  4. Review New Cache Entries / Day; this should decrease as the cache fills

When new entries per day drop below 5% of daily request volume, you have reached steady state.
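The 5% rule above can be checked from exported daily numbers. This is a minimal sketch: the function name and input shape are hypothetical, and requiring several consecutive days below the threshold (the `window` parameter) is an added assumption to avoid reacting to a single quiet day.

```python
def reached_steady_state(daily_stats: list[tuple[int, int]],
                         threshold: float = 0.05,
                         window: int = 3) -> bool:
    """Return True when the last `window` days are all below the threshold.

    daily_stats holds (new_cache_entries, total_requests) per day, oldest first.
    The consecutive-day window is an assumption, not a documented rule.
    """
    recent = daily_stats[-window:]
    if len(recent) < window:
        return False  # not enough history yet
    return all(
        requests > 0 and new_entries / requests < threshold
        for new_entries, requests in recent
    )


if __name__ == "__main__":
    history = [(400, 2000), (90, 2000), (80, 2000), (60, 2000)]
    print(reached_steady_state(history))
```

Set `window=1` to apply the rule exactly as stated, using only the most recent day.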

Fill Phase Best Practices

  1. Start with a smaller team — Let 5-10 engineers fill the cache before scaling to 100
  2. Fill during off-peak — Spread fill cost over a week rather than a single day
  3. Monitor miss reasons — If stale misses spike, your TTL may be too aggressive
  4. Communicate to the team — Let engineers know the first week costs more; savings come after
  5. Don't over-restrict — Hard limits during fill slow down cache population and delay ROI

Adjusting After Fill Phase

Once your repository reaches steady-state hit rate (70%+):

  1. Lower daily spending limits by 60-80%
  2. Remove fill-phase-specific alerts
  3. Set long-term alerts based on expected ongoing fill cost (10-20% of pre-cache baseline)
  4. Review monthly to ensure limits match actual spend patterns
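The adjustments above reduce to two small calculations, sketched here with hypothetical function names. Lowering a limit by 60-80% leaves 20-40% of the fill-phase value, and the long-term alert range is 10-20% of your pre-cache baseline spend.

```python
def post_fill_limit_range(fill_phase_daily_limit: float) -> tuple[float, float]:
    """Daily limit after lowering the fill-phase limit by 80% and by 60% (USD)."""
    low = round(fill_phase_daily_limit * 0.20, 2)   # 80% reduction
    high = round(fill_phase_daily_limit * 0.40, 2)  # 60% reduction
    return low, high


def long_term_alert_range(pre_cache_baseline: float) -> tuple[float, float]:
    """Long-term alert range: 10-20% of pre-cache baseline spend (USD)."""
    return (round(pre_cache_baseline * 0.10, 2),
            round(pre_cache_baseline * 0.20, 2))


if __name__ == "__main__":
    print(post_fill_limit_range(60.0))
    print(long_term_alert_range(1000.0))
```

Use the same period for the baseline and the alert, for example a monthly baseline for a monthly alert.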

Handling Unexpected Fill Spikes

Occasional fill spikes occur when:

  • Major code refactors invalidate cached entries
  • New models are deployed (different cache keys)
  • TTL expires on a large batch of entries simultaneously
  • A new team joins the codebase

These are temporary. If an alert fires unexpectedly, check Miss Reasons in the dashboard — stale misses during a code change are normal and self-resolving.

Next steps

For engineers

  • Set alerts at 50% and 90% of estimated fill cost with weekly period. Configure email + Slack channels.
  • Daily spending limit formula: estimated_monthly_fill ÷ 20 working_days × 1.5 buffer.
  • Soft limits notify but allow continued requests; hard limits block upstream calls until next period.
  • Monitor fill progress: Cost Center → Cache Performance → watch hit rate climb and fill cost/day trend downward.
  • Steady state reached when new cache entries/day drops below 5% of daily request volume.
  • After steady state: lower limits by 60–80%, remove fill-specific alerts, set long-term alerts at 10–20% of baseline.

For leaders

  • Fill cost is one-time, predictable, and bounded. It represents the investment required to unlock ongoing 80%+ cost reduction.
  • Recommended approach: start with 5–10 engineers filling the cache, then scale to 100 once steady state is reached.
  • Fill spikes after major refactors are temporary and self-resolving — not a sign of misconfiguration.
  • Communicate to teams: the first week costs more; savings compound from week 2 onward. The payback period is typically less than one week.