Setting Budget Alerts for Cache Fill Phases
When you onboard a new repository to the org-shared cache, the initial fill phase sends every unique request upstream. This is the most expensive period — and it is predictable, bounded, and one-time. Setting budget alerts ensures you stay informed and in control during fill.
Use this page when
- You are onboarding a new repository and need to set spending alerts and limits for the initial cache fill phase.
- You want to estimate fill cost before starting and communicate budget expectations to stakeholders.
- You need recommended alert thresholds by team size and repository complexity.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
Understanding Fill Phase Economics
The fill phase follows a predictable pattern:
| Week | Hit Rate | Fill Cost (relative) | Description |
|---|---|---|---|
| Week 1 | 10-30% | High | Most requests are novel; cache is building |
| Week 2 | 40-60% | Moderate | Common patterns are cached; less common still fill |
| Week 3 | 60-80% | Low | Long-tail patterns filling; most traffic hits cache |
| Week 4+ | 75-90% | Minimal | Steady state; fills only on new/changed code |
After the fill phase completes, ongoing fills typically account for 5-15% of total request volume: only truly novel prompts or entries invalidated as stale trigger upstream calls.
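To turn the hit-rate ranges in the table into a rough count of upstream calls, multiply weekly request volume by the miss rate (1 minus the hit rate). The following is a minimal sketch, assuming a constant weekly request volume; the function name and the 10,000-request figure are illustrative and not part of the product.

```python
# Hit-rate ranges from the fill phase table, as (low, high) fractions.
WEEKLY_HIT_RATE = {
    "Week 1": (0.10, 0.30),
    "Week 2": (0.40, 0.60),
    "Week 3": (0.60, 0.80),
    "Week 4+": (0.75, 0.90),
}


def upstream_requests(weekly_requests: int, hit_rate_range: tuple) -> tuple:
    """Return the (fewest, most) requests expected to go upstream in a week."""
    low_hit, high_hit = hit_rate_range
    # A higher hit rate means fewer upstream calls, so the bounds swap.
    fewest = round(weekly_requests * (1 - high_hit))
    most = round(weekly_requests * (1 - low_hit))
    return fewest, most


for week, hit_range in WEEKLY_HIT_RATE.items():
    # 10,000 requests per week is an assumed volume for illustration.
    fewest, most = upstream_requests(10_000, hit_range)
    print(f"{week}: {fewest} to {most} upstream requests")
```

Multiplying either bound by your average cost per request gives a rough weekly fill cost.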
Estimating Fill Cost Before Starting
Estimate your fill cost before onboarding a repository:
Formula:
Estimated Fill Cost = Unique Prompt Patterns × Avg Cost per Request
Rules of thumb by repository size:
| Repo Size | Unique Patterns (first month) | Avg Cost/Request | Estimated Fill |
|---|---|---|---|
| Small (< 50K LOC) | 200-500 | $0.06 | $12-30 |
| Medium (50-200K LOC) | 500-2,000 | $0.08 | $40-160 |
| Large (200K-1M LOC) | 2,000-8,000 | $0.10 | $200-800 |
| Monorepo (1M+ LOC) | 8,000-25,000 | $0.12 | $960-3,000 |
These estimates assume GPT-4o-class models. Cheaper models reduce fill cost proportionally.
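The formula and table above can be reproduced with a few lines of code. This is a sketch only: the function names are hypothetical, and the inputs are the rule-of-thumb figures from the table, not measured values.

```python
def estimate_fill_cost(unique_patterns: int, avg_cost_per_request: float) -> float:
    """Estimated fill cost = unique prompt patterns x average cost per request."""
    if unique_patterns < 0 or avg_cost_per_request < 0:
        raise ValueError("inputs must be non-negative")
    return round(unique_patterns * avg_cost_per_request, 2)


def estimate_fill_range(min_patterns: int, max_patterns: int,
                        avg_cost_per_request: float) -> tuple:
    """Return the (low, high) fill cost for a range of unique patterns."""
    return (
        estimate_fill_cost(min_patterns, avg_cost_per_request),
        estimate_fill_cost(max_patterns, avg_cost_per_request),
    )


# Medium repository row: 500-2,000 patterns at $0.08 per request.
low, high = estimate_fill_range(500, 2000, 0.08)
print(f"Medium repo fill estimate: ${low:.2f} to ${high:.2f}")
```

If you use a cheaper model, pass its average cost per request and the estimate scales down proportionally.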
Setting Wallet Alerts
Configure alerts before starting a fill phase:
- Navigate to Settings → Wallets in the console
- Select the wallet assigned to the team onboarding the new repository
- Click Alerts → Add Alert
- Configure:
| Setting | Recommended Value |
|---|---|
| Alert Type | Spending threshold |
| Threshold | 50% of estimated fill cost |
| Period | Weekly |
| Channel | Email + Slack |
| Action | Notify (do not block) |
- Add a second alert at 90% of estimated fill cost with the same period
- Save
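The two recommended alerts can be derived from the fill estimate before you enter them in the console. The sketch below only computes the values; the dictionary keys are hypothetical labels that mirror the settings table, not a documented API payload.

```python
def fill_alerts(estimated_fill_cost: float) -> list:
    """Build the two recommended fill-phase alerts (50% and 90% of the estimate)."""
    alerts = []
    for fraction in (0.5, 0.9):
        alerts.append({
            "alert_type": "spending_threshold",
            "threshold_usd": round(estimated_fill_cost * fraction, 2),
            "period": "weekly",
            "channels": ["email", "slack"],
            "action": "notify",  # notify only, do not block
        })
    return alerts


# 160.0 is the upper fill estimate for a medium repository.
for alert in fill_alerts(160.0):
    print(alert["threshold_usd"], alert["period"], alert["channels"])
```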
Setting Spending Limits
For hard cost control during fill, set a spending limit:
- Navigate to Settings → Wallets
- Select the target wallet
- Click Limits → Add Limit
- Configure:
| Setting | Recommended Value |
|---|---|
| Limit Type | Daily maximum |
| Amount | Estimated monthly fill ÷ 20 working days × 1.5 buffer |
| Enforcement | Soft limit (warn) or Hard limit (block after threshold) |
| Scope | Per-repository or per-team |
A soft limit notifies you but allows requests to continue. A hard limit blocks further upstream calls — subsequent requests return a cost-limit error until the next period.
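The daily amount formula and the difference between soft and hard enforcement can be sketched as follows. The function names are hypothetical, and the comparison treats spend equal to the limit as having reached it, which is an assumption about how the threshold is evaluated.

```python
def daily_limit(estimated_monthly_fill: float, working_days: int = 20,
                buffer: float = 1.5) -> float:
    """Estimated monthly fill / working days x buffer."""
    return round(estimated_monthly_fill / working_days * buffer, 2)


def enforcement_result(spent_today: float, limit: float, enforcement: str) -> str:
    """Return 'allow', 'warn' (soft limit reached), or 'block' (hard limit reached)."""
    if enforcement not in ("soft", "hard"):
        raise ValueError("enforcement must be 'soft' or 'hard'")
    if spent_today < limit:
        return "allow"
    # Soft limits notify but let requests continue; hard limits stop upstream calls.
    return "block" if enforcement == "hard" else "warn"


limit = daily_limit(800.0)  # upper fill estimate for a large repository
print(limit, enforcement_result(70.0, limit, "soft"), enforcement_result(70.0, limit, "hard"))
```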
Recommended Thresholds by Team Size
| Team Size | Daily Soft Limit | Daily Hard Limit | Weekly Alert |
|---|---|---|---|
| 10 engineers | $15 | $25 | $75 |
| 25 engineers | $35 | $60 | $175 |
| 50 engineers | $70 | $120 | $350 |
| 100 engineers | $140 | $240 | $700 |
These assume a medium-to-large repository during peak fill phase. Adjust based on your estimated fill cost and risk tolerance.
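For team sizes between the rows, one option is to round up to the next row in the table. The lookup below is a sketch under that assumption; it also caps teams larger than 100 at the last row, which is a simplification rather than a recommendation from this page.

```python
from bisect import bisect_left

# (team size, daily soft limit, daily hard limit, weekly alert) in USD, from the table.
THRESHOLDS = [
    (10, 15, 25, 75),
    (25, 35, 60, 175),
    (50, 70, 120, 350),
    (100, 140, 240, 700),
]


def recommended_thresholds(team_size: int) -> dict:
    """Pick the first table row whose team size is at least the given size."""
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    sizes = [row[0] for row in THRESHOLDS]
    # Assumption: teams above 100 reuse the largest row.
    index = min(bisect_left(sizes, team_size), len(THRESHOLDS) - 1)
    _, soft, hard, weekly = THRESHOLDS[index]
    return {"daily_soft": soft, "daily_hard": hard, "weekly_alert": weekly}


print(recommended_thresholds(30))
```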
Notification Channels
Budget alerts support multiple notification channels:
- Email — Sent to wallet owner and configured recipients
- Slack — Posts to a designated channel via webhook
- Microsoft Teams — Posts via incoming webhook connector
- Webhook — Generic HTTP POST for custom integrations
- Console banner — In-app notification visible to all team members
Configure at least two channels to ensure visibility.
Monitoring Fill Progress
Track fill progress in real time:
- Navigate to Cost Center → Cache Performance
- Watch the Hit Rate metric climb over days
- Check Fill Cost / Day trending downward
- Review New Cache Entries / Day — this should decrease as the cache fills
When new entries per day drop below 5% of daily request volume, you have reached steady state.
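The steady-state rule is a simple ratio check. A minimal sketch, assuming you read the two daily figures from the Cache Performance view yourself; the function name is hypothetical.

```python
def reached_steady_state(new_entries_per_day: int, daily_requests: int,
                         threshold: float = 0.05) -> bool:
    """True when new cache entries fall below 5% of daily request volume."""
    if daily_requests <= 0:
        return False  # no traffic, so nothing can be concluded
    return new_entries_per_day / daily_requests < threshold


print(reached_steady_state(40, 1000))   # 4% of requests created new entries
print(reached_steady_state(200, 1000))  # 20% of requests created new entries
```

Checking the ratio over several consecutive days, rather than a single day, avoids declaring steady state on an unusually quiet day.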
Fill Phase Best Practices
- Start with a smaller team: let 5-10 engineers fill the cache before scaling to 100
- Fill during off-peak: spread fill cost over a week rather than a single day
- Monitor miss reasons: if stale misses spike, your TTL may be too aggressive
- Communicate to the team: let engineers know the first week costs more; savings come after
- Don't over-restrict: hard limits during fill slow down cache population and delay ROI
Adjusting After Fill Phase
Once your repository reaches steady-state hit rate (70%+):
- Lower daily spending limits by 60-80%
- Remove fill-phase-specific alerts
- Set long-term alerts based on expected ongoing fill cost (10-20% of pre-cache baseline)
- Review monthly to ensure limits match actual spend patterns
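The adjustments above reduce to two ranges: the lowered daily limit and the long-term alert level. The sketch below computes both; the function names are hypothetical, and the baseline is whatever pre-cache spend figure you use for the alert period.

```python
def post_fill_daily_limit(fill_phase_limit: float) -> tuple:
    """Lowering a limit by 60-80% leaves 20-40% of the fill-phase value."""
    return (
        round(fill_phase_limit * 0.20, 2),  # after an 80% reduction
        round(fill_phase_limit * 0.40, 2),  # after a 60% reduction
    )


def long_term_alert(pre_cache_baseline: float) -> tuple:
    """Expected ongoing fill cost: 10-20% of the pre-cache baseline."""
    return (
        round(pre_cache_baseline * 0.10, 2),
        round(pre_cache_baseline * 0.20, 2),
    )


print(post_fill_daily_limit(60.0))  # fill-phase daily limit of $60
print(long_term_alert(1000.0))      # assumed pre-cache baseline of $1,000
```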
Handling Unexpected Fill Spikes
Occasional fill spikes occur when:
- Major code refactors invalidate cached entries
- New models are deployed (different cache keys)
- TTL expires on a large batch of entries simultaneously
- A new team joins the codebase
These are temporary. If an alert fires unexpectedly, check Miss Reasons in the dashboard — stale misses during a code change are normal and self-resolving.
Next steps
- Tracking Avoided Cost — see savings offset fill cost
- ROI Calculation for a 100-Engineer Team — model fill cost into full ROI
- Single-Flight Fill — reduce fill cost further with deduplication
For AI systems
- Canonical terms: Keeptrusts, budget alerts, cache fill phase, wallet alerts, spending limits, fill cost estimation, soft limit, hard limit.
- Console paths: Settings → Wallets → Alerts → Add Alert, Settings → Wallets → Limits → Add Limit, Cost Center → Cache Performance.
- Best next pages: Estimating Fill Cost, ROI Calculation for a 100-Engineer Team, Savings Dashboard Walkthrough.
For engineers
- Set alerts at 50% and 90% of estimated fill cost with weekly period. Configure email + Slack channels.
- Daily spending limit formula: estimated_monthly_fill ÷ 20 working_days × 1.5 buffer.
- Soft limits notify but allow continued requests; hard limits block upstream calls until next period.
- Monitor fill progress: Cost Center → Cache Performance → watch hit rate climb and fill cost/day trend downward.
- Steady state reached when new cache entries/day drops below 5% of daily request volume.
- After steady state: lower limits by 60–80%, remove fill-specific alerts, set long-term alerts at 10–20% of baseline.
For leaders
- Fill cost is one-time, predictable, and bounded. It represents the investment required to unlock ongoing 80%+ cost reduction.
- Recommended approach: start with 5–10 engineers filling the cache, then scale to 100 once steady-state is reached.
- Fill spikes after major refactors are temporary and self-resolving — not a sign of misconfiguration.
- Communicate to teams: the first week costs more; savings compound from week 2 onward. The payback period is typically under one week.