Billing Budgets: Soft Alert Thresholds Without Blocking Traffic

Keeptrusts supports soft budget alerts that do not block traffic. Use the budget workflow on /usage/budgets to define a threshold amount and alert percentages, and leave the hard-limit option disabled. Teams get early warnings in /usage and /notifications, and requests keep flowing unless you intentionally pair that reporting model with wallet enforcement.

Use this page when

  • You need advance warning about spend without immediately stopping requests.
  • You want budget governance for teams, users, orgs, or agents without creating a brittle hard stop.
  • You want to separate alerting policy from wallet-based enforcement.

Primary audience

  • Primary: Technical Engineers and platform operators managing spend controls
  • Secondary: Technical Leaders balancing visibility with runtime continuity

The problem

Teams often treat all budget controls as if they should behave the same way. In practice, they should not.

Sometimes you want a hard boundary. That is where wallets and cost-ticket behavior make sense. Other times you want early warning only. You want to know that a team is moving faster than expected, but you do not want to interrupt customer traffic or operational work while you assess the situation.

If you use a hard stop too early, you create avoidable friction. If you avoid controls entirely, you only learn about overspend after the billing period has already closed. The right answer for many organizations is a soft budget policy first, then a hard control only where it is justified.

That distinction matters because budgeting and funding are different jobs. Budgets tell you whether spend is trending toward a problem. Wallets determine whether requests can still be funded.

The solution

The current budget workflow on /usage/budgets is designed to support both models, but you do not have to use both at once.

The page lets you define:

  • scope, such as organization, team, user, or agent
  • period window, such as weekly or monthly
  • threshold amount
  • alert threshold percentages
  • optional hard limit behavior
  • billing categories and time-zone details
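
As a rough mental model, the fields above can be written down as a small record before you enter them in the console. The sketch below is illustrative only: `BudgetDraft`, its field names, and the validation rules are assumptions made for this example, not a Keeptrusts API type or an export format.

```python
from dataclasses import dataclass, field
from typing import List

# Scope types named on this page; the string values are an assumption.
KNOWN_SCOPES = {"organization", "team", "user", "agent"}


@dataclass
class BudgetDraft:
    """Hypothetical planning record that mirrors the /usage/budgets form."""
    scope_type: str                 # organization, team, user, or agent
    scope_name: str                 # e.g. the team the budget applies to
    period: str                     # e.g. "weekly" or "monthly"
    threshold_amount: float         # budget amount in your billing currency
    alert_percentages: List[int] = field(default_factory=lambda: [50, 75, 90])
    hard_limit: bool = False        # left off for a soft, warning-only budget
    billing_categories: List[str] = field(default_factory=list)
    time_zone: str = "UTC"

    def is_soft(self) -> bool:
        """A budget is soft when hard-limit behavior is disabled."""
        return not self.hard_limit

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the draft looks sane."""
        problems = []
        if self.scope_type not in KNOWN_SCOPES:
            problems.append(f"unknown scope type: {self.scope_type}")
        if self.threshold_amount <= 0:
            problems.append("threshold amount must be positive")
        if any(p <= 0 or p > 100 for p in self.alert_percentages):
            problems.append("alert percentages must be between 1 and 100")
        if self.alert_percentages != sorted(set(self.alert_percentages)):
            problems.append("alert percentages should be unique and ascending")
        return problems


# Example draft: a monthly soft budget for a hypothetical support team.
draft = BudgetDraft("team", "support-ops", "monthly", 2000.0)
```

Writing the draft down this way makes it easy to review the intended scope, period, and thresholds with finance before anyone touches the console.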

For soft alerts, the key is to keep the budget as an observation and notification control.

That means you create the budget, set meaningful threshold percentages, and leave hard-limit behavior off. The budget still appears in current usage context, and warnings can surface through the console and notification flow, but the budget itself does not become an inline breaker for traffic.

This is especially useful when you are still learning a team's normal usage pattern or when you want finance and engineering to review a trend before you enforce a stricter control.
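
The separation between a soft budget and wallet enforcement can be sketched in a few lines. This is a conceptual model under stated assumptions, not how Keeptrusts is implemented: `handle_request`, `crossed_alerts`, and `Outcome` are hypothetical names, and the rule that a wallet blocks when its balance cannot cover the request cost is a simplification chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Outcome:
    allowed: bool          # decided by the wallet only
    warnings: List[str]    # produced by the soft budget only


def crossed_alerts(spend: float, threshold: float, percentages: List[int]) -> List[int]:
    """Return the alert percentages that the given spend has reached."""
    return [p for p in sorted(percentages) if spend >= threshold * p / 100]


def handle_request(
    cost: float,
    spend_so_far: float,
    threshold: float,
    percentages: List[int],
    wallet_balance: Optional[float] = None,  # None = no wallet enforcement
) -> Outcome:
    # Soft budget: observe and warn, never block.
    reached = crossed_alerts(spend_so_far + cost, threshold, percentages)
    warnings = [f"budget at or above {p}%" for p in reached]

    # Wallet (assumed behavior): the only place this sketch can refuse a request.
    allowed = wallet_balance is None or wallet_balance >= cost
    return Outcome(allowed=allowed, warnings=warnings)
```

In this model, a team that is already over its threshold still gets `allowed=True` as long as no wallet is attached; the budget only contributes warnings.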

Implementation

The best starting pattern is to treat the first budget cycle as calibration.

  1. Open /usage/budgets.
  2. Create a budget for the right scope, such as one team with volatile usage or one agent with expensive model access.
  3. Set the period window that matches how your team reviews spend, usually weekly or monthly.
  4. Enter a threshold amount that is operationally meaningful.
  5. Keep hard limit disabled if your goal is warning rather than blocking.
  6. Use alert percentages such as 50, 75, and 90 to stage the warning intensity.
  7. Watch /usage and /notifications during the next cycle before deciding whether the policy should remain soft or become strict.

That workflow is deliberately conservative. The first goal is better visibility, not immediate restriction.

Here is a practical example.

Suppose support operations are onboarding a new AI workflow. You expect spend to move, but you do not yet know the steady-state range. A monthly team budget with alert thresholds gives you three useful signals:

  • early awareness at 50%
  • a stronger operator warning at 75%
  • an "act now" warning at 90%
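
To make the three signals concrete, the sketch below converts the percentages into amounts and maps current spend to the strongest signal reached. The 2,000 budget figure and the function names are hypothetical, chosen only for the arithmetic; the labels come from the list above.

```python
from typing import Dict, Optional

# Alert percentages paired with the signal each one represents.
SIGNALS = [
    (50, "early awareness"),
    (75, "stronger operator warning"),
    (90, "act now"),
]


def threshold_amounts(budget: float) -> Dict[int, float]:
    """Spend amount at which each alert percentage is reached."""
    return {pct: budget * pct / 100 for pct, _ in SIGNALS}


def current_signal(spend: float, budget: float) -> Optional[str]:
    """Return the strongest signal reached so far, or None below 50%."""
    label = None
    for pct, name in SIGNALS:
        if spend >= budget * pct / 100:
            label = name
    return label


# Hypothetical monthly team budget of 2,000 in the billing currency.
amounts = threshold_amounts(2000.0)   # {50: 1000.0, 75: 1500.0, 90: 1800.0}
signal = current_signal(1600.0, 2000.0)   # "stronger operator warning"
```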

None of those signals needs to stop live support traffic. They tell you when to inspect /usage, compare the team against others, and decide whether routing, prompt design, wallet allocation, or model choice needs adjustment.

This also prevents a common mistake: using wallets to solve every visibility problem. Wallets are excellent when you need a real funding boundary. They are unnecessarily sharp when you only need to know that a team is approaching a threshold.

The /usage/budgets page becomes more useful when you pair it with two follow-up views:

  • /usage to understand which provider, model, team, or workflow is driving the spend
  • /notifications and /settings/notification-preferences to make sure the right people actually see the warning

That is the difference between a configured budget and an operating budget. The former exists in the UI. The latter changes team behavior before overspend becomes a surprise.

Results and impact

Soft budgets are often the right first control because they give you time to learn. Teams see approaching thresholds without abruptly losing service. Operators get a chance to inspect usage patterns, and leadership gets real data before choosing a stricter policy.

They also make later hard controls better. By the time you decide to enforce with wallet boundaries or stricter budget behavior, you have actual evidence about what the threshold should be. That reduces false alarms and avoids setting limits so low that they create routine disruption.

In practice, soft budgets work best for new teams, new models, and new rollouts. They let you observe the system while still keeping cost governance visible.

Key takeaways

  • Use /usage/budgets for early warning when you do not want to block traffic yet.
  • Budgets are for visibility and reporting; wallets are for funding and enforcement.
  • Threshold percentages such as 50, 75, and 90 create a useful progression of operator attention.
  • Pair budget warnings with /usage and /notifications so the signal leads to action.
  • Start soft, learn the real usage pattern, then decide whether you need a hard control.

Next steps