
Dev vs Production AI Costs: Separate Governance Profiles for Each

One of the fastest ways to lose control of AI spend is to treat development and production traffic as one budget problem. They are not the same workload, and they should not be governed the same way. Development traffic is exploratory, spiky, and often inefficient by design. Production traffic is recurring, user-facing, and tied to service expectations. When both live under the same governance profile, neither one is visible enough to manage well.

Keeptrusts gives teams a cleaner model. Separate wallets, billing budgets, routing defaults, caching settings, exports, and analytics views let you see experimentation spend without confusing it with customer-serving traffic. That distinction matters for budgeting, but it also matters for decision-making. The right response to a development overspend is usually different from the right response to a production increase.

Use this page when

  • You want clearer cost ownership between experimentation and live workloads.
  • You need development to stay flexible without letting it distort production budget planning.
  • You want separate routing and caching choices for dev and production environments.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, platform owners

Why one shared profile causes bad decisions

When development and production share the same dashboards and budget line, leaders lose the ability to interpret spend correctly. A spike in development prompt testing can look like rising production demand. A healthy production increase can be mistaken for sloppy internal experimentation. Finance sees one number and asks everyone to cut. Teams then make blunt changes that hurt the wrong workload.

The problem is not only reporting. Shared governance changes behavior. Developers hesitate to experiment because they know their activity will show up against the same spend target that production needs to meet. Or the opposite happens: production inherits loose development habits because the cheaper experimentation defaults were never separated from live traffic. Both are preventable.

Environment separation solves this by making the cost signal legible. Development can have smaller wallets, tighter soft budgets, lower-cost routing defaults, and more aggressive caching. Production can have a different budget shape, more conservative provider choices where necessary, and a dashboard view that reflects real user demand rather than internal trial activity.

What should differ between dev and production

The most important difference is financial intent. Development budgets are there to support learning. You expect some waste because teams are testing prompts, flows, and provider mix. Production budgets are there to support stable business output. Waste is less acceptable because the workload should already be understood.

The second difference is routing. Development should usually start on lower-cost lanes unless a team is deliberately validating premium behavior. Production routing should reflect the actual business requirement for each workflow, not a broad habit inherited from prototyping.

The third difference is caching. Development often benefits from longer cache windows for repeated prompt iterations, since teams may re-run the same structure many times while tuning. Production caching should be set to match the workload and freshness expectation of the live experience.
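To make the cache-window tradeoff concrete, here is a minimal Python sketch of an exact-match cache with a time-to-live. It is illustrative only: the `ExactMatchCache` class and `count_paid_calls` helper are hypothetical and do not represent how Keeptrusts implements caching. The sketch simply counts how many repeated requests would still reach a provider under a long window versus a short one.

```python
from collections import OrderedDict
from typing import Optional


class ExactMatchCache:
    """Hypothetical exact-match response cache with a TTL and a size cap."""

    def __init__(self, ttl_seconds: float, max_entries: int) -> None:
        self.ttl_seconds = ttl_seconds
        self.max_entries = max_entries
        self._store = OrderedDict()  # prompt -> (stored_at, response)

    def get(self, prompt: str, now: float) -> Optional[str]:
        entry = self._store.get(prompt)
        if entry is None:
            return None
        stored_at, response = entry
        if now - stored_at >= self.ttl_seconds:
            del self._store[prompt]  # expired: the next call pays for a fresh response
            return None
        return response

    def put(self, prompt: str, response: str, now: float) -> None:
        self._store[prompt] = (now, response)
        self._store.move_to_end(prompt)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the oldest entry


def count_paid_calls(cache: ExactMatchCache, request_times: list, prompt: str) -> int:
    """Count how many requests miss the cache and would reach a provider."""
    paid = 0
    for t in request_times:
        if cache.get(prompt, now=t) is None:
            paid += 1
            cache.put(prompt, "response", now=t)
    return paid


# The same prompt re-run every 20 minutes, six times (times in seconds).
times = [i * 1200.0 for i in range(6)]
dev_paid = count_paid_calls(ExactMatchCache(ttl_seconds=7200, max_entries=5000), times, "tune-me")
prod_paid = count_paid_calls(ExactMatchCache(ttl_seconds=900, max_entries=20000), times, "tune-me")
print(dev_paid, prod_paid)  # 1 6
```

With a two-hour window, only the first iteration is paid for. With a 15-minute window, every re-run is a miss, which is the right behavior when freshness matters more than reuse.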

The fourth difference is review cadence. Development exports are useful for weekly experimentation review. Production exports are usually more important for monthly finance and operating reviews. Mixing the two makes both conversations harder.

Implementation

A simple pattern is to maintain separate configs for development and production so each environment has its own wallet attribution, routing defaults, and cache profile.

# policy-config.dev.yaml
cache:
  enabled: true
  ttl_seconds: 7200        # long window: repeated prompt iterations hit the cache
  max_entries: 5000
  match_strategy: exact
providers:
  routing:
    strategy: usage_based
  targets:                 # lower-cost lanes by default
    - id: openai-gpt4o-mini
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-haiku
      provider: anthropic
      model: claude-haiku
      secret_key_ref:
        env: ANTHROPIC_API_KEY
consumer_groups:
  - name: product-dev
    api_key: kt_cg_product_dev
    wallet_team_id: team_product_dev   # development wallet scope
cost_tracking:
  enabled: true
  wallet_enforcement: true

# policy-config.prod.yaml
cache:
  enabled: true
  ttl_seconds: 900         # shorter window: matches live freshness expectations
  max_entries: 20000
  match_strategy: exact
providers:
  routing:
    strategy: usage_based
  targets:
    - id: openai-gpt4o
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: openai-gpt4o-mini
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
consumer_groups:
  - name: product-prod
    api_key: kt_cg_product_prod
    wallet_team_id: team_product_prod  # production wallet scope
cost_tracking:
  enabled: true
  wallet_enforcement: true

The exact targets will vary by workload, but the structure is the point. Development traffic has its own consumer group and wallet scope. Production traffic has its own. Caching and routing are allowed to differ. That creates two clean analytics streams instead of one confusing blend.
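A quick way to confirm the separation is real is to check that the two configs share no consumer-group identity. The sketch below assumes both files have already been parsed into plain dictionaries (for example by a YAML parser) and reuses the `consumer_groups`, `name`, `api_key`, and `wallet_team_id` fields from the configs above. The `shared_identities` function is a hypothetical helper, not part of any Keeptrusts tooling.

```python
def shared_identities(dev: dict, prod: dict) -> dict:
    """Return consumer-group fields whose values appear in both configs."""
    overlaps = {}
    for field in ("name", "api_key", "wallet_team_id"):
        dev_values = {group[field] for group in dev.get("consumer_groups", [])}
        prod_values = {group[field] for group in prod.get("consumer_groups", [])}
        common = dev_values & prod_values
        if common:
            overlaps[field] = common
    return overlaps


dev_config = {
    "consumer_groups": [
        {"name": "product-dev", "api_key": "kt_cg_product_dev",
         "wallet_team_id": "team_product_dev"},
    ],
}
prod_config = {
    "consumer_groups": [
        {"name": "product-prod", "api_key": "kt_cg_product_prod",
         "wallet_team_id": "team_product_prod"},
    ],
}
# A deliberately broken production config that points at the dev wallet.
misconfigured_prod = {
    "consumer_groups": [
        {"name": "product-prod", "api_key": "kt_cg_product_prod",
         "wallet_team_id": "team_product_dev"},
    ],
}

clean = shared_identities(dev_config, prod_config)
leaky = shared_identities(dev_config, misconfigured_prod)
print(clean)  # {}
print(leaky)  # {'wallet_team_id': {'team_product_dev'}}
```

A check like this could run in CI so that a copied-and-pasted wallet ID is caught before production spend starts landing in the development stream.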

Once the separation exists, add billing budgets that match the purpose of each environment. Development budgets should alert early because experimentation can expand fast. Production budgets should reflect expected service demand and seasonality. Wallets provide the hard boundary, while budgets give leaders time to react before a limit becomes a stop.
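The relationship between a soft budget and a hard wallet limit can be sketched as a simple classification. Everything here is an assumption for illustration: the `spend_status` function, the status names, the dollar amounts, and the alert ratios (50% for development, 80% for production) are hypothetical and are not Keeptrusts defaults.

```python
def spend_status(spent: float, budget: float, wallet_limit: float, alert_ratio: float) -> str:
    """Classify spend against a soft budget and a hard wallet limit."""
    if spent >= wallet_limit:
        return "blocked"      # wallet exhausted: requests stop
    if spent >= budget:
        return "over_budget"  # budget exceeded, wallet still has headroom
    if spent >= alert_ratio * budget:
        return "alert"        # early warning before the budget is reached
    return "ok"


# Development: small budget, early alert at 50% of budget.
dev_status = spend_status(spent=600, budget=1000, wallet_limit=1500, alert_ratio=0.5)
# Production: larger budget, alert at 80% of budget.
prod_status = spend_status(spent=6000, budget=10000, wallet_limit=15000, alert_ratio=0.8)
# Development after a runaway experiment reaches the wallet limit.
dev_runaway = spend_status(spent=1500, budget=1000, wallet_limit=1500, alert_ratio=0.5)

print(dev_status, prod_status, dev_runaway)  # alert ok blocked
```

Both environments have consumed 60% of their budget, yet only development raises an alert. That asymmetry is the point of giving each environment its own thresholds.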

How teams use the split

For development, the dashboard becomes a tool for managing experimentation discipline. Leaders can see whether one team is running far more prompt iterations than expected, whether low-cost routes are actually being used, and whether caching is reducing repeated test traffic. That makes development spend discussable without framing all experimentation as waste.

For production, the dashboard becomes a service management tool. If spend rises, leaders can ask whether request volume increased, whether provider mix shifted, or whether cache behavior changed. Those are meaningful production questions. They are much harder to answer when development traffic is mixed into the same chart.
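Those three questions map onto a simple decomposition of a spend change. The sketch below assumes a deliberately simplified cost model, spend = requests × (1 − cache hit rate) × average cost per paid request, and attributes the change between two periods to volume, cache behavior, and provider mix in that order. The model, the `decompose_spend_change` function, and the figures are all hypothetical. The split between effects depends on the order of substitution, so treat it as a conversation aid rather than an accounting rule.

```python
def decompose_spend_change(before: dict, after: dict) -> dict:
    """Split a spend change into volume, cache, and mix effects (sequential substitution)."""
    r0, h0, c0 = before["requests"], before["cache_hit_rate"], before["cost_per_paid_request"]
    r1, h1, c1 = after["requests"], after["cache_hit_rate"], after["cost_per_paid_request"]

    volume = (r1 - r0) * (1 - h0) * c0        # more or fewer requests
    cache = r1 * ((1 - h1) - (1 - h0)) * c0   # change in the share of requests that are paid
    mix = r1 * (1 - h1) * (c1 - c0)           # change in average cost per paid request
    total = r1 * (1 - h1) * c1 - r0 * (1 - h0) * c0
    return {"volume": volume, "cache": cache, "mix": mix, "total": total}


last_month = {"requests": 100_000, "cache_hit_rate": 0.30, "cost_per_paid_request": 0.002}
this_month = {"requests": 120_000, "cache_hit_rate": 0.20, "cost_per_paid_request": 0.0025}

effects = decompose_spend_change(last_month, this_month)
for name, value in effects.items():
    print(f"{name}: {value:+.2f}")
```

In this made-up example, spend rises by about 100, of which roughly 28 comes from volume, 24 from a lower cache hit rate, and 48 from a more expensive provider mix. The three effects always sum to the total change under this model.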

Exports become more useful as well. A weekly development export supports prompt review and experimentation retrospectives. A monthly production export supports finance close, budget planning, and leadership reporting. Because the environments are already separated at runtime, the evidence does not have to be reconstructed later.
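As a rough illustration of the two cadences, the sketch below rolls up export rows weekly for development and monthly for production. The row shape (`date`, `environment`, `cost_usd`), the `CADENCE` mapping, and the sample figures are assumptions made for this example. A real export may use different field names.

```python
from collections import defaultdict
from datetime import date

# Hypothetical review cadence per environment.
CADENCE = {"dev": "weekly", "prod": "monthly"}


def period_key(day: date, cadence: str) -> str:
    """Return an ISO week label or a year-month label for a date."""
    if cadence == "weekly":
        iso = day.isocalendar()
        return f"{iso[0]}-W{iso[1]:02d}"
    if cadence == "monthly":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown cadence: {cadence}")


def summarize(rows: list) -> dict:
    """Total cost per (environment, period), using each environment's cadence."""
    totals = defaultdict(float)
    for row in rows:
        env = row["environment"]
        totals[(env, period_key(row["date"], CADENCE[env]))] += row["cost_usd"]
    return dict(totals)


rows = [
    {"date": date(2024, 3, 4), "environment": "dev", "cost_usd": 12.5},
    {"date": date(2024, 3, 6), "environment": "dev", "cost_usd": 7.5},
    {"date": date(2024, 3, 12), "environment": "dev", "cost_usd": 4.0},
    {"date": date(2024, 3, 5), "environment": "prod", "cost_usd": 300.0},
    {"date": date(2024, 3, 20), "environment": "prod", "cost_usd": 250.0},
]

summary = summarize(rows)
for key in sorted(summary):
    print(key, summary[key])
```

Because each row already carries its environment, the weekly and monthly views come from the same export without any after-the-fact guessing about which traffic was which.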

Results and impact

Teams that split development and production governance usually find that both environments become easier to defend. Development spending becomes easier to justify because it is framed as bounded experimentation with visible wallets and budgets. Production spending becomes easier to explain because the numbers now reflect real user-facing demand rather than internal tuning noise.

This also improves cost optimization. Development can move aggressively toward cheaper routes and heavy cache reuse without confusing leaders about customer-facing performance. Production can keep the routes and budgets that match business value. The result is not just cleaner reporting. It is cleaner management.

Key takeaways

  • Development and production AI traffic serve different goals and should not share one governance profile.
  • Separate wallets, billing budgets, routing defaults, caching behavior, and exports make cost data easier to interpret.
  • Development visibility supports disciplined experimentation, while production visibility supports reliable service management.
  • The best budget conversations happen when leaders can explain each environment with its own evidence stream.

Next steps