Dev vs Production AI Costs: Separate Governance Profiles for Each

One of the fastest ways to lose control of AI spend is to treat development and production traffic as one budget problem. They are not the same workload, and they should not be governed the same way. Development traffic is exploratory, spiky, and often inefficient by design. Production traffic is recurring, user-facing, and tied to service expectations. When both live under the same governance profile, neither one is visible enough to manage well.

Keeptrusts gives teams a cleaner model. Separate wallets, billing budgets, routing defaults, caching settings, exports, and analytics views let you see experimentation spend without confusing it with customer-serving traffic. That distinction matters for budgeting, but it also matters for decision-making. The right response to a development overspend is usually different from the right response to a production increase.

Use this page when

You want clearer cost ownership between experimentation and live workloads.
You need development to stay flexible without letting it distort production budget planning.
You want separate routing and caching choices for dev and production environments.

Primary audience

Primary: Technical Leaders
Secondary: Technical Engineers, platform owners

Why one shared profile causes bad decisions

When development and production share the same dashboards and budget line, leaders lose the ability to interpret spend correctly. A spike in development prompt testing can look like rising production demand. A healthy production increase can be mistaken for sloppy internal experimentation. Finance sees one number and asks everyone to cut. Teams then make blunt changes that hurt the wrong workload.

The problem is not only reporting. Shared governance changes behavior. Developers hesitate to experiment because they know their activity will show up against the same spend target that production needs to meet. Or the opposite happens: production inherits loose development habits because the cheaper experimentation defaults were never separated from live traffic. Both are preventable.

Environment separation solves this by making the cost signal legible. Development can have smaller wallets, tighter soft budgets, lower-cost routing defaults, and more aggressive caching. Production can have a different budget shape, more conservative provider choices where necessary, and a dashboard view that reflects real user demand rather than internal trial activity.
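
Here is a minimal Python sketch, with made-up numbers, showing why the blended view misleads. The record shape `(environment, day, cost_usd)` is an assumption for illustration, not a Keeptrusts export format. The blended daily total jumps from 140 to 262, which looks like rising demand. Splitting by environment shows that almost all of the increase came from development testing.

```python
from collections import defaultdict

# Hypothetical usage records: (environment, day, cost_usd).
# Field names and figures are illustrative only.
records = [
    ("dev", 1, 40.0), ("prod", 1, 100.0),
    ("dev", 2, 160.0), ("prod", 2, 102.0),
]


def daily_totals(rows):
    """Blended view: one number per day, environments mixed together."""
    totals = defaultdict(float)
    for _env, day, cost in rows:
        totals[day] += cost
    return dict(totals)


def daily_by_environment(rows):
    """Separated view: one number per (environment, day)."""
    totals = defaultdict(float)
    for env, day, cost in rows:
        totals[(env, day)] += cost
    return dict(totals)


blended = daily_totals(records)
split = daily_by_environment(records)

print(blended)                                  # {1: 140.0, 2: 262.0}
print(split[("dev", 2)] - split[("dev", 1)])    # 120.0
print(split[("prod", 2)] - split[("prod", 1)])  # 2.0
```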

What should differ between dev and production

The most important difference is financial intent. Development budgets are there to support learning. You expect some waste because teams are testing prompts, flows, and provider mix. Production budgets are there to support stable business output. Waste is less acceptable because the workload should already be understood.

The second difference is routing. Development should usually start on lower-cost lanes unless a team is deliberately validating premium behavior. Production routing should reflect the actual business requirement for each workflow, not a broad habit inherited from prototyping.
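
The routing rule above can be written as a small decision function. This illustrates the policy only. It is not how the Keeptrusts gateway routes requests, and the example configs later on this page use `strategy: usage_based`. `pick_lane`, `PROD_WORKFLOW_LANES`, and the workflow names are hypothetical. The lane ids are borrowed from the example configs.

```python
# Lane ids borrowed from the example configs; substitute your own targets.
LOW_COST_LANE = "openai-gpt4o-mini"
PREMIUM_LANE = "openai-gpt4o"

# Hypothetical production decisions, made per workflow from its business
# requirement rather than inherited from prototyping habits.
PROD_WORKFLOW_LANES = {
    "support-summary": LOW_COST_LANE,
    "contract-review": PREMIUM_LANE,
}


def pick_lane(environment, workflow, validating_premium=False):
    """Return the starting lane for a request (illustrative policy only)."""
    if environment == "dev":
        # Development starts cheap unless a team is deliberately
        # validating premium behavior.
        return PREMIUM_LANE if validating_premium else LOW_COST_LANE
    # Production uses an explicit per-workflow choice. An unknown workflow
    # raises KeyError, so a missing decision surfaces instead of defaulting.
    return PROD_WORKFLOW_LANES[workflow]
```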

The third difference is caching. Development often benefits from longer cache windows for repeated prompt iterations, since teams may re-run the same structure many times while tuning. Production caching should be set to match the workload and the freshness expectations of the live experience.
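
To see why the TTL matters for repeated iterations, here is a toy exact-match cache in Python. It mirrors the intent of the `ttl_seconds`, `max_entries`, and `match_strategy: exact` settings in the example configs. The class itself is hypothetical and makes no claim about how Keeptrusts implements caching. With the dev TTL of 7200 seconds, a prompt re-run an hour later is still served from cache. With the production TTL of 900 seconds, it is not.

```python
import time


class ExactMatchCache:
    """Toy exact-match response cache with a TTL and an entry cap.

    Hypothetical: mirrors the intent of ttl_seconds, max_entries and
    match_strategy: exact, not the Keeptrusts implementation.
    """

    def __init__(self, ttl_seconds, max_entries, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_entries = max_entries
        self.clock = clock
        self._entries = {}  # prompt -> (stored_at, response)

    def get(self, prompt):
        entry = self._entries.get(prompt)
        if entry is None:
            return None  # miss: this exact prompt was never stored, or was evicted
        stored_at, response = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._entries[prompt]  # expired: drop it and report a miss
            return None
        return response

    def put(self, prompt, response):
        self._entries.pop(prompt, None)  # re-inserting moves the key to the end
        if self._entries and len(self._entries) >= self.max_entries:
            # Evict the oldest insertion; dicts preserve insertion order.
            del self._entries[next(iter(self._entries))]
        self._entries[prompt] = (self.clock(), response)


# A controllable clock so the example does not depend on wall time.
now = [0.0]


def fake_clock():
    return now[0]


dev_cache = ExactMatchCache(ttl_seconds=7200, max_entries=5000, clock=fake_clock)
prod_cache = ExactMatchCache(ttl_seconds=900, max_entries=20000, clock=fake_clock)
for cache in (dev_cache, prod_cache):
    cache.put("summarize: ticket 42", "cached answer")

now[0] = 3600.0  # one hour later, the same prompt is re-run
print(dev_cache.get("summarize: ticket 42"))   # cached answer
print(prod_cache.get("summarize: ticket 42"))  # None
```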

The fourth difference is review cadence. Development exports are useful for weekly experimentation review. Production exports are usually more important for monthly finance and operating reviews. Mixing the two makes both conversations harder.
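
One way to keep these four differences explicit is to write each environment's profile down as data. The `GovernanceProfile` type is a hypothetical sketch. The TTL values come from the example configs in the Implementation section, and the remaining fields restate the paragraphs above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceProfile:
    """Hypothetical summary of one environment's governance choices."""
    environment: str
    intent: str             # financial intent behind the budget
    default_lane: str       # routing starting point
    cache_ttl_seconds: int  # taken from the example configs
    export_cadence: str     # review rhythm for exports


DEV = GovernanceProfile("dev", "learning", "low-cost", 7200, "weekly")
PROD = GovernanceProfile("prod", "stable business output", "per-workflow", 900, "monthly")


def differences(a, b):
    """Return the governance fields on which two profiles differ."""
    fields = ("intent", "default_lane", "cache_ttl_seconds", "export_cadence")
    return [f for f in fields if getattr(a, f) != getattr(b, f)]


print(differences(DEV, PROD))
# ['intent', 'default_lane', 'cache_ttl_seconds', 'export_cadence']
```

An empty result from `differences` would be a sign that the two environments have drifted back into one shared profile.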

Implementation

A simple pattern is to maintain separate configs for development and production so each environment has its own wallet attribution, routing defaults, and cache profile.

```yaml
# policy-config.dev.yaml
# Longer TTL so repeated prompt iterations during tuning hit the cache.
cache:
  enabled: true
  ttl_seconds: 7200
  max_entries: 5000
  match_strategy: exact
providers:
  routing:
    strategy: usage_based
  # Lower-cost targets for experimentation.
  targets:
    - id: openai-gpt4o-mini
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-haiku
      provider: anthropic
      model: claude-haiku
      secret_key_ref:
        env: ANTHROPIC_API_KEY
# Dedicated consumer group and wallet scope for development traffic.
consumer_groups:
  - name: product-dev
    api_key: kt_cg_product_dev
    wallet_team_id: team_product_dev
cost_tracking:
  enabled: true
  wallet_enforcement: true
```

```yaml
# policy-config.prod.yaml
# Shorter TTL to match the freshness expectations of live traffic.
cache:
  enabled: true
  ttl_seconds: 900
  max_entries: 20000
  match_strategy: exact
providers:
  routing:
    strategy: usage_based
  targets:
    - id: openai-gpt4o
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: openai-gpt4o-mini
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
# Separate consumer group and wallet scope for production traffic.
consumer_groups:
  - name: product-prod
    api_key: kt_cg_product_prod
    wallet_team_id: team_product_prod
cost_tracking:
  enabled: true
  wallet_enforcement: true
```

The exact targets will vary by workload, but the structure is the point. Development traffic has its own consumer group and wallet scope. Production traffic has its own. Caching and routing are allowed to differ. That creates two clean analytics streams instead of one confusing blend.
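
A cheap guard against the two streams quietly merging is a check that no consumer-group API key or wallet scope is reused across environments. The sketch below is hypothetical. It works on plain dicts that mirror the `consumer_groups` fields shown above. In practice you would first load the YAML files with a parser such as PyYAML.

```python
# The two configs, reduced to the fields that matter for separation.
DEV_CONFIG = {
    "consumer_groups": [
        {"name": "product-dev", "api_key": "kt_cg_product_dev",
         "wallet_team_id": "team_product_dev"},
    ],
}
PROD_CONFIG = {
    "consumer_groups": [
        {"name": "product-prod", "api_key": "kt_cg_product_prod",
         "wallet_team_id": "team_product_prod"},
    ],
}


def shared_scopes(*configs):
    """Return (field, value) pairs whose value appears in more than one config."""
    first_owner = {}
    shared = set()
    for index, config in enumerate(configs):
        for group in config.get("consumer_groups", []):
            for field in ("api_key", "wallet_team_id"):
                key = (field, group[field])
                # Remember which config used this value first.
                owner = first_owner.setdefault(key, index)
                if owner != index:
                    shared.add(key)
    return sorted(shared)


print(shared_scopes(DEV_CONFIG, PROD_CONFIG))  # []
```

Run as a CI step, a non-empty result would flag a config where development and production traffic would land in the same wallet or analytics stream.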

Once the separation exists, add billing budgets that match the purpose of each environment. Development budgets should alert early because experimentation can expand fast. Production budgets should reflect expected service demand and seasonality. Wallets provide the hard boundary, while budgets give leaders time to react before a limit becomes a stop.
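
The interplay between early budget alerts and the hard wallet boundary can be sketched as follows. `EnvironmentSpendPolicy`, its fields, and the dollar amounts and alert fractions are all hypothetical. They are chosen only to show development alerting at half its budget while production alerts closer to expected demand. This is not the Keeptrusts billing API.

```python
from dataclasses import dataclass


@dataclass
class EnvironmentSpendPolicy:
    """Hypothetical model: a soft budget that alerts, and a wallet that stops."""
    budget_usd: float      # soft budget for the period
    alert_fraction: float  # alert once spend reaches this share of the budget
    wallet_usd: float      # hard boundary; spend at or beyond it is blocked

    def status(self, spent_usd):
        if spent_usd >= self.wallet_usd:
            return "blocked"
        if spent_usd >= self.budget_usd * self.alert_fraction:
            return "alert"
        return "ok"


# Development alerts early; production alerts nearer its expected demand.
dev = EnvironmentSpendPolicy(budget_usd=500.0, alert_fraction=0.5, wallet_usd=600.0)
prod = EnvironmentSpendPolicy(budget_usd=10_000.0, alert_fraction=0.8, wallet_usd=12_000.0)

print(dev.status(260.0))     # alert
print(prod.status(2_600.0))  # ok
print(dev.status(600.0))     # blocked
```

The gap between the alert point and the wallet limit is the reaction time the paragraph above describes.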

How teams use the split

For development, the dashboard becomes a tool for managing experimentation discipline. Leaders can see whether one team is running far more prompt iterations than expected, whether low-cost routes are actually being used, and whether caching is reducing repeated test traffic. That makes development spend discussable without framing all experimentation as waste.
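
The development questions above reduce to a few per-team ratios. This sketch assumes a hypothetical request log of `(team, route_id, cache_hit)` tuples. The route ids are borrowed from the example configs, and which routes count as low-cost is your own classification.

```python
from collections import Counter

# Hypothetical development request log: (team, route_id, cache_hit).
dev_requests = [
    ("search", "openai-gpt4o-mini", True),
    ("search", "openai-gpt4o-mini", False),
    ("search", "anthropic-haiku", True),
    ("search", "openai-gpt4o", False),
    ("checkout", "openai-gpt4o", False),
    ("checkout", "openai-gpt4o", False),
]

LOW_COST_ROUTES = {"openai-gpt4o-mini", "anthropic-haiku"}


def team_summary(requests):
    """Per-team request count, low-cost route share and cache hit rate."""
    totals, low_cost, hits = Counter(), Counter(), Counter()
    for team, route, cache_hit in requests:
        totals[team] += 1
        low_cost[team] += int(route in LOW_COST_ROUTES)
        hits[team] += int(cache_hit)
    return {
        team: {
            "requests": n,
            "low_cost_share": low_cost[team] / n,
            "cache_hit_rate": hits[team] / n,
        }
        for team, n in totals.items()
    }


for team, stats in team_summary(dev_requests).items():
    print(team, stats)
```

In this sample, `checkout` never touches a low-cost route. That is a discussion prompt ("are you validating premium behavior?") rather than a verdict.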

For production, the dashboard becomes a service management tool. If spend rises, leaders can ask whether request volume increased, whether provider mix shifted, or whether cache behavior changed. Those are meaningful production questions. They are much harder to answer when development traffic is mixed into the same chart.
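
The three production questions can be checked with simple period-over-period ratios. `PeriodStats` and `explain_change` are hypothetical. Cost per provider call is only a proxy: provider mix can move it, but so can prompt size. In the sample data, spend rises 25% while the cache hit rate and unit cost stay flat, so volume alone explains the increase.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """Hypothetical aggregate for one production period."""
    requests: int
    cache_hits: int
    provider_cost_usd: float  # spend on requests that reached a provider

    @property
    def cache_hit_rate(self):
        return self.cache_hits / self.requests

    @property
    def cost_per_provider_call(self):
        return self.provider_cost_usd / (self.requests - self.cache_hits)


def explain_change(before, after):
    """Ratios for volume, cache behavior and unit cost (a provider-mix proxy)."""
    return {
        "volume_ratio": after.requests / before.requests,
        "cache_hit_rate_change": after.cache_hit_rate - before.cache_hit_rate,
        "unit_cost_ratio": after.cost_per_provider_call / before.cost_per_provider_call,
    }


march = PeriodStats(requests=100_000, cache_hits=20_000, provider_cost_usd=4_000.0)
april = PeriodStats(requests=125_000, cache_hits=25_000, provider_cost_usd=5_000.0)
print(explain_change(march, april))
# {'volume_ratio': 1.25, 'cache_hit_rate_change': 0.0, 'unit_cost_ratio': 1.0}
```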

Exports become more useful as well. A weekly development export supports prompt review and experimentation retrospectives. A monthly production export supports finance close, budget planning, and leadership reporting. Because the environments are already separated at runtime, the evidence does not have to be reconstructed later.
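
The standard library makes it straightforward to bucket the same kind of usage rows by either cadence. The `(day, cost_usd)` row shape and the sample data are hypothetical. Weekly buckets use ISO weeks via `date.isocalendar()`.

```python
from collections import defaultdict
from datetime import date


def export_buckets(rows, cadence):
    """Group (day, cost_usd) rows into weekly (ISO week) or monthly buckets."""
    buckets = defaultdict(float)
    for day, cost in rows:
        if cadence == "weekly":
            iso = day.isocalendar()  # (ISO year, ISO week, weekday)
            key = f"{iso[0]}-W{iso[1]:02d}"
        elif cadence == "monthly":
            key = f"{day.year}-{day.month:02d}"
        else:
            raise ValueError(f"unknown cadence: {cadence}")
        buckets[key] += cost
    return dict(buckets)


dev_rows = [(date(2024, 3, 4), 12.0), (date(2024, 3, 6), 8.0), (date(2024, 3, 12), 5.0)]
prod_rows = [(date(2024, 3, 4), 300.0), (date(2024, 3, 28), 310.0), (date(2024, 4, 2), 295.0)]

print(export_buckets(dev_rows, "weekly"))    # {'2024-W10': 20.0, '2024-W11': 5.0}
print(export_buckets(prod_rows, "monthly"))  # {'2024-03': 610.0, '2024-04': 295.0}
```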

Results and impact

Teams that split development and production governance usually find that both environments become easier to defend. Development spending becomes easier to justify because it is framed as bounded experimentation with visible wallets and budgets. Production spending becomes easier to explain because the numbers now reflect real user-facing demand rather than internal tuning noise.

This also improves cost optimization. Development can move aggressively toward cheaper routes and heavy cache reuse without confusing leaders about customer-facing performance. Production can keep the routes and budgets that match business value. The result is not just cleaner reporting. It is cleaner management.

Key takeaways

Development and production AI traffic serve different goals and should not share one governance profile.
Separate wallets, billing budgets, routing defaults, caching behavior, and exports make cost data easier to interpret.
Development visibility supports disciplined experimentation, while production visibility supports reliable service management.
The best budget conversations happen when leaders can explain each environment with its own evidence stream.
