
CIO Guide: Cutting AI Infrastructure Costs by 40%

AI infrastructure costs are the fastest-growing line item in most enterprise IT budgets. Without centralized controls, teams over-provision expensive models for simple tasks, duplicate API keys across projects, and lack visibility into who is spending what.

Use this page when

  • You are setting up wallet-based budget allocation and per-team spend caps
  • You need to understand the reserve/settle flow for gateway-level cost enforcement
  • You want to implement model routing optimization (expensive models for complex tasks, cheap models for simple tasks)
  • You are building chargeback reports from console usage and wallet reporting for finance/procurement

Keeptrusts gives you console-level cost controls, wallet-based budget allocation, intelligent model routing, and API-driven analytics to cut AI spend by 40% or more, with a full audit trail.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

Usage & Wallets Deep-Dive

The console Usage and Wallets pages give you a complete picture of AI spend. They provide:

  • Real-time spend tracking per team, project, and individual user
  • Budget allocation with hard and soft caps
  • Trend analysis with daily, weekly, and monthly views
  • Anomaly detection when spend deviates from historical patterns
  • Export capabilities for finance and procurement workflows

Screenshot reference: Console Usage showing team-level spend breakdown, budget utilization bars, and monthly trend chart.

| Surface | What It Shows | Action |
|---|---|---|
| Usage | Total spend, top consumers, and budget health | Identify over-spenders |
| Wallets | Per-team allocation and current balances | Adjust allocations |
| Events API | Spend per model across all teams | Identify optimization targets |
| Cost and Spend | Historical spend with forecasting inputs | Plan budget cycles |
| Alerts | Budget threshold notifications | Configure warning levels |

Wallet Allocation Strategy

Wallets are the budget enforcement primitive. Each wallet maps to a team or project and has a configurable balance that decrements with every LLM request.

Allocation Hierarchy

Organization Wallet ($50,000/month)
├── Engineering ($25,000)
│   ├── Search Team ($10,000)
│   ├── Platform Team ($8,000)
│   └── ML Team ($7,000)
├── Product ($15,000)
│   ├── Customer Support ($8,000)
│   └── Analytics ($7,000)
└── Operations ($10,000)
    ├── DevOps ($5,000)
    └── Security ($5,000)

# Allocate budget to the search team
curl -X POST https://api.keeptrusts.com/v1/wallets/allocate \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "search",
    "amount": 10000,
    "currency": "USD",
    "period": "monthly"
  }'
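
Before pushing allocations, it can help to check that no parent wallet hands out more than it holds. The sketch below models the hierarchy above in memory and flags any over-allocated parent. The `Wallet` class and `over_allocated` helper are hypothetical illustrations and not part of the Keeptrusts API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Wallet:
    """Hypothetical in-memory model of one node in the allocation tree."""
    name: str
    budget: int  # monthly budget in USD
    children: list[Wallet] = field(default_factory=list)


def over_allocated(wallet: Wallet) -> list[str]:
    """Return the names of wallets whose children sum to more than their budget."""
    problems = []
    if sum(child.budget for child in wallet.children) > wallet.budget:
        problems.append(wallet.name)
    for child in wallet.children:
        problems.extend(over_allocated(child))
    return problems


# Figures copied from the allocation hierarchy above.
org = Wallet("Organization", 50_000, [
    Wallet("Engineering", 25_000, [
        Wallet("Search Team", 10_000),
        Wallet("Platform Team", 8_000),
        Wallet("ML Team", 7_000),
    ]),
    Wallet("Product", 15_000, [
        Wallet("Customer Support", 8_000),
        Wallet("Analytics", 7_000),
    ]),
    Wallet("Operations", 10_000, [
        Wallet("DevOps", 5_000),
        Wallet("Security", 5_000),
    ]),
])

print(over_allocated(org))  # an empty list means every level balances
```

With the figures shown, every parent's budget equals the sum of its children, so the check reports no problems.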

Reserve/Settle Flow

When the gateway processes an LLM request:

  1. Reserve: The estimated cost is held against the team's wallet before the request is forwarded to the provider
  2. Forward: The request goes to the upstream LLM provider
  3. Settle: The actual cost (based on token usage) replaces the reservation
  4. Block: If the wallet balance cannot cover the reservation, the request is not forwarded; it is held until funds are available or an admin approves

This prevents overspend even under concurrent high-volume usage.
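
The flow above can be sketched as a small in-memory wallet. This is a minimal illustration, assuming amounts are tracked in integer cents and that a reservation is refused when the unreserved balance is too low. `TeamWallet`, `InsufficientFunds`, and the method names are hypothetical and do not describe the gateway's actual implementation.

```python
import threading


class InsufficientFunds(Exception):
    """Raised when a reservation would exceed the unreserved balance."""


class TeamWallet:
    """Hypothetical in-memory wallet illustrating reserve/settle."""

    def __init__(self, balance_cents: int) -> None:
        self._balance = balance_cents  # funds not yet spent
        self._reserved = 0             # funds held for in-flight requests
        self._lock = threading.Lock()  # serializes concurrent reservations

    @property
    def available(self) -> int:
        with self._lock:
            return self._balance - self._reserved

    def reserve(self, estimate_cents: int) -> int:
        """Hold the estimated cost, or refuse if it cannot be covered."""
        with self._lock:
            if self._balance - self._reserved < estimate_cents:
                raise InsufficientFunds("request must wait for funds or approval")
            self._reserved += estimate_cents
            return estimate_cents

    def settle(self, reserved_cents: int, actual_cents: int) -> None:
        """Release the hold and charge the actual cost instead."""
        with self._lock:
            self._reserved -= reserved_cents
            self._balance -= actual_cents


wallet = TeamWallet(balance_cents=100)
first = wallet.reserve(40)       # in-flight request 1
second = wallet.reserve(40)      # in-flight request 2
try:
    wallet.reserve(40)           # only 20 cents remain unreserved
    blocked = False
except InsufficientFunds:
    blocked = True
wallet.settle(first, actual_cents=25)  # request 1 cost less than estimated
print(blocked, wallet.available)
```

Because the hold is taken before the request is forwarded, two concurrent requests cannot both spend the same remaining balance; the third request in the example is refused even though nothing has been charged yet.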

Per-Team Spend Caps

Configure hard caps to prevent runaway costs and soft caps for early warnings.

# Policy config with spend controls
spend_controls:
  - team: search
    hard_cap: 10000
    soft_cap: 8000
    period: monthly
    on_soft_cap: notify
    on_hard_cap: block
  - team: customer-support
    hard_cap: 8000
    soft_cap: 6000
    period: monthly
    on_soft_cap: notify
    on_hard_cap: escalate

Console checkpoint: Usage and wallet reporting show which teams are approaching their soft cap and which have been blocked by hard cap enforcement.
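
To make the cap semantics concrete, here is a minimal sketch of how a team's period-to-date spend could map to an action. It assumes thresholds are inclusive and that spend below the soft cap results in a plain "allow"; both are assumptions, and `cap_action` is a hypothetical helper rather than a Keeptrusts function.

```python
def cap_action(
    spend: float,
    soft_cap: float,
    hard_cap: float,
    on_soft_cap: str = "notify",
    on_hard_cap: str = "block",
) -> str:
    """Return the action for a team's spend in the current period."""
    if spend >= hard_cap:      # hard cap takes precedence
        return on_hard_cap
    if spend >= soft_cap:      # early-warning threshold
        return on_soft_cap
    return "allow"             # hypothetical label for "no cap reached"


# Search team values from the policy config above.
print(cap_action(7_500, soft_cap=8_000, hard_cap=10_000))
print(cap_action(8_200, soft_cap=8_000, hard_cap=10_000))
print(cap_action(10_000, soft_cap=8_000, hard_cap=10_000))
```

The customer-support entry would be evaluated the same way with `on_hard_cap="escalate"`.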

Model Routing for Cost Optimization

The biggest cost lever is routing requests to the most cost-effective model that meets quality requirements. Most teams default to the most expensive model out of habit, not necessity.

Cost Comparison Matrix

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | Complex reasoning |
| GPT-4o mini | $0.15 | $0.60 | Simple classification |
| Claude Sonnet | $3.00 | $15.00 | Long-form content |
| Claude Haiku | $0.25 | $1.25 | Summarization |
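
The per-million prices translate into per-request costs as follows. This sketch uses the prices from the table; the dictionary keys are illustrative labels rather than provider model IDs, and the 1,000-input / 200-output token request is a hypothetical workload.

```python
# USD per 1M tokens as (input, output), copied from the table above.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet": (3.00, 15.00),
    "claude-haiku": (0.25, 1.25),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the table's per-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical classification request: 1,000 tokens in, 200 tokens out.
large = request_cost("gpt-4o", 1_000, 200)
small = request_cost("gpt-4o-mini", 1_000, 200)
print(f"gpt-4o: ${large:.5f}  gpt-4o-mini: ${small:.5f}  ratio: {large / small:.1f}x")
```

For this request shape the larger model costs roughly 17 times as much, which is why task type matters more than volume when choosing where to optimize.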

Smart Routing Configuration

# Route simple tasks to cheaper models automatically
model_groups:
  - name: cost-optimized
    routing: cost-priority
    models:
      - provider: openai
        model: gpt-4o-mini
        max_tokens: 1000
        priority: 1 # Try cheapest first
      - provider: openai
        model: gpt-4o
        priority: 2 # Fallback for complex requests

Typical savings: Organizations report 30–50% cost reduction by routing classification, summarization, and extraction tasks to smaller models.
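
One plausible reading of the routing config is "walk the models in priority order and use the first one whose limits fit the request". The sketch below implements that reading only; how the gateway actually decides that a request is complex is not described here, so treat `ModelOption` and `pick_model` as hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    provider: str
    model: str
    priority: int
    max_tokens: Optional[int] = None  # None means no limit configured


def pick_model(options: list[ModelOption], requested_tokens: int) -> ModelOption:
    """Return the lowest-priority-number model that can serve the request."""
    for option in sorted(options, key=lambda o: o.priority):
        if option.max_tokens is None or requested_tokens <= option.max_tokens:
            return option
    raise ValueError("no model in the group can serve this request")


# Mirrors the cost-optimized group from the config above.
GROUP = [
    ModelOption("openai", "gpt-4o-mini", priority=1, max_tokens=1000),
    ModelOption("openai", "gpt-4o", priority=2),
]

print(pick_model(GROUP, 400).model)    # fits the cheaper model's limit
print(pick_model(GROUP, 4000).model)   # exceeds it, so falls back
```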

Caching ROI

Enable response caching for deterministic queries to avoid paying for identical requests.

cache:
  enabled: true
  ttl: 3600 # 1 hour
  scope: team # Cache is team-scoped for isolation
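
To illustrate what a team-scoped cache with a TTL means in practice, here is a minimal in-memory sketch. It assumes the cache key combines team, model, and a hash of the prompt; that key design, the `TeamScopedCache` class, and its methods are hypothetical and not the gateway's implementation.

```python
import hashlib
import time


class TeamScopedCache:
    """Hypothetical TTL cache whose entries are isolated per team."""

    def __init__(self, ttl_seconds: int = 3600, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, response)

    @staticmethod
    def _key(team: str, model: str, prompt: str) -> str:
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        return f"{team}:{model}:{digest}"

    def get(self, team: str, model: str, prompt: str):
        """Return the cached response, or None on a miss or expired entry."""
        key = self._key(team, model, prompt)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # drop the stale entry
            return None
        return response

    def put(self, team: str, model: str, prompt: str, response: str) -> None:
        key = self._key(team, model, prompt)
        self._entries[key] = (self._clock() + self._ttl, response)


cache = TeamScopedCache(ttl_seconds=3600)
cache.put("search", "gpt-4o-mini", "classify: refund request", "billing")
print(cache.get("search", "gpt-4o-mini", "classify: refund request"))
print(cache.get("customer-support", "gpt-4o-mini", "classify: refund request"))
```

Including the team in the key is what keeps one team's cached responses from being served to another, at the cost of a lower hit rate than an organization-wide cache would have.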

Impact metrics:

| Metric | Without Caching | With Caching | Savings |
|---|---|---|---|
| Daily requests (search team) | 50,000 | 50,000 | - |
| Unique requests | 50,000 | 32,000 | - |
| Cached responses | 0 | 18,000 | 36% |
| Daily cost | $125 | $80 | $45/day |
| Monthly cost | $3,750 | $2,400 | $1,350/month |
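
The table's figures can be reproduced with a few lines of arithmetic. The sketch assumes every request costs the same on average, cache hits cost nothing, and a month has 30 days; those assumptions are inferred from the numbers rather than stated.

```python
daily_requests = 50_000
cached_responses = 18_000
daily_cost_without_cache = 125.00  # USD
days_per_month = 30                # implied by $125/day -> $3,750/month

hit_rate = cached_responses / daily_requests
cost_per_request = daily_cost_without_cache / daily_requests
daily_cost_with_cache = (daily_requests - cached_responses) * cost_per_request
daily_savings = daily_cost_without_cache - daily_cost_with_cache
monthly_savings = daily_savings * days_per_month

print(f"hit rate: {hit_rate:.0%}")
print(f"daily cost with cache: ${daily_cost_with_cache:,.2f}")
print(f"monthly savings: ${monthly_savings:,.2f}")
```

Under these assumptions the savings percentage equals the cache hit rate, so measuring the hit rate on real traffic is the quickest way to estimate the return.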

API Spend Analytics Endpoints

Build automated cost reports and alerting using the control-plane API:

# Get current wallet balance for a team
# (the URL is quoted so the shell does not treat "?" as a glob character)
curl "https://api.keeptrusts.com/v1/wallets/balance?team_id=search" \
  -H "Authorization: Bearer $API_TOKEN"

# Query spend events with aggregation
curl "https://api.keeptrusts.com/v1/events?team_id=search&since=30d&aggregate=daily" \
  -H "Authorization: Bearer $API_TOKEN"

Finance Integration Example

# Monthly chargeback report via CLI
kt events list \
  --since 30d \
  --format csv \
  --fields team,provider,model,cost,timestamp \
  > "monthly-chargeback-$(date +%Y%m).csv"
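
Once the CSV exists, a chargeback summary is a simple group-by. The sketch below assumes the export has a header row whose column names match the `--fields` list; that layout is an assumption, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def chargeback_by_team(csv_text: str) -> dict[str, float]:
    """Sum the cost column per team from an exported events CSV."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["team"]] += float(row["cost"])
    return dict(totals)


# Illustrative rows only; real exports come from the CLI command above.
SAMPLE = """team,provider,model,cost,timestamp
search,openai,gpt-4o-mini,0.50,2025-01-01T00:00:00Z
search,openai,gpt-4o,0.25,2025-01-01T00:05:00Z
customer-support,openai,gpt-4o,1.50,2025-01-01T00:10:00Z
"""

for team, total in sorted(chargeback_by_team(SAMPLE).items()):
    print(f"{team}: ${total:.2f}")
```

To run it against a real export, read the file's contents and pass them to `chargeback_by_team` in place of `SAMPLE`.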

Budget Forecasting with /v1/wallets

Use wallet balance trends to forecast future spend and proactively adjust allocations:

# Get wallet transaction history for forecasting
curl "https://api.keeptrusts.com/v1/wallets/transactions?team_id=search&since=90d" \
  -H "Authorization: Bearer $API_TOKEN"

Build forecasting into your BI pipeline:

| Month | Actual Spend | Forecast | Variance |
|---|---|---|---|
| Jan | $22,400 | - | - |
| Feb | $24,100 | $23,500 | +2.6% |
| Mar | $21,800 | $25,200 | -13.5% |
| Apr (forecast) | - | $22,800 | - |
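
The variance column is consistent with (actual - forecast) / forecast, which the sketch below reproduces. The forecasting method behind the table is not stated; the three-month average shown at the end is only one simple baseline you might start from, and `variance_pct` is a hypothetical helper.

```python
from statistics import mean


def variance_pct(actual: float, forecast: float) -> float:
    """Percentage by which actual spend differs from the forecast."""
    return (actual - forecast) / forecast * 100


# Actuals and forecasts copied from the table above (USD).
actuals = {"Jan": 22_400, "Feb": 24_100, "Mar": 21_800}
forecasts = {"Feb": 23_500, "Mar": 25_200}

for month, forecast in forecasts.items():
    print(f"{month}: {variance_pct(actuals[month], forecast):+.1f}%")

# A naive baseline for the next month: the mean of the last three actuals.
baseline = mean(actuals.values())
print(f"three-month average: ${baseline:,.0f}")
```

A positive variance means the team spent more than forecast; persistent negative variance, as in March, suggests an allocation could be reduced or reassigned.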

ROI Summary

| Optimization | Savings | Implementation Effort |
|---|---|---|
| Model routing (expensive → optimized) | 30–50% | Policy config change |
| Response caching | 20–35% | Policy config change |
| Budget caps (eliminate overspend) | 10–15% | Wallet allocation |
| Shadow AI elimination | Variable | Gateway + network policy |
| Combined | 40–60% | 2–4 weeks |

Next steps

  1. Open the console Usage and Wallets pages and review current team-level spend
  2. Identify the top 3 teams by AI spend and audit their model usage
  3. Implement model routing for simple tasks (classification, extraction)
  4. Set soft caps at 80% of current spend to establish baselines
  5. Schedule monthly cost review using automated CSV exports

See also: CIO Guide: Building an Enterprise AI Governance Framework · CTO Guide: Multi-Provider AI Strategy

For AI systems

  • Canonical terms: console usage, wallets, POST /v1/wallets/allocate, reserve/settle flow, spend_controls, hard_cap, soft_cap, model routing, model_allowlist, budget allocation hierarchy, cost per token, team spend breakdown
  • Key console pages: Usage, Wallets, Cost and Spend
  • Best next pages: CTO: Multi-Provider Strategy, CIO: AI Governance Framework, CTO: API Economy

For engineers

  • Allocate budget: POST /v1/wallets/allocate with team_id, amount, currency: USD, period: monthly
  • Reserve/settle: gateway reserves estimated cost before upstream call, settles to actual cost on response; insufficient balance = request held
  • Spend caps: set hard_cap (blocks at limit) and soft_cap (alerts at threshold) per team in policy-config.yaml
  • Model routing: use model_allowlist: [gpt-4o-mini] for cost-sensitive workloads to prevent teams from using expensive models
  • Console checkpoint: Usage and wallet reporting show per-team allocation vs actual spend and identify optimization targets

For leaders

  • Wallet-based budgets enforce cost controls in real time at the request level — not after the invoice arrives
  • The reserve/settle pattern prevents overspend under concurrent high-volume usage, even with multiple services sharing a budget
  • Model routing optimization (directing simple tasks to gpt-4o-mini instead of gpt-4o) typically reduces costs by 30–50% with minimal quality impact
  • Usage analytics and wallet reporting enable show-back and charge-back models that align AI spend with business unit accountability