Skip to main content
Browse docs
By Audience
Getting Started
Configuration
Use Cases
IDE Integration
Third-Party Integrations
Engineering Cache
Console
API Reference
Gateway
Workflow Guides
Templates
Providers and SDKs
Industry Guides
Advanced Guides
Browse by Role
Deployment Guides
In-Depth Guides
Tutorials
FAQ

Reduce AI Spend by 40% with Gateway Controls

Most organizations overspend on AI because every team picks its own provider, nobody tracks per-request costs, and there are no guardrails to prevent runaway usage. Keeptrusts fixes all three problems at the gateway layer.

Use this page when

  • You want to cut AI costs through provider routing, response caching, model downgrading, and wallet enforcement.
  • You need to understand the five cost-reduction levers and their typical savings percentages.
  • You are setting up cost-optimized routing across multiple providers with a quality floor.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

What you'll achieve

  • 40% average cost reduction through intelligent provider routing and caching
  • Per-team budget enforcement with wallet controls that prevent overspend before it happens
  • Real-time spend visibility across every team, user, and model
  • Automatic model selection that routes to the cheapest capable model for each request type

The five levers of cost reduction

1. Provider routing — stop overpaying for the same capability

Different providers charge different rates for equivalent models. Keeptrusts lets you route traffic to the cheapest option automatically.

pack:
name: reduce-ai-spend-providers-1
version: 1.0.0
enabled: true
providers:
targets:
- id: azure-gpt4o
provider: azure-openai
model: gpt-4o
base_url: https://my-resource.openai.azure.com
secret_key_ref:
env: AZURE_OPENAI_KEY
- id: openai-gpt4o
provider: openai
model: gpt-4o
secret_key_ref:
env: OPENAI_API_KEY
- id: anthropic-sonnet
provider: anthropic
model: claude-sonnet-4-20250514
secret_key_ref:
env: ANTHROPIC_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true

With this config, Keeptrusts automatically routes to Azure (cheapest) when available, failing over to others only when needed.

2. Response caching — stop paying twice for the same answer

Enable caching to serve repeated or similar requests from a local cache instead of making a new upstream call.

caching:
enabled: true
ttl_seconds: 3600
max_entries: 10000
match_strategy: exact

Typical savings: 15–25% reduction in upstream API calls for workloads with repeated queries (customer support, FAQ bots, code completion).

3. Model selection — use the right model for the job

Not every request needs GPT-4o. Route simple classification and extraction tasks to cheaper models.

providers:
model_groups:
- name: fast-cheap
models:
- provider: openai
model: gpt-4o-mini
- provider: anthropic
model: claude-haiku
routing: lowest_latency

- name: high-quality
models:
- provider: openai
model: gpt-4o
- provider: anthropic
model: claude-sonnet-4-20250514
routing: cost_optimized

Applications can target fast-cheap for simple tasks and high-quality for complex reasoning — cutting costs without sacrificing quality where it matters.

4. Wallet limits — enforce budgets before they're exceeded

Keeptrusts wallets enforce hard spend limits at every scope: organization, team, and individual user.

# Allocate $500 to the engineering team
curl -X POST https://api.keeptrusts.com/v1/wallets/allocate \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"team_id": "eng-team-id",
"amount": 500.00,
"currency": "USD"
}'

When a wallet runs dry:

  • The gateway creates a cost ticket instead of forwarding the request
  • The request is queued until an admin replenishes the wallet or approves the ticket
  • No surprise bills — ever

See Wallets for the full cascade logic (user → team → org).

5. Spend dashboards — see where the money goes

The console Spend page gives you real-time visibility into:

  • Cost per team — identify which teams are the biggest spenders
  • Cost per model — see which models are driving costs
  • Cost per request — drill into individual expensive requests
  • Trend analysis — spot cost spikes before they become budget problems

Quick wins

  1. Add pricing blocks to your provider targets — this alone gives you per-request cost tracking
  2. Enable caching for any workload with repeated queries — immediate 15–25% savings
  3. Set wallet limits on your top-spending team — prevent the next cost surprise
  4. Switch simple workloads to gpt-4o-mini or equivalent — 80% cheaper for classification and extraction

Measuring your savings

Track these metrics in the Spend page before and after implementing controls:

MetricWhere to find itTarget improvement
Total monthly spendSpend → Overview30–40% reduction
Cache hit rateGateway metrics> 20% for repetitive workloads
Cost per 1K tokens (blended)Spend → Model breakdownLower blended rate
Budget utilization per teamWallets → Team view< 90% of allocation

Example: full cost-optimized config

pack:
name: cost-optimized-gateway
version: '1.0'
provider_routing:
strategy: cost_optimized
cost_optimization:
prefer_lowest_cost: true
quality_floor: 0.7
caching:
enabled: true
ttl_seconds: 3600
max_entries: 10000
providers:
targets:
- id: azure-gpt4o-mini
provider: azure-openai
model: gpt-4o-mini
base_url: https://my-resource.openai.azure.com
secret_key_ref:
env: AZURE_OPENAI_KEY
- id: openai-gpt4o
provider: openai
model: gpt-4o
secret_key_ref:
env: OPENAI_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
retention_days: 90

For AI systems

  • Canonical terms: cost_optimized routing, response caching, model selection, wallet limits, spend dashboard.
  • Config keys: provider_routing.strategy: cost_optimized, caching.enabled, caching.ttl_seconds, providers.targets[].pricing, wallet API.
  • Five levers: provider routing (15–25%), model downgrading (20–40%), caching (15–25%), wallet limits, spend visibility.
  • Best next pages: ROI Calculator, Rate Limiting, Wallets, Provider Routing.

For engineers

  • Prerequisites: gateway running with at least two provider targets with pricing defined.
  • Set provider_routing.strategy: cost_optimized with quality_floor: 0.7 to route to the cheapest capable provider.
  • Enable caching.enabled: true with ttl_seconds: 3600 for repetitive workloads (FAQ bots, code completion).
  • Configure wallet allocations per team via the Wallets API to enforce hard budget caps.
  • Validate: compare the Spend page totals before and after enabling routing/caching over a 1-week period.

For leaders

  • Organizations typically achieve 30–50% cost reduction by combining routing, caching, and model selection.
  • At $50K/month spend, that’s $180K–$300K/year in savings — payback period is typically under one month.
  • Wallet limits eliminate surprise invoices: no team or user can exceed their allocation without admin approval.
  • Real-time spend visibility enables proactive budget forecasting instead of reactive invoice shock.

Next steps