CTO Guide: Building an Internal AI API Economy
Treating AI capabilities as internal API products transforms how organizations consume, govern, and fund LLM usage. Instead of every team managing their own provider relationships, the platform team operates an internal AI marketplace — with the Keeptrusts gateway as the API layer and wallets as the billing system.
Use this page when
- You are building an internal AI marketplace where teams consume LLM capabilities as API products
- You need token management strategies (production, development, sandbox) with different rate limits and model access
- You want to implement usage metering and chargeback models for AI consumption across business units
- You are designing a self-service token lifecycle (provision, use, monitor, rotate, revoke)
This guide covers the marketplace architecture, token management, metering, and chargeback models for building an internal AI economy.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
The Internal Marketplace Model
┌─────────────────────────────────────────────────────┐
│ Internal AI Marketplace │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │ GPT-4o │ │ Claude │ │ Gemini Flash │ │
│ │ Premium │ │ Enterprise │ │ Economy │ │
│ │ $0.01/req │ │ $0.015/req │ │ $0.001/req │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬───────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ │ │
│ Keeptrusts Gateway │
│ (Routing, Policy, Metering) │
│ │ │
│ ┌─────────┬───────────┼───────────┬─────────┐ │
│ │ │ │ │ │ │
│ Team A Team B Team C Team D Team E │
│ $5K/mo $8K/mo $3K/mo $12K/mo $2K/mo │
└─────────────────────────────────────────────────────┘
Marketplace Benefits
| Benefit | Traditional Approach | Marketplace Approach |
|---|---|---|
| Provider management | Every team manages keys | Platform team manages centrally |
| Cost attribution | Manual reconciliation | Automatic per-request metering |
| Governance | Per-team implementation | Centralized, consistent |
| Onboarding | Days to weeks | Minutes (gateway key) |
| Budget control | None or quarterly | Real-time with caps |
Token Management for Teams
Gateway keys are the access tokens for the internal marketplace. Each key represents a "subscription" to AI capabilities.
Token Issuance Strategy
# Production service key — high rate limit, restricted models
curl -X POST https://api.keeptrusts.com/v1/tokens \
-H "Authorization: Bearer $PLATFORM_TOKEN" \
-d '{
"token_type": "gateway",
"name": "search-service-prod",
"description": "Production search service",
"rate_limit": 5000,
"allowed_models": ["gpt-4o-mini"]
}'
# Development key — lower rate limit, broader model access
curl -X POST https://api.keeptrusts.com/v1/tokens \
-H "Authorization: Bearer $PLATFORM_TOKEN" \
-d '{
"token_type": "gateway",
"name": "search-team-dev",
"description": "Search team development and testing",
"rate_limit": 500,
"allowed_models": ["gpt-4o", "gpt-4o-mini", "claude-sonnet-4-20250514"]
}'
# Sandbox key — minimal rate limit, all models for experimentation
curl -X POST https://api.keeptrusts.com/v1/tokens \
-H "Authorization: Bearer $PLATFORM_TOKEN" \
-d '{
"token_type": "gateway",
"name": "innovation-sandbox",
"description": "Innovation team sandbox",
"rate_limit": 100,
"allowed_models": ["*"]
}'
Token Lifecycle
| Stage | Trigger | API Action |
|---|---|---|
| Provisioning | Team request or self-service | POST /v1/tokens |
| Activation | First use | Automatic |
| Monitoring | Ongoing | GET /v1/events?gateway_key=... |
| Rotation | Policy (every 90 days) | POST /v1/tokens + revoke old |
| Suspension | Budget exhaustion or policy violation | PATCH /v1/tokens/{id} |
| Revocation | Project end or security incident | DELETE /v1/tokens/{id} |
Screenshot reference: Console Settings → Gateway Keys showing the full key inventory with status indicators, rate limits, last active timestamps, and associated teams.
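The rotation row in the lifecycle table ("POST /v1/tokens + revoke old") can be sketched as a small script. The `kt_api` wrapper, argument handling, and deployment step are illustrative assumptions; the endpoints mirror the examples earlier in this guide.

```shell
#!/bin/bash
# rotate-gateway-key.sh — sketch of the 90-day rotation step:
# issue a replacement key, then revoke the old one.
set -u

API="https://api.keeptrusts.com"

# Thin wrapper around the HTTP layer; assumes $PLATFORM_TOKEN is set.
kt_api() { curl -s -H "Authorization: Bearer $PLATFORM_TOKEN" "$@"; }

rotate_key() {
  local old_id="$1" name="$2"
  # 1. Issue the replacement key with the same name/scope
  kt_api -X POST "$API/v1/tokens" \
    -d "{\"token_type\": \"gateway\", \"name\": \"$name\"}"
  # 2. (Deploy the new key to the consuming service before step 3.)
  # 3. Revoke the old key once traffic has moved
  kt_api -X DELETE "$API/v1/tokens/$old_id"
}
```

Wrapping the HTTP call in a function also makes the script easy to dry-run or stub in CI before pointing it at production keys.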
Rate Limiting per Consumer Group
Rate limiting ensures fair access to shared AI infrastructure and prevents runaway automation from exhausting capacity.
Rate Limit Configuration
# policy-config.yaml — rate limits per consumer group
consumer_groups:
- name: production-services
rate_limit:
requests_per_minute: 5000
tokens_per_minute: 500000
concurrent_requests: 100
- name: development
rate_limit:
requests_per_minute: 500
tokens_per_minute: 50000
concurrent_requests: 20
- name: sandbox
rate_limit:
requests_per_minute: 50
tokens_per_minute: 10000
concurrent_requests: 5
Rate Limit Monitoring
# Current rate limit utilization by consumer group
curl "https://api.keeptrusts.com/v1/events?since=1h&aggregate=rate_by_consumer_group" \
-H "Authorization: Bearer $TOKEN"
Rate Limit Tiers
| Tier | RPM | TPM | Concurrent | Monthly Cost |
|---|---|---|---|---|
| Free (sandbox) | 50 | 10K | 5 | $0 (pool budget) |
| Standard (dev) | 500 | 50K | 20 | Team wallet allocation |
| Professional (prod) | 5,000 | 500K | 100 | Team wallet allocation |
| Enterprise (critical) | 50,000 | 5M | 500 | Dedicated budget |
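As a sketch of how these tiers translate into alerting, the helper below computes RPM utilization against a tier ceiling and warns at 80%. The function names and threshold are illustrative choices; observed RPM would come from the rate-by-consumer-group query above.

```shell
#!/bin/bash
# rpm-utilization.sh — flag consumer groups running hot against their
# tier's RPM ceiling (tier limits from the table above).
utilization_pct() {
  local observed_rpm=$1 tier_rpm=$2
  echo $(( observed_rpm * 100 / tier_rpm ))
}

check_group() {
  local name=$1 observed=$2 limit=$3
  local pct
  pct=$(utilization_pct "$observed" "$limit")
  if [ "$pct" -ge 80 ]; then
    echo "WARN $name at ${pct}% of rate limit"
  else
    echo "OK $name at ${pct}%"
  fi
}
```

Teams that repeatedly trip the warning are candidates for promotion to the next tier rather than ad hoc limit bumps.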
Usage Metering via Events API
Every gateway request generates a metered event with complete cost attribution.
Event Structure for Metering
{
"id": "evt_abc123",
"timestamp": "2026-04-23T14:30:00Z",
"gateway_key": "kt_gk_search_prod",
"consumer_group": "production-services",
"team_id": "search-team",
"user_id": "usr_jane_doe",
"provider": "openai",
"model": "gpt-4o-mini",
"tokens_input": 245,
"tokens_output": 512,
"cost_usd": 0.00045,
"latency_ms": 890,
"policy_actions": ["pii_redacted"],
"cache_hit": false
}
Usage Analytics Queries
# Monthly usage summary by team
curl "https://api.keeptrusts.com/v1/events?since=30d&aggregate=usage_by_team" \
-H "Authorization: Bearer $TOKEN" | jq '.teams[] | {
team: .name,
requests: .count,
tokens: .total_tokens,
cost: .total_cost
}'
# Usage trends (daily for forecasting)
curl "https://api.keeptrusts.com/v1/events?since=90d&aggregate=usage_daily" \
-H "Authorization: Bearer $TOKEN"
# Model usage distribution
curl "https://api.keeptrusts.com/v1/events?since=30d&aggregate=by_model" \
-H "Authorization: Bearer $TOKEN"
Screenshot reference: Console Usage showing per-team usage metering with request counts, token volumes, and cost breakdowns in a sortable table.
Chargeback Models Using Wallets
Wallets enable three common chargeback models for internal AI consumption.
Model 1: Fixed Monthly Allocation
Each team receives a fixed monthly budget. Simple to administer, suitable for predictable workloads.
# Allocate fixed monthly budget
curl -X POST https://api.keeptrusts.com/v1/wallets/allocate \
-H "Authorization: Bearer $TOKEN" \
-d '{"team_id": "search-team", "amount": 5000.00, "period": "monthly"}'
Model 2: Pay-Per-Use Chargeback
Teams are billed based on actual usage. The wallet tracks consumption and finance reconciles monthly.
# Query team consumption for billing
curl "https://api.keeptrusts.com/v1/events?since=30d&team_id=search-team&aggregate=cost_total" \
-H "Authorization: Bearer $TOKEN"
Model 3: Tiered Pricing
Teams pay different rates based on their consumption tier, incentivizing efficient usage.
| Tier | Monthly Usage | Effective Rate | Incentive |
|---|---|---|---|
| Base | 0–$1,000 | 1.0x (actual cost) | Standard access |
| Growth | $1,001–$5,000 | 0.9x (10% discount) | Rewards adoption |
| Scale | $5,001–$20,000 | 0.8x (20% discount) | Rewards scale |
| Enterprise | $20,001+ | 0.7x (30% discount) | Rewards commitment |
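The tier boundaries above can be encoded in a small helper for the monthly reconciliation run. This is a sketch; the function names are illustrative and usage is taken in whole dollars.

```shell
#!/bin/bash
# tier-rate.sh — map monthly usage (whole dollars) to the effective
# rate from the tiered pricing table above.
effective_rate() {
  local usage=$1
  if   [ "$usage" -le 1000 ];  then echo "1.0"
  elif [ "$usage" -le 5000 ];  then echo "0.9"
  elif [ "$usage" -le 20000 ]; then echo "0.8"
  else                              echo "0.7"
  fi
}

# Amount charged after the tier discount (awk for float math)
charged() {
  local usage=$1
  awk -v u="$usage" -v r="$(effective_rate "$usage")" \
    'BEGIN { printf "%.2f", u * r }'
}
```

For example, a team with $3,000 of actual usage lands in the Growth tier and is charged $2,700.00.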
Chargeback Report Generation
#!/bin/bash
# monthly-chargeback.sh — Generate chargeback report
API="https://api.keeptrusts.com"
TOKEN="$API_TOKEN"
MONTH=$(date -v-1m +%Y-%m)  # BSD/macOS; on GNU coreutils use: date -d "last month" +%Y-%m
echo "# AI Chargeback Report: $MONTH"
echo ""
echo "| Team | Requests | Tokens | Cost | Tier |"
echo "|------|----------|--------|------|------|"
# Fetch per-team usage
curl -s "$API/v1/events?since=${MONTH}-01&until=$(date +%Y-%m-01)&aggregate=cost_by_team" \
-H "Authorization: Bearer $TOKEN" | jq -r '.teams[] | "| \(.name) | \(.count) | \(.total_tokens) | $\(.total_cost) | \(.tier) |"'
Building the Marketplace Catalog
Document your internal AI offerings as a service catalog.
Service Definition
| Service | Model | SLA | Rate Limit | Cost |
|---|---|---|---|---|
| AI Completions — Standard | GPT-4o-mini | 99.5% | 500 RPM | $0.0006/req |
| AI Completions — Premium | GPT-4o | 99.5% | 500 RPM | $0.012/req |
| AI Analysis — Long Context | Claude Sonnet | 99.0% | 200 RPM | $0.018/req |
| AI Classification — Economy | Gemini Flash | 99.5% | 2000 RPM | $0.0003/req |
| AI Chat — Governed | Multi-model | 99.0% | 100 RPM | Per wallet |
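One way to keep this catalog in version control is a YAML file the platform team reviews like any other config change. The schema below is an illustrative sketch (not a Keeptrusts format), shown with two of the rows above:

```yaml
# service-catalog.yaml — illustrative schema, one entry per catalog row
services:
  - name: ai-completions-standard
    model: gpt-4o-mini
    sla: "99.5%"
    rate_limit_rpm: 500
    cost_per_request_usd: 0.0006
  - name: ai-classification-economy
    model: gemini-flash
    sla: "99.5%"
    rate_limit_rpm: 2000
    cost_per_request_usd: 0.0003
```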
Onboarding Flow
Team requests AI access
→ Platform team reviews request
→ Issue gateway key (scoped to approved services)
→ Allocate wallet (fixed or pay-per-use)
→ Developer starts using AI via OpenAI-compatible SDK
→ Usage metered automatically
→ Monthly chargeback reconciliation
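The key-issuance and wallet-allocation steps of the flow above can be collapsed into one provisioning script. This is a sketch: the `kt_api` wrapper and the default limits are assumptions, and the endpoints mirror the token and wallet examples earlier in this guide.

```shell
#!/bin/bash
# onboard-team.sh — issue a scoped gateway key, then allocate the
# team wallet, as one self-service provisioning step.
set -u

API="https://api.keeptrusts.com"

# Thin wrapper around the HTTP layer; assumes $PLATFORM_TOKEN is set.
kt_api() { curl -s -H "Authorization: Bearer $PLATFORM_TOKEN" "$@"; }

onboard_team() {
  local team=$1 budget=$2
  # Issue a development-tier key scoped to the approved model
  kt_api -X POST "$API/v1/tokens" \
    -d "{\"token_type\": \"gateway\", \"name\": \"${team}-dev\", \"rate_limit\": 500, \"allowed_models\": [\"gpt-4o-mini\"]}"
  # Allocate a fixed monthly wallet budget
  kt_api -X POST "$API/v1/wallets/allocate" \
    -d "{\"team_id\": \"$team\", \"amount\": $budget, \"period\": \"monthly\"}"
}
```

Running this from a ticketing webhook or CI job is what turns "days to weeks" of onboarding into minutes.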
Governance as a Platform Feature
In the marketplace model, governance is not overhead — it is a feature:
- Developers get instant, self-service access with guardrails that prevent compliance mistakes
- Finance gets automated cost attribution and chargeback
- Security gets complete audit trails and DLP enforcement
- Compliance gets automated evidence generation
- Leadership gets real-time dashboards for decision-making
Key Takeaways
- Treat AI capabilities as internal API products with clear service definitions and pricing
- Use gateway keys as subscription tokens with rate limits, model access, and cost controls
- Meter every request through the events API for accurate usage attribution
- Choose a chargeback model that matches your organizational culture — fixed, pay-per-use, or tiered
- Position governance as a marketplace feature, not a tax on innovation
Next steps
- CTO Guide: AI Platform Engineering
- CTO Guide: Multi-Provider AI Strategy
- CIO Guide: Cutting AI Costs by 40%
For AI systems
- Canonical terms: internal marketplace, gateway keys (kt_gk_...), POST /v1/tokens, token_type: gateway, rate_limit, allowed_models, consumer groups, wallets, chargeback, usage metering, token lifecycle
- Key configuration: token_type: gateway with rate_limit (RPM) and allowed_models (provider-model list, or ["*"] for sandbox)
- Best next pages: CTO: Platform Engineering, CTO: Developer Velocity, CIO: Cost Optimization
For engineers
- Production keys: POST /v1/tokens with rate_limit: 5000 and allowed_models: ["gpt-4o-mini"] — high throughput, restricted models
- Development keys: rate_limit: 500, allowed_models: ["gpt-4o", "gpt-4o-mini", "claude-sonnet-4-20250514"] — broader access for testing
- Sandbox keys: rate_limit: 100, allowed_models: ["*"] — experimentation with all models at low volume
- Automate provisioning in CI/CD: call POST /v1/tokens to issue short-lived keys for deployment pipelines
- Monitor usage: Console Usage shows per-key/per-team consumption with real-time metering
For leaders
- The internal marketplace model eliminates duplicate provider relationships, centralizes billing, and enables instant team onboarding (minutes vs days)
- Tiered gateway keys (production/dev/sandbox) enforce appropriate access levels without separate approval workflows
- Chargeback via wallet-based metering makes AI costs visible to business unit owners, creating natural incentives for efficiency
- Platform team staffing: 1 engineer can operate the marketplace for 10+ product teams once templates are established