CTO Guide: Building an Internal AI API Economy
Treating AI capabilities as internal API products transforms how organizations consume, govern, and fund LLM usage. Instead of every team managing their own provider relationships, the platform team operates an internal AI marketplace — with the Keeptrusts gateway as the API layer and wallets as the billing system.
Use this page when
- You are building an internal AI marketplace where teams consume LLM capabilities as API products
- You need token management strategies (production, development, sandbox) with different rate limits and model access
- You want to implement usage metering and chargeback models for AI consumption across business units
- You are designing a self-service token lifecycle (provision, use, monitor, rotate, revoke)
This guide covers the marketplace architecture, token management, metering, and chargeback models for building an internal AI economy.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
The Internal Marketplace Model
┌─────────────────────────────────────────────────────┐
│ Internal AI Marketplace │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌──────────────┐ │
│ │ GPT-4o │ │ Claude │ │ Gemini Flash │ │
│ │ Premium │ │ Enterprise │ │ Economy │ │
│ │ $0.01/req │ │ $0.015/req │ │ $0.001/req │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬───────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ │ │
│ Keeptrusts Gateway │
│ (Routing, Policy, Metering) │
│ │ │
│ ┌─────────┬───────────┼───────────┬─────────┐ │
│ │ │ │ │ │ │
│ Team A Team B Team C Team D Team E │
│ $5K/mo $8K/mo $3K/mo $12K/mo $2K/mo │
└─────────────────────────────────────────────────────┘
Marketplace Benefits
| Benefit | Traditional Approach | Marketplace Approach |
|---|---|---|
| Provider management | Every team manages keys | Platform team manages centrally |
| Cost attribution | Manual reconciliation | Automatic per-request metering |
| Governance | Per-team implementation | Centralized, consistent |
| Onboarding | Days to weeks | Minutes (gateway key) |
| Budget control | None or quarterly | Real-time with caps |
Token Management for Teams
Gateway keys are the access tokens for the internal marketplace. Each key represents a "subscription" to AI capabilities.
Token Issuance Strategy
# Production service key — high rate limit, restricted models
curl -X POST https://api.keeptrusts.com/v1/tokens \
-H "Authorization: Bearer $PLATFORM_TOKEN" \
-d '{
"token_type": "gateway",
"name": "search-service-prod",
"description": "Production search service",
"rate_limit": 5000,
"allowed_models": ["gpt-4o-mini"]
}'
# Development key — lower rate limit, broader model access
curl -X POST https://api.keeptrusts.com/v1/tokens \
-H "Authorization: Bearer $PLATFORM_TOKEN" \
-d '{
"token_type": "gateway",
"name": "search-team-dev",
"description": "Search team development and testing",
"rate_limit": 500,
"allowed_models": ["gpt-4o", "gpt-4o-mini", "claude-sonnet-4-20250514"]
}'
# Sandbox key — minimal rate limit, all models for experimentation
curl -X POST https://api.keeptrusts.com/v1/tokens \
-H "Authorization: Bearer $PLATFORM_TOKEN" \
-d '{
"token_type": "gateway",
"name": "innovation-sandbox",
"description": "Innovation team sandbox",
"rate_limit": 100,
"allowed_models": ["*"]
}'
Token Lifecycle
| Stage | Trigger | API Action |
|---|---|---|
| Provisioning | Team request or self-service | POST /v1/tokens |
| Activation | First use | Automatic |
| Monitoring | Ongoing | GET /v1/events?gateway_key=... |
| Rotation | Policy (every 90 days) | POST /v1/tokens + revoke old |
| Suspension | Budget exhaustion or policy violation | PATCH /v1/tokens/{id} |
| Revocation | Project end or security incident | DELETE /v1/tokens/{id} |
Screenshot reference: Console Settings → Gateway Keys showing the full key inventory with status indicators, rate limits, last active timestamps, and associated teams.
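The rotation row in the lifecycle table ("POST /v1/tokens + revoke old") can be sketched as a small script. The `kt_api` wrapper, argument handling, and deployment step are illustrative assumptions; the endpoints mirror the examples earlier in this guide.

```shell
#!/bin/bash
# rotate-gateway-key.sh — sketch of the 90-day rotation step:
# issue a replacement key, then revoke the old one.
set -u

API="https://api.keeptrusts.com"

# Thin wrapper around the HTTP layer; assumes $PLATFORM_TOKEN is set.
kt_api() { curl -s -H "Authorization: Bearer $PLATFORM_TOKEN" "$@"; }

rotate_key() {
  local old_id="$1" name="$2"
  # 1. Issue the replacement key with the same name/scope
  kt_api -X POST "$API/v1/tokens" \
    -d "{\"token_type\": \"gateway\", \"name\": \"$name\"}"
  # 2. (Deploy the new key to the consuming service before step 3.)
  # 3. Revoke the old key once traffic has moved
  kt_api -X DELETE "$API/v1/tokens/$old_id"
}
```

Wrapping the HTTP call in a function also makes the script easy to dry-run or stub in CI before pointing it at production keys.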
Rate Limiting per Consumer Group
Rate limiting ensures fair access to shared AI infrastructure and prevents runaway automation from exhausting capacity.
Rate Limit Configuration
# policy-config.yaml — rate limits per consumer group
consumer_groups:
- name: production-services
rate_limit:
requests_per_minute: 5000
tokens_per_minute: 500000
concurrent_requests: 100
- name: development
rate_limit:
requests_per_minute: 500
tokens_per_minute: 50000
concurrent_requests: 20
- name: sandbox
rate_limit:
requests_per_minute: 50
tokens_per_minute: 10000
concurrent_requests: 5
Rate Limit Monitoring
# Current rate limit utilization by consumer group
curl "https://api.keeptrusts.com/v1/events?since=1h&aggregate=rate_by_consumer_group" \
-H "Authorization: Bearer $TOKEN"
Rate Limit Tiers
| Tier | RPM | TPM | Concurrent | Monthly Cost |
|---|---|---|---|---|
| Free (sandbox) | 50 | 10K | 5 | $0 (pool budget) |
| Standard (dev) | 500 | 50K | 20 | Team wallet allocation |
| Professional (prod) | 5,000 | 500K | 100 | Team wallet allocation |
| Enterprise (critical) | 50,000 | 5M | 500 | Dedicated budget |
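As a sketch of how these tiers translate into alerting, the helper below computes RPM utilization against a tier ceiling and warns at 80%. The function names and threshold are illustrative choices; observed RPM would come from the rate-by-consumer-group query above.

```shell
#!/bin/bash
# rpm-utilization.sh — flag consumer groups running hot against their
# tier's RPM ceiling (tier limits from the table above).
utilization_pct() {
  local observed_rpm=$1 tier_rpm=$2
  echo $(( observed_rpm * 100 / tier_rpm ))
}

check_group() {
  local name=$1 observed=$2 limit=$3
  local pct
  pct=$(utilization_pct "$observed" "$limit")
  if [ "$pct" -ge 80 ]; then
    echo "WARN $name at ${pct}% of rate limit"
  else
    echo "OK $name at ${pct}%"
  fi
}
```

Teams that repeatedly trip the warning are candidates for promotion to the next tier rather than ad hoc limit bumps.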
Usage Metering via Events API
Every gateway request generates a metered event with complete cost attribution.
Event Structure for Metering
{
"id": "evt_abc123",
"timestamp": "2026-04-23T14:30:00Z",
"gateway_key": "kt_gk_search_prod",
"consumer_group": "production-services",
"team_id": "search-team",
"user_id": "usr_jane_doe",
"provider": "openai",
"model": "gpt-4o-mini",
"tokens_input": 245,
"tokens_output": 512,
"cost_usd": 0.00045,
"latency_ms": 890,
"policy_actions": ["pii_redacted"],
"cache_hit": false
}
Usage Analytics Queries
# Monthly usage summary by team
curl "https://api.keeptrusts.com/v1/events?since=30d&aggregate=usage_by_team" \
-H "Authorization: Bearer $TOKEN" | jq '.teams[] | {
team: .name,
requests: .count,
tokens: .total_tokens,
cost: .total_cost
}'
# Usage trends (daily for forecasting)
curl "https://api.keeptrusts.com/v1/events?since=90d&aggregate=usage_daily" \
-H "Authorization: Bearer $TOKEN"
# Model usage distribution
curl "https://api.keeptrusts.com/v1/events?since=30d&aggregate=by_model" \
-H "Authorization: Bearer $TOKEN"
Screenshot reference: Console Usage showing per-team usage metering with request counts, token volumes, and cost breakdowns in a sortable table.
Chargeback Models Using Wallets
Wallets enable three common chargeback models for internal AI consumption.
Model 1: Fixed Monthly Allocation
Each team receives a fixed monthly budget. Simple to administer, suitable for predictable workloads.
# Allocate fixed monthly budget
curl -X POST https://api.keeptrusts.com/v1/wallets/allocate \
-H "Authorization: Bearer $TOKEN" \
-d '{"team_id": "search-team", "amount": 5000.00, "period": "monthly"}'
Model 2: Pay-Per-Use Chargeback
Teams are billed based on actual usage. The wallet tracks consumption and finance reconciles monthly.
# Query team consumption for billing
curl "https://api.keeptrusts.com/v1/events?since=30d&team_id=search-team&aggregate=cost_total" \
-H "Authorization: Bearer $TOKEN"
Model 3: Tiered Pricing
Teams pay different rates based on their consumption tier, incentivizing efficient usage.
| Tier | Monthly Usage | Effective Rate | Incentive |
|---|---|---|---|
| Base | 0–$1,000 | 1.0x (actual cost) | Standard access |
| Growth | $1,001–$5,000 | 0.9x (10% discount) | Rewards adoption |
| Scale | $5,001–$20,000 | 0.8x (20% discount) | Rewards scale |
| Enterprise | $20,001+ | 0.7x (30% discount) | Rewards commitment |
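The tier boundaries above can be encoded in a small helper for the monthly reconciliation run. This is a sketch; the function names are illustrative and usage is taken in whole dollars.

```shell
#!/bin/bash
# tier-rate.sh — map monthly usage (whole dollars) to the effective
# rate from the tiered pricing table above.
effective_rate() {
  local usage=$1
  if   [ "$usage" -le 1000 ];  then echo "1.0"
  elif [ "$usage" -le 5000 ];  then echo "0.9"
  elif [ "$usage" -le 20000 ]; then echo "0.8"
  else                              echo "0.7"
  fi
}

# Amount charged after the tier discount (awk for float math)
charged() {
  local usage=$1
  awk -v u="$usage" -v r="$(effective_rate "$usage")" \
    'BEGIN { printf "%.2f", u * r }'
}
```

For example, a team with $3,000 of actual usage lands in the Growth tier and is charged $2,700.00.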
Chargeback Report Generation
#!/bin/bash
# monthly-chargeback.sh — Generate chargeback report
API="https://api.keeptrusts.com"
TOKEN="$API_TOKEN"
MONTH=$(date -v-1m +%Y-%m)  # BSD/macOS; on GNU coreutils use: date -d "last month" +%Y-%m
echo "# AI Chargeback Report: $MONTH"
echo ""
echo "| Team | Requests | Tokens | Cost | Tier |"
echo "|------|----------|--------|------|------|"
# Fetch per-team usage
curl -s "$API/v1/events?since=${MONTH}-01&until=$(date +%Y-%m-01)&aggregate=cost_by_team" \
-H "Authorization: Bearer $TOKEN" | jq -r '.teams[] | "| \(.name) | \(.count) | \(.total_tokens) | $\(.total_cost) | \(.tier) |"'
Building the Marketplace Catalog
Document your internal AI offerings as a service catalog.
Service Definition
| Service | Model | SLA | Rate Limit | Cost |
|---|---|---|---|---|
| AI Completions — Standard | GPT-4o-mini | 99.5% | 500 RPM | $0.0006/req |
| AI Completions — Premium | GPT-4o | 99.5% | 500 RPM | $0.012/req |
| AI Analysis — Long Context | Claude Sonnet | 99.0% | 200 RPM | $0.018/req |
| AI Classification — Economy | Gemini Flash | 99.5% | 2000 RPM | $0.0003/req |
| AI Chat — Governed | Multi-model | 99.0% | 100 RPM | Per wallet |
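One way to keep this catalog in version control is a YAML file the platform team reviews like any other config change. The schema below is an illustrative sketch (not a Keeptrusts format), shown with two of the rows above:

```yaml
# service-catalog.yaml — illustrative schema, one entry per catalog row
services:
  - name: ai-completions-standard
    model: gpt-4o-mini
    sla: "99.5%"
    rate_limit_rpm: 500
    cost_per_request_usd: 0.0006
  - name: ai-classification-economy
    model: gemini-flash
    sla: "99.5%"
    rate_limit_rpm: 2000
    cost_per_request_usd: 0.0003
```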
Onboarding Flow
Team requests AI access
→ Platform team reviews request
→ Issue gateway key (scoped to approved services)
→ Allocate wallet (fixed or pay-per-use)
→ Developer starts using AI via OpenAI-compatible SDK
→ Usage metered automatically
→ Monthly chargeback reconciliation
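The key-issuance and wallet-allocation steps of the flow above can be collapsed into one provisioning script. This is a sketch: the `kt_api` wrapper and the default limits are assumptions, and the endpoints mirror the token and wallet examples earlier in this guide.

```shell
#!/bin/bash
# onboard-team.sh — issue a scoped gateway key, then allocate the
# team wallet, as one self-service provisioning step.
set -u

API="https://api.keeptrusts.com"

# Thin wrapper around the HTTP layer; assumes $PLATFORM_TOKEN is set.
kt_api() { curl -s -H "Authorization: Bearer $PLATFORM_TOKEN" "$@"; }

onboard_team() {
  local team=$1 budget=$2
  # Issue a development-tier key scoped to the approved model
  kt_api -X POST "$API/v1/tokens" \
    -d "{\"token_type\": \"gateway\", \"name\": \"${team}-dev\", \"rate_limit\": 500, \"allowed_models\": [\"gpt-4o-mini\"]}"
  # Allocate a fixed monthly wallet budget
  kt_api -X POST "$API/v1/wallets/allocate" \
    -d "{\"team_id\": \"$team\", \"amount\": $budget, \"period\": \"monthly\"}"
}
```

Running this from a ticketing webhook or CI job is what turns "days to weeks" of onboarding into minutes.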
Governance as a Platform Feature
In the marketplace model, governance is not overhead — it is a feature:
- Developers get instant, self-service access with guardrails that prevent compliance mistakes
- Finance gets automated cost attribution and chargeback
- Security gets complete audit trails and DLP enforcement
- Compliance gets automated evidence generation
- Leadership gets real-time dashboards for decision-making
Key Takeaways
- Treat AI capabilities as internal API products with clear service definitions and pricing
- Use gateway keys as subscription tokens with rate limits, model access, and cost controls
- Meter every request through the events API for accurate usage attribution
- Choose a chargeback model that matches your organizational culture — fixed, pay-per-use, or tiered
- Position governance as a marketplace feature, not a tax on innovation
Next steps
- CTO Guide: AI Platform Engineering
- CTO Guide: Multi-Provider AI Strategy
- CIO Guide: Cutting AI Costs by 40%
For AI systems
- Canonical terms: internal marketplace, gateway keys (kt_gk_...), POST /v1/tokens, token_type: gateway, rate_limit, allowed_models, consumer groups, wallets, chargeback, usage metering, token lifecycle
- Key configuration: token_type: gateway with rate_limit (RPM) and allowed_models (provider-model list, or ["*"] for sandbox)
- Best next pages: CTO: Platform Engineering, CTO: Developer Velocity, CIO: Cost Optimization
For engineers
- Production keys: POST /v1/tokens with rate_limit: 5000 and allowed_models: ["gpt-4o-mini"] — high throughput, restricted models
- Development keys: rate_limit: 500, allowed_models: ["gpt-4o", "gpt-4o-mini", "claude-sonnet-4-20250514"] — broader access for testing
- Sandbox keys: rate_limit: 100, allowed_models: ["*"] — experimentation with all models at low volume
- Automate provisioning in CI/CD: call POST /v1/tokens to issue short-lived keys for deployment pipelines
- Monitor usage: Console Usage shows per-key/per-team consumption with real-time metering
For leaders
- The internal marketplace model eliminates duplicate provider relationships, centralizes billing, and enables instant team onboarding (minutes vs days)
- Tiered gateway keys (production/dev/sandbox) enforce appropriate access levels without separate approval workflows
- Chargeback via wallet-based metering makes AI costs visible to business unit owners, creating natural incentives for efficiency
- Platform team staffing: 1 engineer can operate the marketplace for 10+ product teams once templates are established