
CTO Guide: Building an Internal AI API Economy

Treating AI capabilities as internal API products transforms how organizations consume, govern, and fund LLM usage. Instead of every team managing their own provider relationships, the platform team operates an internal AI marketplace — with the Keeptrusts gateway as the API layer and wallets as the billing system.

Use this page when

  • You are building an internal AI marketplace where teams consume LLM capabilities as API products
  • You need token management strategies (production, development, sandbox) with different rate limits and model access
  • You want to implement usage metering and chargeback models for AI consumption across business units
  • You are designing a self-service token lifecycle (provision, use, monitor, rotate, revoke)

This guide covers the marketplace architecture, token management, metering, and chargeback models for building an internal AI economy.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

The Internal Marketplace Model

┌──────────────────────────────────────────────────────┐
│                Internal AI Marketplace               │
│                                                      │
│  ┌─────────────┐  ┌─────────────┐  ┌──────────────┐  │
│  │   GPT-4o    │  │   Claude    │  │ Gemini Flash │  │
│  │   Premium   │  │ Enterprise  │  │   Economy    │  │
│  │  $0.01/req  │  │ $0.015/req  │  │  $0.001/req  │  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬───────┘  │
│         │                │                │          │
│         └────────────────┼────────────────┘          │
│                          │                           │
│                 Keeptrusts Gateway                   │
│            (Routing, Policy, Metering)               │
│                          │                           │
│    ┌─────────┬───────────┼───────────┬─────────┐     │
│    │         │           │           │         │     │
│  Team A   Team B      Team C      Team D    Team E   │
│  $5K/mo   $8K/mo      $3K/mo     $12K/mo   $2K/mo    │
└──────────────────────────────────────────────────────┘

Marketplace Benefits

| Benefit | Traditional Approach | Marketplace Approach |
|---------|----------------------|----------------------|
| Provider management | Every team manages keys | Platform team manages centrally |
| Cost attribution | Manual reconciliation | Automatic per-request metering |
| Governance | Per-team implementation | Centralized, consistent |
| Onboarding | Days to weeks | Minutes (gateway key) |
| Budget control | None or quarterly | Real-time with caps |

Token Management for Teams

Gateway keys are the access tokens for the internal marketplace. Each key represents a "subscription" to AI capabilities.

Token Issuance Strategy

# Production service key — high rate limit, restricted models
curl -X POST https://api.keeptrusts.com/v1/tokens \
  -H "Authorization: Bearer $PLATFORM_TOKEN" \
  -d '{
    "token_type": "gateway",
    "name": "search-service-prod",
    "description": "Production search service",
    "rate_limit": 5000,
    "allowed_models": ["gpt-4o-mini"]
  }'

# Development key — lower rate limit, broader model access
curl -X POST https://api.keeptrusts.com/v1/tokens \
  -H "Authorization: Bearer $PLATFORM_TOKEN" \
  -d '{
    "token_type": "gateway",
    "name": "search-team-dev",
    "description": "Search team development and testing",
    "rate_limit": 500,
    "allowed_models": ["gpt-4o", "gpt-4o-mini", "claude-sonnet-4-20250514"]
  }'

# Sandbox key — minimal rate limit, all models for experimentation
curl -X POST https://api.keeptrusts.com/v1/tokens \
  -H "Authorization: Bearer $PLATFORM_TOKEN" \
  -d '{
    "token_type": "gateway",
    "name": "innovation-sandbox",
    "description": "Innovation team sandbox",
    "rate_limit": 100,
    "allowed_models": ["*"]
  }'

Token Lifecycle

| Stage | Trigger | API Action |
|-------|---------|------------|
| Provisioning | Team request or self-service | POST /v1/tokens |
| Activation | First use | Automatic |
| Monitoring | Ongoing | GET /v1/events?gateway_key=... |
| Rotation | Policy (every 90 days) | POST /v1/tokens + revoke old |
| Suspension | Budget exhaustion or policy violation | PATCH /v1/tokens/{id} |
| Revocation | Project end or security incident | DELETE /v1/tokens/{id} |
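The rotation row above can be scripted. A minimal sketch using the endpoints from the lifecycle table; the token id format and the replacement key's scope are illustrative, and with DRY_RUN=1 the function prints the calls instead of executing them so the flow can be reviewed first:

```shell
#!/bin/bash
# Sketch of the 90-day rotation step: issue a replacement gateway key,
# then revoke the old one once consumers have switched over.
API="https://api.keeptrusts.com"

run() {
  # With DRY_RUN=1, print the command instead of executing it
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

rotate_key() {
  local old_id="$1" name="$2"
  # 1. Issue the replacement key with the same scope as the old one
  run curl -X POST "$API/v1/tokens" \
    -H "Authorization: Bearer $PLATFORM_TOKEN" \
    -d "{\"token_type\": \"gateway\", \"name\": \"$name\", \"rate_limit\": 5000, \"allowed_models\": [\"gpt-4o-mini\"]}"
  # 2. After the consuming service picks up the new key, revoke the old one
  run curl -X DELETE "$API/v1/tokens/$old_id" \
    -H "Authorization: Bearer $PLATFORM_TOKEN"
}

# Review the calls before running them for real
DRY_RUN=1 rotate_key "tok_old_abc123" "search-service-prod"
```

Running the issuance before the revocation leaves a window where both keys work, which is what lets consumers switch over without downtime.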

Screenshot reference: Console Settings → Gateway Keys showing the full key inventory with status indicators, rate limits, last active timestamps, and associated teams.

Rate Limiting per Consumer Group

Rate limiting ensures fair access to shared AI infrastructure and prevents runaway automation from exhausting capacity.

Rate Limit Configuration

# policy-config.yaml — rate limits per consumer group
consumer_groups:
  - name: production-services
    rate_limit:
      requests_per_minute: 5000
      tokens_per_minute: 500000
      concurrent_requests: 100

  - name: development
    rate_limit:
      requests_per_minute: 500
      tokens_per_minute: 50000
      concurrent_requests: 20

  - name: sandbox
    rate_limit:
      requests_per_minute: 50
      tokens_per_minute: 10000
      concurrent_requests: 5
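Conceptually, enforcing requests_per_minute means counting requests per consumer group within the current minute and rejecting anything over the limit. A fixed-window sketch of that check (illustrative only; the gateway's real enforcement is server-side and may use a sliding window or token bucket instead):

```shell
#!/bin/bash
# Illustrative fixed-window limiter: allow a request only while the
# per-minute counter for a consumer group is under its configured limit.
declare -A WINDOW_COUNT   # requests seen in the current window, per group
declare -A WINDOW_MINUTE  # which epoch minute the counter belongs to

allow_request() {
  # allow_request <group> <limit> [epoch_minute]
  # The optional third argument fixes the clock, which makes testing deterministic.
  local group="$1" limit="$2" now_min="${3:-$(( $(date +%s) / 60 ))}"
  if [ "${WINDOW_MINUTE[$group]:--1}" -ne "$now_min" ]; then
    WINDOW_MINUTE[$group]=$now_min   # new minute: reset the counter
    WINDOW_COUNT[$group]=0
  fi
  if [ "${WINDOW_COUNT[$group]}" -lt "$limit" ]; then
    WINDOW_COUNT[$group]=$(( WINDOW_COUNT[$group] + 1 ))
    return 0                          # allowed
  fi
  return 1                            # over limit; the gateway returns HTTP 429
}

# Sandbox group (50 RPM): the 51st request in the same minute is rejected
for i in $(seq 1 51); do
  allow_request sandbox 50 100 || echo "request $i rejected"
done
```

A fixed window is the simplest policy but allows bursts of up to 2x the limit across a window boundary, which is why production limiters usually prefer sliding windows or token buckets.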

Rate Limit Monitoring

# Current rate limit utilization by consumer group
curl "https://api.keeptrusts.com/v1/events?since=1h&aggregate=rate_by_consumer_group" \
  -H "Authorization: Bearer $TOKEN"

Rate Limit Tiers

| Tier | RPM | TPM | Concurrent | Monthly Cost |
|------|-----|-----|------------|--------------|
| Free (sandbox) | 50 | 10K | 5 | $0 (pool budget) |
| Standard (dev) | 500 | 50K | 20 | Team wallet allocation |
| Professional (prod) | 5,000 | 500K | 100 | Team wallet allocation |
| Enterprise (critical) | 50,000 | 5M | 500 | Dedicated budget |

Usage Metering via Events API

Every gateway request generates a metered event with complete cost attribution.

Event Structure for Metering

{
  "id": "evt_abc123",
  "timestamp": "2026-04-23T14:30:00Z",
  "gateway_key": "kt_gk_search_prod",
  "consumer_group": "production-services",
  "team_id": "search-team",
  "user_id": "usr_jane_doe",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "tokens_input": 245,
  "tokens_output": 512,
  "cost_usd": 0.00045,
  "latency_ms": 890,
  "policy_actions": ["pii_redacted"],
  "cache_hit": false
}
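Exported events can also be aggregated offline for attribution checks. A portable awk sketch over a small sample export (one JSON object per line is an assumed export shape, and the sample events are illustrative; jq is the more robust tool when it is available):

```shell
# Sample export: one metered event per line, with fields trimmed from the
# event structure above to those needed for attribution. The values here
# are illustrative sample data.
cat > events.jsonl <<'EOF'
{"team_id": "search-team", "model": "gpt-4o-mini", "cost_usd": 0.00045}
{"team_id": "search-team", "model": "gpt-4o-mini", "cost_usd": 0.00038}
{"team_id": "chat-team", "model": "gpt-4o", "cost_usd": 0.01200}
EOF

# Sum cost_usd per team using plain awk (no jq dependency)
awk '
  match($0, /"team_id": *"[^"]*"/) {
    team = substr($0, RSTART, RLENGTH)
    sub(/^"team_id": *"/, "", team); sub(/"$/, "", team)
  }
  match($0, /"cost_usd": *[0-9.]+/) {
    cost = substr($0, RSTART, RLENGTH)
    sub(/^"cost_usd": */, "", cost)
    total[team] += cost
  }
  END { for (t in total) printf "%s %.5f\n", t, total[t] }
' events.jsonl | sort
```

With this sample the output is two lines: `chat-team 0.01200` and `search-team 0.00083`.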

Usage Analytics Queries

# Monthly usage summary by team
curl "https://api.keeptrusts.com/v1/events?since=30d&aggregate=usage_by_team" \
  -H "Authorization: Bearer $TOKEN" | jq '.teams[] | {
    team: .name,
    requests: .count,
    tokens: .total_tokens,
    cost: .total_cost
  }'

# Usage trends (daily granularity for forecasting)
curl "https://api.keeptrusts.com/v1/events?since=90d&aggregate=usage_daily" \
  -H "Authorization: Bearer $TOKEN"

# Model usage distribution
curl "https://api.keeptrusts.com/v1/events?since=30d&aggregate=by_model" \
  -H "Authorization: Bearer $TOKEN"

Screenshot reference: Console Usage showing per-team usage metering with request counts, token volumes, and cost breakdowns in a sortable table.

Chargeback Models Using Wallets

Wallets enable three common chargeback models for internal AI consumption.

Model 1: Fixed Monthly Allocation

Each team receives a fixed monthly budget. Simple to administer, suitable for predictable workloads.

# Allocate fixed monthly budget
curl -X POST https://api.keeptrusts.com/v1/wallets/allocate \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"team_id": "search-team", "amount": 5000.00, "period": "monthly"}'
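Fixed-allocation enforcement boils down to a simple rule: debit each request's cost from the team balance and refuse requests once the budget is exhausted. A local sketch of that rule (the real accounting happens gateway-side; the `charge` helper and demo numbers are hypothetical):

```shell
#!/bin/bash
# Sketch of fixed-allocation enforcement: debit each request's cost from
# the team balance and refuse once the budget is exhausted. Illustrative
# only; the gateway performs this accounting server-side.
BALANCE="1.00"   # tiny demo allocation in USD (real teams get e.g. 5000.00)

charge() {
  # charge <cost_usd>: debit and return 0, or return 1 if over budget.
  # awk handles the decimal arithmetic that bash integers cannot.
  local cost="$1" ok
  ok=$(awk -v b="$BALANCE" -v c="$cost" 'BEGIN { if (b >= c) print 1; else print 0 }')
  if [ "$ok" -eq 1 ]; then
    BALANCE=$(awk -v b="$BALANCE" -v c="$cost" 'BEGIN { printf "%.2f", b - c }')
    return 0
  fi
  return 1   # wallet exhausted; the gateway would suspend the key
}

charge 0.60 && echo "charged, balance now $BALANCE"   # → charged, balance now 0.40
charge 0.60 || echo "denied, balance still $BALANCE"  # → denied, balance still 0.40
```

Note that a denied request leaves the balance untouched, which is what makes suspension-on-exhaustion safe to retry after a top-up.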

Model 2: Pay-Per-Use Chargeback

Teams are billed based on actual usage. The wallet tracks consumption and finance reconciles monthly.

# Query team consumption for billing
curl "https://api.keeptrusts.com/v1/events?since=30d&team_id=search-team&aggregate=cost_total" \
  -H "Authorization: Bearer $TOKEN"

Model 3: Tiered Pricing

Teams pay different rates based on their consumption tier, incentivizing efficient usage.

| Tier | Monthly Usage | Effective Rate | Incentive |
|------|---------------|----------------|-----------|
| Base | $0–$1,000 | 1.0x (actual cost) | Standard access |
| Growth | $1,001–$5,000 | 0.9x (10% discount) | Rewards adoption |
| Scale | $5,001–$20,000 | 0.8x (20% discount) | Rewards scale |
| Enterprise | $20,001+ | 0.7x (30% discount) | Rewards commitment |
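The tier lookup is a simple rate table. A sketch, assuming the multiplier applies to the whole monthly amount rather than marginally per band (the table does not specify which; confirm with finance before wiring this into chargeback):

```shell
# Effective charge after the tier discount. Assumes the multiplier applies
# to the full monthly amount, not marginally per band.
charged_amount() {
  # charged_amount <monthly_usage_usd>
  awk -v u="$1" 'BEGIN {
    if      (u <= 1000)  rate = 1.0   # Base
    else if (u <= 5000)  rate = 0.9   # Growth
    else if (u <= 20000) rate = 0.8   # Scale
    else                 rate = 0.7   # Enterprise
    printf "%.2f\n", u * rate
  }'
}

charged_amount 800     # → 800.00  (Base: actual cost)
charged_amount 6000    # → 4800.00 (Scale: 20% discount)
```

Whole-amount pricing creates a cliff at each boundary (a team at $5,100 pays less than one at $4,900), which is the usual argument for switching to marginal, bracket-style pricing.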

Chargeback Report Generation

#!/bin/bash
# monthly-chargeback.sh — Generate chargeback report for the previous month

API="https://api.keeptrusts.com"
TOKEN="$API_TOKEN"
# BSD/macOS date syntax; on GNU coreutils use: date -d "1 month ago" +%Y-%m
MONTH=$(date -v-1m +%Y-%m)

echo "# AI Chargeback Report: $MONTH"
echo ""
echo "| Team | Requests | Tokens | Cost | Tier |"
echo "|------|----------|--------|------|------|"

# Fetch per-team usage and render one table row per team
curl -s "$API/v1/events?since=${MONTH}-01&until=$(date +%Y-%m-01)&aggregate=cost_by_team" \
  -H "Authorization: Bearer $TOKEN" \
  | jq -r '.teams[] | "| \(.name) | \(.count) | \(.total_tokens) | $\(.total_cost) | \(.tier) |"'

Building the Marketplace Catalog

Document your internal AI offerings as a service catalog.

Service Definition

| Service | Model | SLA | Rate Limit | Cost |
|---------|-------|-----|------------|------|
| AI Completions — Standard | GPT-4o-mini | 99.5% | 500 RPM | $0.0006/req |
| AI Completions — Premium | GPT-4o | 99.5% | 500 RPM | $0.012/req |
| AI Analysis — Long Context | Claude Sonnet | 99.0% | 200 RPM | $0.018/req |
| AI Classification — Economy | Gemini Flash | 99.5% | 2000 RPM | $0.0003/req |
| AI Chat — Governed | Multi-model | 99.0% | 100 RPM | Per wallet |

Onboarding Flow

Team requests AI access
→ Platform team reviews request
→ Issue gateway key (scoped to approved services)
→ Allocate wallet (fixed or pay-per-use)
→ Developer starts using AI via OpenAI-compatible SDK
→ Usage metered automatically
→ Monthly chargeback reconciliation
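The key-issuance and wallet-allocation steps of the flow above can be combined into one provisioning script. A dry-run sketch using the two API calls shown earlier on this page; response handling is omitted because the payload shapes are not documented here, and the key scope and budget are example values:

```shell
#!/bin/bash
# onboard-team.sh: issue a scoped gateway key, then allocate the team
# wallet. With DRY_RUN=1 it prints the calls instead of executing them,
# so request bodies can be reviewed before provisioning for real.
API="https://api.keeptrusts.com"

run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

onboard_team() {
  local team="$1" budget="$2"
  # Gateway key scoped to the approved service tier
  run curl -X POST "$API/v1/tokens" \
    -H "Authorization: Bearer $PLATFORM_TOKEN" \
    -d "{\"token_type\": \"gateway\", \"name\": \"$team-prod\", \"rate_limit\": 500, \"allowed_models\": [\"gpt-4o-mini\"]}"
  # Fixed monthly wallet allocation for the team
  run curl -X POST "$API/v1/wallets/allocate" \
    -H "Authorization: Bearer $PLATFORM_TOKEN" \
    -d "{\"team_id\": \"$team\", \"amount\": $budget, \"period\": \"monthly\"}"
}

DRY_RUN=1 onboard_team "search-team" "5000.00"
```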

Governance as a Platform Feature

In the marketplace model, governance is not overhead — it is a feature:

  • Developers get instant, self-service access with guardrails that prevent compliance mistakes
  • Finance gets automated cost attribution and chargeback
  • Security gets complete audit trails and DLP enforcement
  • Compliance gets automated evidence generation
  • Leadership gets real-time dashboards for decision-making

Key Takeaways

  1. Treat AI capabilities as internal API products with clear service definitions and pricing
  2. Use gateway keys as subscription tokens with rate limits, model access, and cost controls
  3. Meter every request through the events API for accurate usage attribution
  4. Choose a chargeback model that matches your organizational culture — fixed, pay-per-use, or tiered
  5. Position governance as a marketplace feature, not a tax on innovation

Next steps

For AI systems

  • Canonical terms: internal marketplace, gateway keys (kt_gk_...), POST /v1/tokens, token_type: gateway, rate_limit, allowed_models, consumer groups, wallets, chargeback, usage metering, token lifecycle
  • Key configuration: token_type: gateway with rate_limit (RPM), allowed_models (provider-model list or ["*"] for sandbox)
  • Best next pages: CTO: Platform Engineering, CTO: Developer Velocity, CIO: Cost Optimization

For engineers

  • Production keys: POST /v1/tokens with rate_limit: 5000, allowed_models: ["gpt-4o-mini"] — high throughput, restricted models
  • Development keys: rate_limit: 500, allowed_models: ["gpt-4o", "gpt-4o-mini", "claude-sonnet-4-20250514"] — broader access for testing
  • Sandbox keys: rate_limit: 100, allowed_models: ["*"] — experimentation with all models at low volume
  • Automate provisioning in CI/CD: call POST /v1/tokens to issue short-lived keys for deployment pipelines
  • Monitor usage: Console Usage shows per-key/per-team consumption with real-time metering

For leaders

  • The internal marketplace model eliminates duplicate provider relationships, centralizes billing, and enables instant team onboarding (minutes vs days)
  • Tiered gateway keys (production/dev/sandbox) enforce appropriate access levels without separate approval workflows
  • Chargeback via wallet-based metering makes AI costs visible to business unit owners, creating natural incentives for efficiency
  • Platform team staffing: 1 engineer can operate the marketplace for 10+ product teams once templates are established