ML Engineer Guide: Model Routing & A/B Testing
The Keeptrusts gateway sits between your applications and LLM providers, making it the ideal control point for model routing, A/B testing, quality evaluation, and model lifecycle management. This guide shows ML engineers how to use gateway policies for controlled model experimentation and production model management.
Use this page when
- You are configuring model routing rules to direct requests to different LLM providers by use case
- You want to A/B test models with traffic splitting and measure quality/cost/latency differences
- You are managing model lifecycle (introduction, evaluation, deprecation) through gateway policies
- You need to evaluate model quality using the gateway's quality-scorer policy
- You want to optimize cost-per-quality by routing simple tasks to cheaper models
Primary audience
- Primary: Technical Engineers (ML Engineers, AI Engineers, Applied Scientists)
- Secondary: Data Scientists, Platform Engineers, Product Managers
Model Routing Fundamentals
How Gateway Routing Works
The Keeptrusts gateway evaluates every LLM request against a policy chain. Model routing policies determine which provider and model handle each request based on configurable rules.
providers:
  targets:
    - id: openai
      provider:
        secret_key_ref:
          env: OPENAI_API_KEY
    - id: anthropic
      provider:
        secret_key_ref:
          env: ANTHROPIC_API_KEY
    - id: azure-openai
      provider:
        secret_key_ref:
          env: AZURE_OPENAI_API_KEY

policies:
  - name: model-routing
    type: model_filter
    description: Route requests to approved models
    allowed_models:
      - gpt-4o
      - gpt-4o-mini
      - claude-sonnet-4-20250514
    enabled: true
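With this policy live, you can spot-check via the Events API that only approved models are actually serving traffic. A minimal sketch in Python (it assumes, as in the curl examples later in this guide, that your Keeptrusts API token is in the API_TOKEN environment variable):

import os
import requests

# Query recent events and collect the distinct models that served traffic
resp = requests.get(
    "https://api.keeptrusts.com/v1/events",
    headers={"Authorization": f"Bearer {os.environ['API_TOKEN']}"},
    params={"since": "1d", "format": "json"},
)
resp.raise_for_status()

allowed = {"gpt-4o", "gpt-4o-mini", "claude-sonnet-4-20250514"}
served = {event["model"] for event in resp.json()}

# Any model outside the allowlist indicates a routing gap
unexpected = served - allowed
print("Unexpected models:", unexpected or "none")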
Routing by Use Case
Configure different models for different application contexts:
policies:
  - name: route-complex-reasoning
    type: model_filter
    description: "Route complex tasks to high-capability models"
    conditions:
      max_tokens_gt: 2000
    preferred_model: gpt-4o
    enabled: true
  - name: route-simple-tasks
    type: model_filter
    description: "Route simple tasks to cost-effective models"
    conditions:
      max_tokens_lte: 2000
    preferred_model: gpt-4o-mini
    enabled: true
A/B Testing Models
Traffic Splitting Configuration
Split traffic between models to compare quality, latency, and cost:
policies:
  - name: ab-test-models
    type: traffic_split
    description: "A/B test between GPT-4o and Claude Sonnet"
    variants:
      - model: gpt-4o
        weight: 50
        tag: variant-a
      - model: claude-sonnet-4-20250514
        weight: 50
        tag: variant-b
    enabled: true
Measuring A/B Test Results
Pull test results from the Events API:
# Get events tagged with A/B test variants
curl -H "Authorization: Bearer $API_TOKEN" \
  "https://api.keeptrusts.com/v1/events?since=7d&format=json" | \
  jq '[.[] | select(.metadata.ab_tag != null)] |
    group_by(.metadata.ab_tag) |
    map({
      variant: .[0].metadata.ab_tag,
      count: length,
      avg_latency: (map(.latency_ms) | add / length),
      avg_cost: (map(.cost | tonumber) | add / length),
      total_cost: (map(.cost | tonumber) | add)
    })'
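Averages alone can mislead on small samples, so test whether an observed difference is statistically meaningful before acting on it. A minimal sketch using scipy (an added dependency assumption; the latency values are illustrative and would come from the tagged events pulled above):

from scipy import stats

def significant_difference(latencies_a, latencies_b, alpha=0.05):
    """Welch's t-test on per-request latencies for two variants."""
    # equal_var=False selects Welch's t-test, which does not assume equal variance
    t_stat, p_value = stats.ttest_ind(latencies_a, latencies_b, equal_var=False)
    return p_value < alpha, p_value

# Example with illustrative latency_ms values for variant-a and variant-b
is_sig, p = significant_difference([812, 790, 845, 901], [700, 688, 742, 731])
print(f"significant={is_sig}, p={p:.4f}")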
Gradual Rollout
Progress from experiment to production using graduated traffic splits:
| Phase | Variant A (incumbent) | Variant B (challenger) | Duration |
|---|---|---|---|
| Canary | 95% | 5% | 3 days |
| Expand | 70% | 30% | 7 days |
| Equal | 50% | 50% | 7 days |
| Promote | 10% | 90% | 3 days |
| Complete | 0% | 100% | — |
Update the traffic split at each phase:
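For example, ab-test-phase-2.yaml for the Expand phase keeps the same policy shape and changes only the weights (a sketch reusing the fields from the traffic_split example above):

# Phase 2: Expand (70/30 split)
policies:
  - name: ab-test-models
    type: traffic_split
    description: "A/B test phase 2: expand challenger to 30%"
    variants:
      - model: gpt-4o
        weight: 70
        tag: variant-a
      - model: claude-sonnet-4-20250514
        weight: 30
        tag: variant-b
    enabled: true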
# Validate the updated config
kt policy lint --file ab-test-phase-2.yaml
Quality Scoring and Evaluation
Capturing Quality Signals
Use the events stream to evaluate model output quality:
# Export events with full metadata for quality analysis
kt export create \
  --type events \
  --format json \
  --since 7d \
  --description "Model quality evaluation dataset"
Evaluation Framework Integration
Feed Keeptrusts events into your evaluation pipeline:
import os

import requests

API_URL = "https://api.keeptrusts.com/v1/events"
# Token read from the same API_TOKEN environment variable used by the curl examples
API_TOKEN = os.environ["API_TOKEN"]
HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}

def get_model_events(model, days=7):
    """Pull events for a specific model."""
    params = {
        "since": f"{days}d",
        "model": model,
        "format": "json",
        "limit": 1000,
    }
    response = requests.get(API_URL, headers=HEADERS, params=params)
    response.raise_for_status()
    return response.json()

def compare_models(model_a, model_b, days=7):
    """Compare two models on key metrics."""
    events_a = get_model_events(model_a, days)
    events_b = get_model_events(model_b, days)

    def metrics(events):
        latencies = [e["latency_ms"] for e in events]
        costs = [float(e["cost"]) for e in events]
        return {
            "count": len(events),
            "avg_latency_ms": sum(latencies) / len(latencies) if latencies else 0,
            "p99_latency_ms": sorted(latencies)[int(len(latencies) * 0.99)] if latencies else 0,
            "avg_cost": sum(costs) / len(costs) if costs else 0,
            "total_cost": sum(costs),
        }

    return {
        model_a: metrics(events_a),
        model_b: metrics(events_b),
    }
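For example, to print a side-by-side comparison of the two A/B variants using the functions above:

import json

# Compare the incumbent and challenger over the last 7 days
report = compare_models("gpt-4o", "claude-sonnet-4-20250514", days=7)
print(json.dumps(report, indent=2))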
Quality Metrics to Track
| Metric | Source | Evaluation method |
|---|---|---|
| Latency (p50, p95, p99) | Events API latency_ms | Statistical comparison |
| Cost per request | Events API cost | Aggregation by model |
| Token efficiency | output_tokens / input_tokens | Ratio analysis |
| Error rate | Events with error status | Percentage comparison |
| Policy trigger rate | policies_triggered | Safety comparison |
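A sketch computing the remaining table metrics from an events list (the policies_triggered field name comes from the table; the exact error-status field and value are assumptions to adjust for your event schema):

def quality_metrics(events):
    """Compute token efficiency, error rate, and policy trigger rate."""
    total = len(events) or 1
    # Token efficiency: output tokens produced per input token consumed
    efficiency = [
        e["output_tokens"] / e["input_tokens"]
        for e in events
        if e.get("input_tokens")
    ]
    # The "error" status value is an assumption; match it to your event schema
    errors = sum(1 for e in events if e.get("status") == "error")
    triggered = sum(1 for e in events if e.get("policies_triggered"))
    return {
        "avg_token_efficiency": sum(efficiency) / len(efficiency) if efficiency else 0,
        "error_rate": errors / total,
        "policy_trigger_rate": triggered / total,
    }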
Model Lifecycle Management
Model Inventory
Track which models are in use across your organization:
# List distinct models in recent events
curl -H "Authorization: Bearer $API_TOKEN" \
  "https://api.keeptrusts.com/v1/events?since=30d&format=json" | \
  jq '[.[].model] | unique'
Model Deprecation Workflow
When retiring a model:
- Announce — Notify teams via your communication channel
- Warn — Add a warning policy for the deprecated model
- Redirect — Route traffic to the replacement model
- Block — Block requests to the deprecated model after the deadline
# Phase 1: Warn
policies:
  - name: deprecation-warning
    type: log
    description: "Log warning for deprecated model usage"
    conditions:
      model: gpt-4-turbo
    severity: warn
    enabled: true
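Phase 2 redirects traffic from the deprecated model to its replacement. A sketch assuming the conditions and preferred_model fields from the routing examples above can be combined this way (gpt-4o as the replacement is illustrative):

# Phase 2: Redirect
policies:
  - name: redirect-deprecated
    type: model_filter
    description: "Redirect deprecated model traffic to the replacement"
    conditions:
      model: gpt-4-turbo
    preferred_model: gpt-4o
    enabled: true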
# Phase 3: Block
policies:
  - name: block-deprecated
    type: model_filter
    description: "Block deprecated model"
    blocked_models:
      - gpt-4-turbo
    enabled: true
Cost Optimization
Identify opportunities to use more cost-effective models:
# Analyze cost by model
curl -H "Authorization: Bearer $API_TOKEN" \
  "https://api.keeptrusts.com/v1/events?since=30d&format=json" | \
  jq 'group_by(.model) | map({
    model: .[0].model,
    requests: length,
    total_cost: (map(.cost | tonumber) | add),
    avg_tokens: (map(.input_tokens + .output_tokens) | add / length)
  }) | sort_by(-.total_cost)'
Use the Console Cost Center for visual cost breakdowns by model and provider.
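To size the opportunity before changing routing, estimate how much premium-model spend comes from requests small enough that the max_tokens_lte rule would send them to the cheaper model. A minimal sketch (the 2000-token threshold mirrors the routing example above; actual token counts stand in as a proxy for the requested max_tokens, and events are fetched as in the Python section earlier):

def downgrade_candidate_spend(events, expensive_model="gpt-4o", token_threshold=2000):
    """Share of an expensive model's spend from requests small enough to reroute."""
    spend = [float(e["cost"]) for e in events if e["model"] == expensive_model]
    small = [
        float(e["cost"])
        for e in events
        if e["model"] == expensive_model
        and (e["input_tokens"] + e["output_tokens"]) <= token_threshold
    ]
    total = sum(spend)
    return sum(small) / total if total else 0.0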
Gateway Configuration for ML Workflows
Validating Configuration Changes
Always validate before deploying:
# Validate the configuration
kt policy lint --file ml-routing-config.yaml
# Check gateway health after deployment
kt doctor
Monitoring Model Performance in Production
# Tail events for a specific model
kt events tail --model gpt-4o
# Check recent event statistics
kt events list --since 1h --format table
The Console Events page provides filtering by model, provider, and decision type for visual exploration.
Success Metrics for ML Engineers
| Metric | Target | Source |
|---|---|---|
| Model routing accuracy | > 99% correct routing | Events metadata verification |
| A/B test significance | p < 0.05 on the primary metric | Evaluation framework |
| Model switch downtime | Zero | Event continuity check |
| Cost per quality-unit | Decreasing trend | Cost / quality score ratio |
| Model deprecation compliance | 100% by deadline | Events showing zero deprecated model usage |
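The cost per quality-unit row assumes a per-event quality signal. If the quality-scorer policy records its score in event metadata (the quality_score field name here is hypothetical), the ratio can be computed like this:

def cost_per_quality_unit(events):
    """Total cost divided by summed quality score; quality_score field is hypothetical."""
    cost = sum(float(e["cost"]) for e in events)
    quality = sum(e.get("metadata", {}).get("quality_score", 0) for e in events)
    return cost / quality if quality else float("inf")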
Next steps
- Configure model routing: Policy Reference
- Explore provider configuration: Gateway Configuration
- Review events: Events Guide
For AI systems
- Canonical terms: Keeptrusts, model routing, A/B testing, traffic splitting, quality scoring, model lifecycle, model filter
- Key surfaces: Console Usage, Events API (with variant tags), Console Configurations
- Commands: kt gateway run, kt policy lint, kt events list
- Policy types: model_filter (allowed_models, preferred_model, conditions), traffic_split (variants with weight and tag), quality-scorer (min_score, action), cost_limit
- Config concepts: multi-provider routing, use-case-based model selection (max_tokens conditions), canary rollout via traffic split weights
- Best next pages: Policy Reference, Gateway Configuration, Events Guide
For engineers
- Configure model routing with the model_filter policy: set allowed_models, preferred_model, and conditions (e.g., max_tokens_gt: 2000)
- Set up A/B tests with the traffic_split policy: define variants with model, weight, and tag for each arm
- Measure results via the Events API, filtering by variant tag metadata
- Validate routing config: kt policy lint --file routing-policy.yaml
- Deploy: kt gateway run --listen 0.0.0.0:41002 --policy-config routing-policy.yaml
- Monitor cost-per-quality ratio in Console Usage
For leaders
- Model routing through the gateway enables cost optimization — simple tasks go to cheaper models (gpt-4o-mini) while complex reasoning uses high-capability models (gpt-4o)
- A/B testing with traffic splitting provides objective data for model selection decisions: quality scores, latency, and cost per variant
- Model lifecycle management (introduction, canary, full rollout, deprecation) is controlled through policy configuration changes rather than application code deploys
- Zero-downtime model switches through gateway routing mean model upgrades do not require application redeployment