Tutorial: Traffic Mirroring for Shadow Testing
This tutorial shows you how to configure traffic mirroring in the Keeptrusts gateway to send live requests to both a primary and a shadow provider, compare responses for quality, and make data-driven provider migration decisions — without affecting end users.
Use this page when
- You want to shadow-test a new LLM provider without affecting end users.
- You are configuring traffic mirroring to compare primary and shadow provider responses.
- You need to measure latency, token count, and content similarity between two providers.
- You are making a data-driven provider migration decision.
Primary audience
- Primary: ML engineers and platform teams evaluating provider migrations with zero user risk
- Secondary: Product managers comparing model quality; finance teams assessing cost of provider switch
Prerequisites
- kt CLI installed (see the first-run tutorial)
- Two LLM provider API keys (e.g., OPENAI_API_KEY and ANTHROPIC_API_KEY)
- curl and jq installed
How Traffic Mirroring Works
Traffic mirroring duplicates incoming requests to a shadow provider asynchronously. The primary provider's response is returned to the caller immediately. The shadow provider's response is logged for comparison but never shown to the user.
Request ──┬──▶ Primary Provider (openai) ──▶ Response to Caller
│
└──▶ Shadow Provider (anthropic) ──▶ Logged for comparison only
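The flow above can be sketched in a few lines of Python. This is an illustrative fire-and-forget model, not the gateway's actual implementation; `call_provider` is a hypothetical stand-in for a real provider API call:

```python
import threading

SHADOW_LOG = []  # shadow responses are logged for comparison, never returned


def call_provider(provider, request):
    # Hypothetical stand-in for a real provider API call.
    return {"provider": provider, "text": f"reply to {request['prompt']}"}


def handle_request(request, primary="openai", shadow="anthropic"):
    # Fire-and-forget: duplicate the request to the shadow on a background thread.
    t = threading.Thread(
        target=lambda: SHADOW_LOG.append(call_provider(shadow, request)),
        daemon=True,
    )
    t.start()
    t.join()  # joined here only to make the example deterministic
    # The caller sees only the primary response.
    return call_provider(primary, request)
```

In a real gateway the shadow call would not be joined, so the primary path returns without waiting on the shadow provider at all.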
Step 1: Create the Mirror Configuration
Create policy-config.yaml with traffic mirroring enabled:
version: '1'
providers:
  targets:
    - id: openai
      provider: openai
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic
      provider: anthropic
      secret_key_ref:
        env: ANTHROPIC_API_KEY
traffic_mirror:
  enabled: true
  primary: openai
  shadow: anthropic
  model_mapping:
    gpt-4o-mini: claude-sonnet-4-20250514
    gpt-4o: claude-sonnet-4-20250514
  sample_rate: 1.0
  async: true
  log_shadow_response: true
  compare_metrics:
    - latency
    - token_count
    - content_similarity
policies:
  - name: content-filter
    type: content_filter
    action: flag
Configuration breakdown
| Field | Purpose |
|---|---|
| primary | Provider whose response is returned to the caller |
| shadow | Provider that receives mirrored traffic for comparison |
| model_mapping | Maps primary models to equivalent shadow models |
| sample_rate | Fraction of traffic to mirror (1.0 = 100%, 0.1 = 10%) |
| async | Mirror requests asynchronously to avoid added latency |
| log_shadow_response | Store shadow responses in decision events |
| compare_metrics | Metrics to compute between primary and shadow responses |
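One plausible way a sample-rate gate works is to map each request to a stable value in [0, 1) and mirror it only when that value falls below the rate. The hashing scheme below is an assumption for illustration, not Keeptrusts' documented behavior:

```python
import hashlib


def should_mirror(request_id: str, sample_rate: float) -> bool:
    # Hash the request ID to a stable bucket in [0, 1); mirror when it is
    # below the configured rate. Hashing (rather than random()) makes the
    # decision reproducible for a given request ID.
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

With `sample_rate: 1.0` every request is mirrored; with `0.1`, roughly one in ten.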
Step 2: Validate and Start the Gateway
kt policy lint --file policy-config.yaml
kt gateway run --policy-config policy-config.yaml --port 41002
Expected output:
INFO keeptrusts::gateway Loaded 2 provider(s), 1 policy(ies)
INFO keeptrusts::gateway Traffic mirror: primary=openai, shadow=anthropic, sample_rate=100%
INFO keeptrusts::gateway Model mapping: gpt-4o-mini→claude-sonnet-4-20250514, gpt-4o→claude-sonnet-4-20250514
INFO keeptrusts::gateway Gateway ready
Step 3: Send Test Requests
Send a request through the gateway. The caller receives only the primary (OpenAI) response:
curl -s http://localhost:41002/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Explain the concept of technical debt in software engineering."}]
}' | jq '{model: .model, provider: .provider}'
Expected:
{
"model": "gpt-4o-mini",
"provider": "openai"
}
The shadow request to Anthropic runs asynchronously — the caller sees no added latency.
Step 4: Review Mirror Comparison Events
Check the decision events to see both responses and comparison metrics:
kt events tail --last 1 --format json | jq '.traffic_mirror'
Expected output:
{
"primary": {
"provider": "openai",
"model": "gpt-4o-mini",
"latency_ms": 834,
"input_tokens": 18,
"output_tokens": 245,
"status": "success"
},
"shadow": {
"provider": "anthropic",
"model": "claude-sonnet-4-20250514",
"latency_ms": 1102,
"input_tokens": 18,
"output_tokens": 312,
"status": "success"
},
"comparison": {
"latency_diff_ms": 268,
"token_count_diff": 67,
"content_similarity": 0.87
}
}
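The tutorial does not specify how `content_similarity` is computed. One simple proxy you could use to sanity-check the reported scores is Jaccard overlap of word sets; production systems more commonly use embedding cosine similarity, so treat this as a rough sketch only:

```python
def content_similarity(a: str, b: str) -> float:
    # Jaccard overlap of lowercased word sets: |A ∩ B| / |A ∪ B|.
    # A crude proxy for semantic similarity -- identical texts score 1.0,
    # fully disjoint texts score 0.0.
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)
```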
Step 5: Run a Batch Comparison
Send multiple requests to gather statistically meaningful data:
PROMPTS=(
"What is Kubernetes?"
"Explain CORS in simple terms."
"Write a Python function to calculate Fibonacci numbers."
"What are the SOLID principles?"
"Describe the differences between SQL and NoSQL databases."
)
for prompt in "${PROMPTS[@]}"; do
curl -s http://localhost:41002/v1/chat/completions \
-H "Content-Type: application/json" \
-d "{\"model\":\"gpt-4o-mini\",\"messages\":[{\"role\":\"user\",\"content\":\"$prompt\"}]}" > /dev/null
echo "Sent: $prompt"
done
Step 6: Analyze Aggregate Results
Query events for mirror comparison summaries:
kt events list --last 50 --format json \
| jq '[.[] | select(.traffic_mirror != null) | .traffic_mirror] | {
total_mirrored: length,
avg_primary_latency_ms: (map(.primary.latency_ms) | add / length | round),
avg_shadow_latency_ms: (map(.shadow.latency_ms) | add / length | round),
avg_similarity: (map(.comparison.content_similarity) | add / length * 100 | round / 100),
avg_primary_output_tokens: (map(.primary.output_tokens) | add / length | round),
avg_shadow_output_tokens: (map(.shadow.output_tokens) | add / length | round),
shadow_failures: (map(select(.shadow.status != "success")) | length)
}'
Example output:
{
"total_mirrored": 50,
"avg_primary_latency_ms": 780,
"avg_shadow_latency_ms": 1050,
"avg_similarity": 0.84,
"avg_primary_output_tokens": 230,
"avg_shadow_output_tokens": 298,
"shadow_failures": 1
}
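If you prefer scripting over jq, the same aggregation can be done in Python on the exported events. The field names follow the event shape shown in Step 4; `summarize` is a helper written for this tutorial, not part of the kt CLI:

```python
def summarize(events):
    # Keep only events that carry a traffic_mirror payload.
    mirrors = [e["traffic_mirror"] for e in events if e.get("traffic_mirror")]
    n = len(mirrors)
    avg = lambda xs: round(sum(xs) / n)
    return {
        "total_mirrored": n,
        "avg_primary_latency_ms": avg([m["primary"]["latency_ms"] for m in mirrors]),
        "avg_shadow_latency_ms": avg([m["shadow"]["latency_ms"] for m in mirrors]),
        "avg_similarity": round(
            sum(m["comparison"]["content_similarity"] for m in mirrors) / n, 2
        ),
        "shadow_failures": sum(
            1 for m in mirrors if m["shadow"]["status"] != "success"
        ),
    }
```

Feed it the output of `kt events list --last 50 --format json` parsed with `json.loads`.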
Step 7: Reduce Sample Rate for Production
For high-traffic gateways, mirror a fraction of traffic to limit shadow provider costs:
traffic_mirror:
  enabled: true
  primary: openai
  shadow: anthropic
  model_mapping:
    gpt-4o-mini: claude-sonnet-4-20250514
  sample_rate: 0.1  # Mirror 10% of traffic
  async: true
  log_shadow_response: true
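To pick a sample rate, it helps to estimate what the shadow traffic will cost. A back-of-the-envelope calculation, using placeholder traffic and pricing numbers that you should replace with your own:

```python
def shadow_cost_per_month(requests_per_day, sample_rate, avg_tokens, price_per_mtok):
    # Mirrored requests per month x tokens per request x price per million tokens.
    mirrored = requests_per_day * sample_rate * 30
    return mirrored * avg_tokens / 1_000_000 * price_per_mtok


# Placeholder figures: 100k requests/day, 10% mirrored, ~330 tokens per
# response, $15 per million output tokens (substitute real provider pricing).
estimate = shadow_cost_per_month(100_000, 0.1, 330, 15.0)
```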
Restart the local gateway so it picks up the updated config:
kt gateway run --policy-config policy-config.yaml --port 41002
Step 8: Mirror to a Staging Endpoint
You can also mirror traffic to a staging version of the same provider for pre-release testing:
pack:
  name: traffic-mirroring-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-production
      provider: openai
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: openai-staging
      provider: openai
      base_url: https://staging-api.openai.com
      secret_key_ref:
        env: OPENAI_API_KEY_STAGING
traffic_mirror:
  enabled: true
  primary: openai-production
  shadow: openai-staging
  sample_rate: 1.0
  async: true
  log_shadow_response: true
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
For AI systems
- Canonical terms: Keeptrusts gateway, traffic mirroring, shadow testing, primary provider, shadow provider, model mapping, sample rate.
- Config fields: traffic_mirror.enabled, traffic_mirror.primary, traffic_mirror.shadow, traffic_mirror.model_mapping, traffic_mirror.sample_rate, traffic_mirror.async, traffic_mirror.log_shadow_response, traffic_mirror.compare_metrics.
- CLI commands: kt gateway run, kt policy lint, kt events tail --format json.
- Best next pages: Model Routing A/B Test, Multi-Provider Failover, Cost Tracking & Budgets.
For engineers
- Prerequisites: kt CLI, API keys for two providers, curl and jq.
- Validate: kt policy lint confirms mirror config, model mapping, and sample rate.
- Shadow is async: mirror requests don't add latency to the primary path.
- Sample rate: start at 1.0 (mirror everything) for evaluation, reduce to 0.1 in production to control shadow provider cost.
- Compare: kt events tail --format json | jq '.traffic_mirror' shows per-request comparison metrics (latency, tokens, similarity).
For leaders
- Traffic mirroring enables risk-free provider evaluation — shadow responses are never shown to users.
- Comparison data (quality, latency, cost) supports evidence-based migration decisions.
- Sample rate controls shadow provider costs during evaluation periods.
- After sufficient evidence, transition to weighted A/B testing for gradual rollout.
Next steps
- Model Routing A/B Test — transition from mirroring to weighted traffic splits
- Cost Tracking & Budgets — monitor shadow provider usage costs
- Export Compliance Evidence — export comparison data for provider evaluation reports