Tutorial: Traffic Mirroring for Shadow Testing

This tutorial shows you how to configure traffic mirroring in the Keeptrusts gateway to send live requests to both a primary and a shadow provider, compare responses for quality, and make data-driven provider migration decisions — without affecting end users.

Use this page when

  • You want to shadow-test a new LLM provider without affecting end users.
  • You are configuring traffic mirroring to compare primary and shadow provider responses.
  • You need to measure latency, token count, and content similarity between two providers.
  • You are making a data-driven provider migration decision.

Primary audience

  • Primary: ML engineers and platform teams evaluating provider migrations with zero user risk
  • Secondary: Product managers comparing model quality; finance teams assessing cost of provider switch

Prerequisites

  • kt CLI installed (first-run tutorial)
  • Two LLM provider API keys (e.g., OPENAI_API_KEY and ANTHROPIC_API_KEY)
  • curl and jq installed

How Traffic Mirroring Works

Traffic mirroring duplicates incoming requests to a shadow provider asynchronously. The primary provider's response is returned to the caller immediately. The shadow provider's response is logged for comparison but never shown to the user.

Request ──┬──▶ Primary Provider (openai) ──▶ Response to Caller
          │
          └──▶ Shadow Provider (anthropic) ──▶ Logged for comparison only
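
Conceptually, the gateway does something like the following shell sketch: the primary call is synchronous and its response is returned, while the shadow call is fired in the background and only logged. This is an illustration of the pattern, not Keeptrusts internals; the endpoint URLs are placeholders, and the model_mapping rewrite is omitted.

# Illustrative approximation of the mirroring pattern (placeholder URLs).
REQ='{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hello"}]}'

# Shadow path: fire-and-forget in the background; output is logged, never returned.
curl -s https://shadow.example.com/v1/chat/completions \
  -H "Content-Type: application/json" -d "$REQ" >> shadow.log 2>&1 &

# Primary path: synchronous; this is the only response the caller sees.
curl -s https://primary.example.com/v1/chat/completions \
  -H "Content-Type: application/json" -d "$REQ"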

Step 1: Create the Mirror Configuration

Create policy-config.yaml with traffic mirroring enabled:

version: '1'
providers:
  targets:
    - id: openai
      provider: openai
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic
      provider: anthropic
      secret_key_ref:
        env: ANTHROPIC_API_KEY
traffic_mirror:
  enabled: true
  primary: openai
  shadow: anthropic
  model_mapping:
    gpt-4o-mini: claude-sonnet-4-20250514
    gpt-4o: claude-sonnet-4-20250514
  sample_rate: 1.0
  async: true
  log_shadow_response: true
  compare_metrics:
    - latency
    - token_count
    - content_similarity
policies:
  - name: content-filter
    type: content_filter
    action: flag

Configuration breakdown

Field                  Purpose
primary                Provider whose response is returned to the caller
shadow                 Provider that receives mirrored traffic for comparison
model_mapping          Maps primary models to equivalent shadow models
sample_rate            Fraction of traffic to mirror (1.0 = 100%, 0.1 = 10%)
async                  Mirror requests asynchronously to avoid added latency
log_shadow_response    Store shadow responses in decision events
compare_metrics        Metrics to compute between primary and shadow responses
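
The model_mapping rewrite is easy to picture: the mirrored request is the same body with the model field swapped to the shadow equivalent. Illustratively, with jq:

# What model_mapping does to the mirrored request body (illustration only).
echo '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}' \
  | jq '.model = "claude-sonnet-4-20250514"'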

Step 2: Validate and Start the Gateway

kt policy lint --file policy-config.yaml
kt gateway run --policy-config policy-config.yaml --port 41002

Expected output:

INFO keeptrusts::gateway Loaded 2 provider(s), 1 policy(ies)
INFO keeptrusts::gateway Traffic mirror: primary=openai, shadow=anthropic, sample_rate=100%
INFO keeptrusts::gateway Model mapping: gpt-4o-mini→claude-sonnet-4-20250514, gpt-4o→claude-sonnet-4-20250514
INFO keeptrusts::gateway Gateway ready

Step 3: Send Test Requests

Send a request through the gateway. The caller receives only the primary (OpenAI) response:

curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Explain the concept of technical debt in software engineering."}]
  }' | jq '{model: .model, provider: .provider}'

Expected:

{
  "model": "gpt-4o-mini",
  "provider": "openai"
}

The shadow request to Anthropic runs asynchronously — the caller sees no added latency.
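
One way to convince yourself: time a request with mirroring enabled. curl's %{time_total} measures only the caller-visible path, so with async: true it should track the primary provider's latency, not the sum of both:

curl -s -o /dev/null -w 'time_total: %{time_total}s\n' \
  http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}'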

Step 4: Review Mirror Comparison Events

Check the decision events to see both responses and comparison metrics:

kt events tail --last 1 --format json | jq '.traffic_mirror'

Expected output:

{
  "primary": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "latency_ms": 834,
    "input_tokens": 18,
    "output_tokens": 245,
    "status": "success"
  },
  "shadow": {
    "provider": "anthropic",
    "model": "claude-sonnet-4-20250514",
    "latency_ms": 1102,
    "input_tokens": 18,
    "output_tokens": 312,
    "status": "success"
  },
  "comparison": {
    "latency_diff_ms": 268,
    "token_count_diff": 67,
    "content_similarity": 0.87
  }
}
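
If you only want the headline numbers, narrow the filter to the comparison block (jq's object shorthand keeps the key names):

kt events tail --last 1 --format json \
  | jq '.traffic_mirror.comparison | {latency_diff_ms, token_count_diff, content_similarity}'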

Step 5: Run a Batch Comparison

Send multiple requests to gather statistically meaningful data:

PROMPTS=(
  "What is Kubernetes?"
  "Explain CORS in simple terms."
  "Write a Python function to calculate Fibonacci numbers."
  "What are the SOLID principles?"
  "Describe the differences between SQL and NoSQL databases."
)

for prompt in "${PROMPTS[@]}"; do
  curl -s http://localhost:41002/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"gpt-4o-mini\",\"messages\":[{\"role\":\"user\",\"content\":\"$prompt\"}]}" > /dev/null
  echo "Sent: $prompt"
done

Step 6: Analyze Aggregate Results

Query events for mirror comparison summaries:

kt events list --last 50 --format json \
  | jq '[.[] | select(.traffic_mirror != null) | .traffic_mirror] | {
      total_mirrored: length,
      avg_primary_latency_ms: (map(.primary.latency_ms) | add / length | round),
      avg_shadow_latency_ms: (map(.shadow.latency_ms) | add / length | round),
      avg_similarity: (map(.comparison.content_similarity) | add / length * 100 | round / 100),
      avg_primary_output_tokens: (map(.primary.output_tokens) | add / length | round),
      avg_shadow_output_tokens: (map(.shadow.output_tokens) | add / length | round),
      shadow_failures: (map(select(.shadow.status != "success")) | length)
    }'

Example output:

{
  "total_mirrored": 50,
  "avg_primary_latency_ms": 780,
  "avg_shadow_latency_ms": 1050,
  "avg_similarity": 0.84,
  "avg_primary_output_tokens": 230,
  "avg_shadow_output_tokens": 298,
  "shadow_failures": 1
}
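
Averages can hide individual bad answers, so it is worth pulling the lowest-similarity requests for manual review. A sketch, using only the fields shown above (the 0.7 threshold is arbitrary; tune it to your tolerance):

kt events list --last 50 --format json \
  | jq '[.[] | select(.traffic_mirror != null) | .traffic_mirror
         | select(.comparison.content_similarity < 0.7)
         | {shadow_model: .shadow.model, similarity: .comparison.content_similarity}]'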

Step 7: Reduce Sample Rate for Production

For high-traffic gateways, mirror a fraction of traffic to limit shadow provider costs:

traffic_mirror:
  enabled: true
  primary: openai
  shadow: anthropic
  model_mapping:
    gpt-4o-mini: claude-sonnet-4-20250514
  sample_rate: 0.1   # Mirror 10% of traffic
  async: true
  log_shadow_response: true
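
Treat sample_rate as a per-request probability rather than a strict every-tenth-request counter; "fraction of traffic" reads as a coin flip per request, though the exact sampling strategy is an implementation detail. In shell terms, 0.1 behaves roughly like:

# Per-request probabilistic sampling at ~10%, analogous to sample_rate: 0.1.
for i in $(seq 1 20); do
  if (( RANDOM % 10 == 0 )); then
    echo "request $i: mirrored to shadow"
  else
    echo "request $i: primary only"
  fi
done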

Restart the local gateway so it picks up the updated config:

kt gateway run --policy-config policy-config.yaml --port 41002

Step 8: Mirror to a Staging Endpoint

You can also mirror traffic to a staging version of the same provider for pre-release testing:

pack:
  name: traffic-mirroring-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-production
      provider: openai
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: openai-staging
      provider: openai
      base_url: https://staging-api.openai.com
      secret_key_ref:
        env: OPENAI_API_KEY_STAGING
traffic_mirror:
  enabled: true
  primary: openai-production
  shadow: openai-staging   # staging responses are logged, never returned
  sample_rate: 1.0
  async: true
  log_shadow_response: true
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

For AI systems

  • Canonical terms: Keeptrusts gateway, traffic mirroring, shadow testing, primary provider, shadow provider, model mapping, sample rate.
  • Config fields: traffic_mirror.enabled, traffic_mirror.primary, traffic_mirror.shadow, traffic_mirror.model_mapping, traffic_mirror.sample_rate, traffic_mirror.async, traffic_mirror.log_shadow_response, traffic_mirror.compare_metrics.
  • CLI commands: kt gateway run, kt policy lint, kt events tail --format json.
  • Best next pages: Model Routing A/B Test, Multi-Provider Failover, Cost Tracking & Budgets.

For engineers

  • Prerequisites: kt CLI, API keys for two providers, curl and jq.
  • Validate: kt policy lint confirms mirror config, model mapping, and sample rate.
  • Shadow is async: mirror requests don't add latency to the primary path.
  • Sample rate: start at 1.0 (mirror everything) for evaluation, reduce to 0.1 in production to control shadow provider cost.
  • Compare: kt events tail --format json | jq '.traffic_mirror' shows per-request comparison metrics (latency, tokens, similarity).

For leaders

  • Traffic mirroring enables risk-free provider evaluation — shadow responses are never shown to users.
  • Comparison data (quality, latency, cost) supports evidence-based migration decisions.
  • Sample rate controls shadow provider costs during evaluation periods.
  • After sufficient evidence, transition to weighted A/B testing for gradual rollout.

Next steps