Tutorial: Traffic Mirroring for Shadow Testing

This tutorial shows you how to configure traffic mirroring in the Keeptrusts gateway to send live requests to both a primary and a shadow provider, compare responses for quality, and make data-driven provider migration decisions — without affecting end users.

Use this page when

  • You want to shadow-test a new LLM provider without affecting end users.
  • You are configuring traffic mirroring to compare primary and shadow provider responses.
  • You need to measure latency, token count, and content similarity between two providers.
  • You are making a data-driven provider migration decision.

Primary audience

  • Primary: ML engineers and platform teams evaluating provider migrations with zero user risk
  • Secondary: Product managers comparing model quality; finance teams assessing cost of provider switch

Prerequisites

  • kt CLI installed (first-run tutorial)
  • Two LLM provider API keys (e.g., OPENAI_API_KEY and ANTHROPIC_API_KEY)
  • curl and jq installed

How Traffic Mirroring Works

Traffic mirroring duplicates incoming requests to a shadow provider asynchronously. The primary provider's response is returned to the caller immediately. The shadow provider's response is logged for comparison but never shown to the user.

Request ──┬──▶ Primary Provider (openai) ──▶ Response to Caller
          │
          └──▶ Shadow Provider (anthropic) ──▶ Logged for comparison only
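
Conceptually, the gateway does something like the following shell sketch: the primary call is synchronous and its response is returned, while the shadow call is fired in the background and only logged. This is an illustration of the pattern, not Keeptrusts internals; the endpoint URLs are placeholders, and the model_mapping rewrite is omitted.

# Illustrative approximation of the mirroring pattern (placeholder URLs).
REQ='{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hello"}]}'

# Shadow path: fire-and-forget in the background; output is logged, never returned.
curl -s https://shadow.example.com/v1/chat/completions \
  -H "Content-Type: application/json" -d "$REQ" >> shadow.log 2>&1 &

# Primary path: synchronous; this is the only response the caller sees.
curl -s https://primary.example.com/v1/chat/completions \
  -H "Content-Type: application/json" -d "$REQ"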

Step 1: Create the Mirror Configuration

Create policy-config.yaml with traffic mirroring enabled:

version: '1'
providers:
  targets:
    - id: openai
      provider: openai
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic
      provider: anthropic
      secret_key_ref:
        env: ANTHROPIC_API_KEY
traffic_mirror:
  enabled: true
  primary: openai
  shadow: anthropic
  model_mapping:
    gpt-4o-mini: claude-sonnet-4-20250514
    gpt-4o: claude-sonnet-4-20250514
  sample_rate: 1.0
  async: true
  log_shadow_response: true
  compare_metrics:
    - latency
    - token_count
    - content_similarity
policies:
  - name: content-filter
    type: content_filter
    action: flag

Configuration breakdown

Field                  Purpose
primary                Provider whose response is returned to the caller
shadow                 Provider that receives mirrored traffic for comparison
model_mapping          Maps primary models to equivalent shadow models
sample_rate            Fraction of traffic to mirror (1.0 = 100%, 0.1 = 10%)
async                  Mirror requests asynchronously to avoid added latency
log_shadow_response    Store shadow responses in decision events
compare_metrics        Metrics to compute between primary and shadow responses
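
The model_mapping rewrite is easy to picture: the mirrored request is the same body with the model field swapped to the shadow equivalent. Illustratively, with jq:

# What model_mapping does to the mirrored request body (illustration only).
echo '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}' \
  | jq '.model = "claude-sonnet-4-20250514"'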

Step 2: Validate and Start the Gateway

kt policy lint --file policy-config.yaml
kt gateway run --policy-config policy-config.yaml --port 41002

Expected output:

INFO keeptrusts::gateway Loaded 2 provider(s), 1 policy(ies)
INFO keeptrusts::gateway Traffic mirror: primary=openai, shadow=anthropic, sample_rate=100%
INFO keeptrusts::gateway Model mapping: gpt-4o-mini→claude-sonnet-4-20250514, gpt-4o→claude-sonnet-4-20250514
INFO keeptrusts::gateway Gateway ready

Step 3: Send Test Requests

Send a request through the gateway. The caller receives only the primary (OpenAI) response:

curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Explain the concept of technical debt in software engineering."}]
  }' | jq '{model: .model, provider: .provider}'

Expected:

{
  "model": "gpt-4o-mini",
  "provider": "openai"
}

The shadow request to Anthropic runs asynchronously — the caller sees no added latency.
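
One way to convince yourself: time a request with mirroring enabled. curl's %{time_total} measures only the caller-visible path, so with async: true it should track the primary provider's latency, not the sum of both:

curl -s -o /dev/null -w 'time_total: %{time_total}s\n' \
  http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}'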

Step 4: Review Mirror Comparison Events

Check the decision events to see both responses and comparison metrics:

kt events tail --last 1 --format json | jq '.traffic_mirror'

Expected output:

{
  "primary": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "latency_ms": 834,
    "input_tokens": 18,
    "output_tokens": 245,
    "status": "success"
  },
  "shadow": {
    "provider": "anthropic",
    "model": "claude-sonnet-4-20250514",
    "latency_ms": 1102,
    "input_tokens": 18,
    "output_tokens": 312,
    "status": "success"
  },
  "comparison": {
    "latency_diff_ms": 268,
    "token_count_diff": 67,
    "content_similarity": 0.87
  }
}
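
If you only want the headline numbers, narrow the filter to the comparison block (jq's object shorthand keeps the key names):

kt events tail --last 1 --format json \
  | jq '.traffic_mirror.comparison | {latency_diff_ms, token_count_diff, content_similarity}'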

Step 5: Run a Batch Comparison

Send multiple requests to gather statistically meaningful data:

PROMPTS=(
  "What is Kubernetes?"
  "Explain CORS in simple terms."
  "Write a Python function to calculate Fibonacci numbers."
  "What are the SOLID principles?"
  "Describe the differences between SQL and NoSQL databases."
)

for prompt in "${PROMPTS[@]}"; do
  curl -s http://localhost:41002/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"gpt-4o-mini\",\"messages\":[{\"role\":\"user\",\"content\":\"$prompt\"}]}" > /dev/null
  echo "Sent: $prompt"
done

Step 6: Analyze Aggregate Results

Query events for mirror comparison summaries:

kt events list --last 50 --format json \
  | jq '[.[] | select(.traffic_mirror != null) | .traffic_mirror] | {
      total_mirrored: length,
      avg_primary_latency_ms: (map(.primary.latency_ms) | add / length | round),
      avg_shadow_latency_ms: (map(.shadow.latency_ms) | add / length | round),
      avg_similarity: (map(.comparison.content_similarity) | add / length * 100 | round / 100),
      avg_primary_output_tokens: (map(.primary.output_tokens) | add / length | round),
      avg_shadow_output_tokens: (map(.shadow.output_tokens) | add / length | round),
      shadow_failures: (map(select(.shadow.status != "success")) | length)
    }'

Example output:

{
  "total_mirrored": 50,
  "avg_primary_latency_ms": 780,
  "avg_shadow_latency_ms": 1050,
  "avg_similarity": 0.84,
  "avg_primary_output_tokens": 230,
  "avg_shadow_output_tokens": 298,
  "shadow_failures": 1
}
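
Averages can hide individual bad answers, so it is worth pulling the lowest-similarity requests for manual review. A sketch, using only the fields shown above (the 0.7 threshold is arbitrary; tune it to your tolerance):

kt events list --last 50 --format json \
  | jq '[.[] | select(.traffic_mirror != null) | .traffic_mirror
         | select(.comparison.content_similarity < 0.7)
         | {shadow_model: .shadow.model, similarity: .comparison.content_similarity}]'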

Step 7: Reduce Sample Rate for Production

For high-traffic gateways, mirror a fraction of traffic to limit shadow provider costs:

traffic_mirror:
  enabled: true
  primary: openai
  shadow: anthropic
  model_mapping:
    gpt-4o-mini: claude-sonnet-4-20250514
  sample_rate: 0.1   # Mirror 10% of traffic
  async: true
  log_shadow_response: true
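
Treat sample_rate as a per-request probability rather than a strict every-tenth-request counter; "fraction of traffic" reads as a coin flip per request, though the exact sampling strategy is an implementation detail. In shell terms, 0.1 behaves roughly like:

# Per-request probabilistic sampling at ~10%, analogous to sample_rate: 0.1.
for i in $(seq 1 20); do
  if (( RANDOM % 10 == 0 )); then
    echo "request $i: mirrored to shadow"
  else
    echo "request $i: primary only"
  fi
done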

Restart the local gateway so it picks up the updated config:

kt gateway run --policy-config policy-config.yaml --port 41002

Step 8: Mirror to a Staging Endpoint

You can also mirror traffic to a staging version of the same provider for pre-release testing:

pack:
  name: traffic-mirroring-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-production
      provider: openai
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: openai-staging
      provider: openai
      base_url: https://staging-api.openai.com
      secret_key_ref:
        env: OPENAI_API_KEY_STAGING
traffic_mirror:
  enabled: true
  primary: openai-production
  shadow: openai-staging   # staging responses are logged, never returned
  sample_rate: 1.0
  async: true
  log_shadow_response: true
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

For AI systems

  • Canonical terms: Keeptrusts gateway, traffic mirroring, shadow testing, primary provider, shadow provider, model mapping, sample rate.
  • Config fields: traffic_mirror.enabled, traffic_mirror.primary, traffic_mirror.shadow, traffic_mirror.model_mapping, traffic_mirror.sample_rate, traffic_mirror.async, traffic_mirror.log_shadow_response, traffic_mirror.compare_metrics.
  • CLI commands: kt gateway run, kt policy lint, kt events tail --format json.
  • Best next pages: Model Routing A/B Test, Multi-Provider Failover, Cost Tracking & Budgets.

For engineers

  • Prerequisites: kt CLI, API keys for two providers, curl and jq.
  • Validate: kt policy lint confirms mirror config, model mapping, and sample rate.
  • Shadow is async: mirror requests don't add latency to the primary path.
  • Sample rate: start at 1.0 (mirror everything) for evaluation, reduce to 0.1 in production to control shadow provider cost.
  • Compare: kt events tail --format json | jq '.traffic_mirror' shows per-request comparison metrics (latency, tokens, similarity).

For leaders

  • Traffic mirroring enables risk-free provider evaluation — shadow responses are never shown to users.
  • Comparison data (quality, latency, cost) supports evidence-based migration decisions.
  • Sample rate controls shadow provider costs during evaluation periods.
  • After sufficient evidence, transition to weighted A/B testing for gradual rollout.

Next steps