CTO Guide: Multi-Provider AI Strategy Without Vendor Lock-in
Single-provider dependency is the silent risk in most AI deployments. When your sole provider has an outage, your entire AI capability goes down with it; when it raises prices or deprecates a model, you have no fallback and no leverage. Keeptrusts makes multi-provider a configuration concern, not an engineering project.
Use this page when
- You are configuring multi-provider routing with failover chains to eliminate single-vendor dependency
- You need to set up A/B testing between models or providers for quality and cost comparison
- You want latency-based or cost-based routing to optimize for performance or budget
- You are monitoring provider health and comparing costs across vendors in the console
This guide covers provider routing, failover chains, A/B testing, latency-based routing, health monitoring, and cost comparison.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
Provider Routing Configuration
The gateway supports multiple providers simultaneously. Application code points at the gateway — provider selection is a policy decision, not a code decision.
```yaml
pack:
  name: cto-multi-provider-providers-1
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: openai
        provider:
          base_url: https://api.openai.com/v1
          secret_key_ref:
            store: OPENAI_API_KEY
      - id: anthropic
        provider:
          base_url: https://api.anthropic.com/v1
          secret_key_ref:
            store: ANTHROPIC_API_KEY
      - id: azure-openai
        provider:
          base_url: https://your-instance.openai.azure.com
          secret_key_ref:
            store: AZURE_OPENAI_API_KEY
      - id: bedrock
        provider:
          base_url: https://bedrock-runtime.us-east-1.amazonaws.com
          secret_key_ref:
            store: AWS_BEDROCK_KEY
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true
```
Application code never changes:
```python
import openai

# Same code, any provider — routing is gateway configuration
client = openai.OpenAI(
    api_key="kt_gk_...",
    base_url="https://gateway.company.com/v1",
)

# Request routes to whichever provider the policy selects
response = client.chat.completions.create(
    model="gpt-4o",  # Gateway resolves this to the configured provider
    messages=[{"role": "user", "content": "Analyze this data"}],
)
```
Failover Chains
Configure automatic failover so that if a primary provider is unavailable, requests route to a secondary provider without application-level error handling.
```yaml
model_groups:
  - name: production-llm
    routing: failover
    models:
      - provider: openai
        model: gpt-4o
        priority: 1
        timeout_ms: 5000
      - provider: azure-openai
        model: gpt-4o
        priority: 2
        timeout_ms: 8000
      - provider: anthropic
        model: claude-sonnet-4-20250514
        priority: 3
        timeout_ms: 10000
```
Failover Behavior
| Scenario | Primary | Fallback | Application Impact |
|---|---|---|---|
| OpenAI timeout | openai/gpt-4o | azure-openai/gpt-4o | None — transparent retry |
| OpenAI + Azure down | Both fail | anthropic/claude-sonnet | Slight latency increase |
| All providers down | All fail | 503 with retry-after | Application handles 503 |
Console checkpoint: The Overview dashboard shows provider health status. Failed requests with successful failover appear as events with `failover: true` metadata.
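When all providers fail, the gateway returns a 503 with a retry-after header (last row of the table above), and the application owns that final case. A minimal handling sketch, assuming the standard openai Python SDK; the gateway key and URL are placeholders:

```python
import time

import openai

client = openai.OpenAI(
    api_key="kt_gk_...",
    base_url="https://gateway.company.com/v1",
)

def chat_with_retry(messages, max_attempts=3):
    # Failover between providers happens inside the gateway; the application
    # only handles the case where every configured provider is down.
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model="gpt-4o", messages=messages)
        except openai.APIStatusError as err:
            if err.status_code != 503 or attempt == max_attempts - 1:
                raise
            # Honor the gateway's Retry-After header; fall back to exponential backoff
            time.sleep(float(err.response.headers.get("retry-after", 2 ** attempt)))
```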
Model Group A/B Testing
Test new providers or models in production without risk by routing a small percentage of traffic.
```yaml
model_groups:
  - name: summarization-ab-test
    routing: weighted
    models:
      - provider: openai
        model: gpt-4o
        weight: 80
      - provider: anthropic
        model: claude-sonnet-4-20250514
        weight: 15
      - provider: openai
        model: gpt-4o-mini
        weight: 5
```
Measuring A/B Test Results
Use the events API to compare provider performance during the test period:
```bash
# Compare latency across providers for the test period
curl "https://api.keeptrusts.com/v1/events?model_group=summarization-ab-test&since=7d&group_by=provider&metrics=latency_p50,latency_p99,cost" \
  -H "Authorization: Bearer $API_TOKEN"
```
| Provider | p50 Latency | p99 Latency | Cost/1M tokens | Quality Score |
|---|---|---|---|---|
| openai/gpt-4o | 1.2s | 3.8s | $12.50 | 92% |
| anthropic/claude-sonnet | 1.4s | 4.2s | $18.00 | 94% |
| openai/gpt-4o-mini | 0.6s | 1.8s | $0.75 | 85% |
Decision framework: If gpt-4o-mini meets your quality threshold (e.g., > 88%), promote it to a higher weight for a 90%+ cost reduction on that workload.
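That decision can be scripted against the events API. A sketch, assuming the response exposes one aggregated row per provider under a `data` key and that `quality_score` is reported alongside the latency and cost metrics (field names are illustrative, not a confirmed API schema):

```python
import os

import requests

resp = requests.get(
    "https://api.keeptrusts.com/v1/events",
    params={
        "model_group": "summarization-ab-test",
        "since": "7d",
        "group_by": "provider",
        "metrics": "latency_p50,latency_p99,cost,quality_score",
    },
    headers={"Authorization": f"Bearer {os.environ['API_TOKEN']}"},
    timeout=30,
)
rows = resp.json()["data"]  # assumed shape: one aggregated row per provider

QUALITY_THRESHOLD = 88  # percent, matching the quality scores in the table
passing = [r for r in rows if r["quality_score"] >= QUALITY_THRESHOLD]
cheapest = min(passing, key=lambda r: r["cost"])
print(f"Candidate for a higher weight: {cheapest['provider']}")
```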
Latency-Based Routing
For latency-sensitive applications, configure the gateway to route based on recent provider response times.
```yaml
model_groups:
  - name: real-time-assistant
    routing: latency-priority
    models:
      - provider: openai
        model: gpt-4o
        region: us-east
      - provider: azure-openai
        model: gpt-4o
        region: westeurope
      - provider: anthropic
        model: claude-sonnet-4-20250514
        region: us-east
    latency_window: 60s    # Use last 60 seconds of latency data
    max_latency_ms: 3000   # Skip providers above this threshold
```
The gateway maintains a rolling latency window and routes to the provider with the lowest recent response time. Providers exceeding `max_latency_ms` are temporarily removed from the rotation.
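To make that concrete, here is an illustrative sketch of the selection logic described above; it mirrors the `latency_window` and `max_latency_ms` settings but is not the gateway's actual implementation:

```python
import time
from collections import defaultdict, deque

LATENCY_WINDOW_S = 60   # mirrors latency_window: 60s
MAX_LATENCY_MS = 3000   # mirrors max_latency_ms: 3000

samples = defaultdict(deque)  # provider id -> (timestamp, latency_ms) pairs

def record_latency(provider, latency_ms):
    now = time.monotonic()
    window = samples[provider]
    window.append((now, latency_ms))
    while window and now - window[0][0] > LATENCY_WINDOW_S:
        window.popleft()  # discard samples older than the rolling window

def select_provider(providers):
    def recent_avg(provider):
        window = samples[provider]
        return sum(ms for _, ms in window) / len(window) if window else 0.0
    # Providers above the latency ceiling drop out of rotation temporarily;
    # if none qualify, fall back to the full list rather than failing.
    in_rotation = [p for p in providers if recent_avg(p) <= MAX_LATENCY_MS] or providers
    return min(in_rotation, key=recent_avg)
```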
Provider Health Monitoring in Console
The console provides real-time provider health visibility:
Health Dashboard Panels
| Panel | Metric | Alert Threshold |
|---|---|---|
| Availability | Success rate per provider | < 99.5% |
| Latency | p50 and p99 per provider | p99 > 5s |
| Error Rate | 4xx and 5xx per provider | > 1% |
| Rate Limiting | 429 responses per provider | > 5/minute |
| Cost Efficiency | Cost per successful request | > 2x baseline |
Screenshot reference: Console provider health dashboard showing availability, latency, and error rate for four configured providers.
Console checkpoint: Navigate to Settings → Gateways and select a gateway to see per-provider health metrics. Configure webhook notifications for health threshold breaches.
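A minimal receiver for those webhook notifications might look like the sketch below; the payload fields (`provider`, `metric`, `value`) are assumptions to illustrate the shape, not a documented schema:

```python
from fastapi import FastAPI, Request

app = FastAPI()

def notify_oncall(message: str) -> None:
    # Stub: wire this to your paging or chat tooling
    print(message)

@app.post("/webhooks/keeptrusts-health")
async def health_breach(request: Request):
    event = await request.json()  # assumed fields: provider, metric, value
    notify_oncall(
        f"Provider {event.get('provider')}: {event.get('metric')} breach "
        f"(value={event.get('value')})"
    )
    return {"ok": True}
```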
Cost Comparison Across Providers
Use the console Cost Center to compare actual per-provider costs based on real traffic patterns.
Monthly Provider Cost Comparison
| Provider | Requests | Input Tokens | Output Tokens | Total Cost | Cost/Request |
|---|---|---|---|---|---|
| openai | 450,000 | 900M | 180M | $4,050 | $0.009 |
| anthropic | 120,000 | 240M | 60M | $1,620 | $0.014 |
| azure-openai | 180,000 | 360M | 72M | $1,620 | $0.009 |
| bedrock | 50,000 | 100M | 20M | $350 | $0.007 |
```bash
# CLI: generate a provider cost comparison report
kt events list \
  --since 30d \
  --group-by provider \
  --fields provider,request_count,total_input_tokens,total_output_tokens,total_cost \
  --format table
```
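As a sanity check when reviewing such reports, the Cost/Request column is simply total cost divided by request count; the figures from the table above reproduce like this:

```python
monthly = {  # figures from the comparison table above
    "openai":       {"requests": 450_000, "total_cost": 4050},
    "anthropic":    {"requests": 120_000, "total_cost": 1620},
    "azure-openai": {"requests": 180_000, "total_cost": 1620},
    "bedrock":      {"requests":  50_000, "total_cost": 350},
}

for provider, row in monthly.items():
    # e.g., anthropic: 1620 / 120,000 = $0.0135, shown as $0.014 in the table
    print(f"{provider}: ${row['total_cost'] / row['requests']:.4f}/request")
```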
Migration Playbook: Single → Multi-Provider
| Phase | Action | Duration |
|---|---|---|
| 1. Assess | Audit current provider usage via gateway events | 1 week |
| 2. Configure | Add secondary provider(s) to gateway config | 1 day |
| 3. Test | Route 5% traffic to new provider, monitor quality | 2 weeks |
| 4. Validate | Compare latency, cost, and quality metrics | 1 week |
| 5. Promote | Increase traffic split based on test results | Ongoing |
| 6. Failover | Enable automatic failover chains | 1 day |
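For phase 1, the events API shown earlier can drive the audit. A sketch under the same assumed response shape (one row per provider, field names illustrative), reporting each provider's traffic share to quantify concentration risk:

```python
import os

import requests

# Phase 1: establish a baseline of current provider usage and spend
resp = requests.get(
    "https://api.keeptrusts.com/v1/events",
    params={"since": "30d", "group_by": "provider",
            "metrics": "request_count,total_cost"},
    headers={"Authorization": f"Bearer {os.environ['API_TOKEN']}"},
    timeout=30,
)
rows = resp.json()["data"]  # assumed response shape
total = sum(r["request_count"] for r in rows)
for row in rows:
    share = 100 * row["request_count"] / total
    print(f"{row['provider']}: {share:.1f}% of traffic, ${row['total_cost']}")
```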
ROI Summary
| Capability | Risk Mitigated | Financial Impact |
|---|---|---|
| Failover chains | Provider outage downtime | Avoid $X/hour of lost productivity |
| A/B testing | Overpaying for equivalent quality | 15–40% cost reduction |
| Latency routing | User experience degradation | Improved response times |
| Vendor diversification | Pricing leverage | Negotiating power on renewals |
| Health monitoring | Silent degradation | Early detection, faster resolution |
Next steps
- Add a second provider to your gateway configuration
- Route 5–10% of traffic to the new provider for evaluation
- Monitor provider health in the console for 14 days
- Compare cost and latency metrics across providers
- Enable failover chains for production workloads
See also: CTO Guide: AI Platform Engineering · CIO Guide: Cutting AI Infrastructure Costs by 40%
For AI systems
- Canonical terms: multi-provider routing, failover chains, `model_groups`, `routing: failover`, `routing: weighted`, A/B testing, latency-based routing, provider health, `providers[].priority`, `timeout_ms`, model mapping, console Overview dashboard
- Key configuration: `model_groups[].routing` (failover | weighted | latency), `models[].provider`, `models[].priority`, `models[].timeout_ms`
- Best next pages: CIO: Cost Optimization, CTO: Platform Engineering, Resilience Engineering
For engineers
- Failover: set `priority: 1` (primary) and `priority: 2` (fallback) with `timeout_ms` per provider; the gateway auto-retries on 5xx/timeout
- Model mapping: `gpt-4o` → `openai/gpt-4o` (priority 1), `azure-openai/gpt-4o` (priority 2), `anthropic/claude-sonnet-4-20250514` (priority 3)
- A/B testing: configure `routing: weighted` with a percentage split (e.g., 80/20) to compare quality metrics across providers
- Application code never changes — same `openai.OpenAI(api_key="kt_gk_...", base_url=...)` call; routing is gateway configuration
- Console checkpoint: the Overview dashboard shows provider health status; failed requests with successful failover appear as events with `failover: true`
For leaders
- Multi-provider eliminates single-vendor lock-in risk — provider outages, price increases, and model deprecations become configuration changes, not engineering projects
- A/B testing enables data-driven provider selection based on actual quality, latency, and cost metrics — not vendor marketing
- Failover transparency means applications never see provider outages unless all configured providers are simultaneously down
- Cost comparison across providers enables negotiation leverage and budget optimization without code changes