CTO Guide: Multi-Provider AI Strategy Without Vendor Lock-in
Single-provider dependency is the silent risk in most AI deployments. When your sole provider has an outage, your entire AI capability goes down with it; when it raises prices or deprecates a model, you have no fallback and no leverage. Keeptrusts makes multi-provider a configuration concern, not an engineering project.
Use this page when
- You are configuring multi-provider routing with failover chains to eliminate single-vendor dependency
- You need to set up A/B testing between models or providers for quality and cost comparison
- You want latency-based or cost-based routing to optimize for performance or budget
- You are monitoring provider health and comparing costs across vendors in the console
This guide covers provider routing, failover chains, A/B testing, latency-based routing, health monitoring, and cost comparison.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
Provider Routing Configuration
The gateway supports multiple providers simultaneously. Application code points at the gateway — provider selection is a policy decision, not a code decision.
```yaml
pack:
  name: cto-multi-provider-providers-1
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: openai
        provider:
          base_url: https://api.openai.com/v1
          secret_key_ref:
            store: OPENAI_API_KEY
      - id: anthropic
        provider:
          base_url: https://api.anthropic.com/v1
          secret_key_ref:
            store: ANTHROPIC_API_KEY
      - id: azure-openai
        provider:
          base_url: https://your-instance.openai.azure.com
          secret_key_ref:
            store: AZURE_OPENAI_API_KEY
      - id: bedrock
        provider:
          base_url: https://bedrock-runtime.us-east-1.amazonaws.com
          secret_key_ref:
            store: AWS_BEDROCK_KEY
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true
```
Application code never changes:
```python
import openai

# Same code, any provider — routing is gateway configuration
client = openai.OpenAI(
    api_key="kt_gk_...",
    base_url="https://gateway.company.com/v1",
)

# Request routes to whichever provider the policy selects
response = client.chat.completions.create(
    model="gpt-4o",  # Gateway resolves this to the configured provider
    messages=[{"role": "user", "content": "Analyze this data"}],
)
```
Failover Chains
Configure automatic failover so that if a primary provider is unavailable, requests route to a secondary provider without application-level error handling.
```yaml
model_groups:
  - name: production-llm
    routing: failover
    models:
      - provider: openai
        model: gpt-4o
        priority: 1
        timeout_ms: 5000
      - provider: azure-openai
        model: gpt-4o
        priority: 2
        timeout_ms: 8000
      - provider: anthropic
        model: claude-sonnet-4-20250514
        priority: 3
        timeout_ms: 10000
```
Failover Behavior
| Scenario | Primary | Fallback | Application Impact |
|---|---|---|---|
| OpenAI timeout | openai/gpt-4o | azure-openai/gpt-4o | None — transparent retry |
| OpenAI + Azure down | Both fail | anthropic/claude-sonnet | Slight latency increase |
| All providers down | All fail | 503 with retry-after | Application handles 503 |
Console checkpoint: The Overview dashboard shows provider health status. Failed requests with successful failover appear as events with `failover: true` metadata.
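When all providers fail, the gateway returns a 503 with a retry-after header (last row of the table above), and the application owns that final case. A minimal handling sketch, assuming the standard openai Python SDK; the gateway key and URL are placeholders:

```python
import time

import openai

client = openai.OpenAI(
    api_key="kt_gk_...",
    base_url="https://gateway.company.com/v1",
)

def chat_with_retry(messages, max_attempts=3):
    # Failover between providers happens inside the gateway; the application
    # only handles the case where every configured provider is down.
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model="gpt-4o", messages=messages)
        except openai.APIStatusError as err:
            if err.status_code != 503 or attempt == max_attempts - 1:
                raise
            # Honor the gateway's Retry-After header; fall back to exponential backoff
            time.sleep(float(err.response.headers.get("retry-after", 2 ** attempt)))
```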
Model Group A/B Testing
Test new providers or models in production without risk by routing a small percentage of traffic.
```yaml
model_groups:
  - name: summarization-ab-test
    routing: weighted
    models:
      - provider: openai
        model: gpt-4o
        weight: 80
      - provider: anthropic
        model: claude-sonnet-4-20250514
        weight: 15
      - provider: openai
        model: gpt-4o-mini
        weight: 5
```
Measuring A/B Test Results
Use the events API to compare provider performance during the test period:
```bash
# Compare latency across providers for the test period
curl "https://api.keeptrusts.com/v1/events?model_group=summarization-ab-test&since=7d&group_by=provider&metrics=latency_p50,latency_p99,cost" \
  -H "Authorization: Bearer $API_TOKEN"
```
| Provider | p50 Latency | p99 Latency | Cost/1M tokens | Quality Score |
|---|---|---|---|---|
| openai/gpt-4o | 1.2s | 3.8s | $12.50 | 92% |
| anthropic/claude-sonnet | 1.4s | 4.2s | $18.00 | 94% |
| openai/gpt-4o-mini | 0.6s | 1.8s | $0.75 | 85% |
Decision framework: If gpt-4o-mini meets your quality threshold (e.g., > 88%), promote it to a higher weight for a 90%+ cost reduction on that workload.
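That decision can be scripted against the events API. A sketch, assuming the response exposes one aggregated row per provider under a `data` key and that `quality_score` is reported alongside the latency and cost metrics (field names are illustrative, not a confirmed API schema):

```python
import os

import requests

resp = requests.get(
    "https://api.keeptrusts.com/v1/events",
    params={
        "model_group": "summarization-ab-test",
        "since": "7d",
        "group_by": "provider",
        "metrics": "latency_p50,latency_p99,cost,quality_score",
    },
    headers={"Authorization": f"Bearer {os.environ['API_TOKEN']}"},
    timeout=30,
)
rows = resp.json()["data"]  # assumed shape: one aggregated row per provider

QUALITY_THRESHOLD = 88  # percent, matching the quality scores in the table
passing = [r for r in rows if r["quality_score"] >= QUALITY_THRESHOLD]
cheapest = min(passing, key=lambda r: r["cost"])
print(f"Candidate for a higher weight: {cheapest['provider']}")
```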
Latency-Based Routing
For latency-sensitive applications, configure the gateway to route based on recent provider response times.
```yaml
model_groups:
  - name: real-time-assistant
    routing: latency-priority
    models:
      - provider: openai
        model: gpt-4o
        region: us-east
      - provider: azure-openai
        model: gpt-4o
        region: westeurope
      - provider: anthropic
        model: claude-sonnet-4-20250514
        region: us-east
    latency_window: 60s    # Use last 60 seconds of latency data
    max_latency_ms: 3000   # Skip providers above this threshold
```
The gateway maintains a rolling latency window and routes to the provider with the lowest recent response time. Providers exceeding `max_latency_ms` are temporarily removed from the rotation.
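To make that concrete, here is an illustrative sketch of the selection logic described above; it mirrors the `latency_window` and `max_latency_ms` settings but is not the gateway's actual implementation:

```python
import time
from collections import defaultdict, deque

LATENCY_WINDOW_S = 60   # mirrors latency_window: 60s
MAX_LATENCY_MS = 3000   # mirrors max_latency_ms: 3000

samples = defaultdict(deque)  # provider id -> (timestamp, latency_ms) pairs

def record_latency(provider, latency_ms):
    now = time.monotonic()
    window = samples[provider]
    window.append((now, latency_ms))
    while window and now - window[0][0] > LATENCY_WINDOW_S:
        window.popleft()  # discard samples older than the rolling window

def select_provider(providers):
    def recent_avg(provider):
        window = samples[provider]
        return sum(ms for _, ms in window) / len(window) if window else 0.0
    # Providers above the latency ceiling drop out of rotation temporarily;
    # if none qualify, fall back to the full list rather than failing.
    in_rotation = [p for p in providers if recent_avg(p) <= MAX_LATENCY_MS] or providers
    return min(in_rotation, key=recent_avg)
```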
Provider Health Monitoring in Console
The console provides real-time provider health visibility:
Health Dashboard Panels
| Panel | Metric | Alert Threshold |
|---|---|---|
| Availability | Success rate per provider | < 99.5% |
| Latency | p50 and p99 per provider | p99 > 5s |
| Error Rate | 4xx and 5xx per provider | > 1% |
| Rate Limiting | 429 responses per provider | > 5/minute |
| Cost Efficiency | Cost per successful request | > 2x baseline |
Screenshot reference: Console provider health dashboard showing availability, latency, and error rate for four configured providers.
Console checkpoint: Navigate to Settings → Gateways and select a gateway to see per-provider health metrics. Configure webhook notifications for health threshold breaches.
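A minimal receiver for those webhook notifications might look like the sketch below; the payload fields (`provider`, `metric`, `value`) are assumptions to illustrate the shape, not a documented schema:

```python
from fastapi import FastAPI, Request

app = FastAPI()

def notify_oncall(message: str) -> None:
    # Stub: wire this to your paging or chat tooling
    print(message)

@app.post("/webhooks/keeptrusts-health")
async def health_breach(request: Request):
    event = await request.json()  # assumed fields: provider, metric, value
    notify_oncall(
        f"Provider {event.get('provider')}: {event.get('metric')} breach "
        f"(value={event.get('value')})"
    )
    return {"ok": True}
```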
Cost Comparison Across Providers
Use the console Cost Center to compare actual per-provider costs based on real traffic patterns.
Monthly Provider Cost Comparison
| Provider | Requests | Input Tokens | Output Tokens | Total Cost | Cost/Request |
|---|---|---|---|---|---|
| openai | 450,000 | 900M | 180M | $4,050 | $0.009 |
| anthropic | 120,000 | 240M | 60M | $1,620 | $0.014 |
| azure-openai | 180,000 | 360M | 72M | $1,620 | $0.009 |
| bedrock | 50,000 | 100M | 20M | $350 | $0.007 |
```bash
# CLI: generate a provider cost comparison report
kt events list \
  --since 30d \
  --group-by provider \
  --fields provider,request_count,total_input_tokens,total_output_tokens,total_cost \
  --format table
```
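As a sanity check when reviewing such reports, the Cost/Request column is simply total cost divided by request count; the figures from the table above reproduce like this:

```python
monthly = {  # figures from the comparison table above
    "openai":       {"requests": 450_000, "total_cost": 4050},
    "anthropic":    {"requests": 120_000, "total_cost": 1620},
    "azure-openai": {"requests": 180_000, "total_cost": 1620},
    "bedrock":      {"requests":  50_000, "total_cost": 350},
}

for provider, row in monthly.items():
    # e.g., anthropic: 1620 / 120,000 = $0.0135, shown as $0.014 in the table
    print(f"{provider}: ${row['total_cost'] / row['requests']:.4f}/request")
```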
Migration Playbook: Single → Multi-Provider
| Phase | Action | Duration |
|---|---|---|
| 1. Assess | Audit current provider usage via gateway events | 1 week |
| 2. Configure | Add secondary provider(s) to gateway config | 1 day |
| 3. Test | Route 5% traffic to new provider, monitor quality | 2 weeks |
| 4. Validate | Compare latency, cost, and quality metrics | 1 week |
| 5. Promote | Increase traffic split based on test results | Ongoing |
| 6. Failover | Enable automatic failover chains | 1 day |
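For phase 1, the events API shown earlier can drive the audit. A sketch under the same assumed response shape (one row per provider, field names illustrative), reporting each provider's traffic share to quantify concentration risk:

```python
import os

import requests

# Phase 1: establish a baseline of current provider usage and spend
resp = requests.get(
    "https://api.keeptrusts.com/v1/events",
    params={"since": "30d", "group_by": "provider",
            "metrics": "request_count,total_cost"},
    headers={"Authorization": f"Bearer {os.environ['API_TOKEN']}"},
    timeout=30,
)
rows = resp.json()["data"]  # assumed response shape
total = sum(r["request_count"] for r in rows)
for row in rows:
    share = 100 * row["request_count"] / total
    print(f"{row['provider']}: {share:.1f}% of traffic, ${row['total_cost']}")
```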
ROI Summary
| Capability | Risk Mitigated | Financial Impact |
|---|---|---|
| Failover chains | Provider outage downtime | Avoid $X/hour of lost productivity |
| A/B testing | Overpaying for equivalent quality | 15–40% cost reduction |
| Latency routing | User experience degradation | Improved response times |
| Vendor diversification | Pricing leverage | Negotiating power on renewals |
| Health monitoring | Silent degradation | Early detection, faster resolution |
Next steps
- Add a second provider to your gateway configuration
- Route 5–10% of traffic to the new provider for evaluation
- Monitor provider health in the console for 14 days
- Compare cost and latency metrics across providers
- Enable failover chains for production workloads
See also: CTO Guide: AI Platform Engineering · CIO Guide: Cutting AI Infrastructure Costs by 40%
For AI systems
- Canonical terms: multi-provider routing, failover chains, `model_groups`, `routing: failover`, `routing: weighted`, A/B testing, latency-based routing, provider health, `providers[].priority`, `timeout_ms`, model mapping, console Overview dashboard
- Key configuration: `model_groups[].routing` (failover | weighted | latency), `models[].provider`, `models[].priority`, `models[].timeout_ms`
- Best next pages: CIO: Cost Optimization, CTO: Platform Engineering, Resilience Engineering
For engineers
- Failover: set `priority: 1` (primary) and `priority: 2` (fallback) with `timeout_ms` per provider; the gateway auto-retries on 5xx/timeout
- Model mapping: `gpt-4o` → `openai/gpt-4o` (priority 1), `azure-openai/gpt-4o` (priority 2), `anthropic/claude-sonnet-4-20250514` (priority 3)
- A/B testing: configure `routing: weighted` with a percentage split (e.g., 80/20) to compare quality metrics across providers
- Application code never changes — same `openai.OpenAI(api_key="kt_gk_...", base_url=...)` call; routing is gateway configuration
- Console checkpoint: the Overview dashboard shows provider health status; failed requests with successful failover appear as events with `failover: true`
For leaders
- Multi-provider eliminates single-vendor lock-in risk — provider outages, price increases, and model deprecations become configuration changes, not engineering projects
- A/B testing enables data-driven provider selection based on actual quality, latency, and cost metrics — not vendor marketing
- Failover transparency means applications never see provider outages unless all configured providers are simultaneously down
- Cost comparison across providers enables negotiation leverage and budget optimization without code changes