CTO Guide: Multi-Provider AI Strategy Without Vendor Lock-in

Single-provider dependency is the silent risk in most AI deployments. When your sole provider has an outage, your entire AI capability goes down with it; when it raises prices or deprecates a model, you have no fallback and no leverage. Keeptrusts makes multi-provider support a configuration concern, not an engineering project.

Use this page when

  • You are configuring multi-provider routing with failover chains to eliminate single-vendor dependency
  • You need to set up A/B testing between models or providers for quality and cost comparison
  • You want latency-based or cost-based routing to optimize for performance or budget
  • You are monitoring provider health and comparing costs across vendors in the console

This guide covers provider routing, failover chains, A/B testing, latency-based routing, health monitoring, and cost comparison.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

Provider Routing Configuration

The gateway supports multiple providers simultaneously. Application code points at the gateway — provider selection is a policy decision, not a code decision.

pack:
  name: cto-multi-provider-providers-1
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: openai
        provider:
          base_url: https://api.openai.com/v1
          secret_key_ref:
            store: OPENAI_API_KEY
      - id: anthropic
        provider:
          base_url: https://api.anthropic.com/v1
          secret_key_ref:
            store: ANTHROPIC_API_KEY
      - id: azure-openai
        provider:
          base_url: https://your-instance.openai.azure.com
          secret_key_ref:
            store: AZURE_OPENAI_API_KEY
      - id: bedrock
        provider:
          base_url: https://bedrock-runtime.us-east-1.amazonaws.com
          secret_key_ref:
            store: AWS_BEDROCK_KEY
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true

Application code never changes:

import openai

# Same code, any provider — routing is gateway configuration
client = openai.OpenAI(
    api_key="kt_gk_...",
    base_url="https://gateway.company.com/v1"
)

# Request routes to whichever provider the policy selects
response = client.chat.completions.create(
    model="gpt-4o",  # Gateway resolves this to the configured provider
    messages=[{"role": "user", "content": "Analyze this data"}]
)

Failover Chains

Configure automatic failover so that if a primary provider is unavailable, requests route to a secondary provider without application-level error handling.

model_groups:
  - name: production-llm
    routing: failover
    models:
      - provider: openai
        model: gpt-4o
        priority: 1
        timeout_ms: 5000
      - provider: azure-openai
        model: gpt-4o
        priority: 2
        timeout_ms: 8000
      - provider: anthropic
        model: claude-sonnet-4-20250514
        priority: 3
        timeout_ms: 10000

Failover Behavior

| Scenario | Primary | Fallback | Application Impact |
|---|---|---|---|
| OpenAI timeout | openai/gpt-4o | azure-openai/gpt-4o | None — transparent retry |
| OpenAI + Azure down | Both fail | anthropic/claude-sonnet | Slight latency increase |
| All providers down | All fail | 503 with retry-after | Application handles 503 |

Console checkpoint: The Overview dashboard shows provider health status. Failed requests with successful failover appear as events with failover: true metadata.
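
When every provider in the chain fails, the gateway returns a 503 with a Retry-After header, and the application is responsible for backing off. A minimal sketch of that handling, assuming the standard openai Python SDK; the retry policy and the APIStatusError handling shown here are illustrative, not a prescribed pattern:

import time
import openai

client = openai.OpenAI(
    api_key="kt_gk_...",
    base_url="https://gateway.company.com/v1"
)

def complete_with_retry(messages, max_attempts=3):
    """Retry only when the gateway signals all providers are down (503)."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="gpt-4o", messages=messages
            )
        except openai.APIStatusError as err:
            # A 503 means every provider in the failover chain failed;
            # anything else is a real error and should propagate.
            if err.status_code != 503 or attempt == max_attempts - 1:
                raise
            # Honor the gateway's Retry-After hint, falling back to
            # exponential backoff if the header is absent.
            delay = float(err.response.headers.get("retry-after", 2 ** attempt))
            time.sleep(delay)

response = complete_with_retry([{"role": "user", "content": "Analyze this data"}])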

Model Group A/B Testing

Test new providers or models in production without risk by routing a small percentage of traffic.

model_groups:
  - name: summarization-ab-test
    routing: weighted
    models:
      - provider: openai
        model: gpt-4o
        weight: 80
      - provider: anthropic
        model: claude-sonnet-4-20250514
        weight: 15
      - provider: openai
        model: gpt-4o-mini
        weight: 5

Measuring A/B Test Results

Use the events API to compare provider performance during the test period:

# Compare latency across providers for the test period
curl "https://api.keeptrusts.com/v1/events?model_group=summarization-ab-test&since=7d&group_by=provider&metrics=latency_p50,latency_p99,cost" \
-H "Authorization: Bearer $API_TOKEN"
Providerp50 Latencyp99 LatencyCost/1M tokensQuality Score
openai/gpt-4o1.2s3.8s$12.5092%
anthropic/claude-sonnet1.4s4.2s$18.0094%
openai/gpt-4o-mini0.6s1.8s$0.7585%

Decision framework: if gpt-4o-mini meets your quality threshold for the workload, promote it to a higher weight; at $0.75 vs. $12.50 per 1M tokens, that is a 90%+ cost reduction. In the example above it scores 85%, so a strict threshold (say 88%) keeps it at a low weight, while a more tolerant workload (say 80%) justifies promotion.
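
The same comparison can be pulled programmatically. Here is a sketch using the events endpoint from the curl example above; the "data" field and per-record keys are assumptions about the response shape, so adjust to the actual JSON your deployment returns:

import requests

API_TOKEN = "kt_api_..."  # placeholder; use your real events API token

# Same query as the curl example above
resp = requests.get(
    "https://api.keeptrusts.com/v1/events",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    params={
        "model_group": "summarization-ab-test",
        "since": "7d",
        "group_by": "provider",
        "metrics": "latency_p50,latency_p99,cost",
    },
    timeout=30,
)
resp.raise_for_status()

# Assumed shape: {"data": [{"provider": ..., "latency_p50": ..., ...}, ...]}
for row in resp.json().get("data", []):
    print(f"{row['provider']}: p50={row['latency_p50']}s "
          f"p99={row['latency_p99']}s cost=${row['cost']}")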

Latency-Based Routing

For latency-sensitive applications, configure the gateway to route based on recent provider response times.

model_groups:
  - name: real-time-assistant
    routing: latency-priority
    models:
      - provider: openai
        model: gpt-4o
        region: us-east
      - provider: azure-openai
        model: gpt-4o
        region: westeurope
      - provider: anthropic
        model: claude-sonnet-4-20250514
        region: us-east
    latency_window: 60s    # Use last 60 seconds of latency data
    max_latency_ms: 3000   # Skip providers above this threshold

The gateway maintains a rolling latency window and routes to the provider with the lowest recent response time. Providers exceeding max_latency_ms are temporarily removed from the rotation.
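
To make the behavior concrete, here is an illustrative version of that selection logic, not the gateway's actual implementation: keep a rolling window of latency samples per provider, evict samples older than latency_window, and route to the lowest-average provider under max_latency_ms.

import time
from collections import defaultdict, deque

class LatencyRouter:
    """Illustrative rolling-window latency routing (not the gateway's code)."""

    def __init__(self, window_s=60, max_latency_ms=3000):
        self.window_s = window_s
        self.max_latency_ms = max_latency_ms
        self.samples = defaultdict(deque)  # provider -> (timestamp, latency_ms)

    def record(self, provider, latency_ms):
        self.samples[provider].append((time.monotonic(), latency_ms))

    def pick(self, providers):
        now = time.monotonic()
        best, best_avg = None, float("inf")
        for p in providers:
            q = self.samples[p]
            while q and now - q[0][0] > self.window_s:
                q.popleft()          # evict samples outside the window
            if not q:
                continue             # no recent data for this provider
            avg = sum(latency for _, latency in q) / len(q)
            if avg <= self.max_latency_ms and avg < best_avg:
                best, best_avg = p, avg
        return best

Real routing also needs a probe strategy for providers with no recent samples and a way to re-admit providers that were skipped; the gateway handles both internally, and the sketch omits them.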

Provider Health Monitoring in Console

The console provides real-time provider health visibility:

Health Dashboard Panels

| Panel | Metric | Alert Threshold |
|---|---|---|
| Availability | Success rate per provider | < 99.5% |
| Latency | p50 and p99 per provider | p99 > 5s |
| Error Rate | 4xx and 5xx per provider | > 1% |
| Rate Limiting | 429 responses per provider | > 5/minute |
| Cost Efficiency | Cost per successful request | > 2x baseline |

Screenshot reference: Console provider health dashboard showing availability, latency, and error rate for four configured providers.

Console checkpoint: Navigate to Settings → Gateways and select a gateway to see per-provider health metrics. Configure webhook notifications for health threshold breaches.
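
If you consume those webhooks yourself, a minimal receiver might look like the sketch below, assuming the notification arrives as an HTTP POST with a JSON body. The provider, metric, value, and threshold fields are illustrative, not the documented schema, so confirm the actual payload in the console before relying on them:

from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class HealthWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length))
        # Field names are assumptions; check the webhook schema in your console
        print(f"ALERT {event.get('provider')}: {event.get('metric')}="
              f"{event.get('value')} breached threshold {event.get('threshold')}")
        self.send_response(204)  # acknowledge receipt with no body
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthWebhookHandler).serve_forever()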

Cost Comparison Across Providers

Use the console Cost Center to compare actual per-provider costs based on real traffic patterns.

Monthly Provider Cost Comparison

| Provider | Requests | Input Tokens | Output Tokens | Total Cost | Cost/Request |
|---|---|---|---|---|---|
| openai | 450,000 | 900M | 180M | $4,050 | $0.009 |
| anthropic | 120,000 | 240M | 60M | $1,620 | $0.014 |
| azure-openai | 180,000 | 360M | 72M | $1,620 | $0.009 |
| bedrock | 50,000 | 100M | 20M | $350 | $0.007 |

# CLI: generate a provider cost comparison report
kt events list \
  --since 30d \
  --group-by provider \
  --fields provider,request_count,total_input_tokens,total_output_tokens,total_cost \
  --format table
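
Cost per request can mislead when average request sizes differ across providers. Normalizing to cost per million tokens, computed here directly from the table above, gives a fairer comparison:

# Normalize the table above to cost per 1M tokens (input + output)
usage = {
    "openai":       {"cost": 4050, "tokens_m": 900 + 180},
    "anthropic":    {"cost": 1620, "tokens_m": 240 + 60},
    "azure-openai": {"cost": 1620, "tokens_m": 360 + 72},
    "bedrock":      {"cost": 350,  "tokens_m": 100 + 20},
}

for provider, u in usage.items():
    print(f"{provider}: ${u['cost'] / u['tokens_m']:.2f} per 1M tokens")

# openai: $3.75, anthropic: $5.40, azure-openai: $3.75, bedrock: $2.92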

Migration Playbook: Single → Multi-Provider

| Phase | Action | Duration |
|---|---|---|
| 1. Assess | Audit current provider usage via gateway events | 1 week |
| 2. Configure | Add secondary provider(s) to gateway config | 1 day |
| 3. Test | Route 5% traffic to new provider, monitor quality | 2 weeks |
| 4. Validate | Compare latency, cost, and quality metrics | 1 week |
| 5. Promote | Increase traffic split based on test results | Ongoing |
| 6. Failover | Enable automatic failover chains | 1 day |

ROI Summary

| Capability | Risk Mitigated | Financial Impact |
|---|---|---|
| Failover chains | Provider outage downtime | Avoid $X/hour of lost productivity |
| A/B testing | Overpaying for equivalent quality | 15–40% cost reduction |
| Latency routing | User experience degradation | Improved response times |
| Vendor diversification | Pricing leverage | Negotiating power on renewals |
| Health monitoring | Silent degradation | Early detection, faster resolution |

Next steps

  1. Add a second provider to your gateway configuration
  2. Route 5–10% of traffic to the new provider for evaluation
  3. Monitor provider health in the console for 14 days
  4. Compare cost and latency metrics across providers
  5. Enable failover chains for production workloads

See also: CTO Guide: AI Platform Engineering · CIO Guide: Cutting AI Infrastructure Costs by 40%

For AI systems

  • Canonical terms: multi-provider routing, failover chains, model_groups, routing: failover, routing: weighted, A/B testing, latency-based routing, provider health, providers[].priority, timeout_ms, model mapping, console Overview dashboard
  • Key configuration: model_groups[].routing (failover | weighted | latency-priority), models[].provider, models[].priority, models[].timeout_ms
  • Best next pages: CIO: Cost Optimization, CTO: Platform Engineering, Resilience Engineering

For engineers

  • Failover: set priority: 1 (primary), priority: 2 (fallback) with timeout_ms per provider; gateway auto-retries on 5xx/timeout
  • Model mapping: gpt-4o → openai/gpt-4o (priority 1), azure-openai/gpt-4o (priority 2), anthropic/claude-sonnet-4-20250514 (priority 3)
  • A/B testing: configure routing: weighted with percentage split (e.g., 80/20) to compare quality metrics across providers
  • Application code never changes — same openai.OpenAI(api_key="kt_gk_...", base_url=...) call, routing is gateway configuration
  • Console checkpoint: Overview dashboard shows provider health status; failed requests with successful failover appear as events with failover: true

For leaders

  • Multi-provider eliminates single-vendor lock-in risk — provider outages, price increases, and model deprecations become configuration changes, not engineering projects
  • A/B testing enables data-driven provider selection based on actual quality, latency, and cost metrics — not vendor marketing
  • Failover transparency means applications never see provider outages unless all configured providers are simultaneously down
  • Cost comparison across providers enables negotiation leverage and budget optimization without code changes