
Advanced Provider Features

This page covers the provider-level features that sit above individual targets: traffic mirroring, A/B testing, model groups, shared scope limits, logging controls, context compression, and zero-completion handling.

Use this page when

  • You need the exact command, config, API, or integration details for Advanced Provider Features.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • You want a guided rollout instead of a reference page; in that case, use the linked workflow pages under Next steps.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Traffic mirroring

Mirror a percentage of production traffic to a secondary target for comparison or audit without changing the caller-visible response.

pack:
  name: config-advanced-provider-providers-1
  version: 1.0.0
  enabled: true
providers:
  traffic_mirror:
    enabled: true
    mirror_target: anthropic-shadow
    sample_rate: 0.05
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-shadow
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
  • enabled (boolean, default false): Enable or disable mirroring
  • mirror_target (string): Target ID that receives mirrored requests
  • sample_rate (number, default 1.0): Fraction of traffic to mirror
  • log_mirror_response (boolean, default false): Record the mirrored response for analysis
  • timeout_ms (integer, default 5000): Independent timeout for mirror calls
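
The sampling decision can be sketched in a few lines. This is a minimal illustration, not the gateway's actual implementation; the helper name and signature are hypothetical.

```python
import random

def should_mirror(enabled: bool, sample_rate: float, rng=random.random) -> bool:
    # The caller-visible response always comes from the primary target;
    # this decision only controls whether an extra copy of the request is
    # dispatched to the mirror target, on its own timeout (timeout_ms).
    return enabled and rng() < sample_rate
```

With sample_rate: 0.05, roughly one request in twenty is mirrored. The mirror call should be fire-and-forget so its latency and failures never reach the caller.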

A/B testing

Split traffic between provider variants with sticky assignment.

providers:
  ab_test:
    enabled: true
    sticky_by: user_id
    variants:
      - provider_id: openai-prod
        weight: 80
      - provider_id: anthropic-shadow
        weight: 20
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-shadow
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
  • enabled (boolean, default false): Enable or disable the experiment
  • sticky_by (string, default random): random, user_id, or key_id
  • variants[].provider_id (string): Target ID used for that variant
  • variants[].weight (integer): Relative traffic weight
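
Sticky assignment is typically done by hashing the sticky key into a weight bucket, so the same user always lands on the same variant while weights are unchanged. The sketch below is an assumed implementation of that idea, not the gateway's actual code:

```python
import hashlib

def pick_variant(variants, sticky_key):
    # Hash the sticky key (e.g. a user_id) into a bucket in
    # [0, total_weight), then walk the variants in order.
    # sha256 is used instead of Python's built-in hash(), which
    # is salted per process and therefore not stable.
    total = sum(v["weight"] for v in variants)
    digest = hashlib.sha256(sticky_key.encode()).hexdigest()
    bucket = int(digest, 16) % total
    for v in variants:
        if bucket < v["weight"]:
            return v["provider_id"]
        bucket -= v["weight"]
    return variants[-1]["provider_id"]

variants = [
    {"provider_id": "openai-prod", "weight": 80},
    {"provider_id": "anthropic-shadow", "weight": 20},
]
```

With sticky_by: random, this per-key hashing is skipped and each request is assigned independently.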

Model groups

Model groups create virtual model families over concrete provider targets.

providers:
  model_groups:
    - name: fast
      aliases:
        - default-fast
      description: Low-latency conversational targets
      targets:
        - openai-mini
        - anthropic-haiku
      fallback_group: standard
    - name: standard
      targets:
        - openai-prod
        - anthropic-shadow
  • name (string): Group name callers reference
  • aliases (string[]): Optional alternate names
  • description (string): Human-readable description
  • targets (string[]): Concrete target IDs in the group
  • fallback_group (string): Group to try if all local targets fail
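
Conceptually, resolution walks the group's targets in order, then chases fallback_group if none are available. A rough sketch of that lookup, with a hypothetical healthy-target set standing in for the gateway's real health tracking:

```python
def resolve_group(groups, name, healthy):
    # Build a lookup covering both group names and aliases.
    by_name = {}
    for g in groups:
        by_name[g["name"]] = g
        for alias in g.get("aliases", []):
            by_name[alias] = g
    group = by_name[name]
    # First healthy target in declaration order wins.
    for target in group["targets"]:
        if target in healthy:
            return target
    # All local targets are down: try the fallback group, if any.
    fallback = group.get("fallback_group")
    if fallback:
        return resolve_group(groups, fallback, healthy)
    raise LookupError(f"no healthy target for group {name!r}")
```

A caller asking for fast (or its alias default-fast) never needs to know which concrete target served the request.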

Shared provider scope limits

These limits live under providers.scope_rate_limits, not inside each target.

providers:
  scope_rate_limits:
    per_key:
      rpm: 100
      tpm: 50000
    per_user:
      rpm: 20
      tpm: 10000
    per_team:
      rpm: 300
      tpm: 150000
    global:
      rpm: 1000
      tpm: 500000
    max_parallel_requests: 50
  • per_key.rpm / per_key.tpm (integer): Per-API-key limits
  • per_user.rpm / per_user.tpm (integer): Per-user limits
  • per_team.rpm / per_team.tpm (integer): Per-team limits
  • global.rpm / global.tpm (integer): Provider-wide limits
  • max_parallel_requests (integer): Provider-wide concurrency cap
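
A request must clear every applicable scope before it is admitted. The following minimal sliding-window sketch shows that shape for the rpm limits only (tpm and concurrency are handled analogously); it is an illustration, not the gateway's actual limiter:

```python
import time
from collections import defaultdict, deque

class ScopeRateLimiter:
    """Per-scope RPM check: one sliding 60-second window per (scope, id) pair."""

    def __init__(self, limits):
        self.limits = limits            # e.g. {"per_key": 100, "global": 1000}
        self.windows = defaultdict(deque)

    def allow(self, scopes, now=None):
        # scopes maps scope name -> identifier, e.g. {"per_key": "sk-1", "global": "*"}.
        now = time.monotonic() if now is None else now
        keys = [(scope, ident) for scope, ident in scopes.items()
                if scope in self.limits]
        # Check every scope first: a request is rejected if ANY limit is hit.
        for key in keys:
            window = self.windows[key]
            while window and now - window[0] >= 60:
                window.popleft()        # drop timestamps outside the window
            if len(window) >= self.limits[key[0]]:
                return False
        # All scopes passed: record the request against each of them.
        for key in keys:
            self.windows[key].append(now)
        return True
```

Because the check is all-or-nothing across scopes, a generous global limit cannot be used to sidestep a tight per_user limit.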

Logging controls

providers:
  logging:
    redact_message_bodies: true
    redact_api_keys: true
  • redact_message_bodies (boolean, default false): Remove request and response bodies from provider logs and callback payloads
  • redact_api_keys (boolean, default false): Remove credential values from logs
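
The effect of these two flags is simple masking before a record is written or forwarded. The field names in this sketch (request_body, response_body, api_key) are illustrative assumptions, not the product's log schema:

```python
def redact(record, redact_message_bodies=True, redact_api_keys=True):
    # Return a masked copy; the original record is left untouched.
    out = dict(record)
    if redact_message_bodies:
        for field in ("request_body", "response_body"):
            if field in out:
                out[field] = "[REDACTED]"
    if redact_api_keys and "api_key" in out:
        out["api_key"] = "[REDACTED]"
    return out
```

Metadata such as status codes, latencies, and token counts passes through unchanged, so redaction does not break dashboards built on those fields.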

Context compression

Reduce large conversations before provider dispatch.

providers:
  context_compression:
    enabled: true
    strategy: middle_out
    preserve_system_message: true
    preserve_first_n: 1
    preserve_last_n: 4
    max_messages: 20
    message_compression_strategy: halves
  • enabled (boolean, default false): Enable compression
  • strategy (string): middle_out or oldest_first
  • preserve_system_message (boolean): Always keep the system message
  • preserve_first_n (integer): Keep the first N messages
  • preserve_last_n (integer): Keep the last N messages
  • max_messages (integer): Target conversation size after trimming
  • message_compression_strategy (string): Current supported value is halves
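
The middle_out strategy keeps the edges of the conversation and drops from the center. A minimal sketch of that trimming, assuming preserve_last_n >= 1 and a simple message-dict shape; the real implementation may also compress (rather than drop) middle messages per message_compression_strategy:

```python
def middle_out(messages, preserve_system_message=True, preserve_first_n=1,
               preserve_last_n=4, max_messages=20):
    # Nothing to do if the conversation already fits.
    if len(messages) <= max_messages:
        return list(messages)
    head = []
    body = list(messages)
    # The system message is pinned separately from the first-N window.
    if preserve_system_message and body and body[0]["role"] == "system":
        head.append(body.pop(0))
    keep_head = body[:preserve_first_n]
    keep_tail = body[-preserve_last_n:]
    # Spend any remaining budget on the most recent middle messages.
    budget = max_messages - len(head) - len(keep_head) - len(keep_tail)
    middle = body[preserve_first_n:-preserve_last_n]
    filler = middle[-budget:] if budget > 0 else []
    return head + keep_head + filler + keep_tail
```

With the defaults above, a 30-message conversation comes back as exactly 20 messages: the system message, the opening user message, the 14 most recent middle messages, and the last 4 messages.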

Zero completion insurance

Handle empty or zero-token completions from upstream providers.

providers:
  zero_completion_insurance:
    enabled: true
    conditions:
      - empty_content
    action: retry
    retry_with_fallback: true
  • enabled (boolean, default true): Enable insurance handling
  • conditions (string[]): Conditions that count as zero-completion events
  • action (string): suppress_billing, retry, or log_only
  • retry_with_fallback (boolean): Use configured fallback logic on retry
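
The control flow for action: retry can be sketched as follows. The condition check and response shape here are assumptions for illustration; only the empty_content condition from the example above is modeled:

```python
def is_zero_completion(response, conditions=("empty_content",)):
    # A completion with no usable content counts as a zero-completion event.
    if "empty_content" in conditions and not (response.get("content") or "").strip():
        return True
    return False

def complete_with_insurance(call_primary, call_fallback,
                            action="retry", retry_with_fallback=True):
    response = call_primary()
    if not is_zero_completion(response):
        return response
    if action == "retry":
        # retry_with_fallback routes the retry through configured fallback
        # logic instead of hitting the same target again.
        retry = call_fallback if retry_with_fallback else call_primary
        return retry()
    # suppress_billing and log_only leave the response alone; those side
    # effects (billing, logging) would be handled elsewhere.
    return response
```

With action: suppress_billing, the empty response is still returned to the caller, but the request is excluded from billing rather than retried.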

Complete example

pack:
  name: advanced-provider-controls
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: openai-mini
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-shadow
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
  routing:
    strategy: weighted_round_robin
  traffic_mirror:
    enabled: true
    mirror_target: anthropic-shadow
    sample_rate: 0.05
  ab_test:
    enabled: true
    sticky_by: user_id
    variants:
      - provider_id: openai-prod
        weight: 70
      - provider_id: openai-mini
        weight: 30
  model_groups:
    - name: standard
      targets:
        - openai-prod
        - anthropic-shadow
    - name: fast
      targets:
        - openai-mini
      fallback_group: standard
  scope_rate_limits:
    per_key:
      rpm: 100
      tpm: 50000
    global:
      rpm: 1000
      tpm: 500000
    max_parallel_requests: 50
  logging:
    redact_api_keys: true
  context_compression:
    enabled: true
    strategy: middle_out
    preserve_system_message: true
    preserve_last_n: 4
    max_messages: 20
    message_compression_strategy: halves
  zero_completion_insurance:
    enabled: true
    conditions:
      - empty_content
    action: retry
    retry_with_fallback: true
policies:
  chain:
    - prompt-injection
    - audit-logger

For AI systems

  • Canonical terms: Keeptrusts, policy-config.yaml, providers.traffic_mirror, providers.ab_test, providers.model_groups, providers.scope_rate_limits, providers.logging, providers.context_compression, providers.zero_completion_insurance.
  • These features live under providers: in the YAML, above individual target definitions.
  • Best next pages: Providers Configuration, Cloud Provider Configuration, Rate Limits Configuration.

For engineers

  • Traffic mirroring sends a copy of production traffic to a shadow target (e.g., for model evaluation) — the caller-visible response is always from the primary target.
  • A/B testing with sticky_by: user_id ensures consistent variant assignment per user. Use random for request-level randomization.
  • Model groups create virtual families; callers reference the group name, and the gateway resolves to concrete targets with fallback.
  • Context compression (middle_out strategy) reduces token usage on long conversations while preserving system messages and recent context.
  • Validate all advanced provider features with kt policy lint --file policy-config.yaml.

For leaders

  • Traffic mirroring enables model comparison and evaluation without impacting production traffic or user experience.
  • A/B testing supports data-driven model selection by measuring performance differences across provider variants.
  • Scope-level rate limits (per-key, per-user, per-team, global) provide cost containment and fair-use enforcement at multiple organizational levels.
  • Context compression reduces per-request token costs for long conversations, directly lowering LLM spend.
  • Zero-completion insurance prevents wasted billing on empty provider responses.

Next steps