Advanced Provider Features
This page covers the provider-level features that sit above individual targets: traffic mirroring, A/B testing, model groups, shared scope limits, logging controls, context compression, and zero-completion handling.
Use this page when
- You need the exact command, config, API, or integration details for Advanced Provider Features.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- You want a guided rollout instead of a reference page; in that case, follow the linked workflow pages in Next steps.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Traffic mirroring
Mirror a percentage of production traffic to a secondary target for comparison or audit without changing the caller-visible response.
```yaml
pack:
  name: config-advanced-provider-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-shadow
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
  traffic_mirror:
    enabled: true
    mirror_target: anthropic-shadow
    sample_rate: 0.05
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable or disable mirroring |
| mirror_target | string | — | Target ID that receives mirrored requests |
| sample_rate | number | 1.0 | Fraction of traffic to mirror |
| log_mirror_response | boolean | false | Record the mirrored response for analysis |
| timeout_ms | integer | 5000 | Independent timeout for mirror calls |
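The sampling decision amounts to a per-request coin flip against sample_rate; the primary response is returned to the caller regardless of whether a copy is mirrored. A minimal sketch (the `should_mirror` helper and seeded counter are illustrative, not Keeptrusts internals):

```python
import random

def should_mirror(sample_rate: float, rng: random.Random) -> bool:
    """Decide, per request, whether a copy goes to the mirror target."""
    return rng.random() < sample_rate

# At sample_rate 0.05, roughly 5% of requests are mirrored.
rng = random.Random(42)  # seeded so the sketch is reproducible
mirrored = sum(should_mirror(0.05, rng) for _ in range(10_000))
rate = mirrored / 10_000
```

Because the mirror call has its own timeout_ms, a slow or failing shadow target never delays the caller-visible response.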
A/B testing
Split traffic between provider variants with sticky assignment.
```yaml
providers:
  ab_test:
    enabled: true
    sticky_by: user_id
    variants:
      - provider_id: openai-prod
        weight: 80
      - provider_id: anthropic-shadow
        weight: 20
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-shadow
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
```
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable or disable the experiment |
| sticky_by | string | random | random, user_id, or key_id |
| variants[].provider_id | string | — | Target ID used for that variant |
| variants[].weight | integer | — | Relative traffic weight |
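Sticky assignment can be pictured as hashing the sticky key into the weighted variant space, so the same user lands on the same variant on every request. A sketch of the idea (the `pick_variant` function is illustrative, not the gateway's actual algorithm):

```python
import hashlib

def pick_variant(user_id: str, variants: list[tuple[str, int]]) -> str:
    """Deterministically map a user to a weighted variant.

    Hashing user_id yields the same bucket on every request, which is
    the behavior sticky_by: user_id implies.
    """
    total = sum(w for _, w in variants)
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    for provider_id, weight in variants:
        if bucket < weight:
            return provider_id
        bucket -= weight
    return variants[-1][0]

variants = [("openai-prod", 80), ("anthropic-shadow", 20)]
```

With sticky_by: random, the bucket would instead be drawn fresh per request, randomizing at the request level.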
Model groups
Model groups create virtual model families over concrete provider targets.
```yaml
providers:
  model_groups:
    - name: fast
      aliases:
        - default-fast
      description: Low-latency conversational targets
      targets:
        - openai-mini
        - anthropic-haiku
      fallback_group: standard
    - name: standard
      targets:
        - openai-prod
        - anthropic-shadow
```
| Field | Type | Description |
|---|---|---|
| name | string | Group name callers reference |
| aliases | string[] | Optional alternate names |
| description | string | Human-readable description |
| targets | string[] | Concrete target IDs in the group |
| fallback_group | string | Group to try if all local targets fail |
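The resolution order described above (name or alias, then the group's targets in order, then fallback_group) can be sketched as follows; the `resolve_group` helper and the `healthy` set are illustrative, not gateway internals:

```python
def resolve_group(name: str, groups: list[dict], healthy: set[str]) -> str:
    """Resolve a group (or alias) to its first healthy target,
    falling through to fallback_group when every local target fails."""
    by_name = {}
    for g in groups:
        by_name[g["name"]] = g
        for alias in g.get("aliases", []):
            by_name[alias] = g
    group = by_name[name]
    for target in group["targets"]:
        if target in healthy:
            return target
    fallback = group.get("fallback_group")
    if fallback:
        return resolve_group(fallback, groups, healthy)
    raise LookupError(f"no healthy target for group {name!r}")

groups = [
    {"name": "fast", "aliases": ["default-fast"],
     "targets": ["openai-mini", "anthropic-haiku"],
     "fallback_group": "standard"},
    {"name": "standard", "targets": ["openai-prod", "anthropic-shadow"]},
]
```

Callers only ever reference the group name (or an alias); the concrete target is an internal routing decision.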
Shared provider scope limits
These limits live under providers.scope_rate_limits, not inside each target.
```yaml
providers:
  scope_rate_limits:
    per_key:
      rpm: 100
      tpm: 50000
    per_user:
      rpm: 20
      tpm: 10000
    per_team:
      rpm: 300
      tpm: 150000
    global:
      rpm: 1000
      tpm: 500000
    max_parallel_requests: 50
```
| Field | Type | Description |
|---|---|---|
| per_key.rpm / per_key.tpm | integer | Requests and tokens per minute, per API key |
| per_user.rpm / per_user.tpm | integer | Requests and tokens per minute, per user |
| per_team.rpm / per_team.tpm | integer | Requests and tokens per minute, per team |
| global.rpm / global.tpm | integer | Provider-wide requests and tokens per minute |
| max_parallel_requests | integer | Provider-wide cap on concurrent in-flight requests |
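A request passes only when every applicable scope still has headroom; exhausting any one scope (key, user, team, or global) rejects it. A minimal fixed-window sketch of that composition (the `ScopeLimiter` class is illustrative, not the gateway's distributed implementation):

```python
from collections import defaultdict

class ScopeLimiter:
    """Admit a request only if every configured scope is under its rpm limit."""

    def __init__(self, limits: dict[str, int]):
        self.limits = limits              # e.g. {"per_user": 20, "global": 1000}
        self.counts = defaultdict(int)    # (scope, id) -> requests this window

    def allow(self, key_id: str, user_id: str, team_id: str) -> bool:
        scopes = [("per_key", key_id), ("per_user", user_id),
                  ("per_team", team_id), ("global", "*")]
        applicable = [(s, i) for s, i in scopes if s in self.limits]
        # Check every scope first, then count the request against all of them.
        if any(self.counts[(s, i)] >= self.limits[s] for s, i in applicable):
            return False
        for s, i in applicable:
            self.counts[(s, i)] += 1
        return True

limiter = ScopeLimiter({"per_user": 2, "global": 1000})
```

The real limiter also tracks tpm and max_parallel_requests and resets windows over time; this sketch only shows how the scopes compose.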
Logging controls
```yaml
providers:
  logging:
    redact_message_bodies: true
    redact_api_keys: true
```
| Field | Type | Default | Description |
|---|---|---|---|
| redact_message_bodies | boolean | false | Remove request and response bodies from provider logs and callback payloads |
| redact_api_keys | boolean | false | Remove credential values from logs |
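The effect of the two switches can be pictured as a pre-persistence scrub of each log entry: dropping body fields and masking credential-looking values. A sketch under assumed field names (`request_body`, `authorization`, and the key pattern are all illustrative, not Keeptrusts log schema):

```python
import re

# Illustrative credential pattern; real redaction would cover more formats.
API_KEY_RE = re.compile(r"\bsk-[A-Za-z0-9-]{8,}\b")

def redact_log_entry(entry: dict, redact_bodies=True, redact_keys=True) -> dict:
    """Scrub a log entry before it is written or sent to callbacks."""
    entry = dict(entry)  # never mutate the caller's copy
    if redact_bodies:
        entry.pop("request_body", None)
        entry.pop("response_body", None)
    if redact_keys and "authorization" in entry:
        entry["authorization"] = API_KEY_RE.sub("[REDACTED]", entry["authorization"])
    return entry
```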
Context compression
Reduce large conversations before provider dispatch.
```yaml
providers:
  context_compression:
    enabled: true
    strategy: middle_out
    preserve_system_message: true
    preserve_first_n: 1
    preserve_last_n: 4
    max_messages: 20
    message_compression_strategy: halves
```
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable compression |
| strategy | string | — | middle_out or oldest_first |
| preserve_system_message | boolean | — | Always keep the system message |
| preserve_first_n | integer | — | Keep the first N messages |
| preserve_last_n | integer | — | Keep the last N messages |
| max_messages | integer | — | Target conversation size after trimming |
| message_compression_strategy | string | — | Current supported value is halves |
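The middle_out strategy keeps the system message plus the first and last N turns and drops from the middle until the conversation fits max_messages. A sketch of that trimming order (the `middle_out` function is illustrative; the real compressor also applies message_compression_strategy to individual messages):

```python
def middle_out(messages, preserve_first_n=1, preserve_last_n=4,
               max_messages=20, keep_system=True):
    """Trim a conversation to max_messages, dropping middle turns first."""
    system = [m for m in messages if m["role"] == "system"] if keep_system else []
    rest = [m for m in messages if m["role"] != "system"] if keep_system else list(messages)
    budget = max_messages - len(system)
    if len(rest) <= budget:
        return system + rest
    head = rest[:preserve_first_n]
    tail = rest[-preserve_last_n:]
    middle_budget = max(budget - len(head) - len(tail), 0)
    middle = rest[preserve_first_n:len(rest) - preserve_last_n]
    # Keep the edges of the middle; the center is the least recent context.
    keep_front = (middle_budget + 1) // 2
    keep_back = middle_budget - keep_front
    kept = middle[:keep_front] + (middle[len(middle) - keep_back:] if keep_back else [])
    return system + head + kept + tail
```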
Zero completion insurance
Handle empty or zero-token completions from upstream providers.
```yaml
providers:
  zero_completion_insurance:
    enabled: true
    conditions:
      - empty_content
    action: retry
    retry_with_fallback: true
```
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | true | Enable insurance handling |
| conditions | string[] | — | Conditions that count as zero-completion events |
| action | string | — | suppress_billing, retry, or log_only |
| retry_with_fallback | boolean | — | Use configured fallback logic on retry |
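The flow for action: retry with retry_with_fallback: true can be sketched as: check the primary response against the configured conditions, and on a zero-completion event retry against the fallback. The helpers and response field names below are illustrative, not Keeptrusts internals, and only the empty_content condition from this page is modeled:

```python
def is_zero_completion(response: dict, conditions: list[str]) -> bool:
    """Check a response against the configured zero-completion conditions."""
    checks = {
        "empty_content": lambda r: not r.get("content", "").strip(),
    }
    return any(checks[c](response) for c in conditions if c in checks)

def complete_with_insurance(call_primary, call_fallback, conditions):
    """On a zero-completion event, retry once via the fallback target."""
    response = call_primary()
    if is_zero_completion(response, conditions):
        return call_fallback()
    return response
```

With action: suppress_billing or log_only, the same detection step would instead skip usage accounting or emit a log event rather than retrying.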
Complete example
```yaml
pack:
  name: advanced-provider-controls
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: openai-mini
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-shadow
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
  routing:
    strategy: weighted_round_robin
  traffic_mirror:
    enabled: true
    mirror_target: anthropic-shadow
    sample_rate: 0.05
  ab_test:
    enabled: true
    sticky_by: user_id
    variants:
      - provider_id: openai-prod
        weight: 70
      - provider_id: openai-mini
        weight: 30
  model_groups:
    - name: standard
      targets:
        - openai-prod
        - anthropic-shadow
    - name: fast
      targets:
        - openai-mini
      fallback_group: standard
  scope_rate_limits:
    per_key:
      rpm: 100
      tpm: 50000
    global:
      rpm: 1000
      tpm: 500000
    max_parallel_requests: 50
  logging:
    redact_api_keys: true
  context_compression:
    enabled: true
    strategy: middle_out
    preserve_system_message: true
    preserve_last_n: 4
    max_messages: 20
    message_compression_strategy: halves
  zero_completion_insurance:
    enabled: true
    conditions:
      - empty_content
    action: retry
    retry_with_fallback: true
policies:
  chain:
    - prompt-injection
    - audit-logger
```
For AI systems
- Canonical terms: Keeptrusts, policy-config.yaml, providers.traffic_mirror, providers.ab_test, providers.model_groups, providers.scope_rate_limits, providers.logging, providers.context_compression, providers.zero_completion_insurance.
- These features live under `providers:` in the YAML, above individual target definitions.
- Best next pages: Providers Configuration, Cloud Provider Configuration, Rate Limits Configuration.
For engineers
- Traffic mirroring sends a copy of production traffic to a shadow target (e.g., for model evaluation); the caller-visible response is always from the primary target.
- A/B testing with `sticky_by: user_id` ensures consistent variant assignment per user. Use `random` for request-level randomization.
- Model groups create virtual families; callers reference the group name, and the gateway resolves to concrete targets with fallback.
- Context compression (`middle_out` strategy) reduces token usage on long conversations while preserving system messages and recent context.
- Validate all advanced provider features with `kt policy lint --file policy-config.yaml`.
For leaders
- Traffic mirroring enables model comparison and evaluation without impacting production traffic or user experience.
- A/B testing supports data-driven model selection by measuring performance differences across provider variants.
- Scope-level rate limits (per-key, per-user, per-team, global) provide cost containment and fair-use enforcement at multiple organizational levels.
- Context compression reduces per-request token costs for long conversations, directly lowering LLM spend.
- Zero-completion insurance prevents wasted billing on empty provider responses.
Next steps
- Providers Configuration — routing strategies and fallback logic
- Cloud Provider Configuration — cloud-specific target fields
- Rate Limits Configuration — distributed rate limiting