Advanced Provider Features
This page covers the provider-level features that sit above individual targets: traffic mirroring, A/B testing, model groups, shared scope limits, logging controls, context compression, and zero-completion handling.
Use this page when
- You need the exact command, config, API, or integration details for Advanced Provider Features.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- You want a guided rollout instead of a reference page; in that case, follow the linked workflow pages in Next steps.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Traffic mirroring
Mirror a percentage of production traffic to a secondary target for comparison or audit without changing the caller-visible response.
```yaml
pack:
  name: config-advanced-provider-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-shadow
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
  traffic_mirror:
    enabled: true
    mirror_target: anthropic-shadow
    sample_rate: 0.05
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable or disable mirroring |
| mirror_target | string | — | Target ID that receives mirrored requests |
| sample_rate | number | 1.0 | Fraction of traffic to mirror |
| log_mirror_response | boolean | false | Record the mirrored response for analysis |
| timeout_ms | integer | 5000 | Independent timeout for mirror calls |
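The sampling decision amounts to a per-request coin flip against sample_rate; the primary response is returned to the caller regardless of whether a copy is mirrored. A minimal sketch (the `should_mirror` helper and seeded counter are illustrative, not Keeptrusts internals):

```python
import random

def should_mirror(sample_rate: float, rng: random.Random) -> bool:
    """Decide, per request, whether a copy goes to the mirror target."""
    return rng.random() < sample_rate

# At sample_rate 0.05, roughly 5% of requests are mirrored.
rng = random.Random(42)  # seeded so the sketch is reproducible
mirrored = sum(should_mirror(0.05, rng) for _ in range(10_000))
rate = mirrored / 10_000
```

Because the mirror call has its own timeout_ms, a slow or failing shadow target never delays the caller-visible response.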
A/B testing
Split traffic between provider variants with sticky assignment.
```yaml
providers:
  ab_test:
    enabled: true
    sticky_by: user_id
    variants:
      - provider_id: openai-prod
        weight: 80
      - provider_id: anthropic-shadow
        weight: 20
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-shadow
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
```
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable or disable the experiment |
| sticky_by | string | random | random, user_id, or key_id |
| variants[].provider_id | string | — | Target ID used for that variant |
| variants[].weight | integer | — | Relative traffic weight |
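Sticky assignment can be pictured as hashing the sticky key into the weighted variant space, so the same user lands on the same variant on every request. A sketch of the idea (the `pick_variant` function is illustrative, not the gateway's actual algorithm):

```python
import hashlib

def pick_variant(user_id: str, variants: list[tuple[str, int]]) -> str:
    """Deterministically map a user to a weighted variant.

    Hashing user_id yields the same bucket on every request, which is
    the behavior sticky_by: user_id implies.
    """
    total = sum(w for _, w in variants)
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    for provider_id, weight in variants:
        if bucket < weight:
            return provider_id
        bucket -= weight
    return variants[-1][0]

variants = [("openai-prod", 80), ("anthropic-shadow", 20)]
```

With sticky_by: random, the bucket would instead be drawn fresh per request, randomizing at the request level.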
Model groups
Model groups create virtual model families over concrete provider targets.
```yaml
providers:
  model_groups:
    - name: fast
      aliases:
        - default-fast
      description: Low-latency conversational targets
      targets:
        - openai-mini
        - anthropic-haiku
      fallback_group: standard
    - name: standard
      targets:
        - openai-prod
        - anthropic-shadow
```
| Field | Type | Description |
|---|---|---|
| name | string | Group name callers reference |
| aliases | string[] | Optional alternate names |
| description | string | Human-readable description |
| targets | string[] | Concrete target IDs in the group |
| fallback_group | string | Group to try if all local targets fail |
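The resolution order described above (name or alias, then the group's targets in order, then fallback_group) can be sketched as follows; the `resolve_group` helper and the `healthy` set are illustrative, not gateway internals:

```python
def resolve_group(name: str, groups: list[dict], healthy: set[str]) -> str:
    """Resolve a group (or alias) to its first healthy target,
    falling through to fallback_group when every local target fails."""
    by_name = {}
    for g in groups:
        by_name[g["name"]] = g
        for alias in g.get("aliases", []):
            by_name[alias] = g
    group = by_name[name]
    for target in group["targets"]:
        if target in healthy:
            return target
    fallback = group.get("fallback_group")
    if fallback:
        return resolve_group(fallback, groups, healthy)
    raise LookupError(f"no healthy target for group {name!r}")

groups = [
    {"name": "fast", "aliases": ["default-fast"],
     "targets": ["openai-mini", "anthropic-haiku"],
     "fallback_group": "standard"},
    {"name": "standard", "targets": ["openai-prod", "anthropic-shadow"]},
]
```

Callers only ever reference the group name (or an alias); the concrete target is an internal routing decision.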
Shared provider scope limits
These limits live under providers.scope_rate_limits, not inside each target.
```yaml
providers:
  scope_rate_limits:
    per_key:
      rpm: 100
      tpm: 50000
    per_user:
      rpm: 20
      tpm: 10000
    per_team:
      rpm: 300
      tpm: 150000
    global:
      rpm: 1000
      tpm: 500000
    max_parallel_requests: 50
```
| Field | Type | Description |
|---|---|---|
| per_key.rpm / per_key.tpm | integer | Requests and tokens per minute, per API key |
| per_user.rpm / per_user.tpm | integer | Requests and tokens per minute, per user |
| per_team.rpm / per_team.tpm | integer | Requests and tokens per minute, per team |
| global.rpm / global.tpm | integer | Provider-wide requests and tokens per minute |
| max_parallel_requests | integer | Provider-wide cap on concurrent in-flight requests |
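A request passes only when every applicable scope still has headroom; exhausting any one scope (key, user, team, or global) rejects it. A minimal fixed-window sketch of that composition (the `ScopeLimiter` class is illustrative, not the gateway's distributed implementation):

```python
from collections import defaultdict

class ScopeLimiter:
    """Admit a request only if every configured scope is under its rpm limit."""

    def __init__(self, limits: dict[str, int]):
        self.limits = limits              # e.g. {"per_user": 20, "global": 1000}
        self.counts = defaultdict(int)    # (scope, id) -> requests this window

    def allow(self, key_id: str, user_id: str, team_id: str) -> bool:
        scopes = [("per_key", key_id), ("per_user", user_id),
                  ("per_team", team_id), ("global", "*")]
        applicable = [(s, i) for s, i in scopes if s in self.limits]
        # Check every scope first, then count the request against all of them.
        if any(self.counts[(s, i)] >= self.limits[s] for s, i in applicable):
            return False
        for s, i in applicable:
            self.counts[(s, i)] += 1
        return True

limiter = ScopeLimiter({"per_user": 2, "global": 1000})
```

The real limiter also tracks tpm and max_parallel_requests and resets windows over time; this sketch only shows how the scopes compose.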
Logging controls
```yaml
providers:
  logging:
    redact_message_bodies: true
    redact_api_keys: true
```
| Field | Type | Default | Description |
|---|---|---|---|
| redact_message_bodies | boolean | false | Remove request and response bodies from provider logs and callback payloads |
| redact_api_keys | boolean | false | Remove credential values from logs |
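The effect of the two switches can be pictured as a pre-persistence scrub of each log entry: dropping body fields and masking credential-looking values. A sketch under assumed field names (`request_body`, `authorization`, and the key pattern are all illustrative, not Keeptrusts log schema):

```python
import re

# Illustrative credential pattern; real redaction would cover more formats.
API_KEY_RE = re.compile(r"\bsk-[A-Za-z0-9-]{8,}\b")

def redact_log_entry(entry: dict, redact_bodies=True, redact_keys=True) -> dict:
    """Scrub a log entry before it is written or sent to callbacks."""
    entry = dict(entry)  # never mutate the caller's copy
    if redact_bodies:
        entry.pop("request_body", None)
        entry.pop("response_body", None)
    if redact_keys and "authorization" in entry:
        entry["authorization"] = API_KEY_RE.sub("[REDACTED]", entry["authorization"])
    return entry
```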
Context compression
Reduce large conversations before provider dispatch.
```yaml
providers:
  context_compression:
    enabled: true
    strategy: middle_out
    preserve_system_message: true
    preserve_first_n: 1
    preserve_last_n: 4
    max_messages: 20
    message_compression_strategy: halves
```
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable compression |
| strategy | string | — | middle_out or oldest_first |
| preserve_system_message | boolean | — | Always keep the system message |
| preserve_first_n | integer | — | Keep the first N messages |
| preserve_last_n | integer | — | Keep the last N messages |
| max_messages | integer | — | Target conversation size after trimming |
| message_compression_strategy | string | — | Current supported value is halves |
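The middle_out strategy keeps the system message plus the first and last N turns and drops from the middle until the conversation fits max_messages. A sketch of that trimming order (the `middle_out` function is illustrative; the real compressor also applies message_compression_strategy to individual messages):

```python
def middle_out(messages, preserve_first_n=1, preserve_last_n=4,
               max_messages=20, keep_system=True):
    """Trim a conversation to max_messages, dropping middle turns first."""
    system = [m for m in messages if m["role"] == "system"] if keep_system else []
    rest = [m for m in messages if m["role"] != "system"] if keep_system else list(messages)
    budget = max_messages - len(system)
    if len(rest) <= budget:
        return system + rest
    head = rest[:preserve_first_n]
    tail = rest[-preserve_last_n:]
    middle_budget = max(budget - len(head) - len(tail), 0)
    middle = rest[preserve_first_n:len(rest) - preserve_last_n]
    # Keep the edges of the middle; the center is the least recent context.
    keep_front = (middle_budget + 1) // 2
    keep_back = middle_budget - keep_front
    kept = middle[:keep_front] + (middle[len(middle) - keep_back:] if keep_back else [])
    return system + head + kept + tail
```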
Zero completion insurance
Handle empty or zero-token completions from upstream providers.
```yaml
providers:
  zero_completion_insurance:
    enabled: true
    conditions:
      - empty_content
    action: retry
    retry_with_fallback: true
```
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | true | Enable insurance handling |
| conditions | string[] | — | Conditions that count as zero-completion events |
| action | string | — | suppress_billing, retry, or log_only |
| retry_with_fallback | boolean | — | Use configured fallback logic on retry |
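The flow for action: retry with retry_with_fallback: true can be sketched as: check the primary response against the configured conditions, and on a zero-completion event retry against the fallback. The helpers and response field names below are illustrative, not Keeptrusts internals, and only the empty_content condition from this page is modeled:

```python
def is_zero_completion(response: dict, conditions: list[str]) -> bool:
    """Check a response against the configured zero-completion conditions."""
    checks = {
        "empty_content": lambda r: not r.get("content", "").strip(),
    }
    return any(checks[c](response) for c in conditions if c in checks)

def complete_with_insurance(call_primary, call_fallback, conditions):
    """On a zero-completion event, retry once via the fallback target."""
    response = call_primary()
    if is_zero_completion(response, conditions):
        return call_fallback()
    return response
```

With action: suppress_billing or log_only, the same detection step would instead skip usage accounting or emit a log event rather than retrying.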
Complete example
```yaml
pack:
  name: advanced-provider-controls
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-prod
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: openai-mini
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: anthropic-shadow
      provider: anthropic
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_API_KEY
  routing:
    strategy: weighted_round_robin
  traffic_mirror:
    enabled: true
    mirror_target: anthropic-shadow
    sample_rate: 0.05
  ab_test:
    enabled: true
    sticky_by: user_id
    variants:
      - provider_id: openai-prod
        weight: 70
      - provider_id: openai-mini
        weight: 30
  model_groups:
    - name: standard
      targets:
        - openai-prod
        - anthropic-shadow
    - name: fast
      targets:
        - openai-mini
      fallback_group: standard
  scope_rate_limits:
    per_key:
      rpm: 100
      tpm: 50000
    global:
      rpm: 1000
      tpm: 500000
    max_parallel_requests: 50
  logging:
    redact_api_keys: true
  context_compression:
    enabled: true
    strategy: middle_out
    preserve_system_message: true
    preserve_last_n: 4
    max_messages: 20
    message_compression_strategy: halves
  zero_completion_insurance:
    enabled: true
    conditions:
      - empty_content
    action: retry
    retry_with_fallback: true
policies:
  chain:
    - prompt-injection
    - audit-logger
```
For AI systems
- Canonical terms: Keeptrusts, policy-config.yaml, providers.traffic_mirror, providers.ab_test, providers.model_groups, providers.scope_rate_limits, providers.logging, providers.context_compression, providers.zero_completion_insurance.
- These features live under `providers:` in the YAML, above individual target definitions.
- Best next pages: Providers Configuration, Cloud Provider Configuration, Rate Limits Configuration.
For engineers
- Traffic mirroring sends a copy of production traffic to a shadow target (e.g., for model evaluation); the caller-visible response is always from the primary target.
- A/B testing with `sticky_by: user_id` ensures consistent variant assignment per user. Use `random` for request-level randomization.
- Model groups create virtual families; callers reference the group name, and the gateway resolves to concrete targets with fallback.
- Context compression (`middle_out` strategy) reduces token usage on long conversations while preserving system messages and recent context.
- Validate all advanced provider features with `kt policy lint --file policy-config.yaml`.
For leaders
- Traffic mirroring enables model comparison and evaluation without impacting production traffic or user experience.
- A/B testing supports data-driven model selection by measuring performance differences across provider variants.
- Scope-level rate limits (per-key, per-user, per-team, global) provide cost containment and fair-use enforcement at multiple organizational levels.
- Context compression reduces per-request token costs for long conversations, directly lowering LLM spend.
- Zero-completion insurance prevents wasted billing on empty provider responses.
Next steps
- Providers Configuration — routing strategies and fallback logic
- Cloud Provider Configuration — cloud-specific target fields
- Rate Limits Configuration — distributed rate limiting