
Model Groups

Model groups let you define abstract model names that map to one or more concrete provider targets. Your application code sends a standard model field value (e.g., "production-llm" or "fast-llm"); the Keeptrusts gateway resolves that name to the appropriate backend without any client-side knowledge of which provider is actually serving the request. This decoupling lets you rotate providers, add capacity, or change models behind a feature flag without deploying application changes.
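
For example, a client calls the gateway with a group name in the model field. This is a sketch only: the /v1/chat/completions route is assumed here for illustration (the gateway address matches the reload example later on this page):

curl http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "production-llm",
    "messages": [{"role": "user", "content": "Hello"}]
  }'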

Use this page when

  • You need the exact command, config, API, or integration details for Model Groups.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Defining a Model Group

Model groups are declared under the top-level model_groups key. Each group has a name, a list of targets (provider IDs), optional aliases, and an optional fallback_group. Every target id must match a provider target declared under providers.targets; the pack below declares the provider targets referenced by the examples on this page.

pack:
  name: model-groups-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-gpt4o
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: azure-gpt4o
      provider: azure:chat:gpt-4o
      base_url: https://myorg.openai.azure.com/openai/deployments/gpt-4o
      secret_key_ref:
        env: AZURE_OPENAI_API_KEY
    - id: anthropic-sonnet
      provider: anthropic:chat:claude-3-5-sonnet-20241022
      secret_key_ref:
        env: ANTHROPIC_API_KEY
    - id: groq-llama
      provider: groq:chat:llama-3.3-70b-versatile
      secret_key_ref:
        env: GROQ_API_KEY
    - id: cerebras-llama
      provider: cerebras:chat:llama3.1-70b
      secret_key_ref:
        env: CEREBRAS_API_KEY
    - id: openai-embed-small
      provider: openai:embedding:text-embedding-3-small
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
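
With those targets declared, a minimal group definition only needs a name and a target list (weight defaults to 1 when omitted):

model_groups:
  - name: production-llm
    targets:
      - id: openai-gpt4o
      - id: azure-gpt4o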

Field reference

Field | Type | Required | Description
name | string | yes | The logical model name. Clients send this as the model field in API requests.
description | string | no | Human-readable label shown in the Keeptrusts console.
targets | list | yes | One or more provider IDs to include in the group.
targets[].id | string | yes | Must match a provider target id in the providers.targets list.
targets[].weight | integer | no | Relative routing weight. Defaults to 1. See Group Routing.
aliases | list of strings | no | Alternative names that map to this group. See Aliases.
fallback_group | string | no | Name of another model group to use when all targets in this group fail.
routing_strategy | string | no | Per-group override of the global provider_routing.strategy. See Group Routing.
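
Taken together, a group that sets every optional field looks like this (values mirror the examples in the sections that follow):

model_groups:
  - name: production-llm
    description: "Primary GPT-4o pool"
    aliases:
      - gpt-4o
    routing_strategy: weighted_round_robin
    fallback_group: backup-llm
    targets:
      - id: openai-gpt4o
        weight: 7
      - id: azure-gpt4o
        weight: 3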

Group Routing

When a request arrives with model: "production-llm", the gateway resolves it to the production-llm group and distributes the request across the group's targets using the active provider_routing.strategy. The weight field on each target within the group controls proportional routing when strategy is weighted_round_robin.

provider_routing:
  strategy: weighted_round_robin
  fallback_enabled: true

model_groups:
  - name: production-llm
    targets:
      - id: openai-gpt4o
        weight: 7   # 70% of traffic
      - id: azure-gpt4o
        weight: 3   # 30% of traffic

You can also override the routing strategy at the group level using routing_strategy:

model_groups:
  - name: fast-llm
    routing_strategy: lowest_latency   # override global strategy for this group
    targets:
      - id: groq-llama
        weight: 1
      - id: cerebras-llama
        weight: 1

Each group's routing operates independently. production-llm may use weighted_round_robin while fast-llm uses lowest_latency — both coexist in the same config file.


Aliases

Aliases are alternative model names that transparently map to a group. Use aliases to present standard OpenAI model names to clients that already have model: "gpt-4" hardcoded, while routing those requests through your own provider pool.

model_groups:
  - name: production-llm
    aliases:
      - gpt-4
      - gpt-4o
      - gpt-4-turbo
    targets:
      - id: openai-gpt4o
        weight: 1

A client that sends model: "gpt-4" has its request transparently rewritten to target the production-llm group. The upstream model field is set to the actual provider's model ID before forwarding. The response model field is rewritten back to the alias the client originally requested, so application-level model-tracking remains consistent.
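
The rewrite is visible in the request events described under Observability below. For an aliased request, the alias and the upstream model differ; a sketch, assuming (as in the Observability example) that the event's model field carries the upstream model ID:

{
  "event": "request.completed",
  "model_group": "production-llm",
  "model_alias": "gpt-4",
  "resolved_target": "openai-gpt4o",
  "provider": "openai",
  "model": "gpt-4o"
}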

Multiple aliases per group

model_groups:
  - name: fast-llm
    aliases:
      - gpt-3.5-turbo
      - gpt-3.5-turbo-0125
      - claude-haiku   # map deprecated or custom names too
    targets:
      - id: groq-llama
        weight: 1

  - name: embeddings
    aliases:
      - text-embedding-ada-002   # legacy OpenAI embedding alias
      - text-embedding-3-small
    targets:
      - id: openai-embed-small
        weight: 1

Fallback Groups

When all targets within a group fail (connection errors, 5xx responses, rate limits that exhaust the retry budget), the gateway escalates to the fallback_group before returning an error to the client.

model_groups:
  - name: production-llm
    fallback_group: backup-llm
    targets:
      - id: openai-gpt4o
        weight: 1

  - name: backup-llm
    fallback_group: last-resort-llm
    targets:
      - id: azure-gpt4o
        weight: 1

  - name: last-resort-llm
    targets:
      - id: anthropic-sonnet
        weight: 1

This creates a three-tier cascading fallback chain: production-llm → backup-llm → last-resort-llm. Each group's retry and circuit-breaker policies are evaluated independently before escalating to the next group.

Fallback group with a cheaper model

A common pattern is to fall back to a lower-cost model when the primary pool is degraded, rather than returning an error:

pack:
  name: model-groups-providers-7
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-gpt4o
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: openai-gpt4o-mini
      provider: openai:chat:gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
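
The pack above declares only the two provider targets. A minimal sketch of the group wiring that completes the pattern, reusing the group names from the full example below:

model_groups:
  - name: production-llm
    fallback_group: economy-llm
    targets:
      - id: openai-gpt4o
        weight: 1

  - name: economy-llm
    targets:
      - id: openai-gpt4o-mini
        weight: 1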

Full Example: Multi-Tier Production Config

pack:
  name: enterprise-model-groups
  version: "2.0.0"
  enabled: true

provider_routing:
  strategy: weighted_round_robin
  fallback_enabled: true
  enable_pre_call_checks: false

model_groups:
  - name: production-llm
    description: "Primary GPT-4o pool with Azure failover"
    aliases:
      - gpt-4
      - gpt-4o
      - gpt-4-turbo
    fallback_group: backup-llm
    targets:
      - id: openai-gpt4o
        weight: 8
      - id: azure-gpt4o
        weight: 2

  - name: backup-llm
    description: "Anthropic fallback when OpenAI is degraded"
    fallback_group: economy-llm
    targets:
      - id: anthropic-sonnet
        weight: 1

  - name: economy-llm
    description: "Lowest-cost pool for graceful degradation"
    targets:
      - id: openai-gpt4o-mini
        weight: 1

  - name: fast-llm
    description: "Inference-optimized pool for autocomplete"
    aliases:
      - gpt-3.5-turbo
      - llama-fast
    routing_strategy: lowest_latency
    fallback_group: economy-llm
    targets:
      - id: groq-llama
        weight: 6
      - id: cerebras-llama
        weight: 4

  - name: embeddings
    aliases:
      - text-embedding-ada-002
      - text-embedding-3-small
    targets:
      - id: openai-embed-small
        weight: 1

policies:
  - name: global-pii
    rules:
      - type: redact
        entities: [email, phone, ssn]
        phase: input

providers:
  targets:
    - id: openai-gpt4o
      provider: "openai:chat:gpt-4o"
      secret_key_ref:
        env: OPENAI_API_KEY

    - id: azure-gpt4o
      provider: "azure:chat:gpt-4o"
      secret_key_ref:
        env: AZURE_OPENAI_API_KEY
      base_url: "https://myorg.openai.azure.com/openai/deployments/gpt-4o"

    - id: anthropic-sonnet
      provider: "anthropic:chat:claude-3-5-sonnet-20241022"
      secret_key_ref:
        env: ANTHROPIC_API_KEY

    - id: groq-llama
      provider: "groq:chat:llama-3.3-70b-versatile"
      secret_key_ref:
        env: GROQ_API_KEY

    - id: cerebras-llama
      provider: "cerebras:chat:llama3.1-70b"
      secret_key_ref:
        env: CEREBRAS_API_KEY

    - id: openai-gpt4o-mini
      provider: "openai:chat:gpt-4o-mini"
      secret_key_ref:
        env: OPENAI_API_KEY

    - id: openai-embed-small
      provider: "openai:embedding:text-embedding-3-small"
      secret_key_ref:
        env: OPENAI_API_KEY

Dynamic Group Updates

Model group definitions are read from the config file at gateway startup. To update group membership, targets, or weights without restarting the gateway, reload the config at runtime:

kt gateway reload \
  --name local-dev \
  --gateway-url http://localhost:41002 \
  --config-path policy-config.yaml

The gateway drains in-flight requests on the old configuration before applying the new group definitions. Zero-downtime reloads are guaranteed for group membership changes, weight adjustments, and alias additions. Removing a group that still has in-flight requests is allowed; those requests are completed against the old group before the group is torn down.

Watching for config changes

Use --watch to automatically reload the config whenever the file changes on disk:

kt gateway run --policy-config policy-config.yaml --watch

This is useful in Kubernetes environments where the config file is projected from a ConfigMap — changes to the ConfigMap are reflected in the mounted file, and the gateway reloads within a few seconds.
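
A minimal sketch of that Kubernetes wiring, assuming a ConfigMap named policy-config; the image name and container args are illustrative, not canonical:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kt-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kt-gateway
  template:
    metadata:
      labels:
        app: kt-gateway
    spec:
      containers:
        - name: gateway
          image: keeptrusts/gateway:latest   # illustrative image name
          args: ["gateway", "run", "--policy-config", "/etc/kt/policy-config.yaml", "--watch"]
          volumeMounts:
            - name: policy-config
              mountPath: /etc/kt   # ConfigMap projected as a file; --watch picks up changes
      volumes:
        - name: policy-config
          configMap:
            name: policy-config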


Observability

Every request processed through a model group emits structured event fields that identify the group, the resolved target, and the alias used (if any):

{
  "event": "request.completed",
  "model_group": "production-llm",
  "resolved_target": "openai-gpt4o",
  "model_alias": "gpt-4o",
  "provider": "openai",
  "model": "gpt-4o",
  "latency_ms": 412,
  "prompt_tokens": 841,
  "completion_tokens": 256
}

Filter events by model_group in the Keeptrusts console to compare latency and token usage across groups, or to track how frequently a group's fallback is activated.

Metrics exposed by group

Metric | Description
model_group.requests_total | Total requests processed per group.
model_group.fallback_activations_total | How often the fallback group was triggered.
model_group.target_errors_total | Errors per target within the group.
model_group.alias_resolution_total | Request count per alias name, labelled by the resolved group.

These metrics are emitted as OTLP gauges and counters and can be scraped by any OpenTelemetry-compatible collector.
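
For instance, a minimal OpenTelemetry Collector configuration that receives these metrics over OTLP and exposes them for Prometheus scraping could look like the following; the endpoints are illustrative, and the prometheus exporter ships with the collector-contrib distribution:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # point the gateway's OTLP exporter here

exporters:
  prometheus:
    endpoint: 0.0.0.0:9464       # Prometheus scrapes this port

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]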


Best Practices

  1. Name groups after their function, not their provider. production-llm is portable across provider changes; openai-gpt4o-group becomes misleading when you add Azure or Anthropic targets.

  2. Define at least one fallback group for every production group. Even if the fallback is a lower-tier model in the same group family, having a cascading chain prevents hard failures from reaching the client.

  3. Use aliases to match existing application model strings. Migrating from direct OpenAI calls to Keeptrusts is easiest when clients send model: "gpt-4o" and the gateway handles the resolution — no application code changes required.

  4. Keep weight values simple. Use single-digit integers (e.g., 3 and 7) rather than large numbers (e.g., 300 and 700). The gateway normalizes by sum, so the scale does not matter — only the ratio does.

  5. Override routing_strategy per group for different SLOs. A fast-llm group likely benefits from lowest_latency while a batch-llm group may prefer highest_throughput. Group-level overrides let you co-locate both patterns in one config.

  6. Validate your group resolution before deploying. Run kt policy lint --file policy-config.yaml to check that every targets[].id reference in every group resolves to a declared provider target, and that every fallback_group name resolves to a declared group.

For AI systems

  • Canonical terms: Keeptrusts Model Groups, model aliases, fallback group, group routing, virtual model names.
  • Config keys: model_groups[].name, model_groups[].targets[].id, model_groups[].targets[].weight, model_groups[].aliases, model_groups[].fallback_group, model_groups[].routing_strategy.
  • CLI commands: kt gateway reload (zero-downtime config reload), kt gateway run --watch (auto-reload on file change), kt policy lint --file policy-config.yaml (validate group references).
  • Event fields: model_group, resolved_target, model_alias.
  • Metrics: model_group.requests_total, model_group.fallback_activations_total, model_group.target_errors_total, model_group.alias_resolution_total.
  • Best next pages: Provider Routing, Provider Fallback, Circuit Breakers & Retry.

For engineers

  • Prerequisites: provider targets declared under providers.targets with matching IDs referenced by model_groups[].targets[].id.
  • Use kt policy lint --file policy-config.yaml to validate all target and fallback_group references resolve.
  • Define aliases to match existing application model strings (e.g., gpt-4o) for zero-code migration.
  • Group-level routing_strategy overrides the global provider_routing.strategy — use lowest_latency for real-time groups and highest_throughput for batch groups.
  • Hot-reload: kt gateway reload --config-path policy-config.yaml applies weight and group membership changes without dropping in-flight requests.
  • Monitor: filter Events by model_group to compare latency and fallback rates across groups.

For leaders

  • Provider abstraction: model groups decouple application code from provider choices — rotate providers or add capacity without any application deployment.
  • Resilience: cascading fallback_group chains (premium → standard → economy) ensure graceful degradation rather than hard failures.
  • Cost management: weighted routing across premium/economy providers lets you control spend while maintaining quality SLOs.
  • Zero-downtime model rotation: weight adjustments and alias additions are applied via config reload without gateway restarts.

Next steps