
Model Groups

Model groups let you define abstract model names that map to one or more concrete provider targets. Your application code sends a standard model field value (e.g., "production-llm" or "fast-llm"); the Keeptrusts gateway resolves that name to the appropriate backend without any client-side knowledge of which provider is actually serving the request. This decoupling lets you rotate providers, add capacity, or change models behind a feature flag without deploying application changes.
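
For example, a client calls the gateway with a group name in the model field. This is a sketch only: the /v1/chat/completions route is assumed here for illustration (the gateway address matches the reload example later on this page):

curl http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "production-llm",
    "messages": [{"role": "user", "content": "Hello"}]
  }'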

Use this page when

  • You need the exact command, config, API, or integration details for Model Groups.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Defining a Model Group

Model groups are declared under the top-level model_groups key. Each group has a name, a list of targets (provider IDs), optional aliases, and an optional fallback_group. Every target id must match a provider target declared under providers.targets; the pack below declares the provider targets referenced by the examples on this page.

pack:
  name: model-groups-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-gpt4o
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: azure-gpt4o
      provider: azure:chat:gpt-4o
      base_url: https://myorg.openai.azure.com/openai/deployments/gpt-4o
      secret_key_ref:
        env: AZURE_OPENAI_API_KEY
    - id: anthropic-sonnet
      provider: anthropic:chat:claude-3-5-sonnet-20241022
      secret_key_ref:
        env: ANTHROPIC_API_KEY
    - id: groq-llama
      provider: groq:chat:llama-3.3-70b-versatile
      secret_key_ref:
        env: GROQ_API_KEY
    - id: cerebras-llama
      provider: cerebras:chat:llama3.1-70b
      secret_key_ref:
        env: CEREBRAS_API_KEY
    - id: openai-embed-small
      provider: openai:embedding:text-embedding-3-small
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
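
With those targets declared, a minimal group definition only needs a name and a target list (weight defaults to 1 when omitted):

model_groups:
  - name: production-llm
    targets:
      - id: openai-gpt4o
      - id: azure-gpt4o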

Field reference

Field | Type | Required | Description
name | string | yes | The logical model name. Clients send this as the model field in API requests.
description | string | no | Human-readable label shown in the Keeptrusts console.
targets | list | yes | One or more provider IDs to include in the group.
targets[].id | string | yes | Must match a provider target id in the providers.targets list.
targets[].weight | integer | no | Relative routing weight. Defaults to 1. See Group Routing.
aliases | list of strings | no | Alternative names that map to this group. See Aliases.
fallback_group | string | no | Name of another model group to use when all targets in this group fail.
routing_strategy | string | no | Per-group override of the global provider_routing.strategy. See Group Routing.
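
Taken together, a group that sets every optional field looks like this (values mirror the examples in the sections that follow):

model_groups:
  - name: production-llm
    description: "Primary GPT-4o pool"
    aliases:
      - gpt-4o
    routing_strategy: weighted_round_robin
    fallback_group: backup-llm
    targets:
      - id: openai-gpt4o
        weight: 7
      - id: azure-gpt4o
        weight: 3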

Group Routing

When a request arrives with model: "production-llm", the gateway resolves it to the production-llm group and distributes the request across the group's targets using the active provider_routing.strategy. The weight field on each target within the group controls proportional routing when strategy is weighted_round_robin.

provider_routing:
  strategy: weighted_round_robin
  fallback_enabled: true

model_groups:
  - name: production-llm
    targets:
      - id: openai-gpt4o
        weight: 7   # 70% of traffic
      - id: azure-gpt4o
        weight: 3   # 30% of traffic

You can also override the routing strategy at the group level using routing_strategy:

model_groups:
  - name: fast-llm
    routing_strategy: lowest_latency   # override global strategy for this group
    targets:
      - id: groq-llama
        weight: 1
      - id: cerebras-llama
        weight: 1

Each group's routing operates independently. production-llm may use weighted_round_robin while fast-llm uses lowest_latency — both coexist in the same config file.


Aliases

Aliases are alternative model names that transparently map to a group. Use aliases to present standard OpenAI model names to clients that already have model: "gpt-4" hardcoded, while routing those requests through your own provider pool.

model_groups:
  - name: production-llm
    aliases:
      - gpt-4
      - gpt-4o
      - gpt-4-turbo
    targets:
      - id: openai-gpt4o
        weight: 1

A client that sends model: "gpt-4" has its request transparently rewritten to target the production-llm group. The upstream model field is set to the actual provider's model ID before forwarding. The response model field is rewritten back to the alias the client originally requested, so application-level model-tracking remains consistent.
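
The rewrite is visible in the request events described under Observability below. For an aliased request, the alias and the upstream model differ; a sketch, assuming (as in the Observability example) that the event's model field carries the upstream model ID:

{
  "event": "request.completed",
  "model_group": "production-llm",
  "model_alias": "gpt-4",
  "resolved_target": "openai-gpt4o",
  "provider": "openai",
  "model": "gpt-4o"
}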

Multiple aliases per group

model_groups:
  - name: fast-llm
    aliases:
      - gpt-3.5-turbo
      - gpt-3.5-turbo-0125
      - claude-haiku   # map deprecated or custom names too
    targets:
      - id: groq-llama
        weight: 1

  - name: embeddings
    aliases:
      - text-embedding-ada-002   # legacy OpenAI embedding alias
      - text-embedding-3-small
    targets:
      - id: openai-embed-small
        weight: 1

Fallback Groups

When all targets within a group fail (connection errors, 5xx responses, rate limits that exhaust the retry budget), the gateway escalates to the fallback_group before returning an error to the client.

model_groups:
  - name: production-llm
    fallback_group: backup-llm
    targets:
      - id: openai-gpt4o
        weight: 1

  - name: backup-llm
    fallback_group: last-resort-llm
    targets:
      - id: azure-gpt4o
        weight: 1

  - name: last-resort-llm
    targets:
      - id: anthropic-sonnet
        weight: 1

This creates a three-tier cascading fallback chain: production-llm → backup-llm → last-resort-llm. Each group's retry and circuit-breaker policies are evaluated independently before escalating to the next group.

Fallback group with a cheaper model

A common pattern is to fall back to a lower-cost model when the primary pool is degraded, rather than returning an error:

pack:
  name: model-groups-providers-7
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: openai-gpt4o
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
    - id: openai-gpt4o-mini
      provider: openai:chat:gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
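
The pack above declares only the two provider targets. A minimal sketch of the group wiring that completes the pattern, reusing the group names from the full example below:

model_groups:
  - name: production-llm
    fallback_group: economy-llm
    targets:
      - id: openai-gpt4o
        weight: 1

  - name: economy-llm
    targets:
      - id: openai-gpt4o-mini
        weight: 1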

Full Example: Multi-Tier Production Config

pack:
  name: enterprise-model-groups
  version: "2.0.0"
  enabled: true

provider_routing:
  strategy: weighted_round_robin
  fallback_enabled: true
  enable_pre_call_checks: false

model_groups:
  - name: production-llm
    description: "Primary GPT-4o pool with Azure failover"
    aliases:
      - gpt-4
      - gpt-4o
      - gpt-4-turbo
    fallback_group: backup-llm
    targets:
      - id: openai-gpt4o
        weight: 8
      - id: azure-gpt4o
        weight: 2

  - name: backup-llm
    description: "Anthropic fallback when OpenAI is degraded"
    fallback_group: economy-llm
    targets:
      - id: anthropic-sonnet
        weight: 1

  - name: economy-llm
    description: "Lowest-cost pool for graceful degradation"
    targets:
      - id: openai-gpt4o-mini
        weight: 1

  - name: fast-llm
    description: "Inference-optimized pool for autocomplete"
    aliases:
      - gpt-3.5-turbo
      - llama-fast
    routing_strategy: lowest_latency
    fallback_group: economy-llm
    targets:
      - id: groq-llama
        weight: 6
      - id: cerebras-llama
        weight: 4

  - name: embeddings
    aliases:
      - text-embedding-ada-002
      - text-embedding-3-small
    targets:
      - id: openai-embed-small
        weight: 1

policies:
  - name: global-pii
    rules:
      - type: redact
        entities: [email, phone, ssn]
        phase: input

providers:
  targets:
    - id: openai-gpt4o
      provider: "openai:chat:gpt-4o"
      secret_key_ref:
        env: OPENAI_API_KEY

    - id: azure-gpt4o
      provider: "azure:chat:gpt-4o"
      secret_key_ref:
        env: AZURE_OPENAI_API_KEY
      base_url: "https://myorg.openai.azure.com/openai/deployments/gpt-4o"

    - id: anthropic-sonnet
      provider: "anthropic:chat:claude-3-5-sonnet-20241022"
      secret_key_ref:
        env: ANTHROPIC_API_KEY

    - id: groq-llama
      provider: "groq:chat:llama-3.3-70b-versatile"
      secret_key_ref:
        env: GROQ_API_KEY

    - id: cerebras-llama
      provider: "cerebras:chat:llama3.1-70b"
      secret_key_ref:
        env: CEREBRAS_API_KEY

    - id: openai-gpt4o-mini
      provider: "openai:chat:gpt-4o-mini"
      secret_key_ref:
        env: OPENAI_API_KEY

    - id: openai-embed-small
      provider: "openai:embedding:text-embedding-3-small"
      secret_key_ref:
        env: OPENAI_API_KEY

Dynamic Group Updates

Model group definitions are read from the config file at gateway startup. To update group membership, targets, or weights without restarting the gateway, reload the config at runtime:

kt gateway reload \
  --name local-dev \
  --gateway-url http://localhost:41002 \
  --config-path policy-config.yaml

The gateway drains in-flight requests on the old configuration before applying the new group definitions. Zero-downtime reloads are guaranteed for group membership changes, weight adjustments, and alias additions. Removing a group that still has in-flight requests is allowed; those requests are completed against the old group before the group is torn down.

Watching for config changes

Use --watch to automatically reload the config whenever the file changes on disk:

kt gateway run --policy-config policy-config.yaml --watch

This is useful in Kubernetes environments where the config file is projected from a ConfigMap — changes to the ConfigMap are reflected in the mounted file, and the gateway reloads within a few seconds.
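
A minimal sketch of that Kubernetes wiring, assuming a ConfigMap named policy-config; the image name and container args are illustrative, not canonical:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kt-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kt-gateway
  template:
    metadata:
      labels:
        app: kt-gateway
    spec:
      containers:
        - name: gateway
          image: keeptrusts/gateway:latest   # illustrative image name
          args: ["gateway", "run", "--policy-config", "/etc/kt/policy-config.yaml", "--watch"]
          volumeMounts:
            - name: policy-config
              mountPath: /etc/kt   # ConfigMap projected as a file; --watch picks up changes
      volumes:
        - name: policy-config
          configMap:
            name: policy-config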


Observability

Every request processed through a model group emits structured event fields that identify the group, the resolved target, and the alias used (if any):

{
  "event": "request.completed",
  "model_group": "production-llm",
  "resolved_target": "openai-gpt4o",
  "model_alias": "gpt-4o",
  "provider": "openai",
  "model": "gpt-4o",
  "latency_ms": 412,
  "prompt_tokens": 841,
  "completion_tokens": 256
}

Filter events by model_group in the Keeptrusts console to compare latency and token usage across groups, or to track how frequently a group's fallback is activated.

Metrics exposed by group

Metric | Description
model_group.requests_total | Total requests processed per group.
model_group.fallback_activations_total | How often the fallback group was triggered.
model_group.target_errors_total | Errors per target within the group.
model_group.alias_resolution_total | Request count per alias name, labelled by the resolved group.

These metrics are emitted as OTLP gauges and counters and can be scraped by any OpenTelemetry-compatible collector.
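
For instance, a minimal OpenTelemetry Collector configuration that receives these metrics over OTLP and exposes them for Prometheus scraping could look like the following; the endpoints are illustrative, and the prometheus exporter ships with the collector-contrib distribution:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # point the gateway's OTLP exporter here

exporters:
  prometheus:
    endpoint: 0.0.0.0:9464       # Prometheus scrapes this port

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]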


Best Practices

  1. Name groups after their function, not their provider. production-llm is portable across provider changes; openai-gpt4o-group becomes misleading when you add Azure or Anthropic targets.

  2. Define at least one fallback group for every production group. Even if the fallback is a lower-tier model in the same group family, having a cascading chain prevents hard failures from reaching the client.

  3. Use aliases to match existing application model strings. Migrating from direct OpenAI calls to Keeptrusts is easiest when clients send model: "gpt-4o" and the gateway handles the resolution — no application code changes required.

  4. Keep weight values simple. Use single-digit integers (e.g., 3 and 7) rather than large numbers (e.g., 300 and 700). The gateway normalizes by sum, so the scale does not matter — only the ratio does.

  5. Override routing_strategy per group for different SLOs. A fast-llm group likely benefits from lowest_latency while a batch-llm group may prefer highest_throughput. Group-level overrides let you co-locate both patterns in one config.

  6. Validate your group resolution before deploying. Run kt policy lint --file policy-config.yaml to check that every targets[].id reference in every group resolves to a declared provider target, and that every fallback_group name resolves to a declared group.

For AI systems

  • Canonical terms: Keeptrusts Model Groups, model aliases, fallback group, group routing, virtual model names.
  • Config keys: model_groups[].name, model_groups[].targets[].id, model_groups[].targets[].weight, model_groups[].aliases, model_groups[].fallback_group, model_groups[].routing_strategy.
  • CLI commands: kt gateway reload (zero-downtime config reload), kt gateway run --watch (auto-reload on file change), kt policy lint --file policy-config.yaml (validate group references).
  • Event fields: model_group, resolved_target, model_alias.
  • Metrics: model_group.requests_total, model_group.fallback_activations_total, model_group.target_errors_total, model_group.alias_resolution_total.
  • Best next pages: Provider Routing, Provider Fallback, Circuit Breakers & Retry.

For engineers

  • Prerequisites: provider targets declared under providers.targets with matching IDs referenced by model_groups[].targets[].id.
  • Use kt policy lint --file policy-config.yaml to validate all target and fallback_group references resolve.
  • Define aliases to match existing application model strings (e.g., gpt-4o) for zero-code migration.
  • Group-level routing_strategy overrides the global provider_routing.strategy — use lowest_latency for real-time groups and highest_throughput for batch groups.
  • Hot-reload: kt gateway reload --config-path policy-config.yaml applies weight and group membership changes without dropping in-flight requests.
  • Monitor: filter Events by model_group to compare latency and fallback rates across groups.

For leaders

  • Provider abstraction: model groups decouple application code from provider choices — rotate providers or add capacity without any application deployment.
  • Resilience: cascading fallback_group chains (premium → standard → economy) ensure graceful degradation rather than hard failures.
  • Cost management: weighted routing across premium/economy providers lets you control spend while maintaining quality SLOs.
  • Zero-downtime model rotation: weight adjustments and alias additions are applied via config reload without gateway restarts.

Next steps