Mistral AI
Mistral AI builds high-performance large language models with strong multilingual capabilities and efficient mixture-of-experts (MoE) architectures. Their inference API is fully OpenAI-compatible, so Keeptrusts needs no format translation — requests and responses flow through the gateway in native OpenAI wire format, and any OpenAI SDK client can be pointed at the gateway with zero code changes.
Use this page when
- You need the exact command, config, API, or integration details for Mistral AI.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Keeptrusts sits between your application and Mistral's API endpoint, enforcing policy chains — prompt-injection detection, PII redaction, safety filters, content-quality scoring, audit logging — on every request and response without requiring application-side changes.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Prerequisites
- Mistral API key — obtain one from La Plateforme.
- Keeptrusts CLI — install kt (see the quickstart guide).
- Export your API key so the gateway can read it at startup:
export MISTRAL_API_KEY="your-mistral-api-key"
When the provider field is set to "mistral", Keeptrusts auto-detects both the base URL (https://api.mistral.ai/v1) and the API key environment variable (MISTRAL_API_KEY). You only need to override these if you use a custom deployment, a self-hosted Mistral endpoint, or a non-standard env-var name.
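If you do need to point at a self-hosted or VPC endpoint, override both fields explicitly. A minimal sketch (policy chain omitted); the endpoint URL and environment-variable name here are placeholders for your own deployment:
pack:
  name: mistral-self-hosted
  version: 1.0.0
  enabled: true
providers:
  strategy: single
  targets:
    - id: mistral-vpc
      provider: mistral
      model: mistral-large-latest
      base_url: https://mistral.internal.example.com/v1  # placeholder endpoint
      secret_key_ref:
        env: MISTRAL_VPC_API_KEY  # placeholder env-var name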
Configuration
A minimal policy-config.yaml that routes traffic through Mistral with prompt-injection, PII, and safety policies:
pack:
name: mistral-gateway
version: 1.0.0
enabled: true
policies:
chain:
- prompt-injection
- pii-detector
- safety-filter
- audit-logger
policy:
prompt-injection:
threshold: 0.8
action: block
pii-detector:
action: redact
safety-filter:
mode: strict
action: block
audit-logger:
retention_days: 365
providers:
strategy: single
targets:
- id: mistral-large
provider: mistral
model: mistral-large-latest
base_url: https://api.mistral.ai/v1
secret_key_ref:
env: MISTRAL_API_KEY
Start the gateway:
kt gateway run \
  --listen 0.0.0.0:8080 \
--policy-config policy-config.yaml
Compact Provider Shorthand
You can encode the model directly in the provider field. The two forms below are equivalent:
# Shorthand — model embedded in the provider string
- id: "mistral-large"
provider: "mistral:chat:mistral-large-latest"
# Explicit — separate provider and model fields
- id: "mistral-large"
provider: "mistral"
model: "mistral-large-latest"
The shorthand form is convenient for quick configurations. The explicit form is recommended when you need to set additional fields like pricing or health_probe.
Provider Fields
All fields available on a providers.targets[] entry for Mistral AI:
| Field | Type | Default | Description |
|---|---|---|---|
| id | string | required | Unique identifier for this target. Used in logs, the console dashboard, and routing decisions. |
| provider | string | required | Provider ID. Use "mistral" or the shorthand "mistral:chat:<model>". |
| model | string | required | Model name, e.g. "mistral-large-latest". Passed through to the upstream API as-is. |
| base_url | string | https://api.mistral.ai/v1 | API base URL. Auto-detected when provider is "mistral". Override for self-hosted or VPC endpoints. |
| secret_key_ref | object | MISTRAL_API_KEY | Object reference to the environment variable holding the API key. Auto-detected for the "mistral" provider. |
| timeout_seconds | integer | 60 | Maximum wall-clock time for non-streaming requests before the gateway returns a timeout error. |
| stream_timeout_seconds | integer | inherits timeout_seconds | Maximum wall-clock time for streaming requests. Falls back to timeout_seconds if not set. Set this higher than timeout_seconds for long-running streamed generations. |
| max_context_tokens | integer | none | Maximum token budget for a request (prompt + completion). When set, the gateway rejects requests that exceed this limit before forwarding them upstream. |
| format | string | "openai" | Wire format. Mistral is natively OpenAI-compatible, so this is always "openai". |
| provider_type | string | auto | Explicit provider-type override. Rarely needed — auto-detection handles Mistral correctly. |
| description | string | none | Human-readable label shown in the console dashboard, logs, and health-check output. |
| weight | float | 1.0 | Routing weight used by the weighted_round_robin strategy. Higher values receive proportionally more traffic. |
| pricing | object | none | Token pricing in USD per 1M tokens. Fields: prompt (input cost), completion (output cost). Displayed in the console cost dashboard. |
| health_probe | object | none | Active health-probe configuration. Sub-fields: enabled (bool), interval_seconds (int), timeout_seconds (int). When enabled, the gateway periodically sends lightweight requests to verify the target is reachable. |
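For reference, here is a target that exercises the optional fields above. All field names come from the table; the pricing numbers and probe intervals are illustrative placeholders, not official Mistral rates:
providers:
  strategy: single
  targets:
    - id: mistral-large-prod
      provider: mistral
      model: mistral-large-latest
      description: Primary production target
      timeout_seconds: 60
      stream_timeout_seconds: 180
      max_context_tokens: 120000
      weight: 1.0
      pricing:
        prompt: 2.00       # illustrative USD per 1M input tokens
        completion: 6.00   # illustrative USD per 1M output tokens
      health_probe:
        enabled: true
        interval_seconds: 30
        timeout_seconds: 5
      secret_key_ref:
        env: MISTRAL_API_KEY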
Supported Models
| Model | Context Window | Strengths |
|---|---|---|
| mistral-large-latest | 128K | Flagship model — strongest reasoning, multilingual capability, and instruction following |
| mistral-medium-latest | 32K | Balanced quality-to-cost ratio for production workloads |
| mistral-small-latest | 32K | Fast and cost-effective for classification, extraction, and simpler tasks |
| open-mixtral-8x22b | 64K | Open-weight MoE architecture — strong general performance with efficient inference |
| codestral-latest | 32K | Purpose-built for code generation, completion, review, and explanation |
Any model available on the Mistral API can be used — set the model field to the model ID string. Keeptrusts passes the model identifier through to the upstream without validation, so new models are supported automatically as Mistral releases them.
Use -latest aliases (e.g. mistral-large-latest) during development to always get the newest version. Pin to a dated version (e.g. mistral-large-2407) in production when you need reproducible outputs.
Client Examples
Once the gateway is running, point your client SDK at http://localhost:8080/v1 instead of https://api.mistral.ai/v1. Clients send standard OpenAI-format requests — no Mistral-specific SDK is required.
- Python
- Node.js
- cURL
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="unused", # auth is handled by Keeptrusts via MISTRAL_API_KEY
)
response = client.chat.completions.create(
model="mistral-large-latest",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain the difference between CNN and RNN architectures."},
],
temperature=0.7,
max_tokens=512,
)
print(response.choices[0].message.content)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://localhost:8080/v1",
apiKey: "unused", // auth handled by Keeptrusts via MISTRAL_API_KEY
});
const response = await client.chat.completions.create({
model: "mistral-large-latest",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain the difference between CNN and RNN architectures." },
],
temperature: 0.7,
max_tokens: 512,
});
console.log(response.choices[0].message.content);
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mistral-large-latest",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain the difference between CNN and RNN architectures."}
],
"temperature": 0.7,
"max_tokens": 512
}'
Streaming
Keeptrusts fully supports Mistral's streaming mode. Set stream: true in your request — the gateway applies policies to each chunk in real time, including content filtering and PII redaction on partial tokens.
Configure stream_timeout_seconds to allow enough time for long-running streamed generations:
pack:
name: mistral-providers-3
version: 1.0.0
enabled: true
providers:
targets:
- id: mistral-streaming
provider: mistral
      model: mistral-large-latest
      stream_timeout_seconds: 180  # 3x the default 60s timeout for long generations
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
- Python
- Node.js
- cURL
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
stream = client.chat.completions.create(
model="mistral-large-latest",
messages=[{"role": "user", "content": "Write a short essay on EU AI regulation."}],
stream=True,
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://localhost:8080/v1",
apiKey: "unused",
});
const stream = await client.chat.completions.create({
model: "mistral-large-latest",
messages: [{ role: "user", content: "Write a short essay on EU AI regulation." }],
stream: true,
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) process.stdout.write(content);
}
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-N \
-d '{
"model": "mistral-large-latest",
"messages": [{"role": "user", "content": "Write a short essay on EU AI regulation."}],
"stream": true
}'
Advanced Configuration
Multi-Model Fallback
Automatically fail over from Mistral Large to a smaller, faster model when the primary target is unavailable or returns errors:
pack:
name: mistral-providers-4
version: 1.0.0
enabled: true
providers:
  strategy: fallback
targets:
- id: mistral-large-primary
provider: mistral
model: mistral-large-latest
secret_key_ref:
env: MISTRAL_API_KEY
- id: mistral-small-fallback
provider: mistral
model: mistral-small-latest
secret_key_ref:
env: MISTRAL_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
The gateway tries targets in order. If the first target fails (timeout, 5xx, connection error), the request is automatically retried against the next target. The client receives a single response — the failover is transparent.
Cross-Provider Fallback
Use Mistral as the primary provider with a different provider as a safety net. Because both use OpenAI wire format, the gateway handles this seamlessly:
pack:
name: mistral-providers-5
version: 1.0.0
enabled: true
providers:
  strategy: fallback
targets:
- id: mistral-primary
provider: mistral
model: mistral-large-latest
secret_key_ref:
env: MISTRAL_API_KEY
- id: openai-fallback
provider: openai
model: gpt-4o
secret_key_ref:
env: OPENAI_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
Latency-Based Routing
Route each request to the target with the lowest observed latency. Useful when running multiple Mistral model tiers and you want the fastest available response:
pack:
name: mistral-providers-6
version: 1.0.0
enabled: true
providers:
  strategy: latency
targets:
- id: mistral-large
provider: mistral
model: mistral-large-latest
secret_key_ref:
env: MISTRAL_API_KEY
- id: mistral-small
provider: mistral
model: mistral-small-latest
secret_key_ref:
env: MISTRAL_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
Weighted A/B Testing
Split traffic proportionally across model variants to compare quality, cost, or latency in production:
pack:
name: mistral-providers-7
version: 1.0.0
enabled: true
providers:
  strategy: weighted_round_robin
  targets:
    - id: variant-large
      provider: mistral
      model: mistral-large-latest
      weight: 0.5
      secret_key_ref:
        env: MISTRAL_API_KEY
    - id: variant-mixtral
      provider: mistral
      model: open-mixtral-8x22b
      weight: 0.5
      secret_key_ref:
        env: MISTRAL_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
Combine with audit-logger and the console Events dashboard to compare output quality across variants.
Circuit Breaker
Temporarily remove unhealthy targets from the rotation when they exceed an error threshold. The circuit breaker transitions through three states: closed (normal), open (target removed), and half-open (limited test traffic to check recovery):
pack:
name: mistral-providers-8
version: 1.0.0
enabled: true
providers:
targets:
- id: mistral-main
provider: mistral
model: mistral-large-latest
secret_key_ref:
env: MISTRAL_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
When the circuit opens, the gateway stops sending traffic to the failed target and routes to healthy alternatives. After recovery_timeout_seconds, it enters half-open state and sends a limited number of test requests. If those succeed, the circuit closes and normal traffic resumes.
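As a sketch, the corresponding settings might look like the following. recovery_timeout_seconds matches the recovery behavior described above; the other field names are assumptions for illustration, so check the provider routing reference for the exact schema:
providers:
  strategy: single
  circuit_breaker:
    enabled: true                  # assumed field name
    failure_threshold: 5           # assumed: consecutive failures before the circuit opens
    recovery_timeout_seconds: 30   # wait before entering half-open state
  targets:
    - id: mistral-main
      provider: mistral
      model: mistral-large-latest
      secret_key_ref:
        env: MISTRAL_API_KEY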
Retry Policy
Automatically retry transient failures with exponential backoff. This is applied before the fallback strategy, so a single target gets multiple attempts before the gateway moves to the next target:
pack:
name: mistral-providers-9
version: 1.0.0
enabled: true
providers:
targets:
- id: mistral-main
provider: mistral
model: mistral-large-latest
secret_key_ref:
env: MISTRAL_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
The 429 status code is particularly important for Mistral — it indicates rate limiting. The backoff gives the rate limiter time to reset before retrying.
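As a sketch, retry settings for this pattern might look like the following; the field names are assumptions for illustration rather than the gateway's confirmed schema:
providers:
  retry:
    max_attempts: 3                          # assumed field name
    backoff: exponential                     # assumed field name
    retry_on_status: [429, 500, 502, 503]    # 429 covers Mistral rate limiting
  targets:
    - id: mistral-main
      provider: mistral
      model: mistral-large-latest
      secret_key_ref:
        env: MISTRAL_API_KEY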
Code Generation with Codestral
Use Codestral for code-specific workloads. It is purpose-built for code generation, completion, review, and explanation:
pack:
name: mistral-providers-10
version: 1.0.0
enabled: true
providers:
targets:
- id: mistral-code
provider: mistral
model: codestral-latest
secret_key_ref:
env: MISTRAL_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
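An example Codestral request through the gateway, using the same OpenAI Python client setup as the earlier examples: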
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
response = client.chat.completions.create(
model="codestral-latest",
messages=[
{"role": "system", "content": "You are an expert programmer. Write clean, well-documented code."},
{"role": "user", "content": "Write a Python function to merge two sorted lists in O(n) time."},
],
    temperature=0.2,  # low temperature (0.1-0.3) for more deterministic code output
)
print(response.choices[0].message.content)
Combined Resilience Configuration
A production-grade setup combining circuit breaker, retry, health probes, and fallback for maximum uptime:
pack:
name: mistral-providers-11
version: 1.0.0
enabled: true
providers:
  strategy: fallback
  # layer in the circuit-breaker and retry settings from the sections above
  targets:
    - id: mistral-large-primary
      provider: mistral
      model: mistral-large-latest
      health_probe:
        enabled: true        # probe intervals illustrative
        interval_seconds: 30
        timeout_seconds: 5
      secret_key_ref:
        env: MISTRAL_API_KEY
    - id: mistral-small-secondary
      provider: mistral
      model: mistral-small-latest
      health_probe:
        enabled: true
        interval_seconds: 30
        timeout_seconds: 5
      secret_key_ref:
        env: MISTRAL_API_KEY
    - id: openai-emergency
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
Best Practices
- Mistral is OpenAI-compatible — no format translation is needed. Use any OpenAI SDK client (Python, Node.js, Go, etc.) without code changes.
- Use -latest model aliases during development — Mistral regularly updates models behind the -latest alias, so you always get improvements. Pin to a specific dated version (e.g. mistral-large-2407) in production when you need reproducible outputs.
- Use Codestral for code tasks — codestral-latest is specifically trained for code generation, completion, and explanation. Use a lower temperature (0.1–0.3) for deterministic code output.
- Leverage multilingual strength — Mistral models perform strongly across European languages (French, German, Spanish, Italian), making them a good choice for multilingual deployments and EU compliance workloads.
- Enable health probes on production targets — active probes let routing strategies (fallback, latency) react to API outages within seconds rather than waiting for a request to fail.
- Combine circuit breaker with fallback — the circuit breaker prevents cascading failures by removing unhealthy targets, while fallback ensures requests are still served by healthy alternatives.
- Set stream_timeout_seconds for streaming — streaming responses can take significantly longer than non-streaming ones. Set this to 2–3× your timeout_seconds value to avoid premature timeouts on long generations.
- Track costs with pricing metadata — set the pricing field on each target, even if the numbers are approximate, so the console cost dashboard can display per-model breakdowns and enforce per-request budgets.
- Prefer the fallback strategy for critical workloads; pair Mistral with a second provider for resilience.
- Separate API keys per environment — use distinct secret_key_ref values for dev, staging, and production.
For AI systems
- Canonical terms: Keeptrusts gateway, Mistral AI, Mistral, Mistral Large, Mistral Small, Codestral, Pixtral, provider target, policy-config.yaml, provider: "mistral".
- Config field names: provider, model, base_url: "https://api.mistral.ai/v1", secret_key_ref.env: "MISTRAL_API_KEY", format: "openai", stream_timeout_seconds, pricing.
- Provider shorthand: mistral:chat:<model> (e.g., mistral:chat:mistral-large-latest).
- Key behavior: Mistral uses an OpenAI-compatible API with function calling and JSON mode support.
- Best next pages: Anthropic integration, OpenAI integration, Provider routing.
For engineers
- Prerequisites: Mistral API key (MISTRAL_API_KEY env var from console.mistral.ai), kt CLI installed.
- Start command: kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml.
- Validate: curl http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"mistral-large-latest","messages":[{"role":"user","content":"hello"}]}'.
- Mistral uses an OpenAI-compatible API — standard OpenAI SDKs work without modification.
- Use separate secret_key_ref values for dev, staging, and production environments.
- Set stream_timeout_seconds for streaming workloads to accommodate longer generations.
For leaders
- Mistral AI is EU-headquartered and offers EU-hosted inference — relevant for GDPR and EU AI Act compliance.
- Multilingual strength across European languages makes Mistral suitable for pan-European deployments.
- Codestral provides dedicated code generation capabilities; Pixtral adds vision — apply different policies per model capability.
- Function calling support enables agentic workloads — pair with Keeptrusts prompt-injection policies for agent governance.
Next steps
- Anthropic integration — alternative high-quality reasoning models
- OpenAI integration — compare with GPT-4o
- Provider routing strategies — EU-preferred routing with global fallback
- Policy configuration — prompt-injection and PII policy reference
- Quickstart — install kt and run your first gateway