Mistral AI
Mistral AI builds high-performance large language models with strong multilingual capabilities and efficient mixture-of-experts (MoE) architectures. Their inference API is fully OpenAI-compatible, so Keeptrusts needs no format translation — requests and responses flow through the gateway in native OpenAI wire format, and any OpenAI SDK client can be pointed at the gateway with zero code changes.
Use this page when
- You need the exact command, config, API, or integration details for Mistral AI.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Keeptrusts sits between your application and Mistral's API endpoint, enforcing policy chains — prompt-injection detection, PII redaction, safety filters, content-quality scoring, audit logging — on every request and response without requiring application-side changes.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Prerequisites
- Mistral API key — obtain one from La Plateforme.
- Keeptrusts CLI — install kt (see the quickstart guide).
- Export your API key so the gateway can read it at startup:
export MISTRAL_API_KEY="your-mistral-api-key"
When the provider field is set to "mistral", Keeptrusts auto-detects both the base URL (https://api.mistral.ai/v1) and the API key environment variable (MISTRAL_API_KEY). You only need to override these if you use a custom deployment, a self-hosted Mistral endpoint, or a non-standard env-var name.
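If you do need to point at a self-hosted or VPC endpoint, override both fields explicitly. A minimal sketch (policy chain omitted); the endpoint URL and environment-variable name here are placeholders for your own deployment:
pack:
  name: mistral-self-hosted
  version: 1.0.0
  enabled: true
providers:
  strategy: single
  targets:
    - id: mistral-vpc
      provider: mistral
      model: mistral-large-latest
      base_url: https://mistral.internal.example.com/v1  # placeholder endpoint
      secret_key_ref:
        env: MISTRAL_VPC_API_KEY  # placeholder env-var name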
Configuration
A minimal policy-config.yaml that routes traffic through Mistral with prompt-injection, PII, and safety policies:
pack:
name: mistral-gateway
version: 1.0.0
enabled: true
policies:
chain:
- prompt-injection
- pii-detector
- safety-filter
- audit-logger
policy:
prompt-injection:
threshold: 0.8
action: block
pii-detector:
action: redact
safety-filter:
mode: strict
action: block
audit-logger:
retention_days: 365
providers:
strategy: single
targets:
- id: mistral-large
provider: mistral
model: mistral-large-latest
base_url: https://api.mistral.ai/v1
secret_key_ref:
env: MISTRAL_API_KEY
Start the gateway:
kt gateway run \
  --listen 0.0.0.0:8080 \
--policy-config policy-config.yaml
Compact Provider Shorthand
You can encode the model directly in the provider field. The two forms below are equivalent:
# Shorthand — model embedded in the provider string
- id: "mistral-large"
provider: "mistral:chat:mistral-large-latest"
# Explicit — separate provider and model fields
- id: "mistral-large"
provider: "mistral"
model: "mistral-large-latest"
The shorthand form is convenient for quick configurations. The explicit form is recommended when you need to set additional fields like pricing or health_probe.
Provider Fields
All fields available on a providers.targets[] entry for Mistral AI:
| Field | Type | Default | Description |
|---|---|---|---|
| id | string | required | Unique identifier for this target. Used in logs, the console dashboard, and routing decisions. |
| provider | string | required | Provider ID. Use "mistral" or the shorthand "mistral:chat:<model>". |
| model | string | required | Model name, e.g. "mistral-large-latest". Passed through to the upstream API as-is. |
| base_url | string | https://api.mistral.ai/v1 | API base URL. Auto-detected when provider is "mistral". Override for self-hosted or VPC endpoints. |
| secret_key_ref | object | MISTRAL_API_KEY | Object reference to the environment variable holding the API key. Auto-detected for the "mistral" provider. |
| timeout_seconds | integer | 60 | Maximum wall-clock time for non-streaming requests before the gateway returns a timeout error. |
| stream_timeout_seconds | integer | inherits timeout_seconds | Maximum wall-clock time for streaming requests. Falls back to timeout_seconds if not set. Set this higher than timeout_seconds for long-running streamed generations. |
| max_context_tokens | integer | none | Maximum token budget for a request (prompt + completion). When set, the gateway rejects requests that exceed this limit before forwarding them upstream. |
| format | string | "openai" | Wire format. Mistral is natively OpenAI-compatible, so this is always "openai". |
| provider_type | string | auto | Explicit provider-type override. Rarely needed — auto-detection handles Mistral correctly. |
| description | string | none | Human-readable label shown in the console dashboard, logs, and health-check output. |
| weight | float | 1.0 | Routing weight used by the weighted_round_robin strategy. Higher values receive proportionally more traffic. |
| pricing | object | none | Token pricing in USD per 1M tokens. Fields: prompt (input cost), completion (output cost). Displayed in the console cost dashboard. |
| health_probe | object | none | Active health-probe configuration. Sub-fields: enabled (bool), interval_seconds (int), timeout_seconds (int). When enabled, the gateway periodically sends lightweight requests to verify the target is reachable. |
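For reference, here is a target that exercises the optional fields above. All field names come from the table; the pricing numbers and probe intervals are illustrative placeholders, not official Mistral rates:
providers:
  strategy: single
  targets:
    - id: mistral-large-prod
      provider: mistral
      model: mistral-large-latest
      description: Primary production target
      timeout_seconds: 60
      stream_timeout_seconds: 180
      max_context_tokens: 120000
      weight: 1.0
      pricing:
        prompt: 2.00       # illustrative USD per 1M input tokens
        completion: 6.00   # illustrative USD per 1M output tokens
      health_probe:
        enabled: true
        interval_seconds: 30
        timeout_seconds: 5
      secret_key_ref:
        env: MISTRAL_API_KEY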
Supported Models
| Model | Context Window | Strengths |
|---|---|---|
| mistral-large-latest | 128K | Flagship model — strongest reasoning, multilingual capability, and instruction following |
| mistral-medium-latest | 32K | Balanced quality-to-cost ratio for production workloads |
| mistral-small-latest | 32K | Fast and cost-effective for classification, extraction, and simpler tasks |
| open-mixtral-8x22b | 64K | Open-weight MoE architecture — strong general performance with efficient inference |
| codestral-latest | 32K | Purpose-built for code generation, completion, review, and explanation |
Any model available on the Mistral API can be used — set the model field to the model ID string. Keeptrusts passes the model identifier through to the upstream without validation, so new models are supported automatically as Mistral releases them.
Use -latest aliases (e.g. mistral-large-latest) during development to always get the newest version. Pin to a dated version (e.g. mistral-large-2407) in production when you need reproducible outputs.
Client Examples
Once the gateway is running, point your client SDK at http://localhost:8080/v1 instead of https://api.mistral.ai/v1. Clients send standard OpenAI-format requests — no Mistral-specific SDK is required.
- Python
- Node.js
- cURL
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="unused", # auth is handled by Keeptrusts via MISTRAL_API_KEY
)
response = client.chat.completions.create(
model="mistral-large-latest",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain the difference between CNN and RNN architectures."},
],
temperature=0.7,
max_tokens=512,
)
print(response.choices[0].message.content)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://localhost:8080/v1",
apiKey: "unused", // auth handled by Keeptrusts via MISTRAL_API_KEY
});
const response = await client.chat.completions.create({
model: "mistral-large-latest",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain the difference between CNN and RNN architectures." },
],
temperature: 0.7,
max_tokens: 512,
});
console.log(response.choices[0].message.content);
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mistral-large-latest",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain the difference between CNN and RNN architectures."}
],
"temperature": 0.7,
"max_tokens": 512
}'
Streaming
Keeptrusts fully supports Mistral's streaming mode. Set stream: true in your request — the gateway applies policies to each chunk in real time, including content filtering and PII redaction on partial tokens.
Configure stream_timeout_seconds to allow enough time for long-running streamed generations:
pack:
name: mistral-providers-3
version: 1.0.0
enabled: true
providers:
targets:
- id: mistral-streaming
provider: mistral
      model: mistral-large-latest
      stream_timeout_seconds: 180  # 3x the default 60s timeout for long generations
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
- Python
- Node.js
- cURL
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
stream = client.chat.completions.create(
model="mistral-large-latest",
messages=[{"role": "user", "content": "Write a short essay on EU AI regulation."}],
stream=True,
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://localhost:8080/v1",
apiKey: "unused",
});
const stream = await client.chat.completions.create({
model: "mistral-large-latest",
messages: [{ role: "user", content: "Write a short essay on EU AI regulation." }],
stream: true,
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) process.stdout.write(content);
}
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-N \
-d '{
"model": "mistral-large-latest",
"messages": [{"role": "user", "content": "Write a short essay on EU AI regulation."}],
"stream": true
}'
Advanced Configuration
Multi-Model Fallback
Automatically fail over from Mistral Large to a smaller, faster model when the primary target is unavailable or returns errors:
pack:
name: mistral-providers-4
version: 1.0.0
enabled: true
providers:
  strategy: fallback
targets:
- id: mistral-large-primary
provider: mistral
model: mistral-large-latest
secret_key_ref:
env: MISTRAL_API_KEY
- id: mistral-small-fallback
provider: mistral
model: mistral-small-latest
secret_key_ref:
env: MISTRAL_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
The gateway tries targets in order. If the first target fails (timeout, 5xx, connection error), the request is automatically retried against the next target. The client receives a single response — the failover is transparent.
Cross-Provider Fallback
Use Mistral as the primary provider with a different provider as a safety net. Because both use OpenAI wire format, the gateway handles this seamlessly:
pack:
name: mistral-providers-5
version: 1.0.0
enabled: true
providers:
  strategy: fallback
targets:
- id: mistral-primary
provider: mistral
model: mistral-large-latest
secret_key_ref:
env: MISTRAL_API_KEY
- id: openai-fallback
provider: openai
model: gpt-4o
secret_key_ref:
env: OPENAI_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
Latency-Based Routing
Route each request to the target with the lowest observed latency. Useful when running multiple Mistral model tiers and you want the fastest available response:
pack:
name: mistral-providers-6
version: 1.0.0
enabled: true
providers:
  strategy: latency
targets:
- id: mistral-large
provider: mistral
model: mistral-large-latest
secret_key_ref:
env: MISTRAL_API_KEY
- id: mistral-small
provider: mistral
model: mistral-small-latest
secret_key_ref:
env: MISTRAL_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
Weighted A/B Testing
Split traffic proportionally across model variants to compare quality, cost, or latency in production:
pack:
name: mistral-providers-7
version: 1.0.0
enabled: true
providers:
  strategy: weighted_round_robin
  targets:
    - id: variant-large
      provider: mistral
      model: mistral-large-latest
      weight: 0.5
      secret_key_ref:
        env: MISTRAL_API_KEY
    - id: variant-mixtral
      provider: mistral
      model: open-mixtral-8x22b
      weight: 0.5
      secret_key_ref:
        env: MISTRAL_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
Combine with audit-logger and the console Events dashboard to compare output quality across variants.
Circuit Breaker
Temporarily remove unhealthy targets from the rotation when they exceed an error threshold. The circuit breaker transitions through three states: closed (normal), open (target removed), and half-open (limited test traffic to check recovery):
pack:
name: mistral-providers-8
version: 1.0.0
enabled: true
providers:
targets:
- id: mistral-main
provider: mistral
model: mistral-large-latest
secret_key_ref:
env: MISTRAL_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
When the circuit opens, the gateway stops sending traffic to the failed target and routes to healthy alternatives. After recovery_timeout_seconds, it enters half-open state and sends a limited number of test requests. If those succeed, the circuit closes and normal traffic resumes.
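As a sketch, the corresponding settings might look like the following. recovery_timeout_seconds matches the recovery behavior described above; the other field names are assumptions for illustration, so check the provider routing reference for the exact schema:
providers:
  strategy: single
  circuit_breaker:
    enabled: true                  # assumed field name
    failure_threshold: 5           # assumed: consecutive failures before the circuit opens
    recovery_timeout_seconds: 30   # wait before entering half-open state
  targets:
    - id: mistral-main
      provider: mistral
      model: mistral-large-latest
      secret_key_ref:
        env: MISTRAL_API_KEY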
Retry Policy
Automatically retry transient failures with exponential backoff. This is applied before the fallback strategy, so a single target gets multiple attempts before the gateway moves to the next target:
pack:
name: mistral-providers-9
version: 1.0.0
enabled: true
providers:
targets:
- id: mistral-main
provider: mistral
model: mistral-large-latest
secret_key_ref:
env: MISTRAL_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
The 429 status code is particularly important for Mistral — it indicates rate limiting. The backoff gives the rate limiter time to reset before retrying.
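As a sketch, retry settings for this pattern might look like the following; the field names are assumptions for illustration rather than the gateway's confirmed schema:
providers:
  retry:
    max_attempts: 3                          # assumed field name
    backoff: exponential                     # assumed field name
    retry_on_status: [429, 500, 502, 503]    # 429 covers Mistral rate limiting
  targets:
    - id: mistral-main
      provider: mistral
      model: mistral-large-latest
      secret_key_ref:
        env: MISTRAL_API_KEY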
Code Generation with Codestral
Use Codestral for code-specific workloads. It is purpose-built for code generation, completion, review, and explanation:
pack:
name: mistral-providers-10
version: 1.0.0
enabled: true
providers:
targets:
- id: mistral-code
provider: mistral
model: codestral-latest
secret_key_ref:
env: MISTRAL_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
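An example Codestral request through the gateway, using the same OpenAI Python client setup as the earlier examples: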
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
response = client.chat.completions.create(
model="codestral-latest",
messages=[
{"role": "system", "content": "You are an expert programmer. Write clean, well-documented code."},
{"role": "user", "content": "Write a Python function to merge two sorted lists in O(n) time."},
],
    temperature=0.2,  # low temperature (0.1-0.3) for more deterministic code output
)
print(response.choices[0].message.content)
Combined Resilience Configuration
A production-grade setup combining circuit breaker, retry, health probes, and fallback for maximum uptime:
pack:
name: mistral-providers-11
version: 1.0.0
enabled: true
providers:
  strategy: fallback
  # layer in the circuit-breaker and retry settings from the sections above
  targets:
    - id: mistral-large-primary
      provider: mistral
      model: mistral-large-latest
      health_probe:
        enabled: true        # probe intervals illustrative
        interval_seconds: 30
        timeout_seconds: 5
      secret_key_ref:
        env: MISTRAL_API_KEY
    - id: mistral-small-secondary
      provider: mistral
      model: mistral-small-latest
      health_probe:
        enabled: true
        interval_seconds: 30
        timeout_seconds: 5
      secret_key_ref:
        env: MISTRAL_API_KEY
    - id: openai-emergency
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
policies:
chain:
- audit-logger
policy:
audit-logger:
immutable: true
retention_days: 365
log_all_access: true
Best Practices
- Mistral is OpenAI-compatible — no format translation is needed. Use any OpenAI SDK client (Python, Node.js, Go, etc.) without code changes.
- Use -latest model aliases during development — Mistral regularly updates models behind the -latest alias, so you always get improvements. Pin to a specific dated version (e.g. mistral-large-2407) in production when you need reproducible outputs.
- Use Codestral for code tasks — codestral-latest is specifically trained for code generation, completion, and explanation. Use a lower temperature (0.1–0.3) for deterministic code output.
- Leverage multilingual strength — Mistral models perform strongly across European languages (French, German, Spanish, Italian), making them a good choice for multilingual deployments and EU compliance workloads.
- Enable health probes on production targets — active probes let routing strategies (fallback, latency) react to API outages within seconds rather than waiting for a request to fail.
- Combine circuit breaker with fallback — the circuit breaker prevents cascading failures by removing unhealthy targets, while fallback ensures requests are still served by healthy alternatives.
- Set stream_timeout_seconds for streaming — streaming responses can take significantly longer than non-streaming ones. Set this to 2–3× your timeout_seconds value to avoid premature timeouts on long generations.
- Track costs with pricing metadata — set the pricing field on each target, even if the numbers are approximate, so the console cost dashboard can display per-model breakdowns and enforce per-request budgets.
- Prefer the fallback strategy for critical workloads; pair Mistral with a second provider for resilience.
- Separate API keys per environment — use distinct secret_key_ref values for dev, staging, and production.
For AI systems
- Canonical terms: Keeptrusts gateway, Mistral AI, Mistral, Mistral Large, Mistral Small, Codestral, Pixtral, provider target, policy-config.yaml, provider: "mistral".
- Config field names: provider, model, base_url: "https://api.mistral.ai/v1", secret_key_ref.env: "MISTRAL_API_KEY", format: "openai", stream_timeout_seconds, pricing.
- Provider shorthand: mistral:chat:<model> (e.g., mistral:chat:mistral-large-latest).
- Key behavior: Mistral uses an OpenAI-compatible API with function calling and JSON mode support.
- Best next pages: Anthropic integration, OpenAI integration, Provider routing.
For engineers
- Prerequisites: Mistral API key (MISTRAL_API_KEY env var from console.mistral.ai), kt CLI installed.
- Start command: kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml.
- Validate: curl http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"mistral-large-latest","messages":[{"role":"user","content":"hello"}]}'.
- Mistral uses an OpenAI-compatible API — standard OpenAI SDKs work without modification.
- Use separate secret_key_ref values for dev, staging, and production environments.
- Set stream_timeout_seconds for streaming workloads to accommodate longer generations.
For leaders
- Mistral AI is EU-headquartered and offers EU-hosted inference — relevant for GDPR and EU AI Act compliance.
- Multilingual strength across European languages makes Mistral suitable for pan-European deployments.
- Codestral provides dedicated code generation capabilities; Pixtral adds vision — apply different policies per model capability.
- Function calling support enables agentic workloads — pair with Keeptrusts prompt-injection policies for agent governance.
Next steps
- Anthropic integration — alternative high-quality reasoning models
- OpenAI integration — compare with GPT-4o
- Provider routing strategies — EU-preferred routing with global fallback
- Policy configuration — prompt-injection and PII policy reference
- Quickstart — install kt and run your first gateway