Mistral AI

Mistral AI builds high-performance large language models with strong multilingual capabilities and efficient mixture-of-experts (MoE) architectures. Their inference API is fully OpenAI-compatible, so Keeptrusts needs no format translation — requests and responses flow through the gateway in native OpenAI wire format, and any OpenAI SDK client can be pointed at the gateway with zero code changes.

Use this page when

  • You need the exact command, config, API, or integration details for Mistral AI.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • You want a guided rollout instead of a reference page: use the workflow pages linked under Next steps.

Keeptrusts sits between your application and Mistral's API endpoint, enforcing policy chains — prompt-injection detection, PII redaction, safety filters, content-quality scoring, audit logging — on every request and response without requiring application-side changes.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Prerequisites

  1. Mistral API key — obtain one from La Plateforme.
  2. Keeptrusts CLI — install kt (quickstart guide).
  3. Export your API key so the gateway can read it at startup:
export MISTRAL_API_KEY="your-mistral-api-key"

When the provider field is set to "mistral", Keeptrusts auto-detects both the base URL (https://api.mistral.ai/v1) and the API key environment variable (MISTRAL_API_KEY). You only need to override these if you use a custom deployment, a self-hosted Mistral endpoint, or a non-standard env-var name.
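For example, a self-hosted deployment might set both overrides explicitly. This is a sketch; the base_url host and env-var name below are placeholders, not real endpoints:

```yaml
providers:
  targets:
    - id: mistral-selfhosted
      provider: mistral
      model: mistral-large-latest
      base_url: https://mistral.internal.example.com/v1  # placeholder self-hosted endpoint
      secret_key_ref:
        env: INTERNAL_MISTRAL_KEY  # placeholder non-standard env-var name
```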

Configuration

A minimal policy-config.yaml that routes traffic through Mistral with prompt-injection, PII, and safety policies:

pack:
  name: mistral-gateway
  version: 1.0.0
  enabled: true

policies:
  chain:
    - prompt-injection
    - pii-detector
    - safety-filter
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
    safety-filter:
      mode: strict
      action: block
    audit-logger:
      retention_days: 365

providers:
  strategy: single
  targets:
    - id: mistral-large
      provider: mistral
      model: mistral-large-latest
      base_url: https://api.mistral.ai/v1
      secret_key_ref:
        env: MISTRAL_API_KEY

Start the gateway:

kt gateway run \
  --listen 0.0.0.0:8080 \
  --policy-config policy-config.yaml

Compact Provider Shorthand

You can encode the model directly in the provider field. The two forms below are equivalent:

# Shorthand — model embedded in the provider string
- id: "mistral-large"
  provider: "mistral:chat:mistral-large-latest"

# Explicit — separate provider and model fields
- id: "mistral-large"
  provider: "mistral"
  model: "mistral-large-latest"

The shorthand form is convenient for quick configurations. The explicit form is recommended when you need to set additional fields like pricing or health_probe.
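Conceptually, the shorthand is a colon-separated triple of provider, API kind, and model. A minimal sketch of how such a string decomposes (illustrative only, not the gateway's actual parser):

```python
def parse_provider_shorthand(value):
    """Split 'provider' or 'provider:kind:model' into (provider, model).

    Illustrative only; not Keeptrusts' actual parser.
    """
    parts = value.split(":", 2)
    if len(parts) == 1:
        return parts[0], None  # explicit form: model is set in its own field
    provider, _kind, model = parts  # shorthand form: mistral:chat:<model>
    return provider, model

print(parse_provider_shorthand("mistral:chat:mistral-large-latest"))  # ('mistral', 'mistral-large-latest')
print(parse_provider_shorthand("mistral"))                            # ('mistral', None)
```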

Provider Fields

All fields available on a providers.targets[] entry for Mistral AI:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| id | string | required | Unique identifier for this target. Used in logs, the console dashboard, and routing decisions. |
| provider | string | required | Provider ID. Use "mistral" or the shorthand "mistral:chat:<model>". |
| model | string | required | Model name, e.g. "mistral-large-latest". Passed through to the upstream API as-is. |
| base_url | string | https://api.mistral.ai/v1 | API base URL. Auto-detected when provider is "mistral". Override for self-hosted or VPC endpoints. |
| secret_key_ref | object | MISTRAL_API_KEY | Object reference to the environment variable holding the API key. Auto-detected for the "mistral" provider. |
| timeout_seconds | integer | 60 | Maximum wall-clock time for non-streaming requests before the gateway returns a timeout error. |
| stream_timeout_seconds | integer | inherits timeout_seconds | Maximum wall-clock time for streaming requests. Falls back to timeout_seconds if not set. Set this higher than timeout_seconds for long-running streamed generations. |
| max_context_tokens | integer | none | Maximum token budget for the request (prompt + completion). When set, the gateway rejects requests that exceed this limit before forwarding to the upstream. |
| format | string | "openai" | Wire format. Mistral is natively OpenAI-compatible, so this is always "openai". |
| provider_type | string | auto | Explicit provider-type override. Rarely needed — auto-detection handles Mistral correctly. |
| description | string | none | Human-readable label shown in the console dashboard, logs, and health-check output. |
| weight | float | 1.0 | Routing weight used by the weighted_round_robin strategy. Higher values receive proportionally more traffic. |
| pricing | object | none | Token pricing in USD per 1M tokens. Fields: prompt (input cost), completion (output cost). Displayed in the console cost dashboard. |
| health_probe | object | none | Active health probe configuration. Sub-fields: enabled (bool), interval_seconds (int), timeout_seconds (int). When enabled, the gateway periodically sends lightweight requests to verify the target is reachable. |
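To make the max_context_tokens behavior concrete, here is a rough client-side sketch of the same budget check. The 4-characters-per-token heuristic is an assumption for illustration, not the gateway's tokenizer:

```python
def estimate_tokens(messages, chars_per_token=4):
    """Very rough size estimate: total characters divided by chars_per_token.

    A real gateway would use a proper tokenizer; this heuristic is only
    for illustration.
    """
    return sum(len(m["content"]) for m in messages) // chars_per_token

def within_budget(messages, max_completion_tokens, max_context_tokens):
    """Mirror the pre-flight idea: prompt estimate + completion must fit the budget."""
    return estimate_tokens(messages) + max_completion_tokens <= max_context_tokens

msgs = [{"role": "user", "content": "Explain MoE architectures."}]
print(within_budget(msgs, max_completion_tokens=512, max_context_tokens=4096))  # True
```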

Supported Models

| Model | Context Window | Strengths |
| --- | --- | --- |
| mistral-large-latest | 128K | Flagship model — strongest reasoning, multilingual, and instruction following |
| mistral-medium-latest | 32K | Balanced quality-to-cost ratio for production workloads |
| mistral-small-latest | 32K | Fast and cost-effective for classification, extraction, and simpler tasks |
| open-mixtral-8x22b | 64K | Open-weight MoE architecture — strong general performance with efficient inference |
| codestral-latest | 32K | Purpose-built for code generation, completion, review, and explanation |

Any model available on the Mistral API can be used — set the model field to the model ID string. Keeptrusts passes the model identifier through to the upstream without validation, so new models are supported automatically as Mistral releases them.

Use -latest aliases (e.g. mistral-large-latest) during development to always get the newest version. Pin to a dated version (e.g. mistral-large-2407) in production when you need reproducible outputs.

Client Examples

Once the gateway is running, point your client SDK at http://localhost:8080/v1 instead of https://api.mistral.ai/v1. Clients send standard OpenAI-format requests — no Mistral-specific SDK is required.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # auth is handled by Keeptrusts via MISTRAL_API_KEY
)

response = client.chat.completions.create(
    model="mistral-large-latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the difference between CNN and RNN architectures."},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)

Streaming

Keeptrusts fully supports Mistral's streaming mode. Set stream: true in your request — the gateway applies policies to each chunk in real time, including content filtering and PII redaction on partial tokens.

Configure stream_timeout_seconds to allow enough time for long-running streamed generations:

pack:
  name: mistral-providers-3
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: mistral-streaming
      provider: mistral
      model: mistral-large-latest
      stream_timeout_seconds: 180  # 3× the default timeout_seconds of 60

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

stream = client.chat.completions.create(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Write a short essay on EU AI regulation."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Advanced Configuration

Multi-Model Fallback

Automatically fail over from Mistral Large to a smaller, faster model when the primary target is unavailable or returns errors:

pack:
  name: mistral-providers-4
  version: 1.0.0
  enabled: true

providers:
  strategy: fallback
  targets:
    - id: mistral-large-primary
      provider: mistral
      model: mistral-large-latest
      secret_key_ref:
        env: MISTRAL_API_KEY
    - id: mistral-small-fallback
      provider: mistral
      model: mistral-small-latest
      secret_key_ref:
        env: MISTRAL_API_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

The gateway tries targets in order. If the first target fails (timeout, 5xx, connection error), the request is automatically retried against the next target. The client receives a single response — the failover is transparent.
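The ordered-retry behavior can be sketched as a small loop; `call_target` and `TargetError` are hypothetical stand-ins for the upstream request and its failure modes, not gateway internals:

```python
class TargetError(Exception):
    """Stands in for a timeout, 5xx response, or connection error."""

def send_with_fallback(targets, request, call_target):
    """Try each target in order and return the first successful response.

    `call_target(target, request)` is a hypothetical callable that either
    returns a response or raises TargetError; the caller sees one response
    or one final error, so the failover is transparent.
    """
    last_error = None
    for target in targets:
        try:
            return call_target(target, request)
        except TargetError as exc:
            last_error = exc  # note the failure and move to the next target
    raise last_error

# Usage: the primary fails, the fallback answers.
def fake_call(target, request):
    if target == "mistral-large-primary":
        raise TargetError("upstream 503")
    return {"target": target, "ok": True}

print(send_with_fallback(["mistral-large-primary", "mistral-small-fallback"], {}, fake_call))
```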

Cross-Provider Fallback

Use Mistral as the primary provider with a different provider as a safety net. Because both use OpenAI wire format, the gateway handles this seamlessly:

pack:
  name: mistral-providers-5
  version: 1.0.0
  enabled: true

providers:
  strategy: fallback
  targets:
    - id: mistral-primary
      provider: mistral
      model: mistral-large-latest
      secret_key_ref:
        env: MISTRAL_API_KEY
    - id: openai-fallback
      provider: openai
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Latency-Based Routing

Route each request to the target with the lowest observed latency. Useful when running multiple Mistral model tiers and you want the fastest available response:

pack:
  name: mistral-providers-6
  version: 1.0.0
  enabled: true

providers:
  strategy: latency
  targets:
    - id: mistral-large
      provider: mistral
      model: mistral-large-latest
      secret_key_ref:
        env: MISTRAL_API_KEY
    - id: mistral-small
      provider: mistral
      model: mistral-small-latest
      secret_key_ref:
        env: MISTRAL_API_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Weighted A/B Testing

Split traffic proportionally across model variants to compare quality, cost, or latency in production:

pack:
  name: mistral-providers-7
  version: 1.0.0
  enabled: true

providers:
  strategy: weighted_round_robin
  targets:
    - id: variant-large
      provider: mistral
      model: mistral-large-latest
      weight: 0.8  # 80% of traffic
      secret_key_ref:
        env: MISTRAL_API_KEY
    - id: variant-mixtral
      provider: mistral
      model: open-mixtral-8x22b
      weight: 0.2  # 20% of traffic
      secret_key_ref:
        env: MISTRAL_API_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Combine with audit-logger and the console Events dashboard to compare output quality across variants.
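Under the hood, a weighted split amounts to weighted random selection over the targets. A minimal sketch, not the gateway's actual scheduler:

```python
import random

def pick_target(targets, rng=random):
    """Pick a target id with probability proportional to its weight."""
    ids = [t["id"] for t in targets]
    weights = [t.get("weight", 1.0) for t in targets]  # default weight is 1.0
    return rng.choices(ids, weights=weights, k=1)[0]

targets = [
    {"id": "variant-large", "weight": 0.8},    # ~80% of traffic
    {"id": "variant-mixtral", "weight": 0.2},  # ~20% of traffic
]

counts = {"variant-large": 0, "variant-mixtral": 0}
for _ in range(10_000):
    counts[pick_target(targets)] += 1
print(counts)  # roughly {'variant-large': 8000, 'variant-mixtral': 2000}
```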

Circuit Breaker

Temporarily remove unhealthy targets from the rotation when they exceed an error threshold. The circuit breaker transitions through three states: closed (normal), open (target removed), and half-open (limited test traffic to check recovery):

pack:
  name: mistral-providers-8
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: mistral-main
      provider: mistral
      model: mistral-large-latest
      secret_key_ref:
        env: MISTRAL_API_KEY
      circuit_breaker:
        failure_threshold: 5          # illustrative: consecutive errors before the circuit opens
        recovery_timeout_seconds: 30  # wait before sending half-open test traffic

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

When the circuit opens, the gateway stops sending traffic to the failed target and routes to healthy alternatives. After recovery_timeout_seconds, it enters half-open state and sends a limited number of test requests. If those succeed, the circuit closes and normal traffic resumes.
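The three-state machine can be sketched as follows; the class and the failure_threshold name are illustrative assumptions, with only the recovery-timeout behavior taken from the description above:

```python
import time

class CircuitBreaker:
    """Minimal closed -> open -> half-open state machine (illustrative sketch,
    not Keeptrusts' implementation)."""

    def __init__(self, failure_threshold=5, recovery_timeout_seconds=30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout_seconds = recovery_timeout_seconds
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self, now=None):
        """Return True if a request may be sent to this target."""
        now = time.monotonic() if now is None else now
        if self.state == "open" and now - self.opened_at >= self.recovery_timeout_seconds:
            self.state = "half-open"  # allow limited test traffic
        return self.state != "open"

    def record_success(self):
        self.failures = 0
        self.state = "closed"  # test request succeeded: resume normal traffic

    def record_failure(self, now=None):
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"  # remove the target from rotation
            self.opened_at = time.monotonic() if now is None else now
```

A routing loop would call allow_request before each attempt, then record_success or record_failure based on the outcome.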

Retry Policy

Automatically retry transient failures with exponential backoff. This is applied before the fallback strategy, so a single target gets multiple attempts before the gateway moves to the next target:

pack:
  name: mistral-providers-9
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: mistral-main
      provider: mistral
      model: mistral-large-latest
      secret_key_ref:
        env: MISTRAL_API_KEY
      retry:
        max_attempts: 3  # illustrative field names and values
        backoff: exponential
        retry_on_status: [429, 500, 502, 503]

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

The 429 status code is particularly important for Mistral — it indicates rate limiting. The backoff gives the rate limiter time to reset before retrying.
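A client-side retry loop with exponential backoff might look like this; the function names and delay values are illustrative assumptions (the gateway applies its own retry server-side):

```python
import time

def backoff_delays(base=0.5, factor=2.0, max_attempts=4):
    """Exponential backoff schedule: base, base*factor, base*factor^2, ..."""
    return [base * factor**i for i in range(max_attempts)]

def retry_on_429(send, max_attempts=4, base=0.5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    """
    for delay in backoff_delays(base=base, max_attempts=max_attempts):
        status, body = send()
        if status != 429:  # success or a non-retryable error: stop retrying
            return status, body
        sleep(delay)       # give the rate limiter time to reset
    return send()          # final attempt after the last backoff

print(backoff_delays())  # [0.5, 1.0, 2.0, 4.0]
```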

Code Generation with Codestral

Use Codestral for code-specific workloads. It is purpose-built for code generation, completion, review, and explanation:

pack:
  name: mistral-providers-10
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: mistral-code
      provider: mistral
      model: codestral-latest
      secret_key_ref:
        env: MISTRAL_API_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

response = client.chat.completions.create(
    model="codestral-latest",
    messages=[
        {"role": "system", "content": "You are an expert programmer. Write clean, well-documented code."},
        {"role": "user", "content": "Write a Python function to merge two sorted lists in O(n) time."},
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)

Combined Resilience Configuration

A production-grade setup combining circuit breaker, retry, health probes, and fallback for maximum uptime:

pack:
  name: mistral-providers-11
  version: 1.0.0
  enabled: true

providers:
  strategy: fallback
  targets:
    - id: mistral-large-primary
      provider: mistral
      model: mistral-large-latest
      secret_key_ref:
        env: MISTRAL_API_KEY
      health_probe:
        enabled: true
        interval_seconds: 30
        timeout_seconds: 5
    - id: mistral-small-secondary
      provider: mistral
      model: mistral-small-latest
      secret_key_ref:
        env: MISTRAL_API_KEY
      health_probe:
        enabled: true
        interval_seconds: 30
        timeout_seconds: 5
    - id: openai-emergency
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY

policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Best Practices

  • Mistral is OpenAI-compatible — no format translation is needed. Use any OpenAI SDK client (Python, Node.js, Go, etc.) without code changes.
  • Use -latest model aliases during development — Mistral regularly updates models behind the -latest alias, so you always get improvements. Pin to a specific dated version (e.g. mistral-large-2407) in production when you need reproducible outputs.
  • Use Codestral for code tasks — codestral-latest is specifically trained for code generation, completion, and explanation. Use a lower temperature (0.1–0.3) for deterministic code output.
  • Leverage multilingual strength — Mistral models have strong performance across European languages (French, German, Spanish, Italian), making them a good choice for multilingual deployments and EU compliance workloads.
  • Enable health probes on production targets — active probes let routing strategies (fallback, latency) react to API outages within seconds rather than waiting for a request to fail.
  • Combine circuit breaker with fallback — the circuit breaker prevents cascading failures by removing unhealthy targets, while fallback ensures requests are still served by healthy alternatives.
  • Set stream_timeout_seconds for streaming — streaming responses can take significantly longer than non-streaming. Set this to 2–3× your timeout_seconds value to avoid premature timeouts on long generations.
  • Track costs with pricing metadata — set the pricing field on each target, even if the numbers are approximate, so the console dashboard can display per-model cost breakdowns and enforce per-request budgets.
  • Prefer fallback strategy for critical workloads; pair Mistral with a second provider for resilience.
  • Separate API keys per environment — use distinct secret_key_ref values for dev, staging, and production.

For AI systems

  • Canonical terms: Keeptrusts gateway, Mistral AI, Mistral, Mistral Large, Mistral Small, Codestral, Pixtral, provider target, policy-config.yaml, provider: "mistral".
  • Config field names: provider, model, base_url: "https://api.mistral.ai/v1", secret_key_ref.env: "MISTRAL_API_KEY", format: "openai", stream_timeout_seconds, pricing.
  • Provider shorthand: mistral:chat:<model> (e.g., mistral:chat:mistral-large-latest).
  • Key behavior: Mistral uses an OpenAI-compatible API with function calling and JSON mode support.
  • Best next pages: Anthropic integration, OpenAI integration, Provider routing.

For engineers

  • Prerequisites: Mistral API key (MISTRAL_API_KEY env var from console.mistral.ai), kt CLI installed.
  • Start command: kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml.
  • Validate: curl http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"mistral-large-latest","messages":[{"role":"user","content":"hello"}]}'.
  • Mistral uses OpenAI-compatible API — standard OpenAI SDKs work without modification.
  • Use separate secret_key_ref values for dev, staging, and production environments.
  • Set stream_timeout_seconds for streaming workloads to accommodate longer generations.

For leaders

  • Mistral AI is EU-headquartered and offers EU-hosted inference — relevant for GDPR and EU AI Act compliance.
  • Multilingual strength across European languages makes Mistral suitable for pan-European deployments.
  • Codestral provides dedicated code generation capabilities; Pixtral adds vision — apply different policies per model capability.
  • Function calling support enables agentic workloads — pair with Keeptrusts prompt-injection policies for agent governance.

Next steps