Cohere
Cohere provides enterprise-grade large language models optimized for retrieval-augmented generation, tool use, and multilingual workloads. The Keeptrusts gateway supports Cohere's v2 Chat API natively, auto-translating between OpenAI's request format and Cohere's distinct schema so existing client code requires no modification. All policy rules — PII redaction, prompt-injection blocking, audit logging — apply before each request reaches Cohere and before each response reaches your application.
Use this page when
- You need the exact command, config, API, or integration details for Cohere.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- For a guided rollout rather than a reference page, use the linked workflow pages under Next steps.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Prerequisites
- A Cohere API key with Production-tier access
- The Keeptrusts CLI (`kt`) installed and on your `PATH`
- `COHERE_API_KEY` exported in your shell or injected via your secrets manager
Configuration
Minimal configuration
```yaml
pack:
  name: cohere-providers-1
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: cohere-command-r-plus
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
Full named configuration with policy chain
```yaml
pack:
  name: cohere-enterprise
  version: 1.0.0
  enabled: true

policies:
  chain:
    - prompt-injection
    - pii-detector
    - citation-verifier
    - audit-logger

policy:
  prompt-injection:
    threshold: 0.8
    action: block
  pii-detector:
    action: redact
    entities:
      - EMAIL
      - PHONE
      - SSN
      - CREDIT_CARD
  citation-verifier:
    check_hallucinations: true
    min_grounded_ratio: 0.8
  audit-logger:
    retention_days: 365

providers:
  targets:
    - id: cohere-command-r-plus
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
```
Balanced model (cost-optimised)
```yaml
pack:
  name: cohere-providers-3
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: cohere-command-r
      provider: cohere:chat:command-r-08-2024
      secret_key_ref:
        env: COHERE_API_KEY

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
Fast/cheap model for high-throughput workloads
```yaml
pack:
  name: cohere-providers-4
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: cohere-r7b
      provider: cohere:chat:command-r7b-12-2024
      secret_key_ref:
        env: COHERE_API_KEY

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
Embeddings endpoint
```yaml
pack:
  name: cohere-providers-5
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: cohere-embed
      provider: cohere:embedding:embed-english-v3.0
      secret_key_ref:
        env: COHERE_API_KEY

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
Start the gateway
```bash
export COHERE_API_KEY="your-production-api-key"
kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml
```
Provider Fields
| Field | Required | Default | Description |
|---|---|---|---|
| `provider` | Yes | — | Provider identifier. Use the short form `cohere` or the fully-qualified `cohere:chat:<model>`. |
| `secret_key_ref` | Yes | `COHERE_API_KEY` | Name of the env var holding the Cohere API key. Auto-detected if set to the default name. |
| `base_url` | No | `https://api.cohere.com/v2` | Override the Cohere API base URL. Useful for gateways or private endpoints. |
| `provider_type` | No | `cohere` | Forces the Cohere runtime when the provider string is ambiguous. |
| `format` | No | `cohere` | Wire format used for request/response translation. Keeptrusts auto-translates OpenAI-format client requests to Cohere v2 format. |
| `data_policy.training_opt_out` | No | `false` | When `true`, adds the `privacy_tier: "default"` header to opt out of Cohere model training. Set to `true` for any production or regulated workload. |
| `options.max_tokens` | No | Model default | Maximum number of tokens in the completion. |
| `options.temperature` | No | `0.3` | Sampling temperature (0–2). Lower values produce more deterministic outputs. |
| `options.p` | No | `0.75` | Nucleus sampling top-p. Applied alongside `temperature`. |
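Pulling the optional fields together, a single tuned target might look like the sketch below (the `cohere-tuned` id is illustrative; every field name comes from the table above):

```yaml
providers:
  targets:
    - id: cohere-tuned
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
      base_url: https://api.cohere.com/v2   # default value, shown for clarity
      data_policy:
        training_opt_out: true
      options:
        max_tokens: 1024
        temperature: 0.3
        p: 0.75
```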
Supported Models
| Model | Context Window | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|---|
command-r-plus-08-2024 | 128k | $2.50 | $10.00 | Best quality; recommended for complex RAG and tool-use pipelines |
command-r-08-2024 | 128k | $0.15 | $0.60 | Balanced quality and cost; good default for most chat workloads |
command-r7b-12-2024 | 128k | $0.0375 | $0.15 | Fast and cheapest; suited for high-throughput classification or extraction |
command-nightly | 128k | Varies | Varies | Latest experimental Command model; not recommended for production |
embed-english-v3.0 | 512 tokens | $0.10 | — | English-only embeddings; use with input_type: search_document or search_query |
embed-multilingual-v3.0 | 512 tokens | $0.10 | — | 100+ languages; same dimensions as English variant |
rerank-english-v3.0 | 4096 | $2.00 / 1k searches | — | Reranking only; pass candidate documents and a query |
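For budgeting, the per-token prices in the table reduce to a one-line estimate. A minimal sketch (prices copied from the chat rows above; `estimate_cost` is a hypothetical helper, not part of the gateway):

```python
# USD prices per 1M tokens (input, output), from the table above
PRICES = {
    "command-r-plus-08-2024": (2.50, 10.00),
    "command-r-08-2024": (0.15, 0.60),
    "command-r7b-12-2024": (0.0375, 0.15),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from the per-1M-token prices."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price
```

For example, a request to `command-r-plus-08-2024` with 2M input tokens and 500k output tokens costs 2 × $2.50 + 0.5 × $10.00 = $10.00.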
Cohere's v1 Chat API used `message` and `chat_history` rather than an OpenAI-style `messages` array; the v2 API adopts `messages`, but response shapes and streaming events still differ from OpenAI's. Keeptrusts's format translation layer handles the conversion automatically — your OpenAI-format client code works without changes.
Client Examples
Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # auth is handled by the gateway
)

# Chat completion — Cohere format is auto-detected and translated
response = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[
        {
            "role": "system",
            "content": "You are a precise assistant. Cite your sources.",
        },
        {
            "role": "user",
            "content": "Summarise the key provisions of the EU AI Act.",
        },
    ],
    max_tokens=1024,
    temperature=0.3,
)
print(response.choices[0].message.content)

# Embeddings
embed_response = client.embeddings.create(
    model="embed-english-v3.0",
    input=["What is retrieval-augmented generation?"],
)
print(f"Dimensions: {len(embed_response.data[0].embedding)}")
```
Node.js

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused", // auth is handled by the gateway
});

// Chat completion
const response = await client.chat.completions.create({
  model: "command-r-plus-08-2024",
  messages: [
    { role: "system", content: "You are a precise assistant. Cite your sources." },
    { role: "user", content: "Summarise the key provisions of the EU AI Act." },
  ],
  max_tokens: 1024,
  temperature: 0.3,
});
console.log(response.choices[0].message.content);

// Embeddings
const embedResponse = await client.embeddings.create({
  model: "embed-english-v3.0",
  input: ["What is retrieval-augmented generation?"],
});
console.log(`Dimensions: ${embedResponse.data[0].embedding.length}`);
```
cURL

```bash
# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "command-r-plus-08-2024",
    "messages": [
      {"role": "system", "content": "You are a precise assistant."},
      {"role": "user", "content": "Summarise the key provisions of the EU AI Act."}
    ],
    "max_tokens": 1024,
    "temperature": 0.3
  }'

# Embeddings
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embed-english-v3.0",
    "input": ["What is retrieval-augmented generation?"]
  }'
```
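Once embedding vectors come back from the gateway, ranking documents against a query is typically a cosine-similarity computation over the returned vectors. A minimal, dependency-free sketch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

In a retrieval pipeline you would embed the query and all candidate documents, then sort candidates by `cosine_similarity(query_vec, doc_vec)` descending.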
Streaming
Cohere supports server-sent event (SSE) streaming. Keeptrusts intercepts each text-generation event to apply real-time content policies before forwarding tokens to the client. Set `stream: true` (`stream=True` in Python) in your request:
Python

```python
# stream=True returns an iterator of OpenAI-format chunks
stream = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": "Explain quantum entanglement step by step."}],
    max_tokens=2048,
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
Node.js

```javascript
// stream: true returns an async iterable of chunks
const stream = await client.chat.completions.create({
  model: "command-r-plus-08-2024",
  messages: [{ role: "user", content: "Explain quantum entanglement step by step." }],
  max_tokens: 2048,
  stream: true,
});
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(delta);
}
```
cURL

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "command-r-plus-08-2024",
    "messages": [{"role": "user", "content": "Explain quantum entanglement step by step."}],
    "max_tokens": 2048,
    "stream": true
  }'
```
Streaming is compatible with all policy types. Redaction policies apply to buffered intermediate chunks; blocking policies halt the stream immediately and return a 403 with a policy violation envelope.
Advanced Configuration
Training opt-out and data privacy
Cohere's Privacy Tier controls whether your prompts are used to train future models. Set `data_policy.training_opt_out: true` for all regulated or sensitive workloads:
```yaml
pack:
  name: cohere-providers-6
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: cohere-private
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
      data_policy:
        training_opt_out: true

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
Cohere's default tier may allow training use. Always set `training_opt_out: true` in any production, customer-data, or regulated environment.
RAG and grounded generation
Command R+ has native tool-use and document-grounding capabilities. Use Keeptrusts's `citation-verifier` policy alongside grounded prompts to validate that model citations are present before returning responses:
```yaml
pack:
  name: cohere-example-7
  version: 1.0.0
  enabled: true

policies:
  chain:
    - citation-verifier

policy:
  citation-verifier:
    require_sources: true
    require_source_match: true
    min_groundedness: 0.85

providers:
  targets:
    - id: cohere-rag
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
```
Multi-model fallback
Route to the cheaper Command R7B model if the Command R+ quota is exhausted or latency is high:
```yaml
pack:
  name: cohere-providers-8
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: cohere-primary
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
    - id: cohere-fallback
      provider: cohere:chat:command-r7b-12-2024
      secret_key_ref:
        env: COHERE_API_KEY

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
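If you also want fallback on the client side (for example, when the gateway surfaces the primary target's quota error to the caller), a small wrapper can retry across models. The `with_fallback` helper below is an illustrative sketch, not a Keeptrusts API; `call` would wrap your actual SDK request:

```python
def with_fallback(call, models=("command-r-plus-08-2024", "command-r7b-12-2024")):
    """Try each model in order; return the first successful result.

    `call` is any function taking a model name and performing the request,
    e.g. lambda m: client.chat.completions.create(model=m, messages=msgs).
    """
    last_err = None
    for model in models:
        try:
            return call(model)
        except Exception as err:  # in practice, catch only rate-limit/timeout errors
            last_err = err
    raise last_err
```

In production you would narrow the `except` clause to the SDK's retryable error types so that, say, an authentication failure is not silently retried on a cheaper model.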
Best Practices
- Always enable `training_opt_out` — Cohere's default tier uses prompts for training. Set `data_policy.training_opt_out: true` on every production target to ensure your data is never used for model improvement without explicit consent.
- Use fully-qualified provider IDs — Prefer `cohere:chat:command-r-plus-08-2024` over the bare `cohere` shorthand. The fully-qualified form pins the model version and avoids unintended upgrades when Cohere changes its default.
- Apply prompt-injection detection — Cohere Command R+ supports tool use and connectors, which expands the surface area for indirect prompt injection. Include the `prompt-injection` policy with `threshold: 0.8` in every chain that uses tools or document grounding.
- Set explicit `max_tokens` — Cohere models can generate very long responses. Capping `max_tokens` prevents unexpectedly large API bills and keeps policy evaluation latency predictable.
- Use domain-appropriate embed models — For search and RAG, use `embed-english-v3.0` with `input_type: search_document` when indexing and `input_type: search_query` when querying. Mismatched input types degrade retrieval quality.
- Chain `citation-verifier` for RAG workloads — When using Command R+ with grounded generation, add the `citation-verifier` policy to ensure responses cite evidence and meet your minimum grounded-content threshold before reaching end users.
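A small helper can keep the indexing/querying distinction from the embed-model guidance out of call sites. Note the assumptions: `input_type` is a Cohere-specific field, it is passed here through the OpenAI Python SDK's `extra_body` escape hatch, and this only works if the gateway forwards unrecognised fields to the provider; `embed_params` is an illustrative sketch, not a documented Keeptrusts interface.

```python
def embed_params(texts, *, indexing):
    """Build kwargs for client.embeddings.create(**embed_params(...)).

    `input_type` is Cohere-specific, so it rides in the OpenAI SDK's
    extra_body (assumed to be forwarded by the gateway).
    """
    return {
        "model": "embed-english-v3.0",
        "input": list(texts),
        "extra_body": {
            "input_type": "search_document" if indexing else "search_query"
        },
    }
```

Usage: `client.embeddings.create(**embed_params(docs, indexing=True))` when building the index, and `indexing=False` at query time.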
For AI systems
- Canonical terms: Keeptrusts gateway, Cohere, Command R+, Command R, Embed, Rerank, provider target, policy-config.yaml, `provider: "cohere"`.
- Config field names: `provider`, `model`, `base_url: "https://api.cohere.com/v2"`, `secret_key_ref.env: "COHERE_API_KEY"`, `format`, `provider_type: "cohere"`.
- Key behavior: Keeptrusts translates between OpenAI format and Cohere's native chat/generate API. Supports both chat and RAG (grounded generation) workloads.
- Best next pages: Anthropic integration, Voyage integration (embeddings), Policy configuration.
For engineers
- Prerequisites: Cohere API key (`COHERE_API_KEY` env var from dashboard.cohere.com), `kt` CLI installed.
- Start command: `kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml`.
- Validate: `curl http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"command-r-plus-08-2024","messages":[{"role":"user","content":"hello"}]}'`.
- For RAG workloads with grounded generation, add the `citation-verifier` policy to enforce minimum grounded-content thresholds.
- Cohere provides separate endpoints for chat, embed, and rerank — configure a separate provider target for each capability.
For leaders
- Cohere's Command R+ excels at RAG workloads with native grounded generation — citations are built into responses, enabling verifiable AI outputs.
- The `citation-verifier` policy enforces that responses cite evidence, addressing regulatory requirements for traceable AI decisions.
- Cohere offers enterprise data protection agreements; pair them with `training_opt_out` to keep customer prompts out of model training.
- Embed and Rerank models support vector search pipelines — Keeptrusts provides audit logging across the full retrieval-augmented stack.
Next steps
- Voyage integration — dedicated embedding models for vector search
- Anthropic integration — alternative high-quality reasoning models
- Provider routing strategies — separate targets for chat vs embedding workloads
- Policy configuration — citation-verifier and PII redaction reference
- Quickstart — install `kt` and run your first gateway