Cohere
Cohere provides enterprise-grade large language models optimized for retrieval-augmented generation, tool use, and multilingual workloads. The Keeptrusts gateway supports Cohere's v2 Chat API natively, auto-translating between OpenAI's request format and Cohere's distinct schema so existing client code requires no modification. All policy rules — PII redaction, prompt-injection blocking, audit logging — apply before each request reaches Cohere and before each response reaches your application.
Use this page when
- You need the exact command, config, API, or integration details for Cohere.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- For a guided rollout rather than a reference page, use the linked workflow pages under Next steps.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Prerequisites
- A Cohere API key with Production-tier access
- The Keeptrusts CLI (`kt`) installed and on your `PATH`
- `COHERE_API_KEY` exported in your shell or injected via your secrets manager
Configuration
Minimal configuration
```yaml
pack:
  name: cohere-providers-1
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: cohere-command-r-plus
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
Full named configuration with policy chain
```yaml
pack:
  name: cohere-enterprise
  version: 1.0.0
  enabled: true

policies:
  chain:
    - prompt-injection
    - pii-detector
    - citation-verifier
    - audit-logger

policy:
  prompt-injection:
    threshold: 0.8
    action: block
  pii-detector:
    action: redact
    entities:
      - EMAIL
      - PHONE
      - SSN
      - CREDIT_CARD
  citation-verifier:
    check_hallucinations: true
    min_grounded_ratio: 0.8
  audit-logger:
    retention_days: 365

providers:
  targets:
    - id: cohere-command-r-plus
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
```
Balanced model (cost-optimised)
```yaml
pack:
  name: cohere-providers-3
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: cohere-command-r
      provider: cohere:chat:command-r-08-2024
      secret_key_ref:
        env: COHERE_API_KEY

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
Fast/cheap model for high-throughput workloads
```yaml
pack:
  name: cohere-providers-4
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: cohere-r7b
      provider: cohere:chat:command-r7b-12-2024
      secret_key_ref:
        env: COHERE_API_KEY

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
Embeddings endpoint
```yaml
pack:
  name: cohere-providers-5
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: cohere-embed
      provider: cohere:embedding:embed-english-v3.0
      secret_key_ref:
        env: COHERE_API_KEY

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
Start the gateway
```bash
export COHERE_API_KEY="your-production-api-key"
kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml
```
Provider Fields
| Field | Required | Default | Description |
|---|---|---|---|
| `provider` | Yes | — | Provider identifier. Use the short form `cohere` or the fully-qualified `cohere:chat:<model>`. |
| `secret_key_ref` | Yes | `COHERE_API_KEY` | Name of the env var holding the Cohere API key. Auto-detected if set to the default name. |
| `base_url` | No | `https://api.cohere.com/v2` | Override the Cohere API base URL. Useful for gateways or private endpoints. |
| `provider_type` | No | `cohere` | Forces the Cohere runtime when the provider string is ambiguous. |
| `format` | No | `cohere` | Wire format used for request/response translation. Keeptrusts auto-translates OpenAI-format client requests to Cohere v2 format. |
| `data_policy.training_opt_out` | No | `false` | When `true`, adds the `privacy_tier: "default"` header to opt out of Cohere model training. Set to `true` for any production or regulated workload. |
| `options.max_tokens` | No | Model default | Maximum number of tokens in the completion. |
| `options.temperature` | No | `0.3` | Sampling temperature (0–2). Lower values produce more deterministic outputs. |
| `options.p` | No | `0.75` | Nucleus sampling top-p. Applied alongside `temperature`. |
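Pulling the optional fields together, a single tuned target might look like the sketch below (the `cohere-tuned` id is illustrative; every field name comes from the table above):

```yaml
providers:
  targets:
    - id: cohere-tuned
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
      base_url: https://api.cohere.com/v2   # default value, shown for clarity
      data_policy:
        training_opt_out: true
      options:
        max_tokens: 1024
        temperature: 0.3
        p: 0.75
```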
Supported Models
| Model | Context Window | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|---|
command-r-plus-08-2024 | 128k | $2.50 | $10.00 | Best quality; recommended for complex RAG and tool-use pipelines |
command-r-08-2024 | 128k | $0.15 | $0.60 | Balanced quality and cost; good default for most chat workloads |
command-r7b-12-2024 | 128k | $0.0375 | $0.15 | Fast and cheapest; suited for high-throughput classification or extraction |
command-nightly | 128k | Varies | Varies | Latest experimental Command model; not recommended for production |
embed-english-v3.0 | 512 tokens | $0.10 | — | English-only embeddings; use with input_type: search_document or search_query |
embed-multilingual-v3.0 | 512 tokens | $0.10 | — | 100+ languages; same dimensions as English variant |
rerank-english-v3.0 | 4096 | $2.00 / 1k searches | — | Reranking only; pass candidate documents and a query |
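For budgeting, the per-token prices in the table reduce to a one-line estimate. A minimal sketch (prices copied from the chat rows above; `estimate_cost` is a hypothetical helper, not part of the gateway):

```python
# USD prices per 1M tokens (input, output), from the table above
PRICES = {
    "command-r-plus-08-2024": (2.50, 10.00),
    "command-r-08-2024": (0.15, 0.60),
    "command-r7b-12-2024": (0.0375, 0.15),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from the per-1M-token prices."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price
```

For example, a request to `command-r-plus-08-2024` with 2M input tokens and 500k output tokens costs 2 × $2.50 + 0.5 × $10.00 = $10.00.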
Cohere's v1 Chat API used `message` and `chat_history` rather than an OpenAI-style `messages` array; the v2 API adopts `messages`, but response shapes and streaming events still differ from OpenAI's. Keeptrusts's format translation layer handles the conversion automatically — your OpenAI-format client code works without changes.
Client Examples
Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # auth is handled by the gateway
)

# Chat completion — Cohere format is auto-detected and translated
response = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[
        {
            "role": "system",
            "content": "You are a precise assistant. Cite your sources.",
        },
        {
            "role": "user",
            "content": "Summarise the key provisions of the EU AI Act.",
        },
    ],
    max_tokens=1024,
    temperature=0.3,
)
print(response.choices[0].message.content)

# Embeddings
embed_response = client.embeddings.create(
    model="embed-english-v3.0",
    input=["What is retrieval-augmented generation?"],
)
print(f"Dimensions: {len(embed_response.data[0].embedding)}")
```
Node.js

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused", // auth is handled by the gateway
});

// Chat completion
const response = await client.chat.completions.create({
  model: "command-r-plus-08-2024",
  messages: [
    { role: "system", content: "You are a precise assistant. Cite your sources." },
    { role: "user", content: "Summarise the key provisions of the EU AI Act." },
  ],
  max_tokens: 1024,
  temperature: 0.3,
});
console.log(response.choices[0].message.content);

// Embeddings
const embedResponse = await client.embeddings.create({
  model: "embed-english-v3.0",
  input: ["What is retrieval-augmented generation?"],
});
console.log(`Dimensions: ${embedResponse.data[0].embedding.length}`);
```
cURL

```bash
# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "command-r-plus-08-2024",
    "messages": [
      {"role": "system", "content": "You are a precise assistant."},
      {"role": "user", "content": "Summarise the key provisions of the EU AI Act."}
    ],
    "max_tokens": 1024,
    "temperature": 0.3
  }'

# Embeddings
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embed-english-v3.0",
    "input": ["What is retrieval-augmented generation?"]
  }'
```
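Once embedding vectors come back from the gateway, ranking documents against a query is typically a cosine-similarity computation over the returned vectors. A minimal, dependency-free sketch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

In a retrieval pipeline you would embed the query and all candidate documents, then sort candidates by `cosine_similarity(query_vec, doc_vec)` descending.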
Streaming
Cohere supports server-sent event (SSE) streaming. Keeptrusts intercepts each text-generation event to apply real-time content policies before forwarding tokens to the client. Set `stream: true` (`stream=True` in Python) in your request:
Python

```python
# stream=True returns an iterator of OpenAI-format chunks
stream = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": "Explain quantum entanglement step by step."}],
    max_tokens=2048,
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```
Node.js

```javascript
// stream: true returns an async iterable of chunks
const stream = await client.chat.completions.create({
  model: "command-r-plus-08-2024",
  messages: [{ role: "user", content: "Explain quantum entanglement step by step." }],
  max_tokens: 2048,
  stream: true,
});
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(delta);
}
```
cURL

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "command-r-plus-08-2024",
    "messages": [{"role": "user", "content": "Explain quantum entanglement step by step."}],
    "max_tokens": 2048,
    "stream": true
  }'
```
Streaming is compatible with all policy types. Redaction policies apply to buffered intermediate chunks; blocking policies halt the stream immediately and return a 403 with a policy violation envelope.
Advanced Configuration
Training opt-out and data privacy
Cohere's Privacy Tier controls whether your prompts are used to train future models. Set `data_policy.training_opt_out: true` for all regulated or sensitive workloads:
```yaml
pack:
  name: cohere-providers-6
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: cohere-private
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
      data_policy:
        training_opt_out: true

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
Cohere's default tier may allow training use. Always set `training_opt_out: true` in any production, customer-data, or regulated environment.
RAG and grounded generation
Command R+ has native tool-use and document-grounding capabilities. Use Keeptrusts's `citation-verifier` policy alongside grounded prompts to validate that model citations are present before returning responses:
```yaml
pack:
  name: cohere-example-7
  version: 1.0.0
  enabled: true

policies:
  chain:
    - citation-verifier

policy:
  citation-verifier:
    require_sources: true
    require_source_match: true
    min_groundedness: 0.85

providers:
  targets:
    - id: cohere-rag
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
```
Multi-model fallback
Route to the cheaper Command R7B model if the Command R+ quota is exhausted or latency is high:
```yaml
pack:
  name: cohere-providers-8
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: cohere-primary
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
    - id: cohere-fallback
      provider: cohere:chat:command-r7b-12-2024
      secret_key_ref:
        env: COHERE_API_KEY

policies:
  chain:
    - audit-logger

policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
```
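If you also want fallback on the client side (for example, when the gateway surfaces the primary target's quota error to the caller), a small wrapper can retry across models. The `with_fallback` helper below is an illustrative sketch, not a Keeptrusts API; `call` would wrap your actual SDK request:

```python
def with_fallback(call, models=("command-r-plus-08-2024", "command-r7b-12-2024")):
    """Try each model in order; return the first successful result.

    `call` is any function taking a model name and performing the request,
    e.g. lambda m: client.chat.completions.create(model=m, messages=msgs).
    """
    last_err = None
    for model in models:
        try:
            return call(model)
        except Exception as err:  # in practice, catch only rate-limit/timeout errors
            last_err = err
    raise last_err
```

In production you would narrow the `except` clause to the SDK's retryable error types so that, say, an authentication failure is not silently retried on a cheaper model.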
Best Practices
- Always enable `training_opt_out` — Cohere's default tier uses prompts for training. Set `data_policy.training_opt_out: true` on every production target to ensure your data is never used for model improvement without explicit consent.
- Use fully-qualified provider IDs — Prefer `cohere:chat:command-r-plus-08-2024` over the bare `cohere` shorthand. The fully-qualified form pins the model version and avoids unintended upgrades when Cohere changes its default.
- Apply prompt-injection detection — Cohere Command R+ supports tool use and connectors, which expands the surface area for indirect prompt injection. Include the `prompt-injection` policy with `threshold: 0.8` in every chain that uses tools or document grounding.
- Set explicit `max_tokens` — Cohere models can generate very long responses. Capping `max_tokens` prevents unexpectedly large API bills and keeps policy evaluation latency predictable.
- Use domain-appropriate embed models — For search and RAG, use `embed-english-v3.0` with `input_type: search_document` when indexing and `input_type: search_query` when querying. Mismatched input types degrade retrieval quality.
- Chain `citation-verifier` for RAG workloads — When using Command R+ with grounded generation, add the `citation-verifier` policy to ensure responses cite evidence and meet your minimum grounded-content threshold before reaching end users.
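A small helper can keep the indexing/querying distinction from the embed-model guidance out of call sites. Note the assumptions: `input_type` is a Cohere-specific field, it is passed here through the OpenAI Python SDK's `extra_body` escape hatch, and this only works if the gateway forwards unrecognised fields to the provider; `embed_params` is an illustrative sketch, not a documented Keeptrusts interface.

```python
def embed_params(texts, *, indexing):
    """Build kwargs for client.embeddings.create(**embed_params(...)).

    `input_type` is Cohere-specific, so it rides in the OpenAI SDK's
    extra_body (assumed to be forwarded by the gateway).
    """
    return {
        "model": "embed-english-v3.0",
        "input": list(texts),
        "extra_body": {
            "input_type": "search_document" if indexing else "search_query"
        },
    }
```

Usage: `client.embeddings.create(**embed_params(docs, indexing=True))` when building the index, and `indexing=False` at query time.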
For AI systems
- Canonical terms: Keeptrusts gateway, Cohere, Command R+, Command R, Embed, Rerank, provider target, policy-config.yaml, `provider: "cohere"`.
- Config field names: `provider`, `model`, `base_url: "https://api.cohere.com/v2"`, `secret_key_ref.env: "COHERE_API_KEY"`, `format`, `provider_type: "cohere"`.
- Key behavior: Keeptrusts translates between OpenAI format and Cohere's native chat/generate API. Supports both chat and RAG (grounded generation) workloads.
- Best next pages: Anthropic integration, Voyage integration (embeddings), Policy configuration.
For engineers
- Prerequisites: Cohere API key (`COHERE_API_KEY` env var from dashboard.cohere.com), `kt` CLI installed.
- Start command: `kt gateway run --listen 0.0.0.0:8080 --policy-config policy-config.yaml`.
- Validate: `curl http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"command-r-plus-08-2024","messages":[{"role":"user","content":"hello"}]}'`.
- For RAG workloads with grounded generation, add the `citation-verifier` policy to enforce minimum grounded-content thresholds.
- Cohere provides separate endpoints for chat, embed, and rerank — configure a separate provider target for each capability.
For leaders
- Cohere's Command R+ excels at RAG workloads with native grounded generation — citations are built into responses, enabling verifiable AI outputs.
- The `citation-verifier` policy enforces that responses cite evidence, addressing regulatory requirements for traceable AI decisions.
- Cohere offers enterprise data protection agreements; pair them with `training_opt_out` to keep customer prompts out of model training.
- Embed and Rerank models support vector search pipelines — Keeptrusts provides audit logging across the full retrieval-augmented stack.
Next steps
- Voyage integration — dedicated embedding models for vector search
- Anthropic integration — alternative high-quality reasoning models
- Provider routing strategies — separate targets for chat vs embedding workloads
- Policy configuration — citation-verifier and PII redaction reference
- Quickstart — install `kt` and run your first gateway