Cohere

Cohere provides enterprise-grade large language models optimized for retrieval-augmented generation, tool use, and multilingual workloads. The Keeptrusts gateway supports Cohere's v2 Chat API natively, automatically translating between OpenAI's request format and Cohere's distinct schema, so existing client code requires no modification. All policy rules — PII redaction, prompt-injection blocking, audit logging — apply before each request reaches Cohere and before each response reaches your application.

Use this page when

  • You need the exact command, config, API, or integration details for Cohere.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Prerequisites

  • A Cohere API key with access to Production tier
  • Keeptrusts CLI (kt) installed and on your PATH
  • COHERE_API_KEY exported in your shell or injected via your secrets manager

Configuration

Minimal configuration

pack:
  name: cohere-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cohere-command-r-plus
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Full named configuration with policy chain

pack:
  name: cohere-enterprise
  version: 1.0.0
  enabled: true
policies:
  chain:
    - prompt-injection
    - pii-detector
    - citation-verifier
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
      entities:
        - EMAIL
        - PHONE
        - SSN
        - CREDIT_CARD
    citation-verifier:
      check_hallucinations: true
      min_grounded_ratio: 0.8
    audit-logger:
      retention_days: 365
providers:
  targets:
    - id: cohere-command-r-plus
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY

Balanced model (cost-optimised)

pack:
  name: cohere-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cohere-command-r
      provider: cohere:chat:command-r-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Fast/cheap model for high-throughput workloads

pack:
  name: cohere-providers-4
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cohere-r7b
      provider: cohere:chat:command-r7b-12-2024
      secret_key_ref:
        env: COHERE_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Embeddings endpoint

pack:
  name: cohere-providers-5
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cohere-embed
      provider: cohere:embedding:embed-english-v3.0
      secret_key_ref:
        env: COHERE_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Start the gateway

export COHERE_API_KEY="your-production-api-key"
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml

Provider Fields

| Field | Required | Default | Description |
|---|---|---|---|
| provider | Yes | — | Provider identifier. Use the short form "cohere" or the fully-qualified "cohere:chat:<model>". |
| secret_key_ref | Yes | COHERE_API_KEY | Name of the env var holding the Cohere API key. Auto-detected if set to the default name. |
| base_url | No | https://api.cohere.com/v2 | Override the Cohere API base URL. Useful for gateways or private endpoints. |
| provider_type | No | cohere | Forces the Cohere runtime when the provider string is ambiguous. |
| format | No | cohere | Wire format used for request/response translation. Keeptrusts auto-translates OpenAI-format client requests to Cohere v2 format. |
| data_policy.training_opt_out | No | false | When true, adds the privacy_tier: "default" header to opt out of Cohere model training. Set to true for any production or regulated workload. |
| options.max_tokens | No | Model default | Maximum number of tokens in the completion. |
| options.temperature | No | 0.3 | Sampling temperature (0–2). Lower values produce more deterministic outputs. |
| options.p | No | 0.75 | Nucleus sampling top-p. Applied alongside temperature. |
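As a sketch, a target exercising the optional fields above might look like the following (the base_url and options values are illustrative defaults, not recommendations):

```yaml
providers:
  targets:
    - id: cohere-custom
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
      base_url: https://api.cohere.com/v2
      data_policy:
        training_opt_out: true
      options:
        max_tokens: 1024
        temperature: 0.3
        p: 0.75
```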

Supported Models

| Model | Context Window | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|---|
| command-r-plus-08-2024 | 128k | $2.50 | $10.00 | Best quality; recommended for complex RAG and tool-use pipelines |
| command-r-08-2024 | 128k | $0.15 | $0.60 | Balanced quality and cost; good default for most chat workloads |
| command-r7b-12-2024 | 128k | $0.0375 | $0.15 | Fastest and cheapest; suited to high-throughput classification or extraction |
| command-nightly | 128k | Varies | Varies | Latest experimental Command model; not recommended for production |
| embed-english-v3.0 | 512 tokens | $0.10 | — | English-only embeddings; use with input_type: search_document or search_query |
| embed-multilingual-v3.0 | 512 tokens | $0.10 | — | 100+ languages; same dimensions as the English variant |
| rerank-english-v3.0 | 4096 | $2.00 / 1k searches | — | Reranking only; pass candidate documents and a query |
Cohere v2 schema differences

Cohere's legacy v1 Chat API used a single message field plus a chat_history array rather than an OpenAI-style messages array; the v2 API adopts a messages array but still differs from OpenAI's schema in field names and response shape. Keeptrusts's format translation layer handles all conversions automatically; your OpenAI-format client code works without changes.

Client Examples

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",  # the gateway's listen address
    api_key="unused",  # auth is handled by the gateway
)

# Chat completion — Cohere format is auto-detected and translated
response = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[
        {
            "role": "system",
            "content": "You are a precise assistant. Cite your sources.",
        },
        {
            "role": "user",
            "content": "Summarise the key provisions of the EU AI Act.",
        },
    ],
    max_tokens=1024,
    temperature=0.3,
)
print(response.choices[0].message.content)

# Embeddings
embed_response = client.embeddings.create(
    model="embed-english-v3.0",
    input=["What is retrieval-augmented generation?"],
)
print(f"Dimensions: {len(embed_response.data[0].embedding)}")

Streaming

Cohere supports server-sent event (SSE) streaming. Keeptrusts intercepts each text-generation event to apply real-time content policies before forwarding the token to the client. Set stream=True in your request:

stream = client.chat.completions.create(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": "Explain quantum entanglement step by step."}],
    max_tokens=2048,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

Streaming is compatible with all policy types. Redaction policies apply to buffered intermediate chunks; blocking policies halt the stream immediately and return a 403 with a policy violation envelope.
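As a minimal sketch of consuming such a stream defensively (the helper name is ours, not a Keeptrusts API; it only assumes the OpenAI-style chunk shape shown above, where content can be None on role or final chunks):

```python
def stream_text(chunks):
    """Yield non-empty text deltas from OpenAI-style chat-completion
    stream chunks, skipping chunks that carry no content."""
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # content is None on role announcements and final chunks
            yield delta
```

If a blocking policy halts the stream mid-flight, the 403 surfaces as your SDK's HTTP-error exception; wrap the iteration in a try/except for that type to handle the policy violation envelope.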

Advanced Configuration

Training opt-out and data privacy

Cohere's Privacy Tier controls whether your prompts are used to train future models. Set data_policy.training_opt_out: true for all regulated or sensitive workloads:

pack:
  name: cohere-providers-6
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cohere-private
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
      data_policy:
        training_opt_out: true
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Production requirement

Cohere's default tier may allow training use. Always set training_opt_out: true in any production, customer-data, or regulated environment.

RAG and grounded generation

Command R+ has native tool-use and document-grounding capabilities. Use Keeptrusts's citation-verifier policy alongside grounded prompts to validate that model citations are present before returning responses:

pack:
  name: cohere-example-7
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cohere-rag
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
policies:
  chain:
    - citation-verifier
  policy:
    citation-verifier:
      require_sources: true
      require_source_match: true
      min_groundedness: 0.85

Multi-model fallback

Route to the cheaper Command R7B model if the Command R+ quota is exhausted or latency is high:

pack:
  name: cohere-providers-8
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cohere-primary
      provider: cohere:chat:command-r-plus-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
    - id: cohere-fallback
      provider: cohere:chat:command-r7b-12-2024
      secret_key_ref:
        env: COHERE_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
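The routing logic itself can also live client-side. The following is a hypothetical sketch of the fallback described above, not a Keeptrusts API: try the primary target's model first and retry once on quota or availability errors. GatewayError and the send() callable are illustrative stand-ins for your HTTP client.

```python
PRIMARY = "command-r-plus-08-2024"
FALLBACK = "command-r7b-12-2024"
RETRYABLE = {429, 503, 504}  # quota exhausted / upstream unavailable


class GatewayError(Exception):
    """Illustrative error carrying the gateway's HTTP status code."""

    def __init__(self, status):
        super().__init__(f"gateway returned HTTP {status}")
        self.status = status


def complete_with_fallback(send, messages):
    """send(model, messages) -> text; raises GatewayError on HTTP failure.
    Retries once against the cheaper model on retryable statuses."""
    try:
        return send(PRIMARY, messages)
    except GatewayError as err:
        if err.status in RETRYABLE:
            return send(FALLBACK, messages)
        raise
```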

Best Practices

  1. Always enable training_opt_out — Cohere's default tier uses prompts for training. Set data_policy.training_opt_out: true on every production target to ensure your data is never used for model improvement without explicit consent.

  2. Use fully-qualified provider IDs — Prefer "cohere:chat:command-r-plus-08-2024" over the bare "cohere" shorthand. The fully-qualified form pins the model version and avoids unintended upgrades when Cohere changes their default.

  3. Apply prompt-injection detection — Cohere Command R+ supports tool use and connectors, which expands the surface area for indirect prompt injection. Include the prompt-injection policy with threshold: 0.8 in every chain that uses tools or document grounding.

  4. Set explicit max_tokens — Cohere models can generate very long responses. Capping max_tokens prevents unexpectedly large API bills and keeps policy evaluation latency predictable.

  5. Use domain-appropriate embed models — For search and RAG, use embed-english-v3.0 with input_type: search_document when indexing and input_type: search_query when querying. Mismatched input types degrade retrieval quality.

  6. Chain citation-verifier for RAG workloads — When using Command R+ with grounded generation, add the citation-verifier policy to ensure responses cite evidence and meet your minimum grounded-content threshold before reaching end users.
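Best practice 5 can be sketched as a small request builder. This assumes your gateway passes extra_body fields such as input_type through to Cohere's embed endpoint; verify that for your deployment before relying on it.

```python
def embed_kwargs(texts, *, indexing):
    """Build kwargs for client.embeddings.create(): Cohere's
    input_type is search_document when indexing a corpus and
    search_query when embedding a user query."""
    return {
        "model": "embed-english-v3.0",
        "input": list(texts),
        "extra_body": {
            "input_type": "search_document" if indexing else "search_query",
        },
    }
```

Usage: client.embeddings.create(**embed_kwargs(docs, indexing=True)) at index time, and indexing=False at query time, so the two sides of retrieval always use matched input types.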

For AI systems

  • Canonical terms: Keeptrusts gateway, Cohere, Command R+, Command R, Embed, Rerank, provider target, policy-config.yaml, provider: "cohere".
  • Config field names: provider, model, base_url: "https://api.cohere.com/v2", secret_key_ref.env: "COHERE_API_KEY", format, provider_type: "cohere".
  • Key behavior: Keeptrusts translates between OpenAI format and Cohere's native chat/generate API. Supports both chat and RAG (grounded generation) workloads.
  • Best next pages: Anthropic integration, Voyage integration (embeddings), Policy configuration.

For engineers

  • Prerequisites: Cohere API key (COHERE_API_KEY env var from dashboard.cohere.com), kt CLI installed.
  • Start command: kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml.
  • Validate: curl http://localhost:41002/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"command-r-plus-08-2024","messages":[{"role":"user","content":"hello"}]}'.
  • For RAG workloads with grounded generation, add citation-verifier policy to enforce minimum grounded-content thresholds.
  • Cohere provides separate endpoints for chat, embed, and rerank — configure separate provider targets for each capability.
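The last point can be sketched as a single pack with one target per capability. The chat and embedding provider strings follow the forms shown earlier on this page; the rerank capability segment is an assumption by analogy and should be checked against the provider reference:

```yaml
pack:
  name: cohere-multi-capability
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: cohere-chat
      provider: cohere:chat:command-r-08-2024
      secret_key_ref:
        env: COHERE_API_KEY
    - id: cohere-embed
      provider: cohere:embedding:embed-english-v3.0
      secret_key_ref:
        env: COHERE_API_KEY
    - id: cohere-rerank
      provider: cohere:rerank:rerank-english-v3.0  # capability name is an assumption
      secret_key_ref:
        env: COHERE_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      retention_days: 365
```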

For leaders

  • Cohere's Command R+ excels at RAG workloads with native grounded generation — citations are built into responses, enabling verifiable AI outputs.
  • The citation-verifier policy enforces that responses cite evidence, addressing regulatory requirements for traceable AI decisions.
  • Cohere offers enterprise data protection agreements and does not train on customer data by default.
  • Embed and Rerank models support vector search pipelines — Keeptrusts provides audit logging across the full retrieval-augmented stack.

Next steps