Runtime Request Families

Keeptrusts exposes public proxy, discovery, WebSocket, and MCP routes so clients can keep their native request shape while using one gateway. This page compares the primary text, embedding, and audio proxy families; it is not an exhaustive route inventory. Governance coverage is route-specific, so do not assume that every transport executes the same policy path.

Primary text, embedding, and audio routes

Family	Route	Use it for	Response style
Chat completions	`POST /v1/chat/completions`	OpenAI-style chat clients	JSON or streaming
Responses	`POST /v1/responses`	New text integrations that prefer the Responses-style shape	JSON or streaming
Messages	`POST /v1/messages`	Anthropic-style message envelopes and content blocks	JSON or streaming
Embeddings	`POST /v1/embeddings`	Vector generation	JSON
Audio transcription	`POST /v1/audio/transcriptions`	Speech-to-text	JSON
Audio speech	`POST /v1/audio/speech`	Text-to-speech	Binary audio
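
As a rough illustration of the table above, the following sketch builds (but does not send) a JSON `POST` request for one of the JSON-bodied families. The gateway origin `https://gateway.example.com`, the Bearer-token `Authorization` header, and the helper name `build_json_request` are assumptions made for this example, not documented Keeptrusts behavior. The audio routes are left out because this page does not describe their request encoding.

```python
import json
import urllib.request

# Hypothetical gateway origin; replace with your own deployment's host.
BASE_URL = "https://gateway.example.com"

# Routes for the JSON-bodied families, copied from the table above.
JSON_ROUTES = {
    "chat_completions": "/v1/chat/completions",
    "responses": "/v1/responses",
    "messages": "/v1/messages",
    "embeddings": "/v1/embeddings",
}


def build_json_request(family: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request for the given family without sending it."""
    path = JSON_ROUTES[family]  # raises KeyError for families not listed above
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer-token auth. Check your gateway's auth docs.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the model name and body fields remain whatever the client's native request shape requires.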

Other live gateway surfaces

Surface	Route	Owning guidance
Legacy text completions	`POST /v1/completions`	Legacy compatibility route; do not infer Chat/Responses governance parity
Unified Access chat	`POST /v1/unified/chat/completions`	Unified Access
Moderation	`POST /v1/moderations`	Runtime moderation configuration
Model discovery	`GET /v1/models`, `GET /v1/models/:model_id`	Runtime models configuration
WebSocket proxy	`GET /v1/chat/completions/ws`, `GET /v1/responses/ws`	WebSocket Proxy
Published-host MCP	`GET /mcp`, `POST /mcp`	Gateway MCP Surface

These surfaces keep their owning authentication, publication, routing, and governance boundaries. Their presence does not make the Chat/Responses policy path universal.

How to choose between text families

- Use `/v1/responses` for new text-family work when you want the Responses-style request and output model.
- Use `/v1/chat/completions` when the client already expects the OpenAI chat-completions shape.
- Use `/v1/messages` only when the caller requires the Anthropic Messages shape and the current reduced governance boundary is acceptable.

Chat Completions and Responses execute the gateway's input and output policy path. Messages currently authenticates the client, performs connected billing and provider routing, and proxies the Anthropic-shaped request, but it does not execute the policy evaluation path that Chat Completions and Responses use.

Route shape and provider readiness are separate. The public Messages-shaped route exists, but the native Anthropic provider adapter is not currently adoption-ready. Do not infer provider support or policy parity from the route's presence.
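
The selection guidance above can be sketched as a small client-side helper. The shape labels (`"responses"`, `"openai_chat"`, `"anthropic_messages"`) and the function name are hypothetical and exist only for this example; the helper simply forces the caller to acknowledge the reduced governance boundary before choosing Messages.

```python
def choose_text_route(client_shape: str, accept_reduced_governance: bool = False) -> str:
    """Pick a text-family route from the client's native request shape.

    The shape labels are hypothetical names used only in this sketch.
    """
    if client_shape == "responses":
        # Preferred for new text-family work.
        return "/v1/responses"
    if client_shape == "openai_chat":
        return "/v1/chat/completions"
    if client_shape == "anthropic_messages":
        # Messages does not run the Chat/Responses policy evaluation path,
        # so require an explicit opt-in from the caller.
        if not accept_reduced_governance:
            raise ValueError(
                "/v1/messages skips the Chat/Responses policy path; "
                "pass accept_reduced_governance=True to use it"
            )
        return "/v1/messages"
    raise ValueError(f"unknown client shape: {client_shape!r}")
```

Making the opt-in an explicit argument keeps the governance trade-off visible at the call site instead of hiding it in configuration.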

For streaming details and the related WebSocket frame-level limitation, see Streaming with SSE.

Structured outputs

On the OpenAI-style Chat Completions and Responses routes, structured outputs use `response_format` in the request body. The Anthropic-style Messages route does not currently advertise `json_object` or `json_schema` response-format capability; do not add this OpenAI field to a Messages request.

{
  "response_format": {
    "type": "json_object"
  }
}

Or with a named schema:

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "order_summary",
      "strict": true,
      "schema": {
        "type": "object"
      }
    }
  }
}
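
A minimal sketch of attaching either form of `response_format` to a request body, assuming a client-side helper (the name `with_response_format` is hypothetical). It refuses to add the field for any route other than Chat Completions or Responses, mirroring the guidance above about Messages.

```python
from typing import Optional

# Routes on which this page says response_format applies.
STRUCTURED_OUTPUT_ROUTES = {"/v1/chat/completions", "/v1/responses"}


def with_response_format(
    route: str,
    body: dict,
    schema_name: Optional[str] = None,
    schema: Optional[dict] = None,
) -> dict:
    """Return a copy of body with response_format added.

    With no schema, requests plain JSON output; with a schema, requests
    the named json_schema form shown above.
    """
    if route not in STRUCTURED_OUTPUT_ROUTES:
        raise ValueError(f"response_format is not supported on {route}")
    if schema is not None and not schema_name:
        raise ValueError("schema_name is required when a schema is given")

    out = dict(body)  # shallow copy so the caller's dict is not mutated
    if schema is None:
        out["response_format"] = {"type": "json_object"}
    else:
        out["response_format"] = {
            "type": "json_schema",
            "json_schema": {"name": schema_name, "strict": True, "schema": schema},
        }
    return out
```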

Request-time cache and session hints

Chat Completions and Responses support request-time cache and session hints. Do not infer equivalent handling on Messages from the shared field names.

Field	Location	What it does
`cache_control`	Request body	Requests prompt-caching behavior when runtime policy allows it
`session_id`	Request body	Groups related requests onto the same session lane
`x-session-id`	Request header	Header form of the same session identifier

If both `session_id` and `x-session-id` are present, they must match.
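
The matching rule can be checked on the client before a request is sent. The sketch below is an assumed pre-flight check with a hypothetical helper name; it does not describe how the gateway itself responds to a mismatch.

```python
from typing import Optional


def resolve_session_id(body: dict, headers: dict) -> Optional[str]:
    """Return the effective session identifier, or None if neither is set.

    Raises ValueError when the body field and the header disagree.
    """
    body_id = body.get("session_id")
    # HTTP header names are case-insensitive, so compare in lowercase.
    header_id = next(
        (value for name, value in headers.items() if name.lower() == "x-session-id"),
        None,
    )
    if body_id is not None and header_id is not None and body_id != header_id:
        raise ValueError("session_id and x-session-id must match")
    return body_id if body_id is not None else header_id
```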

What is configured elsewhere

Some runtime behavior is not chosen per request. Use runtime configuration docs for:

- default provider and fallback behavior
- privacy and retention settings
- organization-level plugin defaults
- evaluation and traffic-mirroring controls

See Runtime Configuration for those defaults.
