Runtime request families
Keeptrusts freezes a narrow launch set of runtime request families so the gateway, console, and operators all reason about the same request vocabulary. Use this page when you need the canonical split between text families, embeddings, audio, prompt caching, plugin governance, silent engine, and optional shadow routing.
Use this page when
- You need to choose between chat_completions, responses, messages, embeddings, transcription, or speech.
- You are wiring structured outputs, prompt caching, session_id, or plugin governance and need the supported launch contract.
- You are deciding whether a workflow belongs in runtime routing, runtime privacy, or a separate optional shadow-evaluation lane.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Launch request-family map
| Family | Canonical route | Use it for | Transport |
|---|---|---|---|
| chat_completions | POST /v1/chat/completions | OpenAI-style chat clients that already speak chat-completions | JSON or SSE |
| responses | POST /v1/responses | The canonical text-family route for new integrations, structured outputs, and normalized provider behavior | JSON or SSE |
| messages | POST /v1/messages | Anthropic-style message envelopes and content blocks | JSON or SSE |
| embeddings | POST /v1/embeddings | Vector generation only | JSON |
| audio_transcriptions | POST /v1/audio/transcriptions | Speech-to-text from uploaded audio | JSON |
| audio_speech | POST /v1/audio/speech | Text-to-speech binary audio output | Binary audio |
Launch scope does not imply support for responses_compact, batches, generic files/containers, vector-store parity, A2A, or realtime/WebRTC.
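The family map can be mirrored as a small lookup table in client code. The Python sketch below is illustrative only: the FAMILY_ROUTES name and the route_for helper are hypothetical, and only the family names and routes come from the table above.

```python
# Launch request families and their canonical POST routes, copied from the table.
FAMILY_ROUTES = {
    "chat_completions": "/v1/chat/completions",
    "responses": "/v1/responses",
    "messages": "/v1/messages",
    "embeddings": "/v1/embeddings",
    "audio_transcriptions": "/v1/audio/transcriptions",
    "audio_speech": "/v1/audio/speech",
}


def route_for(family: str) -> str:
    """Return the canonical route for a launch family, or raise if it is out of scope."""
    try:
        return FAMILY_ROUTES[family]
    except KeyError:
        # Anything outside the launch set (for example batches) is rejected here
        # rather than mapped to a guessed route.
        raise ValueError(f"{family!r} is not a launch request family") from None
```

Rejecting unknown families in the client keeps out-of-scope surfaces from being called by accident; how the gateway itself responds to such routes is not covered by this sketch.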
Choosing between chat_completions, responses, and messages
- Use responses for new text-family work. It is the canonical request family and the single normalized route for Responses-style behavior.
- Use chat_completions when you are integrating an OpenAI-compatible client that already expects chat-completions payloads.
- Use messages when your caller already uses Anthropic-style content blocks such as text, image, document, tool_use, tool_result, or thinking.
Structured outputs
Structured outputs use the top-level response_format field across the text families.
| Form | Shape | Notes |
|---|---|---|
| JSON object | { "type": "json_object" } | Caller wants valid JSON output without a named schema |
| JSON schema | { "type": "json_schema", "json_schema": { "name": "...", "strict": true, "schema": {} } } | Keeps the schema name, strict=true, and full schema object intact |
Key rules:
- json_object and json_schema are separate capabilities.
- Unsupported schema or strict-mode requests fail closed before provider dispatch.
- Streaming structured outputs emit cumulative valid JSON snapshots. If the gateway cannot keep emitting valid JSON, it terminates through the structured-output error path instead of guessing broken deltas.
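The two response_format shapes from the table can be built with small helpers. This is a minimal sketch: the helper names and the example body passed to with_response_format are hypothetical, and only the response_format shapes themselves come from this page.

```python
from typing import Any


def json_object_format() -> dict[str, Any]:
    """Shape for callers that want valid JSON without a named schema."""
    return {"type": "json_object"}


def json_schema_format(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Shape for schema-constrained output; keeps name, strict=true, and schema intact."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


def with_response_format(body: dict[str, Any], fmt: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a request body with response_format set at the top level."""
    return {**body, "response_format": fmt}


# Hypothetical usage: the schema content and the rest of the body are placeholders.
invoice_schema = {
    "type": "object",
    "properties": {"total": {"type": "number"}},
    "required": ["total"],
}
request_body = with_response_format(
    {"model": "example-model"},
    json_schema_format("invoice", invoice_schema),
)
```

Because json_object and json_schema are separate capabilities, a caller should pick one shape explicitly instead of falling back from one to the other.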
Prompt caching and session_id
Request-time prompt caching is separate from the broader semantic/exact response cache backends.
| Field | Where it lives | What it does |
|---|---|---|
| cache_control | top-level request field | Lets a caller request prompt-caching behavior when org defaults allow it |
| session_id | top-level request field | Lets the gateway keep related requests on the same sticky-routing lane |
| x-session-id | request header alias | Header alias for the same semantic value as session_id |
Rules:
- If both body session_id and x-session-id are present, they must match.
- session_id acceptance is controlled by runtime-routing defaults.
- Sticky routing and cache locality are scoped by organization, gateway, and caller boundary.
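The matching rule for the body field and the header alias can be sketched as a small resolver. The function and exception names are hypothetical, and how the gateway actually reports a mismatch is not specified on this page; the sketch only shows the rule that both values must agree when both are present.

```python
from typing import Any, Mapping, Optional


class SessionIdMismatch(ValueError):
    """Raised when body session_id and the x-session-id header disagree."""


def resolve_session_id(
    body: Mapping[str, Any], headers: Mapping[str, str]
) -> Optional[str]:
    """Return the single session value for a request, or None if neither is set."""
    body_value = body.get("session_id")
    # HTTP header names are case-insensitive, so compare on the lowercased name.
    header_value = next(
        (value for name, value in headers.items() if name.lower() == "x-session-id"),
        None,
    )
    if body_value is not None and header_value is not None:
        if body_value != header_value:
            raise SessionIdMismatch("session_id and x-session-id must match")
    return body_value if body_value is not None else header_value
```

A client that sets both values should derive them from one variable so they cannot drift apart.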
Plugin governance
Request-time plugins use stable hyphenated IDs:
- pdf-inputs
- response-healing
- context-compression
- web-search
Runtime routing settings add organization-level governance on top:
- plugin_governance.defaults[]
- plugin_governance.forced_on[]
- plugin_governance.prevent_overrides[]
Important behavior:
- Request-time plugins are still capability-gated.
- Forced-on plugins can add required behavior, but privacy rules may still disable or reject them.
- Silent engine can block privacy-incompatible plugins such as web-search or replay-style response healing.
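One way to picture how the three governance lists could combine with a caller's request is sketched below. The merge order is an assumption made for illustration, not the documented algorithm: defaults apply first, caller choices apply unless the plugin is listed in prevent_overrides, and forced_on plugins are added last. The function name and the requested mapping shape are hypothetical, and capability gating and privacy rules are deliberately not modeled.

```python
from typing import Iterable, Mapping

# Stable hyphenated plugin IDs listed on this page.
KNOWN_PLUGINS = frozenset(
    {"pdf-inputs", "response-healing", "context-compression", "web-search"}
)


def effective_plugins(
    requested: Mapping[str, bool],
    defaults: Iterable[str],
    forced_on: Iterable[str],
    prevent_overrides: Iterable[str],
) -> set[str]:
    """Combine org governance lists with caller choices (assumed merge order)."""
    defaults, forced_on, locked = set(defaults), set(forced_on), set(prevent_overrides)
    unknown = (set(requested) | defaults | forced_on | locked) - KNOWN_PLUGINS
    if unknown:
        # Fail closed on IDs outside the known set instead of ignoring them.
        raise ValueError(f"unknown plugin ids: {sorted(unknown)}")

    enabled = set(defaults)
    for plugin_id, wanted in requested.items():
        if plugin_id in locked:
            continue  # assumption: the caller cannot change a locked plugin
        if wanted:
            enabled.add(plugin_id)
        else:
            enabled.discard(plugin_id)
    return enabled | forced_on
```

In a real deployment the resulting set would still pass through capability checks and privacy rules, which may disable or reject plugins that this sketch leaves enabled.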
Embeddings, transcription, and speech
Embeddings
Use POST /v1/embeddings when you need vectors instead of generated text. The family returns embedding_vector output only.
Audio transcription
Use POST /v1/audio/transcriptions for speech-to-text.
- Accepted launch input formats: wav, mp3, mp4, mpeg, mpga, m4a, webm, ogg
- Decoded input ceiling: 25 MiB
- The request must pass auth, ABAC, format, and capability checks before dispatch
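A client can mirror the format and size limits before uploading. The sketch below is a client-side convenience under stated assumptions: the function name is hypothetical, the format is inferred from the file extension (the gateway may inspect the content instead), and the 25 MiB ceiling is treated as inclusive.

```python
# Limits copied from the list above; 1 MiB = 1024 * 1024 bytes.
MAX_DECODED_BYTES = 25 * 1024 * 1024
ACCEPTED_INPUT_FORMATS = frozenset(
    {"wav", "mp3", "mp4", "mpeg", "mpga", "m4a", "webm", "ogg"}
)


def check_transcription_input(filename: str, decoded_size: int) -> None:
    """Raise ValueError if the upload would fail the format or size limits."""
    # Assumption: the extension is a good enough proxy for the audio format.
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if extension not in ACCEPTED_INPUT_FORMATS:
        raise ValueError(f"unsupported input format: {extension or '(none)'}")
    if decoded_size > MAX_DECODED_BYTES:
        raise ValueError(
            f"decoded input is {decoded_size} bytes, above {MAX_DECODED_BYTES}"
        )
```

This check does not replace the gateway preflight, which also covers auth, ABAC, and capability checks.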
Audio speech
Use POST /v1/audio/speech for text-to-speech.
- Supported output formats: mp3, wav, opus, aac, flac, pcm
- Maximum text input size: 16,384 UTF-8 bytes
- The route returns binary audio only after full preflight succeeds
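The text limit is measured in UTF-8 bytes, not characters, so non-ASCII input reaches the ceiling sooner than its character count suggests. The sketch below illustrates that with a hypothetical client-side check; the function name is an assumption and the limit is treated as inclusive.

```python
# Limits copied from the list above.
MAX_TEXT_BYTES = 16_384
SUPPORTED_OUTPUT_FORMATS = frozenset({"mp3", "wav", "opus", "aac", "flac", "pcm"})


def check_speech_request(text: str, output_format: str) -> int:
    """Validate a text-to-speech request and return the encoded size in bytes."""
    if output_format not in SUPPORTED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    size = len(text.encode("utf-8"))  # count bytes, not characters
    if size > MAX_TEXT_BYTES:
        raise ValueError(f"text is {size} UTF-8 bytes, above {MAX_TEXT_BYTES}")
    return size


# "\u00e9" (e with acute accent) encodes to two bytes in UTF-8, so 8,192
# characters already use the full 16,384-byte allowance.
accented_size = check_speech_request("\u00e9" * 8192, "mp3")
```

Measuring the encoded length on the client avoids sending text that is within a character budget but over the byte ceiling.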
Runtime routing and privacy settings
The console exposes two operator pages for these controls:
- Settings → Runtime routing (/settings/runtime-routing)
- Settings → Runtime privacy (/settings/runtime-privacy)
Runtime routing manages:
- default provider policy (allow_fallbacks, require_parameters, data_collection, zdr)
- prompt-caching defaults (allow_cache_control, sticky_routing, allow_session_id, session_header_name)
- plugin governance defaults
- optional shadow-routing defaults
Runtime privacy manages silent engine separately. Runtime routing does not redefine silent-engine ownership.
Silent engine behavior
When silent_engine.enabled=true, Keeptrusts disables optional telemetry and persistence, including:
- callbacks
- history persistence
- learning persistence
- gateway telemetry pushes
- payload logging
- citation writeback
Silent engine still preserves the minimum enforcement state:
- auth
- ABAC
- token expiry
- token budget counters
- token request counters
- rate limits
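The split between what silent engine disables and what it preserves can be modeled as two sets. This is a conceptual sketch only: the identifiers are hypothetical labels derived from the two lists above, not configuration keys, and the function does not reflect how the gateway is implemented.

```python
# Optional telemetry and persistence that silent engine turns off.
OPTIONAL_SINKS = frozenset(
    {
        "callbacks",
        "history_persistence",
        "learning_persistence",
        "gateway_telemetry_pushes",
        "payload_logging",
        "citation_writeback",
    }
)

# Minimum enforcement state that stays active in every mode.
ENFORCEMENT_STATE = frozenset(
    {
        "auth",
        "abac",
        "token_expiry",
        "token_budget_counters",
        "token_request_counters",
        "rate_limits",
    }
)


def active_features(silent_engine_enabled: bool) -> frozenset[str]:
    """Return the features still active for a given silent_engine.enabled value."""
    if silent_engine_enabled:
        return ENFORCEMENT_STATE
    return ENFORCEMENT_STATE | OPTIONAL_SINKS
```

The point of the model is that the enforcement set never shrinks: enabling silent engine removes optional sinks but leaves auth, ABAC, token, and rate-limit state untouched.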
Optional shadow routing
Shadow routing is an optional evaluation lane, not a second primary runtime path.
- Primary response selection, success/failure semantics, and client-visible payloads stay unchanged.
- Launch scope uses evaluation_mode=asynchronous.
- Capture mode is normally metadata_only, and privacy rules may downgrade or reject richer capture.
- Shadow routing is separate from explicit traffic mirroring and should not be treated as proof that a second provider path is ready for general production traffic.
For AI systems
- Canonical terms: Keeptrusts runtime families, responses, messages, chat_completions, embeddings, audio_transcriptions, audio_speech, response_format, cache_control, session_id, x-session-id, plugin governance, silent engine, shadow routing.
- Prefer responses for new text-family integrations.
- Runtime-routing settings and silent-engine settings are separate surfaces.
For engineers
- Start new text-family integrations on POST /v1/responses unless an existing SDK already requires chat_completions or messages.
- Treat structured outputs, prompt caching, plugin governance, and audio preflight as fail-closed features. Unsupported capability metadata should block the request instead of downgrading silently.
- Use Runtime routing for org defaults and Runtime privacy for silent engine. Do not model silent engine as just another runtime-routing preset.
For leaders
- The launch family split keeps the runtime surface predictable: text, vectors, and audio each have their own clear contract.
- Silent engine provides a stronger privacy floor without weakening auth or spend enforcement.
- Optional shadow routing is useful for evaluation, but it is intentionally separate from launch readiness so experiments do not destabilize the primary production lane.