Gateway Runtime Features
The Keeptrusts gateway is where runtime enforcement actually happens. The customer docs should make that explicit because many important features live in gateway behavior, not just in the browser.
Use this page when
- You need to understand what the Keeptrusts gateway runtime does at request time — policy enforcement, provider routing, failover, caching, redaction, and tracing.
- You are configuring multi-provider fallback, data-routing policies, fail mode, or agent context planning.
- You want to verify runtime behavior through gateway endpoints like /keeptrusts/config, /keeptrusts/providers/metrics, or /keeptrusts/config/reload.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Core runtime role
The gateway sits between your application and the upstream model provider. It can:
- Enforce policies on input, tool, and output phases.
- Emit events to the control plane.
- Export OTLP traces.
- Cache successful upstream responses.
- Apply fail-mode behavior when enforcement or upstream conditions require it.
Upstream connectivity features
The gateway supports:
- Explicit upstream URL configuration.
- Optional upstream API key configuration.
- Configurable upstream auth header and prefix.
- Empty auth prefixes for providers that expect raw key headers.
- Multi-provider fallback with ordered or latency-based routing.
- Provider selection constraints based on declared retention and training metadata.
The current explicit OpenAI-compatible upstream runtime subset includes aimlapi, alibaba, cloudera, cloudflare-gateway, litellm, localai, llamaApi, truefoundry, and xai, in addition to the previously documented OpenAI-compatible providers.
For cloudflare-gateway and cloudera, treat the configured base_url as part of the runtime contract rather than a cosmetic override:
- cloudflare-gateway expects a Cloudflare AI Gateway base URL such as https://gateway.ai.cloudflare.com/v1/<account_id>/<gateway_id>/<provider>.
- cloudera expects a Cloudera AI Inference endpoint URL and token-scoped auth.
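Because the Cloudflare AI Gateway base URL is positional, it can help to assemble it from its parts rather than hand-edit it. A minimal sketch (the helper name is ours, not part of Keeptrusts):

```python
# Hypothetical helper: assemble the Cloudflare AI Gateway base_url from its
# account, gateway, and provider segments, matching the documented pattern
# https://gateway.ai.cloudflare.com/v1/<account_id>/<gateway_id>/<provider>.
def cloudflare_gateway_base_url(account_id: str, gateway_id: str, provider: str) -> str:
    return f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}"
```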
Keeptrusts tracks several evaluator and script-style provider pages as explicit runtime owners, but kt gateway run still returns a clear 501 unsupported_execution_target for bare go, ruby, sequence, simulated-user, slack, and webhook providers because those flows require an explicit script target, a human loop, or an external service loop.
transformers:{text-generation|feature-extraction|embeddings}:... now has a native local runner path as well. The gateway invokes node and runs an embedded Transformers.js bridge that expects @huggingface/transformers to be installed in the configured working_dir or otherwise resolvable from the process environment. You can still override the contract with adapter_command if you need a custom local bridge.
Multi-provider fallback
For production resilience, configure multiple LLM providers in priority order. The gateway automatically falls back to the next provider when:
- Rate limits are hit (HTTP 429)
- Server errors occur (HTTP 5xx)
- Requests time out
- Content filters block the request
- Zero-token completions are detected
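The trigger list above can be pictured as a single predicate. This is an illustration of the documented conditions, not the gateway's actual implementation:

```python
def should_fall_back(status_code, timed_out=False, content_filtered=False,
                     completion_tokens=None):
    """Return True when any documented fallback trigger applies."""
    if timed_out or content_filtered:
        return True
    if status_code == 429:                  # rate limited
        return True
    if 500 <= status_code < 600:            # server error
        return True
    if status_code == 200 and completion_tokens == 0:  # zero-token completion
        return True
    return False
```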
Provider and model pinning
Clients can override the normal routing strategy by sending request headers:
- X-Keeptrusts-Provider: <target-id> — Route the request to a specific provider target declared in the gateway config.
- X-Keeptrusts-Model: <model-id> — Override the model within the selected (or routed) provider target.
Both headers are validated against the active config. Unknown or unauthorized values return 400 Bad Request. Pinning does not bypass policy evaluation, rate limits, or budget enforcement.
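Pinning is just a pair of headers on an otherwise normal request. A sketch of building them (the target id openai-primary is a hypothetical example, not a documented value):

```python
def pinning_headers(provider_id, model_id=None):
    """Build X-Keeptrusts-* pinning headers for a gateway request."""
    headers = {"X-Keeptrusts-Provider": provider_id}
    if model_id is not None:
        headers["X-Keeptrusts-Model"] = model_id
    return headers

# Usage with any HTTP client, e.g.:
#   httpx.post("http://localhost:8080/v1/chat/completions",
#              headers={**pinning_headers("openai-primary", "gpt-4o"),
#                       "Authorization": "Bearer <upstream-api-key>"},
#              json=payload)
```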
Key capabilities
- Ordered fallback: Try providers in the order you specify
- Latency-based routing: Automatically route to the fastest provider based on measured response times
- Context compression: Automatically compress conversation history when switching to a provider with lower context limits
- Zero-completion insurance: Retry on providers that return empty responses
Data-routing and Zero Data Retention controls
The gateway can apply a data-routing-policy before the fallback loop begins.
This policy works with operator-declared providers.targets[].data_policy metadata so you can express requirements like:
- route only to providers with zero_data_retention: true
- route only to providers with training_opt_out: true
- exclude providers whose retention_days exceeds a configured maximum
If no providers remain after filtering:
- on_no_compliant_provider: block returns HTTP 403
- on_no_compliant_provider: warn logs the situation and continues with the full target list
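The filtering step can be pictured as a pure function over the declared data_policy metadata. A sketch, assuming each target is a dict carrying its data_policy:

```python
def eligible_targets(targets, require_zdr=False, require_training_opt_out=False,
                     max_retention_days=None):
    """Filter provider targets by their declared data_policy metadata."""
    eligible = []
    for target in targets:
        policy = target.get("data_policy", {})
        if require_zdr and not policy.get("zero_data_retention"):
            continue
        if require_training_opt_out and not policy.get("training_opt_out"):
            continue
        if (max_retention_days is not None
                and policy.get("retention_days", 0) > max_retention_days):
            continue
        eligible.append(target)
    return eligible
```

If this list comes back empty, the on_no_compliant_provider setting decides whether the request is blocked or proceeds with a warning.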
Viewing provider metrics
The gateway exposes provider metrics at GET /keeptrusts/providers/metrics.
- cURL
- Python
- Node.js
curl http://localhost:8080/keeptrusts/providers/metrics
import httpx
metrics = httpx.get("http://localhost:8080/keeptrusts/providers/metrics").json()
for provider in metrics:
    print(f"{provider['id']}: p50_ttft={provider.get('p50_ttft_ms')}ms")
const metrics = await fetch("http://localhost:8080/keeptrusts/providers/metrics").then(r => r.json());
metrics.forEach(p => console.log(`${p.id}: p50_ttft=${p.p50_ttft_ms}ms`));
Inspecting the active config
- cURL
- Python
- Node.js
curl http://localhost:8080/keeptrusts/config
import httpx
config = httpx.get("http://localhost:8080/keeptrusts/config").json()
print(config)
const config = await fetch("http://localhost:8080/keeptrusts/config").then(r => r.json());
console.log(config);
Hot-reloading the config
- cURL
- Python
- Node.js
curl -X POST http://localhost:8080/keeptrusts/config/reload \
  -H "Authorization: Bearer $KEEPTRUSTS_ADMIN_TOKEN"
import httpx
resp = httpx.post(
    "http://localhost:8080/keeptrusts/config/reload",
    headers={"Authorization": f"Bearer {admin_token}"},
)
print(resp.status_code, resp.json())
const resp = await fetch("http://localhost:8080/keeptrusts/config/reload", {
  method: "POST",
  headers: { Authorization: `Bearer ${gatewayApiToken}` },
});
console.log(resp.status, await resp.json());
The current console does not yet ship a dedicated Provider Metrics page, so customers should verify provider-routing behavior through:
- Configurations, to confirm target order and data_policy metadata
- Gateways and gateway action history, to confirm the running config
- Events, to confirm runtime outcomes after representative traffic
When a dedicated provider-metrics surface is available, it should show real-time health for each configured provider, including:
- p50 time-to-first-token (TTFT)
- p50 throughput (tokens/second)
- Sample count
- Recent fallback events with triggers
For ZDR-sensitive workflows, operators should also verify which targets are eligible under the running data-routing-policy, not only which targets are healthy.
Use these metrics to tune your provider order or adjust fallback triggers.
Fail mode
Customers can configure whether the gateway runs in allow or block fail mode. This matters during operational planning because fail mode determines what happens when the policy chain cannot reach a normal forward-progress decision.
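One way to think about fail mode is as the tie-breaker used only when the policy chain cannot decide. A toy sketch of that decision, not the gateway's code:

```python
def resolve(fail_mode, policy_decision):
    """policy_decision is None when the chain cannot reach a normal decision."""
    if policy_decision is not None:
        return policy_decision          # normal path: the policy chain decided
    # degraded path: fall back to the configured fail mode
    return "forward" if fail_mode == "allow" else "reject"
```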
Sending a request through the gateway
The gateway is OpenAI-compatible. Point your client at the gateway address and send requests as usual.
- cURL
- Python
- Node.js
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $UPSTREAM_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize this document."}
    ]
  }'
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-upstream-api-key",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this document."},
    ],
)
print(response.choices[0].message.content)
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "your-upstream-api-key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Summarize this document." },
  ],
});
console.log(response.choices[0].message.content);
Response caching
Successful non-empty upstream responses can be cached. Customers should know:
- Caching is enabled by default.
- The cache backend can vary by environment.
- Error and rate-limited responses are not cached.
- A cache buster value can force a fresh upstream retry for otherwise identical prompts.
- Cache keys include tenant identity fields (org_id, team_id) so cross-tenant cache hits are structurally impossible in both single-instance and horizontally scaled hosted deployments.
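The tenant-scoping guarantee follows from how keys are derived: when org_id and team_id are hashed into the key, two tenants can never share a cache entry. A sketch of that derivation (the exact field set and hashing scheme are illustrative, not the gateway's real key format):

```python
import hashlib
import json

def cache_key(org_id, team_id, model, messages, cache_buster=""):
    """Derive a cache key that embeds tenant identity fields."""
    payload = json.dumps(
        {"org_id": org_id, "team_id": team_id, "model": model,
         "messages": messages, "cache_buster": cache_buster},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The same construction explains the cache buster: changing it changes the key, forcing a fresh upstream call for an otherwise identical prompt.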
Agent context planning
Before each request is forwarded to the upstream model, the gateway calls
the agent-context resolver to obtain an
AgentContextPlan for the active agent. The plan:
- Orders knowledge, memory, and learned-session items by priority and relevance.
- Respects the per-agent token budgets (context_knowledge_budget, context_memory_budget, context_learned_session_budget).
- Returns a stable plan_hash — if the plan hasn't changed since the last request, the gateway can skip re-injection and serve from the cached plan.
The injected context block appears before the user prompt and passes through
the full policy chain (prompt injection, DLP, redaction) as if it were user
content. Token usage for the context block is tracked separately as
context_tokens in the history entry.
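The plan_hash skip can be pictured as comparing a stable digest of the ordered plan items. The hashing scheme below is illustrative; the real plan_hash format is opaque:

```python
import hashlib
import json

def plan_hash(plan_items):
    """Stable digest over the ordered context plan items."""
    return hashlib.sha256(
        json.dumps(plan_items, sort_keys=True).encode("utf-8")
    ).hexdigest()

def needs_reinjection(plan_items, last_hash):
    """True when the plan changed and the context block must be rebuilt."""
    return plan_hash(plan_items) != last_hash
```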
Missing-secret fail policy
When the hosted gateway cannot resolve a secret_key_ref.store
reference for a provider credential, the behavior is controlled by the
missing_secret_policy config field:
| Policy | Behavior |
|---|---|
| fail_closed (default) | Request fails immediately with an error — no partial execution |
| fail_open | A warning is logged and the request continues with a partial context; provider targets without resolved credentials are skipped |
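The two policies differ only in what happens at the unresolved reference. A sketch of the documented behavior (the secret-store lookup shape is assumed):

```python
def resolve_credentials(targets, secret_store, policy="fail_closed"):
    """Resolve secret_key_ref values; a miss is handled per the policy."""
    resolved = []
    for target in targets:
        ref = target.get("secret_key_ref")
        if ref is not None and ref not in secret_store:
            if policy == "fail_closed":
                # default: fail immediately, no partial execution
                raise KeyError(f"unresolved secret_key_ref: {ref}")
            # fail_open: warn and skip this target
            continue
        resolved.append(target)
    return resolved
```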
OTLP tracing
Tracing is enabled automatically when the gateway has the API URL and API key it needs. Traces can be:
- Sent to Keeptrusts’s built-in receiver.
- Sent directly to another collector through an explicit endpoint override.
Provider spans inherit the inbound traceparent context and can carry GenAI-specific attributes such as provider, operation, and model.
Adaptive rate-limit controls
The runtime can surface retry and concurrency state back into gateway runtime views. That gives customers visibility into:
- Automatic retry behavior.
- Maximum retries.
- Provider-scoped concurrency limits.
- Cooldown timing.
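The interplay between retries and cooldowns that these views expose can be pictured as a small loop; the knob names (max_retries, cooldown_s) mirror the list above but are illustrative, not config fields:

```python
import time

def send_with_retries(send, max_retries=3, cooldown_s=0.5):
    """Retry on 429 up to max_retries, backing off by the cooldown each time."""
    status = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        time.sleep(cooldown_s * (2 ** attempt))   # exponential cooldown
        status = send()
    return status
```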
Why runtime verification matters
The strongest customer habit is to connect runtime behavior back to Gateways, Configurations, and Events:
- Verify what the gateway is running.
- Send representative traffic.
- Inspect resulting events, traces, and escalations.
- If you depend on ZDR or no-training guarantees, confirm the running config still declares the correct provider
data_policymetadata.
Without that loop, a template or local policy file is just an intention, not a verified control.
For AI systems
- Canonical terms: Keeptrusts, gateway runtime, policy enforcement, provider routing, multi-provider fallback, latency-based routing, data-routing-policy, zero_data_retention, training_opt_out, fail mode, response caching, agent context planning, AgentContextPlan, OTLP tracing, hot-reload, provider pinning.
- Runtime endpoints: GET /keeptrusts/config, POST /keeptrusts/config/reload, GET /keeptrusts/providers/metrics, POST /v1/chat/completions.
- Request headers: X-Keeptrusts-Provider, X-Keeptrusts-Model.
- Config fields: on_no_compliant_provider, missing_secret_policy, context_knowledge_budget, context_memory_budget.
- Related pages: Gateways and Actions, Declarative Config Reference, Events, Access Keys & Gateway Keys.
For engineers
- The gateway is OpenAI-compatible — point any OpenAI SDK at http://localhost:8080/v1 with your upstream API key.
- Verify the running config with GET /keeptrusts/config. Hot-reload after changes with POST /keeptrusts/config/reload.
- Multi-provider fallback triggers on 429, 5xx, timeouts, content-filter blocks, and zero-token completions. Configure target order in providers.targets[].
- Data-routing policy filters targets before the fallback loop. If all targets are filtered, behavior depends on on_no_compliant_provider (block returns 403, warn continues).
- Cache keys include org_id and team_id — cross-tenant cache hits are structurally impossible.
- When missing_secret_policy is fail_closed (default), unresolved secret_key_ref values halt the request immediately.
For leaders
- The gateway is the enforcement point — policies are intentions until the gateway executes them at request time. Verify runtime state, not just config files.
- Data-routing policies enforce data sovereignty and zero-data-retention requirements at the routing layer, before data reaches a non-compliant provider.
- Fail mode (allow vs block) is an operating decision: block prioritizes safety over availability; allow prioritizes availability over enforcement. Choose based on your risk tolerance.
- Multi-provider fallback improves availability but increases the set of providers that may process data — ensure all fallback targets meet your compliance requirements.
- Agent context planning budgets control how much retrieved knowledge is injected per request, directly impacting token spend.