Gateway Runtime Features
The Keeptrusts gateway is where runtime enforcement actually happens. The customer docs should make that explicit because many important features live in gateway behavior, not just in the browser.
Use this page when
- You need to understand what the Keeptrusts gateway runtime does at request time — policy enforcement, provider routing, failover, caching, redaction, and tracing.
- You are configuring multi-provider fallback, data-routing policies, fail mode, or agent context planning.
- You want to verify runtime behavior through gateway endpoints like /keeptrusts/config, /keeptrusts/providers/metrics, or /keeptrusts/config/reload.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Core runtime role
The gateway sits between your application and the upstream model provider. It can:
- Enforce policies on input, tool, and output phases.
- Emit events to the control plane.
- Export OTLP traces.
- Cache successful upstream responses.
- Apply fail-mode behavior when enforcement or upstream conditions require it.
Upstream connectivity features
The gateway supports:
- Explicit upstream URL configuration.
- Optional upstream API key configuration.
- Configurable upstream auth header and prefix.
- Empty auth prefixes for providers that expect raw key headers.
- Multi-provider fallback with ordered or latency-based routing.
- Provider selection constraints based on declared retention and training metadata.
The current explicit OpenAI-compatible upstream runtime subset includes aimlapi, alibaba, cloudera, cloudflare-gateway, litellm, localai, llamaApi, truefoundry, and xai, in addition to the previously documented OpenAI-compatible providers.
For cloudflare-gateway and cloudera, treat the configured base_url as part of the runtime contract rather than a cosmetic override:
- cloudflare-gateway expects a Cloudflare AI Gateway base URL such as https://gateway.ai.cloudflare.com/v1/<account_id>/<gateway_id>/<provider>.
- cloudera expects a Cloudera AI Inference endpoint URL and token-scoped auth.
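Because the Cloudflare AI Gateway base URL is positional, it can help to assemble it from its parts rather than hand-edit it. A minimal sketch (the helper name is ours, not part of Keeptrusts):

```python
# Hypothetical helper: assemble the Cloudflare AI Gateway base_url from its
# account, gateway, and provider segments, matching the documented pattern
# https://gateway.ai.cloudflare.com/v1/<account_id>/<gateway_id>/<provider>.
def cloudflare_gateway_base_url(account_id: str, gateway_id: str, provider: str) -> str:
    return f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}"
```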
Keeptrusts tracks several evaluator and script-style provider pages as explicit runtime owners, but kt gateway run still returns a clear 501 unsupported_execution_target for bare go, ruby, sequence, simulated-user, slack, and webhook providers because those flows require an explicit script target, a human loop, or an external service loop.
transformers:{text-generation|feature-extraction|embeddings}:... now has a native local runner path as well. The gateway invokes node and runs an embedded Transformers.js bridge that expects @huggingface/transformers to be installed in the configured working_dir or otherwise resolvable from the process environment. You can still override the contract with adapter_command if you need a custom local bridge.
Multi-provider fallback
For production resilience, configure multiple LLM providers in priority order. The gateway automatically falls back to the next provider when:
- Rate limits are hit (HTTP 429)
- Server errors occur (HTTP 5xx)
- Requests time out
- Content filters block the request
- Zero-token completions are detected
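The trigger list above can be pictured as a single predicate. This is an illustration of the documented conditions, not the gateway's actual implementation:

```python
def should_fall_back(status_code, timed_out=False, content_filtered=False,
                     completion_tokens=None):
    """Return True when any documented fallback trigger applies."""
    if timed_out or content_filtered:
        return True
    if status_code == 429:                  # rate limited
        return True
    if 500 <= status_code < 600:            # server error
        return True
    if status_code == 200 and completion_tokens == 0:  # zero-token completion
        return True
    return False
```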
Provider and model pinning
Clients can override the normal routing strategy by sending request headers:
- X-Keeptrusts-Provider: <target-id> — Route the request to a specific provider target declared in the gateway config.
- X-Keeptrusts-Model: <model-id> — Override the model within the selected (or routed) provider target.
Both headers are validated against the active config. Unknown or unauthorized values return 400 Bad Request. Pinning does not bypass policy evaluation, rate limits, or budget enforcement.
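Pinning is just a pair of headers on an otherwise normal request. A sketch of building them (the target id openai-primary is a hypothetical example, not a documented value):

```python
def pinning_headers(provider_id, model_id=None):
    """Build X-Keeptrusts-* pinning headers for a gateway request."""
    headers = {"X-Keeptrusts-Provider": provider_id}
    if model_id is not None:
        headers["X-Keeptrusts-Model"] = model_id
    return headers

# Usage with any HTTP client, e.g.:
#   httpx.post("http://localhost:8080/v1/chat/completions",
#              headers={**pinning_headers("openai-primary", "gpt-4o"),
#                       "Authorization": "Bearer <upstream-api-key>"},
#              json=payload)
```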
Key capabilities
- Ordered fallback: Try providers in the order you specify
- Latency-based routing: Automatically route to the fastest provider based on measured response times
- Context compression: Automatically compress conversation history when switching to a provider with lower context limits
- Zero-completion insurance: Retry on providers that return empty responses
Data-routing and Zero Data Retention controls
The gateway can apply a data-routing-policy before the fallback loop begins.
This policy works with operator-declared providers.targets[].data_policy metadata so you can express requirements like:
- route only to providers with zero_data_retention: true
- route only to providers with training_opt_out: true
- exclude providers whose retention_days exceeds a configured maximum
If no providers remain after filtering:
- on_no_compliant_provider: block returns HTTP 403
- on_no_compliant_provider: warn logs the situation and continues with the full target list
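The filtering step can be pictured as a pure function over the declared data_policy metadata. A sketch, assuming each target is a dict carrying its data_policy:

```python
def eligible_targets(targets, require_zdr=False, require_training_opt_out=False,
                     max_retention_days=None):
    """Filter provider targets by their declared data_policy metadata."""
    eligible = []
    for target in targets:
        policy = target.get("data_policy", {})
        if require_zdr and not policy.get("zero_data_retention"):
            continue
        if require_training_opt_out and not policy.get("training_opt_out"):
            continue
        if (max_retention_days is not None
                and policy.get("retention_days", 0) > max_retention_days):
            continue
        eligible.append(target)
    return eligible
```

If this list comes back empty, the on_no_compliant_provider setting decides whether the request is blocked or proceeds with a warning.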
Viewing provider metrics
The gateway exposes provider metrics at GET /keeptrusts/providers/metrics.
- cURL
- Python
- Node.js
curl http://localhost:8080/keeptrusts/providers/metrics
import httpx
metrics = httpx.get("http://localhost:8080/keeptrusts/providers/metrics").json()
for provider in metrics:
    print(f"{provider['id']}: p50_ttft={provider.get('p50_ttft_ms')}ms")
const metrics = await fetch("http://localhost:8080/keeptrusts/providers/metrics").then(r => r.json());
metrics.forEach(p => console.log(`${p.id}: p50_ttft=${p.p50_ttft_ms}ms`));
Inspecting the active config
- cURL
- Python
- Node.js
curl http://localhost:8080/keeptrusts/config
import httpx
config = httpx.get("http://localhost:8080/keeptrusts/config").json()
print(config)
const config = await fetch("http://localhost:8080/keeptrusts/config").then(r => r.json());
console.log(config);
Hot-reloading the config
- cURL
- Python
- Node.js
curl -X POST http://localhost:8080/keeptrusts/config/reload \
  -H "Authorization: Bearer $KEEPTRUSTS_ADMIN_TOKEN"
import httpx
resp = httpx.post(
    "http://localhost:8080/keeptrusts/config/reload",
    headers={"Authorization": f"Bearer {admin_token}"},
)
print(resp.status_code, resp.json())
const resp = await fetch("http://localhost:8080/keeptrusts/config/reload", {
  method: "POST",
  headers: { Authorization: `Bearer ${gatewayApiToken}` },
});
console.log(resp.status, await resp.json());
The current console does not yet ship a dedicated Provider Metrics page, so customers should verify provider-routing behavior through:
- Configurations, to confirm target order and data_policy metadata
- Gateways and gateway action history, to confirm the running config
- Events, to confirm runtime outcomes after representative traffic
When a dedicated provider-metrics surface is available, it should show real-time health for each configured provider, including:
- p50 time-to-first-token (TTFT)
- p50 throughput (tokens/second)
- Sample count
- Recent fallback events with triggers
For ZDR-sensitive workflows, operators should also verify which targets are eligible under the running data-routing-policy, not only which targets are healthy.
Use these metrics to tune your provider order or adjust fallback triggers.
Fail mode
Customers can configure whether the gateway runs in allow or block fail mode. This matters during operational planning because fail mode determines what happens when the policy chain cannot reach a normal forward-progress decision.
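One way to think about fail mode is as the tie-breaker used only when the policy chain cannot decide. A toy sketch of that decision, not the gateway's code:

```python
def resolve(fail_mode, policy_decision):
    """policy_decision is None when the chain cannot reach a normal decision."""
    if policy_decision is not None:
        return policy_decision          # normal path: the policy chain decided
    # degraded path: fall back to the configured fail mode
    return "forward" if fail_mode == "allow" else "reject"
```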
Sending a request through the gateway
The gateway is OpenAI-compatible. Point your client at the gateway address and send requests as usual.
- cURL
- Python
- Node.js
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $UPSTREAM_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize this document."}
    ]
  }'
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-upstream-api-key",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this document."},
    ],
)
print(response.choices[0].message.content)
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "your-upstream-api-key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Summarize this document." },
  ],
});
console.log(response.choices[0].message.content);
Response caching
Successful non-empty upstream responses can be cached. Customers should know:
- Caching is enabled by default.
- The cache backend can vary by environment.
- Error and rate-limited responses are not cached.
- A cache buster value can force a fresh upstream retry for otherwise identical prompts.
- Cache keys include tenant identity fields (org_id, team_id) so cross-tenant cache hits are structurally impossible in both single-instance and horizontally scaled hosted deployments.
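The tenant-scoping guarantee follows from how keys are derived: when org_id and team_id are hashed into the key, two tenants can never share a cache entry. A sketch of that derivation (the exact field set and hashing scheme are illustrative, not the gateway's real key format):

```python
import hashlib
import json

def cache_key(org_id, team_id, model, messages, cache_buster=""):
    """Derive a cache key that embeds tenant identity fields."""
    payload = json.dumps(
        {"org_id": org_id, "team_id": team_id, "model": model,
         "messages": messages, "cache_buster": cache_buster},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The same construction explains the cache buster: changing it changes the key, forcing a fresh upstream call for an otherwise identical prompt.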
Agent context planning
Before each request is forwarded to the upstream model, the gateway calls
the agent-context resolver to obtain an
AgentContextPlan for the active agent. The plan:
- Orders knowledge, memory, and learned-session items by priority and relevance.
- Respects the per-agent token budgets (context_knowledge_budget, context_memory_budget, context_learned_session_budget).
- Returns a stable plan_hash — if the plan hasn't changed since the last request, the gateway can skip re-injection and serve from the cached plan.
The injected context block appears before the user prompt and passes through
the full policy chain (prompt injection, DLP, redaction) as if it were user
content. Token usage for the context block is tracked separately as
context_tokens in the history entry.
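The plan_hash skip can be pictured as comparing a stable digest of the ordered plan items. The hashing scheme below is illustrative; the real plan_hash format is opaque:

```python
import hashlib
import json

def plan_hash(plan_items):
    """Stable digest over the ordered context plan items."""
    return hashlib.sha256(
        json.dumps(plan_items, sort_keys=True).encode("utf-8")
    ).hexdigest()

def needs_reinjection(plan_items, last_hash):
    """True when the plan changed and the context block must be rebuilt."""
    return plan_hash(plan_items) != last_hash
```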
Missing-secret fail policy
When the hosted gateway cannot resolve a secret_key_ref.store
reference for a provider credential, the behavior is controlled by the
missing_secret_policy config field:
| Policy | Behavior |
|---|---|
| fail_closed (default) | Request fails immediately with an error — no partial execution |
| fail_open | A warning is logged and the request continues with a partial context; provider targets without resolved credentials are skipped |
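The two policies differ only in what happens at the unresolved reference. A sketch of the documented behavior (the secret-store lookup shape is assumed):

```python
def resolve_credentials(targets, secret_store, policy="fail_closed"):
    """Resolve secret_key_ref values; a miss is handled per the policy."""
    resolved = []
    for target in targets:
        ref = target.get("secret_key_ref")
        if ref is not None and ref not in secret_store:
            if policy == "fail_closed":
                # default: fail immediately, no partial execution
                raise KeyError(f"unresolved secret_key_ref: {ref}")
            # fail_open: warn and skip this target
            continue
        resolved.append(target)
    return resolved
```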
OTLP tracing
Tracing is enabled automatically when the gateway has the API URL and API key it needs. Traces can be:
- Sent to Keeptrusts’s built-in receiver.
- Sent directly to another collector through an explicit endpoint override.
Provider spans inherit the inbound traceparent context and can carry GenAI-specific attributes such as provider, operation, and model.
Adaptive rate-limit controls
The runtime can surface retry and concurrency state back into gateway runtime views. That gives customers visibility into:
- Automatic retry behavior.
- Maximum retries.
- Provider-scoped concurrency limits.
- Cooldown timing.
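The interplay between retries and cooldowns that these views expose can be pictured as a small loop; the knob names (max_retries, cooldown_s) mirror the list above but are illustrative, not config fields:

```python
import time

def send_with_retries(send, max_retries=3, cooldown_s=0.5):
    """Retry on 429 up to max_retries, backing off by the cooldown each time."""
    status = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        time.sleep(cooldown_s * (2 ** attempt))   # exponential cooldown
        status = send()
    return status
```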
Why runtime verification matters
The strongest customer habit is to connect runtime behavior back to Gateways, Configurations, and Events:
- Verify what the gateway is running.
- Send representative traffic.
- Inspect resulting events, traces, and escalations.
- If you depend on ZDR or no-training guarantees, confirm the running config still declares the correct provider
data_policymetadata.
Without that loop, a template or local policy file is just an intention, not a verified control.
For AI systems
- Canonical terms: Keeptrusts, gateway runtime, policy enforcement, provider routing, multi-provider fallback, latency-based routing, data-routing-policy, zero_data_retention, training_opt_out, fail mode, response caching, agent context planning, AgentContextPlan, OTLP tracing, hot-reload, provider pinning.
- Runtime endpoints: GET /keeptrusts/config, POST /keeptrusts/config/reload, GET /keeptrusts/providers/metrics, POST /v1/chat/completions.
- Request headers: X-Keeptrusts-Provider, X-Keeptrusts-Model.
- Config fields: on_no_compliant_provider, missing_secret_policy, context_knowledge_budget, context_memory_budget.
- Related pages: Gateways and Actions, Declarative Config Reference, Events, Access Keys & Gateway Keys.
For engineers
- The gateway is OpenAI-compatible — point any OpenAI SDK at http://localhost:8080/v1 with your upstream API key.
- Verify the running config with GET /keeptrusts/config. Hot-reload after changes with POST /keeptrusts/config/reload.
- Multi-provider fallback triggers on 429, 5xx, timeouts, content-filter blocks, and zero-token completions. Configure target order in providers.targets[].
- Data-routing policy filters targets before the fallback loop. If all targets are filtered, behavior depends on on_no_compliant_provider (block returns 403, warn continues).
- Cache keys include org_id and team_id — cross-tenant cache hits are structurally impossible.
- When missing_secret_policy is fail_closed (default), unresolved secret_key_ref values halt the request immediately.
For leaders
- The gateway is the enforcement point — policies are intentions until the gateway executes them at request time. Verify runtime state, not just config files.
- Data-routing policies enforce data sovereignty and zero-data-retention requirements at the routing layer, before data reaches a non-compliant provider.
- Fail mode (allow vs block) is an operating decision: block prioritizes safety over availability; allow prioritizes availability over enforcement. Choose based on your risk tolerance.
- Multi-provider fallback improves availability but increases the set of providers that may process data — ensure all fallback targets meet your compliance requirements.
- Agent context planning budgets control how much retrieved knowledge is injected per request, directly impacting token spend.