How the Keeptrusts Gateway Intercepts LLM Traffic in Real Time
The Keeptrusts Gateway intercepts LLM traffic in real time by becoming the endpoint your application calls instead of the upstream provider. Every request reaches the gateway first, runs through the configured input policy chain, goes upstream only if it passes, and then returns through the output path where Keeptrusts can redact, block, score, or log the result before your caller receives it.
Use this page when
- You need a clear mental model for what the gateway is doing on each request.
- You are migrating an app from direct provider calls to a governed endpoint.
- You want to explain to your team why Keeptrusts is an inline control, not a reporting layer.
Primary audience
- Primary: Technical Engineers
- Secondary: Technical Leaders, AI Agents
The problem
Most teams start with direct provider calls. A service points an SDK at OpenAI, Anthropic, or another model provider, ships a bearer token, and gets a response back. That works for a prototype, but it leaves no central enforcement point. If you want to block prompt injection, redact PII, route only to approved providers, or keep an audit trail, every application has to solve those problems separately.
That fragmentation is the operational problem Keeptrusts is built to remove. A direct provider call can only be controlled in three weak places: in application code before the request, in provider settings you do not fully own, or after the fact in logs. The first option creates duplicated logic in every app. The second is provider-specific. The third is too late because the sensitive data or unsafe prompt has already left your boundary.
Real-time governance requires a choke point. It has to sit on the request path, not next to it. It also has to understand the request before the upstream call and the response before it goes back to the caller. That is what the gateway provides.
The solution
Keeptrusts exposes an OpenAI-compatible gateway endpoint. Your application keeps making normal chat or embeddings requests, but it sends them to the gateway instead of directly to the provider. The gateway loads policy-config.yaml, identifies the matching provider target, evaluates the configured policy chain, and decides what happens next.
At runtime, the flow is straightforward:
- The client sends a request to the gateway.
- The gateway evaluates request or input policies such as
prompt-injection,pii-detector,rbac, orsafety-filter. - If a policy blocks, the request ends there and the provider is never called.
- If the request passes, the gateway forwards it to the selected provider target.
- When the provider responds, Keeptrusts applies output behavior such as buffered redaction, output checks, rewrites, or audit capture.
- The caller receives the governed response, and the platform records the decision event.
That is why the docs treat the gateway as the primary runtime surface and Config-First Workflow as the operating model. You define behavior once in YAML, then every app using that endpoint gets the same enforcement.
The practical benefit is not just security. You also get consistent rollout. The same policy-config.yaml can be linted, tested, run locally with the CLI, versioned in Configurations, and then deployed to a hosted gateway. The application integration stays almost unchanged: in most cases you change the base URL and credentials, not the whole calling pattern.
Implementation
The smallest useful gateway config is one provider target plus a short policy chain. The example below is consistent with the gateway-first-run docs and the policy overview pages.
pack:
name: first-gateway
version: 0.1.0
enabled: true
providers:
targets:
- id: openai-primary
provider: openai
model: gpt-5.4-mini-mini
base_url: https://api.openai.com
secret_key_ref:
env: OPENAI_API_KEY
policies:
chain:
- prompt-injection
- pii-detector
- audit-logger
policy:
prompt-injection:
use_embedding: false
detection:
attack_patterns:
- "ignore.*previous.*instructions"
- "forget.*system.*prompt"
encoding:
decode_base64: true
normalize_unicode: true
detect_homoglyphs: true
boundaries:
enforce_delimiters: true
reject_fake_boundaries: true
pii-detector:
action: redact
redaction:
marker_format: label
include_metadata: true
audit-logger:
retention_days: 30
Start the gateway and send a request through it:
kt policy lint --file policy-config.yaml
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
curl -s http://localhost:41002/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.4-mini-mini",
"messages": [
{"role": "user", "content": "What does this gateway enforce before a provider call is made?"}
]
}' | jq .
After that request, three things are true at once.
First, the application never talked to the provider directly. The network path now runs through the gateway. Second, the gateway applied the configured policies before forwarding the request. Third, the response came back through the same control point, so output behavior and auditing remained inline rather than best effort.
If you want to prove interception beyond a successful response, the first-run tutorial also recommends checking the health and active config endpoints while the gateway is running. That confirms the gateway is live and loaded with the config you think it is using.
curl -s http://localhost:41002/health | jq .
curl -s http://localhost:41002/keeptrusts/config | jq .
For production, the same pattern holds. The credentials may move from local environment variables to hosted config variables, and the endpoint may sit behind your own load balancer, but the core interception model does not change: the governed request path is the product.
Results and impact
When the gateway is in front of your LLM traffic, governance stops being advisory. A blocked request returns a response immediately and never consumes provider quota. A redacted request is sanitized before the upstream sees it. An audited request produces runtime evidence tied to the same call path that served the application.
That also changes rollout speed. Teams no longer need every application to reimplement prompt injection checks, PII scrubbing, and audit hooks. The integration surface becomes narrower: point the app at the gateway, keep your config under review, and update governance in one place.
Operationally, this is the difference between observing AI usage and governing it. Keeptrusts is not waiting for logs to arrive after the fact. It is making decisions on the live path.
Key takeaways
- Keeptrusts intercepts traffic by sitting inline as the endpoint the application calls.
- Input policies run before the provider call, so block and redact decisions happen before data leaves your boundary.
- Output behavior runs on the way back, so response checks and redactions stay inline too.
- The gateway makes config-first governance practical because one YAML contract can control every integrated app.
- Real-time interception is what turns policy from documentation into enforcement.