How the Keeptrusts Gateway Intercepts LLM Traffic in Real Time

The Keeptrusts Gateway intercepts LLM traffic in real time by becoming the endpoint your application calls instead of the upstream provider. Every request reaches the gateway first, runs through the configured input policy chain, goes upstream only if it passes, and then returns through the output path where Keeptrusts can redact, block, score, or log the result before your caller receives it.

Use this page when

You need a clear mental model for what the gateway is doing on each request.
You are migrating an app from direct provider calls to a governed endpoint.
You want to explain to your team why Keeptrusts is an inline control, not a reporting layer.

Primary audience

Primary: Technical Engineers
Secondary: Technical Leaders, AI Agents

The problem

Most teams start with direct provider calls. A service points an SDK at OpenAI, Anthropic, or another model provider, ships a bearer token, and gets a response back. That works for a prototype, but it leaves no central enforcement point. If you want to block prompt injection, redact PII, route only to approved providers, or keep an audit trail, every application has to solve those problems separately.

That fragmentation is the operational problem Keeptrusts is built to remove. A direct provider call can only be controlled in three weak places: in application code before the request, in provider settings you do not fully own, or after the fact in logs. The first option creates duplicated logic in every app. The second is provider-specific. The third is too late because the sensitive data or unsafe prompt has already left your boundary.

Real-time governance requires a choke point. It has to sit on the request path, not next to it. It also has to understand the request before the upstream call and the response before it goes back to the caller. That is what the gateway provides.

The solution

Keeptrusts exposes an OpenAI-compatible gateway endpoint. Your application keeps making normal chat or embeddings requests, but it sends them to the gateway instead of directly to the provider. The gateway loads policy-config.yaml, identifies the matching provider target, evaluates the configured policy chain, and decides what happens next.

At runtime, the flow is straightforward:

The client sends a request to the gateway.
The gateway evaluates request or input policies such as prompt-injection, pii-detector, rbac, or safety-filter.
If a policy blocks, the request ends there and the provider is never called.
If the request passes, the gateway forwards it to the selected provider target.
When the provider responds, Keeptrusts applies output behavior such as buffered redaction, output checks, rewrites, or audit capture.
The caller receives the governed response, and the platform records the decision event.

That is why the docs treat the gateway as the primary runtime surface and Config-First Workflow as the operating model. You define behavior once in YAML, then every app using that endpoint gets the same enforcement.

The practical benefit is not just security. You also get consistent rollout. The same policy-config.yaml can be linted, tested, run locally with the CLI, versioned in Configurations, and then deployed to a hosted gateway. The application integration stays almost unchanged: in most cases you change the base URL and credentials, not the whole calling pattern.

Implementation

The smallest useful gateway config is one provider target plus a short policy chain. The example below is consistent with the gateway-first-run docs and the policy overview pages.

pack:
  name: first-gateway
  version: 0.1.0
  enabled: true

providers:
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-5.4-mini-mini
      base_url: https://api.openai.com
      secret_key_ref:
        env: OPENAI_API_KEY

policies:
  chain:
    - prompt-injection
    - pii-detector
    - audit-logger

policy:
  prompt-injection:
    use_embedding: false
    detection:
      attack_patterns:
        - "ignore.*previous.*instructions"
        - "forget.*system.*prompt"
    encoding:
      decode_base64: true
      normalize_unicode: true
      detect_homoglyphs: true
    boundaries:
      enforce_delimiters: true
      reject_fake_boundaries: true

  pii-detector:
    action: redact
    redaction:
      marker_format: label
      include_metadata: true

  audit-logger:
    retention_days: 30

Start the gateway and send a request through it:

kt policy lint --file policy-config.yaml
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml

curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4-mini-mini",
    "messages": [
      {"role": "user", "content": "What does this gateway enforce before a provider call is made?"}
    ]
  }' | jq .

After that request, three things are true at once.

First, the application never talked to the provider directly. The network path now runs through the gateway. Second, the gateway applied the configured policies before forwarding the request. Third, the response came back through the same control point, so output behavior and auditing remained inline rather than best effort.

If you want to prove interception beyond a successful response, the first-run tutorial also recommends checking the health and active config endpoints while the gateway is running. That confirms the gateway is live and loaded with the config you think it is using.

curl -s http://localhost:41002/health | jq .
curl -s http://localhost:41002/keeptrusts/config | jq .

For production, the same pattern holds. The credentials may move from local environment variables to hosted config variables, and the endpoint may sit behind your own load balancer, but the core interception model does not change: the governed request path is the product.

Results and impact

When the gateway is in front of your LLM traffic, governance stops being advisory. A blocked request returns a response immediately and never consumes provider quota. A redacted request is sanitized before the upstream sees it. An audited request produces runtime evidence tied to the same call path that served the application.

That also changes rollout speed. Teams no longer need every application to reimplement prompt injection checks, PII scrubbing, and audit hooks. The integration surface becomes narrower: point the app at the gateway, keep your config under review, and update governance in one place.

Operationally, this is the difference between observing AI usage and governing it. Keeptrusts is not waiting for logs to arrive after the fact. It is making decisions on the live path.

Key takeaways

Keeptrusts intercepts traffic by sitting inline as the endpoint the application calls.
Input policies run before the provider call, so block and redact decisions happen before data leaves your boundary.
Output behavior runs on the way back, so response checks and redactions stay inline too.
The gateway makes config-first governance practical because one YAML contract can control every integrated app.
Real-time interception is what turns policy from documentation into enforcement.

How the Keeptrusts Gateway Intercepts LLM Traffic in Real Time

Use this page when​

Primary audience​

The problem​

The solution​

Implementation​

Results and impact​

Key takeaways​

Next steps​