Zero-Trust AI: Never Trust the Prompt, Always Verify

Zero-trust AI starts with one assumption: a prompt is not a trusted object. It may contain adversarial instructions, hidden identifiers, leaked secrets, or requests that are safe only for a restricted provider path. Keeptrusts turns that assumption into a policy chain so every request is verified before it reaches a model and before cost is committed upstream.

Use this page when

  • You need a direct implementation model for zero-trust AI instead of a high-level security slogan.
  • You are hardening AI requests against prompt injection, data leaks, and non-compliant provider routing.
  • You want to connect request verification to session boundaries and spend controls.

Primary audience

  • Primary: Technical Engineers
  • Secondary: Technical Leaders, AI Agents

The problem

Traditional application security assumes the app defines the logic and the user supplies data. AI systems blur that line. The prompt can try to change instructions, alter tool use, reveal hidden content, or coerce the system into leaking secrets. That means the request itself becomes part of the attack surface.

Teams often respond with a single guardrail, usually a prompt-injection regex or a generic moderation API. That is not enough. If prompt-injection detection is the only control, a prompt that slips through can still carry PII, payment data, or internal codenames to the provider. If DLP is the only control, an adversarial prompt can still attempt boundary confusion or role spoofing. If provider routing is the only control, you may keep data in the right place while still forwarding content that never should have left the gateway at all.

Zero-trust AI therefore has to answer multiple questions for every request.

  • Is the prompt trying to bypass or rewrite instructions?
  • Does it contain identifiers that should be redacted?
  • Does it contain terms or patterns that should cause a hard block?
  • Is there still at least one provider target allowed to receive the remaining request?
  • Is the request operating inside an authenticated, server-controlled session boundary?

If you cannot answer all five at runtime, your AI stack is not zero trust. It is selective trust.

The solution

Keeptrusts gives you a layered answer.

At the request boundary, the guide "Implement Zero-Trust AI with Defense-in-Depth Policies" explains why controls should be chained rather than collapsed into one stage. In practice, the first line of defense is the Prompt Injection Detection policy, which normalizes text, checks attack patterns, detects fake boundaries and delimiter confusion, and can add embedding-based similarity detection. Importantly, the current runtime blocks on detection; there is no "warn-only" mode for this policy.
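To make the detection steps concrete, here is a minimal Python sketch of normalization, base64 decoding, attack-pattern matching, and fake-boundary rejection. It is not the Keeptrusts implementation: the three attack patterns are taken from the configuration on this page, while the function names, the fake-boundary markers, and the base64 heuristic are assumptions made for illustration. Homoglyph and embedding-based detection are left out.

```python
import base64
import binascii
import re
import unicodedata

# Attack patterns copied from the configuration on this page.
ATTACK_PATTERNS = [
    re.compile(p, re.IGNORECASE | re.DOTALL)
    for p in (
        r"ignore.*previous.*instructions",
        r"forget.*system.*prompt",
        r"reveal.*system.*prompt",
    )
]

# Hypothetical fake-boundary markers, chosen only for illustration.
FAKE_BOUNDARIES = re.compile(
    r"(<\|im_start\|>|<\|system\|>|^\s*system\s*:)",
    re.IGNORECASE | re.MULTILINE,
)

# Heuristic (assumption): treat long base64-alphabet runs as decode candidates.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def normalize(text: str) -> str:
    """Fold compatibility characters (such as fullwidth letters) to canonical forms."""
    return unicodedata.normalize("NFKC", text)


def decoded_variants(text: str) -> list[str]:
    """Return the text plus every base64 run that decodes to valid UTF-8."""
    variants = [text]
    for match in BASE64_RUN.finditer(text):
        try:
            decoded = base64.b64decode(match.group(0), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not real base64 text; ignore this run
        variants.append(decoded)
    return variants


def is_injection(prompt: str) -> bool:
    """Block on detection: any match on any variant counts as an attack."""
    for candidate in decoded_variants(normalize(prompt)):
        candidate = normalize(candidate)
        if FAKE_BOUNDARIES.search(candidate):
            return True
        if any(p.search(candidate) for p in ATTACK_PATTERNS):
            return True
    return False
```

Returning a plain boolean mirrors the block-only behavior described above: the sketch has no warn path, so the caller either forwards the request or rejects it.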

Next comes data minimization. PII Detector redacts common identifiers and can add PCI or healthcare heuristics when needed. DLP Filter blocks organization-specific terms and patterns that the platform cannot know in advance.
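The difference between blocking and redacting can be sketched in a few lines of Python. The DLP patterns and blocked terms below come from the configuration on this page; the PII patterns, the `[EMAIL]` and `[CARD]` label markers, and all function names are simplified assumptions, not the product's actual detectors.

```python
import re
from dataclasses import dataclass

# DLP patterns and terms taken from the configuration on this page.
DLP_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"ghp_[0-9A-Za-z]{36}"),
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
]
BLOCKED_TERMS = ["project atlas", "restricted merger room"]

# Simplified, illustrative PII patterns (assumptions).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


@dataclass
class DlpResult:
    blocked: bool
    reason: str = ""


def dlp_check(text: str) -> DlpResult:
    """Hard block: any secret pattern or restricted term stops the request."""
    for pattern in DLP_PATTERNS:
        if pattern.search(text):
            return DlpResult(True, f"pattern {pattern.pattern}")
    lowered = text.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return DlpResult(True, f"term {term}")
    return DlpResult(False)


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to avoid redacting arbitrary long digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_pii(text: str) -> str:
    """Redact: replace identifiers with label markers and keep the rest."""
    text = EMAIL.sub("[EMAIL]", text)

    def card_sub(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group(0))
        return "[CARD]" if luhn_ok(digits) else match.group(0)

    return CARD.sub(card_sub, text)
```

For example, `redact_pii("Contact jane@example.com, card 4111 1111 1111 1111.")` yields `"Contact [EMAIL], card [CARD]."`, while `dlp_check` on a prompt mentioning Project Atlas returns a blocked result and the request goes no further.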

Then comes provider enforcement. Data Routing Policy filters provider targets based on declared retention, training, and locality guarantees. If no compliant provider remains, the request should fail instead of being sent somewhere merely convenient.
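The fail-closed routing rule can be sketched as a filter that raises when nothing compliant is left. The field names echo the `data_policy` and `data-routing-policy` keys used in the configuration on this page; the `ProviderTarget` class, the `compliant_targets` function, and the exception are hypothetical names for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderTarget:
    id: str
    zero_data_retention: bool
    training_opt_out: bool
    retention_days: int


class NoCompliantProvider(Exception):
    """Raised instead of falling back to a merely convenient provider."""


def compliant_targets(
    targets: list[ProviderTarget],
    *,
    require_zero_data_retention: bool = True,
    require_no_training: bool = True,
    max_retention_days: int = 0,
) -> list[ProviderTarget]:
    """Keep only targets whose declared guarantees satisfy the policy."""
    allowed = [
        t
        for t in targets
        if (t.zero_data_retention or not require_zero_data_retention)
        and (t.training_opt_out or not require_no_training)
        and t.retention_days <= max_retention_days
    ]
    if not allowed:
        # Fail closed: an empty pool blocks the request.
        raise NoCompliantProvider("no provider target satisfies the data policy")
    return allowed
```

Raising an exception rather than returning an empty list makes the blocked outcome hard to ignore: a caller has to handle the failure explicitly before any request can be dispatched.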

Zero trust also applies outside the prompt itself. In the Keeptrusts console architecture, the browser never sees the upstream API bearer token. Authenticated browser traffic goes through server-side BFF routes, and mutating console requests use the x-keeptrusts-csrf-token header. That matters because zero trust is not only about prompt content. It is also about not exposing more authority to the client than the client should hold.
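A server-side route following this pattern might look roughly like the sketch below. Only the `x-keeptrusts-csrf-token` header name comes from this page; the function names, the session-bound token scheme, and the set of stripped headers are assumptions, and the real console may work differently.

```python
import hmac
import secrets

CSRF_HEADER = "x-keeptrusts-csrf-token"
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def issue_csrf_token() -> str:
    """Generate an unguessable token to store in the server-side session."""
    return secrets.token_urlsafe(32)


def csrf_ok(method: str, headers: dict[str, str], session_token: str) -> bool:
    """Require a matching CSRF header on every mutating request."""
    if method.upper() in SAFE_METHODS:
        return True
    presented = {k.lower(): v for k, v in headers.items()}.get(CSRF_HEADER, "")
    # Constant-time comparison avoids leaking the token through timing.
    return bool(presented) and hmac.compare_digest(
        presented.encode(), session_token.encode()
    )


def upstream_headers(browser_headers: dict[str, str], upstream_token: str) -> dict[str, str]:
    """Build upstream headers on the server; the browser never supplies the bearer token."""
    dropped = {"authorization", "cookie", CSRF_HEADER}
    forwarded = {k: v for k, v in browser_headers.items() if k.lower() not in dropped}
    forwarded["Authorization"] = f"Bearer {upstream_token}"
    return forwarded
```

The point of `upstream_headers` is that any `Authorization` value sent by the browser is discarded and replaced with a credential held only on the server, so a compromised client cannot widen its own authority.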

Implementation

The following configuration expresses the request-side core of a zero-trust chain.

pack:
  name: zero-trust-request-boundary
  version: "1.0.0"
  enabled: true

providers:
  targets:
    - id: zdr-primary
      provider: openai
      model: gpt-5.4-mini-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      data_policy:
        zero_data_retention: true
        training_opt_out: true
        retention_days: 0
        in_memory_only: true
        sanitized: true
        accepts_tokenized_input: true
        allow_internet_egress: false
        local_only_processing: true

policies:
  chain:
    - prompt-injection
    - dlp-filter
    - pii-detector
    - data-routing-policy

  policy:
    prompt-injection:
      use_embedding: true
      detection:
        embedding_threshold: 0.8
        attack_patterns:
          - "ignore.*previous.*instructions"
          - "forget.*system.*prompt"
          - "reveal.*system.*prompt"
      encoding:
        decode_base64: true
        normalize_unicode: true
        detect_homoglyphs: true
      boundaries:
        enforce_delimiters: true
        reject_fake_boundaries: true

    dlp-filter:
      detect_patterns:
        - 'AKIA[0-9A-Z]{16}'
        - 'ghp_[0-9A-Za-z]{36}'
        - '-----BEGIN (RSA |EC )?PRIVATE KEY-----'
      blocked_terms:
        - Project Atlas
        - restricted merger room
      action: block
      fuzzy_matching: true
      max_distance: 1
      sensitivity_level: restricted

    pii-detector:
      action: redact
      pci_mode: true
      redaction:
        marker_format: label
        include_metadata: true

    data-routing-policy:
      require_zero_data_retention: true
      require_no_training: true
      max_retention_days: 0
      on_no_compliant_provider: block
      log_provider_selection: true

This chain is intentionally strict.

prompt-injection goes first because the gateway should stop clearly adversarial requests before it spends time on redaction or provider selection. dlp-filter follows because leaked credentials and restricted project terms are hard-block material. pii-detector then sanitizes legitimate requests that still contain personal or payment data. Finally, data-routing-policy ensures the remaining prompt can only travel through an allowed provider lane.
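The ordering argument above amounts to a short-circuiting pipeline: the first stage that objects stops the request, and later stages never run. A minimal Python sketch follows, with toy stage bodies that stand in for the real policies. The stage names match the chain in the configuration; `Request`, `Blocked`, and `run_chain` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    prompt: str
    trail: list[str] = field(default_factory=list)  # stages passed so far


class Blocked(Exception):
    def __init__(self, stage: str, reason: str):
        super().__init__(f"{stage}: {reason}")
        self.stage = stage
        self.reason = reason


Stage = Callable[[Request], Request]


def prompt_injection(req: Request) -> Request:
    if "ignore previous instructions" in req.prompt.lower():
        raise Blocked("prompt-injection", "attack pattern")
    return req


def dlp_filter(req: Request) -> Request:
    if "project atlas" in req.prompt.lower():
        raise Blocked("dlp-filter", "blocked term")
    return req


def pii_detector(req: Request) -> Request:
    # Toy redaction; a real detector would use patterns, not a fixed string.
    req.prompt = req.prompt.replace("jane@example.com", "[EMAIL]")
    return req


def data_routing_policy(req: Request) -> Request:
    return req  # provider filtering omitted in this sketch


CHAIN: list[tuple[str, Stage]] = [
    ("prompt-injection", prompt_injection),
    ("dlp-filter", dlp_filter),
    ("pii-detector", pii_detector),
    ("data-routing-policy", data_routing_policy),
]


def run_chain(request: Request, chain: list[tuple[str, Stage]] = CHAIN) -> Request:
    """Run stages in order; a stage raises Blocked to stop the chain."""
    for name, stage in chain:
        request = stage(request)
        request.trail.append(name)
    return request
```

A prompt that both attempts injection and mentions a blocked term is rejected by `prompt-injection`, so the `Blocked` exception names the earliest failing stage. That is what keeps block reasons easy to attribute.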

Operationally, you should verify the chain with live events, not just lint output:

kt events tail --filter "outcome=blocked" --since 30d --limit 20

That check tells you whether prompt-injection and DLP blocks are actually firing and whether the policy is too noisy. If legitimate requests are being blocked in a recurring pattern, fix the specific pattern, term, or threshold that is misfiring. Do not weaken the whole chain out of convenience.

This is also where "Prevent Sensitive Data Leaks in AI Requests" and "Spend & Wallets" intersect with zero trust. Data leakage is the content-side risk. Spend controls are the execution-side risk. If sensitive requests depend on a smaller, more expensive compliant provider pool, Keeptrusts reserves cost before dispatch and holds the request if balance is insufficient. That aligns with zero trust because it prevents a common failure mode: governance being bypassed under cost or capacity pressure.
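The reserve-then-dispatch idea can be illustrated with a small wallet model. This is a sketch under stated assumptions: the page only says that cost is reserved before dispatch and that the request is held when balance is insufficient. The `Wallet` class, the cent-based accounting, and the settle step are invented here for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Wallet:
    balance_cents: int
    reserved_cents: int = 0

    @property
    def available_cents(self) -> int:
        return self.balance_cents - self.reserved_cents

    def reserve(self, amount_cents: int) -> bool:
        """Set aside the estimated cost; refuse if it exceeds what is available."""
        if amount_cents > self.available_cents:
            return False
        self.reserved_cents += amount_cents
        return True

    def settle(self, reserved_cents: int, actual_cents: int) -> None:
        """Release the reservation and charge what the call actually cost."""
        self.reserved_cents -= reserved_cents
        self.balance_cents -= actual_cents


def dispatch(wallet: Wallet, estimated_cents: int, send: Callable[[], int]) -> str:
    """Reserve before sending; hold the request when funds are insufficient."""
    if not wallet.reserve(estimated_cents):
        return "held"  # never dispatched, so no upstream cost is committed
    try:
        actual_cents = send()  # hypothetical upstream call returning its cost
    except Exception:
        wallet.settle(estimated_cents, 0)  # release the reservation on failure
        raise
    wallet.settle(estimated_cents, actual_cents)
    return "sent"
```

Because the reservation happens before `send` is called, a request that cannot be paid for is held at the gateway rather than rerouted to a cheaper, non-compliant provider.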

Results and impact

The first effect of a zero-trust chain is fewer assumptions. Engineers stop treating prompts as friendly input and start treating them as untrusted requests that must earn their path through the gateway.

The second effect is better failure behavior. Instead of one giant all-or-nothing control, each stage has a clear job. Prompt Injection Detection blocks adversarial requests. DLP Filter blocks secrets and restricted phrases. PII Detector redacts structured identifiers. Data Routing Policy filters provider targets. When something fails, the reason is easier to understand and the fix is easier to target.

The third effect is architectural discipline. Once teams accept that the browser should not hold upstream bearer tokens and that session integrity needs explicit CSRF protection, zero trust stops being a content-only discussion. It becomes a whole-request discussion.

Key takeaways

  • Zero-trust AI means every prompt is untrusted until multiple controls say otherwise.
  • Put Prompt Injection Detection first, because adversarial input should not reach later stages.
  • Use DLP Filter for hard-block content and PII Detector for structured identifier redaction.
  • Use Data Routing Policy so compliant provider selection is enforced, not assumed.
  • Extend zero trust beyond prompt text by keeping upstream authority server-side and using session-integrity controls.

Next steps