Implement Zero-Trust AI with Defense-in-Depth Policies
Any single security control eventually fails. Zero-trust AI means no request is trusted by default — every request passes through multiple independent security layers before reaching a model, and again before reaching a user. Keeptrusts enforces defense-in-depth with chained policies across the input, tool, and output phases.
Use this page when
- You want to implement defense-in-depth for AI with multiple independent security layers on input and output.
- You need to understand the full 8-layer policy chain model (network → identity → input safety → data protection → agent controls → output quality → output safety → audit).
- You are building a comprehensive security posture where no single control failure compromises the system.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
What you'll achieve
- Layered input security — prompt injection, PII detection, DLP, and identity checks in sequence
- Layered output security — quality scoring, citation verification, and content filtering on responses
- Agent firewall — tool-level access control and session limits
- DLP enforcement — custom pattern matching for secrets, internal data, and proprietary content
- Network controls — IP allowlisting and bot detection at the gateway edge
The defense-in-depth model
Request arrives at gateway
┌─ Layer 1: Network controls (IP allowlist, bot detection)
├─ Layer 2: Identity (RBAC — team, role, authentication)
├─ Layer 3: Input safety (prompt injection, safety filter)
├─ Layer 4: Data protection (PII detector, DLP filter)
├─ Layer 5: Agent controls (agent firewall, tool validation)
├─ → Forward to upstream provider
├─ Layer 6: Output quality (quality scorer, citation verifier)
├─ Layer 7: Output safety (content filter, response rewriter)
└─ Layer 8: Audit (audit logger, event record)
Response delivered to caller
Each layer is independent. A failure in one layer doesn't compromise the others. If a prompt injection evades pattern detection, PII redaction still removes sensitive data. If PII detection misses a custom format, the DLP filter catches it.
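The chain mechanics can be sketched as a pipeline of independent checks, where any layer can block, transform, or pass a request on its own. The policy functions below are illustrative stand-ins, not Keeptrusts internals:

```python
# Minimal sketch of a defense-in-depth chain: each layer is an independent
# check that can block, redact, or pass the request. Policy names mirror the
# pack on this page; the detection logic is deliberately naive.
import re

def rbac(req):
    # Layer 2: deny requests missing identity attributes
    return None if req.get("role") and req.get("team") else "blocked:rbac"

def prompt_injection(req):
    # Layer 3: signature check for demonstration only
    if re.search(r"ignore (all )?previous instructions", req["prompt"], re.I):
        return "blocked:prompt-injection"
    return None

def pii_detector(req):
    # Layer 4: redact email addresses rather than block
    req["prompt"] = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", req["prompt"])
    return None

CHAIN = [rbac, prompt_injection, pii_detector]

def run_chain(req):
    for layer in CHAIN:
        verdict = layer(req)
        if verdict:  # any single layer can stop the request on its own
            return verdict
    return "forwarded"
```

A request missing a role stops at the identity layer before any later layer runs; a request containing an email address reaches the provider with the address already redacted.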
Full defense-in-depth config
pack:
  name: zero-trust-gateway
  version: "1.0"
  policies:
    chain:
      # Layer 2: Identity
      - rbac
      # Layer 3: Input safety
      - prompt-injection
      - safety-filter
      # Layer 4: Data protection
      - pii-detector
      - dlp-filter
      # Layer 5: Agent controls
      - agent-firewall
      # Layer 6–7: Output controls
      - quality-scorer
      - citation-verifier
      # Layer 8: Audit
      - audit-logger
policy:
  rbac:
    require_auth: true
    deny_if_missing:
      - role
      - team
  prompt-injection:
    embedding_threshold: 0.8
    response:
      action: block
    encoding:
      decode_base64: true
      normalize_unicode: true
    boundaries:
      enforce_delimiters: true
  safety-filter:
    categories:
      - violence
      - hate_speech
      - self_harm
      - sexual_content
    action: block
  pii-detector:
    action: redact
    redaction:
      marker_format: label
    categories:
      - email
      - phone
      - ssn
      - credit_card
  dlp-filter:
    patterns:
      - name: api_keys
        regex: "(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16}|ghp_[a-zA-Z0-9]{36})"
        action: block
      - name: internal_hostnames
        regex: "[a-z0-9-]+\\.internal\\.yourco\\.com"
        action: redact
      - name: aws_arns
        regex: "arn:aws:[a-z0-9-]+:[a-z0-9-]*:\\d{12}:"
        action: redact
  agent-firewall:
    allowed_tools:
      - search
      - summarize
      - retrieve_document
    blocked_tools:
      - execute_code
      - shell_command
      - delete_record
    max_actions_per_session: 50
  quality-scorer:
    overall_min_score: 0.6
    on_fail: escalate
  citation-verifier:
    mode: strict
    min_grounding_score: 0.7
    on_ungrounded: escalate
  audit-logger:
    retention_days: 365
    immutable: true
Layer-by-layer breakdown
Network controls
Control which networks and clients can reach the gateway:
gateway:
  ip_allowlist:
    enabled: true
    ranges:
      - "10.0.0.0/8"
      - "172.16.0.0/12"
      - "192.168.0.0/16"
    on_deny: block
  bot_detection:
    enabled: true
    block_known_bots: true
    require_user_agent: true
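The allowlist check itself is plain CIDR containment over the RFC 1918 ranges in the config above; a sketch using Python's standard ipaddress module (the gateway's real implementation may differ):

```python
# CIDR containment check behind an IP allowlist, mirroring the ranges in
# the gateway config above. Standard library only; illustrative sketch.
from ipaddress import ip_address, ip_network

ALLOWED = [ip_network(r) for r in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_allowed(client_ip: str) -> bool:
    addr = ip_address(client_ip)
    return any(addr in net for net in ALLOWED)
```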
Prompt injection with layered detection
The prompt injection policy uses multiple detection methods in sequence:
- Pattern matching — known attack signatures
- Base64/Unicode decoding — encoded evasion attempts
- Delimiter enforcement — boundary escape attempts
- Embedding analysis — semantic similarity to known injection patterns
Each method is independent. An attack must evade all four to succeed.
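The decoding step matters because attackers hide payloads behind base64 or Unicode look-alikes. A sketch of the normalize-then-match idea — the single signature and the token-wise base64 probing are illustrative assumptions, not the policy's actual algorithm:

```python
# Decode-then-scan sketch: normalize Unicode, opportunistically decode
# base64 tokens, then run signature matching on every variant.
import base64, binascii, re, unicodedata

SIGNATURES = [re.compile(r"ignore (all )?previous instructions", re.I)]

def variants(text: str):
    yield unicodedata.normalize("NFKC", text)  # fold look-alike characters
    for token in text.split():
        try:  # try each token as base64-encoded text
            yield base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError, ValueError):
            continue

def is_injection(text: str) -> bool:
    return any(sig.search(v) for sig in SIGNATURES for v in variants(text))
```

A payload that evades plain pattern matching by arriving base64-encoded is caught once the decoded variant is scanned.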
DLP beyond PII
The dlp-filter catches organization-specific sensitive data that standard PII detectors miss:
| Pattern type | Example | Risk |
|---|---|---|
| API keys | sk-abc123..., AKIA... | Credential exposure |
| Internal hostnames | db.internal.yourco.com | Infrastructure leak |
| Cloud ARNs | arn:aws:s3:::... | Resource identification |
| Project codenames | PROJECT-ALPHA-2025 | Competitive intelligence |
| Internal IPs | 10.42.8.100 | Network topology leak |
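The dlp-filter regexes from the config above can be exercised directly; a quick sketch showing what each one catches (the sample values in the test are fabricated):

```python
# Exercise the dlp-filter regexes from the config above. A match means the
# filter would block or redact the text, per each pattern's action.
import re

PATTERNS = {
    "api_keys": re.compile(r"(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16}|ghp_[a-zA-Z0-9]{36})"),
    "internal_hostnames": re.compile(r"[a-z0-9-]+\.internal\.yourco\.com"),
    "aws_arns": re.compile(r"arn:aws:[a-z0-9-]+:[a-z0-9-]*:\d{12}:"),
}

def dlp_hits(text: str):
    # Return the names of every pattern that fires on this text
    return [name for name, pat in PATTERNS.items() if pat.search(text)]
```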
Agent firewall
The agent-firewall is the zero-trust control for AI agents that call tools:
- Explicit allowlist — only approved tools can be called
- Explicit blocklist — dangerous tools are always denied
- Session limits — cap the total number of actions per session
- Cost limits — cap the dollar amount per session
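The decision logic is deny-by-default: the blocklist wins first, then the allowlist, and anything else is denied. A sketch under that assumption, using the tool lists and session limit from the config above (not Keeptrusts internals):

```python
# Zero-trust tool gating sketch: blocklist beats allowlist, anything not
# explicitly allowed is denied, and a per-session action budget is enforced.
ALLOWED = {"search", "summarize", "retrieve_document"}
BLOCKED = {"execute_code", "shell_command", "delete_record"}
MAX_ACTIONS = 50

def authorize(tool: str, actions_used: int) -> str:
    if actions_used >= MAX_ACTIONS:
        return "deny:session_limit"
    if tool in BLOCKED:            # explicit blocklist always wins
        return "deny:blocked_tool"
    if tool in ALLOWED:            # only approved tools pass
        return "allow"
    return "deny:not_allowlisted"  # zero-trust default: deny
```

Note that an unknown tool is denied even though it appears on neither list — that is the zero-trust posture.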
Monitoring defense effectiveness
Track how each layer is performing:
# See which policies are blocking the most requests
kt events list \
--filter "action:blocked" \
--from "2025-04-01" \
--to "2025-04-30" \
--limit 100
# Export policy trigger breakdown
kt events export \
--from "2025-04-01" \
--to "2025-04-30" \
--format csv \
--output defense-report.csv
In the console Events page, filter by individual policies to see:
- How many requests each layer processes
- Block and escalation rates per policy
- False positive rates (blocks that get overridden in escalation review)
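Per-policy block rates can also be computed offline from the CSV export. A sketch, assuming the export has "policy" and "action" columns — adjust the field names to the real header row of your export:

```python
# Compute per-policy block rates from an events export such as
# defense-report.csv above. The column names ("policy", "action") are an
# assumption about the export schema, not a documented contract.
import csv
from collections import Counter

def block_rates(csv_path: str) -> dict:
    totals, blocks = Counter(), Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["policy"]] += 1
            if row["action"] == "blocked":
                blocks[row["policy"]] += 1
    # Fraction of each policy's events that ended in a block
    return {p: blocks[p] / totals[p] for p in totals}
```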
Quick wins
- Start with three layers — rbac + prompt-injection + pii-detector covers identity, safety, and data
- Add dlp-filter for API key patterns — catch the highest-risk data leak vector
- Enable agent-firewall with an allowlist — block dangerous tool calls immediately
- Set audit-logger to immutable — ensure your audit trail can't be tampered with
For AI systems
- Canonical terms: defense-in-depth, zero-trust, policy chain, input phase, output phase, rbac, prompt-injection, safety-filter, pii-detector, dlp-filter, agent-firewall, quality-scorer, citation-verifier, audit-logger.
- 8 layers: network (IP allowlist) → identity (RBAC) → input safety → data protection → agent controls → output quality → output safety → audit.
- Config: full pack with all policies chained in the policies.chain array.
- Best next pages: Block Prompt Injection, Prevent Data Leaks, Govern AI Agents, Policy Controls Catalog.
For engineers
- Prerequisites: gateway running; layer policies incrementally (don’t enable all 8 layers at once in production).
- Start with audit-logger + pii-detector, then add prompt-injection, then agent-firewall.
- Each layer is independent — a bypass in one layer doesn't compromise the others.
- Validate: test each layer independently by sending a payload it should catch and confirming the correct action.
- Use the full zero-trust config YAML in this page as a reference for maximum-security deployments.
For leaders
- Zero-trust means no request is trusted by default — every interaction passes through multiple independent verification layers.
- Defense-in-depth eliminates single points of failure: if one control is bypassed, others still protect the system.
- This architecture satisfies NIST Zero Trust Architecture (SP 800-207) principles applied to AI workloads.
- Incremental layer deployment means you can adopt zero-trust progressively without disrupting existing traffic.
Next steps
- Block Prompt Injection Attacks — deep dive into injection defense
- Prevent Sensitive Data Leaks — PII and DLP controls
- Govern AI Agents — agent-specific security controls
- Policy Controls Catalog — full inventory of available controls
- Pass AI Compliance Audits — turn defense-in-depth into audit evidence