Implement Zero-Trust AI with Defense-in-Depth Policies

A single security control always fails eventually. Zero-trust AI means no request is trusted by default — every request passes through multiple, independent security layers before reaching a model and again before reaching a user. Keeptrusts enforces defense-in-depth with chained policies across input, tool, and output phases.

Use this page when

  • You want to implement defense-in-depth for AI with multiple independent security layers on input and output.
  • You need to understand the full 8-layer policy chain model (network → identity → input safety → data protection → agent controls → output quality → output safety → audit).
  • You are building a comprehensive security posture where no single control failure compromises the system.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

What you'll achieve

  • Layered input security — prompt injection, PII detection, DLP, and identity checks in sequence
  • Layered output security — quality scoring, citation verification, and content filtering on responses
  • Agent firewall — tool-level access control and session limits
  • DLP enforcement — custom pattern matching for secrets, internal data, and proprietary content
  • Network controls — IP allowlisting and bot detection at the gateway edge

The defense-in-depth model

Request arrives at gateway
┌─ Layer 1: Network controls (IP allowlist, bot detection)
├─ Layer 2: Identity (RBAC — team, role, authentication)
├─ Layer 3: Input safety (prompt injection, safety filter)
├─ Layer 4: Data protection (PII detector, DLP filter)
├─ Layer 5: Agent controls (agent firewall, tool validation)
├─ → Forward to upstream provider
├─ Layer 6: Output quality (quality scorer, citation verifier)
├─ Layer 7: Output safety (content filter, response rewriter)
└─ Layer 8: Audit (audit logger, event record)
Response delivered to caller

Each layer is independent. A failure in one layer doesn't compromise the others. If a prompt injection evades pattern detection, PII redaction still removes sensitive data. If PII detection misses a custom format, the DLP filter catches it.
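This independence can be sketched in a few lines of Python. The policy functions below are toy illustrations (not the Keeptrusts API): the point is that a later layer still runs even when an earlier one finds nothing.

```python
import re

def pattern_check(text):
    # Layer: known injection signatures (single toy pattern)
    return "ignore previous instructions" not in text.lower()

def pii_redact(text):
    # Layer: redact SSN-like values even if earlier layers passed
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", text)

def run_chain(text):
    # Each check is independent; the request must pass every one.
    if not pattern_check(text):
        return {"action": "block", "reason": "prompt-injection"}
    return {"action": "allow", "text": pii_redact(text)}

print(run_chain("My SSN is 123-45-6789"))
```

Here the pattern check finds nothing suspicious, yet the PII layer still redacts the SSN before the request moves on.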


Full defense-in-depth config

pack:
  name: zero-trust-gateway
  version: "1.0"

policies:
  chain:
    # Layer 2: Identity
    - rbac
    # Layer 3: Input safety
    - prompt-injection
    - safety-filter
    # Layer 4: Data protection
    - pii-detector
    - dlp-filter
    # Layer 5: Agent controls
    - agent-firewall
    # Layers 6–7: Output controls
    - quality-scorer
    - citation-verifier
    # Layer 8: Audit
    - audit-logger

policy:
  rbac:
    require_auth: true
    deny_if_missing:
      - role
      - team

  prompt-injection:
    embedding_threshold: 0.8
    response:
      action: block
    encoding:
      decode_base64: true
      normalize_unicode: true
    boundaries:
      enforce_delimiters: true

  safety-filter:
    categories:
      - violence
      - hate_speech
      - self_harm
      - sexual_content
    action: block

  pii-detector:
    action: redact
    redaction:
      marker_format: label
    categories:
      - email
      - phone
      - ssn
      - credit_card

  dlp-filter:
    patterns:
      - name: api_keys
        regex: "(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16}|ghp_[a-zA-Z0-9]{36})"
        action: block
      - name: internal_hostnames
        regex: "[a-z0-9-]+\\.internal\\.yourco\\.com"
        action: redact
      - name: aws_arns
        regex: "arn:aws:[a-z0-9-]+:[a-z0-9-]*:\\d{12}:"
        action: redact

  agent-firewall:
    allowed_tools:
      - search
      - summarize
      - retrieve_document
    blocked_tools:
      - execute_code
      - shell_command
      - delete_record
    max_actions_per_session: 50

  quality-scorer:
    overall_min_score: 0.6
    on_fail: escalate

  citation-verifier:
    mode: strict
    min_grounding_score: 0.7
    on_ungrounded: escalate

  audit-logger:
    retention_days: 365
    immutable: true

Layer-by-layer breakdown

Network controls

Control which networks and clients can reach the gateway:

gateway:
  ip_allowlist:
    enabled: true
    ranges:
      - "10.0.0.0/8"
      - "172.16.0.0/12"
      - "192.168.0.0/16"
    on_deny: block

  bot_detection:
    enabled: true
    block_known_bots: true
    require_user_agent: true
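The allowlist decision itself is simple CIDR containment. A minimal sketch using Python's standard `ipaddress` module, with the same three private ranges as the config (illustrative only; the gateway enforces this at the edge, not in your application code):

```python
import ipaddress

# Same CIDR ranges as the ip_allowlist config above.
RANGES = [ipaddress.ip_network(c) for c in
          ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def ip_allowed(client_ip):
    # Allow only if the client address falls inside an allowed network.
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in RANGES)

print(ip_allowed("10.42.8.100"))   # True  (private range)
print(ip_allowed("203.0.113.7"))   # False (public address, on_deny: block)
```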

Prompt injection with layered detection

The prompt injection policy uses multiple detection methods in sequence:

  1. Pattern matching — known attack signatures
  2. Base64/Unicode decoding — encoded evasion attempts
  3. Delimiter enforcement — boundary escape attempts
  4. Embedding analysis — semantic similarity to known injection patterns

Each method is independent. An attack must evade all four to succeed.
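The first three stages can be sketched as plain code (the fourth needs an embedding model). The signature list and the `</system>` delimiter below are toy assumptions for illustration, not the shipped detection rules:

```python
import base64
import unicodedata

# Toy signatures standing in for the real pattern database.
SIGNATURES = ["ignore previous instructions", "disregard the system prompt"]

def normalize(text):
    # Stage 2: undo common encoding evasions before matching.
    text = unicodedata.normalize("NFKC", text)
    try:
        decoded = base64.b64decode(text, validate=True).decode("utf-8")
        text += " " + decoded  # scan the decoded form as well
    except Exception:
        pass  # not valid base64; scan the original only
    return text.lower()

def is_injection(text):
    norm = normalize(text)
    # Stage 1: signature match on the normalized input.
    if any(sig in norm for sig in SIGNATURES):
        return True
    # Stage 3: boundary escape — user input should never close the
    # system delimiter (assumed delimiter for illustration).
    if "</system>" in norm:
        return True
    # Stage 4 (embedding similarity) omitted: requires a model.
    return False
```

A base64-encoded payload like `base64.b64encode(b"Ignore previous instructions")` passes stage 1 on its raw form but is caught after decoding, which is why the stages run in sequence rather than as a single regex.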

DLP beyond PII

The dlp-filter catches organization-specific sensitive data that standard PII detectors miss:

| Pattern type | Example | Risk |
| --- | --- | --- |
| API keys | `sk-abc123...`, `AKIA...` | Credential exposure |
| Internal hostnames | `db.internal.yourco.com` | Infrastructure leak |
| Cloud ARNs | `arn:aws:s3:::...` | Resource identification |
| Project codenames | `PROJECT-ALPHA-2025` | Competitive intelligence |
| Internal IPs | `10.42.8.100` | Network topology leak |
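You can exercise the three regexes from the config locally before deploying them. This sketch reuses the exact patterns from the `dlp-filter` block above against fabricated sample values:

```python
import re

# The dlp-filter patterns from the full config, verbatim.
PATTERNS = {
    "api_keys": r"(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16}|ghp_[a-zA-Z0-9]{36})",
    "internal_hostnames": r"[a-z0-9-]+\.internal\.yourco\.com",
    "aws_arns": r"arn:aws:[a-z0-9-]+:[a-z0-9-]*:\d{12}:",
}

def dlp_scan(text):
    # Return the names of every pattern that fires on this text.
    return [name for name, rx in PATTERNS.items() if re.search(rx, text)]

print(dlp_scan("connect to db.internal.yourco.com"))  # ['internal_hostnames']
print(dlp_scan("arn:aws:s3::123456789012:bucket"))    # ['aws_arns']
```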

Agent firewall

The agent-firewall is the zero-trust control for AI agents that call tools:

  • Explicit allowlist — only approved tools can be called
  • Explicit blocklist — dangerous tools are always denied
  • Session limits — cap the total number of actions per session
  • Cost limits — cap the dollar amount per session
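Putting those rules together, a single tool call is evaluated roughly like this. The tool names mirror the config; the evaluation order (blocklist first, then default-deny, then the session cap) is an assumed illustration of zero-trust semantics, not documented gateway internals:

```python
# Values from the agent-firewall config above.
ALLOWED = {"search", "summarize", "retrieve_document"}
BLOCKED = {"execute_code", "shell_command", "delete_record"}
MAX_ACTIONS = 50

def check_tool_call(tool, actions_so_far):
    if tool in BLOCKED:
        return "block"   # explicit blocklist: always denied
    if tool not in ALLOWED:
        return "block"   # zero-trust default: unknown tools are denied
    if actions_so_far >= MAX_ACTIONS:
        return "block"   # session action cap exhausted
    return "allow"

print(check_tool_call("search", 3))          # allow
print(check_tool_call("shell_command", 3))   # block
```

Note that a tool absent from both lists is still denied: in a zero-trust posture the allowlist, not the blocklist, is the source of truth.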

Monitoring defense effectiveness

Track how each layer is performing:

# See which policies are blocking the most requests
kt events list \
  --filter "action:blocked" \
  --from "2025-04-01" \
  --to "2025-04-30" \
  --limit 100

# Export policy trigger breakdown
kt events export \
  --from "2025-04-01" \
  --to "2025-04-30" \
  --format csv \
  --output defense-report.csv

In the console Events page, filter by individual policies to see:

  • How many requests each layer processes
  • Block and escalation rates per policy
  • False positive rates (blocks that get overridden in escalation review)
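Once you have the CSV export, per-policy block counts are a few lines of analysis. The column names (`policy`, `action`) below are assumed for illustration; check the headers of your actual export:

```python
import csv
import io
from collections import Counter

# Inline sample standing in for defense-report.csv; in practice,
# open the exported file instead of this StringIO.
sample = """policy,action
prompt-injection,blocked
pii-detector,redacted
prompt-injection,blocked
dlp-filter,blocked
"""

rows = list(csv.DictReader(io.StringIO(sample)))
blocks = Counter(r["policy"] for r in rows if r["action"] == "blocked")
print(blocks.most_common())  # [('prompt-injection', 2), ('dlp-filter', 1)]
```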

Quick wins

  1. Start with three layers: rbac + prompt-injection + pii-detector covers identity, safety, and data
  2. Add dlp-filter for API key patterns — catch the highest-risk data leak vector
  3. Enable agent-firewall with an allowlist — block dangerous tool calls immediately
  4. Set audit-logger to immutable — ensure your audit trail can't be tampered with
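A starter pack covering these quick wins might look like the following. It uses the same schema as the full config above, with only the essentials configured; treat it as a sketch to adapt, not a canonical pack:

```yaml
pack:
  name: zero-trust-starter
  version: "1.0"

policies:
  chain:
    - rbac
    - prompt-injection
    - pii-detector
    - dlp-filter
    - audit-logger

policy:
  rbac:
    require_auth: true
  pii-detector:
    action: redact
  dlp-filter:
    patterns:
      - name: api_keys
        regex: "(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16})"
        action: block
  audit-logger:
    immutable: true
```

From here, add agent-firewall and the output-phase policies once the input layers are tuned.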

For AI systems

  • Canonical terms: defense-in-depth, zero-trust, policy chain, input phase, output phase, rbac, prompt-injection, safety-filter, pii-detector, dlp-filter, agent-firewall, quality-scorer, citation-verifier, audit-logger.
  • 8 layers: network (IP allowlist) → identity (RBAC) → input safety → data protection → agent controls → output quality → output safety → audit.
  • Config: full pack with all policies chained in the policies.chain array.
  • Best next pages: Block Prompt Injection, Prevent Data Leaks, Govern AI Agents, Policy Controls Catalog.

For engineers

  • Prerequisites: gateway running; layer policies incrementally (don’t enable all 8 layers at once in production).
  • Start with audit-logger + pii-detector, then add prompt-injection, then agent-firewall.
  • Each layer is independent — a bypass in one layer doesn’t compromise the others.
  • Validate: test each layer independently by sending a payload it should catch and confirming the correct action.
  • Use the full zero-trust config YAML in this page as a reference for maximum-security deployments.

For leaders

  • Zero-trust means no request is trusted by default — every interaction passes through multiple independent verification layers.
  • Defense-in-depth eliminates single points of failure: if one control is bypassed, others still protect the system.
  • This architecture satisfies NIST Zero Trust Architecture (SP 800-207) principles applied to AI workloads.
  • Incremental layer deployment means you can adopt zero-trust progressively without disrupting existing traffic.

Next steps