Implement Zero-Trust AI with Defense-in-Depth Policies

A single security control always fails eventually. Zero-trust AI means no request is trusted by default — every request passes through multiple, independent security layers before reaching a model and again before reaching a user. Keeptrusts enforces defense-in-depth with chained policies across input, tool, and output phases.

Use this page when

  • You want to implement defense-in-depth for AI with multiple independent security layers on input and output.
  • You need to understand the full 8-layer policy chain model (network → identity → input safety → data protection → agent controls → output quality → output safety → audit).
  • You are building a comprehensive security posture where no single control failure compromises the system.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

What you'll achieve

  • Layered input security — prompt injection, PII detection, DLP, and identity checks in sequence
  • Layered output security — quality scoring, citation verification, and content filtering on responses
  • Agent firewall — tool-level access control and session limits
  • DLP enforcement — custom pattern matching for secrets, internal data, and proprietary content
  • Network controls — IP allowlisting and bot detection at the gateway edge

The defense-in-depth model

Request arrives at gateway
┌─ Layer 1: Network controls (IP allowlist, bot detection)
├─ Layer 2: Identity (RBAC — team, role, authentication)
├─ Layer 3: Input safety (prompt injection, safety filter)
├─ Layer 4: Data protection (PII detector, DLP filter)
├─ Layer 5: Agent controls (agent firewall, tool validation)
├─ → Forward to upstream provider
├─ Layer 6: Output quality (quality scorer, citation verifier)
├─ Layer 7: Output safety (content filter, response rewriter)
└─ Layer 8: Audit (audit logger, event record)
Response delivered to caller

Each layer is independent. A failure in one layer doesn't compromise the others. If a prompt injection evades pattern detection, PII redaction still removes sensitive data. If PII detection misses a custom format, the DLP filter catches it.
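This independence can be sketched in a few lines of Python. The policy functions below are toy illustrations (not the Keeptrusts API): the point is that a later layer still runs even when an earlier one finds nothing.

```python
import re

def pattern_check(text):
    # Layer: known injection signatures (single toy pattern)
    return "ignore previous instructions" not in text.lower()

def pii_redact(text):
    # Layer: redact SSN-like values even if earlier layers passed
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", text)

def run_chain(text):
    # Each check is independent; the request must pass every one.
    if not pattern_check(text):
        return {"action": "block", "reason": "prompt-injection"}
    return {"action": "allow", "text": pii_redact(text)}

print(run_chain("My SSN is 123-45-6789"))
```

Here the pattern check finds nothing suspicious, yet the PII layer still redacts the SSN before the request moves on.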


Full defense-in-depth config

pack:
  name: zero-trust-gateway
  version: "1.0"

policies:
  chain:
    # Layer 2: Identity
    - rbac
    # Layer 3: Input safety
    - prompt-injection
    - safety-filter
    # Layer 4: Data protection
    - pii-detector
    - dlp-filter
    # Layer 5: Agent controls
    - agent-firewall
    # Layers 6–7: Output controls
    - quality-scorer
    - citation-verifier
    # Layer 8: Audit
    - audit-logger

policy:
  rbac:
    require_auth: true
    deny_if_missing:
      - role
      - team

  prompt-injection:
    embedding_threshold: 0.8
    response:
      action: block
    encoding:
      decode_base64: true
      normalize_unicode: true
    boundaries:
      enforce_delimiters: true

  safety-filter:
    categories:
      - violence
      - hate_speech
      - self_harm
      - sexual_content
    action: block

  pii-detector:
    action: redact
    redaction:
      marker_format: label
    categories:
      - email
      - phone
      - ssn
      - credit_card

  dlp-filter:
    patterns:
      - name: api_keys
        regex: "(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16}|ghp_[a-zA-Z0-9]{36})"
        action: block
      - name: internal_hostnames
        regex: "[a-z0-9-]+\\.internal\\.yourco\\.com"
        action: redact
      - name: aws_arns
        regex: "arn:aws:[a-z0-9-]+:[a-z0-9-]*:\\d{12}:"
        action: redact

  agent-firewall:
    allowed_tools:
      - search
      - summarize
      - retrieve_document
    blocked_tools:
      - execute_code
      - shell_command
      - delete_record
    max_actions_per_session: 50

  quality-scorer:
    overall_min_score: 0.6
    on_fail: escalate

  citation-verifier:
    mode: strict
    min_grounding_score: 0.7
    on_ungrounded: escalate

  audit-logger:
    retention_days: 365
    immutable: true

Layer-by-layer breakdown

Network controls

Control which networks and clients can reach the gateway:

gateway:
  ip_allowlist:
    enabled: true
    ranges:
      - "10.0.0.0/8"
      - "172.16.0.0/12"
      - "192.168.0.0/16"
    on_deny: block

  bot_detection:
    enabled: true
    block_known_bots: true
    require_user_agent: true
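The allowlist decision itself is simple CIDR containment. A minimal sketch using Python's standard `ipaddress` module, with the same three private ranges as the config (illustrative only; the gateway enforces this at the edge, not in your application code):

```python
import ipaddress

# Same CIDR ranges as the ip_allowlist config above.
RANGES = [ipaddress.ip_network(c) for c in
          ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def ip_allowed(client_ip):
    # Allow only if the client address falls inside an allowed network.
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in RANGES)

print(ip_allowed("10.42.8.100"))   # True  (private range)
print(ip_allowed("203.0.113.7"))   # False (public address, on_deny: block)
```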

Prompt injection with layered detection

The prompt injection policy uses multiple detection methods in sequence:

  1. Pattern matching — known attack signatures
  2. Base64/Unicode decoding — encoded evasion attempts
  3. Delimiter enforcement — boundary escape attempts
  4. Embedding analysis — semantic similarity to known injection patterns

Each method is independent. An attack must evade all four to succeed.
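The first three stages can be sketched as plain code (the fourth needs an embedding model). The signature list and the `</system>` delimiter below are toy assumptions for illustration, not the shipped detection rules:

```python
import base64
import unicodedata

# Toy signatures standing in for the real pattern database.
SIGNATURES = ["ignore previous instructions", "disregard the system prompt"]

def normalize(text):
    # Stage 2: undo common encoding evasions before matching.
    text = unicodedata.normalize("NFKC", text)
    try:
        decoded = base64.b64decode(text, validate=True).decode("utf-8")
        text += " " + decoded  # scan the decoded form as well
    except Exception:
        pass  # not valid base64; scan the original only
    return text.lower()

def is_injection(text):
    norm = normalize(text)
    # Stage 1: signature match on the normalized input.
    if any(sig in norm for sig in SIGNATURES):
        return True
    # Stage 3: boundary escape — user input should never close the
    # system delimiter (assumed delimiter for illustration).
    if "</system>" in norm:
        return True
    # Stage 4 (embedding similarity) omitted: requires a model.
    return False
```

A base64-encoded payload like `base64.b64encode(b"Ignore previous instructions")` passes stage 1 on its raw form but is caught after decoding, which is why the stages run in sequence rather than as a single regex.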

DLP beyond PII

The dlp-filter catches organization-specific sensitive data that standard PII detectors miss:

| Pattern type | Example | Risk |
| --- | --- | --- |
| API keys | `sk-abc123...`, `AKIA...` | Credential exposure |
| Internal hostnames | `db.internal.yourco.com` | Infrastructure leak |
| Cloud ARNs | `arn:aws:s3:::...` | Resource identification |
| Project codenames | `PROJECT-ALPHA-2025` | Competitive intelligence |
| Internal IPs | `10.42.8.100` | Network topology leak |
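You can exercise the three regexes from the config locally before deploying them. This sketch reuses the exact patterns from the `dlp-filter` block above against fabricated sample values:

```python
import re

# The dlp-filter patterns from the full config, verbatim.
PATTERNS = {
    "api_keys": r"(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16}|ghp_[a-zA-Z0-9]{36})",
    "internal_hostnames": r"[a-z0-9-]+\.internal\.yourco\.com",
    "aws_arns": r"arn:aws:[a-z0-9-]+:[a-z0-9-]*:\d{12}:",
}

def dlp_scan(text):
    # Return the names of every pattern that fires on this text.
    return [name for name, rx in PATTERNS.items() if re.search(rx, text)]

print(dlp_scan("connect to db.internal.yourco.com"))  # ['internal_hostnames']
print(dlp_scan("arn:aws:s3::123456789012:bucket"))    # ['aws_arns']
```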

Agent firewall

The agent-firewall is the zero-trust control for AI agents that call tools:

  • Explicit allowlist — only approved tools can be called
  • Explicit blocklist — dangerous tools are always denied
  • Session limits — cap the total number of actions per session
  • Cost limits — cap the dollar amount per session
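Putting those rules together, a single tool call is evaluated roughly like this. The tool names mirror the config; the evaluation order (blocklist first, then default-deny, then the session cap) is an assumed illustration of zero-trust semantics, not documented gateway internals:

```python
# Values from the agent-firewall config above.
ALLOWED = {"search", "summarize", "retrieve_document"}
BLOCKED = {"execute_code", "shell_command", "delete_record"}
MAX_ACTIONS = 50

def check_tool_call(tool, actions_so_far):
    if tool in BLOCKED:
        return "block"   # explicit blocklist: always denied
    if tool not in ALLOWED:
        return "block"   # zero-trust default: unknown tools are denied
    if actions_so_far >= MAX_ACTIONS:
        return "block"   # session action cap exhausted
    return "allow"

print(check_tool_call("search", 3))          # allow
print(check_tool_call("shell_command", 3))   # block
```

Note that a tool absent from both lists is still denied: in a zero-trust posture the allowlist, not the blocklist, is the source of truth.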

Monitoring defense effectiveness

Track how each layer is performing:

# See which policies are blocking the most requests
kt events list \
  --filter "action:blocked" \
  --from "2025-04-01" \
  --to "2025-04-30" \
  --limit 100

# Export policy trigger breakdown
kt events export \
  --from "2025-04-01" \
  --to "2025-04-30" \
  --format csv \
  --output defense-report.csv

In the console Events page, filter by individual policies to see:

  • How many requests each layer processes
  • Block and escalation rates per policy
  • False positive rates (blocks that get overridden in escalation review)
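Once you have the CSV export, per-policy block counts are a few lines of analysis. The column names (`policy`, `action`) below are assumed for illustration; check the headers of your actual export:

```python
import csv
import io
from collections import Counter

# Inline sample standing in for defense-report.csv; in practice,
# open the exported file instead of this StringIO.
sample = """policy,action
prompt-injection,blocked
pii-detector,redacted
prompt-injection,blocked
dlp-filter,blocked
"""

rows = list(csv.DictReader(io.StringIO(sample)))
blocks = Counter(r["policy"] for r in rows if r["action"] == "blocked")
print(blocks.most_common())  # [('prompt-injection', 2), ('dlp-filter', 1)]
```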

Quick wins

  1. Start with three layers: rbac + prompt-injection + pii-detector covers identity, safety, and data
  2. Add dlp-filter for API key patterns — catch the highest-risk data leak vector
  3. Enable agent-firewall with an allowlist — block dangerous tool calls immediately
  4. Set audit-logger to immutable — ensure your audit trail can't be tampered with
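A starter pack covering these quick wins might look like the following. It uses the same schema as the full config above, with only the essentials configured; treat it as a sketch to adapt, not a canonical pack:

```yaml
pack:
  name: zero-trust-starter
  version: "1.0"

policies:
  chain:
    - rbac
    - prompt-injection
    - pii-detector
    - dlp-filter
    - audit-logger

policy:
  rbac:
    require_auth: true
  pii-detector:
    action: redact
  dlp-filter:
    patterns:
      - name: api_keys
        regex: "(sk-[a-zA-Z0-9]{48}|AKIA[A-Z0-9]{16})"
        action: block
  audit-logger:
    immutable: true
```

From here, add agent-firewall and the output-phase policies once the input layers are tuned.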

For AI systems

  • Canonical terms: defense-in-depth, zero-trust, policy chain, input phase, output phase, rbac, prompt-injection, safety-filter, pii-detector, dlp-filter, agent-firewall, quality-scorer, citation-verifier, audit-logger.
  • 8 layers: network (IP allowlist) → identity (RBAC) → input safety → data protection → agent controls → output quality → output safety → audit.
  • Config: full pack with all policies chained in the policies.chain array.
  • Best next pages: Block Prompt Injection, Prevent Data Leaks, Govern AI Agents, Policy Controls Catalog.

For engineers

  • Prerequisites: gateway running; layer policies incrementally (don’t enable all 8 layers at once in production).
  • Start with audit-logger + pii-detector, then add prompt-injection, then agent-firewall.
  • Each layer is independent — a bypass in one layer doesn’t compromise the others.
  • Validate: test each layer independently by sending a payload it should catch and confirming the correct action.
  • Use the full zero-trust config YAML in this page as a reference for maximum-security deployments.

For leaders

  • Zero-trust means no request is trusted by default — every interaction passes through multiple independent verification layers.
  • Defense-in-depth eliminates single points of failure: if one control is bypassed, others still protect the system.
  • This architecture satisfies NIST Zero Trust Architecture (SP 800-207) principles applied to AI workloads.
  • Incremental layer deployment means you can adopt zero-trust progressively without disrupting existing traffic.

Next steps