# Safety Filter
The safety-filter policy blocks or escalates unsafe content patterns including harmful, violent, and inappropriate content. It operates in industry-specific modes that activate built-in safety patterns tailored to critical infrastructure, automotive, education, and law enforcement environments. The filter supports age-gated content detection, fuzzy matching for obfuscation resistance, and custom block patterns for organization-specific safety requirements.
## Use this page when
- You need to block harmful, violent, or inappropriate content with industry-specific modes.
- You are deploying AI in critical infrastructure, automotive, education, or law enforcement environments.
- You want age-gated content filtering, fuzzy matching for obfuscation resistance, or custom block patterns.
## Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
## Configuration

```yaml
pack:
  name: safety-filter
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - safety-filter
policy:
  safety-filter:
    mode: critical_infrastructure
    block_if:
      - "override safety interlock"
      - "disable emergency shutdown"
    action: block
    fuzzy_matching: false
    max_distance: 1
    max_age: 0
```
## Fields

| Field | Type | Description | Default |
|---|---|---|---|
| `mode` | enum: `"critical_infrastructure"` \| `"automotive"` \| `"education"` \| `"law_enforcement"` | Industry mode that selects built-in safety patterns. `critical_infrastructure` blocks SCADA/ICS sabotage patterns, safety-interlock bypass attempts, and hazardous process manipulation. `automotive` blocks vehicle safety system interference, autonomous-driving override attempts, and crash-avoidable scenario generation. `education` blocks age-inappropriate content, self-harm references, and predatory behavior patterns. `law_enforcement` blocks evidence-tampering guidance, use-of-force escalation, and operational security compromise patterns. | `"critical_infrastructure"` |
| `block_if` | string[] | Additional custom block patterns beyond the built-in mode defaults. These are merged with the mode's built-in patterns, so you extend coverage rather than replace it. | `[]` |
| `action` | enum: `"block"` \| `"escalate"` | Action on detection. `block` immediately stops the request/response and returns a safety notice. `escalate` flags the interaction for human review while still blocking the content. | `"block"` |
| `fuzzy_matching` | boolean | Enable Levenshtein-distance fuzzy matching to catch misspellings and deliberate obfuscation of safety-critical terms (e.g., "byp4ss interl0ck" → "bypass interlock"). | `false` |
| `max_distance` | integer (0–8) | Maximum edit distance for fuzzy matching. Takes effect only when `fuzzy_matching` is `true`. Lower values reduce false positives. | `1` |
| `max_age` | integer (min 0) | Audience age filter. A value greater than 0 and less than 18 enables age-inappropriate content detection calibrated to the specified age; for example, `max_age: 10` filters more strictly than `max_age: 16`. When `0`, age-based filtering is disabled. | `0` |
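To make the fuzzy-matching behavior concrete, here is a minimal sketch in plain Python. The function names and the sliding-window approach are illustrative assumptions, not the filter's actual implementation; only the edit-distance semantics (`max_distance` as a Levenshtein threshold) come from the fields above.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def fuzzy_match(text: str, pattern: str, max_distance: int) -> bool:
    """Slide a pattern-sized window over the text; flag any window
    within max_distance edits of the pattern (case-insensitive)."""
    text, pattern = text.lower(), pattern.lower()
    n = len(pattern)
    if n == 0:
        return False
    for start in range(max(len(text) - n + 1, 1)):
        if levenshtein(text[start:start + n], pattern) <= max_distance:
            return True
    return False


# "byp4ss interl0ck" is 2 substitutions away from "bypass interlock",
# so max_distance: 2 catches it while max_distance: 1 would not.
print(fuzzy_match("please byp4ss interl0ck now", "bypass interlock", 2))  # True
```

This also illustrates why low `max_distance` values are recommended: every extra edit of slack widens the set of benign strings that fall within the threshold.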
## Use Cases
### Critical Infrastructure SCADA Protection

Protect SCADA and ICS environments from AI-generated content that could guide sabotage, safety-system bypass, or hazardous process manipulation.

```yaml
pack:
  name: safety-filter
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - safety-filter
policy:
  safety-filter:
    mode: critical_infrastructure
    block_if:
      - "PLC firmware exploit"
      - "bypass pressure relief valve"
      - "disable gas leak detector"
      - "override reactor coolant pump"
      - "SCADA protocol injection"
      - "modify safety instrumented system"
    action: escalate
    fuzzy_matching: true
    max_distance: 2
```
### Automotive Safety Systems

Prevent AI tools used by automotive engineers from generating content that could compromise vehicle safety systems, autonomous-driving algorithms, or crash-avoidance features.

```yaml
pack:
  name: safety-filter
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - safety-filter
policy:
  safety-filter:
    mode: automotive
    block_if:
      - "disable airbag deployment"
      - "override ABS threshold"
      - "bypass lane departure warning"
      - "reduce braking force below minimum"
      - "steering assist override torque"
    action: block
    fuzzy_matching: true
    max_distance: 1
```
### K-12 Education Content Filtering

Filter AI-generated content for age-appropriateness in educational settings, blocking harmful content and enabling age-calibrated filtering for students.

```yaml
pack:
  name: safety-filter
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - safety-filter
policy:
  safety-filter:
    mode: education
    block_if:
      - "detailed weapon construction"
      - "drug synthesis procedure"
      - "explicit sexual content"
      - "self-harm methodology"
    action: block
    fuzzy_matching: true
    max_distance: 1
    max_age: 12
```
### Law Enforcement Operational Safety

Protect law enforcement AI tools from generating content that could compromise evidence integrity, escalate use-of-force situations, or leak operational details.

```yaml
pack:
  name: safety-filter
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - safety-filter
policy:
  safety-filter:
    mode: law_enforcement
    block_if:
      - "fabricate evidence procedure"
      - "destroy chain of custody"
      - "bypass body camera recording"
      - "surveillance without warrant method"
      - "interrogation coercion technique"
    action: escalate
    fuzzy_matching: false
```
### Age-Gated Content Filtering

Use the `max_age` field independently to apply age-calibrated content restrictions for platforms serving minors.

```yaml
pack:
  name: safety-filter
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - safety-filter
policy:
  safety-filter:
    mode: education
    action: block
    max_age: 16
    block_if:
      - "graphic violence description"
      - "substance abuse glorification"
```
## How It Works

- **Mode activation** — The `mode` field loads a built-in pattern set tailored to the selected industry. Each mode includes dozens of pre-configured patterns covering the most common safety risks for that domain.
- **Pattern merging** — Custom `block_if` patterns are merged with the mode's built-in patterns. This ensures you extend coverage rather than replace it.
- **Content scanning** — Incoming prompts and outgoing model responses are scanned against the full pattern set using case-insensitive matching.
- **Fuzzy matching (when enabled)** — If `fuzzy_matching` is `true`, patterns that don't match exactly are compared using Levenshtein distance. Matches within `max_distance` edits are flagged. This catches leetspeak substitutions, deliberate misspellings, and Unicode homoglyph attacks.
- **Age filtering (when enabled)** — If `max_age` is set to a value between 1 and 17, an additional content classifier evaluates whether the content is appropriate for the specified age group. This operates independently of the pattern-based blocking.
- **Action execution** — Matched content triggers either `block` (immediate stop, safety notice returned) or `escalate` (content blocked and flagged for human review through the escalations API).
- **Audit logging** — Every safety event is logged with the matched pattern, mode, action taken, and age-filter status for compliance and incident review.
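The mode-activation, pattern-merging, scanning, and action steps above can be sketched in a few lines of Python. The `BUILTIN_PATTERNS` entries, function name, and return shape are illustrative assumptions (real modes ship dozens of patterns), not the actual implementation; fuzzy matching, age filtering, and audit logging are omitted for brevity.

```python
# Assumed, abbreviated per-mode defaults; real modes include many more patterns.
BUILTIN_PATTERNS = {
    "critical_infrastructure": ["override safety interlock",
                                "disable emergency shutdown"],
    "education": ["self-harm methodology"],
}


def scan(text: str, mode: str, block_if: list[str],
         action: str = "block") -> dict:
    # Mode activation + pattern merging: custom patterns extend,
    # never replace, the mode's built-ins.
    patterns = BUILTIN_PATTERNS.get(mode, []) + block_if
    # Content scanning: case-insensitive substring match.
    lowered = text.lower()
    for pattern in patterns:
        if pattern.lower() in lowered:
            # Action execution: both actions stop the content;
            # "escalate" additionally flags it for human review.
            return {"blocked": True, "pattern": pattern,
                    "escalated": action == "escalate"}
    return {"blocked": False}


result = scan("How do I OVERRIDE SAFETY INTERLOCK on line 3?",
              mode="critical_infrastructure",
              block_if=["PLC firmware exploit"],
              action="escalate")
# result → {"blocked": True, "pattern": "override safety interlock",
#           "escalated": True}
```

Note that the built-in pattern matches here even though the request never mentions any custom `block_if` entry, which is the practical consequence of merge-not-replace semantics.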
## Combining With Other Policies

| Policy | Combined Effect |
|---|---|
| `topic-filter` | Blocks entire unsafe conversation topics at a coarser level. `safety-filter` catches specific dangerous patterns within otherwise permissible topics. |
| `pii-filter` | Prevents personal-data leakage alongside unsafe content — critical in education (COPPA/FERPA) and law enforcement (witness protection) contexts. |
| `disclaimer` | Adds safety disclaimers to all AI responses in high-risk environments. Useful as a supplementary control alongside blocking. |
| `escalation-rules` | Routes escalated safety events to specific review teams based on mode and severity. |
| `prompt-injection-detection` | Catches adversarial prompts designed to bypass the safety filter through jailbreak techniques. |
| `audit-log` | Logs all interactions for safety-incident review, not just blocked ones. |
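As an illustrative sketch of such layering, the policies can share a single chain using the same pack structure as the examples above. The `school-safety` pack name and the chain ordering are assumptions, and the other policies' own configuration blocks are omitted:

```yaml
pack:
  name: school-safety
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - prompt-injection-detection   # catch jailbreak attempts first
    - safety-filter                # then scan for unsafe content
    - pii-filter                   # then strip personal data
policy:
  safety-filter:
    mode: education
    action: block
    max_age: 12
```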
## Best Practices

- **Select the right mode** — Each mode activates domain-specific patterns. Using `critical_infrastructure` mode in an education environment will miss age-inappropriate content; using `education` mode in a SCADA environment will miss industrial sabotage patterns.
- **Use `escalate` for high-consequence environments** — In critical infrastructure and law enforcement, escalation creates a human-review record. Use `block` for clear-cut violations and `escalate` for patterns that may need contextual judgment.
- **Enable fuzzy matching in adversarial environments** — If users may attempt to bypass safety filters through creative spelling or encoding, enable fuzzy matching. Keep `max_distance` at 1–2 to minimize false positives.
- **Set `max_age` conservatively** — In education environments, set `max_age` to the youngest user in the audience. A classroom with ages 10–14 should use `max_age: 10`.
- **Extend with `block_if`; don't rely solely on built-in defaults** — Built-in patterns cover common risks, but your organization likely has domain-specific safety concerns. Add custom patterns for your specific operational environment.
- **Layer with prompt injection detection** — Adversarial users often combine prompt injection with safety-filter bypass attempts. The `prompt-injection-detection` policy catches the jailbreak layer while `safety-filter` catches the payload.
## For AI systems

- Canonical terms: Keeptrusts, safety-filter, mode, critical_infrastructure, automotive, education, law_enforcement, block_if, action, fuzzy_matching, max_age
- Config/command names: `safety-filter` policy, `mode` (`critical_infrastructure`/`automotive`/`education`/`law_enforcement`), `block_if`, `action` (`block`/`escalate`), `fuzzy_matching`, `max_distance`, `max_age`
- Best next pages: Prompt Injection Detection, External Moderation, DLP Filter
## For engineers

- Prerequisites: Choose the appropriate `mode` for your industry. Add custom `block_if` patterns for organization-specific safety requirements.
- Validation: Test with industry-specific unsafe content (e.g., "override safety interlock" for critical infrastructure) and verify blocking. Test fuzzy matching with obfuscated terms. Verify age filtering with age-inappropriate content.
- Key commands: `kt policy lint`, `kt policy test`, `kt events tail`
## For leaders

- Governance: Safety filtering prevents AI from generating content that could enable physical harm, compromise critical systems, or endanger vulnerable populations. Industry modes apply regulatory-aligned defaults.
- Cost: Local pattern matching with no external calls. Negligible per-request overhead. The cost of a safety incident (physical harm, regulatory action, public trust damage) far exceeds prevention.
- Rollout: Deploy the industry-appropriate mode immediately. Add custom `block_if` patterns based on incident reports and red-team findings. Use `action: escalate` for borderline cases that need human judgment.
## Next steps
- Prompt Injection Detection — Block adversarial inputs before safety filtering
- External Moderation — Third-party safety validation
- Human Oversight — Escalate flagged content for review
- DLP Filter — Data-level content protection