Data Classification for AI: A Framework for What Should Never Reach an LLM

The safest AI request is the one that never contains data you cannot afford to disclose. In practice, data classification for AI is not a spreadsheet exercise. It is a runtime decision about whether content should be blocked, redacted, or routed only to a provider with declared handling guarantees. Keeptrusts gives you that decision surface in policy, not in tribal knowledge.

Use this page when

You need a practical way to decide what data can reach an LLM, what must be redacted, and what must be blocked.
You are designing policy chains for customer support, engineering, legal, healthcare, or finance workloads.
You want a classification model that maps directly to Keeptrusts controls instead of producing labels nobody enforces.

Primary audience

Primary: Technical Engineers
Secondary: Technical Leaders, AI Agents

The problem

Most teams start AI governance with a false binary: either "LLMs are banned for sensitive work" or "staff will use good judgment." Neither survives contact with production traffic.

The real problem is that sensitive data arrives in mixed form. A single prompt can contain a customer email address, an internal hostname, a payment card number, a merger codename, and a perfectly harmless request for a summary. If your controls only ask whether the whole prompt is allowed, you lose the ability to govern at the right boundary.

This is where AI traffic differs from ordinary document classification. The same user may legitimately need model help, but only after specific fields are removed or transformed. Some content should never leave the gateway in raw form. Some content is acceptable only when routed to a provider target that declares zero retention and no training. Some content is not inherently restricted, but still needs audit evidence and predictable cost controls.

Classification also fails when it is detached from provider selection. A policy that labels content as sensitive but still lets a general-purpose provider receive it without retention constraints is not meaningful governance. The control plane has to connect content classification to routing and execution.

The solution

The simplest model is to classify AI-bound data into enforcement outcomes rather than abstract labels.

1. Block in raw form

These are values that should not cross the model boundary at all: leaked API keys, private keys, internal investigation codes, restricted codenames, and request patterns that indicate prompt injection or policy bypass attempts. In Keeptrusts, this is typically where DLP Filter and the prompt boundary controls described in Implement Zero-Trust AI with Defense-in-Depth Policies do their work.

2. Redact before forwarding

These are identifiers that can appear in otherwise valid business workflows but should not reach the provider unmodified: email addresses, Social Security numbers (SSNs), phone numbers, payment card numbers (PANs), medical record numbers (MRNs), and similar identifiers. This is the job of PII Detector, which can redact or block request-side PII and also powers buffered output redaction when present in the chain.

3. Route only to compliant targets

Some content is usable only if the provider target meets strict handling guarantees such as zero retention, no training, local-only execution, tokenized input support, or no internet egress. This is enforced by Data Routing Policy, which filters provider targets by declared data_policy metadata before normal routing runs.

4. Allow with evidence and cost controls

Low-risk operational context may be permitted, but it still needs governance. That means event visibility, a defined policy chain, and a spend posture that does not force unsafe routing choices later. Prevent Sensitive Data Leaks in AI Requests explains the layered protection model, while Spend & Wallets explains how Keeptrusts reserves spend before dispatch and holds requests when eligible wallet balance is not available.

That model is intentionally operational. You are not asking users to memorize sensitivity color codes. You are deciding what the gateway must do.
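The four outcomes above can be sketched as a small decision function. This is an illustrative model only, not Keeptrusts code: the `Outcome` enum, the `Findings` fields, and the rule that every forwarded request also carries the evidence outcome are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    BLOCK = "block"
    REDACT = "redact"
    ROUTE_STRICT = "route_strict"
    ALLOW_WITH_EVIDENCE = "allow_with_evidence"


@dataclass(frozen=True)
class Findings:
    """Hypothetical summary of what detectors saw in one request."""
    has_secret_or_blocked_term: bool = False
    has_pii: bool = False
    needs_strict_provider: bool = False


def decide(findings: Findings) -> list[Outcome]:
    """Return the outcomes that apply to a request, most restrictive first."""
    # A blocked request goes nowhere, so no other outcome applies.
    if findings.has_secret_or_blocked_term:
        return [Outcome.BLOCK]
    outcomes = []
    if findings.has_pii:
        outcomes.append(Outcome.REDACT)
    if findings.needs_strict_provider:
        outcomes.append(Outcome.ROUTE_STRICT)
    # Assumption: anything that is forwarded still produces audit evidence.
    outcomes.append(Outcome.ALLOW_WITH_EVIDENCE)
    return outcomes
```

The point of the sketch is that outcomes compose: a single request can be redacted and strictly routed, while a block short-circuits everything else.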

Implementation

Start by encoding the categories you actually care about. Use dlp-filter for organization-specific secrets and terms, pii-detector for structured identifiers, and data-routing-policy for provider-side guarantees.

pack:
  name: classified-ai-boundary
  version: "1.0.0"
  enabled: true

providers:
  targets:
    - id: eu-zdr
      provider: openai
      model: gpt-5.4-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      data_policy:
        zero_data_retention: true
        training_opt_out: true
        retention_days: 0
        in_memory_only: true
        sanitized: true
        accepts_tokenized_input: true
        allow_internet_egress: false
        local_only_processing: true

    - id: standard-cloud
      provider: openai
      model: gpt-5.4-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      data_policy:
        zero_data_retention: false
        training_opt_out: true
        retention_days: 30
        allow_internet_egress: true
        local_only_processing: false

policies:
  chain:
    - prompt-injection
    - dlp-filter
    - pii-detector
    - data-routing-policy

policy:
  prompt-injection:
    use_embedding: true
    detection:
      attack_patterns:
        - "ignore.*previous.*instructions"
        - "reveal.*system.*prompt"
    encoding:
      decode_base64: true
      normalize_unicode: true
      detect_homoglyphs: true
    boundaries:
      enforce_delimiters: true
      reject_fake_boundaries: true

  dlp-filter:
    detect_patterns:
      - 'AKIA[0-9A-Z]{16}'
      - 'ghp_[0-9A-Za-z]{36}'
      - 'CASE-[0-9]{4}-SEALED-[0-9]{5}'
    blocked_terms:
      - Project Atlas
      - internal settlement memo
      - restricted merger room
    action: block
    fuzzy_matching: true
    max_distance: 1
    sensitivity_level: restricted

  pii-detector:
    action: redact
    pci_mode: true
    detect_patterns:
      - 'EMP-\d{6}'
    redaction:
      marker_format: label
      include_metadata: true
      custom_markers:
        generic_id: "[REDACTED-ID]"

  data-routing-policy:
    require_zero_data_retention: true
    require_no_training: true
    max_retention_days: 0
    require_in_memory_only: true
    sanitize_before_provider: true
    tokenize_sensitive_fields: true
    allow_internet_egress: false
    local_only_processing: true
    on_no_compliant_provider: block
    log_provider_selection: true

This chain does four different jobs.

First, prompt-injection protects the request boundary before data protection logic runs. If the prompt is trying to bypass policy or reveal hidden instructions, the request is blocked outright. That is a core zero-trust move: do not let adversarial input drive later controls.
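To make the pattern and encoding options concrete, here is a minimal sketch of checking a prompt against the two configured attack patterns after Unicode normalization and Base64 decoding. It assumes nothing about how Keeptrusts implements the control: `candidate_texts` and `looks_like_injection` are hypothetical names, and the embedding, homoglyph, and delimiter checks are not modeled.

```python
import base64
import binascii
import re
import unicodedata

# The two patterns from the config above, matched case-insensitively here.
ATTACK_PATTERNS = [
    re.compile(r"ignore.*previous.*instructions", re.IGNORECASE),
    re.compile(r"reveal.*system.*prompt", re.IGNORECASE),
]
# Heuristic: runs of 16+ Base64 alphabet characters are worth decoding.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def candidate_texts(prompt: str) -> list[str]:
    """Return the normalized prompt plus any Base64 payloads that decode to text."""
    texts = [unicodedata.normalize("NFKC", prompt)]
    for run in BASE64_RUN.findall(prompt):
        try:
            decoded = base64.b64decode(run, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64 or not text; ignore this run
        texts.append(unicodedata.normalize("NFKC", decoded))
    return texts


def looks_like_injection(prompt: str) -> bool:
    """True if any view of the prompt matches a known attack pattern."""
    return any(
        pattern.search(text)
        for text in candidate_texts(prompt)
        for pattern in ATTACK_PATTERNS
    )
```

NFKC normalization folds compatibility forms such as full-width letters into ASCII, but it does not map look-alike characters from other scripts, which is why homoglyph detection is a separate option in the config.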

Second, dlp-filter blocks content that should never be forwarded in any form. This is the right place for internal project names, legal hold references, and secret-like patterns that the platform cannot know in advance.
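As a rough illustration of what this step checks, the sketch below combines the regex patterns from the config with fuzzy term matching at an edit distance of 1. The matching strategy (case-insensitive sliding windows with Levenshtein distance) is an assumption chosen for clarity, and `should_block` is a hypothetical name, not a Keeptrusts function.

```python
import re

DETECT_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"ghp_[0-9A-Za-z]{36}"),
    re.compile(r"CASE-[0-9]{4}-SEALED-[0-9]{5}"),
]
BLOCKED_TERMS = ["Project Atlas", "internal settlement memo", "restricted merger room"]
MAX_DISTANCE = 1


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def fuzzy_contains(text: str, term: str, max_distance: int) -> bool:
    """True if some window of text is within max_distance edits of term."""
    text, term = text.lower(), term.lower()
    smallest = max(1, len(term) - max_distance)
    for size in range(smallest, len(term) + max_distance + 1):
        for start in range(len(text) - size + 1):
            if edit_distance(text[start:start + size], term) <= max_distance:
                return True
    return False


def should_block(prompt: str) -> bool:
    """Block on any secret-like pattern or any near match of a blocked term."""
    if any(pattern.search(prompt) for pattern in DETECT_PATTERNS):
        return True
    return any(fuzzy_contains(prompt, term, MAX_DISTANCE) for term in BLOCKED_TERMS)
```

Fuzzy matching is what catches "Projct Atlas" as well as "Project Atlas". The trade-off is more false positives on short terms, which is one reason to keep the distance small.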

Third, pii-detector redacts structured personal or payment data that may appear inside otherwise legitimate prompts. That keeps useful work moving without letting raw identifiers leave the gateway.
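A minimal sketch of label-style redaction, assuming deliberately simple regexes: the email and SSN patterns and the `[REDACTED-...]` marker strings are illustrative stand-ins for the built-in detectors, while `EMP-\d{6}` is the custom pattern from the config above.

```python
import re

# Simplified patterns; a production detector would also validate context and checksums.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMPLOYEE_ID": re.compile(r"\bEMP-\d{6}\b"),
}


def redact(prompt: str) -> tuple[str, list[dict]]:
    """Replace each match with a labeled marker and record metadata about it."""
    findings = []
    for label, pattern in PII_PATTERNS.items():
        def replace(match, label=label):
            # Record the type and length only, never the raw value.
            findings.append({"type": label, "length": len(match.group())})
            return f"[REDACTED-{label}]"
        prompt = pattern.sub(replace, prompt)
    return prompt, findings
```

For example, `redact("Email jane@example.com about EMP-123456")` yields a prompt the model can still act on, plus two metadata records that can serve as audit evidence without storing the identifiers themselves.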

Fourth, data-routing-policy refuses to route classified traffic to a provider target that lacks the declared handling guarantees you require. This is the step that converts a classification decision into provider enforcement.
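The filtering step can be pictured as follows. This sketch mirrors a subset of the `data_policy` and `data-routing-policy` keys from the config above as plain dictionaries. The evaluation logic, including treating an undeclared guarantee as unmet, is an assumption for illustration and not the gateway's actual implementation.

```python
TARGETS = [
    {"id": "eu-zdr", "data_policy": {
        "zero_data_retention": True, "training_opt_out": True, "retention_days": 0,
        "in_memory_only": True, "allow_internet_egress": False,
        "local_only_processing": True}},
    {"id": "standard-cloud", "data_policy": {
        "zero_data_retention": False, "training_opt_out": True, "retention_days": 30,
        "allow_internet_egress": True, "local_only_processing": False}},
]

REQUIREMENTS = {
    "require_zero_data_retention": True, "require_no_training": True,
    "max_retention_days": 0, "require_in_memory_only": True,
    "allow_internet_egress": False, "local_only_processing": True,
    "on_no_compliant_provider": "block",
}


def is_compliant(policy: dict, req: dict) -> bool:
    """Fail closed: a guarantee the target does not declare counts as unmet."""
    if req.get("require_zero_data_retention") and not policy.get("zero_data_retention", False):
        return False
    if req.get("require_no_training") and not policy.get("training_opt_out", False):
        return False
    if "max_retention_days" in req:
        days = policy.get("retention_days")
        if days is None or days > req["max_retention_days"]:
            return False
    if req.get("require_in_memory_only") and not policy.get("in_memory_only", False):
        return False
    if req.get("allow_internet_egress") is False and policy.get("allow_internet_egress", True):
        return False
    if req.get("local_only_processing") and not policy.get("local_only_processing", False):
        return False
    return True


def select_targets(targets: list[dict], req: dict) -> list[str]:
    """Return ids of eligible targets, or refuse when none qualify."""
    eligible = [t["id"] for t in targets if is_compliant(t.get("data_policy", {}), req)]
    if not eligible and req.get("on_no_compliant_provider") == "block":
        raise PermissionError("no compliant provider target")
    return eligible
```

With the two targets defined earlier, only `eu-zdr` survives the filter. If it were removed, the request would be refused instead of falling back to `standard-cloud`.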

Once the config exists, lint it and run a gateway against it before rollout:

kt policy lint --file policy-config.yaml
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml

If you expect sensitive workloads to use a smaller, more expensive compliant provider pool, tie those workloads to a dedicated wallet scope. Spend & Wallets matters here because Keeptrusts reserves cost before dispatch. If balance is insufficient, the request is held instead of being quietly sent somewhere cheaper and less governed.
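The reserve-then-dispatch behavior can be sketched like this. The `Wallet` class, its field names, and the use of integer cents are hypothetical. The sketch only illustrates the idea that a request is held when the reservation cannot be made.

```python
from dataclasses import dataclass


@dataclass
class Wallet:
    """Hypothetical wallet for one workload scope, in integer cents."""
    balance_cents: int
    reserved_cents: int = 0

    @property
    def available_cents(self) -> int:
        return self.balance_cents - self.reserved_cents

    def try_reserve(self, estimate_cents: int) -> bool:
        """Set aside the estimated cost, or refuse if it does not fit."""
        if estimate_cents > self.available_cents:
            return False
        self.reserved_cents += estimate_cents
        return True

    def settle(self, estimate_cents: int, actual_cents: int) -> None:
        """Release the reservation and charge what the request actually cost."""
        self.reserved_cents -= estimate_cents
        self.balance_cents -= actual_cents


def dispatch_or_hold(wallet: Wallet, estimate_cents: int) -> str:
    """Dispatch only when the cost is reserved; never downgrade the route."""
    return "dispatch" if wallet.try_reserve(estimate_cents) else "hold"
```

The design choice worth noticing is what is absent: there is no branch that retries against a cheaper target when the reservation fails.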

Results and impact

Teams that adopt this model usually get three immediate improvements.

The first is consistency. Engineers stop debating whether a workload is "kind of sensitive" and start asking whether it should block, redact, or route under stricter provider guarantees. That is a decision people can implement and test.

The second is lower false confidence. Many AI programs think they have classification because they have a document. In reality, nothing changes at runtime. A policy-driven model closes that gap. If the request contains a card number, pii-detector with pci_mode enabled redacts or blocks it, depending on the configured action. If it references an internal codename, dlp-filter catches it. If the provider target does not meet retention requirements, the request never routes there.

The third is cleaner audit posture. When someone asks how customer data, payment data, and confidential project information are handled before an LLM call, you can answer in technical terms and in policy terms. That is a much stronger governance position than asking reviewers to trust application code scattered across teams.

Key takeaways

AI data classification should map to enforcement outcomes: block, redact, strictly route, or allow with evidence.
Use PII Detector for built-in identifier detection and redaction, not for internal codenames or proprietary terms.
Use DLP Filter for organization-specific patterns and literal blocked terms.
Use Data Routing Policy when the sensitivity decision depends on provider retention, training, or locality guarantees.
Use Spend & Wallets to keep compliant routing enforceable under real budget pressure.
