Data Classification for AI: A Framework for What Should Never Reach an LLM
The safest AI request is the one that never contains data you cannot afford to disclose. In practice, data classification for AI is not a spreadsheet exercise. It is a runtime decision about whether content should be blocked, redacted, or routed only to a provider with declared handling guarantees. Keeptrusts gives you that decision surface in policy, not in tribal knowledge.
Use this page when
- You need a practical way to decide what data can reach an LLM, what must be redacted, and what must be blocked.
- You are designing policy chains for customer support, engineering, legal, healthcare, or finance workloads.
- You want a classification model that maps directly to Keeptrusts controls instead of producing labels nobody enforces.
Primary audience
- Primary: Technical Engineers
- Secondary: Technical Leaders, AI Agents
The problem
Most teams start AI governance with a false binary: either "LLMs are banned for sensitive work" or "staff will use good judgment." Neither survives contact with production traffic.
The real problem is that sensitive data arrives in mixed form. A single prompt can contain a customer email address, an internal hostname, a payment card number, a merger codename, and a perfectly harmless request for a summary. If your controls only ask whether the whole prompt is allowed, you lose the ability to govern at the right boundary.
This is where AI traffic differs from ordinary document classification. The same user may legitimately need model help, but only after specific fields are removed or transformed. Some content should never leave the gateway in raw form. Some content is acceptable only when routed to a provider target that declares zero retention and no training. Some content is not inherently restricted, but still needs audit evidence and predictable cost controls.
Classification also fails when it is detached from provider selection. A policy that labels content as sensitive but still lets a general-purpose provider receive it without retention constraints is not meaningful governance. The control plane has to connect content classification to routing and execution.
The solution
The simplest model is to classify AI-bound data into enforcement outcomes rather than abstract labels.
1. Block in raw form
These are values that should not cross the model boundary at all: leaked API keys, private keys, internal investigation codes, restricted codenames, and request patterns that indicate prompt injection or policy bypass attempts. In Keeptrusts, this is typically where DLP Filter and the prompt boundary controls described in Implement Zero-Trust AI with Defense-in-Depth Policies do their work.
2. Redact before forwarding
These are identifiers that can appear in otherwise valid business workflows but should not reach the provider unmodified: email addresses, Social Security numbers (SSNs), phone numbers, payment card numbers (PANs), medical record numbers (MRNs), and similar identifiers. This is the job of PII Detector, which can redact or block request-side PII and also powers buffered output redaction when present in the chain.
3. Route only to compliant targets
Some content is usable only if the provider target meets strict handling guarantees such as zero retention, no training, local-only execution, tokenized input support, or no internet egress. This is enforced by Data Routing Policy, which filters provider targets by declared data_policy metadata before normal routing runs.
4. Allow with evidence and cost controls
Low-risk operational context may be permitted, but it still needs governance. That means event visibility, a defined policy chain, and a spend posture that does not force unsafe routing choices later. Prevent Sensitive Data Leaks in AI Requests explains the layered protection model, while Spend & Wallets explains how Keeptrusts reserves spend before dispatch and holds requests when eligible wallet balance is not available.
That model is intentionally operational. You are not asking users to memorize sensitivity color codes. You are deciding what the gateway must do.
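To make the four outcomes concrete, here is a minimal Python sketch of how they might combine for a single request. It is illustrative only and is not Keeptrusts code: the `Outcome` enum, the `Finding` type, the finding kinds (`secret`, `pii`, `restricted_context`), and the `required_actions` function are all hypothetical names. It assumes upstream detectors have already tagged what they found, that a block short-circuits everything else, and that evidence applies to anything forwarded.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    BLOCK = "block"
    REDACT = "redact"
    ROUTE_COMPLIANT = "route_compliant"
    ALLOW_WITH_EVIDENCE = "allow_with_evidence"


@dataclass(frozen=True)
class Finding:
    kind: str   # hypothetical labels: "secret", "pii", "restricted_context"
    value: str


# Hypothetical mapping from a detector finding to an enforcement outcome.
KIND_TO_OUTCOME = {
    "secret": Outcome.BLOCK,
    "pii": Outcome.REDACT,
    "restricted_context": Outcome.ROUTE_COMPLIANT,
}


def required_actions(findings):
    """Return the enforcement outcomes a request needs, strictest first."""
    outcomes = {
        KIND_TO_OUTCOME.get(f.kind, Outcome.ALLOW_WITH_EVIDENCE) for f in findings
    }
    # Assumption: a single block-class finding stops the whole request.
    if Outcome.BLOCK in outcomes:
        return [Outcome.BLOCK]
    actions = [
        o for o in (Outcome.REDACT, Outcome.ROUTE_COMPLIANT) if o in outcomes
    ]
    # Assumption: anything that is forwarded still produces audit evidence.
    actions.append(Outcome.ALLOW_WITH_EVIDENCE)
    return actions
```

The point of the sketch is that redaction and restricted routing are not mutually exclusive: one prompt can need both, while a block-class finding ends the decision early.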
Implementation
Start by encoding the categories you actually care about. Use dlp-filter for organization-specific secrets and terms, pii-detector for structured identifiers, and data-routing-policy for provider-side guarantees.
pack:
  name: classified-ai-boundary
  version: "1.0.0"
  enabled: true
providers:
  targets:
    - id: eu-zdr
      provider: openai
      model: gpt-5.4-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      data_policy:
        zero_data_retention: true
        training_opt_out: true
        retention_days: 0
        in_memory_only: true
        sanitized: true
        accepts_tokenized_input: true
        allow_internet_egress: false
        local_only_processing: true
    - id: standard-cloud
      provider: openai
      model: gpt-5.4-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      data_policy:
        zero_data_retention: false
        training_opt_out: true
        retention_days: 30
        allow_internet_egress: true
        local_only_processing: false
policies:
  chain:
    - prompt-injection
    - dlp-filter
    - pii-detector
    - data-routing-policy
  policy:
    prompt-injection:
      use_embedding: true
      detection:
        attack_patterns:
          - "ignore.*previous.*instructions"
          - "reveal.*system.*prompt"
      encoding:
        decode_base64: true
        normalize_unicode: true
        detect_homoglyphs: true
      boundaries:
        enforce_delimiters: true
        reject_fake_boundaries: true
    dlp-filter:
      detect_patterns:
        - 'AKIA[0-9A-Z]{16}'
        - 'ghp_[0-9A-Za-z]{36}'
        - 'CASE-[0-9]{4}-SEALED-[0-9]{5}'
      blocked_terms:
        - Project Atlas
        - internal settlement memo
        - restricted merger room
      action: block
      fuzzy_matching: true
      max_distance: 1
      sensitivity_level: restricted
    pii-detector:
      action: redact
      pci_mode: true
      detect_patterns:
        - 'EMP-\d{6}'
      redaction:
        marker_format: label
        include_metadata: true
        custom_markers:
          generic_id: "[REDACTED-ID]"
    data-routing-policy:
      require_zero_data_retention: true
      require_no_training: true
      max_retention_days: 0
      require_in_memory_only: true
      sanitize_before_provider: true
      tokenize_sensitive_fields: true
      allow_internet_egress: false
      local_only_processing: true
      on_no_compliant_provider: block
      log_provider_selection: true
This chain does four different jobs.
First, prompt-injection protects the request boundary before data protection logic runs. If the prompt is trying to bypass policy or reveal hidden instructions, the request is blocked outright. That is a core zero-trust move: do not let adversarial input drive later controls.
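As a rough illustration of what pattern-based detection with normalization and base64 decoding involves, the Python sketch below applies the two `attack_patterns` from the config to a prompt. It is an assumption-laden approximation, not the Keeptrusts implementation: `candidate_texts` and `looks_like_injection` are hypothetical names, case-insensitive matching is assumed, NFKC is used as a stand-in for `normalize_unicode`, and homoglyph or embedding-based detection is not modeled at all.

```python
import base64
import binascii
import re
import unicodedata

# The two patterns from the config; case-insensitive matching is an assumption.
ATTACK_PATTERNS = [
    re.compile(r"ignore.*previous.*instructions", re.IGNORECASE),
    re.compile(r"reveal.*system.*prompt", re.IGNORECASE),
]

# Heuristic for "might be base64": a long run of base64 alphabet characters.
BASE64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def candidate_texts(prompt):
    """Return the normalized prompt plus any base64 payloads that decode to text."""
    normalized = unicodedata.normalize("NFKC", prompt)
    texts = [normalized]
    for token in BASE64_TOKEN.findall(normalized):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64 or not text; ignore the token
        texts.append(unicodedata.normalize("NFKC", decoded))
    return texts


def looks_like_injection(prompt):
    """True if any attack pattern matches the prompt or a decoded payload."""
    return any(
        pattern.search(text)
        for text in candidate_texts(prompt)
        for pattern in ATTACK_PATTERNS
    )
```

Normalizing before matching matters because a fullwidth or base64-wrapped instruction would otherwise slip past a plain regular expression.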
Second, dlp-filter blocks content that should never be forwarded in any form. This is the right place for internal project names, legal hold references, and secret-like patterns that the platform cannot know in advance.
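The sketch below shows, in Python, one plausible reading of what `detect_patterns`, `blocked_terms`, `fuzzy_matching`, and `max_distance: 1` imply together. It is a simplified model under stated assumptions rather than the actual dlp-filter logic: fuzzy matching is assumed to mean Levenshtein distance over case-folded sliding windows, and `edit_distance`, `contains_fuzzy`, and `should_block` are hypothetical helper names.

```python
import re

# Patterns and terms copied from the example config.
DETECT_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"ghp_[0-9A-Za-z]{36}"),
    re.compile(r"CASE-[0-9]{4}-SEALED-[0-9]{5}"),
]
BLOCKED_TERMS = ["Project Atlas", "internal settlement memo", "restricted merger room"]
MAX_DISTANCE = 1


def edit_distance(a, b):
    """Levenshtein distance using a two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete from a
                current[j - 1] + 1,            # insert into a
                previous[j - 1] + (ca != cb),  # substitute
            ))
        previous = current
    return previous[-1]


def contains_fuzzy(text, term, max_distance=MAX_DISTANCE):
    """True if some window of text is within max_distance edits of term."""
    text, term = text.lower(), term.lower()
    shortest = max(1, len(term) - max_distance)
    longest = len(term) + max_distance
    for width in range(shortest, longest + 1):
        for start in range(len(text) - width + 1):
            if edit_distance(text[start:start + width], term) <= max_distance:
                return True
    return False


def should_block(text):
    """Block on any secret-like pattern or near-match of a blocked term."""
    if any(pattern.search(text) for pattern in DETECT_PATTERNS):
        return True
    return any(contains_fuzzy(text, term) for term in BLOCKED_TERMS)
```

With a distance of 1, a misspelling such as "Projct Atlas" is still caught, which is the practical reason to enable fuzzy matching for codenames.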
Third, pii-detector redacts structured personal or payment data that may appear inside otherwise legitimate prompts. That keeps useful work moving without letting raw identifiers leave the gateway.
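For intuition, here is a small Python sketch of request-side redaction. It is not how PII Detector is implemented: the email and card regular expressions, the Luhn check used to approximate `pci_mode`, and the `[REDACTED-EMAIL]` and `[REDACTED-PAN]` markers are assumptions made for illustration. Only the `EMP-\d{6}` pattern and the `[REDACTED-ID]` marker come from the example config.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
EMPLOYEE_ID = re.compile(r"EMP-\d{6}")  # custom pattern from the config
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number):
    """Luhn checksum: double every second digit from the right."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def redact(text):
    """Replace emails, employee IDs, and Luhn-valid card numbers with markers."""
    text = EMAIL.sub("[REDACTED-EMAIL]", text)
    text = EMPLOYEE_ID.sub("[REDACTED-ID]", text)
    # Only digit runs that pass the checksum are treated as card numbers.
    text = CARD_CANDIDATE.sub(
        lambda m: "[REDACTED-PAN]" if luhn_valid(m.group()) else m.group(),
        text,
    )
    return text
```

The checksum step is a design choice in this sketch: it reduces false positives on order numbers and other long digit strings that merely look like card numbers.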
Fourth, data-routing-policy refuses to route classified traffic to a provider target that lacks the declared handling guarantees you require. This is the step that converts a classification decision into provider enforcement.
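The filtering step can be sketched in Python as follows. This is a simplified model, not the Keeptrusts routing code: `is_compliant`, `eligible_targets`, and `NoCompliantProvider` are hypothetical names, only a subset of the config's fields is checked, and the sketch assumes that a guarantee a target does not declare is treated as not met.

```python
# Declared metadata, mirroring the two targets in the example config.
TARGETS = [
    {"id": "eu-zdr", "data_policy": {
        "zero_data_retention": True, "training_opt_out": True,
        "retention_days": 0, "in_memory_only": True,
        "allow_internet_egress": False, "local_only_processing": True}},
    {"id": "standard-cloud", "data_policy": {
        "zero_data_retention": False, "training_opt_out": True,
        "retention_days": 30,
        "allow_internet_egress": True, "local_only_processing": False}},
]

# Subset of the data-routing-policy settings from the example config.
REQUIREMENTS = {
    "require_zero_data_retention": True,
    "require_no_training": True,
    "max_retention_days": 0,
    "require_in_memory_only": True,
    "allow_internet_egress": False,
    "local_only_processing": True,
}


class NoCompliantProvider(Exception):
    """Raised when no target meets the policy (on_no_compliant_provider: block)."""


def is_compliant(policy, req):
    """Check one target's declared data_policy; undeclared guarantees fail closed."""
    return all([
        not req["require_zero_data_retention"] or policy.get("zero_data_retention", False),
        not req["require_no_training"] or policy.get("training_opt_out", False),
        policy.get("retention_days", float("inf")) <= req["max_retention_days"],
        not req["require_in_memory_only"] or policy.get("in_memory_only", False),
        req["allow_internet_egress"] or not policy.get("allow_internet_egress", True),
        not req["local_only_processing"] or policy.get("local_only_processing", False),
    ])


def eligible_targets(targets, req):
    """Filter targets before normal routing; block if none qualify."""
    compliant = [t for t in targets if is_compliant(t["data_policy"], req)]
    if not compliant:
        raise NoCompliantProvider("no target satisfies the data routing policy")
    return compliant
```

Against the example config, only `eu-zdr` survives the filter. If it were removed, the request would be blocked rather than falling back to `standard-cloud`.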
Once the config exists, validate it before rollout:
kt policy lint --file policy-config.yaml
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
If you expect sensitive workloads to use a smaller, more expensive compliant provider pool, tie those workloads to a dedicated wallet scope. Spend & Wallets matters here because Keeptrusts reserves cost before dispatch. If balance is insufficient, the request is held instead of being quietly sent somewhere cheaper and less governed.
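The reserve-then-dispatch behavior can be pictured with a short Python sketch. Everything here is hypothetical: the `Wallet` class, `try_reserve`, `settle`, and `dispatch` are illustrative names, and the settlement step (releasing the reservation and charging the actual cost) is an assumption about how such a system might work, not a description of Keeptrusts internals.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Wallet:
    balance: Decimal
    reserved: Decimal = Decimal("0")

    @property
    def available(self):
        """Balance not already earmarked for in-flight requests."""
        return self.balance - self.reserved

    def try_reserve(self, amount):
        """Earmark the estimated cost; refuse if funds are insufficient."""
        if amount > self.available:
            return False
        self.reserved += amount
        return True

    def settle(self, reserved_amount, actual_cost):
        """Release the reservation and charge what the request really cost."""
        self.reserved -= reserved_amount
        self.balance -= actual_cost


def dispatch(wallet, estimated_cost, send):
    """Reserve first; hold the request instead of sending it when funds fall short."""
    if not wallet.try_reserve(estimated_cost):
        return "held"
    actual_cost = send()  # send() stands in for the provider call
    wallet.settle(estimated_cost, actual_cost)
    return "dispatched"
```

The important property is the ordering: the hold decision happens before any provider is contacted, so a budget shortfall cannot turn into a silent downgrade to a cheaper target.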
Results and impact
Teams that adopt this model usually get three immediate improvements.
The first is consistency. Engineers stop debating whether a workload is "kind of sensitive" and start asking whether it should block, redact, or route under stricter provider guarantees. That is a decision people can implement and test.
The second is lower false confidence. Many AI programs believe they have classification because they have a classification document, yet nothing changes at runtime. A policy-driven model closes that gap. If the request contains a card number, pii-detector with pci_mode enabled redacts or blocks it, depending on the configured action. If it references an internal codename, dlp-filter catches it. If the provider target does not meet retention requirements, the request never routes there.
The third is cleaner audit posture. When someone asks how customer data, payment data, and confidential project information are handled before an LLM call, you can answer in technical terms and in policy terms. That is a much stronger governance position than asking reviewers to trust application code scattered across teams.
Key takeaways
- AI data classification should map to enforcement outcomes: block, redact, route only to compliant targets, or allow with evidence.
- Use PII Detector for built-in identifier detection and redaction, not for internal codenames or proprietary terms.
- Use DLP Filter for organization-specific patterns and literal blocked terms.
- Use Data Routing Policy when the sensitivity decision depends on provider retention, training, or locality guarantees.
- Use Spend & Wallets to keep compliant routing enforceable under real budget pressure.