Data Minimization: Sending Only What's Needed to LLM Providers

Data minimization is the discipline that most AI teams say they support and very few enforce at runtime. People still send complete support tickets when only the issue summary is needed. They still include account numbers when a status label would do. They still rely on a provider's retention promise while forwarding far more information than the task requires. Keeptrusts turns minimization into a real gateway decision by combining request redaction, custom DLP rules, and provider filtering based on declared data-handling guarantees.

Use this page when

  • You want to reduce the amount of personal or confidential data that reaches upstream models.
  • You need a practical minimization pattern that uses current Keeptrusts policy behavior instead of vague privacy principles.
  • You want routing decisions to reinforce content minimization instead of undermining it.

Primary audience

  • Primary: Technical Leaders
  • Secondary: Technical Engineers, AI Agents

The problem

Many AI data exposures begin before anyone argues about geography or retention. They begin because the request simply contains too much information.

The common failure mode is convenience. A developer sends the full customer thread because it is easier than extracting only the relevant facts. A support analyst pastes the raw billing export into a summarization prompt because there is no enforced redaction step. A compliance team chooses a zero-retention provider but still forwards raw personal identifiers that never needed to leave the environment in the first place.

This is why minimization is not the same thing as residency, zero retention, or contract language. Those controls answer what the provider may do after receiving the data. Minimization answers what the provider should receive at all.

There is also a structural problem in many AI stacks: one generic chain handles every endpoint. Embeddings, chat, support triage, and regulated document review all flow through the same policy path. When that happens, the safest route gets watered down for the least sensitive use case, or the strictest route becomes so noisy that teams stop trusting it.

The solution

Keeptrusts gives you three distinct levers that map well to minimization.

Use PII Detector to redact structured identifiers before the request leaves the gateway. This is the fastest way to remove email addresses, card data, or other common identifiers from otherwise valid prompts.
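
To make the redaction step concrete, here is a minimal Python sketch of pattern-based redaction with label markers. It is an illustration of the idea, not the PII Detector implementation: the `PATTERNS` table, the `redact` function, and the marker text are all hypothetical, and a real detector covers far more identifier types than these two regular expressions.

```python
import re

# Illustrative patterns only (assumption): one email regex and one
# organization-style employee ID. Not the gateway's real detector set.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "EMPLOYEE_ID": re.compile(r"\bEMP-\d{6}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a label marker and count what was removed."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[REDACTED-{label}]", text)
        if n:
            counts[label] = n
    return text, counts


prompt = "Ticket from jane.doe@example.com about badge EMP-004217 not working."
clean, counts = redact(prompt)
print(clean)   # the prompt with both identifiers replaced by markers
print(counts)  # how many values of each type were removed
```

Returning the counts alongside the cleaned text keeps the redaction auditable without storing the raw values themselves.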

Use DLP Filter for organization-specific terms and patterns that a generic detector will never know about, such as sealed case numbers, project codenames, or literal export labels.
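
The following Python sketch shows one way a block decision on organization-specific content could work: an exact pattern for sealed case numbers plus blocked phrases matched with a small edit-distance tolerance. How DLP Filter actually implements fuzzy matching is not documented here, so the sliding-window approach and the `edit_distance` and `should_block` helpers are assumptions made for illustration.

```python
import re

SEALED_CASE = re.compile(r"CASE-[0-9]{4}-SEALED-[0-9]{5}")
BLOCKED_TERMS = ["merger room transcript", "raw customer export"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete from a
                            curr[j - 1] + 1,            # insert into a
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def should_block(text: str, max_distance: int = 1) -> bool:
    """Block on a sealed case number or a near-match of a blocked phrase."""
    if SEALED_CASE.search(text):
        return True
    words = re.findall(r"[a-z0-9]+", text.lower())
    for term in BLOCKED_TERMS:
        size = len(term.split())
        # Compare every run of `size` consecutive words against the term.
        for start in range(len(words) - size + 1):
            window = " ".join(words[start:start + size])
            if edit_distance(window, term) <= max_distance:
                return True
    return False


print(should_block("Summarize CASE-2024-SEALED-00917 for me"))
print(should_block("Please summarize the raw custmer export"))
print(should_block("Summarize the customer's issue"))
```

The tolerance is deliberately small. A larger distance catches more typos but also starts matching unrelated phrases, which is the kind of noise that makes teams stop trusting a filter.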

Use Data Routing Policy when the remaining provider pool must still satisfy sanitization, tokenized-input, or zero-retention requirements. Minimization is stronger when the only eligible providers are the ones that can handle already-sanitized content the way you expect.
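
Provider filtering of this kind can be sketched as a comparison between what each target declares and what the policy requires. The Python below is a hedged illustration, not the gateway's routing code: the `DataPolicy` and `Requirements` classes are hypothetical, their field names mirror the keys in this page's Implementation example, and the pairing of each requirement with a declaration (for example `sanitize_before_provider` with `sanitized`) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPolicy:
    """Guarantees a provider target declares about itself."""
    zero_data_retention: bool = False
    training_opt_out: bool = False
    retention_days: int = 30
    sanitized: bool = False
    accepts_tokenized_input: bool = False


@dataclass(frozen=True)
class Requirements:
    """What the routing policy demands of any eligible target."""
    require_zero_data_retention: bool = True
    require_no_training: bool = True
    max_retention_days: int = 0
    sanitize_before_provider: bool = True
    tokenize_sensitive_fields: bool = True


def is_compliant(p: DataPolicy, r: Requirements) -> bool:
    """A target qualifies only if it satisfies every active requirement."""
    return (
        (p.zero_data_retention or not r.require_zero_data_retention)
        and (p.training_opt_out or not r.require_no_training)
        and p.retention_days <= r.max_retention_days
        and (p.sanitized or not r.sanitize_before_provider)
        and (p.accepts_tokenized_input or not r.tokenize_sensitive_fields)
    )


def eligible_targets(targets: dict[str, DataPolicy], r: Requirements) -> list[str]:
    return [tid for tid, policy in targets.items() if is_compliant(policy, r)]


targets = {
    "sanitized-zdr": DataPolicy(True, True, 0, True, True),
    "general-purpose": DataPolicy(training_opt_out=True, retention_days=30),
}
print(eligible_targets(targets, Requirements()))
```

Defaults on `DataPolicy` are pessimistic on purpose: a target that declares nothing is treated as offering no guarantees.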

Finally, use Conditional Chains Configuration so the stricter controls run only on paths or headers that actually need them. That avoids turning minimization into an all-or-nothing platform tax.
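
As a rough mental model of conditional steps, the Python sketch below selects which chain steps run for a given request. It assumes exact path equality and exact header-value equality, with header names compared case-insensitively as HTTP requires. The real matching rules may differ (prefixes or patterns, for instance), and `condition_matches`, `active_steps`, and the `CHAIN` list are hypothetical names.

```python
def condition_matches(when: dict, path: str, headers: dict[str, str]) -> bool:
    """A step with no condition always runs; otherwise every condition must hold."""
    if "path" in when and when["path"] != path:
        return False
    # HTTP header names are case-insensitive, so normalize before comparing.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name, expected in when.get("header", {}).items():
        if lowered.get(name.lower()) != expected:
            return False
    return True


# Step names paired with their conditions (illustrative subset).
CHAIN = [
    ("pii-detector", {"path": "/v1/chat/completions"}),
    ("dlp-filter", {"header": {"X-Data-Class": "regulated"}}),
    ("audit-logger", {}),
]


def active_steps(path: str, headers: dict[str, str]) -> list[str]:
    return [name for name, when in CHAIN if condition_matches(when, path, headers)]


print(active_steps("/v1/chat/completions", {"x-data-class": "regulated"}))
print(active_steps("/v1/embeddings", {}))
```

An embeddings call skips both heavier steps here, while a regulated chat request picks up all three.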

Implementation

This example applies the heavier minimization path to regulated chat traffic: pii-detector runs on requests to /v1/chat/completions, dlp-filter runs only when the request carries the header X-Data-Class: regulated, and the global chain stays simple for lower-risk routes.

pack:
  name: minimized-regulated-chat
  version: "1.0.0"
  enabled: true

providers:
  targets:
    - id: sanitized-zdr
      provider: openai
      model: gpt-5.4-mini-mini
      secret_key_ref:
        env: OPENAI_MINIMIZED_KEY
      data_policy:
        zero_data_retention: true
        training_opt_out: true
        retention_days: 0
        in_memory_only: true
        sanitized: true
        accepts_tokenized_input: true
        allow_internet_egress: false
        local_only_processing: true

policies:
  chain:
    - pii-detector:
        stage: pre-request
        parallel: true
        when:
          path: "/v1/chat/completions"
    - dlp-filter:
        stage: pre-request
        when:
          header:
            X-Data-Class: regulated
    - data-routing-policy
    - audit-logger

  policy:
    pii-detector:
      action: redact
      pci_mode: true
      detect_patterns:
        - 'EMP-\d{6}'
      redaction:
        marker_format: label
        include_metadata: true
        custom_markers:
          generic_id: "[REDACTED-ID]"

    dlp-filter:
      detect_patterns:
        - 'CASE-[0-9]{4}-SEALED-[0-9]{5}'
      blocked_terms:
        - merger room transcript
        - raw customer export
      action: block
      fuzzy_matching: true
      max_distance: 1
      sensitivity_level: restricted

    data-routing-policy:
      require_zero_data_retention: true
      require_no_training: true
      max_retention_days: 0
      sanitize_before_provider: true
      tokenize_sensitive_fields: true
      allow_internet_egress: false
      local_only_processing: true
      on_no_compliant_provider: block
      log_provider_selection: true

    audit-logger: {}

The sequence matters.

pii-detector removes the structured values that should almost never reach the provider in raw form. dlp-filter catches the organization-specific content that a built-in redactor cannot infer. data-routing-policy then narrows the provider pool to targets that explicitly declare they can handle sanitized, tokenized, tightly governed traffic. If no compliant provider remains, the request blocks instead of falling through to a looser route.
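
The ordering and the fail-closed behavior described above can be sketched in a few lines of Python. This is a simplified model that assumes the steps run strictly one after another and that the block check sees the already-redacted text. The `run_chain` function and `Blocked` exception are hypothetical, and the lambdas stand in for real detectors.

```python
class Blocked(Exception):
    """Raised when a request must not leave the gateway."""


def run_chain(prompt, redact, violates_dlp, compliant_providers):
    """Redact, then block, then route; never fall through to a looser route."""
    redacted = redact(prompt)        # 1. strip structured identifiers
    if violates_dlp(redacted):       # 2. refuse content that must never leave
        raise Blocked("dlp-filter")
    if not compliant_providers:      # 3. fail closed when no target qualifies
        raise Blocked("no compliant provider")
    return compliant_providers[0], redacted


# Stand-in steps for the demo (assumptions, not real detectors).
strip_email = lambda p: p.replace("jane@example.com", "[REDACTED-EMAIL]")
has_blocked_term = lambda p: "raw customer export" in p.lower()

print(run_chain("Reply to jane@example.com", strip_email, has_blocked_term,
                ["sanitized-zdr"]))

try:
    run_chain("Reply to jane@example.com", strip_email, has_blocked_term, [])
except Blocked as reason:
    print("blocked:", reason)
```

The important property is the last branch: an empty provider pool produces a block, not a retry against a less strict target.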

That is a better operating model than trying to do minimization with one monolithic rule. Each policy answers a different question: what must be redacted, what must never leave, and which providers are still eligible after minimization.

If you want to make the behavior safer over time, add inline suites from Testing Configuration and run kt policy test --json before deployment. Minimization rules are much easier to trust when you can show concrete examples of content that should redact, block, or pass.

Results and impact

The immediate effect is smaller upstream payloads. That sounds obvious, but it changes several downstream obligations at once.

Privacy teams get fewer raw identifiers reaching third parties. Security teams get fewer opportunities for sensitive internal labels to escape in prompts. Platform teams get a clearer routing contract because only providers that declare support for sanitized input remain eligible for regulated traffic.

Minimization also improves audit posture. A reviewer can see that the system did not merely route traffic to a zero-retention provider. It first removed or blocked data that never needed to go upstream at all.

Key takeaways

  • Minimization is a request-boundary decision, not just a provider-contract decision.
  • Use PII Detector for structured identifiers and DLP Filter for organization-specific terms.
  • Use Conditional Chains Configuration so heavier controls run only where they are needed.
  • Use Data Routing Policy to make sure compliant providers remain the only eligible targets after sanitization.
  • Test minimization behavior before rollout instead of assuming the chain will behave the way you intended.

Next steps