Data Minimization: Sending Only What's Needed to LLM Providers

Data minimization is the discipline that most AI teams say they support and very few enforce at runtime. People still send complete support tickets when only the issue summary is needed. They still include account numbers when a status label would do. They still rely on a provider's retention promise while forwarding far more information than the task requires. Keeptrusts turns minimization into a real gateway decision by combining request redaction, custom DLP rules, and provider filtering based on declared data-handling guarantees.

Use this page when

You want to reduce the amount of personal or confidential data that reaches upstream models.
You need a practical minimization pattern that uses current Keeptrusts policy behavior instead of vague privacy principles.
You want routing decisions to reinforce content minimization instead of undermining it.

Primary audience

Primary: Technical Leaders
Secondary: Technical Engineers, AI Agents

The problem

Many AI data exposures are decided before anyone argues about geography or retention. They happen because the request simply contains too much information.

The common failure mode is convenience. A developer sends the full customer thread because it is easier than extracting only the relevant facts. A support analyst pastes the raw billing export into a summarization prompt because there is no enforced redaction step. A compliance team chooses a zero-retention provider but still forwards raw personal identifiers that never needed to leave the environment in the first place.

This is why minimization is not the same thing as residency, zero retention, or contract language. Those controls answer what the provider may do after receiving the data. Minimization answers what the provider should receive at all.

There is also a structural problem in many AI stacks: one generic chain handles every endpoint. Embeddings, chat, support triage, and regulated document review all flow through the same policy path. When that happens, the safest route gets watered down for the least sensitive use case, or the strictest route becomes so noisy that teams stop trusting it.

The solution

Keeptrusts gives you three distinct levers that map well to minimization.

Use PII Detector to redact structured identifiers before the request leaves the gateway. This is the fastest way to remove email addresses, card data, or other common identifiers from otherwise valid prompts.
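
To make the redaction step concrete, here is a minimal Python sketch of label-style redaction. It is not how PII Detector is implemented; the `PATTERNS` table, the `redact` helper, and the marker text are hypothetical names chosen for illustration, and only the `EMP-\d{6}` pattern is taken from the example configuration on this page.

```python
import re

# Illustrative patterns only; a real detector covers far more identifier types.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "EMPLOYEE_ID": re.compile(r"EMP-\d{6}"),  # custom pattern from the example config
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each match with a label marker and report which labels fired."""
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED-{label}]", text)
        if count:
            found.append(label)
    return text, found


prompt = "Customer jane.doe@example.com was helped by EMP-123456 about a late invoice."
clean, labels = redact(prompt)
print(clean)   # the prompt with both identifiers replaced by markers
print(labels)  # which identifier types were found
```

The point of the sketch is that the prompt stays usable for the task (a late invoice handled by a staff member) while the identifiers themselves never leave the boundary.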

Use DLP Filter for organization-specific terms and patterns that a generic detector will never know about, such as sealed case numbers, project codenames, or literal export labels.
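
The following Python sketch shows the general idea of a blocking filter that combines a pattern with fuzzy term matching. It is an assumption-laden illustration, not the DLP Filter implementation: `should_block` and `edit_distance` are hypothetical helpers, the word-window comparison is one simple way to approximate fuzzy matching, and the pattern, terms, and distance of 1 mirror the example configuration on this page.

```python
import re

SEALED_CASE = re.compile(r"CASE-[0-9]{4}-SEALED-[0-9]{5}")
BLOCKED_TERMS = ["merger room transcript", "raw customer export"]
MAX_DISTANCE = 1  # tolerate a single-character typo in a blocked term


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def should_block(text: str) -> bool:
    """Block on a sealed case number or a near match of any blocked term."""
    if SEALED_CASE.search(text):
        return True
    words = text.lower().split()
    for term in BLOCKED_TERMS:
        n = len(term.split())
        # Slide a window with the same word count as the term across the text.
        for start in range(len(words) - n + 1):
            window = " ".join(words[start:start + n])
            if edit_distance(window, term) <= MAX_DISTANCE:
                return True
    return False


print(should_block("Please summarize the raw custmer export from Tuesday"))
print(should_block("Summarize the customer's billing question"))
```

In this sketch the misspelled "custmer" still triggers a block, which is the behavior fuzzy matching is meant to provide, while an ordinary billing question passes.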

Use Data Routing Policy when the remaining provider pool must still satisfy sanitization, tokenized-input, or zero-retention requirements. Minimization is stronger when the only eligible providers are the ones that can handle already-sanitized content the way you expect.
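
As a rough mental model of provider filtering, the Python sketch below keeps only targets whose declared guarantees satisfy every requirement and blocks when none remain. The `DataPolicy` and `Target` classes, the default values, and the mapping from requirements to declared fields are assumptions made for illustration; they are not the Data Routing Policy implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPolicy:
    # Hypothetical defaults describing a provider with no special guarantees.
    zero_data_retention: bool = False
    training_opt_out: bool = False
    retention_days: int = 30
    sanitized: bool = False
    accepts_tokenized_input: bool = False


@dataclass(frozen=True)
class Target:
    id: str
    data_policy: DataPolicy


def eligible_targets(targets: list[Target]) -> list[Target]:
    """Keep targets that declare every guarantee the regulated route requires."""
    return [
        t for t in targets
        if t.data_policy.zero_data_retention
        and t.data_policy.training_opt_out
        and t.data_policy.retention_days <= 0
        and t.data_policy.sanitized
        and t.data_policy.accepts_tokenized_input
    ]


def select_target(targets: list[Target]) -> Target:
    """Fail closed: raise instead of falling back to a looser provider."""
    compliant = eligible_targets(targets)
    if not compliant:
        raise PermissionError("no compliant provider: request blocked")
    return compliant[0]


pool = [
    Target("general", DataPolicy()),
    Target("sanitized-zdr", DataPolicy(True, True, 0, True, True)),
]
print(select_target(pool).id)
```

The design choice worth noticing is the failure mode: when the pool is empty, the sketch raises rather than picking the least-bad provider.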

Finally, use Conditional Chains Configuration so the stricter controls run only on paths or headers that actually need them. That avoids turning minimization into an all-or-nothing platform tax.
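
One way to picture conditional chains is as a list of steps, each with an optional condition on the request path or headers. The Python sketch below is a hypothetical model of that idea, assuming exact path matching and case-insensitive header names; the `CHAIN` structure and both helpers are illustrative and do not describe how the gateway evaluates conditions internally.

```python
from typing import Optional

# Each step names a policy and an optional condition (None means always run).
CHAIN = [
    {"policy": "pii-detector", "when": {"path": "/v1/chat/completions"}},
    {"policy": "dlp-filter", "when": {"header": {"X-Data-Class": "regulated"}}},
    {"policy": "data-routing-policy", "when": None},
    {"policy": "audit-logger", "when": None},
]


def matches(when: Optional[dict], path: str, headers: dict) -> bool:
    """Return True when every condition in `when` holds for this request."""
    if when is None:
        return True
    if "path" in when and when["path"] != path:
        return False
    for name, expected in when.get("header", {}).items():
        if headers.get(name.lower()) != expected:
            return False
    return True


def active_policies(path: str, headers: dict) -> list[str]:
    """List the policies that would run for a request, in chain order."""
    normalized = {k.lower(): v for k, v in headers.items()}
    return [s["policy"] for s in CHAIN if matches(s["when"], path, normalized)]


print(active_policies("/v1/chat/completions", {"X-Data-Class": "regulated"}))
print(active_policies("/v1/embeddings", {}))
```

Under these assumptions a regulated chat request runs all four steps, while an embeddings request without the header runs only the unconditional ones.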

Implementation

This example redacts PII on every request to /v1/chat/completions and adds the blocking DLP filter only when a request carries the header X-Data-Class: regulated. Lower-risk routes skip both conditional steps, so the global chain stays simple.

pack:
  name: minimized-regulated-chat
  version: "1.0.0"
  enabled: true

providers:
  targets:
    - id: sanitized-zdr
      provider: openai
      model: gpt-5.4-mini
      secret_key_ref:
        env: OPENAI_MINIMIZED_KEY
      data_policy:
        zero_data_retention: true
        training_opt_out: true
        retention_days: 0
        in_memory_only: true
        sanitized: true
        accepts_tokenized_input: true
        allow_internet_egress: false
        local_only_processing: true

policies:
  chain:
    - pii-detector:
        stage: pre-request
        parallel: true
        when:
          path: "/v1/chat/completions"
    - dlp-filter:
        stage: pre-request
        when:
          header:
            X-Data-Class: regulated
    - data-routing-policy
    - audit-logger

policy:
  pii-detector:
    action: redact
    pci_mode: true
    detect_patterns:
      - 'EMP-\d{6}'
    redaction:
      marker_format: label
      include_metadata: true
      custom_markers:
        generic_id: "[REDACTED-ID]"

  dlp-filter:
    detect_patterns:
      - 'CASE-[0-9]{4}-SEALED-[0-9]{5}'
    blocked_terms:
      - merger room transcript
      - raw customer export
    action: block
    fuzzy_matching: true
    max_distance: 1
    sensitivity_level: restricted

  data-routing-policy:
    require_zero_data_retention: true
    require_no_training: true
    max_retention_days: 0
    sanitize_before_provider: true
    tokenize_sensitive_fields: true
    allow_internet_egress: false
    local_only_processing: true
    on_no_compliant_provider: block
    log_provider_selection: true

  audit-logger: {}

The sequence matters.

pii-detector removes the structured values that should almost never reach the provider in raw form. dlp-filter catches the organization-specific content that a built-in redactor cannot infer. data-routing-policy then narrows the provider pool to targets that explicitly declare they can handle sanitized, tokenized, tightly governed traffic. If no compliant provider remains, the request blocks instead of falling through to a looser route.

That is a better operating model than trying to do minimization with one monolithic rule. Each policy answers a different question: what must be redacted, what must never leave, and which providers are still eligible after minimization.

If you want to make the behavior safer over time, add inline suites from Testing Configuration and run kt policy test --json before deployment. Minimization rules are much easier to trust when you can show concrete examples of content that should redact, block, or pass.

Results and impact

The immediate effect is smaller upstream payloads. That sounds obvious, but it changes several downstream obligations at once.

Privacy teams get fewer raw identifiers reaching third parties. Security teams get fewer opportunities for sensitive internal labels to escape in prompts. Platform teams get a clearer routing contract because only sanitized-capable providers remain eligible for regulated traffic.

Minimization also improves audit posture. A reviewer can see that the system did not merely route traffic to a zero-retention provider. It first removed or blocked data that never needed to go upstream at all.

Key takeaways

Minimization is a request-boundary decision, not just a provider-contract decision.
Use PII Detector for structured identifiers and DLP Filter for organization-specific terms.
Use Conditional Chains Configuration so heavier controls run only where they are needed.
Use Data Routing Policy to make sure compliant providers remain the only eligible targets after sanitization.
Test minimization behavior before rollout instead of assuming the chain will behave the way you intended.
