Data Residency Enforcement: Keeping AI Data in the Right Geography

Data residency failures usually happen long before a legal review. They happen when a sensitive workload is allowed to route to whichever provider target is healthy, cheap, or already configured. Keeptrusts fixes that at the routing layer by filtering provider targets against declared handling metadata before a model call is made.

Use this page when

You need AI traffic to stay inside a defined geography, processing boundary, or provider assurance tier.
You are trying to move from contractual promises about residency to runtime enforcement.
You need a clear operating model for combining data minimization with provider selection.

Primary audience

Primary: Technical Leaders
Secondary: Technical Engineers, AI Agents

The problem

Many AI programs talk about data residency as if it were a vendor checkbox. In production, it is a routing problem.

If your gateway has multiple provider targets and only one of them meets strict locality or retention requirements, the traffic has to be filtered before normal model selection occurs. Otherwise, fallback logic, manual configuration drift, or cost pressure will eventually send the wrong request to the wrong place.

There is a second problem that security teams sometimes overlook: data residency is not the same as data minimization. You can keep traffic in the right geography and still expose raw PII, payment data, or confidential project terms. That is why residency controls cannot stand alone. They must sit next to content controls such as PII Detector and DLP Filter, which reduce what is sent before routing happens.

There is also a subtle but important implementation detail in Keeptrusts: Data Routing Policy does not inspect prompt text or infer geography from provider marketing. It evaluates the data_policy metadata that you declare on each providers.targets[] entry and removes non-compliant targets before normal routing continues. If the metadata is wrong or incomplete, your residency posture is wrong too.

The solution

Treat data residency as a provider selection contract expressed in policy.

In Keeptrusts, that means each provider target declares handling properties such as zero retention, no-training guarantees, retention window, in-memory processing, support for tokenized inputs, internet egress, and local-only processing. The data-routing-policy then enforces the subset you require for a particular workload.

This gives you a clean division of responsibility.

Content policies answer: "What in this prompt must be removed or blocked?"

Routing policy answers: "Given the remaining provider targets, which ones are even eligible to receive this request?"

That division matters because it lets you implement strict geography and strict minimization at the same time. The first reduces where traffic can go. The second reduces what goes there.
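
The division of responsibility can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not the Keeptrusts implementation: the two regular expressions are simplistic stand-ins for DLP Filter and PII Detector (the AKIA pattern is borrowed from the dlp-filter configuration on this page), and the routing step checks a single local_only_processing flag instead of the full data_policy contract. The names handle and Blocked are hypothetical. What it shows is the shape: content steps rewrite or reject the prompt, while the routing step receives only target metadata and never sees the prompt.

```python
import re

# Toy stand-ins for content controls; NOT the Keeptrusts detectors.
AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")       # pattern borrowed from the dlp-filter example
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # deliberately simplistic email matcher


class Blocked(Exception):
    """Raised when a content control rejects the request outright."""


def dlp_filter(prompt: str) -> str:
    # "What must be blocked?": reject the whole request on a credential match.
    if AWS_KEY.search(prompt):
        raise Blocked("dlp-filter: credential pattern detected")
    return prompt


def pii_detector(prompt: str) -> str:
    # "What must be removed?": replace matches with a label-style marker.
    return EMAIL.sub("[EMAIL]", prompt)


def data_routing_policy(targets: list) -> list:
    # "Which targets are eligible?": reads target metadata only, never the prompt.
    # Hypothetical single-flag check; the real policy evaluates the whole data_policy block.
    return [t for t in targets if t["data_policy"]["local_only_processing"]]


def handle(prompt: str, targets: list):
    # Content controls run first and reduce WHAT is sent...
    for step in (dlp_filter, pii_detector):
        prompt = step(prompt)
    # ...then routing reduces WHERE it can go.
    return prompt, data_routing_policy(targets)


targets = [
    {"id": "eu-local-zdr", "data_policy": {"local_only_processing": True}},
    {"id": "global-standard", "data_policy": {"local_only_processing": False}},
]
prompt, eligible = handle("Summarize the thread with jane@example.com", targets)
print(prompt)                       # Summarize the thread with [EMAIL]
print([t["id"] for t in eligible])  # ['eu-local-zdr']
```

Because data_routing_policy never takes the prompt as input, redaction upstream cannot make an ineligible target eligible, and no choice of target can undo a leak that the content steps missed.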

Implementation

Use two or more provider targets with explicit data_policy metadata, then let data-routing-policy enforce the path.

pack:
  name: eu-residency-enforcement
  version: "1.0.0"
  enabled: true

providers:
  targets:
    - id: eu-local-zdr
      provider: openai
      model: gpt-5.4-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      data_policy:
        zero_data_retention: true
        training_opt_out: true
        retention_days: 0
        in_memory_only: true
        sanitized: true
        accepts_tokenized_input: true
        allow_internet_egress: false
        local_only_processing: true

    - id: global-standard
      provider: openai
      model: gpt-5.4-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      data_policy:
        zero_data_retention: false
        training_opt_out: true
        retention_days: 30
        in_memory_only: false
        sanitized: false
        accepts_tokenized_input: false
        allow_internet_egress: true
        local_only_processing: false

policies:
  chain:
    - dlp-filter
    - pii-detector
    - data-routing-policy

policy:
  dlp-filter:
    detect_patterns:
      - 'CASE-[0-9]{4}-SEALED-[0-9]{5}'
      - 'AKIA[0-9A-Z]{16}'
    blocked_terms:
      - export controlled memo
      - internal acquisition room
    action: block
    fuzzy_matching: true
    max_distance: 1
    sensitivity_level: restricted

  pii-detector:
    action: redact
    pci_mode: true
    redaction:
      marker_format: label
      include_metadata: true

  data-routing-policy:
    require_zero_data_retention: true
    require_no_training: true
    max_retention_days: 0
    require_in_memory_only: true
    sanitize_before_provider: true
    tokenize_sensitive_fields: true
    allow_internet_egress: false
    local_only_processing: true
    on_no_compliant_provider: block
    log_provider_selection: true

The point of this configuration is that global-standard stays configured but is never eligible for governed traffic: it fails the zero-retention, retention-window, in-memory, egress, and local-only requirements. Enforcing that in the gateway is stronger than relying on convention or manual runbook discipline.
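
The filtering step for the two targets above can be modeled in a few lines. This is a hedged sketch, not Keeptrusts code: it assumes that each data-routing-policy requirement maps to the data_policy field with the same meaning (in particular, that sanitize_before_provider checks sanitized and tokenize_sensitive_fields checks accepts_tokenized_input), and it treats a missing field as non-compliant. The helper name exclusion_reasons is hypothetical.

```python
# Hypothetical model of pre-routing filtering; values copied from the example pack.
REQUIREMENTS = {
    "require_zero_data_retention": True,
    "require_no_training": True,
    "max_retention_days": 0,
    "require_in_memory_only": True,
    "sanitize_before_provider": True,
    "tokenize_sensitive_fields": True,
    "allow_internet_egress": False,
    "local_only_processing": True,
}

TARGETS = {
    "eu-local-zdr": {
        "zero_data_retention": True, "training_opt_out": True, "retention_days": 0,
        "in_memory_only": True, "sanitized": True, "accepts_tokenized_input": True,
        "allow_internet_egress": False, "local_only_processing": True,
    },
    "global-standard": {
        "zero_data_retention": False, "training_opt_out": True, "retention_days": 30,
        "in_memory_only": False, "sanitized": False, "accepts_tokenized_input": False,
        "allow_internet_egress": True, "local_only_processing": False,
    },
}


def exclusion_reasons(dp: dict, req: dict) -> list:
    """Return the data_policy fields that disqualify a target (empty list = eligible)."""
    reasons = []
    # Boolean guarantees: a required guarantee must be declared true; missing counts as false.
    flag_checks = [
        ("require_zero_data_retention", "zero_data_retention"),
        ("require_no_training", "training_opt_out"),
        ("require_in_memory_only", "in_memory_only"),
        ("sanitize_before_provider", "sanitized"),                 # assumed mapping
        ("tokenize_sensitive_fields", "accepts_tokenized_input"),  # assumed mapping
        ("local_only_processing", "local_only_processing"),
    ]
    for req_key, dp_key in flag_checks:
        if req.get(req_key) and not dp.get(dp_key, False):
            reasons.append(dp_key)
    # Retention window: an undeclared window is treated as unbounded.
    if dp.get("retention_days", float("inf")) > req.get("max_retention_days", float("inf")):
        reasons.append("retention_days")
    # Egress: if the policy forbids it, the target must declare it disabled.
    if not req.get("allow_internet_egress", True) and dp.get("allow_internet_egress", True):
        reasons.append("allow_internet_egress")
    return reasons


for target_id, data_policy in TARGETS.items():
    reasons = exclusion_reasons(data_policy, REQUIREMENTS)
    print(target_id, "eligible" if not reasons else "excluded: " + ", ".join(reasons))
```

In this model global-standard passes only the training opt-out check, so it is excluded with seven reasons while eu-local-zdr remains eligible. The model also makes the metadata caveat concrete: the function reads declarations and nothing else, so a target that wrongly declared zero_data_retention: true would pass that check.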

A safe rollout pattern is to begin with on_no_compliant_provider: warn while you confirm the data_policy declarations on each target. Once logs show that the right targets are being excluded for the right reasons, move to block. The Data Routing Policy page documents that behavior explicitly.

Validate the config before rollout:

kt policy lint --file policy-config.yaml
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml

Then inspect the gateway logs for excluded targets and their exclusion reasons. If no target remains after filtering and on_no_compliant_provider is set to block, Keeptrusts should return HTTP 403 with code: no_compliant_provider. That is the right outcome: a refused request is better than silent spillover into a non-compliant provider.
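
The two on_no_compliant_provider modes can be sketched as a small decision function. The block branch follows the behavior described above (HTTP 403 with code no_compliant_provider). The warn branch is an assumption made for illustration: it supposes that warn records the gap and lets normal routing continue over the unfiltered targets, which is what makes it useful during rollout and unsuitable for enforcement. Check the Data Routing Policy page for the actual semantics. The Decision type and decide function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Decision:
    candidates: List[str]              # targets handed to normal routing
    http_status: Optional[int] = None  # set only when the gateway rejects the request
    code: Optional[str] = None
    warnings: List[str] = field(default_factory=list)


def decide(all_targets: List[str], eligible: List[str], on_no_compliant_provider: str) -> Decision:
    if eligible:
        # At least one compliant target survived filtering: route normally among them.
        return Decision(candidates=list(eligible))
    if on_no_compliant_provider == "block":
        # Enforcement outcome described on this page: refuse rather than spill over.
        return Decision(candidates=[], http_status=403, code="no_compliant_provider")
    if on_no_compliant_provider == "warn":
        # ASSUMPTION: warn logs the gap and continues with the unfiltered set.
        return Decision(candidates=list(all_targets), warnings=["no_compliant_provider"])
    raise ValueError(f"unknown on_no_compliant_provider: {on_no_compliant_provider!r}")


print(decide(["global-standard"], [], "warn"))
# Decision(candidates=['global-standard'], http_status=None, code=None, warnings=['no_compliant_provider'])
print(decide(["global-standard"], [], "block"))
# Decision(candidates=[], http_status=403, code='no_compliant_provider', warnings=[])
```

Under this assumption, the warn branch shows why it belongs only in rollout: the request still reaches a non-compliant target, and only the log records that it should not have.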

This is also where budget discipline matters. Region-locked or zero-retention targets can be more expensive and fewer in number. Tie them to the correct wallet scope and keep the funding model explicit. In this context Spend & Wallets is not just a finance page; it is part of the enforcement story, because when the eligible balance is missing the request is held rather than sent down a cheaper path.
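
The interaction between wallets and eligibility can be sketched as follows. Everything here is hypothetical: the Target fields, the per-call cost in cents, and the single wallet balance are illustrative assumptions, not the Spend & Wallets data model. The point it illustrates is ordering: cost is compared only among targets that already passed data-routing-policy, so an underfunded wallet produces a hold, never a cheaper ineligible route.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Target:
    id: str
    eligible: bool   # outcome of the data-routing-policy filter
    cost_cents: int  # hypothetical per-call price


def route_with_wallet(targets: List[Target], balance_cents: int) -> Tuple[str, Optional[str]]:
    # Only compliant targets are candidates; cheapest first among them.
    candidates = sorted((t for t in targets if t.eligible), key=lambda t: t.cost_cents)
    for t in candidates:
        if balance_cents >= t.cost_cents:
            return ("send", t.id)
    # No funded compliant target: hold the request instead of downgrading the route.
    return ("held", None)


targets = [Target("eu-local-zdr", True, 5), Target("global-standard", False, 1)]
print(route_with_wallet(targets, 100))  # ('send', 'eu-local-zdr')
print(route_with_wallet(targets, 2))    # ('held', None)
```

With a balance of 2 cents, global-standard would be affordable, but it is never considered because it was removed before cost entered the decision.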

Finally, do not confuse this control with prompt sanitization. data-routing-policy is a provider filter, not a content scanner. If the prompt contains PANs, email addresses, or internal secrets, residency alone does not fix the leak. Use Prevent Sensitive Data Leaks in AI Requests as the content-side companion pattern.

Results and impact

The immediate benefit is predictable routing. Teams stop assuming that high-risk traffic is "probably" using the right provider. The gateway either has an eligible target after filtering or it does not.

The second benefit is tighter control over fallback behavior. Fallback is useful for availability, but it can undermine residency unless every backup target is equally governed. Pre-routing provider filtering closes that gap by shrinking the candidate set before availability logic runs.
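
The ordering argument can be shown with a minimal sketch, assuming a priority-ordered fallback list and set-based eligibility and health checks (both hypothetical simplifications; the eu-local-backup target is invented for the example). Filtering happens first, so the fallback loop cannot reach a target that failed the data policy, even when that target is the only healthy one.

```python
from typing import List, Optional, Set


def pick_target(priority: List[str], eligible: Set[str], healthy: Set[str]) -> Optional[str]:
    # Step 1: pre-routing filter shrinks the candidate set.
    candidates = [t for t in priority if t in eligible]
    # Step 2: availability fallback walks only the filtered list, in priority order.
    for t in candidates:
        if t in healthy:
            return t
    # Nothing compliant is available; the on_no_compliant_provider setting decides what happens next.
    return None


priority = ["eu-local-zdr", "eu-local-backup", "global-standard"]
eligible = {"eu-local-zdr", "eu-local-backup"}

print(pick_target(priority, eligible, {"eu-local-backup", "global-standard"}))  # eu-local-backup
print(pick_target(priority, eligible, {"global-standard"}))                     # None
```

A fallback loop that ran over the unfiltered list would return global-standard in the second case, which is exactly the spillover the policy exists to prevent.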

The third benefit is better separation of responsibilities. Privacy, legal, and security teams can review declared provider handling metadata. Platform teams can validate that runtime routing honors it. Application teams do not need to embed region logic in every client.

Key takeaways

Data residency in AI is a routing problem, not just a contract problem.
Data Routing Policy enforces declared provider metadata before normal routing continues.
Use PII Detector and DLP Filter alongside residency controls because the right geography does not sanitize the wrong content.
Start with warn when metadata is incomplete, then move to block once the provider inventory is trustworthy.
Use Spend & Wallets to prevent budget pressure from undermining the compliant route.
