Spain's AI Regulatory Sandbox: Testing Innovation with Full Governance

Spain's AI regulatory sandbox matters because it rejects an idea that still shows up in too many pilots: that experimentation can happen first and governance can be added later. The Spanish sandbox, established to support preparation for the EU AI Act, is valuable precisely because it turns testing into a controlled exercise. It does not suspend privacy law, remove product accountability, or excuse weak internal governance. Keeptrusts is useful in that environment because it lets teams make their test controls visible at the route level before the system graduates into production.

For organizations building or deploying potentially high-impact AI systems, the sandbox should be treated as a rehearsal for durable compliance. That means evidence, review gates, data-handling controls, and technical documentation need to be part of the pilot itself, not a cleanup task after the results look promising.

Use this page when

You are planning to participate in or align with Spain's AI sandbox model and want a technical governance pattern that survives beyond the pilot.
You need to show that testing under supervision still includes privacy, security, and human review controls.
You want to map Keeptrusts controls to an EU AI Act readiness program instead of a loose experimentation workflow.

Primary audience

Primary: Product compliance leads, platform engineers, risk managers
Secondary: Innovation teams, privacy officers, internal audit

The problem

Sandbox projects often fail for a basic reason: the organization treats the pilot like an exception process rather than an evidence-building process.

A project team defines a narrow use case, gets approval to test it, and then optimizes only for model quality or user adoption. Data minimization is handled through training slides instead of enforcement. Human review exists as a procedural promise rather than a technical stop. Logs are captured inconsistently, and no one decides what evidence will matter if the system later needs a conformity assessment, a procurement review, or a regulator conversation.

That is a weak pattern in any jurisdiction, but it is especially weak in Spain because the sandbox sits alongside real legal duties. GDPR still applies. Sector requirements still apply. If the system could become high risk under the EU AI Act, then documentation, human oversight, risk management, and post-market thinking need to start early. The Spanish Agency for the Supervision of Artificial Intelligence (AESIA) can help structure the testing environment, but it does not turn governance into an optional workstream.

There is also a practical transition problem. Pilots rarely stay frozen. Once a business sponsor sees promising results, the pressure to widen the scope arrives quickly. New teams want access, different data sources are added, and the output begins to influence operational decisions. If the pilot route was permissive, the organization ends up trying to retrofit controls during expansion. That is harder than starting with a governed test lane.

The solution

The right sandbox approach is to make the pilot stricter than the eventual low-risk production path, not looser.

Start by defining an explicit route for the test use case. Limit who can use it, what data can enter it, which provider paths are allowed, and what kind of output can leave it. Use the prompt-injection policy to protect the request boundary and data-routing-policy to make provider approval enforceable. Use citation-verifier when answers should stay tied to approved documents or internal knowledge artifacts. Add human-oversight when the pilot output should be reviewed before anyone acts on it. Keep audit-logger in the chain so the policy decision stream records that audit logging is active for the route.
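
One way to keep a pilot route from drifting away from this list is a small pre-flight check in the team's own tooling. The sketch below is illustrative only: the function name and the required list are assumptions made for this example, not part of any Keeptrusts API. It simply compares a route's chain against the five policy names mentioned above.

```python
# Hypothetical pre-flight check for a sandbox route; not a Keeptrusts API.
# The required list mirrors the policies named in the paragraph above.
REQUIRED_SANDBOX_POLICIES = [
    "prompt-injection",
    "data-routing-policy",
    "citation-verifier",
    "human-oversight",
    "audit-logger",
]


def missing_policies(chain, required=REQUIRED_SANDBOX_POLICIES):
    """Return required policy names absent from the chain, in required order."""
    present = set(chain)
    return [name for name in required if name not in present]


# Example: a pilot chain that was trimmed for speed and lost two controls.
pilot_chain = ["prompt-injection", "data-routing-policy", "audit-logger"]
gaps = missing_policies(pilot_chain)
if gaps:
    print("Pilot route is missing controls:", ", ".join(gaps))
```

A check like this could run in CI before a route configuration is deployed, so that removing a control from the pilot becomes a visible decision rather than a silent edit.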

This gives the sandbox two things pilots usually lack. First, it provides a technical boundary that aligns with the documentation narrative. Second, it creates reusable evidence. If the project later moves toward a higher-risk classification or a formal assessment, the organization can show what the pilot controlled, how exceptions were handled, and when human review occurred.
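
To make "reusable evidence" concrete, the sketch below summarizes a list of policy decision records into counts per policy and action, plus the timestamps at which human review was triggered. The record shape (`timestamp`, `policy`, `action`) is a hypothetical format chosen for this example; a real export may look different and would need to be mapped first.

```python
from collections import Counter


def summarize_decisions(records):
    """Summarize decision records (hypothetical format) for an evidence pack.

    Each record is assumed to be a dict with 'timestamp', 'policy', and
    'action' keys. This is an illustrative shape, not a documented schema.
    """
    counts = Counter((r["policy"], r["action"]) for r in records)
    escalated_at = [
        r["timestamp"]
        for r in records
        if r["policy"] == "human-oversight" and r["action"] == "escalate"
    ]
    return {"counts": dict(counts), "escalated_at": escalated_at}


# Example records, invented for illustration only.
sample_records = [
    {"timestamp": "2025-01-10T09:00:00Z", "policy": "citation-verifier", "action": "block"},
    {"timestamp": "2025-01-10T09:05:00Z", "policy": "human-oversight", "action": "escalate"},
    {"timestamp": "2025-01-10T09:20:00Z", "policy": "human-oversight", "action": "escalate"},
]
summary = summarize_decisions(sample_records)
print(summary["escalated_at"])
```

The point of a summary like this is that it answers the three questions above directly: what was controlled, how exceptions were handled, and when review occurred.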

A well-governed sandbox also improves product quality. Teams learn early whether the use case still works after redaction, provider restrictions, source verification, and review gating are applied. That is more valuable than a pilot that looks good only because it ignored the controls the real deployment will eventually need.

Implementation

The example below shows a supervised document-assistant route suitable for sandbox testing. It assumes the project should answer from approved internal documents, restrict providers to approved paths, and escalate outputs for review before they are reused in live operations.

pack:
  name: spain-sandbox-governed-pilot
  version: "1.0.0"
  enabled: true

providers:
  targets:
    - id: sandbox-approved-provider
      provider: openai
      model: gpt-5.4-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      data_policy:
        zero_data_retention: true
        training_opt_out: true
        retention_days: 0
        accepts_tokenized_input: true

policies:
  chain:
    - prompt-injection
    - data-routing-policy
    - citation-verifier
    - human-oversight
    - audit-logger

policy:
  prompt-injection:
    use_embedding: false
    detection:
      attack_patterns:
        - "ignore.*previous.*instructions"
        - "reveal.*system.*prompt"
    encoding:
      decode_base64: true
      normalize_unicode: true
      detect_homoglyphs: true
    boundaries:
      enforce_delimiters: true
      reject_fake_boundaries: true

  data-routing-policy:
    require_zero_data_retention: true
    require_no_training: true
    max_retention_days: 0
    tokenize_sensitive_fields: true
    on_no_compliant_provider: block
    log_provider_selection: true

  citation-verifier:
    require_sources: true
    require_source_match: true
    output_action:
      unverified_action: block

  human-oversight:
    action: escalate

  audit-logger: {}

This route is useful in a sandbox because it creates a realistic governance baseline. With citation-verifier set to block unverified output, the assistant cannot return an answer that is not tied to a source. With on_no_compliant_provider set to block, it cannot silently fall back to a non-approved provider. With human-oversight set to escalate, it cannot bypass review just because the pilot team is moving fast. If the project later proves to be lower risk than expected, you can relax the chain carefully. If it proves to be higher risk, you already have the evidence discipline and review controls in place.
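
The provider restriction can be reasoned about as a simple matching rule between each provider's declared data_policy and the data-routing-policy requirements. The sketch below is an assumption about how such matching might work, written to illustrate the logic rather than to describe the gateway's actual implementation. In particular, pairing tokenize_sensitive_fields with accepts_tokenized_input is a guess made for this example, and the second provider entry is hypothetical.

```python
def provider_is_compliant(data_policy, routing_policy):
    """Check a provider's declared data_policy against routing requirements.

    Illustrative matching rules only; the real gateway logic may differ.
    """
    if routing_policy.get("require_zero_data_retention") and not data_policy.get("zero_data_retention", False):
        return False
    if routing_policy.get("require_no_training") and not data_policy.get("training_opt_out", False):
        return False
    max_days = routing_policy.get("max_retention_days")
    if max_days is not None and data_policy.get("retention_days", float("inf")) > max_days:
        return False
    # Assumed pairing: tokenized routing needs a provider that accepts tokenized input.
    if routing_policy.get("tokenize_sensitive_fields") and not data_policy.get("accepts_tokenized_input", False):
        return False
    return True


def select_provider(targets, routing_policy):
    """Return the id of the first compliant target, or None when blocking."""
    for target in targets:
        if provider_is_compliant(target.get("data_policy", {}), routing_policy):
            return target["id"]
    if routing_policy.get("on_no_compliant_provider") == "block":
        return None  # The request is blocked instead of falling back.
    raise ValueError("no compliant provider and no block action configured")


routing = {
    "require_zero_data_retention": True,
    "require_no_training": True,
    "max_retention_days": 0,
    "tokenize_sensitive_fields": True,
    "on_no_compliant_provider": "block",
}
approved = {
    "id": "sandbox-approved-provider",
    "data_policy": {
        "zero_data_retention": True,
        "training_opt_out": True,
        "retention_days": 0,
        "accepts_tokenized_input": True,
    },
}
# Hypothetical non-compliant provider, included only to show the block path.
unapproved = {
    "id": "unapproved-provider",
    "data_policy": {"zero_data_retention": False, "training_opt_out": True, "retention_days": 30},
}

print(select_provider([unapproved, approved], routing))
print(select_provider([unapproved], routing))
```

With only the non-compliant provider available, the function returns None, which models the "block rather than fall back" behavior the route is configured for.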

The most relevant support pages are EU AI Act, Pass Compliance Audits, Configuration & Policy Overview, Human Oversight, and Export Evidence for a Review. Those pages help teams connect pilot design to the broader governance work that the sandbox is meant to accelerate.

Results and impact

The main benefit is that the sandbox becomes a proving ground for governance, not just model performance. That changes the internal conversation. Product teams learn whether the use case still works once citations, provider restrictions, and review gates are applied. Compliance teams get earlier visibility into operational behavior. Internal audit gets a clearer story about how the pilot was controlled and what evidence exists.

It also makes transition decisions easier. If the pilot is not ready to move forward, the reason is visible. If it is ready, the organization is not starting its evidence collection and route-hardening work from zero. The sandbox has already done part of that job.

Key takeaways

Spain's AI sandbox should be used as a controlled governance exercise, not a temporary exemption from normal controls.
Pilots should be stricter than future low-risk production lanes, because the pilot must generate evidence and reveal governance weaknesses early.
citation-verifier, human-oversight, and data-routing-policy are especially useful in supervised testing environments.
A sandbox route is more valuable when it proves that the use case still works after controls are applied.
Keeptrusts can make the pilot's control story concrete, but organizations still need product documentation, risk assessment, and privacy review outside the gateway.

Next steps

Review EU AI Act to understand how sandbox learning should feed into later assessment work.
Use Pass Compliance Audits to define the evidence you want the pilot to generate.
Build the route using the guidance in Configuration & Policy Overview.
Add a review stop with Human Oversight where the pilot output should never bypass a person.
Prepare the supporting artifact trail with Export Evidence for a Review.
