Spain's AI Regulatory Sandbox: Testing Innovation with Full Governance
Spain's AI regulatory sandbox is important because it rejects a lazy idea that still shows up in too many pilots: the idea that experimentation can happen first and governance can be added later. The Spanish sandbox, established to support preparation for the EU AI Act, is valuable precisely because it turns testing into a controlled exercise. It does not suspend privacy law, remove product accountability, or excuse weak internal governance. Keeptrusts is useful in that environment because it gives teams a way to make their test controls visible at the route level before the system graduates into production.
For organizations building or deploying potentially high-impact AI systems, the sandbox should be treated as a rehearsal for durable compliance. That means evidence, review gates, data-handling controls, and technical documentation need to be part of the pilot itself, not a cleanup task after the results look promising.
Use this page when
- You are planning to participate in or align with Spain's AI sandbox model and want a technical governance pattern that survives beyond the pilot.
- You need to show that testing under supervision still includes privacy, security, and human review controls.
- You want to map Keeptrusts controls to an EU AI Act readiness program instead of a loose experimentation workflow.
Primary audience
- Primary: Product compliance leads, platform engineers, risk managers
- Secondary: Innovation teams, privacy officers, internal audit
The problem
Sandbox projects often fail for a basic reason: the organization treats the pilot like an exception process rather than an evidence-building process.
A project team defines a narrow use case, gets approval to test it, and then optimizes only for model quality or user adoption. Data minimization is handled through training slides instead of enforcement. Human review exists as a procedural promise rather than a technical stop. Logs are captured inconsistently, and no one decides what evidence will matter if the system later needs a conformity assessment, a procurement review, or a regulator conversation.
That is a weak pattern in any jurisdiction, but it is especially weak in Spain because the sandbox sits alongside real legal duties. GDPR still applies. Sector requirements still apply. If the system could become high risk under the EU AI Act, then documentation, human oversight, risk management, and post-market thinking need to start early. The Spanish Agency for the Supervision of Artificial Intelligence (AESIA) can help structure the testing environment, but it does not turn governance into an optional workstream.
There is also a practical transition problem. Pilots rarely stay frozen. Once a business sponsor sees promising results, the pressure to widen the scope arrives quickly. New teams want access, different data sources are added, and the output begins to influence operational decisions. If the pilot route was permissive, the organization ends up trying to retrofit controls during expansion. That is harder than starting with a governed test lane.
The solution
The right sandbox approach is to make the pilot stricter than the eventual low-risk production path, not looser.
Start by defining an explicit route for the test use case. Limit who can use it, what data can enter it, which provider paths are allowed, and what kind of output can leave it. Use prompt-injection to protect the request boundary and data-routing-policy to make provider approval enforceable. Use citation-verifier when answers should stay tied to approved documents or internal knowledge artifacts. Add human-oversight when the pilot output should be reviewed before anyone acts on it. Keep audit-logger in the chain so the policy decision stream records that audit logging is active for the route.
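Before the route is exposed to pilot users, it can help to confirm that none of the five policies named above has been dropped from the chain. The following is a minimal sketch, assuming the chain is available as a plain list of policy names. `REQUIRED_POLICIES` and `missing_policies` are hypothetical helpers for illustration, not part of Keeptrusts.

```python
# Hypothetical pre-deployment check; not a Keeptrusts API.
# The names below are the policies this article recommends for the pilot route.
REQUIRED_POLICIES = [
    "prompt-injection",
    "data-routing-policy",
    "citation-verifier",
    "human-oversight",
    "audit-logger",
]


def missing_policies(chain):
    """Return the required policy names absent from the chain, in required order."""
    present = set(chain)
    return [name for name in REQUIRED_POLICIES if name not in present]


if __name__ == "__main__":
    pilot_chain = ["prompt-injection", "citation-verifier", "audit-logger"]
    # Lists the policies that still need to be added to this chain.
    print(missing_policies(pilot_chain))
```

A check like this fits naturally in a CI step, so that a relaxed chain has to be a deliberate, reviewed change rather than an accident.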
This gives the sandbox two things pilots usually lack. First, it provides a technical boundary that aligns with the documentation narrative. Second, it creates reusable evidence. If the project later moves toward a higher-risk classification or a formal assessment, the organization can show what the pilot controlled, how exceptions were handled, and when human review occurred.
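The evidence described above could be condensed from exported decision records into a short summary for reviewers. The following is a minimal sketch, assuming each record is a dict with `policy`, `action`, and `timestamp` keys. That record shape and the `summarize_decisions` helper are hypothetical and do not describe the actual Keeptrusts log format.

```python
from collections import Counter


def summarize_decisions(records):
    """Count (policy, action) pairs and list when human review was triggered.

    Each record is assumed (hypothetically) to look like:
    {"policy": "human-oversight", "action": "escalate", "timestamp": "..."}
    """
    counts = Counter((r["policy"], r["action"]) for r in records)
    review_timestamps = [
        r["timestamp"]
        for r in records
        if r["policy"] == "human-oversight" and r["action"] == "escalate"
    ]
    return {"counts": dict(counts), "review_timestamps": review_timestamps}
```

The point of the summary is to answer the three questions a later assessment is likely to ask: which controls fired, how often something was blocked, and when a person was brought in.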
A well-governed sandbox also improves product quality. Teams learn early whether the use case still works after redaction, provider restrictions, source verification, and review gating are applied. That is more valuable than a pilot that looks good only because it ignored the controls the real deployment will eventually need.
Implementation
The example below shows a supervised document-assistant route suitable for sandbox testing. It assumes the project should answer from approved internal documents, restrict providers to approved paths, and escalate outputs for review before they are reused in live operations.
pack:
  name: spain-sandbox-governed-pilot
  version: "1.0.0"
  enabled: true
providers:
  targets:
    - id: sandbox-approved-provider
      provider: openai
      model: gpt-5.4-mini-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      data_policy:
        zero_data_retention: true
        training_opt_out: true
        retention_days: 0
        accepts_tokenized_input: true
policies:
  chain:
    - prompt-injection
    - data-routing-policy
    - citation-verifier
    - human-oversight
    - audit-logger
  policy:
    prompt-injection:
      use_embedding: false
      detection:
        attack_patterns:
          - "ignore.*previous.*instructions"
          - "reveal.*system.*prompt"
      encoding:
        decode_base64: true
        normalize_unicode: true
        detect_homoglyphs: true
      boundaries:
        enforce_delimiters: true
        reject_fake_boundaries: true
    data-routing-policy:
      require_zero_data_retention: true
      require_no_training: true
      max_retention_days: 0
      tokenize_sensitive_fields: true
      on_no_compliant_provider: block
      log_provider_selection: true
    citation-verifier:
      require_sources: true
      require_source_match: true
      output_action:
        unverified_action: block
    human-oversight:
      action: escalate
    audit-logger: {}
This route is useful in a sandbox because it creates a realistic governance baseline. The assistant cannot answer from nowhere. It cannot silently fall back to a non-approved provider. It cannot bypass review just because the pilot team is moving fast. If the project later proves to be lower risk than expected, you can relax the chain carefully. If it proves to be higher risk, you already have the evidence discipline and review controls in place.
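To reason about the "no silent fallback" behavior, it can help to model how a provider's declared `data_policy` might be compared with the `data-routing-policy` requirements. The following is a minimal sketch that reuses the field names from the configuration above. The matching rules, including the pairing of `tokenize_sensitive_fields` with `accepts_tokenized_input`, are assumptions made for illustration and are not a description of the gateway's actual evaluation logic. `is_compliant` and `select_provider` are hypothetical helpers.

```python
def is_compliant(data_policy, routing):
    """Check one provider's declared data_policy against routing requirements.

    The pairing of requirement keys to provider keys is an assumption.
    """
    if routing.get("require_zero_data_retention") and not data_policy.get("zero_data_retention"):
        return False
    if routing.get("require_no_training") and not data_policy.get("training_opt_out"):
        return False
    max_days = routing.get("max_retention_days")
    # A provider that declares no retention period is treated as non-compliant.
    if max_days is not None and data_policy.get("retention_days", float("inf")) > max_days:
        return False
    if routing.get("tokenize_sensitive_fields") and not data_policy.get("accepts_tokenized_input"):
        return False
    return True


def select_provider(targets, routing):
    """Return the id of the first compliant target, or None to signal a block."""
    for target in targets:
        if is_compliant(target.get("data_policy", {}), routing):
            return target["id"]
    if routing.get("on_no_compliant_provider") == "block":
        return None
    raise ValueError("no compliant provider and no block behavior configured")
```

Running a model like this against the planned provider list before the pilot starts shows whether every request would be blocked, which is a useful signal that the provider declarations and the routing requirements disagree.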
The most relevant support pages are EU AI Act, Pass Compliance Audits, Configuration & Policy Overview, Human Oversight, and Export Evidence for a Review. Those pages help teams connect pilot design to the broader governance work that the sandbox is meant to accelerate.
Results and impact
The main benefit is that the sandbox becomes a proving ground for governance, not just model performance. That changes the internal conversation. Product teams learn whether the use case still works once citations, provider restrictions, and review gates are applied. Compliance teams get earlier visibility into operational behavior. Internal audit gets a clearer story about how the pilot was controlled and what evidence exists.
It also makes transition decisions easier. If the pilot is not ready to move forward, the reason is visible. If it is ready, the organization is not starting its evidence collection and route-hardening work from zero. The sandbox has already done part of that job.
Key takeaways
- Spain's AI sandbox should be used as a controlled governance exercise, not a temporary exemption from normal controls.
- Pilots should be stricter than future low-risk production lanes, because the pilot must generate evidence and reveal governance weaknesses early.
- citation-verifier, human-oversight, and data-routing-policy are especially useful in supervised testing environments.
- A sandbox route is more valuable when it proves that the use case still works after controls are applied.
- Keeptrusts can make the pilot's control story concrete, but organizations still need product documentation, risk assessment, and privacy review outside the gateway.
Next steps
- Review EU AI Act to understand how sandbox learning should feed into later assessment work.
- Use Pass Compliance Audits to define the evidence you want the pilot to generate.
- Build the route by following Configuration & Policy Overview.
- Add a review stop with Human Oversight where the pilot output should never bypass a person.
- Prepare the supporting artifact trail with Export Evidence for a Review.