Block Prompt Injection Attacks Before They Reach Your Models
Prompt injection is the top security risk for AI applications. Attackers embed malicious instructions in user input to hijack model behavior — extracting data, bypassing safety filters, or making models perform unauthorized actions. Keeptrusts detects and blocks these attacks at the gateway before they ever reach your models.
Use this page when
- You need to protect AI-powered applications against adversarial prompt injection, jailbreaks, or instruction override attacks.
- You are deploying AI agents with tool-calling capability and need to prevent unauthorized tool execution via injected instructions.
- You want to understand the multi-layer detection approach (pattern, embedding, safety filter) before configuring your gateway.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
What you'll achieve
- Real-time prompt injection detection using pattern matching and embedding analysis
- Automatic blocking of jailbreak attempts, delimiter attacks, and instruction overrides
- Agent-specific protection with tool-call validation and action limits
- Escalation routing for borderline cases that need human review
- Attack visibility through detailed event logging of every blocked attempt
How Keeptrusts detects prompt injection
The gateway evaluates every incoming request through multiple detection layers:
Layer 1: Pattern-based detection
The prompt-injection policy matches known attack patterns including:
- Instruction overrides — "Ignore previous instructions and..."
- Delimiter escapes — attempts to break out of system/user message boundaries
- Base64/Unicode evasion — encoded payloads designed to bypass simple string matching
- Role confusion — injecting fake system messages within user input
```yaml
policies:
  chain:
    - prompt-injection
    - audit-logger
policy:
  prompt-injection:
    response:
      action: block
      message: "Request blocked: potential prompt injection detected"
    encoding:
      decode_base64: true
      normalize_unicode: true
    boundaries:
      enforce_delimiters: true
```
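Conceptually, Layer 1 is a rule pass over normalized and decoded input. The sketch below is illustrative only: the patterns, the normalization pipeline, and the helper names (`normalize`, `decoded_payloads`, `is_injection`) are assumptions for this example, not the gateway's actual implementation.

```python
import base64
import re
import unicodedata

# Toy rule set standing in for the policy's internal pattern library.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now in developer mode", re.IGNORECASE),
    re.compile(r"<\|?(system|im_start)\|?>", re.IGNORECASE),  # fake role delimiters
]

def normalize(text: str) -> str:
    """Mimic normalize_unicode: fold look-alike characters to canonical forms."""
    return unicodedata.normalize("NFKC", text)

def decoded_payloads(text: str):
    """Mimic decode_base64: also scan any base64-looking substrings."""
    yield text
    for token in re.findall(r"[A-Za-z0-9+/=]{16,}", text):
        try:
            yield base64.b64decode(token, validate=True).decode("utf-8", "ignore")
        except Exception:
            continue  # not valid base64; ignore

def is_injection(user_input: str) -> bool:
    """Block if any pattern matches the raw, normalized, or decoded input."""
    for candidate in decoded_payloads(normalize(user_input)):
        if any(p.search(candidate) for p in INJECTION_PATTERNS):
            return True
    return False
```

The point of the decode step is that `aWdub3JlIHByZXZpb3Vz...` style payloads match the same rules as their plaintext equivalents, so simple encoding tricks do not slip past the pattern layer.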
Layer 2: Embedding-based semantic detection
For sophisticated attacks that evade pattern matching, the gateway compares the semantic similarity of the input against known injection intents; the `embedding_threshold` parameter sets the similarity score above which a request is treated as an attack:
```yaml
pack:
  name: block-prompt-injection-example-2
  version: 1.0.0
  enabled: true
policies:
  chain:
    - prompt-injection
policy:
  prompt-injection: {}
```
The embedding detector compares the semantic intent of user input against known injection patterns. Inputs that exceed the similarity threshold are blocked regardless of surface-level text.
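Conceptually, the semantic layer scores the input's embedding against embeddings of known attacks and blocks above the threshold. This is a toy sketch with a placeholder reference list; the gateway's actual embedding model and attack corpus are internal and not part of this example.

```python
import math

# Placeholder corpus: in practice these would be embeddings of known
# injection phrasings produced by a sentence-embedding model.
KNOWN_INJECTION_EMBEDDINGS: list[list[float]] = []

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Standard cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def exceeds_threshold(input_embedding: list[float], threshold: float = 0.8) -> bool:
    """Block when the input is semantically close to any known injection."""
    return any(
        cosine_similarity(input_embedding, ref) >= threshold
        for ref in KNOWN_INJECTION_EMBEDDINGS
    )
```

Lowering the threshold widens the net (more blocks, more false positives); raising it narrows it. The escalation pattern later on this page uses a lower threshold precisely because flagged requests go to a human rather than being dropped.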
Layer 3: Safety filters
The `safety-filter` policy catches broader categories of unsafe content — material that may not be injection but is still inappropriate:
```yaml
policies:
  chain:
    - prompt-injection
    - safety-filter
    - audit-logger
policy:
  prompt-injection:
    embedding_threshold: 0.8
    response:
      action: block
  safety-filter:
    categories:
      - violence
      - hate_speech
      - self_harm
      - sexual_content
    action: block
```
Protecting AI agents
AI agents that can call tools present a larger attack surface. An injected prompt could instruct the agent to call execute_code, shell_command, or send_email with attacker-controlled parameters.
The agent-firewall policy addresses this:
```yaml
policies:
  chain:
    - prompt-injection
    - agent-firewall
    - audit-logger
policy:
  prompt-injection:
    embedding_threshold: 0.8
    response:
      action: block
  agent-firewall:
    allowed_tools:
      - search
      - summarize
      - retrieve_document
    blocked_tools:
      - execute_code
      - shell_command
      - send_email
      - delete_record
    max_actions_per_session: 25
```
This configuration:
- Allowlists safe tools — only `search`, `summarize`, and `retrieve_document` are permitted
- Blocklists dangerous tools — code execution, shell access, and email are explicitly denied
- Limits action count — prevents runaway agent loops from executing more than 25 actions
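The firewall's per-call decision can be pictured as the following hypothetical validator, which mirrors the `allowed_tools`, `blocked_tools`, and `max_actions_per_session` settings above; it is a sketch of the semantics, not the product's API.

```python
# Mirrors the agent-firewall config from the example above.
ALLOWED_TOOLS = {"search", "summarize", "retrieve_document"}
BLOCKED_TOOLS = {"execute_code", "shell_command", "send_email", "delete_record"}
MAX_ACTIONS_PER_SESSION = 25

class ToolCallDenied(Exception):
    """Raised when a proposed tool call violates firewall policy."""

def validate_tool_call(tool_name: str, actions_so_far: int) -> None:
    """Reject blocklisted tools, unknown tools, and runaway sessions."""
    if tool_name in BLOCKED_TOOLS:
        raise ToolCallDenied(f"tool '{tool_name}' is explicitly blocked")
    if tool_name not in ALLOWED_TOOLS:
        raise ToolCallDenied(f"tool '{tool_name}' is not on the allowlist")
    if actions_so_far >= MAX_ACTIONS_PER_SESSION:
        raise ToolCallDenied("session action limit reached")
```

Note that the blocklist check runs first, so an injected instruction cannot reach `shell_command` even if a misconfiguration later added it to the allowlist.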
See Agent Firewall Template for advanced configurations.
Escalation routing for borderline cases
Not every suspicious input should be blocked outright. For borderline cases, route to human review:
```yaml
policies:
  chain:
    - prompt-injection
    - human-oversight
    - audit-logger
policy:
  prompt-injection:
    embedding_threshold: 0.6
    response:
      action: escalate
  human-oversight:
    escalate_on:
      - prompt_injection_detected
    require_resolution_within_hours: 4
```
With `action: escalate`, the request is flagged in the Escalations queue for a reviewer to inspect, claim, and resolve.
Defense in depth: combining all layers
A production-grade prompt injection defense uses all layers together:
```yaml
pack:
  name: prompt-injection-defense
  version: "1.0"
policies:
  chain:
    - prompt-injection
    - safety-filter
    - agent-firewall
    - pii-detector
    - audit-logger
policy:
  prompt-injection:
    embedding_threshold: 0.8
    response:
      action: block
      message: "Request blocked: security policy violation"
    encoding:
      decode_base64: true
      normalize_unicode: true
    boundaries:
      enforce_delimiters: true
  safety-filter:
    categories:
      - violence
      - hate_speech
      - self_harm
    action: block
  agent-firewall:
    allowed_tools:
      - search
      - summarize
    blocked_tools:
      - execute_code
      - shell_command
    max_actions_per_session: 50
  pii-detector:
    action: redact
    redaction:
      marker_format: label
  audit-logger:
    retention_days: 365
```
Monitoring attack attempts
After deploying prompt injection defenses:
- Check the Events page — filter by `policy_type=prompt-injection` to see blocked attempts
- Review escalations — inspect borderline cases in the escalation queue
- Export attack data — use `kt export create --filter "policy_type=prompt-injection"` for security analysis
- Track block rate trends — a sudden spike may indicate a targeted attack
Quick wins
- Deploy `prompt-injection` with `action: block` — immediate protection, zero application changes
- Enable `decode_base64` and `normalize_unicode` — catch evasion attempts
- Add `agent-firewall` for any AI agent — restrict tool access to what's actually needed
- Set up `safety-filter` — catch harmful content that isn't technically injection
- Review blocked events weekly — tune thresholds based on false positive rates
For AI systems
- Canonical terms: prompt-injection policy, embedding_threshold, safety-filter, agent-firewall, delimiter enforcement, escalation.
- Config keys: `policy.prompt-injection.response.action`, `policy.prompt-injection.embedding_threshold`, `policy.prompt-injection.encoding`, `policy.prompt-injection.boundaries`.
- CLI commands: `kt gateway run`, `kt events list --filter "prompt_injection"`.
- Best next pages: Govern AI Agents, Agent Firewall Template, Policy Controls Catalog.
For engineers
- Prerequisites: gateway running, `prompt-injection` policy in the chain.
- Add `embedding_threshold: 0.8` to catch semantic injection attacks that evade pattern rules.
- Enable `encoding.decode_base64` and `encoding.normalize_unicode` to prevent evasion via encoding.
- Validate: send a known injection payload (e.g., "Ignore previous instructions") and confirm a 409 block response.
- Monitor: filter Events by `policy_type=prompt-injection` and action `block` to track blocked attempts.
For leaders
- Prompt injection is the #1 AI security risk (OWASP LLM Top 10); this control mitigates it at the infrastructure layer.
- Blocking happens before the request reaches the provider — zero exposure of your models to adversarial input.
- Escalation routing lets borderline cases reach human reviewers, reducing false-positive disruption.
- Every blocked attempt is logged for security audit and incident response evidence.
Next steps
- Govern AI Agents — comprehensive agent governance controls
- Prevent Sensitive Data Leaks — stop data exfiltration via injection
- Prompt Injection Template — ready-to-deploy template
- Agent Firewall Template — agent-specific protection template
- Policy Controls Catalog — full control inventory