OWASP Top 10 for LLM: Governance Mitigations for Each Risk

The most useful way to read OWASP guidance for LLM systems is not as a poster. It is as a control-design prompt. Every risk category asks the same question in a different form: what can go wrong at runtime, and what boundary will stop it before the system does something expensive, unsafe, or non-compliant? Keeptrusts is effective in that conversation because it gives teams enforceable controls at the request boundary, the tool boundary, the output boundary, and the evidence boundary.

Use this page when

  • You want to translate common OWASP-for-LLM risk categories into concrete governance controls.
  • You need one control map that engineers can deploy and security teams can review.
  • You want to avoid “one policy fixes everything” thinking and instead build a layered chain.

Primary audience

  • Primary: Security engineers and platform engineers
  • Secondary: Technical leaders evaluating AI risk treatment plans

The problem

OWASP-style LLM risks rarely arrive one at a time. A prompt injection attempt can also be a data-leak attempt. Tool misuse can be preceded by undeclared tools, then followed by dangerous arguments, then hidden under repeated low-variance traffic from the same script. If you answer each risk with a different spreadsheet and no runtime chain, you end up with documentation coverage instead of risk coverage.

The second problem is overconfidence in generic safeguards. Teams often say they have moderation, logging, or human review, but that still leaves large holes. Moderation does not validate declared tools. Logging does not keep a dangerous tool call from executing. Human review is useless if the system has already sent sensitive data to the wrong provider. OWASP is most valuable when it forces you to separate prevention, containment, and evidence.

The solution

Map each major LLM risk to the closest Keeptrusts control surface, then chain those controls in the order they can actually interrupt the failure.

For prompt-driven attacks, use Prompt Injection Detection and the operational guidance in Block Prompt Injection Attacks Before They Reach Your Models. That covers adversarial instructions, fake boundaries, encoded payloads, and embedding-confirmed similarity checks.
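
To make the encoded-payload and fake-boundary checks concrete, here is a minimal Python sketch. It is not the Keeptrusts implementation: the phrase list, the boundary regex, and the function names are hypothetical, and it omits the embedding similarity step entirely. It only illustrates why a detector should normalize Unicode and decode base64 before matching.

```python
import base64
import binascii
import re
import unicodedata

# Hypothetical phrase list for illustration; a real detector uses far more signals.
SUSPICIOUS_PHRASES = ("ignore previous instructions", "disregard the system prompt")
# Hypothetical pattern for user text that imitates a trusted boundary marker.
FAKE_BOUNDARY = re.compile(
    r"^\s*(-{3,}|#{3,})\s*(system|end of user input)\b", re.I | re.M
)
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def candidate_texts(prompt: str) -> list[str]:
    """Return the normalized prompt plus any base64 runs that decode to UTF-8 text."""
    # NFKC folds compatibility forms (for example fullwidth letters) to plain ones.
    # It does not catch cross-script homoglyphs; that needs a separate mapping.
    texts = [unicodedata.normalize("NFKC", prompt)]
    for run in BASE64_RUN.findall(prompt):
        try:
            decoded = base64.b64decode(run, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64 text, so nothing extra to inspect
        texts.append(unicodedata.normalize("NFKC", decoded))
    return texts


def looks_like_injection(prompt: str) -> bool:
    """True if any view of the prompt contains a known phrase or a fake boundary."""
    for text in candidate_texts(prompt):
        lowered = text.lower()
        if any(phrase in lowered for phrase in SUSPICIOUS_PHRASES):
            return True
        if FAKE_BOUNDARY.search(text):
            return True
    return False
```

The design point is that every decoded or normalized view of the prompt goes through the same checks, so an attacker gains nothing by wrapping a known payload in an encoding.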

For sensitive-information disclosure, use DLP Filter, PII Detector, and Data Routing Policy. Those three answer different questions: should this content be blocked, should structured identifiers be redacted, and is any configured provider allowed to receive what remains of the request at all?
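
The three questions can be sketched as three sequential checks. This is an illustrative Python sketch under stated assumptions, not Keeptrusts code: the `Provider` dataclass, the `handle_request` function, and the email-only PII regex are hypothetical. The two secret patterns are the ones used in the baseline configuration on this page.

```python
import re
from dataclasses import dataclass

# Secret patterns copied from the baseline config on this page.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"ghp_[0-9A-Za-z]{36}"),
]
# Hypothetical, deliberately simple PII pattern (email addresses only).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Provider:
    id: str
    zero_data_retention: bool
    training_opt_out: bool
    retention_days: int


def handle_request(text: str, providers: list[Provider]) -> tuple[str, str]:
    """Return (verdict, text); verdict is 'blocked' or 'allowed'."""
    # 1. DLP: a secret blocks the whole request.
    if any(pattern.search(text) for pattern in SECRET_PATTERNS):
        return "blocked", ""
    # 2. PII: structured identifiers are redacted and the request continues.
    redacted = EMAIL.sub("[REDACTED_EMAIL]", text)
    # 3. Routing: at least one provider must satisfy the data-handling rules.
    compliant = [
        p for p in providers
        if p.zero_data_retention and p.training_opt_out and p.retention_days == 0
    ]
    if not compliant:
        return "blocked", ""
    return "allowed", redacted
```

Note that redaction alone never makes a request routable: a clean, fully redacted prompt is still blocked when no provider meets the retention and training constraints.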

For abuse, automation, and denial-of-wallet-style traffic, use Bot Detector and the rate-limiting guidance in Policy Rate Limits Configuration or Advanced Rate Limiting. Bot detection looks for duplicate fingerprints or highly similar prompts inside a rolling window. Rate limits then cap what still gets through.
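
A rolling-window duplicate check can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `BotWindow` class, the SHA-256 fingerprint, the use of `difflib.SequenceMatcher` as the similarity measure, and the interpretation of `max_requests_per_window` as a cap on look-alike prompts are not taken from Keeptrusts.

```python
import hashlib
import time
from collections import defaultdict, deque
from difflib import SequenceMatcher


class BotWindow:
    """Flag a client that repeats identical or near-identical prompts in a window."""

    def __init__(self, window_seconds=60.0, similarity_threshold=0.9,
                 max_requests_per_window=5):
        self.window_seconds = window_seconds
        self.similarity_threshold = similarity_threshold
        self.max_requests = max_requests_per_window
        # client_id -> deque of (timestamp, fingerprint, prompt)
        self._history = defaultdict(deque)

    def check(self, client_id, prompt, now=None):
        now = time.monotonic() if now is None else now
        history = self._history[client_id]
        # Drop entries that have aged out of the rolling window.
        while history and now - history[0][0] > self.window_seconds:
            history.popleft()
        fingerprint = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
        # Count earlier prompts that are exact duplicates or close look-alikes.
        similar = sum(
            1 for _, old_fp, old_prompt in history
            if old_fp == fingerprint
            or SequenceMatcher(None, old_prompt, prompt).ratio()
            >= self.similarity_threshold
        )
        history.append((now, fingerprint, prompt))
        # Warn once this prompt plus its look-alikes exceeds the per-window cap.
        return "warn" if similar + 1 > self.max_requests else "allow"
```

The fingerprint catches exact replays cheaply, while the similarity ratio catches scripts that vary a character or two per request. A separate rate limit is still needed for traffic that is diverse but simply too frequent.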

For excessive agency and insecure tool use, the correct stack is Tool Validation, Tool Security, and Agent Firewall. Validation checks whether the requested tool was declared. Security scans the serialized request for dangerous substrings or blocked entity types. The firewall governs exact tool names, action counts, transaction thresholds, and suspicious patterns.
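
The layering can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the Keeptrusts implementation: the `check_tool_call` function and the dangerous-substring list are hypothetical, while the declared tools, blocked tools, and action cap mirror the baseline configuration on this page.

```python
import json

DECLARED_TOOLS = {"web_search", "knowledge_lookup"}   # from the baseline config
BLOCKED_TOOLS = {"shell_command", "delete_database"}  # from the baseline config
DANGEROUS_SUBSTRINGS = ("rm -rf", "drop table")       # hypothetical examples
MAX_ACTIONS_PER_WINDOW = 2                            # from the baseline config


def check_tool_call(name, arguments, actions_so_far):
    """Return (verdict, reason) for one tool call; the first failing layer wins."""
    # Layer 1, validation: was this tool declared for the request?
    if name not in DECLARED_TOOLS:
        return "blocked", "undeclared tool"
    # Layer 2, security: scan the serialized call for dangerous content.
    serialized = json.dumps({"tool": name, "arguments": arguments}).lower()
    if any(snippet in serialized for snippet in DANGEROUS_SUBSTRINGS):
        return "blocked", "dangerous argument"
    # Layer 3, firewall: exact tool names and per-window action budgets.
    if name in BLOCKED_TOOLS:
        return "blocked", "tool on blocklist"
    if actions_so_far >= MAX_ACTIONS_PER_WINDOW:
        return "blocked", "action budget exceeded"
    return "allowed", "ok"
```

With this particular configuration the blocklist check is redundant, because neither blocked tool is declared. It is kept as a separate layer so that a later change to the declared list cannot silently expose a destructive tool.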

For insecure output handling, use the Code Sanitizer policy. The implemented policy_kind is code-sanitizer, and it scans joined response text for dangerous built-in patterns plus your additional regexes. That matters when generated shell commands or SQL may be copied into automation.
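
As a rough illustration of why the scan runs over the joined response text, consider this Python sketch. The built-in pattern list shown here is a hypothetical stand-in, since the real built-ins are not listed on this page, and `scan_response` is an invented name. The additional pattern is the one from the baseline configuration.

```python
import re

# Hypothetical stand-ins for the built-in patterns; the real list is not shown here.
BUILTIN_PATTERNS = [r"rm\s+-rf\s+/", r"DROP\s+TABLE"]
# Additional pattern copied from the baseline config on this page.
ADDITIONAL_PATTERNS = [r"curl\s+https?://localhost"]


def scan_response(chunks, block_on_match=True):
    """Join response chunks, then report the verdict and which patterns matched."""
    joined = "".join(chunks)
    matches = [
        pattern for pattern in BUILTIN_PATTERNS + ADDITIONAL_PATTERNS
        if re.search(pattern, joined, re.IGNORECASE)
    ]
    verdict = "blocked" if matches and block_on_match else "allowed"
    return verdict, matches
```

Scanning each chunk separately would miss a command that is split across a chunk boundary, for example `["run cu", "rl http://localhost:8080/admin"]`. Joining first closes that gap.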

Finally, for weak visibility and weak evidence, keep Audit Logger in the chain, then use kt events and Regulated Execution when a review needs exported decision records or DSSE-signed evidence.

Implementation

The easiest way to operationalize the OWASP list is to build one baseline chain that covers the most common failure modes and then tune the individual policies as you learn.

pack:
  name: owasp-llm-baseline
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: openai-zdr
      provider: openai
      model: gpt-5.4-mini-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      data_policy:
        zero_data_retention: true
        training_opt_out: true
        retention_days: 0

policies:
  chain:
    - prompt-injection
    - dlp-filter
    - pii-detector
    - bot-detector
    - tool-validation
    - tool-security
    - agent-firewall
    - data-routing-policy
    - code-sanitizer
    - audit-logger

  policy:
    prompt-injection:
      use_embedding: true
      detection:
        embedding_threshold: 0.8
      encoding:
        decode_base64: true
        normalize_unicode: true
        detect_homoglyphs: true
      boundaries:
        enforce_delimiters: true
        reject_fake_boundaries: true

    dlp-filter:
      detect_patterns:
        - 'AKIA[0-9A-Z]{16}'
        - 'ghp_[0-9A-Za-z]{36}'
      action: block

    pii-detector:
      action: redact

    bot-detector:
      action: warn
      similarity_threshold: 0.9
      max_requests_per_window: 5

    tool-validation:
      declared_tools:
        - web_search
        - knowledge_lookup
      allow_undeclared: false

    tool-security:
      analysis_mode: local

    agent-firewall:
      blocked_tools:
        - shell_command
        - delete_database
      max_actions_per_window: 2

    data-routing-policy:
      require_zero_data_retention: true
      require_no_training: true
      max_retention_days: 0
      on_no_compliant_provider: block

    code-sanitizer:
      enabled: true
      block_on_match: true
      additional_patterns:
        - 'curl\s+https?://localhost'

    audit-logger: {}

The point of this chain is not perfection. It is coverage. The request boundary blocks adversarial input and secrets, redacts structured identifiers, and flags obvious abuse (the bot detector is set to warn in this baseline). The tool boundary rejects undeclared or unsafe execution paths. The provider boundary enforces data handling constraints. The output boundary catches clearly dangerous code patterns. The audit surface preserves the result for later analysis.
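
The ordering logic behind such a chain can be sketched generically in Python. This is a conceptual model only, not how Keeptrusts evaluates policies: `run_chain`, the `Policy` type, and the verdict strings are hypothetical names chosen for the example.

```python
from typing import Callable, Optional

# A policy inspects the request and returns None to continue,
# or a verdict string such as "blocked" or "redacted".
Policy = Callable[[dict], Optional[str]]


def run_chain(request: dict, chain: list[tuple[str, Policy]]) -> dict:
    """Run policies in order, stop at the first block, and record every decision."""
    decisions = []
    for name, policy in chain:
        verdict = policy(request) or "pass"
        decisions.append({"policy": name, "verdict": verdict})
        if verdict == "blocked":
            break  # later stages never see a request that an earlier stage stopped
    blocked = bool(decisions) and decisions[-1]["verdict"] == "blocked"
    return {"verdict": "blocked" if blocked else "allowed", "decisions": decisions}
```

In this model the `decisions` list plays the role of the evidence trail: it shows which stage stopped the request and which stages had already passed it, which is the question a reviewer usually asks first.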

You can then verify the control mix with the decision stream:

kt policy lint --file owasp-llm-baseline.yaml
kt events tail --since 1h --verdict blocked --json

Results and impact

This mapping changes OWASP from a generic risk list into a deployable security baseline. Teams stop asking whether they “cover OWASP” and start asking whether a specific runtime stage blocks, redacts, escalates, or records a specific risk. That is a much healthier conversation.

It also improves prioritization. If you only have time to harden one area first, the mapping makes the gap obvious. For most organizations, prompt injection, data leakage, and tool abuse are higher-value first controls than later-stage reporting polish. OWASP is most useful when it helps sequence work rather than flatten it.

Key takeaways

Next steps