OWASP Top 10 for LLM: Governance Mitigations for Each Risk

The most useful way to read OWASP guidance for LLM systems is not as a poster. It is as a control-design prompt. Every risk category asks the same question in a different form: what can go wrong at runtime, and what boundary will stop it before the system does something expensive, unsafe, or non-compliant? Keeptrusts is effective in that conversation because it gives teams enforceable controls at the request boundary, the tool boundary, the output boundary, and the evidence boundary.

Use this page when

You want to translate common OWASP-for-LLM risk categories into concrete governance controls.
You need one control map that engineers can deploy and security teams can review.
You want to avoid “one policy fixes everything” thinking and instead build a layered chain.

Primary audience

Primary: Security engineers and platform engineers
Secondary: Technical leaders evaluating AI risk treatment plans

The problem

OWASP-style LLM risks rarely arrive one at a time. A prompt injection attempt can also be a data-leak attempt. Tool misuse can be preceded by undeclared tools, then followed by dangerous arguments, then hidden under repeated low-variance traffic from the same script. If you answer each risk with a different spreadsheet and no runtime chain, you end up with documentation coverage instead of risk coverage.

The second problem is overconfidence in generic safeguards. Teams often say they have moderation, logging, or human review, but that still leaves large holes. Moderation does not validate declared tools. Logging does not keep a dangerous tool call from executing. Human review is useless if the system has already sent sensitive data to the wrong provider. OWASP is most valuable when it forces you to separate prevention, containment, and evidence.

The solution

Map each major LLM risk to the closest Keeptrusts control surface, then chain those controls in the order they can actually interrupt the failure.

For prompt-driven attacks, use Prompt Injection Detection and the operational guidance in Block Prompt Injection Attacks Before They Reach Your Models. That covers adversarial instructions, fake boundaries, encoded payloads, and embedding-confirmed similarity checks.
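To make the "encoded payloads" point concrete, here is a minimal sketch of why a detector should inspect normalized and decoded views of a prompt rather than only the raw string. It is not the Keeptrusts implementation: the phrase list, the `expand_views` and `looks_like_injection` helpers, and the base64 token heuristic are all assumptions made for illustration.

```python
import base64
import binascii
import re
import unicodedata

# Hypothetical phrase list for illustration; a real detector uses far richer signals.
SUSPICIOUS_PHRASES = ("ignore previous instructions", "disregard the system prompt")

# Heuristic: runs of 16+ base64 alphabet characters, optionally padded.
BASE64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def expand_views(prompt: str) -> list[str]:
    """Return the NFKC-normalized prompt plus any base64-decoded views of it."""
    views = [unicodedata.normalize("NFKC", prompt)]
    for token in BASE64_TOKEN.findall(prompt):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # Not valid base64-encoded text; skip this token.
        views.append(unicodedata.normalize("NFKC", decoded))
    return views


def looks_like_injection(prompt: str) -> bool:
    """Check every view of the prompt against the illustrative phrase list."""
    return any(
        phrase in view.lower()
        for view in expand_views(prompt)
        for phrase in SUSPICIOUS_PHRASES
    )
```

NFKC normalization folds compatibility forms such as fullwidth letters, but it does not map cross-script homoglyphs (for example Cyrillic look-alikes), which is why homoglyph detection is a separate setting in the baseline configuration.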

For sensitive-information disclosure, use DLP Filter, PII Detector, and Data Routing Policy. Those three answer different questions: should this content be blocked, should structured identifiers be redacted, and is any configured provider allowed to receive the remaining request at all?
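The three questions can be sketched as three separate checks applied in order. This is an illustrative model, not the product's code: the secret patterns are the ones used in the baseline configuration in the Implementation section, while the `Provider` type, the `evaluate` function, and the deliberately simple email regex are assumptions.

```python
import re
from dataclasses import dataclass

# Secret patterns borrowed from the baseline configuration on this page.
SECRET_PATTERNS = [re.compile(r"AKIA[0-9A-Z]{16}"), re.compile(r"ghp_[0-9A-Za-z]{36}")]
# Hypothetical, deliberately simple email pattern standing in for PII detection.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Provider:
    id: str
    zero_data_retention: bool
    training_opt_out: bool


def evaluate(prompt: str, providers: list[Provider]) -> tuple[str, str]:
    """Return (verdict, text) after asking the three questions in order."""
    # 1. Should this content be blocked outright?
    if any(pattern.search(prompt) for pattern in SECRET_PATTERNS):
        return "block", ""
    # 2. Should structured identifiers be redacted?
    redacted = EMAIL.sub("[REDACTED_EMAIL]", prompt)
    # 3. Is any provider allowed to receive what remains?
    if not any(p.zero_data_retention and p.training_opt_out for p in providers):
        return "block", ""
    return "allow", redacted
```

Keeping the checks separate makes each verdict explainable: a blocked secret, a redacted identifier, and a routing refusal are three different findings for a reviewer.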

For abuse, automation, and denial-of-wallet-style traffic, use Bot Detector and rate-limiting guidance from Policy Rate Limits Configuration or Advanced Rate Limiting. Bot detection looks for duplicate fingerprints or highly similar prompts inside a rolling window. Rate limits then cap what still gets through.
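A rolling-window similarity check can be sketched in a few lines. The class below is a hypothetical stand-in, not Bot Detector itself: it uses `difflib.SequenceMatcher` as the similarity measure, and the parameter names `window_seconds`, `similarity_threshold`, and `max_similar` are assumptions chosen for readability.

```python
import time
from collections import deque
from difflib import SequenceMatcher


class SimilarityWindow:
    """Flag a client that sends too many near-identical prompts in a rolling window."""

    def __init__(self, window_seconds=60.0, similarity_threshold=0.9, max_similar=5):
        self.window_seconds = window_seconds
        self.similarity_threshold = similarity_threshold
        self.max_similar = max_similar
        self._recent = deque()  # (timestamp, prompt) pairs, oldest first

    def observe(self, prompt, now=None):
        """Record a prompt; return True when it should be flagged."""
        now = time.monotonic() if now is None else now
        # Drop entries that have aged out of the window.
        while self._recent and now - self._recent[0][0] > self.window_seconds:
            self._recent.popleft()
        # Count earlier prompts in the window that are near-duplicates of this one.
        similar = sum(
            1
            for _, earlier in self._recent
            if SequenceMatcher(None, earlier, prompt).ratio() >= self.similarity_threshold
        )
        self._recent.append((now, prompt))
        return similar >= self.max_similar
```

In practice one such window would be kept per client fingerprint, and a flag would feed a warn or block action while rate limits cap the remaining volume.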

For excessive agency and insecure tool use, the correct stack is Tool Validation, Tool Security, and Agent Firewall. Validation checks whether the requested tool was declared. Security scans the serialized request for dangerous substrings or blocked entity types. The firewall governs exact tool names, action counts, transaction thresholds, and suspicious patterns.
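The layering described above can be sketched as one gate with three ordered checks. Everything here is hypothetical: the `ToolGate` class, its field names, and the dangerous-substring list are illustrative, and the action budget omits the time window for brevity.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToolGate:
    declared_tools: set[str]
    blocked_tools: set[str]
    dangerous_substrings: tuple[str, ...] = ("rm -rf", "DROP TABLE")  # illustrative only
    max_actions: int = 2
    _actions: int = field(default=0, init=False)

    def check(self, name: str, arguments: dict) -> str:
        """Return 'allow' or a short reason for rejecting the call."""
        # Validation: was the tool declared at all?
        if name not in self.declared_tools:
            return "undeclared tool"
        # Security: scan the serialized call for dangerous substrings.
        serialized = json.dumps({"tool": name, "arguments": arguments})
        if any(s in serialized for s in self.dangerous_substrings):
            return "dangerous argument"
        # Firewall: exact-name blocklist, then an action budget.
        if name in self.blocked_tools:
            return "blocked tool"
        if self._actions >= self.max_actions:
            return "action limit reached"
        self._actions += 1
        return "allow"
```

The ordering matters: an undeclared tool is rejected before its arguments are inspected, and the action budget is only spent on calls that passed every earlier check.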

For insecure output handling, use the Code Sanitizer policy. The implemented policy_kind is code-sanitizer, and it scans joined response text for dangerous built-in patterns plus your additional regexes. That matters when generated shell commands or SQL may be copied into automation.
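Scanning the joined text, rather than each chunk separately, matters because a dangerous command can straddle a chunk boundary. The sketch below illustrates that idea only; the `scan_response` helper and the two built-in patterns are assumptions, since the policy's actual built-in list is not shown on this page.

```python
import re

# Illustrative built-in patterns; the real policy's built-in list is not shown here.
BUILTIN_PATTERNS = [r"rm\s+-rf\s+/", r"(?i)drop\s+table"]


def scan_response(chunks, additional_patterns=()):
    """Join response chunks and return the patterns that matched the full text."""
    text = "".join(chunks)
    patterns = [*BUILTIN_PATTERNS, *additional_patterns]
    return [pattern for pattern in patterns if re.search(pattern, text)]
```

A non-empty result would correspond to a block when blocking on match is enabled; an empty list means the response passes this stage.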

Finally, for weak visibility and weak evidence, keep Audit Logger in the chain, then use kt events and Regulated Execution when a review needs exported decision records or DSSE-signed evidence.

Implementation

The easiest way to operationalize the OWASP list is to build one baseline chain that covers the most common failure modes and then tune the individual policies as you learn.

pack:
  name: owasp-llm-baseline
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: openai-zdr
      provider: openai
      model: gpt-5.4-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      data_policy:
        zero_data_retention: true
        training_opt_out: true
        retention_days: 0

policies:
  chain:
    - prompt-injection
    - dlp-filter
    - pii-detector
    - bot-detector
    - tool-validation
    - tool-security
    - agent-firewall
    - data-routing-policy
    - code-sanitizer
    - audit-logger

policy:
  prompt-injection:
    use_embedding: true
    detection:
      embedding_threshold: 0.8
    encoding:
      decode_base64: true
      normalize_unicode: true
      detect_homoglyphs: true
    boundaries:
      enforce_delimiters: true
      reject_fake_boundaries: true

  dlp-filter:
    detect_patterns:
      - 'AKIA[0-9A-Z]{16}'
      - 'ghp_[0-9A-Za-z]{36}'
    action: block

  pii-detector:
    action: redact

  bot-detector:
    action: warn
    similarity_threshold: 0.9
    max_requests_per_window: 5

  tool-validation:
    declared_tools:
      - web_search
      - knowledge_lookup
    allow_undeclared: false

  tool-security:
    analysis_mode: local

  agent-firewall:
    blocked_tools:
      - shell_command
      - delete_database
    max_actions_per_window: 2

  data-routing-policy:
    require_zero_data_retention: true
    require_no_training: true
    max_retention_days: 0
    on_no_compliant_provider: block

  code-sanitizer:
    enabled: true
    block_on_match: true
    additional_patterns:
      - 'curl\s+https?://localhost'

  audit-logger: {}

The point of this chain is not perfection. It is coverage. The request boundary blocks adversarial input and secrets, redacts PII, and flags obvious abuse. The tool boundary rejects undeclared or unsafe execution paths. The provider boundary enforces data handling constraints. The output boundary catches clearly dangerous code patterns. The audit surface preserves the result for later analysis.
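One way to reason about chain order is to model each policy as a stage that returns a verdict and possibly rewritten text, with the chain stopping at the first block. This is a conceptual sketch under that assumption, not a description of how Keeptrusts evaluates a chain; `run_chain` and the `Stage` type are hypothetical names.

```python
from typing import Callable

# A stage takes the current text and returns (verdict, possibly rewritten text).
Stage = Callable[[str], tuple[str, str]]


def run_chain(text: str, stages: list[tuple[str, Stage]]) -> dict:
    """Run stages in order; stop at the first block and record every decision."""
    decisions = []
    for name, stage in stages:
        verdict, text = stage(text)
        decisions.append({"policy": name, "verdict": verdict})
        if verdict == "block":
            break  # Later stages never see a blocked request.
    final = "block" if decisions and decisions[-1]["verdict"] == "block" else "allow"
    return {"verdict": final, "text": text, "decisions": decisions}
```

The decision list is the part that matters for evidence: it shows which stage interrupted the request and which stages had already modified it.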

You can then verify the control mix with the decision stream:

kt policy lint --file owasp-llm-baseline.yaml
kt events tail --since 1h --verdict blocked --json

Results and impact

This mapping changes OWASP from a generic risk list into a deployable security baseline. Teams stop asking whether they “cover OWASP” and start asking whether a specific runtime stage blocks, redacts, escalates, or records a specific risk. That is a much healthier conversation.

It also improves prioritization. If you only have time to harden one area first, the mapping makes the gap obvious. For most organizations, prompt injection, data leakage, and tool abuse are higher-value first controls than later-stage reporting polish. OWASP is most useful when it helps sequence work rather than flatten it.

Key takeaways

OWASP-for-LLM risks need a layered chain, not a single catch-all safeguard.
Prompt Injection Detection, DLP Filter, and PII Detector cover different content risks and should not be collapsed into one story.
Tool Validation, Tool Security, and Agent Firewall are the core Keeptrusts controls for excessive agency.
The implemented code-output control is code-sanitizer, documented on the Code Sanitizer page.
Visibility still matters: use kt events and, where needed, Regulated Execution for stronger evidence.
