AI Security Incident Response: Complete Playbook

AI incident response gets messy when teams treat the model provider, the application, and the governance layer as separate stories. They are not. A prompt-injection wave, a data-leak attempt, a provider-side trust issue, or a spend-exhaustion event all cross those boundaries quickly. The value of Keeptrusts in incident response is that it gives responders one place to tighten controls, inspect decisions, and export evidence while the incident is still active.

Use this page when

You want a practical incident-response sequence for AI-specific security events.
You need to move from detection to containment without editing every client or service.
You want a playbook that uses documented Keeptrusts controls rather than generic AI governance language.

Primary audience

Primary: Security responders, platform engineers, and technical incident commanders
Secondary: Technical Leaders, compliance stakeholders, AI Agents

The control map

Keep these five references in scope throughout the incident: Rate Limits, Audit Logger, Prompt Injection Detection, Block Prompt Injection Attacks Before They Reach Your Models, and Prevent Sensitive Data Leaks in AI Requests.

Phase 1: Prepare before the incident

AI incident response is slowest when the first useful control has to be invented mid-incident. Preparation should therefore focus on pre-authoring a small number of safe baseline configurations.

At minimum, keep a normal production policy and an emergency containment policy ready to deploy. The baseline should already include prompt-boundary defense, data protection appropriate to the workload, explicit audit visibility, and scoped quotas. The emergency version should tighten those controls and narrow provider eligibility.

Just as important, make sure the team already knows how to gather evidence. The CLI and export workflows should be part of routine operations before the day you need them urgently.

Phase 2: Detect and classify

AI incidents usually surface through one of four signals.

A spike in blocked or escalated request events.
A report that suspicious content reached a model or tried to reveal hidden context.
A provider issue that changes your trust assumptions.
A cost or traffic surge that suggests abuse rather than organic demand.

Your first task is classification, not overreaction. Ask which boundary is failing.

Input boundary: likely prompt injection or adversarial content.
Data boundary: likely sensitive-data leakage or redaction failure.
Provider boundary: likely routing or trust problem.
Capacity and cost boundary: likely rate or wallet abuse.

That classification determines which controls need to tighten first.

Phase 3: Contain at the gateway

Containment is where the gateway earns its keep. Instead of pushing urgent changes across many applications, you adjust the control point in front of providers.

This emergency configuration is intentionally strict. It tightens prompt-injection checks, blocks request-side sensitive data, constrains provider eligibility, and lowers global throughput while the incident is active.

pack:
  name: ai-ir-emergency-containment
  version: 1.0.0
  enabled: true

providers:
  routing:
    strategy: ordered
  targets:
    - id: backup-zdr
      provider: openai
      model: gpt-5.4-mini-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      data_policy:
        zero_data_retention: true
        training_opt_out: true
        retention_days: 0
        in_memory_only: true
        sanitized: true
        accepts_tokenized_input: true
        allow_internet_egress: false
        local_only_processing: true

rate_limits:
  global:
    rpm: 200
    tpm: 300000
    max_parallel_requests: 20

policies:
  chain:
    - prompt-injection
    - pii-detector
    - data-routing-policy
    - audit-logger

policy:
  prompt-injection:
    use_embedding: true
    detection:
      embedding_threshold: 0.78
      attack_patterns:
        - "ignore.*previous.*instructions"
        - "reveal.*system.*prompt"
        - "print.*all.*hidden.*context"
        - "dump.*conversation.*history"
    encoding:
      decode_base64: true
      normalize_unicode: true
      detect_homoglyphs: true
    boundaries:
      enforce_delimiters: true
      reject_fake_boundaries: true

  pii-detector:
    action: block
    healthcare_mode: false
    pci_mode: true
    detect_patterns:
      - 'AKIA[0-9A-Z]{16}'
      - 'ghp_[0-9A-Za-z]{36}'
      - '-----BEGIN (RSA |EC )?PRIVATE KEY-----'
    redaction:
      marker_format: label
      include_metadata: true
      preserve_length: false
      custom_markers: {}

  data-routing-policy:
    require_zero_data_retention: true
    require_no_training: true
    max_retention_days: 0
    require_in_memory_only: true
    allow_internet_egress: false
    local_only_processing: true
    on_no_compliant_provider: block
    log_provider_selection: true

  audit-logger: {}

This is not meant to be comfortable. It is meant to be survivable. In incident response, narrower is often safer than smarter.

Phase 4: Gather evidence while the system is stable enough to observe

Once the containment config is in place, start collecting evidence from the governed stream. You want time-bounded, queryable data tied to the actual enforcement point.

kt policy lint --file ai-ir-emergency-containment.yaml
kt events tail --since 30m --json
kt escalation list
kt events export --since 24h --format json --output ai-ir-events.json

Those commands answer four core questions.

Did the emergency config validate?
What verdicts are firing right now?
Is there human-review work building up that needs staffing?
Do we have a durable artifact for later review?

If the incident window is larger or the audience is broader, use export jobs as well so the evidence package is asynchronous and easier to hand off.

Phase 5: Investigate the scope

With evidence in hand, determine the real blast radius.

Look for repeated reason_code values, shared identities, affected gateways, specific providers, and configuration versions. The important discipline here is not to chase every symptom at once. Start with the narrowest explanation the event stream supports.

If the issue is prompt-boundary abuse, focus on the blocked and escalated traffic that triggered Prompt Injection Detection. If it is a data boundary issue, review the requests that should have been covered by Prevent Sensitive Data Leaks in AI Requests. If it is a provider trust issue, review provider selection and exclusion behavior under Data Routing Policy. If it is cost abuse, correlate the event stream with wallet balance and rate-limit behavior.

The point is not to make the event stream say everything. The point is to let it narrow the search enough that the rest of your investigation stays disciplined.

Phase 6: Recover deliberately

Recovery means restoring useful service without undoing the lessons of the incident.

Do not immediately revert to the old configuration because the urgent symptom is gone. Instead, test the recovered lane against the reason the incident happened.

Send safe requests through the restored path.
Confirm blocked traffic still blocks.
Confirm compliant providers remain selectable.
Confirm global and scoped quotas still protect the lane.
Confirm events and exports still reflect the new steady state.

Recovery should feel slightly slower than a panic revert. That is a feature. A rushed rollback is how incidents repeat.

Phase 7: Learn and harden

The final stage is where AI incident response often fails. Teams close the ticket and keep the same blind spots.

Instead, translate the incident into tighter artifacts.

Add or refine attack patterns for prompt-injection defense.
Promote temporary throttles into permanent scoped rate limits if they proved useful.
Narrow provider metadata requirements if the incident exposed routing ambiguity.
Update reviewer playbooks and export routines so evidence collection is faster next time.

This is also where Audit Logger and the surrounding evidence surfaces matter most. Incidents teach better lessons when the team can inspect what the platform actually did, not what everyone thinks it probably did.

Key takeaways

Good AI incident response starts with pre-authored normal and emergency gateway configurations.
Containment should happen at the gateway so clients do not need emergency edits at the same time.
Use Prompt Injection Detection, Prevent Sensitive Data Leaks in AI Requests, routing policy, and Rate Limits as separate levers for separate failure modes.
Keep Audit Logger active so events, escalations, and exports remain easy to trust.
Recovery is not done until the safer steady state is validated and the playbook is improved.

AI Security Incident Response: Complete Playbook

Use this page when​

Primary audience​

The control map​

Phase 1: Prepare before the incident​

Phase 2: Detect and classify​

Phase 3: Contain at the gateway​

Phase 4: Gather evidence while the system is stable enough to observe​

Phase 5: Investigate the scope​

Phase 6: Recover deliberately​

Phase 7: Learn and harden​

Key takeaways​

Next steps​