Automate AI Incident Response with PagerDuty
When AI governance policies trigger high-severity escalations, automated incident response ensures the right team is paged immediately. This guide covers mapping Keeptrusts escalations to PagerDuty incidents, on-call routing, runbook automation, and SLA tracking.
Use this page when
- You want to page on-call engineers when Keeptrusts escalations hit critical severity.
- You need to map escalation severity to PagerDuty incident urgency with tiered escalation policies.
- You are building webhook middleware that transforms Keeptrusts payloads into PagerDuty Events API v2 format.
- You want auto-resolution of PagerDuty incidents when escalations are resolved in Keeptrusts.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Architecture overview
```text
Keeptrusts Gateway
  → policy violation detected
  → escalation created in Keeptrusts API
  → /v1/webhooks delivery → middleware (payload transform) → PagerDuty Events API v2
  → PagerDuty service
    → on-call schedule
    → page responder
    → attach runbook
    → track SLA
```
Prerequisites
- PagerDuty account with Events API v2 access
- PagerDuty service configured for AI governance incidents
- Keeptrusts organization with webhook permissions
- Keeptrusts API key
PagerDuty service setup
Create a dedicated service
- Go to Services → Service Directory → New Service
- Name: "AI Governance — Keeptrusts"
- Integration: Events API v2
- Copy the Integration Key (routing key)
- Assign an escalation policy with appropriate on-call schedules
Escalation policy
Configure a tiered escalation policy:
| Level | Target | Timeout |
|---|---|---|
| 1 | AI Governance on-call engineer | 5 minutes |
| 2 | Platform team lead | 15 minutes |
| 3 | VP Engineering | 30 minutes |
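To show what the Timeout column means in practice, the sketch below works out when each level is paged if nobody acknowledges the incident. It assumes each timeout is the delay before PagerDuty escalates to the next level. The `escalationPolicy` array and `pageTimes` function are illustrative only.

```javascript
// Illustrative copy of the escalation policy table above.
// Assumption: "timeoutMinutes" is the delay before escalating to the next level.
const escalationPolicy = [
  { level: 1, target: 'AI Governance on-call engineer', timeoutMinutes: 5 },
  { level: 2, target: 'Platform team lead', timeoutMinutes: 15 },
  { level: 3, target: 'VP Engineering', timeoutMinutes: 30 },
];

// Returns the minute (relative to incident creation) at which each level is paged,
// assuming no one acknowledges the incident along the way.
function pageTimes(policy) {
  let elapsed = 0;
  return policy.map(({ level, target, timeoutMinutes }) => {
    const pagedAtMinute = elapsed;
    elapsed += timeoutMinutes; // unacknowledged: move to the next level after this delay
    return { level, target, pagedAtMinute };
  });
}

for (const { level, target, pagedAtMinute } of pageTimes(escalationPolicy)) {
  console.log(`Level ${level} (${target}) paged at t+${pagedAtMinute} min`);
}
```

With these values, the platform team lead is paged 5 minutes after the incident opens and the VP of Engineering 20 minutes after. The level 3 timeout only matters if the policy is configured to repeat.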
Configure Keeptrusts webhook
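Point the webhook at your middleware

Register the webhook against your middleware endpoint, not `events.pagerduty.com` directly. PagerDuty only accepts Events API v2 payloads that carry a routing key, so the middleware described below has to transform each delivery first. Replace the placeholder URL with your middleware's address. Subscribe to `escalation.resolved` as well as `escalation.created` so that auto-resolution works.

```bash
curl -X POST https://api.keeptrusts.com/v1/webhooks \
  -H "Authorization: Bearer $KEEPTRUSTS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://<your-middleware-host>/pagerduty",
    "description": "PagerDuty incidents for critical AI escalations",
    "event_types": ["escalation.created", "escalation.resolved"],
    "active": true
  }'
```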
Webhook payload transformation
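PagerDuty only accepts events in Events API v2 format. Run a small middleware service (for example, an AWS Lambda function or a Cloudflare Worker) to transform each Keeptrusts webhook payload and forward it to PagerDuty. The handler below is written in the Fetch API `Request`/`Response` style, so adapt the entry point to your platform: Cloudflare Workers export an object with a `fetch` method, and AWS Lambda handlers receive an event object rather than a `Request`.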
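```javascript
// Fetch-API-style entry point: receives a Request, returns a Response.
export default async function handler(req) {
  // The webhook body is a stream; parse it as JSON before reading fields.
  const event = await req.json();
  const severity = mapSeverity(event);

  const pdPayload = {
    // process.env is Node-specific; on edge runtimes read the key from the
    // platform's environment bindings instead.
    routing_key: process.env.PAGERDUTY_ROUTING_KEY,
    event_action: 'trigger',
    // One incident per escalation: repeated triggers with the same key
    // attach to the open incident instead of creating new ones.
    dedup_key: `keeptrusts-${event.escalation_id}`,
    payload: {
      summary: `AI Policy Escalation: ${event.policy_name} — ${event.action}`,
      severity: severity,
      source: `keeptrusts-gateway-${event.gateway_id}`,
      component: event.model,
      group: event.org_id,
      class: 'ai-governance',
      timestamp: event.timestamp, // PagerDuty expects ISO 8601
      custom_details: {
        event_id: event.event_id,
        escalation_id: event.escalation_id,
        policy_name: event.policy_name,
        model: event.model,
        user_id: event.user_id,
        console_url: `https://console.keeptrusts.com/escalations/${event.escalation_id}`,
      },
    },
    links: [
      {
        href: `https://console.keeptrusts.com/escalations/${event.escalation_id}`,
        text: 'View in Keeptrusts Console',
      },
    ],
  };

  const response = await fetch('https://events.pagerduty.com/v2/enqueue', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(pdPayload),
  });

  // PagerDuty answers 202 Accepted on success. Report failures with a non-2xx
  // status instead of always returning 200, so failed deliveries are visible.
  return new Response(JSON.stringify({ pagerduty_status: response.status }), {
    status: response.ok ? 200 : 502,
    headers: { 'Content-Type': 'application/json' },
  });
}

// Maps a Keeptrusts policy action to a PagerDuty Events API v2 severity.
function mapSeverity(event) {
  if (event.action === 'block') return 'critical';
  if (event.action === 'escalate') return 'error';
  if (event.action === 'redact') return 'warning';
  return 'info';
}
```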
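PagerDuty rejects trigger events that lack `payload.summary` or `payload.source`, or that have an invalid `payload.severity`. It also caps the summary at 1024 characters. A minimal pre-send check might look like the following. `validateTriggerEvent` and `truncateSummary` are hypothetical helpers and are not part of any Keeptrusts or PagerDuty SDK.

```javascript
// Hypothetical helpers: check an Events API v2 trigger event before POSTing it,
// so malformed events fail in the middleware instead of at PagerDuty.
const PD_SEVERITIES = new Set(['critical', 'error', 'warning', 'info']);
const MAX_SUMMARY_LENGTH = 1024;

// Returns a list of problems; an empty list means the required fields look valid.
function validateTriggerEvent(pd) {
  const errors = [];
  if (!pd.routing_key) errors.push('routing_key is missing');
  if (pd.event_action !== 'trigger') errors.push('event_action must be "trigger"');
  const p = pd.payload || {};
  if (!p.summary) errors.push('payload.summary is missing');
  else if (p.summary.length > MAX_SUMMARY_LENGTH) errors.push('payload.summary exceeds 1024 characters');
  if (!p.source) errors.push('payload.source is missing');
  if (!PD_SEVERITIES.has(p.severity)) errors.push(`payload.severity "${p.severity}" is not valid`);
  return errors;
}

// Shortens a summary to the 1024-character limit, marking the cut with an ellipsis.
function truncateSummary(summary) {
  if (summary.length <= MAX_SUMMARY_LENGTH) return summary;
  return summary.slice(0, MAX_SUMMARY_LENGTH - 1) + '…';
}
```

In the handler, you could run `validateTriggerEvent(pdPayload)` before calling `fetch` and return an error response if it reports problems. That way the cause appears in the middleware's own logs.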
Severity classification
Map Keeptrusts policy actions to PagerDuty severity levels:
| Keeptrusts Action | Policy Type | PagerDuty Severity | Example |
|---|---|---|---|
| `block` | Content safety, compliance | `critical` | Harmful content blocked |
| `escalate` | Review required | `error` | Sensitive topic flagged |
| `redact` | Data protection | `warning` | PII redacted from response |
| `log` | Audit only | `info` | Unusual usage pattern logged |
On-call routing by policy type
Route different policy violations to different PagerDuty services:
```text
content-safety violations → "AI Safety" service       → Safety team on-call
compliance violations     → "AI Compliance" service   → Legal/compliance on-call
cost-limit violations     → "AI Cost Control" service → FinOps on-call
```
Implement routing in the middleware:
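```javascript
// Pick a PagerDuty routing key per policy type; unknown types go to the default service.
// Use routingKey in place of process.env.PAGERDUTY_ROUTING_KEY when building the event.
const serviceMap = {
  'content-safety': process.env.PD_KEY_SAFETY,
  'compliance': process.env.PD_KEY_COMPLIANCE,
  'cost-limit': process.env.PD_KEY_FINOPS,
  'default': process.env.PD_KEY_DEFAULT,
};
const routingKey = serviceMap[event.policy_type] || serviceMap['default'];
```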
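A resolve event must go to the same routing key as the trigger it closes. Putting the routing lookup in one function that both paths share helps keep the two aligned. The sketch below is one way to do that, under a few assumptions. `routingKeyFor` is a hypothetical helper that takes the environment as an argument so it can be tested. It assumes the Keeptrusts payload exposes `policy_type`, as the routing snippet does. It also uses a `Map` so that unexpected policy types such as `"constructor"` cannot match inherited object properties.

```javascript
// Hypothetical helper: resolve the PagerDuty routing key for a policy type.
// `env` is passed in (e.g. process.env) so trigger and resolve paths share one lookup.
function routingKeyFor(policyType, env) {
  const keysByType = new Map([
    ['content-safety', env.PD_KEY_SAFETY],
    ['compliance', env.PD_KEY_COMPLIANCE],
    ['cost-limit', env.PD_KEY_FINOPS],
  ]);
  // Fall back to the default service when the type is unknown or its key is unset.
  const key = keysByType.get(policyType) || env.PD_KEY_DEFAULT;
  if (!key) {
    throw new Error(`No PagerDuty routing key configured for policy type "${policyType}"`);
  }
  return key;
}
```

Throwing when no key is configured makes a misconfigured deployment fail visibly. Without the check, `fetch` would send an event with an empty `routing_key`, which PagerDuty rejects.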
Runbook automation
Attach runbooks to PagerDuty incidents for guided response:
Runbook template
```markdown
# AI Governance Escalation Response

## Triage
1. Open the Keeptrusts Console link in the incident
2. Review the event details: policy name, action, model, user
3. Check if this is a false positive or genuine violation

## Investigate
1. Run `kt events tail --filter escalation_id=<ID> --format json` for event context
2. Review the conversation history if available
3. Check the policy configuration: `kt policy lint --file <config>`

## Resolve
- **False positive**: Update the policy to reduce false positives, resolve the escalation
- **Genuine violation**: Document the finding, notify the user's manager, resolve
- **Policy gap**: Create a ticket to update the policy, resolve with follow-up

## Post-incident
1. Export evidence: `kt export create --format json --window 1h`
2. Update the escalation status in the Keeptrusts Console
3. Document lessons learned
```
Link runbook to PagerDuty service
- Go to the PagerDuty service → Integrations → Runbook Automation
- Add a link to the runbook URL
- Or embed automation steps using PagerDuty Automation Actions
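Another option is to attach the runbook to each incident itself. The Events API v2 `links` array accepts `{ href, text }` entries, so the middleware could append a runbook link chosen by policy type. In this sketch, the runbook URLs and the `withRunbookLink` helper are hypothetical placeholders.

```javascript
// Hypothetical runbook locations per policy type; replace with your own URLs.
const RUNBOOK_URLS = new Map([
  ['content-safety', 'https://wiki.example.com/runbooks/ai-safety'],
  ['compliance', 'https://wiki.example.com/runbooks/ai-compliance'],
  ['cost-limit', 'https://wiki.example.com/runbooks/ai-cost-control'],
]);
const DEFAULT_RUNBOOK_URL = 'https://wiki.example.com/runbooks/ai-governance';

// Returns a copy of the trigger event with a runbook link appended to `links`,
// leaving any existing links (such as the Keeptrusts Console link) in place.
function withRunbookLink(pdPayload, policyType) {
  const href = RUNBOOK_URLS.get(policyType) || DEFAULT_RUNBOOK_URL;
  return {
    ...pdPayload,
    links: [...(pdPayload.links || []), { href, text: 'AI governance runbook' }],
  };
}
```

Returning a copy, rather than mutating `pdPayload`, keeps the helper safe to call from retry paths that reuse the original event.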
SLA tracking
Track response and resolution SLAs for AI governance incidents:
| Severity | Response SLA | Resolution SLA |
|---|---|---|
| Critical | 5 minutes | 1 hour |
| Error | 15 minutes | 4 hours |
| Warning | 1 hour | 24 hours |
| Info | Next business day | 1 week |
Configure SLA tracking in PagerDuty:
- Go to Analytics → Service SLAs
- Set targets per service and priority level
- Enable SLA breach notifications
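For reporting outside PagerDuty, you can check the SLA table above against incident timestamps. The sketch below makes several assumptions. It treats acknowledgement as the "response" and accepts ISO 8601 timestamps. It does not model "next business day," so `info` incidents only get the 1-week resolution target. `slaBreaches` is a hypothetical helper.

```javascript
const MINUTE = 60 * 1000;
const HOUR = 60 * MINUTE;

// Targets from the SLA table, in milliseconds.
const SLA_TARGETS = new Map([
  ['critical', { responseMs: 5 * MINUTE, resolutionMs: 1 * HOUR }],
  ['error', { responseMs: 15 * MINUTE, resolutionMs: 4 * HOUR }],
  ['warning', { responseMs: 1 * HOUR, resolutionMs: 24 * HOUR }],
  // "Next business day" is not modeled, so info has no response target here.
  ['info', { responseMs: null, resolutionMs: 7 * 24 * HOUR }],
]);

// Returns which SLAs ('response', 'resolution') an incident breached.
// Pass null for steps that have not happened yet; open incidents are not flagged.
function slaBreaches(severity, triggeredAt, acknowledgedAt, resolvedAt) {
  const target = SLA_TARGETS.get(severity);
  if (!target) throw new Error(`Unknown severity: ${severity}`);
  const start = Date.parse(triggeredAt);
  const breaches = [];
  if (target.responseMs !== null && acknowledgedAt &&
      Date.parse(acknowledgedAt) - start > target.responseMs) {
    breaches.push('response');
  }
  if (resolvedAt && Date.parse(resolvedAt) - start > target.resolutionMs) {
    breaches.push('resolution');
  }
  return breaches;
}
```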
Auto-resolution
Automatically resolve PagerDuty incidents when escalations are resolved in Keeptrusts:
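```javascript
// Runs when an escalation.resolved webhook arrives; `event` is the parsed body.
// Send the resolve with the same routing key and dedup_key used for the trigger,
// otherwise PagerDuty cannot match it to the open incident.
const resolvePayload = {
  routing_key: process.env.PAGERDUTY_ROUTING_KEY,
  event_action: 'resolve',
  dedup_key: `keeptrusts-${event.escalation_id}`,
};

await fetch('https://events.pagerduty.com/v2/enqueue', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(resolvePayload),
});
```

For this to run, the Keeptrusts webhook must also subscribe to the `escalation.resolved` event type.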
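Because one webhook delivers both `escalation.created` and `escalation.resolved`, the middleware needs a way to decide which Events API action to send. The sketch below uses a hypothetical `toPagerDutyEvent` helper. It **assumes** the event type arrives in a field named `event_type`, and this guide does not document the real field name, so check an actual Keeptrusts delivery and adjust. The trigger payload is trimmed to the required fields for brevity.

```javascript
// Hypothetical dispatcher: turn one Keeptrusts webhook body into a PagerDuty event.
// ASSUMPTION: the Keeptrusts event type is in `event.event_type`; verify the field name.
function toPagerDutyEvent(event, routingKey, severity) {
  const dedupKey = `keeptrusts-${event.escalation_id}`;
  switch (event.event_type) {
    case 'escalation.created':
      return {
        routing_key: routingKey,
        event_action: 'trigger',
        dedup_key: dedupKey,
        payload: {
          summary: `AI Policy Escalation: ${event.policy_name} (${event.action})`,
          severity,
          source: `keeptrusts-gateway-${event.gateway_id}`,
        },
      };
    case 'escalation.resolved':
      // A resolve needs only the routing key, the action, and the matching dedup_key.
      return { routing_key: routingKey, event_action: 'resolve', dedup_key: dedupKey };
    default:
      return null; // ignore event types this integration does not handle
  }
}
```

Returning `null` for unhandled types lets the handler acknowledge those deliveries without sending anything to PagerDuty.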
Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| No incidents created | Routing key invalid | Verify the integration key in PagerDuty service settings |
| Duplicate incidents | Dedup key not set | Ensure dedup_key uses the escalation ID |
| Wrong on-call paged | Escalation policy misconfigured | Check the schedule and escalation levels in PagerDuty |
| Incidents not auto-resolving | escalation.resolved webhook missing | Add the event type to the Keeptrusts webhook subscription |
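When incidents do not appear, the HTTP status returned by `https://events.pagerduty.com/v2/enqueue` helps narrow down the cause. A 202 means PagerDuty accepted the event. A 400 means it rejected the event as invalid, so retrying will not help. A 429 or any 5xx is transient. The `classifyEnqueueStatus` function and its labels are illustrative. Whether Keeptrusts retries a delivery when the middleware returns a non-2xx status is an assumption to confirm.

```javascript
// Illustrative classification of an Events API v2 enqueue response status.
function classifyEnqueueStatus(status) {
  if (status === 202) return 'accepted'; // event queued by PagerDuty
  if (status === 429 || status >= 500) return 'retry'; // rate limited or server error: back off and retry
  if (status >= 400) return 'invalid'; // rejected event: fix the payload or routing key, do not retry
  return 'unexpected'; // anything else is worth logging for investigation
}
```

If you log this label next to the escalation ID, you can tell a rejected payload apart from a temporary PagerDuty problem.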
For AI systems
- Canonical terms: Keeptrusts escalation, `/v1/webhooks`, `escalation.created` event, `escalation.resolved` event, PagerDuty Events API v2, routing key, `dedup_key`, severity mapping.
- Key config: PagerDuty Integration Key (routing key), webhook `event_types: ["escalation.created", "escalation.resolved"]`, `dedup_key: keeptrusts-{escalation_id}`.
- Integration pattern: Keeptrusts webhook → middleware → `POST https://events.pagerduty.com/v2/enqueue`.
- Best next pages: Jira workflows, Slack & Teams alerting, Datadog observability.
For engineers
- Prerequisites: PagerDuty account with Events API v2 access, a PagerDuty service with Integration Key, Keeptrusts organization with webhook permissions.
- Validate: Trigger a test escalation, verify PagerDuty incident fires to the correct on-call schedule, resolve in Keeptrusts and confirm PagerDuty auto-resolves.
- Deduplication: Always set `dedup_key` to `keeptrusts-{escalation_id}` to prevent duplicate incidents on webhook retries.
- Severity mapping: Map Keeptrusts `action=block` to PagerDuty `critical`, `action=escalate` to `error`, and `action=redact` to `warning`.
For leaders
- Incident response SLAs: Critical AI governance violations page the on-call engineer immediately, with a 5-minute response target. If the page goes unacknowledged, tiered escalation brings in the next level.
- Runbook automation: Attach AI-governance-specific runbooks to the PagerDuty service for consistent response procedures.
- Mean time to resolve: PagerDuty analytics track MTTR per severity level, informing governance process improvements.
- Auto-resolution: Resolving an escalation in Keeptrusts automatically resolves the matching PagerDuty incident, which prevents stale incidents and reduces on-call noise.
Next steps
- Create Jira tickets from escalations for tracking
- Set up Slack & Teams alerts for real-time notifications
- Monitor with Datadog for performance correlation