
End-to-End Test Automation for AI Governance

A complete AI governance testing strategy spans the full stack — from gateway policy enforcement through the control-plane API to the management console and notification webhooks. This guide covers building an automated test framework that validates the entire governance pipeline.

Use this page when

  • You are building end-to-end test automation spanning gateway, API, console, and webhooks
  • You need API contract tests, Playwright console E2E tests, or webhook delivery verification
  • You want to set up CI pipelines that run all governance test surfaces in parallel

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Test Framework Architecture

The Keeptrusts governance platform has four testable surfaces:

┌───────────────────────────────────────────────┐
│ Console (Playwright E2E)                      │
│ ├─ Dashboard displays events correctly        │
│ ├─ Policy configuration UI works              │
│ └─ Escalation workflows complete              │
├───────────────────────────────────────────────┤
│ API (Contract Tests)                          │
│ ├─ Events API returns correct schemas         │
│ ├─ Auth flows enforce access control          │
│ └─ Export jobs produce valid artifacts        │
├───────────────────────────────────────────────┤
│ Gateway (Policy Tests)                        │
│ ├─ Policies block/redact/escalate correctly   │
│ ├─ Rate limits and spend controls enforce     │
│ └─ Events emitted for every decision          │
├───────────────────────────────────────────────┤
│ Webhooks (Notification Tests)                 │
│ ├─ Escalation webhooks fire on trigger        │
│ ├─ Payload schemas match contract             │
│ └─ Retry logic handles failures               │
└───────────────────────────────────────────────┘
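The gateway surface has no dedicated section on this page (its policy tests live in scripts referenced by the CI workflow below), but the pattern mirrors the other surfaces: send a request through a running gateway and assert that the HTTP status encodes the decision. A minimal sketch, assuming a blocked request returns HTTP 409 and an allowed one HTTP 200, and a local gateway on port 41002 (the port used elsewhere in this guide); the `check_decision` helper is illustrative:

```shell
#!/bin/bash
# Sketch of a gateway policy test. Assumes block → HTTP 409, allow → HTTP 200.

GATEWAY="${GATEWAY:-http://localhost:41002}"

# Map an expected decision plus an observed status to a verdict
check_decision() {
  local EXPECTED="$1"   # "block" or "allow"
  local STATUS="$2"
  case "$EXPECTED:$STATUS" in
    block:409|allow:200) echo "PASS" ;;
    *) echo "FAIL (expected $EXPECTED, got HTTP $STATUS)" ;;
  esac
}

# A benign prompt should pass through untouched
STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$GATEWAY/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Summarize this memo"}]}')
echo "allow-benign: $(check_decision allow "$STATUS")"
```

The same helper covers the block case: send a prompt that a policy should reject and assert `check_decision block "$STATUS"` passes.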

API Contract Tests

API contract tests verify that the control-plane API returns responses matching the documented schema. These catch breaking changes before they reach consumers.

Setting Up Contract Tests

#!/bin/bash
# test-api-contracts.sh — verify API response schemas

API_URL="http://localhost:8080"
API_TOKEN="$TEST_API_TOKEN"
FAILURES=0

# Helper functions
assert_status() {
  local NAME="$1"
  local EXPECTED="$2"
  local ACTUAL="$3"

  if [ "$EXPECTED" = "$ACTUAL" ]; then
    echo "PASS [$NAME]: HTTP $ACTUAL"
  else
    echo "FAIL [$NAME]: Expected HTTP $EXPECTED, got $ACTUAL"
    FAILURES=$((FAILURES + 1))
  fi
}

assert_json_field() {
  local NAME="$1"
  local BODY="$2"
  local FIELD="$3"

  VALUE=$(echo "$BODY" | jq -r "$FIELD")
  if [ "$VALUE" != "null" ] && [ -n "$VALUE" ]; then
    echo "PASS [$NAME]: Field $FIELD present"
  else
    echo "FAIL [$NAME]: Field $FIELD missing or null"
    FAILURES=$((FAILURES + 1))
  fi
}

# Test: Login endpoint
echo "=== Auth Contract ==="
RESPONSE=$(curl -s -w "\n%{http_code}" "$API_URL/v1/auth/login" \
  -H "Content-Type: application/json" \
  -d '{"email": "admin@example.com", "password": "test-password"}')
HTTP_CODE=$(echo "$RESPONSE" | tail -1)
BODY=$(echo "$RESPONSE" | head -n -1)
assert_status "auth-login" "200" "$HTTP_CODE"
assert_json_field "auth-login-token" "$BODY" ".token"

# Test: Events list endpoint
echo ""
echo "=== Events Contract ==="
RESPONSE=$(curl -s -w "\n%{http_code}" "$API_URL/v1/events?limit=5" \
  -H "Authorization: Bearer $API_TOKEN")
HTTP_CODE=$(echo "$RESPONSE" | tail -1)
BODY=$(echo "$RESPONSE" | head -n -1)
assert_status "events-list" "200" "$HTTP_CODE"
assert_json_field "events-data" "$BODY" ".data"
assert_json_field "events-pagination" "$BODY" ".pagination"

# Test: Health endpoint
echo ""
echo "=== Health Contract ==="
RESPONSE=$(curl -s -w "\n%{http_code}" "$API_URL/health")
HTTP_CODE=$(echo "$RESPONSE" | tail -1)
assert_status "health" "200" "$HTTP_CODE"

echo ""
echo "Contract test results: $FAILURES failure(s)"
[ "$FAILURES" -eq 0 ] || exit 1

Schema Validation with OpenAPI

Validate responses against the OpenAPI specification:

# Install openapi-diff for schema comparison
npm install -g openapi-diff

# Compare the committed spec with one captured from the live API
openapi-diff docs/implementation/openapi-mvp.yaml live-api-spec.yaml

Console E2E with Playwright

The management console is tested end-to-end using Playwright against a mock backend.

Test Environment Setup

# Start the mock backend (simulates API + gateway)
npx tsx console/scripts/mock-server.ts &
MOCK_PID=$!

# Start the console in development mode
cd console && npm run dev &
CONSOLE_PID=$!

# Wait for both to be ready
sleep 10
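The fixed `sleep 10` is either slow or flaky depending on the machine. Polling until each service answers is more robust — a sketch, using the API health endpoint from the contract tests above (`wait_for` is a hypothetical helper; substitute your own service URLs):

```shell
#!/bin/bash
# Poll a URL until it answers instead of sleeping a fixed interval

wait_for() {
  local URL="$1"
  local RETRIES="${2:-30}"
  for _ in $(seq "$RETRIES"); do
    if curl -sf -o /dev/null "$URL"; then
      echo "ready: $URL"
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for $URL" >&2
  return 1
}

wait_for "http://localhost:8080/health" 5 || echo "API not reachable (expected if it is not running)"
```

Call `wait_for` once per service before starting the test run; a nonzero return means the environment never came up and the suite should abort early.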

Playwright Test Examples

// tests/e2e/events-dashboard.spec.ts
import { test, expect } from '@playwright/test';

test.describe('Events Dashboard', () => {
  test('displays recent events', async ({ page }) => {
    await page.goto('/events');

    // Wait for events table to load
    await expect(page.getByRole('table')).toBeVisible();

    // Verify event rows are present (header row plus data rows)
    const rowCount = await page.getByRole('row').count();
    expect(rowCount).toBeGreaterThan(1);

    // Verify key columns are visible
    await expect(page.getByText('Decision')).toBeVisible();
    await expect(page.getByText('Model')).toBeVisible();
    await expect(page.getByText('Timestamp')).toBeVisible();
  });

  test('filters events by decision type', async ({ page }) => {
    await page.goto('/events');

    // Apply filter for blocked events
    await page.getByRole('combobox', { name: /decision/i }).click();
    await page.getByRole('option', { name: 'Blocked' }).click();

    // Verify all visible events show blocked status
    const decisionCells = page.locator('[data-testid="decision-cell"]');
    for (const cell of await decisionCells.all()) {
      await expect(cell).toHaveText('blocked');
    }
  });

  test('event detail shows policy information', async ({ page }) => {
    await page.goto('/events');

    // Click first event row
    await page.getByRole('row').nth(1).click();

    // Verify detail panel shows policy information
    await expect(page.getByText('Policies Applied')).toBeVisible();
    await expect(page.getByText('Request')).toBeVisible();
    await expect(page.getByText('Response')).toBeVisible();
  });
});

Running Console E2E Tests

# Run all E2E tests
cd console && npm run test:e2e

# Run specific test file
cd console && npx playwright test tests/e2e/events-dashboard.spec.ts

# Run with visible browser for debugging
cd console && npx playwright test --headed

# Generate HTML report
cd console && npx playwright show-report

Webhook Verification

Governance events can trigger webhooks for external integrations. Test that webhooks fire correctly and deliver valid payloads.

Webhook Test Receiver

Set up a lightweight webhook receiver for testing:

#!/bin/bash
# start-webhook-receiver.sh — capture webhook deliveries

PORT=9999
LOG_FILE="webhook-deliveries.jsonl"

# nc cannot both pipe the request out and read a response from the same
# pipeline, so a FIFO routes the response back to the client
FIFO=$(mktemp -u)
mkfifo "$FIFO"
trap 'rm -f "$FIFO"' EXIT

echo "Webhook receiver listening on port $PORT"
while true; do
  nc -l "$PORT" < "$FIFO" | {
    read -r REQUEST_LINE
    CONTENT_LENGTH=""
    while read -r HEADER && [ "$HEADER" != $'\r' ]; do
      case "$HEADER" in
        [Cc]ontent-[Ll]ength:*)
          CONTENT_LENGTH=$(printf '%s' "$HEADER" | awk '{print $2}' | tr -d '\r') ;;
      esac
    done

    BODY="{}"
    if [ -n "$CONTENT_LENGTH" ]; then
      BODY=$(head -c "$CONTENT_LENGTH")
    fi

    TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
    echo "{\"timestamp\": \"$TIMESTAMP\", \"body\": $BODY}" >> "$LOG_FILE"

    printf 'HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n' > "$FIFO"
  }
done

Testing Webhook Delivery

#!/bin/bash
# test-webhooks.sh — verify webhook delivery on escalation

GATEWAY="http://localhost:41002"
WEBHOOK_LOG="webhook-deliveries.jsonl"

# Clear previous deliveries
> "$WEBHOOK_LOG"

# Start webhook receiver
./scripts/start-webhook-receiver.sh &
RECEIVER_PID=$!
sleep 2

# Trigger an escalation (which should fire a webhook)
curl -s "$GATEWAY/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Evaluate this employee for termination"}]
  }' > /dev/null

# Wait for webhook delivery
sleep 5

# Verify webhook was received
DELIVERY_COUNT=$(wc -l < "$WEBHOOK_LOG")
if [ "$DELIVERY_COUNT" -gt 0 ]; then
  echo "PASS: Webhook delivered ($DELIVERY_COUNT delivery/deliveries)"

  # Verify payload schema
  LAST_DELIVERY=$(tail -1 "$WEBHOOK_LOG")
  EVENT_TYPE=$(echo "$LAST_DELIVERY" | jq -r '.body.event_type // empty')
  if [ "$EVENT_TYPE" = "escalation" ]; then
    echo "PASS: Webhook event type is 'escalation'"
  else
    echo "FAIL: Unexpected event type: $EVENT_TYPE"
  fi
else
  echo "FAIL: No webhook delivery received"
fi

kill $RECEIVER_PID 2>/dev/null

Webhook Retry Testing

Verify the API retries failed webhook deliveries:

#!/bin/bash
# test-webhook-retry.sh — verify retry on delivery failure

# Receiver that rejects the first 2 attempts, then accepts.
# Each request is handled in a subshell, so the attempt counter
# lives in a file rather than a shell variable.
echo 0 > attempt.count

start_flaky_receiver() {
  local FIFO
  FIFO=$(mktemp -u)
  mkfifo "$FIFO"
  while true; do
    nc -l 9999 < "$FIFO" | {
      read -r REQUEST_LINE
      CONTENT_LENGTH=""
      while read -r HEADER && [ "$HEADER" != $'\r' ]; do
        case "$HEADER" in
          [Cc]ontent-[Ll]ength:*)
            CONTENT_LENGTH=$(printf '%s' "$HEADER" | awk '{print $2}' | tr -d '\r') ;;
        esac
      done
      [ -n "$CONTENT_LENGTH" ] && head -c "$CONTENT_LENGTH" > /dev/null

      ATTEMPT=$(( $(cat attempt.count) + 1 ))
      echo "$ATTEMPT" > attempt.count
      if [ "$ATTEMPT" -le 2 ]; then
        printf 'HTTP/1.1 500 Internal Server Error\r\nContent-Length: 0\r\n\r\n' > "$FIFO"
      else
        printf 'HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n' > "$FIFO"
        echo "Accepted on attempt $ATTEMPT" >> webhook-retry.log
      fi
    }
  done
}

start_flaky_receiver &
RECEIVER_PID=$!

# Trigger escalation event
# ... (same as above)

# Wait for retries
sleep 30

# Check if delivery eventually succeeded
if [ -f webhook-retry.log ]; then
  echo "PASS: Webhook delivery succeeded after retries"
else
  echo "FAIL: Webhook never delivered after retries"
fi

kill $RECEIVER_PID 2>/dev/null

CI Pipeline Integration

Complete CI Workflow

# .github/workflows/governance-e2e.yml
name: Governance E2E Tests

on:
  pull_request:
    paths:
      - 'api/**'
      - 'cli/**'
      - 'console/**'
      - 'policy-config.yaml'

jobs:
  api-contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start API
        run: |
          docker compose -f docker-compose.test.yml up -d
          cd api && cargo run --release &
          sleep 10
      - name: Run contract tests
        run: ./scripts/test-api-contracts.sh

  gateway-policies:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start mock gateway
        run: |
          # Start the gateway after your fixture-backed mock upstream is ready.
          kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml &
          sleep 3
      - name: Run policy tests
        run: |
          ./scripts/test-policy-chain.sh
          ./scripts/test-dlp-patterns.sh
          ./scripts/test-prompt-injection.sh

  console-e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: cd console && npm ci
      - name: Install Playwright browsers
        run: cd console && npx playwright install --with-deps
      - name: Start mock backend
        run: |
          npx tsx console/scripts/mock-server.ts &
          sleep 5
      - name: Run E2E tests
        run: cd console && npm run test:e2e
      - name: Upload test report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: console/playwright-report/

  webhook-verification:
    runs-on: ubuntu-latest
    needs: [api-contracts, gateway-policies]
    steps:
      - uses: actions/checkout@v4
      - name: Start services
        run: docker compose up -d
      - name: Run webhook tests
        run: ./scripts/test-webhooks.sh

Test Execution Order

For efficiency, run independent test suites in parallel and dependent suites sequentially:

┌─────────────────┐   ┌──────────────────┐   ┌─────────────────┐
│  API Contracts  │   │ Gateway Policies │   │   Console E2E   │
│   (parallel)    │   │    (parallel)    │   │   (parallel)    │
└────────┬────────┘   └─────────┬────────┘   └─────────────────┘
         │                      │
         └──────────┬───────────┘
                    │
        ┌───────────▼───────────┐
        │ Webhook Verification  │
        │ (depends on API +     │
        │  gateway passing)     │
        └───────────────────────┘
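The same fan-out can be mirrored locally with background jobs and `wait`. A sketch — `run_gated` is a hypothetical helper, and the script names are the ones used in the CI workflow above:

```shell
#!/bin/bash
# Run two independent suites in parallel; run a third only if both pass

run_gated() {
  local A="$1" B="$2" GATED="$3"
  $A &
  local PA=$!
  $B &
  local PB=$!

  local FAILED=0
  wait "$PA" || FAILED=1
  wait "$PB" || FAILED=1

  if [ "$FAILED" -eq 0 ]; then
    $GATED
  else
    echo "skipping gated suite: an upstream suite failed" >&2
    return 1
  fi
}

# Example wiring (console E2E has no dependents, so it simply runs alongside):
# (cd console && npm run test:e2e) &
# run_gated ./scripts/test-api-contracts.sh ./scripts/test-policy-chain.sh \
#           ./scripts/test-webhooks.sh
```

Because the gated suite only runs when both upstream PIDs exit zero, a local run reproduces the `needs:` semantics of the CI workflow.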

Test Reporting

Unified Test Report

Aggregate results from all test surfaces:

#!/bin/bash
# generate-test-report.sh

REPORT="test-report.md"

cat > "$REPORT" << EOF
# Governance Test Report
Generated: $(date -u +"%Y-%m-%dT%H:%M:%SZ")

## Summary
| Suite | Status | Duration |
|-------|--------|----------|
EOF

# Append results from each suite
for RESULT_FILE in test-results/*.json; do
  SUITE=$(jq -r '.suite' "$RESULT_FILE")
  STATUS=$(jq -r '.status' "$RESULT_FILE")
  DURATION=$(jq -r '.duration_seconds' "$RESULT_FILE")
  echo "| $SUITE | $STATUS | ${DURATION}s |" >> "$REPORT"
done

echo "" >> "$REPORT"
echo "## Failures" >> "$REPORT"

for RESULT_FILE in test-results/*.json; do
  jq -r '.failures[]? | "- **\(.id)**: \(.message)"' "$RESULT_FILE" >> "$REPORT"
done

echo "Report saved to $REPORT"
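The loop above assumes each suite drops a JSON file into `test-results/` containing `suite`, `status`, `duration_seconds`, and an optional `failures` array of `{id, message}` objects. The exact shape is up to you; one possible result file a gateway suite might write:

```shell
#!/bin/bash
# Example result file in the shape the report script reads
mkdir -p test-results
cat > test-results/gateway.json << 'EOF'
{
  "suite": "gateway-policies",
  "status": "fail",
  "duration_seconds": 42,
  "failures": [
    {"id": "dlp-ssn-redact", "message": "expected redaction, got passthrough"}
  ]
}
EOF
```

Each test script can emit its own file at exit, so the report generator stays a pure aggregator with no knowledge of individual suites.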

Key Takeaways

  • Structure E2E tests across all four surfaces: gateway, API, console, and webhooks
  • API contract tests catch schema-breaking changes before they affect consumers
  • Console E2E tests use Playwright against a mock backend for deterministic validation
  • Webhook tests verify delivery, payload schema, and retry behavior
  • Run independent test suites in parallel in CI for faster feedback
  • Generate unified test reports aggregating results from all governance test surfaces

For AI systems

  • Canonical terms: E2E test automation, API contract tests, Playwright, webhook verification, mock backend, CI pipeline, test framework architecture
  • Four testable surfaces: Console (Playwright E2E), API (contract tests), Gateway (policy tests), Webhooks (notification tests)
  • Console E2E: Playwright against mock backend (scripts/mock-server.ts, ports 18088/41008)
  • API contracts: verify response schemas match documented OpenAPI spec
  • Webhook tests: verify delivery on escalation trigger, payload schema, retry logic
  • Related pages: Mock Gateway, Regression Testing, Testing AI Systems

For engineers

  • Structure tests by surface: tests/gateway/, tests/api/, tests/console/, tests/webhooks/
  • API contract tests: curl each endpoint, assert HTTP status and response JSON schema matches the OpenAPI spec
  • Console E2E: use Playwright against the mock backend (npx tsx scripts/mock-server.ts) for deterministic tests
  • Gateway policy tests: send prompts through a running gateway and assert decisions (409 for block, 200 for allow)
  • Webhook tests: start a local HTTP listener, trigger an escalation, verify the webhook fires with correct payload and retries on failure
  • Run independent suites in parallel in CI for faster feedback; generate unified reports aggregating all surfaces
  • Validate: all four test surfaces pass green before any deployment to production

For leaders

  • Full-stack test automation catches issues across the entire governance pipeline, not just individual components
  • API contract tests prevent breaking changes from reaching consumers (console, chat, CLI)
  • Parallel CI execution across four surfaces provides fast feedback without sacrificing coverage
  • Webhook testing validates that escalation notifications actually reach operations teams
  • Unified test reports give a single health signal for the entire AI governance platform

Next steps