
End-to-End Test Automation for AI Governance

A complete AI governance testing strategy spans the full stack — from gateway policy enforcement through the control-plane API to the management console and notification webhooks. This guide covers building an automated test framework that validates the entire governance pipeline.

Use this page when

  • You are building end-to-end test automation spanning gateway, API, console, and webhooks
  • You need API contract tests, Playwright console E2E tests, or webhook delivery verification
  • You want to set up CI pipelines that run all governance test surfaces in parallel

Primary audience

  • Primary: Technical Engineers
  • Secondary: AI Agents, Technical Leaders

Test Framework Architecture

The Keeptrusts governance platform has four testable surfaces:

┌───────────────────────────────────────────────┐
│ Console (Playwright E2E)                      │
│ ├─ Dashboard displays events correctly        │
│ ├─ Policy configuration UI works              │
│ └─ Escalation workflows complete              │
├───────────────────────────────────────────────┤
│ API (Contract Tests)                          │
│ ├─ Events API returns correct schemas         │
│ ├─ Auth flows enforce access control          │
│ └─ Export jobs produce valid artifacts        │
├───────────────────────────────────────────────┤
│ Gateway (Policy Tests)                        │
│ ├─ Policies block/redact/escalate correctly   │
│ ├─ Rate limits and spend controls enforce     │
│ └─ Events emitted for every decision          │
├───────────────────────────────────────────────┤
│ Webhooks (Notification Tests)                 │
│ ├─ Escalation webhooks fire on trigger        │
│ ├─ Payload schemas match contract             │
│ └─ Retry logic handles failures               │
└───────────────────────────────────────────────┘
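The gateway surface has no dedicated section on this page (its policy tests live in scripts referenced by the CI workflow below), but the pattern mirrors the other surfaces: send a request through a running gateway and assert that the HTTP status encodes the decision. A minimal sketch, assuming a blocked request returns HTTP 409 and an allowed one HTTP 200, and a local gateway on port 41002 (the port used elsewhere in this guide); the `check_decision` helper is illustrative:

```shell
#!/bin/bash
# Sketch of a gateway policy test. Assumes block → HTTP 409, allow → HTTP 200.

GATEWAY="${GATEWAY:-http://localhost:41002}"

# Map an expected decision plus an observed status to a verdict
check_decision() {
  local EXPECTED="$1"   # "block" or "allow"
  local STATUS="$2"
  case "$EXPECTED:$STATUS" in
    block:409|allow:200) echo "PASS" ;;
    *) echo "FAIL (expected $EXPECTED, got HTTP $STATUS)" ;;
  esac
}

# A benign prompt should pass through untouched
STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$GATEWAY/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Summarize this memo"}]}')
echo "allow-benign: $(check_decision allow "$STATUS")"
```

The same helper covers the block case: send a prompt that a policy should reject and assert `check_decision block "$STATUS"` passes.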

API Contract Tests

API contract tests verify that the control-plane API returns responses matching the documented schema. These catch breaking changes before they reach consumers.

Setting Up Contract Tests

#!/bin/bash
# test-api-contracts.sh — verify API response schemas

API_URL="http://localhost:8080"
API_TOKEN="$TEST_API_TOKEN"
FAILURES=0

# Helper functions
assert_status() {
  local NAME="$1"
  local EXPECTED="$2"
  local ACTUAL="$3"

  if [ "$EXPECTED" = "$ACTUAL" ]; then
    echo "PASS [$NAME]: HTTP $ACTUAL"
  else
    echo "FAIL [$NAME]: Expected HTTP $EXPECTED, got $ACTUAL"
    FAILURES=$((FAILURES + 1))
  fi
}

assert_json_field() {
  local NAME="$1"
  local BODY="$2"
  local FIELD="$3"

  VALUE=$(echo "$BODY" | jq -r "$FIELD")
  if [ "$VALUE" != "null" ] && [ -n "$VALUE" ]; then
    echo "PASS [$NAME]: Field $FIELD present"
  else
    echo "FAIL [$NAME]: Field $FIELD missing or null"
    FAILURES=$((FAILURES + 1))
  fi
}

# Test: Login endpoint
echo "=== Auth Contract ==="
RESPONSE=$(curl -s -w "\n%{http_code}" "$API_URL/v1/auth/login" \
  -H "Content-Type: application/json" \
  -d '{"email": "admin@example.com", "password": "test-password"}')
HTTP_CODE=$(echo "$RESPONSE" | tail -1)
BODY=$(echo "$RESPONSE" | head -n -1)
assert_status "auth-login" "200" "$HTTP_CODE"
assert_json_field "auth-login-token" "$BODY" ".token"

# Test: Events list endpoint
echo ""
echo "=== Events Contract ==="
RESPONSE=$(curl -s -w "\n%{http_code}" "$API_URL/v1/events?limit=5" \
  -H "Authorization: Bearer $API_TOKEN")
HTTP_CODE=$(echo "$RESPONSE" | tail -1)
BODY=$(echo "$RESPONSE" | head -n -1)
assert_status "events-list" "200" "$HTTP_CODE"
assert_json_field "events-data" "$BODY" ".data"
assert_json_field "events-pagination" "$BODY" ".pagination"

# Test: Health endpoint
echo ""
echo "=== Health Contract ==="
RESPONSE=$(curl -s -w "\n%{http_code}" "$API_URL/health")
HTTP_CODE=$(echo "$RESPONSE" | tail -1)
assert_status "health" "200" "$HTTP_CODE"

echo ""
echo "Contract test results: $FAILURES failure(s)"
[ "$FAILURES" -eq 0 ] || exit 1

Schema Validation with OpenAPI

Validate responses against the OpenAPI specification:

# Install openapi-diff for schema comparison
npm install -g openapi-diff

# Compare the committed spec with one captured from the live API
openapi-diff docs/implementation/openapi-mvp.yaml live-api-spec.yaml

Console E2E with Playwright

The management console is tested end-to-end using Playwright against a mock backend.

Test Environment Setup

# Start the mock backend (simulates API + gateway)
npx tsx console/scripts/mock-server.ts &
MOCK_PID=$!

# Start the console in development mode
cd console && npm run dev &
CONSOLE_PID=$!

# Wait for both to be ready
sleep 10
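The fixed `sleep 10` is either slow or flaky depending on the machine. Polling until each service answers is more robust — a sketch, using the API health endpoint from the contract tests above (`wait_for` is a hypothetical helper; substitute your own service URLs):

```shell
#!/bin/bash
# Poll a URL until it answers instead of sleeping a fixed interval

wait_for() {
  local URL="$1"
  local RETRIES="${2:-30}"
  for _ in $(seq "$RETRIES"); do
    if curl -sf -o /dev/null "$URL"; then
      echo "ready: $URL"
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for $URL" >&2
  return 1
}

wait_for "http://localhost:8080/health" 5 || echo "API not reachable (expected if it is not running)"
```

Call `wait_for` once per service before starting the test run; a nonzero return means the environment never came up and the suite should abort early.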

Playwright Test Examples

// tests/e2e/events-dashboard.spec.ts
import { test, expect } from '@playwright/test';

test.describe('Events Dashboard', () => {
  test('displays recent events', async ({ page }) => {
    await page.goto('/events');

    // Wait for events table to load
    await expect(page.getByRole('table')).toBeVisible();

    // Verify event rows are present (header row plus data rows)
    const rowCount = await page.getByRole('row').count();
    expect(rowCount).toBeGreaterThan(1);

    // Verify key columns are visible
    await expect(page.getByText('Decision')).toBeVisible();
    await expect(page.getByText('Model')).toBeVisible();
    await expect(page.getByText('Timestamp')).toBeVisible();
  });

  test('filters events by decision type', async ({ page }) => {
    await page.goto('/events');

    // Apply filter for blocked events
    await page.getByRole('combobox', { name: /decision/i }).click();
    await page.getByRole('option', { name: 'Blocked' }).click();

    // Verify all visible events show blocked status
    const decisionCells = page.locator('[data-testid="decision-cell"]');
    for (const cell of await decisionCells.all()) {
      await expect(cell).toHaveText('blocked');
    }
  });

  test('event detail shows policy information', async ({ page }) => {
    await page.goto('/events');

    // Click first event row
    await page.getByRole('row').nth(1).click();

    // Verify detail panel shows policy information
    await expect(page.getByText('Policies Applied')).toBeVisible();
    await expect(page.getByText('Request')).toBeVisible();
    await expect(page.getByText('Response')).toBeVisible();
  });
});

Running Console E2E Tests

# Run all E2E tests
cd console && npm run test:e2e

# Run specific test file
cd console && npx playwright test tests/e2e/events-dashboard.spec.ts

# Run with visible browser for debugging
cd console && npx playwright test --headed

# Generate HTML report
cd console && npx playwright show-report

Webhook Verification

Governance events can trigger webhooks for external integrations. Test that webhooks fire correctly and deliver valid payloads.

Webhook Test Receiver

Set up a lightweight webhook receiver for testing:

#!/bin/bash
# start-webhook-receiver.sh — capture webhook deliveries

PORT=9999
LOG_FILE="webhook-deliveries.jsonl"

# nc cannot both pipe the request out and read a response from the same
# pipeline, so a FIFO routes the response back to the client
FIFO=$(mktemp -u)
mkfifo "$FIFO"
trap 'rm -f "$FIFO"' EXIT

echo "Webhook receiver listening on port $PORT"
while true; do
  nc -l "$PORT" < "$FIFO" | {
    read -r REQUEST_LINE
    CONTENT_LENGTH=""
    while read -r HEADER && [ "$HEADER" != $'\r' ]; do
      case "$HEADER" in
        [Cc]ontent-[Ll]ength:*)
          CONTENT_LENGTH=$(printf '%s' "$HEADER" | awk '{print $2}' | tr -d '\r') ;;
      esac
    done

    BODY="{}"
    if [ -n "$CONTENT_LENGTH" ]; then
      BODY=$(head -c "$CONTENT_LENGTH")
    fi

    TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
    echo "{\"timestamp\": \"$TIMESTAMP\", \"body\": $BODY}" >> "$LOG_FILE"

    printf 'HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n' > "$FIFO"
  }
done

Testing Webhook Delivery

#!/bin/bash
# test-webhooks.sh — verify webhook delivery on escalation

GATEWAY="http://localhost:41002"
WEBHOOK_LOG="webhook-deliveries.jsonl"

# Clear previous deliveries
> "$WEBHOOK_LOG"

# Start webhook receiver
./scripts/start-webhook-receiver.sh &
RECEIVER_PID=$!
sleep 2

# Trigger an escalation (which should fire a webhook)
curl -s "$GATEWAY/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Evaluate this employee for termination"}]
  }' > /dev/null

# Wait for webhook delivery
sleep 5

# Verify webhook was received
DELIVERY_COUNT=$(wc -l < "$WEBHOOK_LOG")
if [ "$DELIVERY_COUNT" -gt 0 ]; then
  echo "PASS: Webhook delivered ($DELIVERY_COUNT delivery/deliveries)"

  # Verify payload schema
  LAST_DELIVERY=$(tail -1 "$WEBHOOK_LOG")
  EVENT_TYPE=$(echo "$LAST_DELIVERY" | jq -r '.body.event_type // empty')
  if [ "$EVENT_TYPE" = "escalation" ]; then
    echo "PASS: Webhook event type is 'escalation'"
  else
    echo "FAIL: Unexpected event type: $EVENT_TYPE"
  fi
else
  echo "FAIL: No webhook delivery received"
fi

kill $RECEIVER_PID 2>/dev/null

Webhook Retry Testing

Verify the API retries failed webhook deliveries:

#!/bin/bash
# test-webhook-retry.sh — verify retry on delivery failure

# Receiver that rejects the first 2 attempts, then accepts.
# Each request is handled in a subshell, so the attempt counter
# lives in a file rather than a shell variable.
echo 0 > attempt.count

start_flaky_receiver() {
  local FIFO
  FIFO=$(mktemp -u)
  mkfifo "$FIFO"
  while true; do
    nc -l 9999 < "$FIFO" | {
      read -r REQUEST_LINE
      CONTENT_LENGTH=""
      while read -r HEADER && [ "$HEADER" != $'\r' ]; do
        case "$HEADER" in
          [Cc]ontent-[Ll]ength:*)
            CONTENT_LENGTH=$(printf '%s' "$HEADER" | awk '{print $2}' | tr -d '\r') ;;
        esac
      done
      [ -n "$CONTENT_LENGTH" ] && head -c "$CONTENT_LENGTH" > /dev/null

      ATTEMPT=$(( $(cat attempt.count) + 1 ))
      echo "$ATTEMPT" > attempt.count
      if [ "$ATTEMPT" -le 2 ]; then
        printf 'HTTP/1.1 500 Internal Server Error\r\nContent-Length: 0\r\n\r\n' > "$FIFO"
      else
        printf 'HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n' > "$FIFO"
        echo "Accepted on attempt $ATTEMPT" >> webhook-retry.log
      fi
    }
  done
}

start_flaky_receiver &
RECEIVER_PID=$!

# Trigger escalation event
# ... (same as above)

# Wait for retries
sleep 30

# Check if delivery eventually succeeded
if [ -f webhook-retry.log ]; then
  echo "PASS: Webhook delivery succeeded after retries"
else
  echo "FAIL: Webhook never delivered after retries"
fi

kill $RECEIVER_PID 2>/dev/null

CI Pipeline Integration

Complete CI Workflow

# .github/workflows/governance-e2e.yml
name: Governance E2E Tests

on:
  pull_request:
    paths:
      - 'api/**'
      - 'cli/**'
      - 'console/**'
      - 'policy-config.yaml'

jobs:
  api-contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start API
        run: |
          docker compose -f docker-compose.test.yml up -d
          cd api && cargo run --release &
          sleep 10
      - name: Run contract tests
        run: ./scripts/test-api-contracts.sh

  gateway-policies:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start mock gateway
        run: |
          # Start the gateway after your fixture-backed mock upstream is ready.
          kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml &
          sleep 3
      - name: Run policy tests
        run: |
          ./scripts/test-policy-chain.sh
          ./scripts/test-dlp-patterns.sh
          ./scripts/test-prompt-injection.sh

  console-e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: cd console && npm ci
      - name: Install Playwright browsers
        run: cd console && npx playwright install --with-deps
      - name: Start mock backend
        run: |
          npx tsx console/scripts/mock-server.ts &
          sleep 5
      - name: Run E2E tests
        run: cd console && npm run test:e2e
      - name: Upload test report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: console/playwright-report/

  webhook-verification:
    runs-on: ubuntu-latest
    needs: [api-contracts, gateway-policies]
    steps:
      - uses: actions/checkout@v4
      - name: Start services
        run: docker compose up -d
      - name: Run webhook tests
        run: ./scripts/test-webhooks.sh

Test Execution Order

For efficiency, run independent test suites in parallel and dependent suites sequentially:

┌─────────────────┐   ┌──────────────────┐   ┌─────────────────┐
│  API Contracts  │   │ Gateway Policies │   │   Console E2E   │
│   (parallel)    │   │    (parallel)    │   │   (parallel)    │
└────────┬────────┘   └─────────┬────────┘   └─────────────────┘
         │                      │
         └──────────┬───────────┘
                    │
        ┌───────────▼───────────┐
        │ Webhook Verification  │
        │ (depends on API +     │
        │  gateway passing)     │
        └───────────────────────┘
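The same fan-out can be mirrored locally with background jobs and `wait`. A sketch — `run_gated` is a hypothetical helper, and the script names are the ones used in the CI workflow above:

```shell
#!/bin/bash
# Run two independent suites in parallel; run a third only if both pass

run_gated() {
  local A="$1" B="$2" GATED="$3"
  $A &
  local PA=$!
  $B &
  local PB=$!

  local FAILED=0
  wait "$PA" || FAILED=1
  wait "$PB" || FAILED=1

  if [ "$FAILED" -eq 0 ]; then
    $GATED
  else
    echo "skipping gated suite: an upstream suite failed" >&2
    return 1
  fi
}

# Example wiring (console E2E has no dependents, so it simply runs alongside):
# (cd console && npm run test:e2e) &
# run_gated ./scripts/test-api-contracts.sh ./scripts/test-policy-chain.sh \
#           ./scripts/test-webhooks.sh
```

Because the gated suite only runs when both upstream PIDs exit zero, a local run reproduces the `needs:` semantics of the CI workflow.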

Test Reporting

Unified Test Report

Aggregate results from all test surfaces:

#!/bin/bash
# generate-test-report.sh

REPORT="test-report.md"

cat > "$REPORT" << EOF
# Governance Test Report
Generated: $(date -u +"%Y-%m-%dT%H:%M:%SZ")

## Summary
| Suite | Status | Duration |
|-------|--------|----------|
EOF

# Append results from each suite
for RESULT_FILE in test-results/*.json; do
  SUITE=$(jq -r '.suite' "$RESULT_FILE")
  STATUS=$(jq -r '.status' "$RESULT_FILE")
  DURATION=$(jq -r '.duration_seconds' "$RESULT_FILE")
  echo "| $SUITE | $STATUS | ${DURATION}s |" >> "$REPORT"
done

echo "" >> "$REPORT"
echo "## Failures" >> "$REPORT"

for RESULT_FILE in test-results/*.json; do
  jq -r '.failures[]? | "- **\(.id)**: \(.message)"' "$RESULT_FILE" >> "$REPORT"
done

echo "Report saved to $REPORT"
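The loop above assumes each suite drops a JSON file into `test-results/` containing `suite`, `status`, `duration_seconds`, and an optional `failures` array of `{id, message}` objects. The exact shape is up to you; one possible result file a gateway suite might write:

```shell
#!/bin/bash
# Example result file in the shape the report script reads
mkdir -p test-results
cat > test-results/gateway.json << 'EOF'
{
  "suite": "gateway-policies",
  "status": "fail",
  "duration_seconds": 42,
  "failures": [
    {"id": "dlp-ssn-redact", "message": "expected redaction, got passthrough"}
  ]
}
EOF
```

Each test script can emit its own file at exit, so the report generator stays a pure aggregator with no knowledge of individual suites.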

Key Takeaways

  • Structure E2E tests across all four surfaces: gateway, API, console, and webhooks
  • API contract tests catch schema-breaking changes before they affect consumers
  • Console E2E tests use Playwright against a mock backend for deterministic validation
  • Webhook tests verify delivery, payload schema, and retry behavior
  • Run independent test suites in parallel in CI for faster feedback
  • Generate unified test reports aggregating results from all governance test surfaces

For AI systems

  • Canonical terms: E2E test automation, API contract tests, Playwright, webhook verification, mock backend, CI pipeline, test framework architecture
  • Four testable surfaces: Console (Playwright E2E), API (contract tests), Gateway (policy tests), Webhooks (notification tests)
  • Console E2E: Playwright against mock backend (scripts/mock-server.ts, ports 18088/41008)
  • API contracts: verify response schemas match documented OpenAPI spec
  • Webhook tests: verify delivery on escalation trigger, payload schema, retry logic
  • Related pages: Mock Gateway, Regression Testing, Testing AI Systems

For engineers

  • Structure tests by surface: tests/gateway/, tests/api/, tests/console/, tests/webhooks/
  • API contract tests: curl each endpoint, assert HTTP status and response JSON schema matches the OpenAPI spec
  • Console E2E: use Playwright against the mock backend (npx tsx scripts/mock-server.ts) for deterministic tests
  • Gateway policy tests: send prompts through a running gateway and assert decisions (409 for block, 200 for allow)
  • Webhook tests: start a local HTTP listener, trigger an escalation, verify the webhook fires with correct payload and retries on failure
  • Run independent suites in parallel in CI for faster feedback; generate unified reports aggregating all surfaces
  • Validate: all four test surfaces pass green before any deployment to production

For leaders

  • Full-stack test automation catches issues across the entire governance pipeline, not just individual components
  • API contract tests prevent breaking changes from reaching consumers (console, chat, CLI)
  • Parallel CI execution across four surfaces provides fast feedback without sacrificing coverage
  • Webhook testing validates that escalation notifications actually reach operations teams
  • Unified test reports give a single health signal for the entire AI governance platform

Next steps