# Testing AI-Integrated Code with Keeptrusts
AI-integrated code is notoriously hard to test because responses are non-deterministic. The Keeptrusts gateway gives you stable, policy-enforced boundaries to test against. This guide covers mocking the gateway in unit tests, validating configs in CI, contract testing, and snapshot testing AI responses.
## Use this page when

- You need to mock the Keeptrusts gateway in unit tests for deterministic, fast, free testing.
- You want to validate policy configurations in CI with `kt policy lint`.
- You are writing contract tests that verify policy enforcement without calling real LLM providers.
- You need to snapshot-test AI response shapes while handling non-deterministic content.
## Primary audience

- Primary: Developers writing tests for AI-integrated applications
- Secondary: QA engineers building test suites; DevOps engineers setting up CI pipelines for AI code
## The Testing Pyramid for AI Code

```text
        ╱   E2E   ╲        — real gateway, real provider (slow, expensive)
       ╱───────────╲
      ╱  Contract   ╲      — real gateway, mock provider (fast, deterministic)
     ╱───────────────╲
    ╱ Unit (mock gw)  ╲    — mock gateway responses (fastest)
   ╱───────────────────╲
```
Most of your tests should be unit tests with a mock gateway. Contract tests verify policy behavior. E2E tests run sparingly against real providers.
## Mocking the Gateway in Python Tests

### Using pytest with httpx

Create a lightweight mock that behaves like the gateway. The `httpx_mock` fixture comes from the `pytest-httpx` plugin:

```python
# tests/conftest.py
import pytest

MOCK_COMPLETION = {
    "id": "chatcmpl-test123",
    "object": "chat.completion",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello, how can I help?"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 10, "completion_tokens": 8, "total_tokens": 18},
}

MOCK_POLICY_BLOCK = {
    "error": {
        "type": "policy_violation",
        "message": "Request blocked: contains disallowed content",
        "policy": "block-pii-output",
        "code": "output_blocked",
    }
}

@pytest.fixture
def mock_gateway(httpx_mock):
    """Return a function that registers mock gateway responses."""
    def _mock(status=200, body=None):
        httpx_mock.add_response(
            url="http://localhost:41002/v1/chat/completions",
            json=body or MOCK_COMPLETION,
            status_code=status,
        )
    return _mock
```
### Writing a Unit Test

```python
# tests/test_assistant.py
from myapp.assistant import ask_question
from tests.conftest import MOCK_POLICY_BLOCK

def test_successful_response(mock_gateway):
    mock_gateway(status=200)
    result = ask_question("What is AI governance?")
    assert result.content == "Hello, how can I help?"

def test_policy_block_handled(mock_gateway):
    mock_gateway(status=409, body=MOCK_POLICY_BLOCK)
    result = ask_question("Show me SSN 123-45-6789")
    assert result.blocked is True
    assert "policy_violation" in result.error_type
```
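The tests above assume an `ask_question` helper that calls the gateway and folds policy blocks into a result object. Here is a minimal sketch of what such a helper might look like; the `Result` fields, `GATEWAY_URL`, and the injectable `post` parameter are illustrative assumptions, not part of any Keeptrusts SDK:

```python
# myapp/assistant.py — illustrative sketch only, not a Keeptrusts API.
# Result fields and the injectable `post` parameter are assumptions.
from dataclasses import dataclass
from typing import Optional

GATEWAY_URL = "http://localhost:41002/v1/chat/completions"  # assumed local gateway

@dataclass
class Result:
    content: Optional[str] = None
    blocked: bool = False
    error_type: str = ""

def ask_question(question: str, post=None) -> Result:
    """Send one chat completion through the gateway, mapping a 409
    policy-block envelope to a `blocked` result."""
    if post is None:
        import httpx  # real client in production; tests can inject a stub
        post = httpx.post
    resp = post(GATEWAY_URL, json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": question}],
    })
    if resp.status_code == 409:  # policy block from the gateway
        err = resp.json().get("error", {})
        return Result(blocked=True, error_type=err.get("type", ""))
    return Result(content=resp.json()["choices"][0]["message"]["content"])
```

The injectable `post` parameter is a plain dependency-injection seam: unit tests can pass a stub instead of patching `httpx` at import time.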
## Validating Policy Configs in CI

The `kt policy lint` command checks your YAML without starting the gateway:

```sh
kt policy lint --file policy-config.yaml
```
### GitHub Actions Example

```yaml
# .github/workflows/ai-policy.yml
name: Validate AI Policies
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Keeptrusts CLI
        run: |
          curl -fsSL https://get.keeptrusts.com | sh
          echo "$HOME/.keeptrusts/bin" >> $GITHUB_PATH
      - name: Validate policy config
        run: kt policy lint --file policy-config.yaml
      - name: Validate staging config
        run: kt policy lint --file configs/staging.yaml
```

Add this as a required check so broken configs never reach production.
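When the number of config files grows, a small wrapper can run `kt policy lint` over each one and fail CI on any bad file. A sketch, assuming the same `kt policy lint --file` invocation shown above; the injectable `run` parameter exists only so the wrapper itself can be unit-tested without the `kt` binary:

```python
# scripts/lint_policies.py — hedged sketch of a multi-config lint runner.
# The `run` parameter is an assumption for testability, not a kt feature.
import subprocess

def lint_all(configs, run=subprocess.run):
    """Run `kt policy lint` on each config path; return the paths that failed."""
    failures = []
    for cfg in configs:
        result = run(["kt", "policy", "lint", "--file", str(cfg)])
        if result.returncode != 0:
            failures.append(str(cfg))
    return failures
```

A CI step could call `lint_all(sorted(Path("configs").glob("*.yaml")))` and exit non-zero when the returned list is non-empty.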
## Contract Testing AI Responses

Contract tests verify that the gateway enforces policies correctly without calling real providers. Start the gateway against a local mock provider:

```python
# tests/contract/test_pii_blocking.py
import os
import subprocess
import time

import httpx
import pytest

@pytest.fixture(scope="module")
def gateway():
    """Start a real gateway against a mock provider."""
    proc = subprocess.Popen(
        ["kt", "gateway", "run", "--policy-config", "tests/fixtures/test-config.yaml"],
        env={**os.environ, "MOCK_PROVIDER_KEY": "test-key"},
    )
    time.sleep(2)  # wait for gateway startup
    yield "http://localhost:41002"
    proc.terminate()
    proc.wait()

def test_ssn_pattern_blocked(gateway):
    """Contract: SSN patterns in output must return 409."""
    resp = httpx.post(
        f"{gateway}/v1/chat/completions",
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "What is 123-45-6789?"}],
        },
    )
    # The mock provider echoes the input; the gateway blocks the SSN pattern
    assert resp.status_code == 409
    body = resp.json()
    assert body["error"]["code"] == "output_blocked"

def test_clean_response_passes(gateway):
    resp = httpx.post(
        f"{gateway}/v1/chat/completions",
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Explain AI governance."}],
        },
    )
    assert resp.status_code == 200
```
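A fixed `time.sleep(2)` in the fixture can be flaky on slow CI runners. Polling for readiness is more robust; a minimal sketch, where the `probe` callable is injected (for a real gateway it could be something like `lambda: httpx.get(url + "/health").status_code == 200` — an assumption based on the `/health` endpoint shown later on this page):

```python
import time

def wait_until_ready(probe, timeout=10.0, interval=0.25):
    """Poll `probe` until it returns truthy or `timeout` seconds elapse.
    Exceptions from `probe` (e.g. connection refused during startup)
    are treated as "not ready yet"."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if probe():
                return True
        except Exception:
            pass  # gateway not accepting connections yet
        time.sleep(interval)
    return False
```

Replacing the fixture's `time.sleep(2)` with `assert wait_until_ready(probe)` fails fast with a clear message instead of letting the first request error out.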
## Snapshot Testing AI Responses

Snapshot tests capture the structure of AI responses and alert you when the shape changes:

```python
# tests/test_snapshots.py
import json
from pathlib import Path

import pytest

from myapp.assistant import call_gateway

SNAPSHOT_DIR = Path("tests/snapshots")
SNAPSHOT_DIR.mkdir(exist_ok=True)

def normalize_response(resp: dict) -> dict:
    """Strip non-deterministic fields for snapshot comparison."""
    resp.pop("id", None)
    resp.pop("created", None)
    if "choices" in resp:
        for choice in resp["choices"]:
            choice["message"]["content"] = "<CONTENT>"
    return resp

def test_response_shape(mock_gateway):
    mock_gateway(status=200)
    raw = call_gateway({"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]})
    normalized = normalize_response(raw)
    snapshot_path = SNAPSHOT_DIR / "completion_shape.json"
    if not snapshot_path.exists():
        snapshot_path.write_text(json.dumps(normalized, indent=2))
        pytest.skip("Snapshot created; re-run to verify.")
    expected = json.loads(snapshot_path.read_text())
    assert normalized.keys() == expected.keys()
    assert len(normalized["choices"]) == len(expected["choices"])
```
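An alternative to hand-picking fields in `normalize_response` is to reduce the whole payload to its structural shape and snapshot that instead. A minimal sketch, using plain recursion over the decoded JSON (not a Keeptrusts API):

```python
def shape_of(value):
    """Reduce a decoded JSON value to its structure: dict keys and list
    nesting are kept, while every leaf is replaced by its type name."""
    if isinstance(value, dict):
        return {key: shape_of(item) for key, item in value.items()}
    if isinstance(value, list):
        # assume homogeneous lists: one element describes them all
        return [shape_of(value[0])] if value else []
    return type(value).__name__

resp = {
    "id": "chatcmpl-test123",
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hi"}}],
}
shape = shape_of(resp)
# shape["id"] is "str"; the content leaf is "str" regardless of wording
```

Because the non-deterministic content collapses to `"str"`, the snapshot only changes when a field is added, removed, or changes type.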
## Testing Gateway Health in CI

Add a health-check step to verify the gateway is operational:

```sh
# Wait for gateway, then verify health
kt gateway run --policy-config policy-config.yaml &
sleep 3
curl -sf http://localhost:41002/health || exit 1
echo "Gateway healthy"
```
## Best Practices

| Practice | Why |
|---|---|
| Mock the gateway for unit tests | Fast, deterministic, free |
| Use `kt policy lint` in CI | Catch YAML errors before deployment |
| Contract test each policy type | Verify enforcement without provider calls |
| Snapshot the response shape, not content | LLM content is non-deterministic |
| Run E2E tests on a schedule, not every PR | Saves cost and avoids flaky tests |
| Keep test configs in `tests/fixtures/` | Isolated from production configs |
## Next steps
- Local Development Setup — run the full gateway locally
- Debugging AI Requests with Events — trace test failures through event logs
- Handling Policy Blocks & Errors — test error envelope handling
## For AI systems

- Canonical terms: mock gateway, `kt policy lint`, contract testing, snapshot testing, pytest, vitest, test pyramid (unit/contract/E2E), policy validation, CI.
- CLI commands: `kt policy lint` (validate YAML in CI), `kt gateway run` (start local gateway for contract tests).
- Testing pyramid: unit tests (mock gateway, fastest) → contract tests (real gateway, mock provider) → E2E tests (real gateway, real provider, expensive).
- Best next pages: Local Development Setup, Debugging with Events, Error Handling.
## For engineers

- Mock the gateway in unit tests for fast, deterministic, free test execution (`httpx_mock` in Python, MSW in TypeScript).
- Use `kt policy lint` in CI to catch YAML config errors before deployment.
- Contract-test each policy type by starting a real gateway with a mock provider — verify enforcement without provider costs.
- Snapshot the response shape (fields, types), not content — LLM output is non-deterministic by nature.
- Keep test configs isolated in `tests/fixtures/`, separate from production configs.
- Run E2E tests against real providers on a schedule (not every PR) to save cost and avoid flaky tests.
## For leaders

- The testing pyramid for AI code balances cost with confidence: most tests are free mocks, few are expensive E2E.
- `kt policy lint` in CI prevents configuration errors from reaching production — a safety gate before deployment.
- Contract testing validates governance behavior without LLM provider costs or non-determinism.
- Investing in test infrastructure for AI code reduces incidents from policy misconfigurations.
- Schedule E2E tests (weekly, not per-PR) to control the cost of real-provider testing.