# Testing AI-Integrated Code with Keeptrusts
AI-integrated code is notoriously hard to test because responses are non-deterministic. The Keeptrusts gateway gives you stable, policy-enforced boundaries to test against. This guide covers mocking the gateway in unit tests, validating configs in CI, contract testing, and snapshot testing AI responses.
## Use this page when

- You need to mock the Keeptrusts gateway in unit tests for deterministic, fast, free testing.
- You want to validate policy configurations in CI with `kt policy lint`.
- You are writing contract tests that verify policy enforcement without calling real LLM providers.
- You need to snapshot-test AI response shapes while handling non-deterministic content.
## Primary audience

- Primary: Developers writing tests for AI-integrated applications
- Secondary: QA engineers building test suites; DevOps engineers setting up CI pipelines for AI code
## The Testing Pyramid for AI Code

```text
        ╱   E2E   ╲        — real gateway, real provider (slow, expensive)
       ╱───────────╲
      ╱  Contract   ╲      — real gateway, mock provider (fast, deterministic)
     ╱───────────────╲
    ╱ Unit (mock gw)  ╲    — mock gateway responses (fastest)
   ╱───────────────────╲
```
Most of your tests should be unit tests with a mock gateway. Contract tests verify policy behavior. E2E tests run sparingly against real providers.
## Mocking the Gateway in Python Tests

### Using pytest with httpx

Create a lightweight mock that behaves like the gateway. The `httpx_mock` fixture comes from the `pytest-httpx` plugin:

```python
# tests/conftest.py
import pytest

MOCK_COMPLETION = {
    "id": "chatcmpl-test123",
    "object": "chat.completion",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello, how can I help?"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 10, "completion_tokens": 8, "total_tokens": 18},
}

MOCK_POLICY_BLOCK = {
    "error": {
        "type": "policy_violation",
        "message": "Request blocked: contains disallowed content",
        "policy": "block-pii-output",
        "code": "output_blocked",
    }
}

@pytest.fixture
def mock_gateway(httpx_mock):
    """Return a function that registers mock gateway responses."""
    def _mock(status=200, body=None):
        httpx_mock.add_response(
            url="http://localhost:41002/v1/chat/completions",
            json=body or MOCK_COMPLETION,
            status_code=status,
        )
    return _mock
```
### Writing a Unit Test

```python
# tests/test_assistant.py
from myapp.assistant import ask_question
from tests.conftest import MOCK_POLICY_BLOCK

def test_successful_response(mock_gateway):
    mock_gateway(status=200)
    result = ask_question("What is AI governance?")
    assert result.content == "Hello, how can I help?"

def test_policy_block_handled(mock_gateway):
    mock_gateway(status=409, body=MOCK_POLICY_BLOCK)
    result = ask_question("Show me SSN 123-45-6789")
    assert result.blocked is True
    assert "policy_violation" in result.error_type
```
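The tests above assume an `ask_question` helper that calls the gateway and folds policy blocks into a result object. Here is a minimal sketch of what such a helper might look like; the `Result` fields, `GATEWAY_URL`, and the injectable `post` parameter are illustrative assumptions, not part of any Keeptrusts SDK:

```python
# myapp/assistant.py — illustrative sketch only, not a Keeptrusts API.
# Result fields and the injectable `post` parameter are assumptions.
from dataclasses import dataclass
from typing import Optional

GATEWAY_URL = "http://localhost:41002/v1/chat/completions"  # assumed local gateway

@dataclass
class Result:
    content: Optional[str] = None
    blocked: bool = False
    error_type: str = ""

def ask_question(question: str, post=None) -> Result:
    """Send one chat completion through the gateway, mapping a 409
    policy-block envelope to a `blocked` result."""
    if post is None:
        import httpx  # real client in production; tests can inject a stub
        post = httpx.post
    resp = post(GATEWAY_URL, json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": question}],
    })
    if resp.status_code == 409:  # policy block from the gateway
        err = resp.json().get("error", {})
        return Result(blocked=True, error_type=err.get("type", ""))
    return Result(content=resp.json()["choices"][0]["message"]["content"])
```

The injectable `post` parameter is a plain dependency-injection seam: unit tests can pass a stub instead of patching `httpx` at import time.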
## Validating Policy Configs in CI

The `kt policy lint` command checks your YAML without starting the gateway:

```sh
kt policy lint --file policy-config.yaml
```
### GitHub Actions Example

```yaml
# .github/workflows/ai-policy.yml
name: Validate AI Policies
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Keeptrusts CLI
        run: |
          curl -fsSL https://get.keeptrusts.com | sh
          echo "$HOME/.keeptrusts/bin" >> $GITHUB_PATH
      - name: Validate policy config
        run: kt policy lint --file policy-config.yaml
      - name: Validate staging config
        run: kt policy lint --file configs/staging.yaml
```

Add this as a required check so broken configs never reach production.
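When the number of config files grows, a small wrapper can run `kt policy lint` over each one and fail CI on any bad file. A sketch, assuming the same `kt policy lint --file` invocation shown above; the injectable `run` parameter exists only so the wrapper itself can be unit-tested without the `kt` binary:

```python
# scripts/lint_policies.py — hedged sketch of a multi-config lint runner.
# The `run` parameter is an assumption for testability, not a kt feature.
import subprocess

def lint_all(configs, run=subprocess.run):
    """Run `kt policy lint` on each config path; return the paths that failed."""
    failures = []
    for cfg in configs:
        result = run(["kt", "policy", "lint", "--file", str(cfg)])
        if result.returncode != 0:
            failures.append(str(cfg))
    return failures
```

A CI step could call `lint_all(sorted(Path("configs").glob("*.yaml")))` and exit non-zero when the returned list is non-empty.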
## Contract Testing AI Responses

Contract tests verify that the gateway enforces policies correctly without calling real providers. Start the gateway against a local mock provider:

```python
# tests/contract/test_pii_blocking.py
import os
import subprocess
import time

import httpx
import pytest

@pytest.fixture(scope="module")
def gateway():
    """Start a real gateway against a mock provider."""
    proc = subprocess.Popen(
        ["kt", "gateway", "run", "--policy-config", "tests/fixtures/test-config.yaml"],
        env={**os.environ, "MOCK_PROVIDER_KEY": "test-key"},
    )
    time.sleep(2)  # wait for gateway startup
    yield "http://localhost:41002"
    proc.terminate()
    proc.wait()

def test_ssn_pattern_blocked(gateway):
    """Contract: SSN patterns in output must return 409."""
    resp = httpx.post(
        f"{gateway}/v1/chat/completions",
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "What is 123-45-6789?"}],
        },
    )
    # The mock provider echoes the input; the gateway blocks the SSN pattern
    assert resp.status_code == 409
    body = resp.json()
    assert body["error"]["code"] == "output_blocked"

def test_clean_response_passes(gateway):
    resp = httpx.post(
        f"{gateway}/v1/chat/completions",
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Explain AI governance."}],
        },
    )
    assert resp.status_code == 200
```
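A fixed `time.sleep(2)` in the fixture can be flaky on slow CI runners. Polling for readiness is more robust; a minimal sketch, where the `probe` callable is injected (for a real gateway it could be something like `lambda: httpx.get(url + "/health").status_code == 200` — an assumption based on the `/health` endpoint shown later on this page):

```python
import time

def wait_until_ready(probe, timeout=10.0, interval=0.25):
    """Poll `probe` until it returns truthy or `timeout` seconds elapse.
    Exceptions from `probe` (e.g. connection refused during startup)
    are treated as "not ready yet"."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if probe():
                return True
        except Exception:
            pass  # gateway not accepting connections yet
        time.sleep(interval)
    return False
```

Replacing the fixture's `time.sleep(2)` with `assert wait_until_ready(probe)` fails fast with a clear message instead of letting the first request error out.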
## Snapshot Testing AI Responses

Snapshot tests capture the structure of AI responses and alert you when the shape changes:

```python
# tests/test_snapshots.py
import json
from pathlib import Path

import pytest

from myapp.assistant import call_gateway

SNAPSHOT_DIR = Path("tests/snapshots")
SNAPSHOT_DIR.mkdir(exist_ok=True)

def normalize_response(resp: dict) -> dict:
    """Strip non-deterministic fields for snapshot comparison."""
    resp.pop("id", None)
    resp.pop("created", None)
    if "choices" in resp:
        for choice in resp["choices"]:
            choice["message"]["content"] = "<CONTENT>"
    return resp

def test_response_shape(mock_gateway):
    mock_gateway(status=200)
    raw = call_gateway({"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]})
    normalized = normalize_response(raw)
    snapshot_path = SNAPSHOT_DIR / "completion_shape.json"
    if not snapshot_path.exists():
        snapshot_path.write_text(json.dumps(normalized, indent=2))
        pytest.skip("Snapshot created; re-run to verify.")
    expected = json.loads(snapshot_path.read_text())
    assert normalized.keys() == expected.keys()
    assert len(normalized["choices"]) == len(expected["choices"])
```
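An alternative to hand-picking fields in `normalize_response` is to reduce the whole payload to its structural shape and snapshot that instead. A minimal sketch, using plain recursion over the decoded JSON (not a Keeptrusts API):

```python
def shape_of(value):
    """Reduce a decoded JSON value to its structure: dict keys and list
    nesting are kept, while every leaf is replaced by its type name."""
    if isinstance(value, dict):
        return {key: shape_of(item) for key, item in value.items()}
    if isinstance(value, list):
        # assume homogeneous lists: one element describes them all
        return [shape_of(value[0])] if value else []
    return type(value).__name__

resp = {
    "id": "chatcmpl-test123",
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hi"}}],
}
shape = shape_of(resp)
# shape["id"] is "str"; the content leaf is "str" regardless of wording
```

Because the non-deterministic content collapses to `"str"`, the snapshot only changes when a field is added, removed, or changes type.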
## Testing Gateway Health in CI

Add a health-check step to verify the gateway is operational:

```sh
# Wait for gateway, then verify health
kt gateway run --policy-config policy-config.yaml &
sleep 3
curl -sf http://localhost:41002/health || exit 1
echo "Gateway healthy"
```
## Best Practices

| Practice | Why |
|---|---|
| Mock the gateway for unit tests | Fast, deterministic, free |
| Use `kt policy lint` in CI | Catch YAML errors before deployment |
| Contract test each policy type | Verify enforcement without provider calls |
| Snapshot the response shape, not content | LLM content is non-deterministic |
| Run E2E tests on a schedule, not every PR | Saves cost and avoids flaky tests |
| Keep test configs in `tests/fixtures/` | Isolated from production configs |
## Next steps
- Local Development Setup — run the full gateway locally
- Debugging AI Requests with Events — trace test failures through event logs
- Handling Policy Blocks & Errors — test error envelope handling
## For AI systems

- Canonical terms: mock gateway, `kt policy lint`, contract testing, snapshot testing, pytest, vitest, test pyramid (unit/contract/E2E), policy validation, CI.
- CLI commands: `kt policy lint` (validate YAML in CI), `kt gateway run` (start local gateway for contract tests).
- Testing pyramid: unit tests (mock gateway, fastest) → contract tests (real gateway, mock provider) → E2E tests (real gateway, real provider, expensive).
- Best next pages: Local Development Setup, Debugging with Events, Error Handling.
## For engineers

- Mock the gateway in unit tests for fast, deterministic, free test execution (`httpx_mock` in Python, MSW in TypeScript).
- Use `kt policy lint` in CI to catch YAML config errors before deployment.
- Contract-test each policy type by starting a real gateway with a mock provider — verify enforcement without provider costs.
- Snapshot the response shape (fields, types), not content — LLM output is non-deterministic by nature.
- Keep test configs isolated in `tests/fixtures/`, separate from production configs.
- Run E2E tests against real providers on a schedule (not every PR) to save cost and avoid flaky tests.
## For leaders

- The testing pyramid for AI code balances cost with confidence: most tests are free mocks, few are expensive E2E.
- `kt policy lint` in CI prevents configuration errors from reaching production — a safety gate before deployment.
- Contract testing validates governance behavior without LLM provider costs or non-determinism.
- Investing in test infrastructure for AI code reduces incidents from policy misconfigurations.
- Schedule E2E tests (weekly, not per-PR) to control the cost of real-provider testing.