Testing AI-Integrated Code with Keeptrusts

AI-integrated code is notoriously hard to test because responses are non-deterministic. The Keeptrusts gateway gives you stable, policy-enforced boundaries to test against. This guide covers mocking the gateway in unit tests, validating configs in CI, contract testing, and snapshot testing AI responses.

Use this page when

  • You need to mock the Keeptrusts gateway in unit tests for deterministic, fast, free testing.
  • You want to validate policy configurations in CI with kt policy lint.
  • You are writing contract tests that verify policy enforcement without calling real LLM providers.
  • You need to snapshot-test AI response shapes while handling non-deterministic content.

Primary audience

  • Primary: Developers writing tests for AI-integrated applications
  • Secondary: QA Engineers building test suites, DevOps Engineers setting up CI pipelines for AI code

The Testing Pyramid for AI Code

            ╱    E2E    ╲          — real gateway, real provider (slow, expensive)
           ╱─────────────╲
          ╱    Contract   ╲        — real gateway, mock provider (fast, deterministic)
         ╱─────────────────╲
        ╱  Unit (mock gw)   ╲      — mock gateway responses (fastest)
       ╱─────────────────────╲

Most of your tests should be unit tests with a mock gateway. Contract tests verify policy behavior. E2E tests run sparingly against real providers.
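To keep the tiers separate in practice, one option is to tag tests with pytest markers and select a single tier per CI job. A minimal sketch, assuming marker names of our own choosing (unit, contract, e2e) rather than a Keeptrusts convention:

# tests/conftest.py (excerpt): register per-tier markers so CI can run, e.g., `pytest -m unit`
def pytest_configure(config):
    config.addinivalue_line("markers", "unit: mock-gateway unit tests")
    config.addinivalue_line("markers", "contract: real gateway, mock provider")
    config.addinivalue_line("markers", "e2e: real gateway, real provider (run on a schedule)")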

Mocking the Gateway in Python Tests

Using pytest with httpx

Create a lightweight mock that behaves like the gateway. The httpx_mock fixture used below is provided by the pytest-httpx plugin:

# tests/conftest.py
import pytest

MOCK_COMPLETION = {
    "id": "chatcmpl-test123",
    "object": "chat.completion",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello, how can I help?"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 10, "completion_tokens": 8, "total_tokens": 18},
}

MOCK_POLICY_BLOCK = {
    "error": {
        "type": "policy_violation",
        "message": "Request blocked: contains disallowed content",
        "policy": "block-pii-output",
        "code": "output_blocked",
    }
}

@pytest.fixture
def mock_gateway(httpx_mock):
    """Return a function that registers mock gateway responses."""
    def _mock(status=200, body=None):
        httpx_mock.add_response(
            url="http://localhost:41002/v1/chat/completions",
            json=body or MOCK_COMPLETION,
            status_code=status,
        )
    return _mock

Writing a Unit Test

# tests/test_assistant.py
from myapp.assistant import ask_question
from tests.conftest import MOCK_POLICY_BLOCK

def test_successful_response(mock_gateway):
    mock_gateway(status=200)
    result = ask_question("What is AI governance?")
    assert result.content == "Hello, how can I help?"

def test_policy_block_handled(mock_gateway):
    mock_gateway(status=409, body=MOCK_POLICY_BLOCK)
    result = ask_question("Show me SSN 123-45-6789")
    assert result.blocked is True
    assert "policy_violation" in result.error_type
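The tests above exercise an application helper that is not shown in this guide. Here is a hypothetical sketch of myapp/assistant.py, with field names chosen to match what the tests assert; it is not part of the Keeptrusts SDK:

# myapp/assistant.py (hypothetical): the helper the unit tests exercise
from dataclasses import dataclass
import httpx

GATEWAY_URL = "http://localhost:41002/v1/chat/completions"

@dataclass
class AskResult:
    content: str = ""
    blocked: bool = False
    error_type: str = ""

def ask_question(question: str) -> AskResult:
    resp = httpx.post(
        GATEWAY_URL,
        json={"model": "gpt-4o", "messages": [{"role": "user", "content": question}]},
    )
    # The gateway returns 409 when a policy blocks the request or response.
    if resp.status_code == 409:
        err = resp.json()["error"]
        return AskResult(blocked=True, error_type=err["type"])
    resp.raise_for_status()
    body = resp.json()
    return AskResult(content=body["choices"][0]["message"]["content"])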

Validating Policy Configs in CI

The kt policy lint command checks your YAML without starting the gateway:

kt policy lint --file policy-config.yaml
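You can also run the same check from your test suite so a broken fixture config fails locally before it reaches CI. A minimal sketch using subprocess; the tests/fixtures/ layout follows the convention used later in this guide:

# tests/test_policy_configs.py: lint every test config with the Keeptrusts CLI
import subprocess
from pathlib import Path
import pytest

CONFIG_FILES = sorted(Path("tests/fixtures").glob("*.yaml"))

@pytest.mark.parametrize("config", CONFIG_FILES, ids=lambda p: p.name)
def test_policy_config_is_valid(config):
    result = subprocess.run(
        ["kt", "policy", "lint", "--file", str(config)],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stdout + result.stderr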

GitHub Actions Example

# .github/workflows/ai-policy.yml
name: Validate AI Policies
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Keeptrusts CLI
        run: |
          curl -fsSL https://get.keeptrusts.com | sh
          echo "$HOME/.keeptrusts/bin" >> $GITHUB_PATH

      - name: Validate policy config
        run: kt policy lint --file policy-config.yaml

      - name: Validate staging config
        run: kt policy lint --file configs/staging.yaml

Add this as a required check so broken configs never reach production.

Contract Testing AI Responses

Contract tests verify that the gateway enforces policies correctly without calling real providers. Start the gateway against a local mock provider:

# tests/contract/test_pii_blocking.py
import os
import subprocess
import time

import httpx
import pytest

@pytest.fixture(scope="module")
def gateway():
    """Start a real gateway against a mock provider."""
    proc = subprocess.Popen(
        ["kt", "gateway", "run", "--policy-config", "tests/fixtures/test-config.yaml"],
        env={**os.environ, "MOCK_PROVIDER_KEY": "test-key"},
    )
    time.sleep(2)  # wait for gateway startup
    yield "http://localhost:41002"
    proc.terminate()
    proc.wait()

def test_ssn_pattern_blocked(gateway):
    """Contract: SSN patterns in output must return 409."""
    resp = httpx.post(
        f"{gateway}/v1/chat/completions",
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "What is 123-45-6789?"}],
        },
    )
    # The mock provider echoes the input; the gateway blocks the SSN pattern
    assert resp.status_code == 409
    body = resp.json()
    assert body["error"]["code"] == "output_blocked"

def test_clean_response_passes(gateway):
    resp = httpx.post(
        f"{gateway}/v1/chat/completions",
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Explain AI governance."}],
        },
    )
    assert resp.status_code == 200
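A fixed sleep can make contract tests flaky on slow CI runners. One alternative is to poll the gateway's /health endpoint (the same endpoint used in the CI health check below) until it responds. A sketch of a helper you could call from the fixture in place of time.sleep:

# tests/contract/conftest.py (illustrative helper): poll /health instead of sleeping
import time
import httpx

def wait_for_gateway(base_url: str, timeout: float = 15.0) -> None:
    """Poll the gateway health endpoint until it responds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if httpx.get(f"{base_url}/health", timeout=1.0).status_code == 200:
                return
        except httpx.TransportError:
            pass  # gateway not accepting connections yet
        time.sleep(0.2)
    raise RuntimeError(f"Gateway at {base_url} did not become healthy within {timeout}s")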

Snapshot Testing AI Responses

Snapshot tests capture the structure of AI responses and alert you when the shape changes:

# tests/test_snapshots.py
import json
from pathlib import Path

import httpx
import pytest

SNAPSHOT_DIR = Path("tests/snapshots")
SNAPSHOT_DIR.mkdir(exist_ok=True)

def call_gateway(payload: dict) -> dict:
    """Send a chat completion request through the (mocked) gateway."""
    return httpx.post("http://localhost:41002/v1/chat/completions", json=payload).json()

def normalize_response(resp: dict) -> dict:
    """Strip non-deterministic fields for snapshot comparison."""
    resp.pop("id", None)
    resp.pop("created", None)
    if "choices" in resp:
        for choice in resp["choices"]:
            choice["message"]["content"] = "<CONTENT>"
    return resp

def test_response_shape(mock_gateway):
    mock_gateway(status=200)
    raw = call_gateway({"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]})
    normalized = normalize_response(raw)

    snapshot_path = SNAPSHOT_DIR / "completion_shape.json"
    if not snapshot_path.exists():
        snapshot_path.write_text(json.dumps(normalized, indent=2))
        pytest.skip("Snapshot created; re-run to verify.")

    expected = json.loads(snapshot_path.read_text())
    assert normalized.keys() == expected.keys()
    assert len(normalized["choices"]) == len(expected["choices"])
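If you want a stricter check than comparing top-level keys, you can recursively compare the types of every field while still ignoring content values. A minimal sketch:

# tests/test_snapshots.py (excerpt): compare the *types* of fields, not their values
def same_shape(actual, expected) -> bool:
    """Recursively check that two JSON structures have matching keys and value types."""
    if type(actual) is not type(expected):
        return False
    if isinstance(actual, dict):
        return actual.keys() == expected.keys() and all(
            same_shape(actual[k], expected[k]) for k in actual
        )
    if isinstance(actual, list):
        # Compare every element against the first expected one; assumes homogeneous lists.
        return all(same_shape(item, expected[0]) for item in actual) if expected else not actual
    return True  # scalars: type already matched above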

Testing Gateway Health in CI

Add a health-check step to verify the gateway is operational:

# Wait for gateway, then verify health
kt gateway run --policy-config policy-config.yaml &
sleep 3
curl -sf http://localhost:41002/health || exit 1
echo "Gateway healthy"

Best Practices

Practice                                       Why
Mock the gateway for unit tests                Fast, deterministic, free
Use kt policy lint in CI                       Catch YAML errors before deployment
Contract test each policy type                 Verify enforcement without provider calls
Snapshot the response shape, not content       LLM content is non-deterministic
Run E2E tests on a schedule, not every PR      Saves cost and avoids flaky tests
Keep test configs in tests/fixtures/           Isolated from production configs

Next steps

For AI systems

  • Canonical terms: mock gateway, kt policy lint, contract testing, snapshot testing, pytest, vitest, test pyramid (unit/contract/E2E), policy validation, CI.
  • CLI commands: kt policy lint (validate YAML in CI), kt gateway run (start local gateway for contract tests).
  • Testing pyramid: unit tests (mock gateway, fastest) → contract tests (real gateway, mock provider) → E2E tests (real gateway, real provider, expensive).
  • Best next pages: Local Development Setup, Debugging with Events, Error Handling.

For engineers

  • Mock the gateway in unit tests for fast, deterministic, free test execution (httpx_mock in Python, MSW in TypeScript).
  • Use kt policy lint in CI to catch YAML config errors before deployment.
  • Contract-test each policy type by starting a real gateway with a mock provider — verify enforcement without provider costs.
  • Snapshot the response shape (fields, types), not content — LLM output is non-deterministic by nature.
  • Keep test configs isolated in tests/fixtures/ separate from production configs.
  • Run E2E tests against real providers on a schedule (not every PR) to save cost and avoid flaky tests.

For leaders

  • The testing pyramid for AI code balances cost with confidence: most tests are free mocks, few are expensive E2E.
  • kt policy lint in CI prevents configuration errors from reaching production — a safety gate before deployment.
  • Contract testing validates governance behavior without LLM provider costs or non-determinism.
  • Investing in test infrastructure for AI code reduces incidents from policy misconfigurations.
  • Schedule E2E tests (weekly, not per-PR) to control the cost of real-provider testing.