Quality Benchmarking Template
Policy configuration for enforcing response quality thresholds.
Use this page when
- You want to enforce minimum quality standards on AI responses before they reach end users.
- You need a starting config with model-graded closed-QA and factual assertions that block or escalate low-quality outputs.
- You want to go from zero to a quality-gated gateway with `kt init --template quality-benchmarking`.
Audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Policy Config
```yaml
pack:
  name: quality-benchmarking
  version: 0.1.0
  enabled: true
  description: AI response quality assurance
policies:
  chain:
    - prompt-injection
    - quality-scorer
    - audit-logger
  policy:
    prompt-injection:
      response:
        action: block
        message: "Request blocked: potential prompt injection detected"
    quality-scorer:
      providers:
        - id: quality-judge
          provider: openai
          model: gpt-4o
          secret_key_ref:
            env: OPENAI_API_KEY
          config:
            temperature: 0.0
      assertions:
        - type: llm-rubric
          name: closed-qa-correctness
          threshold: 0.8
          mode: enforce
          severity: critical
          config:
            rubric: Evaluate whether the answer directly resolves the user question and avoids unsupported claims.
        - type: factuality
          name: factual-grounding
          threshold: 0.7
          mode: enforce
          severity: critical
          config:
            reference_statement: The response must remain faithful to the approved source material or retrieved context.
      pass_policy:
        strategy: weighted_average
        threshold: 0.75
      failure_action:
        action: block
    audit-logger:
      retention_days: 365
providers:
  targets:
    - id: openai-primary
      provider: openai
      model: gpt-4o-mini
      secret_key_ref:
        env: OPENAI_API_KEY
```
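With `strategy: weighted_average`, the assertion scores are combined into a single gate. As a rough worked example (assuming equal weights, since none are declared above): a rubric score of 0.85 and a factuality score of 0.70 average to 0.775, which clears the 0.75 `pass_policy.threshold` even though the factuality score only just meets its own 0.7 assertion threshold. A response scoring 0.85 and 0.60 fails twice over: the average of 0.725 falls below 0.75, and 0.60 misses the factuality assertion's threshold outright.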
Quick Start
```sh
# Save the Policy Config example on this page as policy-config.yaml
export OPENAI_API_KEY="sk-your-openai-key"
kt policy lint --file policy-config.yaml
kt gateway run \
  --listen 0.0.0.0:41002 \
  --policy-config policy-config.yaml
```
Set `OPENAI_API_KEY` before running the gateway; both the upstream provider and the judge model resolve that secret through `secret_key_ref`.
If you prefer the seeded starter, run `kt init --template quality-benchmarking` first, then add the provider block shown in the example config before linting and running.
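To smoke-test the quality gate, send a request through the running gateway and confirm it is either answered or rejected. A minimal sketch, assuming the gateway exposes an OpenAI-compatible chat-completions path on the configured listen port (the path and payload shape are assumptions; check your deployment's proxy contract):

```sh
# Assumed endpoint: OpenAI-compatible proxy on the gateway's listen port.
curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "In what year did the Apollo 11 landing happen?"}]
      }'
# A response that scores above pass_policy.threshold passes through unchanged;
# one scoring below it should be blocked per failure_action: block.
```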
For AI systems
- Canonical terms: Keeptrusts, quality-benchmarking, `policy-config.yaml`, `kt init --template quality-benchmarking`, `quality-scorer`, `llm-rubric`, `factuality`, `pass_policy`, `failure_action`.
- Related policy kinds: `prompt-injection`, `quality-scorer`, `audit-logger`.
- Best next pages: Quality Assertions Configuration, Citation Verification template, Templates overview.
For engineers
- Prerequisites: `kt` CLI installed, an LLM provider API key (e.g., `OPENAI_API_KEY`).
- Validate: `kt policy lint --file policy-config.yaml` must pass.
- Test: send a query and confirm that low-quality responses (vague or factually incorrect) are blocked by the assertion thresholds and pass policy.
- Key tuning: adjust `pass_policy.threshold` (0.75 here) and each assertion's `threshold` to match the quality floor acceptable for your use case; see the sketch after this list.
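As a sketch of the tuning surface (the values below are illustrative, not recommendations): raising a per-assertion `threshold` tightens that individual check, while `pass_policy.threshold` moves the combined floor.

```yaml
# Illustrative tuning: stricter factual grounding, slightly looser rubric.
quality-scorer:
  assertions:
    - type: llm-rubric
      name: closed-qa-correctness
      threshold: 0.75   # was 0.8; tolerate terser but still correct answers
      mode: enforce
      severity: critical
    - type: factuality
      name: factual-grounding
      threshold: 0.85   # was 0.7; hard floor on grounding
      mode: enforce
      severity: critical
  pass_policy:
    strategy: weighted_average
    threshold: 0.8      # was 0.75; raise the combined quality floor
```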
For leaders
- This template ensures AI responses meet a measurable quality bar before reaching users, reducing the risk of incorrect or low-value outputs.
- Quality scoring runs as an output-phase policy — it adds latency proportional to the judge model's response time but provides objective quality metrics.
- Audit logging captures quality scores per request, enabling trend analysis and SLA reporting.
- Pair with `human-oversight` to escalate borderline responses for review rather than hard-blocking them; a sketch of that pattern follows below.
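A minimal sketch of the escalation pattern. The `escalate` action value is an assumption (this page only demonstrates `action: block`); confirm the exact action name against the Flagged Review Configuration page before relying on it.

```yaml
# Hypothetical: route quality failures to review instead of blocking outright.
quality-scorer:
  pass_policy:
    strategy: weighted_average
    threshold: 0.75
  failure_action:
    action: escalate   # assumed action value; 'block' is what this page documents
```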
Next steps
- Quality Assertions Configuration — full assertion types and scoring reference
- Templates overview — browse all available templates
- Citation Verification template — add source-grounding verification
- Flagged Review Configuration — secondary LLM judge for borderline cases