Case Study: 90% Hallucination Reduction with Governed Knowledge
This is a representative Keeptrusts rollout rather than a named customer disclosure. The point is not the logo. The point is the operating pattern. In a controlled support-assistant evaluation, a team reduced unsupported answers from 42 to 4 across 200 questions (from 21% of answers to 2%) after moving from prompt-only guidance to governed knowledge assets plus output verification. That is a 90.5% reduction in hallucinated or unsupported responses, and it came from a small, disciplined set of changes.
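The headline figure can be reproduced from the counts reported in this case study. The following is a minimal sketch of that arithmetic; the function and constant names are illustrative and not part of any Keeptrusts tooling.

```python
def relative_reduction(before: int, after: int) -> float:
    """Return the fractional drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Counts reported in this case study.
TOTAL_QUESTIONS = 200
UNSUPPORTED_BEFORE = 42
UNSUPPORTED_AFTER = 4

baseline_rate = UNSUPPORTED_BEFORE / TOTAL_QUESTIONS
governed_rate = UNSUPPORTED_AFTER / TOTAL_QUESTIONS
reduction = relative_reduction(UNSUPPORTED_BEFORE, UNSUPPORTED_AFTER)

print(f"baseline unsupported rate: {baseline_rate:.1%}")  # 21.0%
print(f"governed unsupported rate: {governed_rate:.1%}")  # 2.0%
print(f"relative reduction: {reduction:.1%}")             # 90.5%
```

Note that the 90.5% figure is relative to the 42 baseline failures, not to the 200 questions. In absolute terms the unsupported rate fell by 19 percentage points.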
Use this page when
- You want a concrete example of how Keeptrusts knowledge governance improves answer quality.
- You need to explain grounded AI to stakeholders in terms of measurable outcomes rather than architecture alone.
- You are designing a before-and-after evaluation for a knowledge-heavy assistant.
Primary audience
- Primary: Technical Leaders and platform owners
- Secondary: Technical Engineers, QA teams, compliance reviewers
The problem
The assistant in this evaluation handled customer-support and policy questions. Before Keeptrusts knowledge governance was introduced, the team relied on two weak forms of grounding: a long system prompt and a small set of manually maintained snippets pasted into the application layer.
The model often looked helpful, but the team found three classes of failure during review:
- Answers used the right terminology but cited no approved source.
- Answers partially matched the policy but added unsupported details about exceptions or thresholds.
- Answers reflected older operational rules because prompt text had not kept up with the latest documentation.
The baseline evaluation used 200 representative prompts drawn from refund policy, shipping rules, escalation guidance, and account-access procedures. Reviewers marked an answer as unsupported if it contradicted the approved source, could not be traced to the approved source, or filled in missing detail beyond what the source allowed.
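The reviewer rubric above amounts to a three-way OR: an answer is unsupported if any one of the conditions holds. A minimal sketch of how a team might record those judgments follows. The `ReviewFlags` type and its field names are hypothetical and are not a Keeptrusts schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewFlags:
    """One reviewer's judgments for a single answer (hypothetical schema)."""
    contradicts_source: bool         # answer conflicts with the approved source
    untraceable_to_source: bool      # answer cannot be traced to the approved source
    adds_detail_beyond_source: bool  # answer fills in detail the source does not allow


def is_unsupported(flags: ReviewFlags) -> bool:
    """An answer is unsupported if any of the three rubric conditions holds."""
    return (
        flags.contradicts_source
        or flags.untraceable_to_source
        or flags.adds_detail_beyond_source
    )


# Example: an answer that matches the policy but invents an exception threshold.
partly_hallucinated = ReviewFlags(
    contradicts_source=False,
    untraceable_to_source=False,
    adds_detail_beyond_source=True,
)
print(is_unsupported(partly_hallucinated))  # True
```

Recording the individual flags, rather than only the final verdict, keeps the reason for each failure available when the team later triages results.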
The baseline result was not catastrophic, but it was not deployable for a high-trust workflow:
- 42 answers were unsupported or partly hallucinated.
- 31 answers lacked usable source backing even when the answer looked plausible.
- Reviewers spent too much time deciding whether an answer was “close enough” because the system had no explicit grounding evidence.
The solution
The team changed three things, all using documented Keeptrusts capabilities.
First, they moved the approved documentation into versioned Knowledge Base assets instead of keeping the source of truth in prompt fragments. That made content review and activation explicit.
Second, they bound those assets to the support assistant, so that only that agent's runtime received them.
Third, they enabled Citation Verifier with source requirements, context overlap checks, and a blocking action for unverified output.
The important point is that no single change produced the full effect. The reduction came from the combination of governed source material and an output-side enforcement gate.
Implementation
The evaluation rollout looked like this:
kt kb create --name "Support Canonical Policy" --scope org --kind upload --write-mode facts_only
kt kb mine --source ./support-approved-docs --output kb-support-canon.json
kt kb upload --manifest kb-support-canon.json --asset-id kb_support_canon_2026
kt kb promote --id kb_support_canon_2026 --to active
kt kb bind --id kb_support_canon_2026 --target-type agent --target-id agent_support_assistant
Then the team configured grounded-output enforcement:
pack:
  name: support-grounding-eval
  version: 1.0.0
  enabled: true
policies:
  chain:
    - citation-verifier
    - audit-logger
policy:
  citation-verifier:
    require_sources: true
    require_source_match: true
    rag_context:
      verify_against_context: true
      min_context_overlap: 0.7
    output_action:
      unverified_action: block
  audit-logger:
    retention_days: 365
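To build intuition for what a setting such as `min_context_overlap: 0.7` combined with `unverified_action: block` implies, here is a minimal sketch of an overlap gate. It is an assumption for illustration only: the source does not describe how Citation Verifier computes overlap, so this uses a simple token-overlap ratio, and the names `context_overlap` and `gate` are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_overlap(answer: str, context: str) -> float:
    """Fraction of distinct answer tokens that also appear in the context."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & tokens(context)) / len(answer_tokens)


def gate(answer: str, context: str, citations: list[str],
         min_overlap: float = 0.7) -> str:
    """Return "allow" or "block" (assumed semantics, not the real verifier)."""
    if not citations:
        return "block"  # mirrors require_sources: true
    if context_overlap(answer, context) < min_overlap:
        return "block"  # mirrors unverified_action: block
    return "allow"


approved = "Refunds are available within 30 days of delivery."
grounded = "Refunds are available within 30 days of delivery."
embellished = ("Refunds are available within 90 days, "
               "including opened items and gift cards.")

print(gate(grounded, approved, ["refund-policy"]))     # allow
print(gate(embellished, approved, ["refund-policy"]))  # block
print(gate(grounded, approved, []))                    # block
```

In this toy metric, the embellished answer shares only 5 of its 12 distinct tokens with the approved text, so it falls below the 0.7 threshold. A production verifier would likely use a more robust comparison than raw token overlap.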
The team then reran the same 200-question set.
That choice matters. If you change the system under test and the evaluation set at the same time, you learn almost nothing, because you cannot tell which change moved the numbers. Reusing the same question set made the improvement interpretable.
Unsupported answers dropped from 42 to 4. Several answers that would previously have reached the user were now blocked because they were not sufficiently grounded in the approved source. Reviewers no longer had to infer support from tone. They could inspect citations and verification details.
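Keeping the question set fixed also allows a per-question comparison rather than only a comparison of totals. The sketch below assumes each run is stored as a mapping from question ID to a boolean "unsupported" verdict. The storage format, the function name, and the toy data are all hypothetical and are not taken from the evaluation.

```python
def compare_runs(baseline: dict[str, bool],
                 governed: dict[str, bool]) -> dict[str, list[str]]:
    """Compare two runs over the same questions.

    Each dict maps a question ID to True when the answer was marked unsupported.
    """
    if baseline.keys() != governed.keys():
        raise ValueError("both runs must cover exactly the same question set")
    return {
        "fixed": sorted(q for q in baseline if baseline[q] and not governed[q]),
        "still_failing": sorted(q for q in baseline if baseline[q] and governed[q]),
        "regressed": sorted(q for q in baseline if not baseline[q] and governed[q]),
    }


# Toy data for illustration only.
before = {"q1": True, "q2": True, "q3": False}
after = {"q1": False, "q2": True, "q3": False}

report = compare_runs(before, after)
print(report)
# {'fixed': ['q1'], 'still_failing': ['q2'], 'regressed': []}
```

The explicit key check enforces the point made above: if the two runs do not cover the same questions, the comparison is rejected instead of silently producing a misleading delta. The `regressed` list is worth watching, because a net improvement can hide individual questions that got worse.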
The remaining four failures were not mysterious. Two came from thin source material that needed better coverage. One came from an ambiguous source sentence that the team rewrote in the underlying document. One came from a question that combined two policies the current asset treated separately. All four were easier to fix because the failure mode was visible.
Results and impact
The 90.5% reduction was the headline number, but it was not the only meaningful outcome.
Review time dropped because unsupported answers were easier to identify. Instead of debating whether the answer “felt right,” reviewers could ask whether the response met the verifier’s requirements and whether the cited material actually covered the claim.
Change management improved too. When policy updates arrived, the team updated the Knowledge Base asset, promoted the new version, and reran the evaluation. That replaced prompt archaeology with a repeatable lifecycle.
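One way to make the update, promote, and rerun loop repeatable is to treat the rerun as a pass/fail check before a new asset version is relied on. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the 2% threshold is simply the post-rollout rate from this evaluation used as an example, not a documented Keeptrusts default.

```python
def evaluation_passes(unsupported: int, total: int,
                      max_unsupported_rate: float) -> bool:
    """Return True when the unsupported-answer rate is within the allowed budget."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= unsupported <= total:
        raise ValueError("unsupported must be between 0 and total")
    return unsupported / total <= max_unsupported_rate


# Example budget: the 2% rate reached in this evaluation (an assumed threshold).
MAX_RATE = 0.02

print(evaluation_passes(4, 200, MAX_RATE))   # True: post-rollout result
print(evaluation_passes(42, 200, MAX_RATE))  # False: baseline result
```

A team could run a check like this after each policy update and hold back the new version when it fails, which turns the evaluation set into a regression test for the knowledge asset.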
The assistant also became safer to expand. Once the team trusted the knowledge-governance loop in support, they had a credible pattern for other grounded workflows such as internal help desk answers and policy Q&A.
The broader lesson is that hallucination reduction is usually not, first and foremost, a model-selection problem. It is often a source-governance and verification problem. Better sources plus a stronger output gate frequently outperform prompt tweaks alone.
Key takeaways
- The biggest improvement came from combining governed knowledge assets with output verification, not from one isolated setting.
- Reusing the same evaluation set before and after rollout made the improvement measurable and credible.
- Versioned assets and explicit bindings made control over source material tractable.
- Citation-based verification reduced reviewer ambiguity as well as unsupported answers.
- The remaining failures were actionable because they pointed back to source quality and coverage gaps.