Escalation Policies: Routing Flagged Content to Human Reviewers

Routing flagged content to human reviewers in Keeptrusts takes two deliberately separate steps: emit an escalate verdict from the policy chain, and attach providers.targets[].escalation_routing to the provider target that should own the case. In practice, flagged-review is the better fit when only borderline or high-risk outputs should go to review, while Human Oversight is the simpler stop-everything switch for routes that require manual approval of every response.

Use this page when

You need a reliable way to send risky outputs into a human review queue instead of returning them directly to the caller.
You want to understand the difference between emitting an escalation and assigning that escalation.
You are deciding between flagged-review, human-oversight, and stricter output controls such as Financial Compliance or Healthcare Compliance.

Primary audience

Primary: Technical Engineers and AI platform operators
Secondary: Technical Leaders and review-team owners

The key idea

Escalation in Keeptrusts is not one setting. It is two coordinated decisions.

The first decision is whether the gateway should emit an escalation at all. That comes from a policy verdict such as escalate.

The second decision is where that escalated item should go. That comes from escalation_routing on the matched provider target, not from the policy block itself.

That distinction matters because teams often try to solve routing entirely inside the policy definition. The current runtime does not work that way. A policy can decide that the output is too sensitive, too weak, too uncertain, or too regulated to return normally. The provider target then contributes the ownership hint that downstream review systems can use.

If you blur those two responsibilities together, you usually get one of two bad outcomes:

Everything escalates, but the queue has no clear owner.
The policy logic stays vague because routing concerns are mixed into the enforcement design.

Which escalation policy should you choose?

Use flagged-review when you want a review-oriented output policy that can judge borderline content, ask for a reviewed response, or immediately emit an escalation result. Its supported modes are judge, review_and_return, audit_only, and escalate. That makes it the better choice when escalation is part of a broader review workflow.

Use human-oversight when the rule is simpler: any output that reaches this point in the chain must stop and go to manual review. The current implementation is deliberately narrow. With action: escalate, the gateway returns successful completion metadata with null assistant content, records oversight.required, and emits an escalation decision event. It does not implement assignment logic, queue rules, or nuanced reviewer branching inside the policy block.

That is why these two policies are related but not interchangeable.

flagged-review is a review policy surface.
human-oversight is a manual-approval switch.
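
For comparison with the selective flagged-review setup, the manual-approval switch can be sketched as the fragment below. It is a minimal sketch that assumes the same pack layout used elsewhere on this page; only action: escalate comes from the behavior described above, and the placement of audit-logger after the switch is an assumption.

```yaml
# Sketch only: a route where every output must go to manual review.
policies:
  chain:
    - human-oversight   # every output that reaches this step escalates
    - audit-logger      # assumed to follow, as in the flagged-review chain

policy:
  human-oversight:
    action: escalate    # the only setting this page describes for the block

  audit-logger:
    retention_days: 365
```

Note that nothing in this block names a reviewer or a queue. Ownership still comes from escalation_routing on the provider target.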

A practical routed escalation config

This pattern routes flagged output to a specific review team when the request uses the configured target:

pack:
  name: flagged-output-review
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: openai-regulated-review
      provider: openai
      model: gpt-5.4-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      escalation_routing:
        team_id: team-risk-review

policies:
  chain:
    - quality-scorer
    - flagged-review
    - audit-logger

policy:
  quality-scorer:
    min_output_chars: 80
    thresholds:
      min_aggregate: 0.75

  flagged-review:
    mode: escalate
    recursion_depth_max: 1
    provider_isolation: true

  audit-logger:
    retention_days: 365

This does three useful things.

First, it keeps a normal output gate in front of the escalation path. In this example, Quality Scorer can fail weak responses before they become human work.

Second, it uses flagged-review in escalate mode so the runtime emits an escalation result immediately. That is the point where the content stops being a normal end-user response and becomes a review case.

Third, it attaches team_id: team-risk-review to the provider target. That gives downstream systems a stable routing hint for ownership.

Why provider-level routing is the right boundary

Provider-level escalation_routing may feel slightly indirect at first, but it solves a real operational problem.

The same policy chain can protect multiple targets, models, or environments while routing escalations to different teams. That means you do not have to clone the entire config just to say that one model family routes to fraud review and another routes to clinical review.

The Escalation Routing Configuration docs make this explicit:

escalation_routing lives under providers.targets[]
exactly one of team_id or user_id is required
different targets can carry different routing hints

That gives you a stable control-plane boundary:

policy blocks decide whether a response should escalate
provider targets decide who should see it
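
As a hedged illustration of that boundary, the fragment below puts two targets behind one shared chain, each with its own routing hint. The target ids, the team id, and the user id are hypothetical placeholders; the field names follow the config shown earlier on this page.

```yaml
# Sketch only: one policy chain, two targets, two different owners.
providers:
  targets:
    - id: openai-fraud-review          # hypothetical target id
      provider: openai
      model: gpt-5.4-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      escalation_routing:
        team_id: team-fraud-review     # hypothetical team id

    - id: openai-clinical-review       # hypothetical target id
      provider: openai
      model: gpt-5.4-mini
      secret_key_ref:
        env: OPENAI_API_KEY
      escalation_routing:
        user_id: user-clinical-lead    # hypothetical user id; set team_id or user_id, never both

policies:
  chain:                               # shared by both targets, no cloning needed
    - flagged-review
    - audit-logger
```

The policy blocks stay identical for both targets. Only the routing hint changes with the target that the request matches.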

Where compliance policies fit

Not every risky response should go to a human reviewer first. Some responses should simply never be returned.

That is where domain-specific output controls still matter.

Financial Compliance blocks configured phrases in output and prepends disclaimers to advice-like responses. Healthcare Compliance applies the same two controls to medical output. If you already know a category of content is unacceptable, these controls are usually cheaper and more predictable than pushing everything into a queue.

Human review is most valuable when the rule is judgment-heavy:

the answer may be acceptable with context or revision
the case needs an accountable reviewer note
you want a feedback loop for future policy tuning

That is why a common production pattern is:

use deterministic blockers where the answer should never pass
use quality or grounding checks where evidence matters
escalate the remainder that still needs human judgment

For example, Citation Verifier can block unverified outputs when output_action.unverified_action: block. If groundedness is mandatory, block there and spare reviewers the noise. If groundedness is useful but not final, keep the details and escalate only the cases that still require human sign-off.
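
The "block there and spare reviewers the noise" variant might look like the sketch below. The policy id citation-verifier is an assumption inferred from how other policies on this page are named; output_action.unverified_action: block is the setting named above, and the rest mirrors the earlier routed config.

```yaml
# Sketch only: block ungrounded output first, escalate what remains.
policies:
  chain:
    - citation-verifier   # assumed policy id, not confirmed by this page
    - flagged-review
    - audit-logger

policy:
  citation-verifier:
    output_action:
      unverified_action: block   # unverified output is blocked, not queued

  flagged-review:
    mode: escalate

  audit-logger:
    retention_days: 365
```

Placing the blocker ahead of flagged-review is the design choice that matters here: reviewers only see responses that have already passed the grounding check.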

Avoid the most common escalation design mistake

The common mistake is using Human Oversight as if it were a selective classifier.

It is not.

The current implementation treats it as a focused output-phase escalation control. If the block exists and action: escalate is set, the runtime marks the result as ESCALATE, returns null assistant content, and emits the decision event. That is ideal for routes that truly require manual approval. It is the wrong tool when you only want some outputs reviewed.

For selective review, prefer flagged-review, route-specific conditional chains, or upstream deterministic controls that leave only the ambiguous cases for human handling.

How to validate the flow before rollout

Treat routed escalation like any other policy behavior: validate the config, test the expected verdict, and verify the resulting event metadata.

kt policy lint --file policy-config.yaml
kt policy test --json
kt events tail --since 1h --json

The first command catches schema mistakes. The second confirms the pack returns the expected verdicts. The third helps you verify that escalated decisions are actually showing up with the routing metadata you expect after live traffic or a smoke test.

That is important because review workflows usually fail at the seams. The policy may escalate correctly while the downstream queue ignores the routing hint, or the routing hint may exist while the wrong target is being selected at runtime. Testing the full path prevents both classes of failure.

Key takeaways

Emitting an escalation and assigning an escalation are different concerns in Keeptrusts.
flagged-review is the better selective review policy when only flagged content should go to humans.
human-oversight is the stronger manual-approval switch when the whole route requires review.
providers.targets[].escalation_routing is where ownership hints belong.
Deterministic blockers such as financial or healthcare compliance controls should remove obvious bad cases before they become queue work.
