Flagged Review Configuration

The flagged-review policy sends flagged content to an LLM judge for secondary evaluation. When another policy in the chain flags a request (e.g., a borderline prompt injection score), the flagged review step can make a final decision using a separate model.

Use this page when

You need the exact command, config, API, or integration details for Flagged Review Configuration.
You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.

Primary audience

Primary: AI Agents, Technical Engineers
Secondary: Technical Leaders

Overview

policies:
  chain:
    - "prompt-injection"
    - "flagged-review"
    - "audit-logger"

policy:
  prompt-injection:
    embedding_threshold: 0.6         # lower threshold = more borderline flags
    response:
      action: "block"

  flagged-review:
    mode: "judge"
    provider:
      name: "review-llm"
      endpoint: "https://api.openai.com/v1/chat/completions"
      model: "gpt-4o"
      secret_key_ref:
        env: "OPENAI_REVIEW_KEY"
      timeout_ms: 5000
    recursion_depth_max: 1
    provider_isolation: true
    rationale_capture: true

Mode

Controls how the flagged review policy handles flagged content:

Mode	Behavior
`judge`	LLM makes a final allow/block decision
`review_and_return`	LLM reviews and annotates, then returns the content with its assessment
`audit_only`	LLM reviews and logs the assessment, but always allows the content
`escalate`	LLM reviews and escalates to a human reviewer if flagged

policy:
  flagged-review:
    mode: judge
pack:
  name: config-flagged-review-example-2
  version: 1.0.0
  enabled: true
policies:
  chain:
  - flagged-review

Provider

The flagged review uses a dedicated LLM provider, separate from the primary request provider:

policy:
  flagged-review:
    provider:
      name: review-llm
      endpoint: https://api.openai.com/v1/chat/completions
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_REVIEW_KEY
      timeout_ms: 5000
pack:
  name: config-flagged-review-example-3
  version: 1.0.0
  enabled: true
policies:
  chain:
  - flagged-review

Provider fields

Field	Type	Default	Description
`name`	string	—	Identifier for logging/tracing
`endpoint`	string	`"https://api.openai.com/v1/chat/completions"`	LLM endpoint URL
`model`	string	`"gpt-4o"`	Model to use for review
`secret_key_ref`	object	—	Object reference to the API key (`env` or `store`)
`timeout_ms`	integer (100–60000)	`5000`	Review request timeout

Use a separate API key for the review provider to isolate costs and rate limits from production traffic.

Recursion depth

Controls how many levels of recursive review are allowed:

policy:
  flagged-review:
    recursion_depth_max: 1
pack:
  name: config-flagged-review-example-4
  version: 1.0.0
  enabled: true
policies:
  chain:
  - flagged-review

Field	Type	Default	Description
`recursion_depth_max`	integer (1–8)	`1`	Maximum review recursion depth

1 = single review pass (no recursion)
2 = the review LLM's output can itself be reviewed once
Higher values are rarely needed and increase latency

Provider isolation

When true, the review provider is isolated from the primary provider — different API key, separate rate limits, independent error handling:

policy:
  flagged-review:
    provider_isolation: true
pack:
  name: config-flagged-review-example-5
  version: 1.0.0
  enabled: true
policies:
  chain:
  - flagged-review

Field	Type	Default	Description
`provider_isolation`	boolean	`true`	Isolate review provider from primary

Keep this true in production to prevent a review failure from cascading to the primary request path.

Rationale capture

When enabled, the review LLM's reasoning is captured in the event log:

policy:
  flagged-review:
    rationale_capture: true
pack:
  name: config-flagged-review-example-6
  version: 1.0.0
  enabled: true
policies:
  chain:
  - flagged-review

Field	Type	Default	Description
`rationale_capture`	boolean	`true`	Capture and log review rationale

The rationale appears in the decision event payload and is visible in the console's event detail view.

Prompt template

Customize the prompt sent to the review LLM with placeholder variables:

policy:
  flagged-review:
    prompt_template: |
      You are a content safety reviewer. Analyze the following flagged content and decide whether it should be allowed or blocked.

      ## Flagged Input
      {input}

      ## Model Output
      {output}

      ## Flag Reason
      {reason_code}

      ## Review Mode
      {mode}

      Respond with a JSON object:
      {"decision": "allow" | "block", "confidence": 0.0-1.0, "rationale": "..."}
pack:
  name: config-flagged-review-example-7
  version: 1.0.0
  enabled: true
policies:
  chain:
  - flagged-review

Template placeholders

Placeholder	Replaced with
`{input}`	The original user input that was flagged
`{output}`	The model's output (empty if input-phase flag)
`{reason_code}`	The reason code from the flagging policy
`{mode}`	The current review mode (`judge`, `review_and_return`, etc.)

If no prompt_template is specified, the gateway uses a built-in default template.

Complete field reference

Field	Type	Default	Description
`mode`	string	`"judge"`	`judge`, `review_and_return`, `audit_only`, `escalate`
`provider.name`	string	—	Review provider identifier
`provider.endpoint`	string	`"https://api.openai.com/v1/chat/completions"`	LLM endpoint
`provider.model`	string	`"gpt-4o"`	Model name
`provider.secret_key_ref`	object	—	Object reference for the API key (`env` or `store`)
`provider.timeout_ms`	integer (100–60000)	`5000`	Request timeout
`recursion_depth_max`	integer (1–8)	`1`	Max recursion depth
`provider_isolation`	boolean	`true`	Isolate review from primary provider
`rationale_capture`	boolean	`true`	Log review rationale
`prompt_template`	string	built-in	Custom prompt with `{input}`, `{output}`, `{reason_code}`, `{mode}`

Example: healthcare content review

pack:
  name: "healthcare-review"
  version: "1.0.0"
  enabled: true

providers:
  targets:
    - id: "openai-prod"
      provider: "openai"
      model: "gpt-4o"
      secret_key_ref:
        env: "OPENAI_API_KEY"

policies:
  chain:
    - "hipaa-phi-detector"
    - "healthcare-compliance"
    - "flagged-review"
    - "audit-logger"

policy:
  hipaa-phi-detector:
    mode: "hipaa_18"
    action: "redact"

  healthcare-compliance:
    blocked_patterns:
      - "specific_diagnosis"
      - "treatment_recommendation"
    action: "block"

  flagged-review:
    mode: "escalate"
    provider:
      name: "gpt4-reviewer"
      model: "gpt-4o"
      secret_key_ref:
        env: "OPENAI_REVIEW_KEY"
      timeout_ms: 8000
    recursion_depth_max: 1
    provider_isolation: true
    rationale_capture: true
    prompt_template: |
      You are a healthcare content safety reviewer.

      ## Content
      {input}

      ## Response
      {output}

      ## Flag Reason
      {reason_code}

      Determine if this content:
      1. Contains protected health information (PHI)
      2. Makes specific medical diagnoses
      3. Recommends specific treatments

      Respond: {"decision": "allow" | "escalate", "rationale": "..."}

  audit-logger:
    immutable: true
    retention_days: 2190
    hipaa_audit_controls: true

Example: defense dual review

policy:
  flagged-review:
    mode: judge
    provider:
      name: claude-reviewer
      endpoint: https://api.anthropic.com/v1/messages
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_REVIEW_KEY
      timeout_ms: 10000
    recursion_depth_max: 2
    provider_isolation: true
    rationale_capture: true
pack:
  name: config-flagged-review-example-9
  version: 1.0.0
  enabled: true
policies:
  chain:
  - flagged-review

Using a different model vendor for review than the primary provider adds an extra layer of independence to the review process.

For AI systems

Canonical terms: Keeptrusts, policy-config.yaml, flagged-review, mode (judge/review_and_return/audit_only/escalate), provider, recursion_depth_max, provider_isolation, rationale_capture, prompt_template.
Output-phase policy: sends flagged content to a secondary LLM judge for evaluation.
Best next pages: Conditional Chains, Per-Policy Catalog, Observability.

For engineers

Use a separate API key for the review provider to isolate costs and rate limits from production traffic.
Keep provider_isolation: true in production to prevent review failures from cascading to the primary request path.
Start with mode: "audit_only" to evaluate review accuracy before switching to judge or escalate.
Keep recursion_depth_max: 1 unless you have a specific reason for recursive review (adds latency).
Use a different model vendor for review than the primary provider for an independent second opinion.
Template placeholders: {input}, {output}, {reason_code}, {mode} are substituted at runtime.

For leaders

Flagged review adds a secondary LLM evaluation for borderline policy decisions, reducing both false positives (blocked legitimate content) and false negatives (missed violations).
The escalate mode integrates with human review workflows, creating a human-AI collaboration for edge cases.
Rationale capture provides explainability for automated decisions, supporting audit requirements and incident investigations.
Provider isolation ensures review infrastructure failures don't impact primary user traffic.
Cost is proportional to flagged traffic volume, not total traffic — only borderline cases incur review costs.

Next steps

Conditional Chains — target flagged-review to specific consumers
Per-Policy Catalog — all policy fields
Observability — where review rationale appears in callbacks

Use this page when​

Primary audience​

Overview​

Mode​

Provider​

Provider fields​

Recursion depth​

Provider isolation​

Rationale capture​

Prompt template​

Template placeholders​

Complete field reference​

Example: healthcare content review​

Example: defense dual review​

For AI systems​

For engineers​

For leaders​

Next steps​