Flagged Review Configuration

The flagged-review policy sends flagged content to an LLM judge for secondary evaluation. When another policy in the chain flags a request (e.g., a borderline prompt injection score), the flagged review step can make a final decision using a separate model.

Use this page when

  • You need the exact command, config, API, or integration details for Flagged Review Configuration.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • You want a guided rollout instead of a reference page; in that case, use the linked workflow pages in Next steps.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Overview

policies:
  chain:
    - "prompt-injection"
    - "flagged-review"
    - "audit-logger"

policy:
  prompt-injection:
    embedding_threshold: 0.6   # lower threshold = more borderline flags
    response:
      action: "block"

  flagged-review:
    mode: "judge"
    provider:
      name: "review-llm"
      endpoint: "https://api.openai.com/v1/chat/completions"
      model: "gpt-4o"
      secret_key_ref:
        env: "OPENAI_REVIEW_KEY"
      timeout_ms: 5000
    recursion_depth_max: 1
    provider_isolation: true
    rationale_capture: true

Mode

Controls how the flagged-review policy handles flagged content:

Mode              | Behavior
judge             | LLM makes a final allow/block decision
review_and_return | LLM reviews and annotates, then returns the content with its assessment
audit_only        | LLM reviews and logs the assessment, but always allows the content
escalate          | LLM reviews and escalates to a human reviewer if flagged

policy:
  flagged-review:
    mode: judge
pack:
  name: config-flagged-review-example-2
  version: 1.0.0
  enabled: true
policies:
  chain:
    - flagged-review

Provider

The flagged-review policy uses a dedicated LLM provider, separate from the primary request provider:

policy:
  flagged-review:
    provider:
      name: review-llm
      endpoint: https://api.openai.com/v1/chat/completions
      model: gpt-4o
      secret_key_ref:
        env: OPENAI_REVIEW_KEY
      timeout_ms: 5000
pack:
  name: config-flagged-review-example-3
  version: 1.0.0
  enabled: true
policies:
  chain:
    - flagged-review

Provider fields

Field          | Type                | Default                                      | Description
name           | string              | —                                            | Identifier for logging/tracing
endpoint       | string              | "https://api.openai.com/v1/chat/completions" | LLM endpoint URL
model          | string              | "gpt-4o"                                     | Model to use for review
secret_key_ref | object              | —                                            | Reference to the API key (env or store)
timeout_ms     | integer (100–60000) | 5000                                         | Review request timeout

Use a separate API key for the review provider to isolate costs and rate limits from production traffic.
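
For example, here is a minimal sketch combining the provider-target and review-key patterns used elsewhere on this page (the identifiers and environment variable names are illustrative):

providers:
  targets:
    - id: "openai-prod"
      provider: "openai"
      model: "gpt-4o"
      secret_key_ref:
        env: "OPENAI_API_KEY"      # key used for primary request traffic

policy:
  flagged-review:
    provider:
      name: "review-llm"
      secret_key_ref:
        env: "OPENAI_REVIEW_KEY"   # separate key: review costs and rate limits stay isolated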

Recursion depth

Controls how many levels of recursive review are allowed:

policy:
  flagged-review:
    recursion_depth_max: 1
pack:
  name: config-flagged-review-example-4
  version: 1.0.0
  enabled: true
policies:
  chain:
    - flagged-review

Field               | Type          | Default | Description
recursion_depth_max | integer (1–8) | 1       | Maximum review recursion depth

  • 1 = single review pass (no recursion)
  • 2 = the review LLM's output can itself be reviewed once
  • Higher values are rarely needed and increase latency

Provider isolation

When true, the review provider is isolated from the primary provider — different API key, separate rate limits, independent error handling:

policy:
  flagged-review:
    provider_isolation: true
pack:
  name: config-flagged-review-example-5
  version: 1.0.0
  enabled: true
policies:
  chain:
    - flagged-review

Field              | Type    | Default | Description
provider_isolation | boolean | true    | Isolate review provider from primary

Keep this true in production to prevent a review failure from cascading to the primary request path.

Rationale capture

When enabled, the review LLM's reasoning is captured in the event log:

policy:
  flagged-review:
    rationale_capture: true
pack:
  name: config-flagged-review-example-6
  version: 1.0.0
  enabled: true
policies:
  chain:
    - flagged-review

Field             | Type    | Default | Description
rationale_capture | boolean | true    | Capture and log review rationale

The rationale appears in the decision event payload and is visible in the console's event detail view.

Prompt template

Customize the prompt sent to the review LLM with placeholder variables:

policy:
  flagged-review:
    prompt_template: |
      You are a content safety reviewer. Analyze the following flagged content and decide whether it should be allowed or blocked.

      ## Flagged Input
      {input}

      ## Model Output
      {output}

      ## Flag Reason
      {reason_code}

      ## Review Mode
      {mode}

      Respond with a JSON object:
      {"decision": "allow" | "block", "confidence": 0.0-1.0, "rationale": "..."}
pack:
  name: config-flagged-review-example-7
  version: 1.0.0
  enabled: true
policies:
  chain:
    - flagged-review

Template placeholders

Placeholder   | Replaced with
{input}       | The original user input that was flagged
{output}      | The model's output (empty if input-phase flag)
{reason_code} | The reason code from the flagging policy
{mode}        | The current review mode (judge, review_and_return, etc.)

If no prompt_template is specified, the gateway uses a built-in default template.
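
Whichever template is used, the review LLM is expected to answer in the JSON shape shown above. A judge-mode response might look like this (the values are illustrative):

{"decision": "block", "confidence": 0.92, "rationale": "The flagged input attempts to override system instructions."}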

Complete field reference

Field                   | Type                | Default                                      | Description
mode                    | string              | "judge"                                      | judge, review_and_return, audit_only, escalate
provider.name           | string              | —                                            | Review provider identifier
provider.endpoint       | string              | "https://api.openai.com/v1/chat/completions" | LLM endpoint
provider.model          | string              | "gpt-4o"                                     | Model name
provider.secret_key_ref | object              | —                                            | Reference for the API key (env or store)
provider.timeout_ms     | integer (100–60000) | 5000                                         | Request timeout
recursion_depth_max     | integer (1–8)       | 1                                            | Max recursion depth
provider_isolation      | boolean             | true                                         | Isolate review from primary provider
rationale_capture       | boolean             | true                                         | Log review rationale
prompt_template         | string              | built-in                                     | Custom prompt with {input}, {output}, {reason_code}, {mode}

Example: healthcare content review

pack:
  name: "healthcare-review"
  version: "1.0.0"
  enabled: true

providers:
  targets:
    - id: "openai-prod"
      provider: "openai"
      model: "gpt-4o"
      secret_key_ref:
        env: "OPENAI_API_KEY"

policies:
  chain:
    - "hipaa-phi-detector"
    - "healthcare-compliance"
    - "flagged-review"
    - "audit-logger"

policy:
  hipaa-phi-detector:
    mode: "hipaa_18"
    action: "redact"

  healthcare-compliance:
    blocked_patterns:
      - "specific_diagnosis"
      - "treatment_recommendation"
    action: "block"

  flagged-review:
    mode: "escalate"
    provider:
      name: "gpt4-reviewer"
      model: "gpt-4o"
      secret_key_ref:
        env: "OPENAI_REVIEW_KEY"
      timeout_ms: 8000
    recursion_depth_max: 1
    provider_isolation: true
    rationale_capture: true
    prompt_template: |
      You are a healthcare content safety reviewer.

      ## Content
      {input}

      ## Response
      {output}

      ## Flag Reason
      {reason_code}

      Determine if this content:
      1. Contains protected health information (PHI)
      2. Makes specific medical diagnoses
      3. Recommends specific treatments

      Respond: {"decision": "allow" | "escalate", "rationale": "..."}

  audit-logger:
    immutable: true
    retention_days: 2190
    hipaa_audit_controls: true

Example: defense dual review

policy:
  flagged-review:
    mode: judge
    provider:
      name: claude-reviewer
      endpoint: https://api.anthropic.com/v1/messages
      model: claude-sonnet-4-20250514
      secret_key_ref:
        env: ANTHROPIC_REVIEW_KEY
      timeout_ms: 10000
    recursion_depth_max: 2
    provider_isolation: true
    rationale_capture: true
pack:
  name: config-flagged-review-example-9
  version: 1.0.0
  enabled: true
policies:
  chain:
    - flagged-review

Using a different model vendor for review than the primary provider adds an extra layer of independence to the review process.

For AI systems

  • Canonical terms: Keeptrusts, policy-config.yaml, flagged-review, mode (judge/review_and_return/audit_only/escalate), provider, recursion_depth_max, provider_isolation, rationale_capture, prompt_template.
  • Output-phase policy: sends flagged content to a secondary LLM judge for evaluation.
  • Best next pages: Conditional Chains, Per-Policy Catalog, Observability.

For engineers

  • Use a separate API key for the review provider to isolate costs and rate limits from production traffic.
  • Keep provider_isolation: true in production to prevent review failures from cascading to the primary request path.
  • Start with mode: "audit_only" to evaluate review accuracy before switching to judge or escalate (see the sketch after this list).
  • Keep recursion_depth_max: 1 unless you have a specific reason for recursive review (adds latency).
  • Use a different model vendor for review than the primary provider for an independent second opinion.
  • Template placeholders: {input}, {output}, {reason_code}, {mode} are substituted at runtime.
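
As a sketch of that staged rollout (the pack name is illustrative; all fields are documented above):

# Phase 1: run the reviewer in observe-only mode, so it logs assessments
# but always allows content while you evaluate its accuracy.
policy:
  flagged-review:
    mode: "audit_only"
    rationale_capture: true   # keep rationales for the accuracy evaluation
pack:
  name: "flagged-review-rollout-phase-1"
  version: "1.0.0"
  enabled: true
policies:
  chain:
    - flagged-review

Once the logged decisions look accurate, switch mode to "judge" or "escalate" without touching the rest of the chain.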

For leaders

  • Flagged review adds a secondary LLM evaluation for borderline policy decisions, reducing both false positives (blocked legitimate content) and false negatives (missed violations).
  • The escalate mode integrates with human review workflows, creating a human-AI collaboration for edge cases.
  • Rationale capture provides explainability for automated decisions, supporting audit requirements and incident investigations.
  • Provider isolation ensures review infrastructure failures don't impact primary user traffic.
  • Cost is proportional to flagged traffic volume, not total traffic — only borderline cases incur review costs.

Next steps