Flagged Review Configuration
The flagged-review policy sends flagged content to an LLM judge for secondary evaluation. When another policy in the chain flags a request (e.g., a borderline prompt injection score), the flagged review step can make a final decision using a separate model.
Use this page when
- You need the exact command, config, API, or integration details for Flagged Review Configuration.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Overview
policies:
chain:
- "prompt-injection"
- "flagged-review"
- "audit-logger"
policy:
prompt-injection:
embedding_threshold: 0.6 # lower threshold = more borderline flags
response:
action: "block"
flagged-review:
mode: "judge"
provider:
name: "review-llm"
endpoint: "https://api.openai.com/v1/chat/completions"
model: "gpt-4o"
secret_key_ref:
env: "OPENAI_REVIEW_KEY"
timeout_ms: 5000
recursion_depth_max: 1
provider_isolation: true
rationale_capture: true
Mode
Controls how the flagged review policy handles flagged content:
| Mode | Behavior |
|---|---|
judge | LLM makes a final allow/block decision |
review_and_return | LLM reviews and annotates, then returns the content with its assessment |
audit_only | LLM reviews and logs the assessment, but always allows the content |
escalate | LLM reviews and escalates to a human reviewer if flagged |
policy:
flagged-review:
mode: judge
pack:
name: config-flagged-review-example-2
version: 1.0.0
enabled: true
policies:
chain:
- flagged-review
Provider
The flagged review uses a dedicated LLM provider, separate from the primary request provider:
policy:
flagged-review:
provider:
name: review-llm
endpoint: https://api.openai.com/v1/chat/completions
model: gpt-4o
secret_key_ref:
env: OPENAI_REVIEW_KEY
timeout_ms: 5000
pack:
name: config-flagged-review-example-3
version: 1.0.0
enabled: true
policies:
chain:
- flagged-review
Provider fields
| Field | Type | Default | Description |
|---|---|---|---|
name | string | — | Identifier for logging/tracing |
endpoint | string | "https://api.openai.com/v1/chat/completions" | LLM endpoint URL |
model | string | "gpt-4o" | Model to use for review |
secret_key_ref | object | — | Object reference to the API key (env or store) |
timeout_ms | integer (100–60000) | 5000 | Review request timeout |
Recursion depth
Controls how many levels of recursive review are allowed:
policy:
flagged-review:
recursion_depth_max: 1
pack:
name: config-flagged-review-example-4
version: 1.0.0
enabled: true
policies:
chain:
- flagged-review
| Field | Type | Default | Description |
|---|---|---|---|
recursion_depth_max | integer (1–8) | 1 | Maximum review recursion depth |
1= single review pass (no recursion)2= the review LLM's output can itself be reviewed once- Higher values are rarely needed and increase latency
Provider isolation
When true, the review provider is isolated from the primary provider — different API key, separate rate limits, independent error handling:
policy:
flagged-review:
provider_isolation: true
pack:
name: config-flagged-review-example-5
version: 1.0.0
enabled: true
policies:
chain:
- flagged-review
| Field | Type | Default | Description |
|---|---|---|---|
provider_isolation | boolean | true | Isolate review provider from primary |
Keep this true in production to prevent a review failure from cascading to the primary request path.
Rationale capture
When enabled, the review LLM's reasoning is captured in the event log:
policy:
flagged-review:
rationale_capture: true
pack:
name: config-flagged-review-example-6
version: 1.0.0
enabled: true
policies:
chain:
- flagged-review
| Field | Type | Default | Description |
|---|---|---|---|
rationale_capture | boolean | true | Capture and log review rationale |
The rationale appears in the decision event payload and is visible in the console's event detail view.
Prompt template
Customize the prompt sent to the review LLM with placeholder variables:
policy:
flagged-review:
prompt_template: |
You are a content safety reviewer. Analyze the following flagged content and decide whether it should be allowed or blocked.
## Flagged Input
{input}
## Model Output
{output}
## Flag Reason
{reason_code}
## Review Mode
{mode}
Respond with a JSON object:
{"decision": "allow" | "block", "confidence": 0.0-1.0, "rationale": "..."}
pack:
name: config-flagged-review-example-7
version: 1.0.0
enabled: true
policies:
chain:
- flagged-review
Template placeholders
| Placeholder | Replaced with |
|---|---|
{input} | The original user input that was flagged |
{output} | The model's output (empty if input-phase flag) |
{reason_code} | The reason code from the flagging policy |
{mode} | The current review mode (judge, review_and_return, etc.) |
If no prompt_template is specified, the gateway uses a built-in default template.
Complete field reference
| Field | Type | Default | Description |
|---|---|---|---|
mode | string | "judge" | judge, review_and_return, audit_only, escalate |
provider.name | string | — | Review provider identifier |
provider.endpoint | string | "https://api.openai.com/v1/chat/completions" | LLM endpoint |
provider.model | string | "gpt-4o" | Model name |
provider.secret_key_ref | object | — | Object reference for the API key (env or store) |
provider.timeout_ms | integer (100–60000) | 5000 | Request timeout |
recursion_depth_max | integer (1–8) | 1 | Max recursion depth |
provider_isolation | boolean | true | Isolate review from primary provider |
rationale_capture | boolean | true | Log review rationale |
prompt_template | string | built-in | Custom prompt with {input}, {output}, {reason_code}, {mode} |
Example: healthcare content review
pack:
name: "healthcare-review"
version: "1.0.0"
enabled: true
providers:
targets:
- id: "openai-prod"
provider: "openai"
model: "gpt-4o"
secret_key_ref:
env: "OPENAI_API_KEY"
policies:
chain:
- "hipaa-phi-detector"
- "healthcare-compliance"
- "flagged-review"
- "audit-logger"
policy:
hipaa-phi-detector:
mode: "hipaa_18"
action: "redact"
healthcare-compliance:
blocked_patterns:
- "specific_diagnosis"
- "treatment_recommendation"
action: "block"
flagged-review:
mode: "escalate"
provider:
name: "gpt4-reviewer"
model: "gpt-4o"
secret_key_ref:
env: "OPENAI_REVIEW_KEY"
timeout_ms: 8000
recursion_depth_max: 1
provider_isolation: true
rationale_capture: true
prompt_template: |
You are a healthcare content safety reviewer.
## Content
{input}
## Response
{output}
## Flag Reason
{reason_code}
Determine if this content:
1. Contains protected health information (PHI)
2. Makes specific medical diagnoses
3. Recommends specific treatments
Respond: {"decision": "allow" | "escalate", "rationale": "..."}
audit-logger:
immutable: true
retention_days: 2190
hipaa_audit_controls: true
Example: defense dual review
policy:
flagged-review:
mode: judge
provider:
name: claude-reviewer
endpoint: https://api.anthropic.com/v1/messages
model: claude-sonnet-4-20250514
secret_key_ref:
env: ANTHROPIC_REVIEW_KEY
timeout_ms: 10000
recursion_depth_max: 2
provider_isolation: true
rationale_capture: true
pack:
name: config-flagged-review-example-9
version: 1.0.0
enabled: true
policies:
chain:
- flagged-review
Using a different model vendor for review than the primary provider adds an extra layer of independence to the review process.
For AI systems
- Canonical terms: Keeptrusts, policy-config.yaml,
flagged-review, mode (judge/review_and_return/audit_only/escalate), provider, recursion_depth_max, provider_isolation, rationale_capture, prompt_template. - Output-phase policy: sends flagged content to a secondary LLM judge for evaluation.
- Best next pages: Conditional Chains, Per-Policy Catalog, Observability.
For engineers
- Use a separate API key for the review provider to isolate costs and rate limits from production traffic.
- Keep
provider_isolation: truein production to prevent review failures from cascading to the primary request path. - Start with
mode: "audit_only"to evaluate review accuracy before switching tojudgeorescalate. - Keep
recursion_depth_max: 1unless you have a specific reason for recursive review (adds latency). - Use a different model vendor for review than the primary provider for an independent second opinion.
- Template placeholders:
{input},{output},{reason_code},{mode}are substituted at runtime.
For leaders
- Flagged review adds a secondary LLM evaluation for borderline policy decisions, reducing both false positives (blocked legitimate content) and false negatives (missed violations).
- The
escalatemode integrates with human review workflows, creating a human-AI collaboration for edge cases. - Rationale capture provides explainability for automated decisions, supporting audit requirements and incident investigations.
- Provider isolation ensures review infrastructure failures don't impact primary user traffic.
- Cost is proportional to flagged traffic volume, not total traffic — only borderline cases incur review costs.
Next steps
- Conditional Chains — target flagged-review to specific consumers
- Per-Policy Catalog — all policy fields
- Observability — where review rationale appears in callbacks