Quality-Aware Retry

How It Works

The gateway detects quality issues in provider responses:

  • Empty responses
  • Truncated output
  • Content-filtered responses
  • Malformed JSON (when structured output is expected)

When the gateway detects one of these issues, it retries the request with the next provider in the compliant candidate set. Retries never widen the security envelope.
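The detect-then-retry flow can be sketched in Python as follows. This is a minimal illustration, not the gateway's actual implementation: the `Response` type, its `finish_reason` values, the `detect_issue` and `call_with_quality_retry` names, and the choice to return the last response when every attempt fails are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Response:
    text: str
    # Assumed values for this sketch: "stop", "length", "content_filter".
    finish_reason: str


def detect_issue(resp: Response, expect_json: bool = False) -> Optional[str]:
    """Return the name of the first quality issue found, or None."""
    if resp.finish_reason == "content_filter":
        return "content_filtered"
    if not resp.text.strip():
        return "empty_response"
    if resp.finish_reason == "length":
        return "truncated_response"
    if expect_json:
        try:
            json.loads(resp.text)
        except json.JSONDecodeError:
            return "malformed_json"
    return None


def call_with_quality_retry(
    candidates: Sequence[Callable[[str], Response]],
    prompt: str,
    max_retries: int = 2,
    expect_json: bool = False,
) -> Response:
    """Try providers in order. `candidates` must already be the compliant set,
    so moving to the next provider never relaxes a security constraint."""
    if not candidates:
        raise ValueError("no compliant providers available")
    # One initial attempt plus at most `max_retries` retries.
    attempts = candidates[: max(max_retries, 0) + 1]
    last = attempts[0](prompt)
    for provider in attempts[1:]:
        if detect_issue(last, expect_json) is None:
            break
        last = provider(prompt)
    # Assumption: if every attempt had an issue, hand back the last response.
    return last
```

Because the caller passes in only compliant providers, the retry loop has no way to reach a provider outside that set.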

Configuration

quality_retry:
  enabled: true
  max_retries: 2
  retry_on:
    - empty_response
    - truncated_response
    - content_filtered
    - malformed_json
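To make the shape of this configuration concrete, here is a hedged Python sketch that validates an already-parsed config mapping. The `QualityRetryConfig` class and `load_quality_retry` function are hypothetical names for illustration. Requiring every key and rejecting unknown `retry_on` values are assumptions of the sketch, not documented gateway behavior.

```python
from dataclasses import dataclass
from typing import FrozenSet

# The four trigger names listed in the configuration above.
KNOWN_TRIGGERS = frozenset(
    {"empty_response", "truncated_response", "content_filtered", "malformed_json"}
)


@dataclass(frozen=True)
class QualityRetryConfig:
    enabled: bool
    max_retries: int
    retry_on: FrozenSet[str]


def load_quality_retry(raw: dict) -> QualityRetryConfig:
    """Validate the `quality_retry` section of a parsed config mapping."""
    section = raw["quality_retry"]
    retry_on = frozenset(section["retry_on"])
    unknown = retry_on - KNOWN_TRIGGERS
    if unknown:
        raise ValueError(f"unknown retry_on values: {sorted(unknown)}")
    max_retries = int(section["max_retries"])
    if max_retries < 0:
        raise ValueError("max_retries must be zero or greater")
    return QualityRetryConfig(
        enabled=bool(section["enabled"]),
        max_retries=max_retries,
        retry_on=retry_on,
    )
```

Usage: pass the mapping produced by whatever YAML parser is in use, for example `load_quality_retry({"quality_retry": {"enabled": True, "max_retries": 2, "retry_on": ["empty_response"]}})`.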

Limitations

  • Streaming responses are never retried, because quality cannot be assessed mid-stream.
  • Retries stay within the compliant candidate set; security constraints are never relaxed.
  • The number of retries is capped by max_retries, which prevents infinite retry loops.
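The three limitations above can be read as guards on the retry decision. The Python sketch below shows one way to express them. `should_retry` and `next_provider` are hypothetical helpers, and representing providers as plain string names is an assumption for the example.

```python
from typing import Collection, Optional, Sequence, Set


def should_retry(
    issue: Optional[str],
    is_streaming: bool,
    retries_used: int,
    max_retries: int,
    retry_on: Collection[str],
) -> bool:
    """Decide whether another attempt is allowed."""
    if is_streaming:
        # Streaming responses are never retried.
        return False
    if issue is None or issue not in retry_on:
        # No issue, or an issue the configuration does not retry on.
        return False
    # The cap guarantees the loop terminates.
    return retries_used < max_retries


def next_provider(compliant: Sequence[str], tried: Set[str]) -> Optional[str]:
    """Pick the next untried provider from the compliant set only."""
    for name in compliant:
        if name not in tried:
            return name
    # Compliant set exhausted: stop rather than look outside it.
    return None
```

Returning `None` when the compliant set is exhausted means the caller must stop retrying instead of falling back to a non-compliant provider.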