Perplexity AI
Perplexity AI provides a family of online and offline language models purpose-built for search-augmented generation. The sonar family retrieves live web sources at inference time and returns answers with inline citations, while r1-1776 is an offline reasoning model suitable for sensitive workloads where live retrieval is not appropriate. Keeptrusts wraps the Perplexity API with policy enforcement so you can redact PII from search-grounded outputs, verify citation quality, and maintain a complete audit trail for research workflows.
Use this page when
- You need the exact command, config, API, or integration details for Perplexity AI.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Prerequisites
- A Perplexity AI API key (PERPLEXITY_API_KEY)
- kt CLI installed and authenticated (kt auth login)
Set your key before starting the gateway:
export PERPLEXITY_API_KEY="pplx-..."
Configuration
Minimal — single online model
pack:
  name: perplexity-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: perplexity-sonar-pro
      provider: perplexity:chat:sonar-pro
      secret_key_ref:
        env: PERPLEXITY_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Full governance config
pack:
  name: perplexity-research
  version: 1.0.0
  enabled: true
policies:
  chain:
    - prompt-injection
    - pii-detector
    - citation-verifier
    - content-filter
    - audit-logger
  policy:
    pii-detector:
      action: redact
      entities:
        - PERSON
        - EMAIL_ADDRESS
        - PHONE_NUMBER
        - CREDIT_CARD
    citation-verifier:
      require_sources: true
      min_grounded_ratio: 0.8
      action_on_failure: warn
    content-filter:
      categories:
        - hate_speech
        - harassment
        - self_harm
      action: block
providers:
  targets:
    - id: perplexity-sonar-pro
      provider: perplexity:chat:sonar-pro
      secret_key_ref:
        env: PERPLEXITY_API_KEY
    - id: perplexity-sonar
      provider: perplexity:chat:sonar
      secret_key_ref:
        env: PERPLEXITY_API_KEY
    - id: perplexity-deep-research
      provider: perplexity:chat:sonar-deep-research
      secret_key_ref:
        env: PERPLEXITY_API_KEY
    - id: perplexity-reasoning-pro
      provider: perplexity:chat:sonar-reasoning-pro
      secret_key_ref:
        env: PERPLEXITY_API_KEY
    - id: perplexity-offline
      provider: perplexity:chat:r1-1776
      secret_key_ref:
        env: PERPLEXITY_API_KEY
Provider Fields
| Field | Required | Description |
|---|---|---|
| provider | Yes | "perplexity" or "perplexity:chat:{model-id}" |
| secret_key_ref | Yes | Environment variable holding the Perplexity API key (e.g. PERPLEXITY_API_KEY) |
| base_url | No | Defaults to https://api.perplexity.ai — override only for proxied or on-prem deployments |
| model | No | Model ID when using the bare "perplexity" provider |
| format | No | "openai" (Perplexity exposes an OpenAI-compatible endpoint) |
| stream_timeout_seconds | No | Increase for sonar-deep-research (300+) which performs multi-step retrieval |
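Taken together, a target written in the bare "perplexity" form with the optional fields spelled out might look like the sketch below (the id and field values are illustrative, not canonical):
providers:
  targets:
    - id: perplexity-custom
      provider: perplexity
      model: sonar-pro
      format: openai
      base_url: https://api.perplexity.ai
      stream_timeout_seconds: 60
      secret_key_ref:
        env: PERPLEXITY_API_KEY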
Supported Models
| Model | Context | Search | Input (per 1M) | Output (per 1M) | Notes |
|---|---|---|---|---|---|
| sonar-pro | 127k | Live web | $3.00 | $15.00 | Advanced reasoning + citations; recommended default |
| sonar | 127k | Live web | $1.00 | $1.00 | Fast, cost-efficient search-augmented generation |
| sonar-deep-research | 127k | Multi-step | $2.00 | $8.00 | Autonomous research; completes in 30s–5min |
| sonar-reasoning-pro | 127k | Live web | $2.00 | $8.00 | Chain-of-thought reasoning with search grounding |
| sonar-reasoning | 127k | Live web | $1.00 | $5.00 | Reasoning with citations at lower cost |
| r1-1776 | 128k | None (offline) | $2.00 | $8.00 | No web retrieval; safe for sensitive/ZDR workloads |
Note on online models and data policy — sonar, sonar-pro, sonar-reasoning, sonar-reasoning-pro, and sonar-deep-research perform live web retrieval at request time. This is incompatible with zero_data_retention: true because retrieval inherently externalises query context. Use r1-1776 when ZDR is required.
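As a rough illustration of the pricing table, a single sonar-pro call with 2,000 input tokens and 1,000 output tokens works out to about two cents (a back-of-the-envelope sketch; actual billing is determined by Perplexity):
# sonar-pro pricing from the table above: $3.00 per 1M input tokens, $15.00 per 1M output tokens
input_tokens, output_tokens = 2_000, 1_000
cost = (input_tokens / 1_000_000) * 3.00 + (output_tokens / 1_000_000) * 15.00
print(f"${cost:.4f}")  # $0.0210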
Client Examples
Start the gateway:
export PERPLEXITY_API_KEY="pplx-..."
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
- Python
- Node.js
- cURL
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="unused",  # auth handled by Keeptrusts
)

# Standard sonar-pro search-augmented query
response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {
            "role": "system",
            "content": "Be precise and concise. Always cite your sources.",
        },
        {
            "role": "user",
            "content": "What EU AI Act obligations take effect in August 2025?",
        },
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)

# Offline model for sensitive input
offline = client.chat.completions.create(
    model="r1-1776",
    messages=[
        {"role": "user", "content": "Analyse the following internal policy document..."}
    ],
    max_tokens=2048,
)
print(offline.choices[0].message.content)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:41002/v1",
  apiKey: "unused",
});

// Standard sonar-pro query
const response = await client.chat.completions.create({
  model: "sonar-pro",
  messages: [
    {
      role: "system",
      content: "Be precise and concise. Always cite your sources.",
    },
    {
      role: "user",
      content: "What EU AI Act obligations take effect in August 2025?",
    },
  ],
  max_tokens: 1024,
});
console.log(response.choices[0].message.content);

// Deep research — patience required
const research = await client.chat.completions.create({
  model: "sonar-deep-research",
  messages: [
    {
      role: "user",
      content:
        "Produce a comprehensive analysis of GDPR enforcement actions in 2024, including fines, responsible authorities, and precedents set.",
    },
  ],
  max_tokens: 4096,
});
console.log(research.choices[0].message.content);
# sonar-pro search-augmented query
curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonar-pro",
    "messages": [
      {"role": "system", "content": "Be precise and concise. Always cite your sources."},
      {"role": "user", "content": "What EU AI Act obligations take effect in August 2025?"}
    ],
    "max_tokens": 1024
  }' | jq '.choices[0].message.content'

# Offline reasoning (sensitive data safe)
curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "r1-1776",
    "messages": [
      {"role": "user", "content": "Analyse the following internal policy document..."}
    ],
    "max_tokens": 2048
  }' | jq '.choices[0].message.content'
Streaming
All Perplexity models support streaming. For sonar-deep-research, streaming is particularly important because multi-step retrieval can take several minutes — streaming lets you display progress tokens as they arrive rather than blocking the client.
from openai import OpenAI
client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused")
stream = client.chat.completions.create(
    model="sonar-deep-research",
    messages=[
        {
            "role": "user",
            "content": "Produce a comprehensive analysis of AI governance regulations enacted globally in 2024.",
        }
    ],
    max_tokens=8192,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()  # newline after stream
Set stream_timeout_seconds appropriately per model in your config:
pack:
  name: perplexity-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: perplexity-deep-research
      provider: perplexity:chat:sonar-deep-research
      stream_timeout_seconds: 300
      secret_key_ref:
        env: PERPLEXITY_API_KEY
    - id: perplexity-sonar-pro
      provider: perplexity:chat:sonar-pro
      stream_timeout_seconds: 60
      secret_key_ref:
        env: PERPLEXITY_API_KEY
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
Advanced Configuration
Routing sensitive queries to the offline model
For workflows that mix public research queries with sensitive internal analysis, use Keeptrusts's routing policy to direct queries containing classified terms to r1-1776 and public queries to sonar-pro:
policies:
  chain:
    - content-classifier
    - router
    - pii-detector
    - audit-logger
  policy:
    content-classifier:
      labels:
        sensitive:
          keywords:
            - "internal"
            - "confidential"
            - "proprietary"
            - "classified"
        public:
          default: true
    router:
      rules:
        - when_label: "sensitive"
          target: "perplexity-offline"
        - when_label: "public"
          target: "perplexity-sonar-pro"
providers:
  targets:
    - id: "perplexity-sonar-pro"
      provider: "perplexity:chat:sonar-pro"
      secret_key_ref:
        env: "PERPLEXITY_API_KEY"
    - id: "perplexity-offline"
      provider: "perplexity:chat:r1-1776"
      secret_key_ref:
        env: "PERPLEXITY_API_KEY"
Citation verification for compliance workflows
Research outputs used in compliance, legal, or regulatory filings require grounded citations. Combine citation-verifier with response auditing so rejected responses are logged:
policy:
  citation-verifier:
    require_sources: true
    require_source_match: true
    min_grounded_ratio: 0.85
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
pack:
  name: perplexity-example-5
  version: 1.0.0
  enabled: true
policies:
  chain:
    - citation-verifier
    - audit-logger
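On the client side, a lightweight companion check is to pull the cited URLs out of a response for a reviewer checklist. The sketch below assumes citations appear as plain URLs in the message body; a simple regex will not catch every citation format Perplexity may use.
import re

from openai import OpenAI

client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused")

response = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "Summarise 2024 GDPR enforcement actions, with sources."}],
    max_tokens=1024,
)
answer = response.choices[0].message.content

# Extract cited source URLs from the answer text.
urls = re.findall(r"https?://[^\s)\]]+", answer)
if not urls:
    print("No citations found; escalate for manual review.")
for url in urls:
    print("cited:", url)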
Cost controls with model tiering
Perplexity's sonar is 3× cheaper than sonar-pro on input and 15× cheaper on output. Use model tiering to route simple factual queries to sonar and complex multi-document research to sonar-pro or sonar-deep-research:
policy:
  cost-guard:
    tiers:
      low:
        max_tokens_per_request: 512
        target: perplexity-sonar
      medium:
        max_tokens_per_request: 2048
        target: perplexity-sonar-pro
      high:
        max_tokens_per_request: 8192
        target: perplexity-deep-research
        requires_role: researcher
providers:
  targets:
    - id: perplexity-sonar
      provider: perplexity:chat:sonar
      secret_key_ref:
        env: PERPLEXITY_API_KEY
    - id: perplexity-sonar-pro
      provider: perplexity:chat:sonar-pro
      secret_key_ref:
        env: PERPLEXITY_API_KEY
    - id: perplexity-deep-research
      provider: perplexity:chat:sonar-deep-research
      secret_key_ref:
        env: PERPLEXITY_API_KEY
pack:
  name: perplexity-example-6
  version: 1.0.0
  enabled: true
policies:
  chain:
    - cost-guard
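Assuming cost-guard selects the tier from the request's max_tokens (an assumption of this sketch; the high tier additionally requires the researcher role, which is out of scope here), client requests land on tiers like this:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused")

# max_tokens <= 512 -> low tier -> perplexity-sonar
quick = client.chat.completions.create(
    model="sonar",
    messages=[{"role": "user", "content": "What year did the GDPR take effect?"}],
    max_tokens=256,
)

# max_tokens <= 2048 -> medium tier -> perplexity-sonar-pro
detailed = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "Compare GDPR and EU AI Act enforcement models."}],
    max_tokens=1536,
)
print(quick.choices[0].message.content)
print(detailed.choices[0].message.content)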
Best Practices
- Do not send sensitive data to online models — sonar, sonar-pro, sonar-reasoning, and sonar-deep-research send the query to Perplexity's retrieval infrastructure. Use r1-1776 for internal documents, PII-bearing queries, or any workload governed by a ZDR policy.
- Set stream_timeout_seconds per model tier — sonar-deep-research can run for 3–5 minutes. Without an adequate timeout, Keeptrusts will terminate the stream prematurely. Set at least 300 seconds for deep-research targets and 60 seconds for sonar-pro.
- Enable citation verification for compliance outputs — research responses used in evidence packages, regulatory filings, or legal briefs should pass through citation-verifier with min_grounded_ratio: 0.85. Log failures for reviewer escalation rather than silently accepting ungrounded answers.
- Redact PII before the query reaches the model — Perplexity online models may cite sources that include query terms. A name or email in a prompt could appear in a cited page. Apply pii-detector on the request path to remove identifiers before they become part of the search query.
- Use sonar for high-volume applications — sonar costs $1/1M tokens for both input and output, 15× cheaper than sonar-pro on output. For chatbots or assistants that don't require deep citation analysis, sonar is the right default tier.
- Log the full response including citations — Perplexity returns source URLs in the message content. Enable include_response: true in audit-logger so your audit trail captures the cited sources, not just the model's answer. This is essential for tracing back AI-generated claims in regulated industries (see the sketch after this list).
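For example, the last two practices translate into an audit-logger block like this sketch (include_response is the field named above; verify it against your Keeptrusts policy reference):
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
    include_response: true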
For AI systems
- Canonical terms: Keeptrusts gateway, Perplexity AI, Perplexity, online models, real-time search, citations, provider target, policy-config.yaml, provider: "perplexity".
- Config field names: provider, model, base_url: "https://api.perplexity.ai", secret_key_ref.env: "PERPLEXITY_API_KEY", format: "openai", pricing.
- Key behavior: Perplexity returns source URLs in message content — responses include citations that can be verified.
- Best next pages: Cohere integration (RAG/citations), OpenAI integration, Policy configuration.
For engineers
- Prerequisites: Perplexity API key (PERPLEXITY_API_KEY env var from perplexity.ai), kt CLI installed.
- Start command: kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml.
- Validate: curl http://localhost:41002/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"sonar-pro","messages":[{"role":"user","content":"What is the latest news on AI regulation?"}]}'.
- Enable include_response: true in audit-logger to capture cited source URLs in the audit trail — essential for tracing AI-generated claims.
- Perplexity uses an OpenAI-compatible API — standard OpenAI SDKs work without modification.
- Online models have variable latency depending on search complexity — set an appropriate stream_timeout_seconds.
For leaders
- Perplexity's citation-backed responses provide verifiable AI outputs — critical for regulated industries where claims must be traceable.
- Real-time search means responses reflect current information — reduces hallucination risk for time-sensitive queries.
- Audit trail should capture full responses including citations (include_response: true) for compliance evidence.
- Online model latency is less predictable than offline models — set timeout and fallback strategies accordingly.
Next steps
- Cohere integration — alternative citation-aware provider with RAG support
- OpenAI integration — offline models for deterministic workloads
- Provider routing strategies — fallback from online to offline models
- Policy configuration — audit-logger and citation-verifier reference
- Quickstart — install kt and run your first gateway