# Qdrant
Qdrant is a high-performance vector database built for neural search and AI applications. While Qdrant itself stores and indexes vectors, the embedding generation that feeds Qdrant typically uses external LLM providers — OpenAI, Cohere, or self-hosted models. RAG (Retrieval-Augmented Generation) pipelines built on Qdrant also make generation calls to LLM providers at query time.
This page explains how to route the embedding and generation calls associated with Qdrant workflows through the Keeptrusts gateway so policy enforcement, PII redaction, and audit logging apply to every AI operation.
## Use this page when
- You are building RAG pipelines with Qdrant and need governance on the embedding and generation calls.
- You want audit trails for AI operations that process application data through external LLM providers.
- For general provider integration, see the OpenAI integration page.
## Primary audience
- Primary: Technical Engineers (ML, Backend, Platform)
- Secondary: AI Agents, Technical Leaders
## Prerequisites
- Qdrant 1.9+ running (Docker, Kubernetes, or Qdrant Cloud).
- Embedding provider (e.g., OpenAI) for generating vectors.
- Keeptrusts gateway running locally or centrally:
  - Local:

    ```shell
    kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
    ```

  - Hosted: `https://gateway.keeptrusts.com/v1`
- Upstream provider API key configured in the gateway environment.
## Configuration
### Gateway policy config
Create a `policy-config.yaml` for vector search AI governance:

```yaml
pack:
  name: qdrant-ai-governance
  version: 1.0.0
  enabled: true

policies:
  chain:
    - pii-detector
    - prompt-injection
    - audit-logger
  policy:
    pii-detector:
      action: redact
    prompt-injection:
      threshold: 0.8
      action: block
    audit-logger:
      retention_days: 90

providers:
  strategy: single
  targets:
    - id: openai-for-qdrant
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
```
### Python client configuration
Route embedding calls through the Keeptrusts gateway when building Qdrant pipelines:
```python
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, VectorParams, Distance

# Embedding calls go through the Keeptrusts gateway, not directly to OpenAI
openai_client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="your-keeptrusts-access-key",
)

# Vector operations talk directly to Qdrant
qdrant = QdrantClient(host="localhost", port=6333)

qdrant.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

def embed_and_upsert(texts, ids):
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    points = [
        PointStruct(id=point_id, vector=item.embedding, payload={"text": text})
        for point_id, item, text in zip(ids, response.data, texts)
    ]
    qdrant.upsert(collection_name="documents", points=points)

embed_and_upsert(
    texts=["AI governance ensures responsible AI use."],
    ids=[1],
)
```
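Real documents usually need to be split before embedding. A simple character-based chunker can feed `embed_and_upsert`; this helper and its chunk sizes are illustrative assumptions, not part of the Qdrant or Keeptrusts APIs:

```python
def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into overlapping character chunks for embedding.

    Illustrative helper; tune chunk_size and overlap for your data.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Chunk a document, then upsert the chunks through the gateway-routed client:
# chunks = chunk_text(document)
# embed_and_upsert(texts=chunks, ids=list(range(1, len(chunks) + 1)))
```

The overlap preserves context across chunk boundaries so a sentence split between two chunks is still retrievable from at least one of them.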
### RAG query with gateway-routed generation
```python
def rag_query(question):
    query_embedding = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=question,
    ).data[0].embedding

    results = qdrant.query_points(
        collection_name="documents",
        query=query_embedding,
        limit=5,
    )
    context = "\n".join(
        point.payload["text"] for point in results.points
    )

    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

answer = rag_query("What is AI governance?")
print(answer)
```
Both the embedding call and the generation call route through the Keeptrusts gateway, giving you full audit coverage of the RAG pipeline.
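If you enable the `token-limiter` policy, it helps to cap the retrieved context on the application side as well, so the prompt fits the budget before it reaches the gateway. A rough character-budget trim might look like this; the 4-characters-per-token ratio is a coarse approximation, not a Keeptrusts feature:

```python
def build_context(texts, max_tokens=2048, chars_per_token=4):
    """Concatenate retrieved passages, stopping before a rough token budget.

    chars_per_token=4 is a coarse English-text heuristic, not an exact count.
    """
    budget = max_tokens * chars_per_token
    selected, used = [], 0
    for text in texts:
        if used + len(text) + 1 > budget:  # +1 for the joining newline
            break
        selected.append(text)
        used += len(text) + 1
    return "\n".join(selected)

# Inside rag_query, this would replace the plain join:
# context = build_context([p.payload["text"] for p in results.points])
```

Because Qdrant returns points in descending similarity order, trimming from the end drops the least relevant passages first.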
## Setup steps
1. Start the Keeptrusts gateway:

   ```shell
   export OPENAI_API_KEY="sk-your-openai-key"
   kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
   ```

2. Configure your OpenAI client to use `http://localhost:41002/v1` as the base URL.
3. Run an embedding or RAG query to verify traffic flows through the gateway.

Note that Qdrant vector operations (upsert, search, delete) go directly to Qdrant; only the embedding and generation calls route through the gateway.
## Verification
Test that embedding calls flow through the gateway:
```shell
curl http://localhost:41002/v1/embeddings \
  -H "Authorization: Bearer your-keeptrusts-access-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Test embedding through Keeptrusts gateway."
  }'
```
Confirm the request appears in the Keeptrusts events dashboard with policy decisions applied.
## Recommended policies
| Policy | Purpose | Recommended setting |
|---|---|---|
| `pii-detector` | Redact personal data from text before embedding or generation | `action: redact` |
| `prompt-injection` | Block injection attempts in RAG query prompts | `threshold: 0.8`, `action: block` |
| `audit-logger` | Log all embedding and generation calls | `retention_days: 90` |
| `token-limiter` | Cap token usage for generation responses | `max_tokens: 2048` |
| `safety-filter` | Block harmful content in RAG-generated responses | `mode: standard`, `action: block` |
## Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Embedding calls return a connection error | Gateway not running | Start `kt gateway run` on port 41002 |
| Vector dimension mismatch | Wrong embedding model | `text-embedding-3-small` produces 1536-dimension vectors; match your Qdrant collection config |
| RAG responses lack context | Search returns no results | Verify data was indexed and the query embedding uses the same model |
| PII redaction changes embedding output | Redacted text produces different vectors | Redact at the application layer before embedding if vector consistency matters |
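As the last row notes, gateway-side redaction changes the text that gets embedded. If you need stable vectors, redact in your own code before calling the embeddings endpoint, so indexing and querying always see the same text. A minimal regex-based sketch; these patterns are illustrative and nowhere near production-grade PII detection:

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text):
    """Replace matched PII with stable placeholders before embedding."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

# Redact first, then embed: both indexing and querying see identical text,
# so vectors stay consistent regardless of gateway-side policy settings.
# embed_and_upsert(texts=[redact(t) for t in raw_texts], ids=ids)
```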
## For AI systems
- Canonical terms: Keeptrusts gateway, Qdrant, vector database, neural search, embeddings, RAG, `policy-config.yaml`.
- Config field names: `base_url`, `api_key`, `provider`, `secret_key_ref`, `pii-detector`, `audit-logger`.
- Key behavior: Qdrant stores and queries vectors; embedding generation and RAG completion use external LLM providers. Keeptrusts intercepts those LLM calls, applies policies, and forwards compliant requests.
- Constraint: Qdrant vector operations go directly to Qdrant — only embedding and generation calls route through the gateway.
- Best next pages: Weaviate integration, Pinecone integration, ChromaDB integration.
## For engineers
- Route both embedding and generation calls through the gateway for full RAG audit coverage.
- Use the same embedding model for indexing and querying to ensure vector compatibility.
- For Docker Compose deployments, use Docker network hostnames for inter-service communication.
- Validate: run an embedding call and confirm the event in the Keeptrusts console.
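The Docker Compose advice above might look like the following sketch. Service names, the gateway image name, and the mount path are assumptions; adjust for your deployment. Inside the Compose network, clients reach the gateway at `http://keeptrusts:41002/v1` and Qdrant at `http://qdrant:6333`.

```yaml
services:
  qdrant:
    image: qdrant/qdrant:v1.9.0
    ports:
      - "6333:6333"
  keeptrusts:
    # Image name and command layout are assumptions for illustration.
    image: keeptrusts/gateway:latest
    command: ["kt", "gateway", "run", "--listen", "0.0.0.0:41002", "--policy-config", "/etc/policy-config.yaml"]
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - ./policy-config.yaml:/etc/policy-config.yaml
    ports:
      - "41002:41002"
```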
## For leaders
- RAG pipelines built on Qdrant send your organization's data to external LLM providers during both indexing and query time. Routing through Keeptrusts ensures PII is redacted and every call is logged.
- Complete audit trails cover the full data flow from document ingestion through query-time generation.
- Centralized policy enforcement applies consistently across all teams and applications using Qdrant.
## Next steps
- Weaviate integration — govern Weaviate generative search
- Pinecone integration — govern Pinecone inference calls
- ChromaDB integration — govern ChromaDB embedding calls
- Milvus integration — govern Milvus AI operations
- Policy controls catalog — full policy reference