ChromaDB

ChromaDB is an open-source embedding database designed for AI applications. When a collection is configured with a hosted embedding function (OpenAI, Cohere, HuggingFace, and others), ChromaDB calls that provider to generate vectors whenever you add documents or query the collection. These calls send your application data to the upstream AI provider.

This page explains how to route ChromaDB's embedding function calls through the Keeptrusts gateway so policy enforcement, PII redaction, and audit logging apply to every embedding operation.

Use this page when

  • You are using ChromaDB with OpenAI or other external embedding functions and need governance on those calls.
  • You want audit trails for embedding operations that send application data to AI providers.
  • For general provider integration rather than ChromaDB-specific setup, see OpenAI integration.

Primary audience

  • Primary: Technical Engineers (ML, Backend, Full-Stack)
  • Secondary: AI Agents, Technical Leaders

Prerequisites

  1. ChromaDB 0.5+ installed (pip install chromadb).
  2. OpenAI embedding function or another supported provider.
  3. Keeptrusts gateway running locally or centrally:
    • Local: kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
    • Hosted: https://gateway.keeptrusts.com/v1
  4. Upstream provider API key configured in the gateway environment.
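
As a minimal sketch, the two gateway options above can be selected from an environment variable so the same client code runs against a local or hosted gateway. The variable name KEEPTRUSTS_GATEWAY_URL and the helper gateway_base_url are hypothetical; they are not part of Keeptrusts or ChromaDB.

```python
import os

LOCAL_GATEWAY = "http://localhost:41002/v1"
HOSTED_GATEWAY = "https://gateway.keeptrusts.com/v1"


def gateway_base_url(env=None):
    """Return the gateway base URL, falling back to the local gateway."""
    env = os.environ if env is None else env
    # KEEPTRUSTS_GATEWAY_URL is a hypothetical variable name used for illustration.
    url = env.get("KEEPTRUSTS_GATEWAY_URL", LOCAL_GATEWAY)
    # Strip a trailing slash so callers can append paths consistently.
    return url.rstrip("/")
```

The returned value can be passed as api_base to the embedding function shown later on this page.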

Configuration

Gateway policy config

Create a policy-config.yaml for embedding governance:

pack:
  name: chromadb-ai-governance
  version: 1.0.0
  enabled: true
policies:
  chain:
    - pii-detector
    - audit-logger
  policy:
    pii-detector:
      action: redact
    audit-logger:
      retention_days: 90
providers:
  strategy: single
  targets:
    - id: openai-for-chromadb
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
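
Before starting the gateway, a quick pre-flight check can catch a chain entry that has no settings block. The sketch below is hypothetical: it mirrors the config as a Python dict, assumes that chain and policy are both nested under policies, and uses an illustrative helper name. Whether the gateway itself requires settings for every chain entry is not stated here.

```python
def unconfigured_policies(config):
    """Return chain entries that have no matching settings block."""
    policies = config.get("policies", {})
    chain = policies.get("chain", [])
    settings = policies.get("policy", {})
    return [name for name in chain if name not in settings]


# Dict form of the policy section above (assumed nesting).
config = {
    "policies": {
        "chain": ["pii-detector", "audit-logger"],
        "policy": {
            "pii-detector": {"action": "redact"},
            "audit-logger": {"retention_days": 90},
        },
    },
}

missing = unconfigured_policies(config)
print(missing)  # []
```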

Python client configuration

Configure ChromaDB's OpenAI embedding function to route through the Keeptrusts gateway:

import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# Point the embedding function at the gateway instead of the provider's API.
embedding_fn = OpenAIEmbeddingFunction(
    api_key="your-keeptrusts-access-key",
    api_base="http://localhost:41002/v1",
    model_name="text-embedding-3-small",
)

client = chromadb.Client()

collection = client.get_or_create_collection(
    name="documents",
    embedding_function=embedding_fn,
)

# add() embeds both documents through the gateway before storing them.
collection.add(
    documents=[
        "AI governance ensures responsible AI use.",
        "Policy enforcement protects sensitive data.",
    ],
    ids=["doc-1", "doc-2"],
)

Query with gateway-routed embeddings

results = collection.query(
    query_texts=["What is AI governance?"],
    n_results=3,
)

print(results["documents"])

Both add() and query() calls route through the Keeptrusts gateway when the embedding function uses the gateway base URL.
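
To observe this in a local test, a thin wrapper can count how often an embedding function is invoked. The sketch below is hypothetical: CountingEmbedder is not a ChromaDB class, and fake_embed stands in for the real OpenAIEmbeddingFunction so the example runs offline. Some ChromaDB releases may expect custom embedding functions to implement its EmbeddingFunction interface, so treat this as a debugging aid rather than something to pass directly into a collection.

```python
class CountingEmbedder:
    """Wrap any callable embedding function and record each invocation."""

    def __init__(self, inner):
        self.inner = inner
        self.calls = 0        # number of embedding requests made
        self.texts_sent = 0   # total number of texts sent for embedding

    def __call__(self, input):
        self.calls += 1
        self.texts_sent += len(input)
        return self.inner(input)


def fake_embed(input):
    # Stand-in for a gateway-routed embedding function: one 3-d vector per text.
    return [[float(len(text)), 0.0, 1.0] for text in input]


embedder = CountingEmbedder(fake_embed)
# Two documents, as add() would send them.
embedder(["AI governance ensures responsible AI use.",
          "Policy enforcement protects sensitive data."])
# One query text, as query() would send it.
embedder(["What is AI governance?"])
print(embedder.calls, embedder.texts_sent)  # 2 3
```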

Using the OpenAI SDK directly

For more control over embedding calls, use the OpenAI SDK directly and pass raw embeddings to ChromaDB:

from openai import OpenAI
import chromadb

# The OpenAI client sends embedding requests to the gateway.
openai_client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="your-keeptrusts-access-key",
)

client = chromadb.Client()
collection = client.get_or_create_collection(name="documents")

texts = ["AI governance ensures responsible AI use."]
response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)

# Pass the precomputed vectors so ChromaDB does not embed the documents itself.
collection.add(
    documents=texts,
    embeddings=[item.embedding for item in response.data],
    ids=["doc-1"],
)
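
When embedding many documents this way, one option is to split the texts into batches so each gateway request stays small. The following is a minimal sketch: batched and embed_in_batches are hypothetical helpers, and the default batch size of 100 is an arbitrary assumption, not a documented gateway or provider limit.

```python
def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(embed, texts, size=100):
    """Call `embed` once per batch and return the vectors in input order."""
    vectors = []
    for batch in batched(texts, size):
        vectors.extend(embed(batch))
    return vectors
```

With the client above, embed could be a function that calls openai_client.embeddings.create(model="text-embedding-3-small", input=batch) and returns [item.embedding for item in response.data].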

Setup steps

  1. Start the Keeptrusts gateway:
       export OPENAI_API_KEY="sk-your-openai-key"
       kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
  2. Configure the ChromaDB embedding function with api_base="http://localhost:41002/v1".
  3. Add documents to a collection to trigger an embedding call.
  4. Verify the request appears in the Keeptrusts events dashboard.

Verification

Test that embedding calls flow through the gateway:

curl http://localhost:41002/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Test ChromaDB embedding through Keeptrusts gateway."
  }'
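
The same check can be scripted with the Python standard library. This is a sketch under assumptions: build_embedding_request is a hypothetical helper, and the optional bearer header assumes the gateway accepts an OpenAI-style Authorization header, which this page does not confirm. The request is built but not sent, so the example runs without a gateway.

```python
import json
import urllib.request


def build_embedding_request(base_url, text, model="text-embedding-3-small", access_key=None):
    """Build (but do not send) a POST request for the gateway's embeddings route."""
    headers = {"Content-Type": "application/json"}
    if access_key:
        # Assumption: the gateway accepts an OpenAI-style bearer token.
        headers["Authorization"] = f"Bearer {access_key}"
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    url = f"{base_url.rstrip('/')}/embeddings"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_embedding_request(
    "http://localhost:41002/v1",
    "Test ChromaDB embedding through Keeptrusts gateway.",
)

# To send it against a running gateway:
# with urllib.request.urlopen(request, timeout=10) as response:
#     payload = json.load(response)
#     print(len(payload["data"][0]["embedding"]))
```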

In Python, add a document and confirm the Keeptrusts events dashboard shows the embedding request:

collection.add(
    documents=["Gateway verification test."],
    ids=["verify-1"],
)

Recommended policies

| Policy | Purpose | Recommended setting |
| --- | --- | --- |
| pii-detector | Redact personal data from documents before embedding | action: redact |
| audit-logger | Log all embedding calls for compliance | retention_days: 90 |
| token-limiter | Cap token usage for bulk embedding operations | max_tokens: 8192 |
| prompt-injection | Block injection in query texts (RAG pipelines) | threshold: 0.8, action: block |
| safety-filter | Block harmful content in RAG generation responses | mode: standard, action: block |

Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| OpenAIEmbeddingFunction raises connection error | Gateway not running | Start kt gateway run on port 41002 |
| collection.add() fails silently | Embedding function returns empty | Check gateway logs for blocked requests or provider errors |
| Dimension mismatch error | Mixing embedding models | Use the same model for all documents in a collection |
| Slow bulk inserts | No batching | ChromaDB batches by default; check api_base latency |
| api_base parameter ignored | Older chromadb version | Upgrade to chromadb 0.5+ which supports api_base in embedding functions |
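
For the empty-result and dimension-mismatch rows, a small guard can run on the vectors before they are passed to collection.add(). This is a hypothetical helper, not a ChromaDB API, and it only inspects vector lengths.

```python
def check_dimensions(embeddings, expected=None):
    """Return the shared vector length, or raise ValueError on a problem."""
    if not embeddings:
        raise ValueError("no embeddings returned; check gateway logs")
    dims = {len(vector) for vector in embeddings}
    if len(dims) != 1:
        raise ValueError(f"mixed embedding dimensions: {sorted(dims)}")
    dim = dims.pop()
    if expected is not None and dim != expected:
        raise ValueError(f"expected dimension {expected}, got {dim}")
    return dim
```

For example, text-embedding-3-small returns 1536-dimensional vectors by default, so check_dimensions(vectors, expected=1536) would flag vectors produced by a different model.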

For AI systems

  • Canonical terms: Keeptrusts gateway, ChromaDB, Chroma, embedding functions, OpenAIEmbeddingFunction, vector database, policy-config.yaml.
  • Config field names: api_base, api_key, model_name, provider, secret_key_ref, pii-detector.
  • Key behavior: ChromaDB's embedding functions make outbound calls to LLM providers for vectorization; Keeptrusts intercepts these calls, applies policies, and forwards compliant requests.
  • Constraint: Only embedding function calls route through the gateway — ChromaDB's local vector storage operations are unaffected.
  • Best next pages: Weaviate integration, Pinecone integration, Qdrant integration.

For engineers

  • Use OpenAIEmbeddingFunction with api_base for the simplest setup; use the OpenAI SDK directly for full control over embedding parameters.
  • ChromaDB processes add() and query() embedding calls synchronously — monitor gateway latency for interactive applications.
  • For persistent ChromaDB deployments (chromadb.PersistentClient), the embedding function configuration applies at collection creation.
  • Validate: add a document and confirm the embedding event appears in the Keeptrusts console.
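
One way to monitor latency, as suggested above, is to time each embedding call in the application. The sketch below is hypothetical: TimedEmbedder is not part of ChromaDB or Keeptrusts, and it measures the full round trip seen by the caller, which includes the gateway, the upstream provider, and network time together.

```python
import time


class TimedEmbedder:
    """Wrap a callable embedding function and record per-call duration in seconds."""

    def __init__(self, inner, clock=time.perf_counter):
        self.inner = inner
        self.clock = clock      # injectable so the wrapper can be tested
        self.durations = []

    def __call__(self, input):
        start = self.clock()
        try:
            return self.inner(input)
        finally:
            # Record the duration even when the call raises.
            self.durations.append(self.clock() - start)

    def slowest(self):
        """Return the longest recorded call, or 0.0 if none were made."""
        return max(self.durations, default=0.0)
```

Wrapping the embedding callable used in the direct OpenAI SDK path gives a per-request timing list that can be logged or compared against an interactive latency budget.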

For leaders

  • ChromaDB embedding functions send your application's document text to external AI providers for vectorization. Routing through Keeptrusts ensures PII is redacted and every call is logged.
  • Audit trails cover both document ingestion and query-time operations, providing complete visibility.
  • Centralized policy enforcement applies across all ChromaDB instances and collections.

Next steps