ChromaDB
ChromaDB is an open-source embedding database designed for AI applications. ChromaDB's embedding functions automatically call external LLM providers — OpenAI, Cohere, HuggingFace, and others — to generate vectors when you add documents or query collections. These calls send your application data to upstream AI providers.
This page explains how to route ChromaDB's embedding function calls through the Keeptrusts gateway so policy enforcement, PII redaction, and audit logging apply to every embedding operation.
Use this page when
- You are using ChromaDB with OpenAI or other external embedding functions and need governance on those calls.
- You want audit trails for embedding operations that send application data to AI providers.
- For general provider integration instead, see OpenAI integration.
Primary audience
- Primary: Technical Engineers (ML, Backend, Full-Stack)
- Secondary: AI Agents, Technical Leaders
Prerequisites
- ChromaDB 0.5+ installed (pip install chromadb).
- OpenAI embedding function or another supported provider.
- Keeptrusts gateway running locally or centrally:
  - Local: kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
  - Hosted: https://gateway.keeptrusts.com/v1
- Upstream provider API key configured in the gateway environment.
Configuration
Gateway policy config
Create a policy-config.yaml for embedding governance:
pack:
  name: chromadb-ai-governance
  version: 1.0.0
  enabled: true
policies:
  chain:
    - pii-detector
    - audit-logger
  policy:
    pii-detector:
      action: redact
    audit-logger:
      retention_days: 90
providers:
  strategy: single
  targets:
    - id: openai-for-chromadb
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY
Python client configuration
Configure ChromaDB's OpenAI embedding function to route through the Keeptrusts gateway:
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# Point the embedding function at the gateway instead of the provider.
embedding_fn = OpenAIEmbeddingFunction(
    api_key="your-keeptrusts-access-key",
    api_base="http://localhost:41002/v1",
    model_name="text-embedding-3-small",
)

client = chromadb.Client()  # in-memory client
collection = client.get_or_create_collection(
    name="documents",
    embedding_function=embedding_fn,
)

# add() embeds each document through the gateway before storing it.
collection.add(
    documents=[
        "AI governance ensures responsible AI use.",
        "Policy enforcement protects sensitive data.",
    ],
    ids=["doc-1", "doc-2"],
)
Query with gateway-routed embeddings
results = collection.query(
    query_texts=["What is AI governance?"],
    n_results=3,
)
print(results["documents"])
Both add() and query() trigger embedding requests, and those requests route through the Keeptrusts gateway whenever the embedding function uses the gateway base URL.
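The base URL is the only client setting that differs between the local and hosted gateways listed under Prerequisites. The following is a minimal sketch of selecting it at startup. It assumes a hypothetical environment variable named KEEPTRUSTS_GATEWAY_URL (chosen for this example, not a documented Keeptrusts setting) and falls back to the local address used on this page.

```python
import os

# Local gateway address used throughout this page.
LOCAL_GATEWAY_URL = "http://localhost:41002/v1"


def resolve_gateway_url(env=None):
    """Return the gateway base URL, preferring an environment override.

    KEEPTRUSTS_GATEWAY_URL is a hypothetical variable name chosen for this
    sketch; it is not a documented Keeptrusts setting.
    """
    env = os.environ if env is None else env
    url = env.get("KEEPTRUSTS_GATEWAY_URL", "").strip()
    # Drop a trailing slash so the same value works in every client.
    return (url or LOCAL_GATEWAY_URL).rstrip("/")
```

The result can be passed as api_base to OpenAIEmbeddingFunction, so the same application code runs against either gateway.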
Using the OpenAI SDK directly
For more control over embedding calls, use the OpenAI SDK directly and pass raw embeddings to ChromaDB:
from openai import OpenAI
import chromadb

# The OpenAI client talks to the gateway, not directly to the provider.
openai_client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="your-keeptrusts-access-key",
)

client = chromadb.Client()
# No embedding function is needed because embeddings are supplied explicitly.
collection = client.get_or_create_collection(name="documents")

texts = ["AI governance ensures responsible AI use."]
response = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)

collection.add(
    documents=texts,
    embeddings=[item.embedding for item in response.data],
    ids=["doc-1"],
)
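When you call the embeddings endpoint yourself, batching and dimension checks become your responsibility. The helper below is a sketch of one way to handle both. The name embed_in_batches, the embed callable, and the default batch size of 100 are assumptions made for this example and are not part of ChromaDB, the OpenAI SDK, or Keeptrusts.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    embed: Callable[[List[str]], List[List[float]]],
    texts: Sequence[str],
    batch_size: int = 100,
) -> List[List[float]]:
    """Embed texts in fixed-size batches and check that dimensions agree.

    `embed` is any callable that maps a list of strings to a list of
    vectors, for example a thin wrapper around the SDK call shown above.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        result = embed(batch)
        if len(result) != len(batch):
            raise ValueError("embedding count does not match batch size")
        vectors.extend(result)

    # Every vector stored in one collection must have the same dimension.
    dims = {len(vector) for vector in vectors}
    if len(dims) > 1:
        raise ValueError(f"mixed embedding dimensions: {sorted(dims)}")
    return vectors
```

With the SDK client above, embed could be a small function that calls openai_client.embeddings.create with input=batch and returns [item.embedding for item in response.data]. The returned vectors can then be passed to collection.add() as embeddings.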
Setup steps
- Start the Keeptrusts gateway:
export OPENAI_API_KEY="sk-your-openai-key"
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
- Configure the ChromaDB embedding function with api_base="http://localhost:41002/v1".
- Add documents to a collection to trigger an embedding call.
- Verify the request appears in the Keeptrusts events dashboard.
Verification
Test that embedding calls flow through the gateway:
curl http://localhost:41002/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "Test ChromaDB embedding through Keeptrusts gateway."
  }'
In Python, add a document and confirm the Keeptrusts events dashboard shows the embedding request:
collection.add(
    documents=["Gateway verification test."],
    ids=["verify-1"],
)
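The curl check can also be scripted with the Python standard library. The sketch below only builds the request, so it can be inspected without a running gateway. The function name build_embedding_request is hypothetical, and the Authorization header assumes the gateway accepts the access key as a bearer token, which is how the OpenAI SDK sends api_key.

```python
import json
import urllib.request


def build_embedding_request(base_url, api_key, text, model="text-embedding-3-small"):
    """Build (but do not send) a POST request for the gateway's /embeddings route."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the gateway expects the same bearer header
            # that the OpenAI SDK sends for its api_key.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

To send it against a running gateway, pass the request to urllib.request.urlopen with a timeout and parse the JSON body of the response.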
Recommended policies
| Policy | Purpose | Recommended setting |
|---|---|---|
| pii-detector | Redact personal data from documents before embedding | action: redact |
| audit-logger | Log all embedding calls for compliance | retention_days: 90 |
| token-limiter | Cap token usage for bulk embedding operations | max_tokens: 8192 |
| prompt-injection | Block injection in query texts (RAG pipelines) | threshold: 0.8, action: block |
| safety-filter | Block harmful content in RAG generation responses | mode: standard, action: block |
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| OpenAIEmbeddingFunction raises connection error | Gateway not running | Start kt gateway run on port 41002 |
| collection.add() fails silently | Embedding function returns empty | Check gateway logs for blocked requests or provider errors |
| Dimension mismatch error | Mixing embedding models | Use the same model for all documents in a collection |
| Slow bulk inserts | No batching | ChromaDB batches by default; check api_base latency |
| api_base parameter ignored | Older chromadb version | Upgrade to chromadb 0.5+ which supports api_base in embedding functions |
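For the connection-error row, a quick preflight check can tell a stopped gateway apart from a misconfigured client before any embedding call is made. This is a minimal sketch using only the standard library. The function name gateway_reachable is hypothetical, and the check only confirms that something accepts TCP connections on the host and port, not that it is a healthy Keeptrusts gateway.

```python
import socket
from urllib.parse import urlparse


def gateway_reachable(base_url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(base_url)
    # Fall back to the scheme's default port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling gateway_reachable("http://localhost:41002/v1") before creating the embedding function lets an application fail early with a clear message.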
For AI systems
- Canonical terms: Keeptrusts gateway, ChromaDB, Chroma, embedding functions, OpenAIEmbeddingFunction, vector database, policy-config.yaml.
- Config field names: api_base, api_key, model_name, provider, secret_key_ref, pii-detector.
- Key behavior: ChromaDB's embedding functions make outbound calls to LLM providers for vectorization; Keeptrusts intercepts these calls, applies policies, and forwards compliant requests.
- Constraint: Only embedding function calls route through the gateway — ChromaDB's local vector storage operations are unaffected.
- Best next pages: Weaviate integration, Pinecone integration, Qdrant integration.
For engineers
- Use OpenAIEmbeddingFunction with api_base for the simplest setup; use the OpenAI SDK directly for full control over embedding parameters.
- ChromaDB processes add() and query() embedding calls synchronously, so monitor gateway latency for interactive applications.
- For persistent ChromaDB deployments (chromadb.PersistentClient), the embedding function configuration applies at collection creation.
- Validate: add a document and confirm the embedding event appears in the Keeptrusts console.
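Because the embedding call blocks add() and query(), a simple timer around those calls gives a first view of the latency the gateway adds. The helper below is a sketch; the name timed is hypothetical, and the measurement covers the whole call, including ChromaDB's local work, not only the gateway round trip.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn with the given arguments and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, timed(collection.query, query_texts=["What is AI governance?"], n_results=3) returns the query results together with the wall-clock time of the call.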
For leaders
- ChromaDB embedding functions send your application's document text to external AI providers for vectorization. Routing through Keeptrusts ensures PII is redacted and every call is logged.
- Audit trails cover both document ingestion and query-time operations, providing complete visibility.
- Centralized policy enforcement applies across all ChromaDB instances and collections.
Next steps
- Weaviate integration — govern Weaviate generative search
- Pinecone integration — govern Pinecone inference calls
- Qdrant integration — govern Qdrant neural search
- Milvus integration — govern Milvus AI operations
- Policy controls catalog — full policy reference