Databricks
Keeptrusts integrates with the Foundation Model APIs in Databricks Model Serving, giving you a policy enforcement layer over Llama, DBRX, and Mixtral models running inside your Databricks workspace. Because Databricks Foundation Models store no customer data by default, this integration is well suited to regulated workloads that require full audit trails without sacrificing zero-retention guarantees.
Use this page when
- You need the exact command, config, API, or integration details for Databricks.
- You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
- If you want a guided rollout instead of a reference page, use the linked workflow pages in Next steps.
Primary audience
- Primary: AI Agents, Technical Engineers
- Secondary: Technical Leaders
Prerequisites
- A Databricks workspace (AWS, Azure, or GCP) with Model Serving enabled
- A Databricks personal access token (PAT) with CAN_USE permission on the served model endpoints
- kt CLI installed and authenticated (kt auth login)
Set your token before starting the gateway:
export DATABRICKS_TOKEN="dapi..."
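Optionally, confirm the token works before wiring it into the gateway. A minimal sketch using the Databricks REST API's serving-endpoints listing (replace {workspace} with your workspace hostname before running):
import json
import os
import urllib.request

# List the Model Serving endpoints visible to this PAT
# (GET /api/2.0/serving-endpoints).
url = "https://{workspace}.azuredatabricks.net/api/2.0/serving-endpoints"
req = urllib.request.Request(
    url, headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
)
with urllib.request.urlopen(req) as resp:
    for endpoint in json.load(resp).get("endpoints", []):
        print(endpoint["name"])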
Configuration
Minimal — single Foundation Model endpoint
pack:
  name: databricks-providers-1
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: databricks-llama
      provider: databricks:chat:databricks-meta-llama-3-3-70b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
Full governance config
pack:
  name: databricks-enterprise
  version: 1.0.0
  enabled: true
policies:
  chain:
    - prompt-injection
    - pii-detector
    - dlp-filter
    - rbac
    - audit-logger
policy:
  rbac:
    roles:
      data-engineer:
        allowed_models:
          - databricks-meta-llama-3-3-70b-instruct
          - databricks-dbrx-instruct
        max_tokens_per_request: 4096
      data-scientist:
        allowed_models:
          - databricks-meta-llama-3-3-70b-instruct
          - databricks-meta-llama-3-1-405b-instruct
          - databricks-dbrx-instruct
          - databricks-mixtral-8x7b-instruct
        max_tokens_per_request: 8192
      analyst:
        allowed_models:
          - databricks-meta-llama-3-3-70b-instruct
        max_tokens_per_request: 2048
  dlp-filter:
    patterns:
      - name: databricks-pat
        regex: dapi[a-f0-9]{32}
        action: block
      - name: jdbc-connection-string
        regex: jdbc:databricks://[^\s]+
        action: redact
      - name: unity-catalog-path
        regex: '[a-z_]+\.[a-z_]+\.[a-z_]+'  # three-part catalog.schema.table path; adjust to your naming
        action: redact
  pii-detector:
    action: redact
    entities:
      - PERSON
      - EMAIL_ADDRESS
      - PHONE_NUMBER
providers:
  targets:
    - id: databricks-llama-70b
      provider: databricks:chat:databricks-meta-llama-3-3-70b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
    - id: databricks-llama-405b
      provider: databricks:chat:databricks-meta-llama-3-1-405b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
    - id: databricks-dbrx
      provider: databricks:chat:databricks-dbrx-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
    - id: databricks-embeddings
      provider: databricks
      model: databricks-bge-large-en
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
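The dlp-filter entries above are ordinary regular expressions, so you can check locally what each one catches before deploying the pack. A quick sketch (the token below is a fabricated placeholder, not a real credential):
import re

# Verify the DLP patterns against illustrative sample strings.
patterns = {
    "databricks-pat": r"dapi[a-f0-9]{32}",
    "jdbc-connection-string": r"jdbc:databricks://[^\s]+",
}
samples = [
    "token: dapi" + "a" * 32,  # fabricated placeholder PAT
    "url: jdbc:databricks://adb-1234567890123456.7.azuredatabricks.net:443/default",
]
for name, pattern in patterns.items():
    for sample in samples:
        if re.search(pattern, sample):
            print(f"{name} matches: {sample}")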
Provider Fields
| Field | Required | Description |
|---|---|---|
| provider | Yes | "databricks" or "databricks:chat:{model-endpoint-name}" |
| base_url | Yes | Your workspace serving endpoint URL: https://{workspace}.azuredatabricks.net/serving-endpoints (Azure shown; the hostname varies by cloud) |
| secret_key_ref | Yes | Environment variable holding the Databricks PAT (e.g. DATABRICKS_TOKEN) |
| model | No | Endpoint name when using the bare "databricks" provider ID |
| format | No | "openai" (default for Foundation Model APIs) |
| data_policy.zero_data_retention | No | true — Databricks Foundation Models do not store request/response data |
Supported Models
Models are served through Databricks Model Serving and billed per-token via Databricks Foundation Model APIs.
| Model Endpoint | Context Window | Input (per 1M) | Output (per 1M) | Notes |
|---|---|---|---|---|
| databricks-meta-llama-3-3-70b-instruct | 128k | $0.54 | $0.54 | Best price/performance; recommended default |
| databricks-meta-llama-3-1-405b-instruct | 128k | $5.00 | $15.00 | Highest-capability open-weight model |
| databricks-dbrx-instruct | 32k | $0.75 | $2.25 | Databricks flagship MoE |
| databricks-mixtral-8x7b-instruct | 32k | $0.50 | $1.00 | Fast MoE; cost-efficient for high volume |
| databricks-bge-large-en | 512 tokens | $0.10 | — | Embeddings only; 1024-dim vectors |
Pricing reflects Databricks' published rates. Actual charges depend on your workspace agreement and DBU pricing tier.
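For budgeting, a request's cost is just token counts multiplied by the per-1M rates in the table above. For example, a 2,000-token prompt with a 1,000-token completion on the 70B endpoint costs about $0.0016:
# Back-of-envelope cost from the per-1M-token rates in the table above.
def request_cost(input_tokens: int, output_tokens: int, in_rate: float, out_rate: float) -> float:
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# 2,000 input + 1,000 output tokens on databricks-meta-llama-3-3-70b-instruct:
print(f"${request_cost(2000, 1000, 0.54, 0.54):.4f}")  # $0.0016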
Client Examples
Start the gateway:
export DATABRICKS_TOKEN="dapi..."
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
Python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="unused",  # auth handled by Keeptrusts
)

# Chat completion
response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a data engineering expert."},
        {"role": "user", "content": "Write a PySpark query to compute 7-day rolling average sales by region."},
    ],
    max_tokens=1024,
    temperature=0.2,
)
print(response.choices[0].message.content)

# Embeddings
embedding = client.embeddings.create(
    model="databricks-bge-large-en",
    input="quarterly revenue by product line",
)
print(f"Vector dimensions: {len(embedding.data[0].embedding)}")
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:41002/v1",
  apiKey: "unused",
});

// Chat completion
const response = await client.chat.completions.create({
  model: "databricks-meta-llama-3-3-70b-instruct",
  messages: [
    { role: "system", content: "You are a data engineering expert." },
    {
      role: "user",
      content: "Write a PySpark query to compute 7-day rolling average sales by region.",
    },
  ],
  max_tokens: 1024,
  temperature: 0.2,
});
console.log(response.choices[0].message.content);

// Embeddings
const embedding = await client.embeddings.create({
  model: "databricks-bge-large-en",
  input: "quarterly revenue by product line",
});
console.log("Vector dimensions:", embedding.data[0].embedding.length);
cURL
# Chat completion
curl -s http://localhost:41002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "databricks-meta-llama-3-3-70b-instruct",
    "messages": [
      {"role": "system", "content": "You are a data engineering expert."},
      {"role": "user", "content": "Write a PySpark query for 7-day rolling average sales."}
    ],
    "max_tokens": 1024,
    "temperature": 0.2
  }' | jq '.choices[0].message.content'

# Embeddings
curl -s http://localhost:41002/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "databricks-bge-large-en",
    "input": "quarterly revenue by product line"
  }' | jq '.data[0].embedding[:5]'
Streaming
Databricks Foundation Model APIs support server-sent event (SSE) streaming. Keeptrusts passes streams through transparently after policy checks on the initial request.
from openai import OpenAI
client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused")
stream = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[{"role": "user", "content": "Explain Delta Lake ACID transactions step by step."}],
    max_tokens=2048,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
In your policy config, set a reasonable stream timeout on the provider target (timeout_seconds below is an illustrative value):
pack:
  name: databricks-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: databricks-llama-70b
      provider: databricks:chat:databricks-meta-llama-3-3-70b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
      timeout_seconds: 120
policies:
  chain:
    - audit-logger
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
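On the client side, the OpenAI SDK accepts its own timeout, which should be at least as long as the gateway's; 120 seconds here is an illustrative value:
from openai import OpenAI

# Per-client request timeout in seconds; pair it with the gateway-side
# timeout_seconds so long streams are not cut off by the client first.
client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused", timeout=120)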
Advanced Configuration
Unity Catalog integration and DLP
When model serving runs inside a Unity Catalog-governed workspace, sensitive table names, column names, and schema paths can leak into prompts. Add DLP patterns that match your catalog structure:
policy:
  dlp-filter:
    detect_patterns:
      - '[a-z_]+\.[a-z_]+\.[a-z_]+'
      - dbutils\.secrets\.get\([^)]+\)
      - 0[0-9]{3}-[0-9]{6}-[a-z0-9]{8}
    action: block
pack:
  name: databricks-example-4
  version: 1.0.0
  enabled: true
policies:
  chain:
    - dlp-filter
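The three-part pattern is deliberately broad: it matches any lowercase catalog.schema.table path, but it can also hit unrelated dotted identifiers, so test it against representative prompts before setting action: block. A quick check (table names here are made up):
import re

# The first two are UC-style paths the pattern should catch; the last is a
# false positive (a dotted Python module name), which is why testing matters.
pattern = re.compile(r"[a-z_]+\.[a-z_]+\.[a-z_]+")
for text in ["main.sales.orders", "prod_finance.gl.journal_entries", "pyspark.sql.functions"]:
    print(text, "->", bool(pattern.search(text)))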
RBAC for multi-team workspaces
Large Databricks deployments typically serve multiple teams with different model budgets. Map workspace groups to Keeptrusts roles and limit each role to cost-appropriate endpoints:
policy:
  rbac:
    roles:
      ai-platform:
        allowed_models:
          - databricks-meta-llama-3-1-405b-instruct
          - databricks-meta-llama-3-3-70b-instruct
          - databricks-dbrx-instruct
          - databricks-mixtral-8x7b-instruct
        max_tokens_per_request: 16384
      application-team:
        allowed_models:
          - databricks-meta-llama-3-3-70b-instruct
        max_tokens_per_request: 4096
      read-only:
        allowed_models: []
    action: block
pack:
  name: databricks-example-5
  version: 1.0.0
  enabled: true
policies:
  chain:
    - rbac
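From the client, a denied model shows up as an HTTP error. A hedged sketch, assuming the gateway rejects disallowed models with a 4xx status (the exact code and payload are Keeptrusts-defined):
from openai import OpenAI, APIStatusError

client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused")

# As an application-team caller, the 405B endpoint is outside allowed_models,
# so the rbac policy should reject this request before it reaches Databricks.
try:
    client.chat.completions.create(
        model="databricks-meta-llama-3-1-405b-instruct",
        messages=[{"role": "user", "content": "hello"}],
    )
except APIStatusError as err:
    print("blocked by policy:", err.status_code)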
Zero-data-retention audit trail
Databricks Foundation Models store no customer data server-side. The Keeptrusts audit logger captures a local event record for every request, so you maintain a compliance trail without relying on provider storage:
policy:
  audit-logger:
    immutable: true
    retention_days: 365
    log_all_access: true
providers:
  targets:
    - id: databricks-llama-70b
      provider: databricks:chat:databricks-meta-llama-3-3-70b-instruct
      base_url: https://{workspace}.azuredatabricks.net/serving-endpoints
      secret_key_ref:
        env: DATABRICKS_TOKEN
pack:
  name: databricks-example-6
  version: 1.0.0
  enabled: true
policies:
  chain:
    - audit-logger
Best Practices
- Pin to the workspace region closest to your data — Databricks serving endpoints are regional. Use a base_url that matches your primary data region to minimise latency and avoid cross-region data movement.
- Rotate PATs on a schedule — Databricks PATs do not expire by default. Set a 90-day rotation policy, keep DATABRICKS_TOKEN in a secrets manager (Vault, AWS Secrets Manager), and inject it at runtime rather than baking it into config files.
- Use databricks-meta-llama-3-3-70b-instruct as the default tier — It offers the best cost/quality ratio for most enterprise tasks. Reserve the 405B endpoint for tasks that demonstrably require it and protect it with a role that requires explicit elevation.
- Block raw Databricks credentials in prompts — Add DLP patterns for PAT prefixes (dapi), JDBC connection strings, and secret scope references. A leaked token in a prompt can expose your entire workspace.
- Enable zero-data-retention flags in both Keeptrusts and Databricks — Set data_policy.zero_data_retention: true in Keeptrusts and confirm your Databricks workspace has Foundation Model API data retention disabled. Document this in your compliance evidence package.
- Test fallback with databricks-mixtral-8x7b-instruct — Use the Keeptrusts routing policy to fall back to Mixtral when the 70B endpoint times out under load. Mixtral is significantly cheaper and can handle most non-critical requests without quality loss; a client-side sketch of the same idea follows this list.
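Gateway-side routing is the right place for fallback, but the idea is easy to prototype from the client while you evaluate it. A minimal client-side sketch, assuming only the OpenAI SDK and the gateway endpoints configured above:
from openai import OpenAI, APIStatusError, APITimeoutError

client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused", timeout=30)

def chat_with_fallback(messages):
    # Prefer the 70B endpoint; fall back to Mixtral on timeout or server error.
    for model in (
        "databricks-meta-llama-3-3-70b-instruct",
        "databricks-mixtral-8x7b-instruct",
    ):
        try:
            return client.chat.completions.create(
                model=model, messages=messages, max_tokens=1024
            )
        except (APITimeoutError, APIStatusError):
            continue
    raise RuntimeError("all Databricks endpoints failed")

print(chat_with_fallback([{"role": "user", "content": "hello"}]).choices[0].message.content)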
For AI systems
- Canonical terms: Keeptrusts gateway, Databricks, Databricks Model Serving, Mosaic ML, Unity Catalog, provider target, policy-config.yaml, provider: "databricks".
- Config field names: provider, model, base_url, secret_key_ref.env: "DATABRICKS_TOKEN", format: "openai", timeout_seconds, health_probe.
- Key behavior: Keeptrusts routes to Databricks Model Serving endpoints using PAT or OAuth token auth with the OpenAI-compatible format.
- Best next pages: AWS Bedrock integration, Together AI integration, Provider routing.
For engineers
- Prerequisites: Databricks workspace with a Model Serving endpoint deployed, personal access token (DATABRICKS_TOKEN), kt CLI installed.
- Start command: kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml.
- Validate: curl http://localhost:41002/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"databricks-meta-llama-3-3-70b-instruct","messages":[{"role":"user","content":"hello"}]}'.
- Base URL follows the Databricks workspace pattern: https://{workspace}.azuredatabricks.net/serving-endpoints (Azure shown; AWS and GCP workspaces use different hostnames).
- Use the fallback strategy with Mixtral 8x7B as a cost-effective fallback when 70B endpoints time out under load.
For leaders
- Databricks Model Serving integrates with Unity Catalog governance — Keeptrusts adds policy enforcement on the request path that Unity Catalog does not cover.
- Data stays within your Databricks workspace — no prompts leave your cloud account, satisfying data residency requirements.
- Fallback from expensive 70B models to Mixtral 8x7B provides cost control without complete service degradation.
- Keeptrusts audit logging complements Databricks system tables for end-to-end observability of AI workloads.
Next steps
- AWS Bedrock integration — alternative managed model hosting on AWS
- Together AI integration — hosted open models with broader selection
- Provider routing strategies — fallback and load-based routing
- Policy configuration — prompt-injection and audit-logger reference
- Quickstart — install kt and run your first gateway