
Cloudera AI

Cloudera AI Inference provides enterprise LLM hosting on Cloudera Data Platform (CDP) with data residency guarantees, private deployment, and integration with existing Cloudera governance and security controls. Organizations running sensitive workloads that cannot go to public cloud APIs can host models such as Llama 3.3 70B and Mistral within their own CDP environment.

Use this page when

  • You need the exact command, config, API, or integration details for Cloudera AI.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • You want a precise reference page; for a guided rollout instead, use the linked workflow pages in Next steps.

Keeptrusts enforces governance policies on Cloudera-hosted models — prompt-injection detection, PII redaction, content safety filters, and audit logging — without requiring any changes to the Cloudera deployment itself. The gateway sits in front of the Cloudera AI Inference endpoint and applies your policy chain on every request and response.
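
Conceptually, the policy chain behaves like ordered middleware: each policy either transforms the request/response text or blocks it. The sketch below illustrates that flow with hypothetical function names and toy heuristics; it is not the actual Keeptrusts implementation.

```python
import re

# Toy policies, illustrative only. Real detection is far more sophisticated.
def prompt_injection(text):
    # Block when a simple injection heuristic fires.
    if "ignore previous instructions" in text.lower():
        raise ValueError("blocked: prompt injection")
    return text

def pii_redactor(text):
    # Redact a toy PII pattern (e-mail addresses).
    return re.sub(r"\b[\w.]+@[\w.]+\b", "[REDACTED]", text)

def apply_chain(text, chain):
    # Each policy either transforms the text or raises to block the request.
    for policy in chain:
        text = policy(text)
    return text

safe = apply_chain("Contact alice@example.com", [prompt_injection, pii_redactor])
print(safe)  # Contact [REDACTED]
```

The same chain runs on responses as well, which is how redaction and audit logging apply in both directions.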

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Prerequisites

  1. Cloudera AI Inference endpoint — deploy one or more model endpoints in your CDP environment. Your administrator will provide the full endpoint URL and a suitable access token.
  2. Keeptrusts CLI — install kt (quickstart guide).
  3. Export your Cloudera access token so the gateway can read it at startup:
export CLOUDERA_API_KEY="your-cloudera-access-token"

Unlike cloud-hosted providers, Cloudera does not have a fixed public base URL — every deployment has its own endpoint. The base_url field is required and must point at the root of your Cloudera AI Inference endpoint (e.g. https://your-domain/namespaces/serving-default/endpoints/llama-3-3-70b/v1).
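
A quick shape check can catch a misconfigured base_url before startup. This is an assumption-level sketch (the function name is made up, not part of the kt CLI): it only verifies that the URL is HTTPS, has a host, and ends in /v1.

```python
from urllib.parse import urlparse

def looks_like_cloudera_base_url(url: str) -> bool:
    # Hypothetical helper: checks scheme, host, and the trailing /v1 segment.
    parts = urlparse(url)
    return (
        parts.scheme == "https"
        and bool(parts.netloc)
        and parts.path.rstrip("/").endswith("/v1")
    )

print(looks_like_cloudera_base_url(
    "https://your-domain/namespaces/serving-default/endpoints/llama-3-3-70b/v1"
))  # True
```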

Configuration

A complete policy-config.yaml that routes traffic through a Cloudera AI Inference endpoint with prompt-injection, PII, and safety policies:

pack:
  name: cloudera-gateway
  version: 1.0.0
  enabled: true
  policies:
    chain:
      - prompt-injection
      - pii-detector
      - safety-filter
      - audit-logger
    policy:
      prompt-injection:
        threshold: 0.8
        action: block
      pii-detector:
        action: redact
      safety-filter:
        mode: strict
        action: block
      audit-logger:
        retention_days: 365
  providers:
    strategy: single
    targets:
      - id: cloudera-llama
        provider: cloudera
        model: meta/llama-3.3-70b-instruct
        base_url: https://your-domain/namespaces/serving-default/endpoints/llama-3-3-70b/v1
        secret_key_ref:
          env: CLOUDERA_API_KEY

Start the gateway:

kt gateway run \
  --listen 0.0.0.0:41002 \
  --policy-config policy-config.yaml

Compact Provider Shorthand

You can encode the model directly in the provider field. The two forms below are equivalent:

# Shorthand — model embedded in the provider string
- id: "cloudera-llama"
  provider: "cloudera:chat:meta/llama-3.3-70b-instruct"
  base_url: "https://your-domain/namespaces/serving-default/endpoints/llama-3-3-70b/v1"

# Explicit — separate provider and model fields
- id: "cloudera-llama"
  provider: "cloudera"
  model: "meta/llama-3.3-70b-instruct"
  base_url: "https://your-domain/namespaces/serving-default/endpoints/llama-3-3-70b/v1"
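
The shorthand can be thought of as splitting on colons into provider, mode, and model. The sketch below shows one plausible normalization; it is illustrative, not Keeptrusts source code, though the field names mirror the config above.

```python
def normalize_target(target: dict) -> dict:
    # If the provider field carries "provider:mode:model" shorthand and no
    # explicit model is set, expand it into separate provider/model fields.
    provider = target["provider"]
    if ":" in provider and "model" not in target:
        provider_id, _mode, model = provider.split(":", 2)
        target = {**target, "provider": provider_id, "model": model}
    return target

t = normalize_target({
    "id": "cloudera-llama",
    "provider": "cloudera:chat:meta/llama-3.3-70b-instruct",
})
print(t["provider"], t["model"])  # cloudera meta/llama-3.3-70b-instruct
```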

Provider Fields

All fields available on a providers.targets[] entry for Cloudera AI:

| Field | Type | Default | Description |
|---|---|---|---|
| id | string | required | Unique identifier for this target. Used in logs, the console dashboard, and routing decisions. |
| provider | string | required | Provider ID. Use "cloudera" or the shorthand "cloudera:chat:<model>". |
| model | string | required | Model name as registered in your Cloudera AI deployment, e.g. "meta/llama-3.3-70b-instruct". |
| base_url | string | required | Full URL to your Cloudera AI Inference endpoint root. Must be your organization-specific endpoint. |
| secret_key_ref | object | CLOUDERA_API_KEY | Object reference to the environment variable holding the CDP access token. |
| format | string | "openai" | Wire format. Cloudera AI Inference exposes an OpenAI-compatible API. |
| timeout_seconds | integer | 60 | Maximum wall-clock time for non-streaming requests. Larger enterprise models may need higher values. |
| stream_timeout_seconds | integer | inherits timeout_seconds | Maximum wall-clock time for streaming requests. Set higher for long completions. |
| max_context_tokens | integer | none | Maximum token budget for the request. When set, the gateway rejects requests that exceed this limit before forwarding upstream. |
| description | string | none | Human-readable label shown in the console dashboard and health-check output. |
| weight | float | 1.0 | Routing weight used by the weighted_round_robin strategy. |
| health_probe | object | none | Active health probe configuration. Sub-fields: enabled (bool), interval_seconds (int), timeout_seconds (int). |
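
The defaults in the table compose in a simple way: unset fields take their documented default, and stream_timeout_seconds inherits timeout_seconds unless set explicitly. A sketch of that merge (hypothetical helper, values taken from the table above):

```python
# Documented defaults from the field table; illustrative only.
DEFAULTS = {
    "format": "openai",
    "timeout_seconds": 60,
    "weight": 1.0,
}

def with_defaults(target: dict) -> dict:
    # Explicit target fields win over defaults.
    filled = {**DEFAULTS, **target}
    # stream_timeout_seconds inherits timeout_seconds unless set explicitly.
    filled.setdefault("stream_timeout_seconds", filled["timeout_seconds"])
    return filled

t = with_defaults({"id": "cloudera-llama", "provider": "cloudera",
                   "model": "meta/llama-3.3-70b-instruct",
                   "timeout_seconds": 120})
print(t["stream_timeout_seconds"])  # 120
```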

Supported Models

The models available depend entirely on what your organization has deployed in Cloudera AI. Common deployments include:

| Model | Notes |
|---|---|
| meta/llama-3.3-70b-instruct | Meta Llama 3.3 70B — strong general-purpose instruction-following |
| meta/llama-3.1-8b-instruct | Llama 3.1 8B — fast and efficient for lower-latency workloads |
| mistral/mistral-7b-instruct | Mistral 7B — compact multilingual model |
| ibm/granite-13b-instruct-v2 | IBM Granite — enterprise-focused instruction model |

Contact your Cloudera administrator for the exact model IDs and endpoint URLs available in your CDP deployment.

Keeptrusts passes the model field through to the upstream endpoint as-is. Use the exact model identifier string that your Cloudera AI Inference deployment expects.

Client Examples

Once the gateway is running, point your client SDK to http://localhost:41002 (the address from the kt gateway run command above) instead of your Cloudera endpoint URL. The standard OpenAI SDK works directly — no Cloudera-specific SDK is needed.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="unused",  # auth is handled by Keeptrusts via CLOUDERA_API_KEY
)

response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful enterprise assistant."},
        {"role": "user", "content": "Summarize the key compliance requirements for HIPAA data handling."},
    ],
    temperature=0.3,
    max_tokens=512,
)

print(response.choices[0].message.content)

Streaming

Keeptrusts fully supports streaming for Cloudera AI Inference. Set stream: true in your request — the gateway applies policies to each chunk in real time. Enterprise-hosted models may have higher first-token latency, so configure stream_timeout_seconds generously:

pack:
  name: cloudera-providers-3
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: cloudera-streaming
        provider: cloudera
        model: meta/llama-3.3-70b-instruct
        base_url: https://your-domain/namespaces/serving-default/endpoints/llama-3-3-70b/v1
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true

Client code:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:41002/v1", api_key="unused")

stream = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Draft a data processing agreement summary."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Advanced Configuration

Multi-Endpoint Failover

Route to a backup Cloudera AI endpoint if the primary is unavailable:

pack:
  name: cloudera-providers-4
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: cloudera-primary
        provider: cloudera:chat:meta/llama-3.3-70b-instruct
        base_url: https://your-domain/namespaces/serving-default/endpoints/llama-3-3-70b-primary/v1
        secret_key_ref:
          env: CLOUDERA_API_KEY
      - id: cloudera-backup
        provider: cloudera:chat:meta/llama-3.3-70b-instruct
        base_url: https://your-domain/namespaces/serving-default/endpoints/llama-3-3-70b-backup/v1
        secret_key_ref:
          env: CLOUDERA_API_KEY
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true
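
The failover behavior amounts to trying targets in order until one responds. A rough sketch of that loop (hypothetical call_target function; the gateway's actual retry and error-classification semantics may differ):

```python
def route_with_failover(targets, call_target):
    # Try each target in declaration order; first success wins.
    last_error = None
    for target in targets:
        try:
            return call_target(target)
        except ConnectionError as exc:
            last_error = exc  # target unavailable, fall through to the next
    raise RuntimeError("all targets unavailable") from last_error

def fake_call(target):
    # Simulate the primary endpoint being down.
    if target["id"] == "cloudera-primary":
        raise ConnectionError("endpoint unreachable")
    return f"served by {target['id']}"

targets = [{"id": "cloudera-primary"}, {"id": "cloudera-backup"}]
print(route_with_failover(targets, fake_call))  # served by cloudera-backup
```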

Multi-Model Routing

Route different workloads to different Cloudera-hosted models:

pack:
  name: cloudera-providers-5
  version: 1.0.0
  enabled: true
  providers:
    targets:
      - id: cloudera-llama-70b
        provider: cloudera:chat:meta/llama-3.3-70b-instruct
        base_url: https://your-domain/namespaces/serving-default/endpoints/llama70b/v1
        secret_key_ref:
          env: CLOUDERA_API_KEY
      - id: cloudera-llama-8b
        provider: cloudera:chat:meta/llama-3.1-8b-instruct
        base_url: https://your-domain/namespaces/serving-default/endpoints/llama8b/v1
        secret_key_ref:
          env: CLOUDERA_API_KEY
  policies:
    chain:
      - audit-logger
    policy:
      audit-logger:
        immutable: true
        retention_days: 365
        log_all_access: true
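
With this config, a request's model field selects the matching target. A minimal sketch of that lookup (illustrative; the gateway's actual matching rules may differ):

```python
def pick_target(targets, requested_model):
    # Return the first target whose model matches the request.
    for target in targets:
        if target["model"] == requested_model:
            return target
    raise LookupError(f"no target serves {requested_model}")

targets = [
    {"id": "cloudera-llama-70b", "model": "meta/llama-3.3-70b-instruct"},
    {"id": "cloudera-llama-8b", "model": "meta/llama-3.1-8b-instruct"},
]
print(pick_target(targets, "meta/llama-3.1-8b-instruct")["id"])  # cloudera-llama-8b
```

Clients keep using the standard OpenAI model parameter; only the gateway config changes.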

Best Practices

  • Specify the full base_url — Cloudera AI Inference endpoints are organization-specific. Always provide the complete endpoint URL including namespace, serving environment, and endpoint name. Do not rely on Keeptrusts to synthesize paths.
  • Use environment variables for credentials — set secret_key_ref to a variable name and keep CDP access tokens out of policy-config.yaml. Rotate tokens via environment updates without redeploying the gateway.
  • Set higher timeouts for large models — enterprise-hosted 70B models may have 10–30 second first-token latencies under load. Set timeout_seconds: 120 and stream_timeout_seconds: 300 to avoid premature client disconnects.
  • Enable health_probe — on-premise infrastructure can have scheduled maintenance windows; the health probe ensures Keeptrusts detects unavailability promptly and activates fallback routing.
  • Combine with PII redaction — a primary use case for Cloudera is data residency. Pair the pii-detector policy with Cloudera to ensure sensitive fields are redacted before they leave your application layer, providing defense in depth even within your private network.
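
The health-probe recommendation boils down to a periodic check that marks targets healthy or unhealthy so routing can skip them. A minimal sketch, with a fake probe standing in for the real HTTP check (function names are hypothetical, not kt internals):

```python
def run_probes(targets, probe):
    # Run one probe pass; probe(target) returns True when the target responds.
    for target in targets:
        target["healthy"] = probe(target)

def healthy_targets(targets):
    # Routing only considers targets not marked unhealthy.
    return [t for t in targets if t.get("healthy", True)]

targets = [{"id": "cloudera-primary"}, {"id": "cloudera-backup"}]
# Pretend the primary is in a maintenance window.
run_probes(targets, probe=lambda t: t["id"] != "cloudera-primary")
print([t["id"] for t in healthy_targets(targets)])  # ['cloudera-backup']
```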

For AI systems

  • Canonical terms: Keeptrusts gateway, Cloudera AI, Cloudera Machine Learning (CML), on-premise, private cloud, provider target, policy-config.yaml.
  • Config field names: provider, model, base_url, secret_key_ref.env, format: "openai", health_probe, timeout_seconds.
  • Key behavior: Keeptrusts routes to Cloudera's OpenAI-compatible inference endpoints within private infrastructure.
  • Best next pages: vLLM integration (alternative self-hosted), Ollama integration, Policy configuration.

For engineers

  • Prerequisites: Cloudera ML workspace with deployed model endpoint, API credentials, kt CLI installed, network access to the CML endpoint.
  • Start command: kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml.
  • Validate: curl http://localhost:41002/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"your-cml-model","messages":[{"role":"user","content":"hello"}]}'.
  • Enable health_probe to detect scheduled maintenance windows and trigger fallback routing automatically.
  • Combine with pii-detector policy for defense-in-depth — redact sensitive data before it reaches the model even within your private network.

For leaders

  • Cloudera deployments keep all data within your private network — no prompts or completions leave the organization's infrastructure.
  • On-premise maintenance windows require health-probe-driven fallback routing to maintain availability SLAs.
  • Keeptrusts audit logging provides compliance evidence even for fully on-premise inference that has no vendor-side audit trail.
  • Pair PII redaction with Cloudera for layered data protection that satisfies data residency regulations.

Next steps