Weights & Biases (W&B)
Keeptrusts integrates with Weights & Biases (W&B) by combining W&B's experiment tracking and model observability with Keeptrusts gateway governance events. When your ML pipeline makes LLM calls through the Keeptrusts gateway, you log both W&B experiment metrics and Keeptrusts policy decisions in a single workflow, giving your team unified visibility into model performance and compliance posture.
Use this page when
- You are combining W&B experiment tracking with Keeptrusts governance events.
- You need the gateway config and Python integration pattern for dual logging.
- You want governance metadata (policy decisions, blocked requests) alongside W&B metrics.
- If you want a general quickstart instead, see Quickstart.
Primary audience
- Primary: Technical Engineers
- Secondary: AI Agents, Technical Leaders
Prerequisites
- A W&B account (free tier or enterprise)
- W&B Python SDK installed (`pip install wandb`)
- Keeptrusts CLI (`kt`) installed and authenticated (`kt auth login`)
- An upstream LLM provider key exported as an environment variable
- The `openai` Python SDK installed (`pip install openai`)
Configuration
Gateway policy config
```yaml
pack:
  name: wandb-governed-experiments
  version: 1.0.0
  enabled: true

providers:
  targets:
    - id: experiment-llm
      provider: openai:chat:gpt-4o
      secret_key_ref:
        env: OPENAI_API_KEY

policies:
  chain:
    - prompt-injection
    - pii-detector
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
      entities:
        - PERSON
        - EMAIL_ADDRESS
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true
```
Start the gateway
```bash
export OPENAI_API_KEY="sk-..."
kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml
```
Setup steps
1. Initialize W&B and the OpenAI client
```python
import wandb
from openai import OpenAI

wandb.init(project="keeptrusts-governed-llm", config={
    "gateway": "http://localhost:41002/v1",
    "model": "gpt-4o",
    "policies": ["prompt-injection", "pii-detector", "audit-logger"],
})

client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="unused",  # the gateway injects the real provider key upstream
)
```
2. Log governed LLM calls to W&B
```python
import time

prompts = [
    "Summarize the Q4 earnings report.",
    "Draft a customer outreach email.",
    "Explain our refund policy in simple terms.",
]

for i, prompt in enumerate(prompts):
    start = time.time()
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=512,
            temperature=0.3,
        )
        latency = time.time() - start
        output = response.choices[0].message.content
        tokens_used = response.usage.total_tokens
        wandb.log({
            "prompt_index": i,
            "latency_seconds": latency,
            "tokens_used": tokens_used,
            "status": "allowed",
            "output_length": len(output),
        })
    except Exception as e:
        latency = time.time() - start
        wandb.log({
            "prompt_index": i,
            "latency_seconds": latency,
            "status": "blocked",
            "error": str(e),
        })

wandb.finish()
```
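The `except` branch above labels every failure as `blocked`, which conflates policy blocks with transient gateway errors. One way to separate them is a small classifier. This sketch assumes (hypothetically) that Keeptrusts surfaces policy blocks as HTTP 403, exposed through a `status_code` attribute on the exception, as the OpenAI SDK's API errors are; check your gateway's actual error contract before relying on it.

```python
def classify_failure(exc: Exception) -> str:
    """Map a failed gateway call to a W&B status label.
    Assumes policy blocks arrive as HTTP 403 (a hypothetical convention)
    and that API errors expose a `status_code` attribute; anything
    without one is treated as a client-side failure."""
    status_code = getattr(exc, "status_code", None)
    if status_code == 403:
        return "blocked"        # policy denied the request
    if status_code is not None:
        return "gateway_error"  # e.g. 502 when the upstream is unreachable
    return "client_error"       # timeout, connection refused, etc.

class FakeAPIError(Exception):
    def __init__(self, status_code):
        self.status_code = status_code

print(classify_failure(FakeAPIError(403)))  # blocked
print(classify_failure(FakeAPIError(502)))  # gateway_error
print(classify_failure(TimeoutError()))     # client_error
```

Logging the classifier's output instead of a flat `"blocked"` keeps the dashboard's blocked-rate metric from being inflated by infrastructure errors.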
3. Create a W&B dashboard for governance metrics
In the W&B UI, create a dashboard that tracks:
- Allowed vs. blocked requests — bar chart on `status`
- Latency distribution — histogram on `latency_seconds`
- Token usage over time — line chart on `tokens_used`
This gives your ML team a single pane for both experiment quality and policy compliance.
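Beyond per-request charts, run-level summary metrics make dashboards easier to compare. A minimal sketch, assuming the same field names used in the `wandb.log()` calls above; `governance_summary` is a hypothetical helper whose result could be written into `wandb.run.summary`.

```python
# Hypothetical helper for run-level governance metrics. The field names
# ("status", "tokens_used") match the wandb.log() calls in this guide;
# the function itself is illustrative, not part of either SDK.
def governance_summary(rows):
    """Aggregate per-request rows into summary metrics."""
    allowed = [r for r in rows if r.get("status") == "allowed"]
    blocked = [r for r in rows if r.get("status") == "blocked"]
    total = len(rows)
    return {
        "allowed_count": len(allowed),
        "blocked_count": len(blocked),
        "blocked_rate": len(blocked) / total if total else 0.0,
        "total_tokens": sum(r.get("tokens_used", 0) for r in allowed),
    }

rows = [
    {"status": "allowed", "tokens_used": 120},
    {"status": "blocked"},
    {"status": "allowed", "tokens_used": 95},
]
summary = governance_summary(rows)
print(round(summary["blocked_rate"], 2))  # 0.33
```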
Verification
```bash
# Verify gateway is healthy
curl http://localhost:41002/health

# Run the Python script above
python governed_experiment.py

# Check W&B dashboard for logged metrics
# https://wandb.ai/<your-org>/<project>/runs

# Check Keeptrusts audit log
kt events list --limit 10
```
Recommended policies
| Policy | Purpose | Recommended setting |
|---|---|---|
| `pii-detector` | Redact personal data from experiment prompts | `action: redact`, entities: PERSON, EMAIL_ADDRESS |
| `prompt-injection` | Block adversarial inputs in experiment datasets | `threshold: 0.8`, `action: block` |
| `audit-logger` | Compliance trail for all experiment LLM calls | `retention_days: 365`, `immutable: true` |
| `rbac` | Restrict model access by team or experiment tier | Map W&B teams to Keeptrusts roles |
| `cost-tracker` | Monitor token spend per experiment run | Track `tokens_used` alongside W&B cost metrics |
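For the `cost-tracker` row above, a lightweight option is to derive an estimated spend metric from `tokens_used` and log it next to the other W&B metrics. The per-token rate below is a hypothetical placeholder, not a published price; substitute your provider's actual pricing.

```python
# Hypothetical blended USD price per 1K tokens; replace with your
# provider's real pricing table.
PRICE_PER_1K_TOKENS = {"gpt-4o": 0.005}

def estimate_cost_usd(model: str, tokens_used: int) -> float:
    """Rough spend estimate, suitable for logging via
    wandb.log({"est_cost_usd": ...}); unknown models cost 0."""
    rate = PRICE_PER_1K_TOKENS.get(model, 0.0)
    return round(tokens_used / 1000 * rate, 6)

print(estimate_cost_usd("gpt-4o", 2000))        # 0.01
print(estimate_cost_usd("unknown-model", 500))  # 0.0
```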
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| W&B logs show `status: blocked` for all requests | Policy chain too restrictive | Test with `audit-logger` only, then add policies incrementally |
| Missing `tokens_used` in W&B | Response does not include a `usage` object | Ensure `stream: false` (usage is not returned in streamed responses) |
| `wandb.init()` hangs | W&B API key not set | Run `wandb login` or export `WANDB_API_KEY` |
| Gateway returns 502 | Upstream provider unreachable | Check OPENAI_API_KEY and provider connectivity |
| Latency spikes in W&B charts | Policy evaluation overhead | Move pii-detector to async mode if supported; ensure gateway is co-located |
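Related to the missing `tokens_used` row above: when a response arrives without a `usage` object (for example, in a streamed run), the metric can fall back to a rough estimate instead of being dropped. The words-to-tokens ratio here is a crude, model-dependent heuristic, not a tokenizer, and the helper itself is hypothetical.

```python
def tokens_or_estimate(usage, output_text: str) -> int:
    """Prefer the exact total from the usage object; otherwise estimate
    from word count (a crude ~0.75 words-per-token heuristic) so the
    W&B metric is never missing entirely."""
    if usage is not None:
        return usage.total_tokens
    return max(1, int(len(output_text.split()) / 0.75))

class FakeUsage:
    total_tokens = 137

print(tokens_or_estimate(FakeUsage(), "ignored"))                      # 137
print(tokens_or_estimate(None, "six words of streamed output here"))   # 8
```

Marking estimated values (for example, with an extra `tokens_estimated: true` field) keeps them distinguishable from exact counts in W&B charts.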
For AI systems
- Canonical terms: Keeptrusts gateway, Weights & Biases, W&B, experiment tracking, MLOps observability, governance events, `policy-config.yaml`.
- Config field names: `provider`, `secret_key_ref.env`, `pii-detector`, `audit-logger`.
- Key behavior: W&B logs experiment metrics while Keeptrusts enforces policies on the same LLM calls. Both systems see every request — W&B for observability, Keeptrusts for governance.
- Best next pages: MLflow integration, LangSmith integration, Policy controls catalog.
For engineers
Prerequisites
- W&B account, `wandb` and `openai` Python SDKs, `kt` CLI installed.
Validation
- Run the experiment script and verify metrics appear in W&B.
- Run `kt events list --limit 10` and verify all experiment requests are logged.
- Confirm that PII in prompts appears redacted in Keeptrusts audit logs but experiment metrics still flow to W&B.
For leaders
- W&B tracks model quality; Keeptrusts tracks compliance. Together they give ML leadership a unified view of whether experiments are both performing well and following governance rules.
- Audit trails in Keeptrusts prove that every experiment LLM call was policy-checked, which satisfies internal AI governance committee requirements.
- Blocked-request metrics in W&B help identify which datasets or prompt templates trigger policy violations, enabling proactive cleanup before production deployment.
Next steps
- MLflow integration — alternative experiment tracking with model serving
- LangSmith integration — LLM-specific observability alongside Keeptrusts
- Policy controls catalog — full reference for all policy types
- Quickstart — install `kt` and run your first gateway