LangChain + Keeptrusts: Governed RAG Pipelines
LangChain's ChatOpenAI class accepts a custom openai_api_base, which means every chain, agent, and RAG pipeline can route through the Keeptrusts gateway with a one-line change.
Use this page when
- You are routing LangChain chains, agents, or RAG pipelines through the Keeptrusts gateway.
- You need to configure ChatOpenAI with a custom openai_api_base for policy enforcement.
- You want to apply tool-level policies to LangChain agent tool calls.
- You are handling 409 policy blocks within LangChain chain invocations.
Primary audience
- Primary: Python developers building LangChain RAG or agent applications
- Secondary: AI Engineers validating governance on multi-step reasoning chains, MLOps Engineers testing pipelines
Basic Client Configuration
Python
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_base="http://localhost:41002/v1",
    openai_api_key="sk-...",
    temperature=0,
)
response = llm.invoke("What is AI governance?")
print(response.content)
Environment-Driven Setup
import os
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model=os.environ.get("LLM_MODEL", "gpt-4o"),
    openai_api_base=os.environ.get("LLM_GATEWAY_URL", "http://localhost:41002/v1"),
    openai_api_key=os.environ["OPENAI_API_KEY"],
    temperature=0,
    request_timeout=30,
)
Governed RAG Pipeline
A standard retrieval-augmented generation pipeline flows through the gateway. Every call — the embedding lookup and the generation step — is policy-evaluated.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Embeddings also route through the gateway
embeddings = OpenAIEmbeddings(
    openai_api_base="http://localhost:41002/v1",
    openai_api_key="sk-...",
)
# Build a vector store from documents
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = splitter.create_documents(["Your compliance policy documents here..."])
vectorstore = FAISS.from_documents(docs, embeddings)
# LLM routed through the governed gateway
llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_base="http://localhost:41002/v1",
    openai_api_key="sk-...",
    temperature=0,
)
# Build the RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)
result = qa_chain.invoke({"query": "What are the data retention requirements?"})
print(result["result"])
Every generation call in this chain passes through the gateway's policy engine — PII filters, prompt injection detection, and content policies all apply.
Streaming with LangChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_base="http://localhost:41002/v1",
    openai_api_key="sk-...",
    streaming=True,
)
for chunk in llm.stream("Explain the principle of least privilege for AI agents."):
    print(chunk.content, end="", flush=True)
print()
The gateway evaluates output policies as the stream assembles. If a policy triggers, the stream terminates with a policy error.
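If you want to keep the partial output that arrived before a mid-stream block, a small wrapper helps. A minimal sketch, assuming only that the stream yields chunk objects with a .content attribute (consume_stream and on_block are illustrative names, not part of LangChain):

```python
def consume_stream(chunks, on_block=None):
    """Accumulate streamed content, tolerating a mid-stream policy stop.

    `chunks` is any iterable of objects with a .content attribute, such as
    the generator returned by llm.stream(...). If the gateway terminates
    the stream with an error, the partial text gathered so far is returned
    and the optional on_block callback receives the exception.
    """
    parts = []
    try:
        for chunk in chunks:
            parts.append(chunk.content)
    except Exception as exc:  # e.g. an APIStatusError raised by the client
        if on_block is not None:
            on_block(exc)
    return "".join(parts)
```

Pass `llm.stream(...)` as `chunks` to use it with the streaming setup above.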
Agent Tool Governance
LangChain agents that call tools send each tool invocation through the LLM. The gateway sees these as standard chat completions with function calls and can enforce tool-level policies.
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate
@tool
def lookup_employee(employee_id: str) -> str:
    """Look up employee details by ID."""
    return f"Employee {employee_id}: Jane Doe, Engineering, Level 5"

@tool
def calculate_bonus(base_salary: float, performance_score: float) -> str:
    """Calculate employee bonus based on salary and performance."""
    bonus = base_salary * performance_score * 0.1
    return f"Calculated bonus: ${bonus:,.2f}"
llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_base="http://localhost:41002/v1",
    openai_api_key="sk-...",
    temperature=0,
)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an HR assistant. Use tools to answer questions."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_openai_tools_agent(llm, [lookup_employee, calculate_bonus], prompt)
executor = AgentExecutor(agent=agent, tools=[lookup_employee, calculate_bonus], verbose=True)
result = executor.invoke({"input": "What bonus does employee EMP-42 get with a $120k salary and 0.9 score?"})
print(result["output"])
Policy Config for Tool Governance
policies:
  - name: restrict-tools
    type: tool_filter
    action: block
    blocked_tools:
      - "delete_employee"
      - "modify_salary"
    message: "Tool call blocked: restricted HR operation"
  - name: log-tool-calls
    type: observe
    action: log
    match: tool_call
This blocks dangerous tool invocations at the gateway level — the agent never executes them.
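Conceptually, the tool_filter policy behaves like the check below. This Python sketch only illustrates the gateway-side logic; the gateway enforces it from the YAML config above, not from application code:

```python
# Mirrors the blocked_tools list in the restrict-tools policy above.
BLOCKED_TOOLS = {"delete_employee", "modify_salary"}

def tool_call_allowed(tool_name: str) -> bool:
    """Illustrative mirror of a tool_filter policy with action: block.

    A tool call is rejected when its name appears in the blocked set;
    everything else passes through to the model unchanged.
    """
    return tool_name not in BLOCKED_TOOLS
```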
Handling Policy Blocks in Chains
Wrap chain execution to catch 409 policy blocks:
from openai import APIStatusError
def safe_chain_invoke(chain, query: str) -> str:
    try:
        result = chain.invoke({"query": query})
        return result["result"]
    except APIStatusError as e:
        if e.status_code == 409:
            error_body = e.response.json()
            policy = error_body.get("error", {}).get("policy", "unknown")
            return f"[Blocked by policy: {policy}]"
        raise
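The error-body parsing above assumes the gateway's 409 envelope nests the policy name under error.policy. Factoring that lookup into a helper makes it reusable across chains and easy to test; a sketch (policy_from_error_body is an illustrative name):

```python
def policy_from_error_body(body: dict) -> str:
    """Extract the triggering policy name from a gateway 409 error
    envelope of the shape {"error": {"policy": "<name>", ...}}.

    Returns "unknown" when the envelope does not carry a policy field,
    so callers always get a printable value.
    """
    return body.get("error", {}).get("policy", "unknown")
```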
LCEL (LangChain Expression Language) Chains
Modern LangChain uses LCEL for composable chains:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_base="http://localhost:41002/v1",
    openai_api_key="sk-...",
)
prompt = ChatPromptTemplate.from_template(
    "Summarize this document in three bullet points:\n\n{document}"
)
chain = prompt | llm | StrOutputParser()
summary = chain.invoke({"document": "Your long compliance document text here..."})
print(summary)
Every invoke, stream, or batch call flows through the gateway. No special LangChain configuration needed beyond the base URL.
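For batch workloads, you typically want one blocked input reported rather than the whole run aborted. A sketch of that pattern, where batch_with_policy_handling is an illustrative helper and invoke_one stands in for any single-input call (such as a wrapper around chain.invoke); the status-code check assumes the same 409 convention used elsewhere on this page:

```python
def batch_with_policy_handling(invoke_one, inputs):
    """Run each input through invoke_one, separating policy blocks.

    invoke_one is any callable that processes one input and raises an
    exception carrying status_code == 409 when the gateway blocks it
    (e.g. openai.APIStatusError). Returns (results, blocked_inputs);
    any other exception propagates unchanged.
    """
    results, blocked = [], []
    for item in inputs:
        try:
            results.append(invoke_one(item))
        except Exception as exc:
            if getattr(exc, "status_code", None) == 409:
                blocked.append(item)
            else:
                raise
    return results, blocked
```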
Best Practices
- Set openai_api_base via environment variable — same code across environments.
- Use request_timeout — gateway policy evaluation adds minimal latency, but set a timeout anyway.
- Catch APIStatusError around chain invocations — policy blocks propagate as 409 errors.
- Apply tool-level policies for agents — block dangerous tool calls at the gateway, not in application code.
- Log verbose agent runs during development — correlate LangChain traces with gateway decision events.
- Test with observe-only policies first — validate your RAG pipeline before enabling blocking policies.
Next steps
- LlamaIndex Integration — governed document AI pipelines
- Function Calling — deep dive into tool validation and budget limits
- Error Handling — complete error envelope reference
For AI systems
- Canonical terms: LangChain, ChatOpenAI, openai_api_base, RAG pipeline, RetrievalQA, agent, tool call, APIStatusError, policy block (409).
- Key config: ChatOpenAI(openai_api_base="http://localhost:41002/v1"). Embeddings: OpenAIEmbeddings(openai_api_base=...).
- Both LLM completions and embedding calls pass through the gateway policy chain.
- Best next pages: LlamaIndex Integration, Function Calling, Error Handling.
For engineers
- Set openai_api_base via environment variable so the same code routes through different gateways per environment.
- Every invoke, stream, or batch call flows through the gateway — no special LangChain configuration beyond the base URL.
- Route embeddings through the gateway too (OpenAIEmbeddings(openai_api_base=...)) for full observability.
- Catch APIStatusError around chain invocations to handle 409 policy blocks gracefully.
- Apply tool-level policies at the gateway for agent workflows — block dangerous tool calls without modifying application code.
- Use the request_timeout parameter to set timeouts; policy evaluation adds minimal latency.
For leaders
- LangChain is one of the most popular AI frameworks — governed RAG requires only a one-line URL change.
- Gateway-level tool policies protect against agent misuse without requiring application-level guardrails.
- All chain invocations produce decision events, providing auditability for multi-step reasoning workflows.
- Test with observe-only policies first to understand pipeline behavior before enabling blocking enforcement.