Customizing the Chat Experience
Keeptrusts allows administrators to customize the chat experience through policy configuration, system prompts, and model parameter defaults. These settings shape how the Chat Workbench behaves for users while maintaining governance guardrails.
Use this page when
- You are configuring system prompts to define the chat assistant's persona and boundaries.
- You need to set default temperature, token limits, or model parameters for your organization.
- You want to enforce response formatting policies (e.g., bullet points, citation requirements).
- You are creating per-team chat configurations with different system prompts and model defaults.
Primary audience
- Primary: Platform Administrators configuring the chat experience, AI Engineers tuning model parameters
- Secondary: Technical Leaders defining assistant personas per team, Compliance Officers setting output standards
System Prompt Configuration
System prompts define the assistant's persona, capabilities, and boundaries. They are prepended to every conversation and are not visible to or editable by chat users.
Setting a System Prompt
System prompts are configured in the gateway's policy configuration:
```yaml
# policy-config.yaml
chat:
  system_prompt: |
    You are a compliance-focused AI assistant for Acme Corp.
    Always cite sources when making factual claims.
    Do not provide legal advice; direct users to the legal team.
    Respond in a professional, concise manner.
```
System Prompt Best Practices
| Guideline | Example |
|---|---|
| Define the assistant's role | "You are a technical support assistant for [Company]." |
| Set output expectations | "Always respond in bullet points when listing items." |
| Establish boundaries | "Do not provide medical, legal, or financial advice." |
| Require citations | "Cite the source document when referencing company policies." |
| Set tone | "Use a professional and approachable tone." |
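Taken together, these guidelines combine into a single prompt. The sketch below simply stitches the table's example lines into one `system_prompt`; the company name and wording are placeholders to adapt, not a canonical recommendation:

```yaml
chat:
  system_prompt: |
    You are a technical support assistant for Acme Corp.
    Always respond in bullet points when listing items.
    Do not provide medical, legal, or financial advice.
    Cite the source document when referencing company policies.
    Use a professional and approachable tone.
```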
Per-Team System Prompts
Different teams can have different system prompts by using separate gateway configurations:
1. Create a configuration for each team in the console.
2. Set the appropriate system prompt in each configuration.
3. Bind each configuration to the team's gateway.
This allows the engineering team to have a code-focused assistant while the compliance team has a regulation-focused assistant, as sketched below.
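A minimal sketch of the pattern, assuming two separate gateway configurations; the file names and prompt wording are illustrative:

```yaml
# engineering-config.yaml
chat:
  system_prompt: |
    You are a code-focused assistant for the engineering team.
    Prefer concrete examples and reference internal coding standards.
```

```yaml
# compliance-config.yaml
chat:
  system_prompt: |
    You are a regulation-focused assistant for the compliance team.
    Always cite the relevant policy or regulation when answering.
```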
Response Formatting Policies
Control how responses are structured and presented through output policies.
Enforcing Response Formats
Configure output policies to enforce formatting requirements:
```yaml
pack:
  name: chat-customization-example-2
  version: 1.0.0
  enabled: true
policies:
  chain:
    - output
policy:
  output: {}
```
Common Formatting Controls
| Control | Purpose |
|---|---|
| Max response length | Limits verbose responses to a manageable size |
| Required sections | Ensures responses include specific sections (summary, details, references) |
| Language enforcement | Restricts responses to approved languages |
| Markdown formatting | Requires or prohibits markdown in responses |
| Disclaimer appending | Automatically adds compliance disclaimers to responses |
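The exact schema for these controls depends on your policy pack version. The keys below are a hypothetical sketch of how the table's controls might map into the output policy; verify them against the output policy reference before use:

```yaml
policy:
  output:
    # All keys below are hypothetical illustrations, not confirmed options
    max_response_length: 2000      # cap verbose responses
    required_sections:             # enforce a response skeleton
      - summary
      - details
      - references
    allowed_languages:             # restrict to approved languages
      - en
    markdown: required             # require markdown formatting
```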
Disclaimer Configuration
Add automatic disclaimers to all chat responses:
```yaml
pack:
  name: chat-customization-example-3
  version: 1.0.0
  enabled: true
policies:
  chain:
    - output
policy:
  output: {}
```
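As with the formatting controls above, the key below is a hypothetical illustration of a disclaimer setting; check the output policy reference for the actual schema:

```yaml
policy:
  output:
    # Hypothetical key, shown for illustration only
    append_disclaimer: |
      This response was generated by an AI assistant and may contain errors.
```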
Temperature and Token Limits
Temperature Controls
Temperature affects the randomness and creativity of LLM responses. Configure limits at the gateway level:
```yaml
chat:
  parameters:
    temperature:
      default: 0.7
      min: 0.0
      max: 1.0
```
| Temperature | Behavior | Use Case |
|---|---|---|
| 0.0 - 0.3 | Deterministic, focused | Factual queries, compliance checks |
| 0.4 - 0.7 | Balanced | General conversation, analysis |
| 0.8 - 1.0 | Creative, varied | Brainstorming, content generation |
Token Limits
Control the maximum tokens for prompts and responses:
```yaml
chat:
  parameters:
    max_tokens:
      default: 2000
      max: 4000
    max_input_tokens: 8000
```
- `max_tokens`: Maximum number of tokens in the LLM response.
- `max_input_tokens`: Maximum tokens in the user's prompt, including the system prompt and any knowledge context. For example, with a limit of 8000, a 1,000-token system prompt and 3,000 tokens of knowledge context leave roughly 4,000 tokens for the user's message.
Why Limit Tokens
| Reason | Impact |
|---|---|
| Cost control | Prevents unexpectedly expensive responses |
| Response quality | Overly long responses often lose focus |
| Latency | Shorter responses are faster to generate |
| Policy efficiency | Shorter content is faster to evaluate through the policy chain |
Model Defaults
Default Model Selection
Configure the default model that the Chat Workbench selects when a user starts a new conversation:
```yaml
chat:
  default_model: gpt-4o-mini
```
Users can still switch models using the model selector, but the default guides them toward the organization's preferred choice.
Model Allowlists
Restrict which models are available to chat users:
```yaml
chat:
  allowed_models:
    - gpt-4o
    - gpt-4o-mini
    - claude-sonnet
```
Models not in the allowlist do not appear in the model selector, even if the gateway has access to additional providers.
Per-Team Model Defaults
Different teams can have different model defaults and allowlists through separate configurations:
| Team | Default Model | Allowed Models |
|---|---|---|
| Engineering | gpt-4o | gpt-4o, claude-sonnet |
| Customer Support | gpt-4o-mini | gpt-4o-mini |
| Research | claude-sonnet | gpt-4o, claude-sonnet, gemini-pro |
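Using the `chat.default_model` and `chat.allowed_models` keys shown above, the Engineering row translates to a configuration along these lines (the file name is illustrative):

```yaml
# engineering-config.yaml
chat:
  default_model: gpt-4o
  allowed_models:
    - gpt-4o
    - claude-sonnet
```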
Advanced Parameter Configuration
Top-P (Nucleus Sampling)
Control the diversity of token selection:
```yaml
chat:
  parameters:
    top_p:
      default: 1.0
      min: 0.1
      max: 1.0
```
Lower `top_p` values focus responses on the most likely tokens. Typically, adjust either temperature or `top_p`, not both simultaneously.
Frequency and Presence Penalties
Reduce repetition in responses:
```yaml
chat:
  parameters:
    frequency_penalty:
      default: 0.0
      max: 2.0
    presence_penalty:
      default: 0.0
      max: 2.0
```
- Frequency penalty: Reduces the likelihood of repeating tokens in proportion to how often they have already appeared.
- Presence penalty: Penalizes any token that has already appeared at least once, encouraging the model to introduce new topics.
Stop Sequences
Configure sequences that cause the model to stop generating:
```yaml
chat:
  parameters:
    stop_sequences:
      - "END_OF_RESPONSE"
      - "---"
```
Applying Configuration Changes
Via the Console
1. Navigate to Configurations in the console.
2. Select the configuration associated with your gateway.
3. Edit the chat parameters in the configuration editor.
4. Save the configuration.
5. The gateway reloads the configuration automatically.
Via the CLI
```bash
# Apply a local policy configuration
kt config apply -f policy-config.yaml

# Verify the active configuration
kt config show
```
Via Git Sync
If your organization uses Git-based configuration management:
1. Edit `policy-config.yaml` in your linked repository.
2. Commit and push the changes.
3. The API sync worker detects the change and updates the gateway configuration.
Validating Customizations
After applying changes:
1. Open the Chat Workbench.
2. Verify the default model is correct.
3. Send a test message to confirm the system prompt is active.
4. Test edge cases to verify token limits and formatting policies.
5. Check the Events page to confirm policy evaluations reflect the new settings.
Best Practices
| Practice | Why It Matters |
|---|---|
| Keep system prompts concise | Long system prompts consume tokens and increase costs |
| Set conservative token limits by default | Users can request increases if needed |
| Use lower temperatures for compliance-sensitive tasks | Reduces hallucination risk |
| Document customizations | Helps future administrators understand configuration choices |
| Test changes in a staging environment first | Avoids disrupting production chat users |
| Review parameter settings quarterly | Ensures settings remain appropriate as usage evolves |
Next steps
- Build complex conversation patterns in Advanced Chat Patterns.
- Compare how customizations affect model performance in Multi-Model Chat Comparison.
- Track the impact of customizations on usage in Chat Analytics & Usage Insights.
For AI systems
- Canonical terms: system prompt, response formatting policy, temperature limit, token limit, model defaults, per-team configuration, output policy.
- Config keys: `chat.system_prompt`, `chat.default_model`, `chat.parameters.max_tokens`, `chat.parameters.temperature.max`. Policy type: `output`.
- Best next pages: Advanced Chat Patterns, Multi-Model Comparison, Team Chat Environments.
For engineers
- Define system prompts in `policy-config.yaml` under `chat.system_prompt`; they are prepended to every conversation and invisible to users.
- Create separate gateway configurations per team in the console to assign different system prompts and model defaults.
- Set `chat.parameters.temperature.max` at the gateway level to cap hallucination risk for compliance-sensitive tasks.
- Configure `chat.parameters.max_tokens` to control cost; test limits in the Chat Workbench before enforcing in production.
- Validate customizations: open the Chat Workbench, send a test message, then check the Events page for correct policy evaluation.
For leaders
- System prompts shape the assistant's tone, boundaries, and domain focus — different teams may need different personas.
- Conservative token limits reduce cost but may constrain output quality for complex tasks; start conservative and adjust based on analytics.
- Per-team customization enables domain-specific AI assistants without separate deployments.
- Quarterly review of parameter settings ensures they remain appropriate as model capabilities and usage patterns evolve.