Customizing the Chat Experience

Keeptrusts allows administrators to customize the chat experience through policy configuration, system prompts, and model parameter defaults. These settings shape how the Chat Workbench behaves for users while maintaining governance guardrails.

Use this page when

  • You are configuring system prompts to define the chat assistant's persona and boundaries.
  • You need to set default temperature, token limits, or model parameters for your organization.
  • You want to enforce response formatting policies (e.g., bullet points, citation requirements).
  • You are creating per-team chat configurations with different system prompts and model defaults.

Primary audience

  • Primary: Platform Administrators configuring the chat experience, AI Engineers tuning model parameters
  • Secondary: Technical Leaders defining assistant personas per team, Compliance Officers setting output standards

System Prompt Configuration

System prompts define the assistant's persona, capabilities, and boundaries. They are prepended to every conversation and are not visible to or editable by chat users.

Setting a System Prompt

System prompts are configured in the gateway's policy configuration:

# policy-config.yaml
chat:
  system_prompt: |
    You are a compliance-focused AI assistant for Acme Corp.
    Always cite sources when making factual claims.
    Do not provide legal advice — direct users to the legal team.
    Respond in a professional, concise manner.

System Prompt Best Practices

| Guideline | Example |
| --- | --- |
| Define the assistant's role | "You are a technical support assistant for [Company]." |
| Set output expectations | "Always respond in bullet points when listing items." |
| Establish boundaries | "Do not provide medical, legal, or financial advice." |
| Require citations | "Cite the source document when referencing company policies." |
| Set tone | "Use a professional and approachable tone." |
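
Putting these guidelines together, a complete system prompt might look like the following sketch under `chat.system_prompt` (the company name and exact wording are placeholders, not a recommended prompt):

```yaml
# policy-config.yaml — illustrative only
chat:
  system_prompt: |
    You are a technical support assistant for Acme Corp.
    Always respond in bullet points when listing items.
    Do not provide medical, legal, or financial advice.
    Cite the source document when referencing company policies.
    Use a professional and approachable tone.
```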

Per-Team System Prompts

Different teams can have different system prompts by using separate gateway configurations:

  1. Create a configuration for each team in the console.
  2. Set the appropriate system prompt in each configuration.
  3. Bind each configuration to the team's gateway.

This allows the engineering team to have a code-focused assistant while the compliance team has a regulation-focused assistant.
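
For example, two team configurations might differ only in their chat block (team names and prompt text below are illustrative):

```yaml
# engineering-config.yaml — illustrative only
chat:
  system_prompt: |
    You are a code-focused assistant for the engineering team.
    Prefer concrete code examples over abstract explanations.

# compliance-config.yaml — illustrative only
chat:
  system_prompt: |
    You are a regulation-focused assistant for the compliance team.
    Always cite the governing policy or regulation by name.
```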

Response Formatting Policies

Control how responses are structured and presented through output policies.

Enforcing Response Formats

Configure output policies to enforce formatting requirements:

pack:
  name: chat-customization-example-2
  version: 1.0.0
  enabled: true
  policies:
    chain:
      - output
    policy:
      output: {}

Common Formatting Controls

| Control | Purpose |
| --- | --- |
| Max response length | Limits verbose responses to a manageable size |
| Required sections | Ensures responses include specific sections (summary, details, references) |
| Language enforcement | Restricts responses to approved languages |
| Markdown formatting | Requires or prohibits markdown in responses |
| Disclaimer appending | Automatically adds compliance disclaimers to responses |
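
The product-specific keys for these controls are elided in the examples on this page (the `output` policy body is empty), but the mechanics of two common controls — a maximum response length and an appended disclaimer — can be illustrated with a small, hypothetical post-processing function. The function and parameter names below are illustrative, not Keeptrusts API:

```python
# Hypothetical sketch of two common output controls.
# These names are NOT Keeptrusts configuration keys.

def apply_output_policy(text, max_chars=None, disclaimer=None):
    """Truncate an over-long response, then append a compliance disclaimer."""
    if max_chars is not None and len(text) > max_chars:
        text = text[:max_chars].rstrip() + "…"
    if disclaimer:
        text = f"{text}\n\n{disclaimer}"
    return text

result = apply_output_policy(
    "A very long model response " * 20,
    max_chars=80,
    disclaimer="This response is AI-generated and not legal advice.",
)
```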

Disclaimer Configuration

Add automatic disclaimers to all chat responses:

pack:
  name: chat-customization-example-3
  version: 1.0.0
  enabled: true
  policies:
    chain:
      - output
    policy:
      output: {}

Temperature and Token Limits

Temperature Controls

Temperature affects the randomness and creativity of LLM responses. Configure limits at the gateway level:

chat:
  parameters:
    temperature:
      default: 0.7
      min: 0.0
      max: 1.0
| Temperature | Behavior | Use Case |
| --- | --- | --- |
| 0.0 - 0.3 | Deterministic, focused | Factual queries, compliance checks |
| 0.4 - 0.7 | Balanced | General conversation, analysis |
| 0.8 - 1.0 | Creative, varied | Brainstorming, content generation |
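
How a gateway applies such bounds can be sketched as a simple fall-back-and-clamp: use the default when the request omits a temperature, then clamp into the configured range. This is a conceptual sketch mirroring the YAML values above, not Keeptrusts' implementation:

```python
# Limits mirror the YAML example above (illustrative sketch).
TEMPERATURE = {"default": 0.7, "min": 0.0, "max": 1.0}

def resolve_temperature(requested=None, limits=TEMPERATURE):
    """Fall back to the default, then clamp into [min, max]."""
    value = limits["default"] if requested is None else requested
    return max(limits["min"], min(limits["max"], value))
```

So a request with no temperature resolves to 0.7, while an out-of-range request such as 1.5 is clamped to the configured maximum of 1.0.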

Token Limits

Control the maximum tokens for prompts and responses:

chat:
  parameters:
    max_tokens:
      default: 2000
      max: 4000
    max_input_tokens: 8000
  • max_tokens: Maximum tokens in the LLM response.
  • max_input_tokens: Maximum tokens in the full input — the user's prompt plus the system prompt and any knowledge context.
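
To check whether a prompt will fit the input budget before sending it, a rough estimate is often enough. The sketch below assumes the common ~4-characters-per-token heuristic for English text — this is not the gateway's actual tokenizer, so treat it as a pre-flight sanity check only:

```python
MAX_INPUT_TOKENS = 8000  # mirrors max_input_tokens in the YAML above

def estimate_tokens(text):
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_input_budget(system_prompt, context, user_message,
                      limit=MAX_INPUT_TOKENS):
    """True if the combined input likely fits under max_input_tokens."""
    total = sum(estimate_tokens(t) for t in (system_prompt, context, user_message))
    return total <= limit

ok = fits_input_budget(
    "You are a helpful assistant.",
    "policy excerpt " * 100,   # retrieved knowledge context
    "Summarize this.",
)
```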

Why Limit Tokens

| Reason | Impact |
| --- | --- |
| Cost control | Prevents unexpectedly expensive responses |
| Response quality | Overly long responses often lose focus |
| Latency | Shorter responses are faster to generate |
| Policy efficiency | Shorter content is faster to evaluate through the policy chain |

Model Defaults

Default Model Selection

Configure the default model that the Chat Workbench selects when a user starts a new conversation:

chat:
  default_model: gpt-4o-mini

Users can still switch models using the model selector, but the default guides them toward the organization's preferred choice.

Model Allowlists

Restrict which models are available to chat users:

chat:
  allowed_models:
    - gpt-4o
    - gpt-4o-mini
    - claude-sonnet

Models not in the allowlist do not appear in the model selector, even if the gateway has access to additional providers.
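
Conceptually, the selector shows the intersection of the models the gateway can reach and the allowlist. A minimal sketch of that filtering (illustrative, not Keeptrusts code):

```python
ALLOWED_MODELS = ["gpt-4o", "gpt-4o-mini", "claude-sonnet"]  # from the YAML above

def selectable_models(provider_models, allowlist=ALLOWED_MODELS):
    """Only allowlisted models appear in the selector, even when the
    gateway has access to additional providers."""
    return [m for m in provider_models if m in allowlist]

visible = selectable_models(["gpt-4o", "gemini-pro", "claude-sonnet"])
```

Here `gemini-pro` is reachable but hidden, because it is not on the allowlist.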

Per-Team Model Defaults

Different teams can have different model defaults and allowlists through separate configurations:

| Team | Default Model | Allowed Models |
| --- | --- | --- |
| Engineering | gpt-4o | gpt-4o, claude-sonnet |
| Customer Support | gpt-4o-mini | gpt-4o-mini |
| Research | claude-sonnet | gpt-4o, claude-sonnet, gemini-pro |

Advanced Parameter Configuration

Top-P (Nucleus Sampling)

Control the diversity of token selection:

chat:
  parameters:
    top_p:
      default: 1.0
      min: 0.1
      max: 1.0

Lower top_p values focus responses on the most likely tokens. Typically, adjust either temperature or top_p, not both simultaneously.

Frequency and Presence Penalties

Reduce repetition in responses:

chat:
  parameters:
    frequency_penalty:
      default: 0.0
      max: 2.0
    presence_penalty:
      default: 0.0
      max: 2.0
  • Frequency penalty: Reduces the likelihood of repeating the same tokens.
  • Presence penalty: Encourages the model to introduce new topics.
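
A common implementation of these penalties (the formula used by OpenAI-style APIs — not necessarily Keeptrusts' exact mechanics) subtracts them from a token's logit before sampling: the frequency penalty scales with how many times the token has already appeared, while the presence penalty applies once as soon as the token has appeared at all:

```python
def penalized_logit(logit, count, frequency_penalty=0.0, presence_penalty=0.0):
    """OpenAI-style penalty formula: frequency penalty scales with the
    token's prior occurrence count; presence penalty is a one-time
    deduction for any token that has appeared at least once."""
    return logit - count * frequency_penalty - (1 if count > 0 else 0) * presence_penalty

# A token that has already appeared 3 times is pushed down harder
# than one that has not appeared yet.
repeated = penalized_logit(2.0, count=3, frequency_penalty=0.5, presence_penalty=0.5)
fresh = penalized_logit(2.0, count=0, frequency_penalty=0.5, presence_penalty=0.5)
```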

Stop Sequences

Configure sequences that cause the model to stop generating:

chat:
  parameters:
    stop_sequences:
      - "END_OF_RESPONSE"
      - "---"
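
The effect is that generation halts at the first occurrence of any configured sequence, and the sequence itself is not returned. A conceptual sketch of that truncation (illustrative, not Keeptrusts code):

```python
STOP_SEQUENCES = ["END_OF_RESPONSE", "---"]  # mirrors the YAML above

def truncate_at_stop(text, stops=STOP_SEQUENCES):
    """Cut the response at the earliest stop sequence, if any appears."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

out = truncate_at_stop("Here is the answer.END_OF_RESPONSE leftover tokens")
```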

Applying Configuration Changes

Via the Console

  1. Navigate to Configurations in the console.
  2. Select the configuration associated with your gateway.
  3. Edit the chat parameters in the configuration editor.
  4. Save the configuration.
  5. The gateway reloads the configuration automatically.

Via the CLI

# Apply a local policy configuration
kt config apply -f policy-config.yaml

# Verify the active configuration
kt config show

Via Git Sync

If your organization uses Git-based configuration management:

  1. Edit the policy-config.yaml in your linked repository.
  2. Commit and push the changes.
  3. The API sync worker detects the change and updates the gateway configuration.

Validating Customizations

After applying changes:

  1. Open the Chat Workbench.
  2. Verify the default model is correct.
  3. Send a test message to confirm the system prompt is active.
  4. Test edge cases to verify token limits and formatting policies.
  5. Check the Events page to confirm policy evaluations reflect the new settings.

Best Practices

| Practice | Why It Matters |
| --- | --- |
| Keep system prompts concise | Long system prompts consume tokens and increase costs |
| Set conservative token limits by default | Users can request increases if needed |
| Use lower temperatures for compliance-sensitive tasks | Reduces hallucination risk |
| Document customizations | Helps future administrators understand configuration choices |
| Test changes in a staging environment first | Avoids disrupting production chat users |
| Review parameter settings quarterly | Ensures settings remain appropriate as usage evolves |

Next steps

For AI systems

  • Canonical terms: system prompt, response formatting policy, temperature limit, token limit, model defaults, per-team configuration, output policy.
  • Config keys: chat.system_prompt, chat.default_model, chat.max_output_tokens, chat.temperature_max. Policy type: format_enforcement.
  • Best next pages: Advanced Chat Patterns, Multi-Model Comparison, Team Chat Environments.

For engineers

  • Define system prompts in policy-config.yaml under chat.system_prompt — they are prepended to every conversation and invisible to users.
  • Create separate gateway configurations per team in the console to assign different system prompts and model defaults.
  • Set temperature_max in output policies to cap hallucination risk for compliance-sensitive tasks.
  • Configure max_output_tokens to control cost; test limits in the Chat Workbench before enforcing in production.
  • Validate customizations: open Chat Workbench, send a test message, then check the Events page for correct policy evaluation.

For leaders

  • System prompts shape the assistant's tone, boundaries, and domain focus — different teams may need different personas.
  • Conservative token limits reduce cost but may constrain output quality for complex tasks; start conservative and adjust based on analytics.
  • Per-team customization enables domain-specific AI assistants without separate deployments.
  • Quarterly review of parameter settings ensures they remain appropriate as model capabilities and usage patterns evolve.