Customizing the Chat Experience
Keeptrusts allows administrators to customize the chat experience through policy configuration, system prompts, and model parameter defaults. These settings shape how the Chat Workbench behaves for users while maintaining governance guardrails.
Use this page when
- You are configuring system prompts to define the chat assistant's persona and boundaries.
- You need to set default temperature, token limits, or model parameters for your organization.
- You want to enforce response formatting policies (e.g., bullet points, citation requirements).
- You are creating per-team chat configurations with different system prompts and model defaults.
Primary audience
- Primary: Platform Administrators configuring the chat experience, AI Engineers tuning model parameters
- Secondary: Technical Leaders defining assistant personas per team, Compliance Officers setting output standards
System Prompt Configuration
System prompts define the assistant's persona, capabilities, and boundaries. They are prepended to every conversation and are not visible to or editable by chat users.
Setting a System Prompt
System prompts are configured in the gateway's policy configuration:
```yaml
# policy-config.yaml
chat:
  system_prompt: |
    You are a compliance-focused AI assistant for Acme Corp.
    Always cite sources when making factual claims.
    Do not provide legal advice; direct users to the legal team.
    Respond in a professional, concise manner.
```
System Prompt Best Practices
| Guideline | Example |
|---|---|
| Define the assistant's role | "You are a technical support assistant for [Company]." |
| Set output expectations | "Always respond in bullet points when listing items." |
| Establish boundaries | "Do not provide medical, legal, or financial advice." |
| Require citations | "Cite the source document when referencing company policies." |
| Set tone | "Use a professional and approachable tone." |
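Taken together, these guidelines combine into a single prompt. The sketch below simply stitches the table's example lines into one `system_prompt`; the company name and wording are placeholders to adapt, not a canonical recommendation:

```yaml
chat:
  system_prompt: |
    You are a technical support assistant for Acme Corp.
    Always respond in bullet points when listing items.
    Do not provide medical, legal, or financial advice.
    Cite the source document when referencing company policies.
    Use a professional and approachable tone.
```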
Per-Team System Prompts
Different teams can have different system prompts by using separate gateway configurations:
1. Create a configuration for each team in the console.
2. Set the appropriate system prompt in each configuration.
3. Bind each configuration to the team's gateway.
This allows the engineering team to have a code-focused assistant while the compliance team has a regulation-focused assistant, as sketched below.
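A minimal sketch of the pattern, assuming two separate gateway configurations; the file names and prompt wording are illustrative:

```yaml
# engineering-config.yaml
chat:
  system_prompt: |
    You are a code-focused assistant for the engineering team.
    Prefer concrete examples and reference internal coding standards.
```

```yaml
# compliance-config.yaml
chat:
  system_prompt: |
    You are a regulation-focused assistant for the compliance team.
    Always cite the relevant policy or regulation when answering.
```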
Response Formatting Policies
Control how responses are structured and presented through output policies.
Enforcing Response Formats
Configure output policies to enforce formatting requirements:
```yaml
pack:
  name: chat-customization-example-2
  version: 1.0.0
  enabled: true
policies:
  chain:
    - output
policy:
  output: {}
```
Common Formatting Controls
| Control | Purpose |
|---|---|
| Max response length | Limits verbose responses to a manageable size |
| Required sections | Ensures responses include specific sections (summary, details, references) |
| Language enforcement | Restricts responses to approved languages |
| Markdown formatting | Requires or prohibits markdown in responses |
| Disclaimer appending | Automatically adds compliance disclaimers to responses |
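The exact schema for these controls depends on your policy pack version. The keys below are a hypothetical sketch of how the table's controls might map into the output policy; verify them against the output policy reference before use:

```yaml
policy:
  output:
    # All keys below are hypothetical illustrations, not confirmed options
    max_response_length: 2000      # cap verbose responses
    required_sections:             # enforce a response skeleton
      - summary
      - details
      - references
    allowed_languages:             # restrict to approved languages
      - en
    markdown: required             # require markdown formatting
```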
Disclaimer Configuration
Add automatic disclaimers to all chat responses:
```yaml
pack:
  name: chat-customization-example-3
  version: 1.0.0
  enabled: true
policies:
  chain:
    - output
policy:
  output: {}
```
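As with the formatting controls above, the key below is a hypothetical illustration of a disclaimer setting; check the output policy reference for the actual schema:

```yaml
policy:
  output:
    # Hypothetical key, shown for illustration only
    append_disclaimer: |
      This response was generated by an AI assistant and may contain errors.
```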
Temperature and Token Limits
Temperature Controls
Temperature affects the randomness and creativity of LLM responses. Configure limits at the gateway level:
```yaml
chat:
  parameters:
    temperature:
      default: 0.7
      min: 0.0
      max: 1.0
```
| Temperature | Behavior | Use Case |
|---|---|---|
| 0.0 - 0.3 | Deterministic, focused | Factual queries, compliance checks |
| 0.4 - 0.7 | Balanced | General conversation, analysis |
| 0.8 - 1.0 | Creative, varied | Brainstorming, content generation |
Token Limits
Control the maximum tokens for prompts and responses:
```yaml
chat:
  parameters:
    max_tokens:
      default: 2000
      max: 4000
    max_input_tokens: 8000
```
- `max_tokens`: Maximum number of tokens in the LLM response.
- `max_input_tokens`: Maximum tokens in the user's prompt, including the system prompt and any knowledge context. For example, with a limit of 8000, a 1,000-token system prompt and 3,000 tokens of knowledge context leave roughly 4,000 tokens for the user's message.
Why Limit Tokens
| Reason | Impact |
|---|---|
| Cost control | Prevents unexpectedly expensive responses |
| Response quality | Overly long responses often lose focus |
| Latency | Shorter responses are faster to generate |
| Policy efficiency | Shorter content is faster to evaluate through the policy chain |
Model Defaults
Default Model Selection
Configure the default model that the Chat Workbench selects when a user starts a new conversation:
```yaml
chat:
  default_model: gpt-4o-mini
```
Users can still switch models using the model selector, but the default guides them toward the organization's preferred choice.
Model Allowlists
Restrict which models are available to chat users:
```yaml
chat:
  allowed_models:
    - gpt-4o
    - gpt-4o-mini
    - claude-sonnet
```
Models not in the allowlist do not appear in the model selector, even if the gateway has access to additional providers.
Per-Team Model Defaults
Different teams can have different model defaults and allowlists through separate configurations:
| Team | Default Model | Allowed Models |
|---|---|---|
| Engineering | gpt-4o | gpt-4o, claude-sonnet |
| Customer Support | gpt-4o-mini | gpt-4o-mini |
| Research | claude-sonnet | gpt-4o, claude-sonnet, gemini-pro |
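Using the `chat.default_model` and `chat.allowed_models` keys shown above, the Engineering row translates to a configuration along these lines (the file name is illustrative):

```yaml
# engineering-config.yaml
chat:
  default_model: gpt-4o
  allowed_models:
    - gpt-4o
    - claude-sonnet
```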
Advanced Parameter Configuration
Top-P (Nucleus Sampling)
Control the diversity of token selection:
```yaml
chat:
  parameters:
    top_p:
      default: 1.0
      min: 0.1
      max: 1.0
```
Lower `top_p` values focus responses on the most likely tokens. Typically, adjust either temperature or `top_p`, not both simultaneously.
Frequency and Presence Penalties
Reduce repetition in responses:
```yaml
chat:
  parameters:
    frequency_penalty:
      default: 0.0
      max: 2.0
    presence_penalty:
      default: 0.0
      max: 2.0
```
- Frequency penalty: Reduces the likelihood of repeating tokens in proportion to how often they have already appeared.
- Presence penalty: Penalizes any token that has already appeared at least once, encouraging the model to introduce new topics.
Stop Sequences
Configure sequences that cause the model to stop generating:
```yaml
chat:
  parameters:
    stop_sequences:
      - "END_OF_RESPONSE"
      - "---"
```
Applying Configuration Changes
Via the Console
1. Navigate to Configurations in the console.
2. Select the configuration associated with your gateway.
3. Edit the chat parameters in the configuration editor.
4. Save the configuration.
5. The gateway reloads the configuration automatically.
Via the CLI
```bash
# Apply a local policy configuration
kt config apply -f policy-config.yaml

# Verify the active configuration
kt config show
```
Via Git Sync
If your organization uses Git-based configuration management:
1. Edit `policy-config.yaml` in your linked repository.
2. Commit and push the changes.
3. The API sync worker detects the change and updates the gateway configuration.
Validating Customizations
After applying changes:
1. Open the Chat Workbench.
2. Verify the default model is correct.
3. Send a test message to confirm the system prompt is active.
4. Test edge cases to verify token limits and formatting policies.
5. Check the Events page to confirm policy evaluations reflect the new settings.
Best Practices
| Practice | Why It Matters |
|---|---|
| Keep system prompts concise | Long system prompts consume tokens and increase costs |
| Set conservative token limits by default | Users can request increases if needed |
| Use lower temperatures for compliance-sensitive tasks | Reduces hallucination risk |
| Document customizations | Helps future administrators understand configuration choices |
| Test changes in a staging environment first | Avoids disrupting production chat users |
| Review parameter settings quarterly | Ensures settings remain appropriate as usage evolves |
Next steps
- Build complex conversation patterns in Advanced Chat Patterns.
- Compare how customizations affect model performance in Multi-Model Chat Comparison.
- Track the impact of customizations on usage in Chat Analytics & Usage Insights.
For AI systems
- Canonical terms: system prompt, response formatting policy, temperature limit, token limit, model defaults, per-team configuration, output policy.
- Config keys: `chat.system_prompt`, `chat.default_model`, `chat.parameters.max_tokens`, `chat.parameters.temperature.max`. Policy type: `output`.
- Best next pages: Advanced Chat Patterns, Multi-Model Comparison, Team Chat Environments.
For engineers
- Define system prompts in `policy-config.yaml` under `chat.system_prompt`; they are prepended to every conversation and invisible to users.
- Create separate gateway configurations per team in the console to assign different system prompts and model defaults.
- Set `chat.parameters.temperature.max` at the gateway level to cap hallucination risk for compliance-sensitive tasks.
- Configure `chat.parameters.max_tokens` to control cost; test limits in the Chat Workbench before enforcing in production.
- Validate customizations: open the Chat Workbench, send a test message, then check the Events page for correct policy evaluation.
For leaders
- System prompts shape the assistant's tone, boundaries, and domain focus — different teams may need different personas.
- Conservative token limits reduce cost but may constrain output quality for complex tasks; start conservative and adjust based on analytics.
- Per-team customization enables domain-specific AI assistants without separate deployments.
- Quarterly review of parameter settings ensures they remain appropriate as model capabilities and usage patterns evolve.