
Connecting External Document Sources to the Knowledge Base

A first-generation knowledge-grounded AI deployment often depends on manual uploads. That is fine for a pilot. It is rarely enough for production. Once your documentation lives in external systems, the real requirement becomes durable sync: where the content comes from, how often it refreshes, which configurations receive it, and whether new material should auto-promote or wait in draft. Keeptrusts handles that with declarative knowledge sources.

Use this page when

  • Your approved source material already lives in connected systems rather than a local upload folder.
  • You want a scheduled or event-driven path from external content into the Knowledge Base.
  • You need to control sync behavior, bindings, and promotion rules in configuration instead of handling them manually.

Primary audience

  • Primary: Technical Engineers and platform teams
  • Secondary: Knowledge managers, AI operations teams, technical leaders

The problem

Manual ingestion creates three operational problems once a team scales beyond a handful of assets.

First, freshness becomes unreliable. External documentation changes, but the Knowledge Base does not update unless someone remembers to re-upload content.

Second, provenance gets fuzzy. Teams know an asset came from “some connector” or “some tool,” but not which declared source, which schedule, or which binding made it available at runtime.

Third, rollout quality becomes inconsistent. Some teams want every new version reviewed before activation. Others want low-risk sources to auto-promote after sync. Without a declarative model, those decisions become tribal knowledge and one-off scripts.

In short, manual upload works for authorship. It does not scale for ongoing source integration.

The solution

Keeptrusts supports a top-level knowledge_sources block in gateway configuration. Each source declares where the content comes from, how it syncs, what it binds to, and whether newly synced versions auto-promote.

The core source kinds are deliberately explicit:

  • connector_sync pulls content from an external connector and requires connector_ref.
  • tool_projection projects output from a tool server and requires tool_server_ref.
  • upload, manual, and learning_synthesis represent self-contained sources that do not take those references.

That matters because source behavior becomes readable in configuration. You do not have to reverse-engineer an ingestion script to understand how content enters the Knowledge Base.

Implementation

The simplest pattern is to declare one source for a scheduled connector sync and another for a tool-backed projection:

knowledge_sources:
  - id: api-docs-sync
    name: API Documentation
    source_kind: connector_sync
    connector_ref: confluence-main
    schedule:
      mode: periodic
      interval_minutes: 60
    binding:
      configuration_ids:
        - prod-gateway-config
    auto_promote: true
    labels:
      domain: engineering
      freshness: hourly

  - id: code-index-projection
    name: Code Search Index
    source_kind: tool_projection
    tool_server_ref: code-search
    schedule:
      mode: on_change
    binding:
      configuration_ids:
        - dev-gateway-config
    auto_promote: false
    labels:
      domain: engineering

This example shows two useful distinctions.

The connector-backed source is configured for steady refresh and auto-promotion because the team is comfortable with routine documentation updates becoming active quickly.

The tool-backed source is event-driven and does not auto-promote, which is appropriate when projected content is powerful but needs a review step before runtime use.

The validation rules are strict on purpose. A connector_sync source must have connector_ref. A tool_projection source must have tool_server_ref. A single source cannot declare both. That prevents ambiguous or half-valid definitions that are hard to reason about later.
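
To make those rules concrete, here is a minimal Python sketch of the same checks applied to a parsed source definition. It is not the Keeptrusts validator: the function name validate_source, the error wording, and the choice to reject unrecognized source kinds are assumptions made for illustration. Only the field names and the rules themselves come from this page.

```python
# Illustrative only: mirrors the rules described on this page,
# not the actual Keeptrusts validation code.
REQUIRED_REF = {
    "connector_sync": "connector_ref",
    "tool_projection": "tool_server_ref",
}
SELF_CONTAINED = {"upload", "manual", "learning_synthesis"}
REF_FIELDS = ("connector_ref", "tool_server_ref")


def validate_source(source: dict) -> list[str]:
    """Return a list of problems; an empty list means the source passes."""
    errors = []
    source_id = source.get("id", "<missing id>")
    kind = source.get("source_kind")
    # Reference fields that are present and non-empty on this source.
    declared = [field for field in REF_FIELDS if source.get(field)]

    # Assumption: kinds outside the ones listed on this page are rejected.
    if kind not in REQUIRED_REF and kind not in SELF_CONTAINED:
        errors.append(f"{source_id}: unknown source_kind {kind!r}")
        return errors

    if len(declared) == 2:
        errors.append(f"{source_id}: declares both connector_ref and tool_server_ref")

    if kind in REQUIRED_REF:
        required = REQUIRED_REF[kind]
        if required not in declared:
            errors.append(f"{source_id}: {kind} requires {required}")
    elif declared:
        # upload, manual, and learning_synthesis take neither reference.
        errors.append(f"{source_id}: {kind} does not take {', '.join(declared)}")

    return errors


if __name__ == "__main__":
    candidate = {"id": "code-index-projection", "source_kind": "tool_projection"}
    for problem in validate_source(candidate):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a reviewer see every issue in a definition at once, which tends to suit configuration review better than fail-fast behavior.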

For teams moving from manual uploads, the main design choice is not syntax. It is governance posture. Ask three questions for each source:

  1. How fresh does the content need to be?
  2. Which configurations should see it?
  3. Is auto-promotion safe for this source?

Low-risk internal references may be good candidates for periodic sync and auto-promotion. High-stakes regulatory or contractual content often deserves periodic sync with auto_promote: false, followed by an explicit review and activation step.
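
One way to keep those three answers visible during review is to print them side by side for each declared source. The Python sketch below assumes the source has already been parsed into a dictionary shaped like the configuration example above. The helper promotion_posture, the source id regulatory-policies-sync, and the connector name policy-repo are hypothetical and exist only for illustration.

```python
def promotion_posture(source: dict) -> str:
    """Summarize freshness, scope, and promotion for one declared source."""
    schedule = source.get("schedule", {})
    mode = schedule.get("mode", "unscheduled")
    if mode == "periodic":
        freshness = f"every {schedule.get('interval_minutes', '?')} min"
    else:
        freshness = mode

    configs = source.get("binding", {}).get("configuration_ids", [])
    scope = ", ".join(configs) or "unbound"

    if source.get("auto_promote"):
        promotion = "auto-promote"
    else:
        promotion = "review before activation"
    return f"{source['id']}: {freshness} -> {scope} ({promotion})"


# Hypothetical high-stakes source: periodic sync, but no auto-promotion.
regulatory = {
    "id": "regulatory-policies-sync",
    "source_kind": "connector_sync",
    "connector_ref": "policy-repo",
    "schedule": {"mode": "periodic", "interval_minutes": 1440},
    "binding": {"configuration_ids": ["prod-gateway-config"]},
    "auto_promote": False,
}

if __name__ == "__main__":
    print(promotion_posture(regulatory))
```

A one-line summary per source makes it easier to spot a high-stakes source that has drifted into auto-promotion without anyone deciding that it should.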

This model also works well with knowledge-source labeling. Even simple labels such as domain: engineering or freshness: hourly make discovery and future policy decisions easier. They do not replace governance controls, but they improve operational clarity.

Results and impact

External document connectivity changes knowledge grounding from a publishing task into a managed data flow.

Freshness improves because the sync schedule is declared instead of implied. Scope improves because the binding is written down in configuration. Change review improves because auto-promotion becomes an explicit choice rather than an accidental side effect.

Declarative sources also reduce the cost of scale. Teams stop duplicating import routines for every external source because the Knowledge Base layer already understands source kind, schedule, and binding behavior. That makes it easier to add one more document source without creating one more fragile pipeline.

Perhaps most importantly, it keeps governance visible. When a grounded answer depends on synchronized external content, operators can trace the path from source declaration to binding to runtime use instead of guessing which background job last touched the asset.
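
At the configuration level, that trace can be approximated with a simple reverse lookup from a configuration id to the sources bound to it. The following Python sketch works on parsed source definitions shaped like the example earlier on this page. The function sources_for_configuration and the record layout are assumptions, and the sketch covers only what is declared in configuration, not runtime sync history.

```python
# Parsed form of the two sources from the configuration example on this page.
SOURCES = [
    {
        "id": "api-docs-sync",
        "source_kind": "connector_sync",
        "connector_ref": "confluence-main",
        "schedule": {"mode": "periodic", "interval_minutes": 60},
        "binding": {"configuration_ids": ["prod-gateway-config"]},
        "auto_promote": True,
    },
    {
        "id": "code-index-projection",
        "source_kind": "tool_projection",
        "tool_server_ref": "code-search",
        "schedule": {"mode": "on_change"},
        "binding": {"configuration_ids": ["dev-gateway-config"]},
        "auto_promote": False,
    },
]


def sources_for_configuration(sources: list[dict], configuration_id: str) -> list[dict]:
    """Return a provenance record for every source bound to a configuration."""
    records = []
    for source in sources:
        bound = source.get("binding", {}).get("configuration_ids", [])
        if configuration_id in bound:
            records.append({
                "source_id": source["id"],
                "source_kind": source["source_kind"],
                # Whichever reference the source declares, if any.
                "origin": source.get("connector_ref") or source.get("tool_server_ref"),
                "schedule_mode": source.get("schedule", {}).get("mode"),
                "auto_promote": source.get("auto_promote", False),
            })
    return records


if __name__ == "__main__":
    for record in sources_for_configuration(SOURCES, "prod-gateway-config"):
        print(record)
```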

Key takeaways

  • Declarative knowledge_sources are the production-friendly way to connect external content into the Knowledge Base.
  • connector_sync and tool_projection are intentionally distinct because they represent different trust and freshness models.
  • schedule, binding, and auto_promote are the real governance controls for connected knowledge, not just convenience fields.
  • Validation rules around connector_ref and tool_server_ref keep source definitions explicit and safe.
  • External sync is most useful when paired with a deliberate promotion posture for each source.

Next steps