Shadow AI Data: Discovering Ungoverned Data Flows in Your Organization
Shadow AI Data: Discovering Ungoverned Data Flows in Your Organization
You cannot govern what you cannot see, and you cannot see shadow AI by hoping people will self-report it. Keeptrusts will not magically detect traffic that never reaches the gateway, but it gives you something just as useful: a clean baseline of sanctioned AI traffic. Once every approved tool, team, and surface has a known route, key boundary, and event trail, anything missing from that governed baseline becomes much easier to spot and close.
Use this page when
- You are trying to discover which AI tools and integrations in your organization are still outside the governed path.
- You need a practical way to inventory sanctioned AI traffic by team, application surface, or gateway route.
- You want to reduce shadow AI by making the governed path explicit and measurable.
Primary audience
- Primary: Technical Leaders
- Secondary: Technical Engineers, AI Agents
The problem
Shadow AI is not one tool. It is a pattern.
Developers install AI IDE assistants directly against provider APIs. Ops teams wire one-off summarizers into internal bots. Analysts build small scripts with personal provider keys because they cannot wait for platform onboarding. None of these flows show up in the same place, and most of them produce no reusable audit trail.
That creates two risks at once. The first is obvious: sensitive data may be leaving the organization through ungoverned paths. The second is subtler: sanctioned traffic becomes hard to distinguish from unsanctioned traffic because no standard identity, key, or route model exists.
Many teams respond by trying to detect shadow AI entirely with policy blocks. That is not enough. If the traffic never reaches Keeptrusts, no policy can inspect it. The discovery strategy has to start by making sanctioned traffic easy to enumerate and hard to confuse.
The solution
Start by formalizing the governed path.
Use Routes and Consumer Groups to give every sanctioned tool or integration surface its own identity boundary. Use RBAC to require organization, user, and role headers so governed requests carry enough context to be attributable. Keep kt events active so the sanctioned stream is exportable and analyzable.
This is where Centralize AI Observability Across All Teams and the shadow-AI guidance in CIO AI Strategy become practical. The point is not to say "everything is governed now." The point is to create a high-confidence map of the traffic that is governed so the missing surfaces stand out.
Implementation
This configuration creates separate consumer-group identities for sanctioned surfaces and requires request identity on every governed call.
pack:
name: sanctioned-ai-baseline
version: "1.0.0"
enabled: true
providers:
targets:
- id: openai-governed
provider: openai
model: gpt-5.4-mini-mini
secret_key_ref:
env: OPENAI_GOVERNED_KEY
consumer_groups:
key_header: Authorization
groups:
- name: vscode-engineering
api_keys:
- "sha256:bdd37f7c744c3123f79384bd8f6e8e73d25416f0f8fbe90a6d26d55c89f1517c"
rate_limit:
max_requests: 5000
max_tokens: 2000000
window_seconds: 3600
upstream: openai-governed
chain:
- rbac
- audit-logger
- name: chatops-support
api_keys:
- "sha256:13f2c9f0d09f6afc1e45db2ef0db62d6a0ea46d7d5f0a8d25e6013e6f56f9c7c"
rate_limit:
max_requests: 1200
max_tokens: 400000
window_seconds: 3600
upstream: openai-governed
chain:
- rbac
- audit-logger
routes:
- name: ide-chat
path: "/v1/chat/completions"
headers:
X-Client-Surface: ide
upstream: openai-governed
priority: 20
- name: chatops-chat
path: "/v1/chat/completions"
headers:
X-Client-Surface: chatops
upstream: openai-governed
priority: 20
policy:
rbac:
deny_if_missing:
- X-User-ID
- X-Org-ID
- X-User-Role
require_auth: true
audit-logger: {}
This does not detect shadow AI by itself. What it does is create a trustworthy inventory of sanctioned AI usage.
Once approved surfaces have unique consumer-group names, rate limits, route matches, and required headers, you can export governed events and compare them with your known application inventory. If a team says it relies on an AI IDE assistant but there are no corresponding sanctioned events for that surface, that is a discovery lead. If an approved route exists but many requests arrive without the required headers, that is another lead. If a provider invoice is growing while the governed event stream for that team is flat, that is yet another lead.
Use the governed stream to create a baseline review artifact:
kt events export --since 30d --format json --output governed-ai-baseline.json
kt events tail --since 24h --json
Those commands give you a recent view and a time-bounded artifact. They will not expose traffic that bypassed the gateway, but they make the sanctioned footprint clear enough that missing or inconsistent coverage becomes obvious.
The practical lesson is simple: shadow AI discovery is mostly a gap-analysis problem. Keeptrusts helps by making the approved path explicit, attributable, and exportable.
Results and impact
Organizations that do this well usually discover two things quickly.
First, sanctioned AI usage is narrower than they thought, which means shadow usage was filling more of the demand curve than anyone admitted. Second, the lack of a standard key and identity model was making it impossible to compare one team's usage with another's.
Once the governed baseline exists, onboarding becomes easier too. Teams no longer need a custom exception for each surface. They receive a consumer-group identity, a route, and an event trail.
Key takeaways
- Keeptrusts cannot inspect traffic that never reaches the gateway, so discovery starts by formalizing sanctioned traffic.
- Use Routes and Consumer Groups and RBAC to make governed usage attributable.
- Use kt events exports to build a baseline of approved AI traffic.
- Treat missing or inconsistent sanctioned events as discovery leads, not as proof that nothing is happening.
- The easiest way to reduce shadow AI is to make the governed path standard and measurable.