Insider Threat: Governing Against Internal AI Misuse

Insider misuse is harder than anonymous abuse because the actor already has context. They know which customer names matter, which project codenames are sensitive, which workflows are trusted, and which prompts will look ordinary to everyone else. That is why internal AI governance cannot rely on broad acceptable-use language alone. Keeptrusts helps by enforcing identity-scoped controls, request-boundary defense, redaction and blocking for sensitive data, and a decision trail that investigators can actually use later.

Use this page when

You are governing AI usage for employees, contractors, or internal service accounts rather than anonymous public traffic.
You need a policy model that distinguishes legitimate internal work from risky internal misuse.
You want to reduce both deliberate abuse and accidental exposure without shutting down useful internal AI workflows.

Primary audience

Primary: Security teams, technical engineers, and governance operators
Secondary: Technical leaders, compliance owners, and AI agents

The control map

The five most relevant reference pages for internal misuse scenarios are Rate Limits, Audit Logger, Prompt Injection Detection, Block Prompt Injection Attacks Before They Reach Your Models, and Prevent Sensitive Data Leaks in AI Requests.

The problem

Insider risk sits in the gap between authentication and intent. An employee may be fully authorized to use an AI tool and still use it in a way that violates policy.

That can look like deliberate misconduct. Someone tries to paste a private key into an assistant for debugging, summarize an unreleased customer export, or probe whether hidden instructions can be revealed from a governed workflow.

It can also look like convenience. A well-meaning engineer pastes production logs that contain PAN-like strings. A support agent pastes a customer thread with email addresses and payment references. A reviewer asks the model to reorganize an internal incident note into an external-ready summary without realizing what sensitive fields are embedded inside it.

In both cases, the actor is already inside the tent. They may be on a trusted network. They may use a valid session. They may know exactly how your organization names restricted projects or how your model prompts are structured. That means perimeter-only controls miss the point.

The right design question is not "Can this person log in?" It is "What can this person send through the AI gateway, how quickly, with what identity, and with what evidence left behind when something goes wrong?"

The solution

Keeptrusts answers that question at the request boundary.

Use Prompt Injection Detection to stop obvious instruction overrides, hidden-boundary tricks, and multi-turn attacks before they hit the provider. This matters for insiders because internal misuse is often wrapped in familiar language. The request may look like ordinary work right up until it tries to reveal hidden context or override task boundaries.
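To make the idea concrete, here is a minimal Python sketch of pattern-based detection with normalization and base64 decoding. It is not the Keeptrusts implementation. The regular expressions are copied from the example configuration on this page, while the function names, the base64 token heuristic, and the choice of NFKC normalization are assumptions made for illustration.

```python
import base64
import binascii
import re
import unicodedata

# Patterns copied from the example configuration on this page.
ATTACK_PATTERNS = [
    r"ignore.*previous.*instructions",
    r"reveal.*system.*prompt",
    r"dump.*all.*context",
]
COMPILED = [re.compile(p, re.IGNORECASE | re.DOTALL) for p in ATTACK_PATTERNS]

# Hypothetical heuristic: long base64-looking tokens are candidates for decoding.
BASE64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def candidate_texts(prompt: str) -> list[str]:
    """Return the normalized prompt plus any decodable base64 payloads in it."""
    texts = [unicodedata.normalize("NFKC", prompt)]
    for token in BASE64_TOKEN.findall(prompt):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError, ValueError):
            continue  # Not valid base64 text; ignore this token.
        texts.append(unicodedata.normalize("NFKC", decoded))
    return texts


def matched_patterns(prompt: str) -> list[str]:
    """List the attack patterns that match the prompt or a decoded payload."""
    hits = []
    for text in candidate_texts(prompt):
        for pattern, compiled in zip(ATTACK_PATTERNS, COMPILED):
            if compiled.search(text) and pattern not in hits:
                hits.append(pattern)
    return hits
```

The point of the sketch is that an override wrapped in base64 inside an otherwise ordinary request is still caught once the payload is decoded, which is the kind of familiar-looking internal request described above.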

Use PII Detector to redact or block sensitive identifiers before they leave the gateway. The documented implementation supports request-side enforcement plus buffered output redaction when the policy is present in the chain. That means an internal request can be sanitized or rejected before it becomes a provider-side exposure event.

Use rate limits to make abuse attributable and bounded. Internal misuse often happens through valid accounts. Per-user quotas are therefore more useful than generic perimeter throttles. They help security teams separate one risky actor from a general workload surge.
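As an illustration of why per-user quotas isolate one actor, the following Python sketch implements a simple in-memory sliding-window limiter keyed by user identity. It is only a sketch of the general technique, not how the gateway works internally. The class name, method names, and default values are hypothetical, with the defaults chosen to mirror the per-user limit of 12 requests and the 60-second window in the example configuration on this page.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-user sliding-window request limiter (illustrative, in-memory only)."""

    def __init__(self, rpm: int = 12, window_seconds: float = 60.0) -> None:
        self.rpm = rpm
        self.window_seconds = window_seconds
        # One queue of request timestamps per user identity.
        self._hits = defaultdict(deque)

    def allow(self, user_id: str, now: float) -> bool:
        """Record a request at time `now`; return True if it is within quota."""
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.rpm:
            return False  # Over quota: only this user is throttled.
        hits.append(now)
        return True
```

Because each identity has its own window, one account that exhausts its quota is throttled without affecting anyone else, and the rejected requests are attributable to that account.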

Then make the decision stream observable. Audit Logger does not itself enforce retention or immutability, but it marks audit logging as active in the chain so the resulting events, exports, and review workflows form a stable evidence trail.

Implementation

This configuration shows a strict internal-workload lane. The gateway enforces per-user fairness, blocks prompt-boundary attacks, blocks request-side sensitive data when custom patterns or PCI-like content are detected, and marks audit visibility as active.

pack:
  name: insider-risk-controls
  version: 1.0.0
  enabled: true

rate_limits:
  per_user:
    rpm: 12
    tpm: 40000
    max_parallel_requests: 3

  per_team:
    rpm: 160
    tpm: 500000
    max_parallel_requests: 30

user_rate_limit:
  header: "X-User-Id"
  strategy: sliding_window
  window_seconds: 60

policies:
  chain:
    - prompt-injection
    - pii-detector
    - audit-logger

policy:
  prompt-injection:
    use_embedding: true
    detection:
      embedding_threshold: 0.78
      attack_patterns:
        - "ignore.*previous.*instructions"
        - "reveal.*system.*prompt"
        - "dump.*all.*context"
    encoding:
      decode_base64: true
      normalize_unicode: true
      detect_homoglyphs: true
    boundaries:
      enforce_delimiters: true
      reject_fake_boundaries: true

  pii-detector:
    action: block
    healthcare_mode: false
    pci_mode: true
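    # Custom patterns: AWS access key IDs, GitHub personal access tokens,
    # and PEM private key headers.
    detect_patterns:
      - 'AKIA[0-9A-Z]{16}'
      - 'ghp_[0-9A-Za-z]{36}'
      - '-----BEGIN (RSA |EC )?PRIVATE KEY-----'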
    redaction:
      marker_format: label
      include_metadata: true
      preserve_length: false
      custom_markers: {}

  audit-logger: {}

There are two reasons to prefer action: block for high-risk internal lanes.

The first is clarity. If an employee pastes a private key or obvious credential material, you usually want a hard stop, not a partially sanitized guess that still leaves a risky workflow intact.

The second is investigation value. A hard block produces a cleaner event stream for triage because the system is making an explicit decision instead of silently normalizing questionable input.
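Before deploying custom patterns, it can help to check them against sample inputs. The Python sketch below applies the three detect_patterns expressions from the configuration above and returns a block-or-allow decision. The labels, the decide function, and the decision strings are hypothetical names for this illustration, and the sketch says nothing about how the gateway evaluates patterns internally.

```python
import re

# Regular expressions copied from detect_patterns in the configuration above.
# The dictionary keys are hypothetical labels, not Keeptrusts reason codes.
DETECT_PATTERNS = {
    "aws_access_key_id": r"AKIA[0-9A-Z]{16}",
    "github_pat": r"ghp_[0-9A-Za-z]{36}",
    "private_key_header": r"-----BEGIN (RSA |EC )?PRIVATE KEY-----",
}
COMPILED = {label: re.compile(rx) for label, rx in DETECT_PATTERNS.items()}


def decide(text: str) -> tuple[str, list[str]]:
    """Return ("block", labels) if any pattern matches, else ("allow", [])."""
    labels = [label for label, rx in COMPILED.items() if rx.search(text)]
    return ("block", labels) if labels else ("allow", [])
```

One thing this kind of check surfaces: the private key pattern as written matches the RSA, EC, and unlabeled PKCS#8 headers, but not a header such as `-----BEGIN OPENSSH PRIVATE KEY-----`. If that format appears in your environment, the pattern would need to be extended.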

For triage, pair the policy with CLI evidence collection:

kt events tail --since 24h --verdict blocked --json
kt escalation list

Even if you are not routing every internal misuse case to a human reviewer, the event stream still matters. You want to know whether one identity is repeatedly triggering the same reason code or whether a team workflow needs better training and safer defaults.
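A small script can make that distinction mechanical. The Python sketch below groups blocked events by identity and reason. It assumes the events arrive as one JSON object per line and that each object carries `user_id` and `reason_code` fields. Those field names, the line-delimited format, and both thresholds are assumptions for illustration, so adjust them to whatever your event export actually contains.

```python
import json
from collections import Counter, defaultdict


def summarize(json_lines, repeat_threshold=3, spread_threshold=3):
    """Split blocked events into repeat offenders and widespread reasons.

    Field names `user_id` and `reason_code` are hypothetical.
    """
    per_identity = Counter()
    users_by_reason = defaultdict(set)
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        user = event.get("user_id", "unknown")
        reason = event.get("reason_code", "unknown")
        per_identity[(user, reason)] += 1
        users_by_reason[reason].add(user)

    # One identity hitting the same reason repeatedly: candidate for review.
    repeat_identities = sorted(
        key for key, count in per_identity.items() if count >= repeat_threshold
    )
    # Many identities hitting the same reason: candidate workflow problem.
    widespread_reasons = sorted(
        reason
        for reason, users in users_by_reason.items()
        if len(users) >= spread_threshold
    )
    return repeat_identities, widespread_reasons
```

The two return values map to the two follow-ups: repeat identities go to an investigator, and widespread reasons go to whoever owns training and defaults for the affected workflow.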

Results and impact

The first effect is that internal AI misuse becomes much harder to hide inside ordinary work. The same valid identity that authorizes the request also becomes the unit of enforcement and investigation.

The second effect is fewer accidental leaks. Teams stop relying on users to notice every PAN-like string, every private key fragment, or every injected instruction buried inside copied context.

The third effect is better separation between malicious behavior and bad process. If a cluster of employees all trigger the same block reason, you may have a workflow problem. If one identity repeatedly probes the same boundary, you may have an insider threat problem. The gateway gives you enough structure to tell the difference.

Governance is strongest here when it is explicit. Internal AI usage is valuable. That is exactly why it needs controls that assume good people sometimes make bad decisions and bad actors sometimes look ordinary.

Key takeaways

Insider risk is not a login problem. It is a request-governance and evidence problem.
Prompt Injection Detection matters for internal traffic because trusted users can still submit adversarial inputs.
PII Detector is the practical control for stopping or sanitizing sensitive identifiers before provider exposure.
Rate Limits make internal misuse attributable and bounded at the user scope.
Audit Logger keeps the audit trail explicit even though storage and retention live elsewhere in the platform.
