Agent-Specific Cache Optimization

Different AI agent types have different caching needs. An explanation agent benefits from aggressive semantic caching because similar questions yield similar answers. A code generation agent needs fresh fabric context but rarely benefits from semantic cache. You configure cache behavior per agent type to maximize both effectiveness and cost savings.

Use this page when

  • You are configuring different cache strategies for explanation, code generation, security, or testing agents.
  • You need to tune semantic similarity thresholds and fabric freshness per agent type.
  • You want to route agent traffic to different cache profiles using gateway policies.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Agent Type Categories

Engineering teams typically use four categories of AI agents:

Agent Type        Primary Task                         Cache Priority
Explanation       Code understanding, documentation    Semantic cache (aggressive)
Code generation   Writing new code, refactoring        Fabric (high), semantic (low)
Security review   Vulnerability analysis, compliance   Private cache, fresh context
Testing           Test generation, coverage analysis   Test maps (high), semantic (medium)

Explanation Agents

Explanation agents answer questions like "What does this module do?" and "How does this authentication flow work?" These queries are highly cacheable because:

  • Multiple engineers ask similar questions about the same code.
  • Explanations remain valid until the underlying code changes.
  • Slight variations in phrasing should return the same explanation.

Configure explanation agents with aggressive semantic caching:

agents:
  explanation:
    cache:
      semantic:
        enabled: true
        similarity_threshold: 0.88
        ttl: 168h # 7 days
      fabric:
        sources:
          - code_summary
          - architecture_map
          - dependency_graph

The lower similarity threshold (0.88 vs. the typical 0.92) allows more query variations to hit existing cache entries. This works because explanation accuracy tolerates minor context differences.
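To make the threshold's effect concrete, here is a minimal sketch of how a semantic cache might gate reuse on similarity. The embedding vectors, cache shape, and function names are illustrative assumptions, not Keeptrusts internals:

```python
# Sketch: a similarity threshold decides whether a cached answer is
# reused. Embeddings and cache layout are illustrative assumptions.

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def semantic_lookup(query_vec, cache, threshold):
    """Return a cached answer only if the closest entry clears the
    agent's similarity threshold; otherwise treat it as a miss."""
    best = max(cache, key=lambda e: cosine_similarity(query_vec, e["vec"]), default=None)
    if best and cosine_similarity(query_vec, best["vec"]) >= threshold:
        return best["answer"]
    return None  # miss: fall through to the provider

# A rephrased query (~0.95 similarity to the cached entry) hits at the
# explanation threshold (0.88) but misses at a stricter 0.97 threshold.
cache = [{"vec": [1.0, 0.0], "answer": "This module parses the config."}]
query = [0.95, 0.31]
hit = semantic_lookup(query, cache, threshold=0.88)
miss = semantic_lookup(query, cache, threshold=0.97)
```

The same query embedding can therefore be a hit for one agent type and a miss for another, which is exactly what per-agent thresholds buy you.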

Code Generation Agents

Code generation agents produce new code, refactor existing code, or complete partial implementations. They need precise, current context but rarely benefit from semantic caching because:

  • Generated code must reflect the exact current state of the codebase.
  • Even slightly different generation prompts should produce different code.
  • Stale context leads to code that conflicts with recent changes.

Configure code generation agents with fabric-first, low semantic cache:

agents:
  code_generation:
    cache:
      semantic:
        enabled: true
        similarity_threshold: 0.97 # Very high threshold
        ttl: 4h # Short TTL
      fabric:
        sources:
          - code_summary
          - dependency_graph
          - coding_patterns
        freshness: strict
        max_age: 1h

The strict freshness requirement ensures generated code reflects the latest codebase state. The high semantic threshold means only nearly identical requests reuse cached responses.
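The strict freshness rule can be sketched as an age check: entries older than max_age are refetched rather than served. The entry shape and refetch callback below are illustrative assumptions; the real gateway enforces this server-side:

```python
# Sketch: strict fabric freshness (max_age: 1h). Entry shape and the
# refetch callback are illustrative assumptions.

MAX_AGE_SECONDS = 3600  # max_age: 1h

def is_fresh(fetched_at, now, max_age=MAX_AGE_SECONDS):
    """True if a fabric entry is still within its freshness window."""
    return (now - fetched_at) <= max_age

def get_fabric(entry, refetch, now):
    """Serve the cached entry only while fresh; otherwise refetch."""
    if is_fresh(entry["fetched_at"], now):
        return entry["data"]
    entry["data"] = refetch()
    entry["fetched_at"] = now
    return entry["data"]

entry = {"fetched_at": 0, "data": "dependency graph v1"}
served = get_fabric(entry, refetch=lambda: "dependency graph v2", now=1800)    # 30m old: cached
refetched = get_fabric(entry, refetch=lambda: "dependency graph v2", now=7200) # 2h old: refetch
```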

Security Review Agents

Security review agents analyze code for vulnerabilities, check compliance, and assess risk. They have unique caching requirements:

  • Security findings must not leak between teams or repositories.
  • Analysis must use the most current code — stale results create false confidence.
  • Certain security context should never persist in shared cache.

Configure security agents with private, fresh cache:

agents:
  security_review:
    cache:
      semantic:
        enabled: false # No semantic cache for security
      fabric:
        sources:
          - code_summary
          - dependency_graph
        freshness: strict
        max_age: 30m
      scope: private # Never share security results
      sensitive_output:
        persist: false
        log_level: audit_only

Disabling semantic cache prevents security findings from being served to other queries. Private scope ensures security results never enter the org-shared cache.
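One way to picture private scope is as key isolation: a private-scope cache key mixes in the owning team and repository, so no other team can ever compute the same key. The key layout below is an illustrative assumption, not the real scheme:

```python
import hashlib

# Sketch: scope isolation via cache-key derivation. The key layout is
# an illustrative assumption, not Keeptrusts's actual scheme.

def cache_key(prompt, scope, team="", repo=""):
    parts = [prompt]
    if scope == "private":
        parts += [team, repo]  # owner identity becomes part of the key
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

# Org scope: identical prompts share one entry across teams.
shared_a = cache_key("review auth flow", "org")
shared_b = cache_key("review auth flow", "org")

# Private scope: the same prompt from different teams never collides.
private_a = cache_key("review auth flow", "private", team="payments", repo="api")
private_b = cache_key("review auth flow", "private", team="growth", repo="web")
```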

Testing Agents

Testing agents generate tests, analyze coverage gaps, and suggest test improvements. They benefit heavily from test-specific fabric:

  • Test maps identify which tests cover which code paths.
  • Test patterns show framework-specific conventions used in the repository.
  • Coverage analysis needs current test execution data.

Configure testing agents with test-focused caching:

agents:
  testing:
    cache:
      semantic:
        enabled: true
        similarity_threshold: 0.92
        ttl: 48h
      fabric:
        sources:
          - test_map
          - test_patterns
          - code_summary
          - dependency_graph
        freshness: standard
        max_age: 4h

Test maps change less frequently than source code, so testing agents tolerate longer cache TTLs for test-specific fabric while keeping source context fresh.
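That tiering can be sketched as per-source freshness windows. The per-source ages below are illustrative assumptions (the config above sets only a single fabric-wide max_age of 4h):

```python
# Sketch: per-source freshness tiering for the testing agent. The
# test-fabric ages are assumptions; only the 4h source-context window
# comes from the config above.

TEST_FABRIC_MAX_AGE = {
    "test_map": 24 * 3600,       # test maps change rarely (assumed window)
    "test_patterns": 24 * 3600,  # framework conventions are stable (assumed window)
    "code_summary": 4 * 3600,    # source context: max_age: 4h
    "dependency_graph": 4 * 3600,
}

def needs_refresh(source, age_seconds):
    """True if a fabric source has outlived its freshness window."""
    return age_seconds > TEST_FABRIC_MAX_AGE[source]

five_hours = 5 * 3600
stale_source = needs_refresh("code_summary", five_hours)  # source context is stale
usable_tests = needs_refresh("test_map", five_hours)      # test map still usable
```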

Configuring Agent-Specific Policies

You map agent types to cache configurations through gateway policies:

gateway:
  policies:
    - name: explanation-cache-routing
      match:
        headers:
          x-agent-type: explanation
      cache:
        profile: explanation-aggressive
    - name: codegen-cache-routing
      match:
        headers:
          x-agent-type: code-gen
      cache:
        profile: codegen-fabric-first
    - name: security-cache-routing
      match:
        headers:
          x-agent-type: security
      cache:
        profile: security-private
    - name: testing-cache-routing
      match:
        headers:
          x-agent-type: testing
      cache:
        profile: testing-focused

Your development tools set the x-agent-type header based on the task being performed. The gateway routes each request to the appropriate cache profile.
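Client-side, that tagging can be as simple as attaching the header to every request. The helper below is an illustrative sketch; the token handling and payload shape are assumptions:

```python
# Sketch: tagging requests with x-agent-type so the gateway can select
# the matching cache profile. Token handling is an illustrative assumption.

AGENT_TYPES = {"explanation", "code-gen", "security", "testing"}

def gateway_headers(agent_type, token="<api-token>"):
    """Build gateway request headers, tagged with the agent type."""
    if agent_type not in AGENT_TYPES:
        raise ValueError(f"unknown agent type: {agent_type!r}")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "x-agent-type": agent_type,  # matched by gateway.policies[].match.headers
    }

headers = gateway_headers("security")
```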

Measuring Per-Agent Effectiveness

Track cache metrics per agent type:

  • Explanation agents — Target 60%+ semantic hit rate. Low hit rates indicate either too-high similarity thresholds or rapidly changing code.
  • Code generation agents — Target 90%+ fabric hit rate. Low fabric hits indicate warming gaps.
  • Security agents — Track freshness compliance. Any stale cache hit in security is a configuration error.
  • Testing agents — Monitor test map freshness vs. test generation accuracy.
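Per-agent hit rates like those above can be computed from request logs. The log record shape below is an illustrative assumption:

```python
from collections import defaultdict

# Sketch: per-agent cache hit rates from request logs, for comparison
# against the targets above. Record shape is an illustrative assumption.

def hit_rates(records):
    """Map each agent type to its cache hit rate."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        totals[r["agent"]] += 1
        hits[r["agent"]] += r["cache_hit"]  # True counts as 1
    return {agent: hits[agent] / totals[agent] for agent in totals}

logs = [
    {"agent": "explanation", "cache_hit": True},
    {"agent": "explanation", "cache_hit": True},
    {"agent": "explanation", "cache_hit": False},
    {"agent": "code-gen", "cache_hit": False},
]
rates = hit_rates(logs)
```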

Next steps

  • Review per-agent cache metrics in the Keeptrusts console dashboard.
  • Tune similarity thresholds for explanation agents based on observed hit rates.
  • Confirm security agents produce zero shared cache entries by checking scope isolation.
  • Enable cost allocation labels per agent type to track savings independently.

For AI systems

  • Canonical terms: Keeptrusts engineering cache, agent-specific cache optimization, cache profiles, agent types, semantic similarity threshold, fabric freshness, x-agent-type header.
  • Feature/config names: agents.explanation.cache, agents.code_generation.cache, agents.security_review.cache, agents.testing.cache, gateway.policies[].match.headers.x-agent-type, similarity_threshold, scope: private, freshness: strict.
  • Best next pages: Benchmarking Cache Performance, Cache-First Culture, Pre-Dispatch Cost Estimates.

For engineers

  • Prerequisites: A running Keeptrusts gateway with cache enabled; agent tooling that sets the x-agent-type HTTP header per task.
  • Validation: After deploying per-agent cache profiles, send test requests with each x-agent-type value and confirm x-keeptrusts-cache response headers reflect expected hit/miss behavior.
  • Commands: Use kt cache stats --group-by agent_type to view per-agent hit rates and cost avoidance.
  • Troubleshooting: If security agent results appear in shared cache, verify scope: private is set and semantic.enabled: false.

For leaders

  • Per-agent cache tuning lets you reduce spend for high-volume explanation agents (60%+ hit rate target) while preserving freshness guarantees for security-sensitive workloads.
  • Security agents with scope: private ensure compliance findings never leak to other teams or cache consumers.
  • Cost allocation by agent type provides per-category visibility for budget planning and ROI reporting.
  • Rollout: Start with explanation agents (highest reuse), then testing agents, then code generation — security agents should always use private scope.

Cost Allocation by Agent Type

You see cost breakdowns per agent type in the Keeptrusts dashboard:

  • How much each agent type spends on provider calls
  • How much each agent type saves through cache hits
  • Which agent type has the highest cost per useful response

Use this data to tune thresholds and TTLs for maximum cost efficiency.