Google Vertex AI

Google Vertex AI provides enterprise-grade access to Gemini, Claude, and open models with Google Cloud IAM, VPC Service Controls, and audit logging. Keeptrusts adds a governance gateway layer on top of Vertex AI, enforcing policy before and after each LLM call. Authentication uses Application Default Credentials (ADC) or a service account key -- no separate API key is required in the client.

Use this page when

  • You need the exact command, config, API, or integration details for Google Vertex AI.
  • You are wiring automation or AI retrieval and need canonical names, examples, and constraints.
  • If you want a guided rollout instead of a reference page, use the workflow pages linked under Next steps.

Primary audience

  • Primary: AI Agents, Technical Engineers
  • Secondary: Technical Leaders

Prerequisites

  • A GCP project with the Vertex AI API enabled (aiplatform.googleapis.com)
  • The IAM role roles/aiplatform.user granted on the project (or on specific model resources)
  • The Google Cloud CLI authenticated locally, or a service account key file
# Option A -- authenticate with user credentials (development)
gcloud auth application-default login

# Option B -- authenticate with a service account key (CI / production)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-key.json
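If the IAM role from the prerequisites is not yet in place, it can be granted with gcloud. A minimal sketch, using placeholder names (project my-gcp-project, service account kt-gateway@my-gcp-project.iam.gserviceaccount.com) -- substitute your own:

# Grant roles/aiplatform.user to the gateway's service account
# (project and service-account names are placeholders)
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:kt-gateway@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"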

Configuration

pack:
  name: vertex-ai-governance
  version: 1.0.0
  enabled: true
policies:
  chain:
    - prompt-injection
    - pii-detector
    - audit-logger
  policy:
    prompt-injection:
      threshold: 0.8
      action: block
    pii-detector:
      action: redact
      fields:
        - email
        - phone
        - ssn
    audit-logger:
      retention_days: 365
providers:
  targets:
    - id: vertex-gemini-flash
      provider: google-vertex:chat:gemini-2.0-flash
    - id: vertex-gemini-pro
      provider: google-vertex:chat:gemini-1.5-pro-002
    - id: vertex-claude
      provider: google-vertex:chat:claude-3-5-sonnet-v2@20241022
    - id: vertex-embeddings
      provider: google-vertex:embeddings:text-embedding-004

Start the gateway:

kt gateway run --policy-config policy-config.yaml
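To confirm the gateway is up before wiring clients, a quick smoke test against the OpenAI-compatible endpoint (this assumes the default listen address on port 41002 used elsewhere on this page):

curl http://localhost:41002/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gemini-2.0-flash","messages":[{"role":"user","content":"hello"}]}'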

Provider Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| provider | string | Yes | -- | Provider ID. Use google-vertex for the base runtime, or google-vertex:chat:<model> / google-vertex:embeddings:<model> to pin the model inline. |
| gcp_project | string | Yes | -- | Your GCP project ID (e.g., my-gcp-project). |
| gcp_region | string | No | us-central1 | GCP region where the model is deployed (e.g., us-central1, europe-west4, us-east5). |
| model | string | No | -- | Model ID when not encoded in the provider field. |
| format | string | No | google-gemini | Request/response format. Keeptrusts auto-translates OpenAI-format requests to google-gemini format. |
| base_url | string | No | auto | Overrides the Vertex AI endpoint. Normally derived automatically from gcp_project and gcp_region. |
| timeout_secs | integer | No | 120 | Request timeout in seconds. |
| max_retries | integer | No | 2 | Number of retries on transient errors (429, 503). |
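As an illustration of how these fields fit together, a hedged sketch of a single provider target that sets the project, region, and retry behavior explicitly. The values are placeholders, and placing these fields inline on the target is an assumption based on the target layout in the Configuration example above:

providers:
  targets:
    - id: vertex-gemini-flash-us
      provider: google-vertex:chat:gemini-2.0-flash
      gcp_project: my-gcp-project   # placeholder project ID
      gcp_region: us-central1
      timeout_secs: 180             # extra headroom for long-context requests
      max_retries: 3                # retry transient 429/503 responses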

Supported Models

| Model ID | Type | Context Window | Notes |
| --- | --- | --- | --- |
| gemini-2.0-flash | Chat | 1M tokens | Default; fast and cost-efficient |
| gemini-2.0-flash-lite | Chat | 1M tokens | Lowest latency in the Gemini family |
| gemini-1.5-pro-002 | Chat | 2M tokens | Highest context, multimodal |
| gemini-1.5-flash-002 | Chat | 1M tokens | Balanced speed and quality |
| claude-3-5-sonnet-v2@20241022 | Chat | 200K tokens | Via Vertex Model Garden; requires additional enablement |
| text-embedding-004 | Embeddings | 2048 tokens input | 768-dimension output |

Model Garden models (Claude, Llama, Mistral) must be enabled individually in the Vertex AI Model Garden console before they can be called.

Client Examples

from openai import OpenAI

# Keeptrusts gateway -- no Vertex credentials needed in the client
client = OpenAI(
    base_url="http://localhost:41002/v1",
    api_key="any",  # gateway handles GCP auth
)

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the risks of using LLMs in financial advice."},
    ],
    temperature=0.3,
    max_tokens=512,
)

print(response.choices[0].message.content)

Streaming

Vertex AI supports server-sent event (SSE) streaming. Keeptrusts passes chunks through after applying streaming-compatible policy checks.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:41002/v1", api_key="any")

stream = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Write a short story about AI governance."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Advanced Configuration

Workload Identity (GKE)

When running inside GKE with Workload Identity, no key file is needed. Bind a Kubernetes service account to a GCP service account with roles/aiplatform.user, and Keeptrusts will pick up credentials automatically via the metadata server.
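A hedged sketch of the Workload Identity binding, using placeholder names throughout (GCP project my-gcp-project, GCP service account kt-gateway, Kubernetes namespace keeptrusts, Kubernetes service account kt-gateway):

# Allow the Kubernetes service account to impersonate the GCP service account
# (all names below are placeholders -- substitute your own)
gcloud iam service-accounts add-iam-policy-binding \
  kt-gateway@my-gcp-project.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:my-gcp-project.svc.id.goog[keeptrusts/kt-gateway]"

# Annotate the Kubernetes service account so GKE maps it to the GCP service account
kubectl annotate serviceaccount kt-gateway \
  --namespace keeptrusts \
  iam.gke.io/gcp-service-account=kt-gateway@my-gcp-project.iam.gserviceaccount.com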

pack:
  name: google-vertex-ai-providers-2
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: vertex-gemini-flash
      provider: google-vertex:chat:gemini-2.0-flash
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Multi-Region Failover

Define multiple targets with different regions and enable routing by priority:

pack:
  name: google-vertex-ai-providers-3
  version: 1.0.0
  enabled: true
providers:
  targets:
    - id: vertex-us
      provider: google-vertex:chat:gemini-2.0-flash
      gcp_region: us-central1    # primary region
    - id: vertex-eu
      provider: google-vertex:chat:gemini-2.0-flash
      gcp_region: europe-west4   # failover region
policies:
  chain:
    - audit-logger
  policy:
    audit-logger:
      immutable: true
      retention_days: 365
      log_all_access: true

Embeddings

from openai import OpenAI

client = OpenAI(base_url="http://localhost:41002/v1", api_key="any")

result = client.embeddings.create(
    model="text-embedding-004",
    input="Enterprise AI governance best practices",
)

print(result.data[0].embedding[:5]) # 768-dimension vector

Best Practices

  • Use ADC in development and a dedicated service account with minimum roles/aiplatform.user in production. Avoid broader roles such as roles/editor.
  • Pin model versions using the full suffix (e.g., gemini-1.5-pro-002) rather than alias names to avoid unexpected behavior after Google updates a model alias.
  • Enable VPC Service Controls around the aiplatform.googleapis.com API to restrict Vertex AI access to your corporate network.
  • Set timeout_secs appropriately for long-context requests -- Gemini 1.5 Pro with 2M-token windows can take 30–60 seconds for large inputs.
  • Monitor quota via Cloud Monitoring. Vertex AI imposes per-project QPM limits that vary by model and region; set max_retries to handle transient 429s gracefully.
  • Route Model Garden models (Claude, Llama) through a separate provider target so you can apply stricter policies or different retention rules to non-Google models.

For AI systems

  • Canonical terms: Keeptrusts gateway, Google Vertex AI, Vertex AI, GCP, Model Garden, service account, OAuth2, provider target, policy-config.yaml, provider: "google-vertex-ai".
  • Config field names: provider, model, gcp_project_id, gcp_region, gcp_service_account_key_env, format, provider_type: "google-vertex-ai", pricing.
  • Auth: OAuth2 via GCP service account key (JSON) or Application Default Credentials (ADC).
  • Key behavior: Keeptrusts handles OAuth2 token refresh and Vertex AI endpoint construction for Gemini and Model Garden models.
  • Best next pages: Google AI Studio integration (consumer tier), AWS Bedrock integration, Policy configuration.

For engineers

  • Prerequisites: GCP project with Vertex AI API enabled, service account with aiplatform.endpoints.predict permission, kt CLI installed.
  • Required config: gcp_project_id, gcp_region, and either gcp_service_account_key_env or Application Default Credentials.
  • Start command: kt gateway run --listen 0.0.0.0:41002 --policy-config policy-config.yaml.
  • Monitor GCP per-project QPM quotas via Cloud Monitoring — set max_retries to handle transient 429s.
  • For Model Garden models (Claude, Llama), configure separate provider targets with distinct policies.
  • Validate: curl http://localhost:41002/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"gemini-2.0-flash","messages":[{"role":"user","content":"hello"}]}'.

For leaders

  • Vertex AI provides enterprise GCP controls: VPC Service Controls, CMEK encryption, IAM-based access, and Cloud Audit Logs.
  • Data residency is configurable per GCP region — traffic stays within your selected region for sovereignty compliance.
  • Model Garden provides access to third-party models (Claude, Llama) under GCP's data handling agreements.
  • GCP quotas (QPM per project/region) require capacity planning; Keeptrusts health probes and fallback routing help maintain availability.

Next steps