Google Vertex AI Integration
Full observability and governance for Google Vertex AI and Gemini models. Track token usage, latency, and costs, and ensure quality across your Vertex AI deployments.
📦 turingpulse ≥ 0.5.0 · ✓ Vertex AI SDK ≥ 1.38.0 · ✓ Gemini Pro / Ultra / Flash
Installation
Install TuringPulse with Vertex AI support:
terminal
pip install turingpulse[vertexai]
Or install with all integrations:
terminal
pip install turingpulse[all]
Quick Start
Instrument Vertex AI with a single function call:
main.py
import vertexai
from vertexai.generative_models import GenerativeModel
from turingpulse import init
from turingpulse.integrations.vertexai import instrument_vertexai
# Initialize TuringPulse
init(
api_key="sk_live_...",
project_id="my-project"
)
# Instrument Vertex AI - this wraps all Gemini calls
instrument_vertexai()
# Initialize Vertex AI
vertexai.init(project="your-gcp-project", location="us-central1")
# Your code works exactly the same - now with full tracing!
model = GenerativeModel("gemini-1.5-pro")
response = model.generate_content("Explain quantum computing in simple terms")
print(response.text)
Configuration Options
Customize the instrumentation behavior:
config.py
from turingpulse.integrations.vertexai import instrument_vertexai
instrument_vertexai(
# Capture full prompts and responses (default: True)
capture_content=True,
# Track token counts and costs (default: True)
capture_tokens=True,
# Add custom metadata to all traces
default_metadata={
"environment": "production",
"team": "ml-platform"
},
# Filter which models to trace (default: all)
model_filter=lambda model: "gemini" in model.lower(),
)
Supported Features
Models
- Gemini 1.5 Pro
- Gemini 1.5 Flash
- Gemini 1.0 Pro
- Gemini Pro Vision
- PaLM 2 (text-bison)
- Codey (code-bison)
Capabilities
- Text generation
- Multi-modal (images)
- Streaming responses
- Function calling
- Chat sessions
- Embeddings (see the sketch after these lists)
Tracked Metrics
- Token usage (in/out)
- Latency (time to first token, total)
- Cost estimation
- Safety ratings
- Finish reasons
- Error rates
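Embeddings appear in the capabilities list above; the sketch below shows what a traced embedding call could look like. It assumes the wrapper also covers vertexai.language_models.TextEmbeddingModel, which is an assumption rather than confirmed behavior, and text-embedding-004 is just an example model.
embeddings.py
from vertexai.language_models import TextEmbeddingModel
# Assumes init() and instrument_vertexai() were already called
model = TextEmbeddingModel.from_pretrained("text-embedding-004")
# If embedding calls are wrapped, each request shows up as a trace
# with input token counts and latency
embeddings = model.get_embeddings(["Explain quantum computing"])
print(len(embeddings[0].values))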
Streaming Support
TuringPulse automatically tracks streaming responses:
streaming.py
from vertexai.generative_models import GenerativeModel
model = GenerativeModel("gemini-1.5-flash")
# Streaming is automatically instrumented
response = model.generate_content(
"Write a haiku about AI",
stream=True
)
for chunk in response:
print(chunk.text, end="", flush=True)
# Full trace is captured including:
# - Time to first token
# - Total streaming duration
# - Complete token counts
Multi-modal Tracing
Track vision model interactions with images:
vision.py
from vertexai.generative_models import GenerativeModel, Part
model = GenerativeModel("gemini-1.5-pro")
# Image input is tracked (image content stored as reference)
response = model.generate_content([
Part.from_uri("gs://your-bucket/image.jpg", mime_type="image/jpeg"),
"Describe what you see in this image"
])
# Trace captures:
# - Image URI reference
# - Text prompt
# - Model response
# - Multi-modal token counts
Function Calling
Instrument function calls and tool use:
functions.py
from vertexai.generative_models import GenerativeModel, FunctionDeclaration, Tool
# Define your functions
get_weather = FunctionDeclaration(
name="get_weather",
description="Get the current weather for a location",
parameters={
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"}
},
"required": ["location"]
}
)
weather_tool = Tool(function_declarations=[get_weather])
model = GenerativeModel("gemini-1.5-pro", tools=[weather_tool])
# Function calls are automatically traced
response = model.generate_content("What's the weather in San Francisco?")
# Trace captures:
# - Function call request
# - Function arguments
# - Function response (when provided)
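To get the function response captured as well, run the tool yourself and hand the result back to the model. The sketch below continues the example with a chat session, using the standard Part.from_function_response API; the weather values are stand-in data, not a real weather service.
from vertexai.generative_models import Part
chat = model.start_chat()
response = chat.send_message("What's the weather in San Francisco?")
# Read the requested call, execute the tool, and return the result -
# the follow-up message is traced like any other turn
function_call = response.candidates[0].function_calls[0]
api_result = {"temperature": "18C", "condition": "foggy"}  # stand-in data
response = chat.send_message(
    Part.from_function_response(
        name=function_call.name,
        response={"content": api_result},
    )
)
print(response.text)
Chat Sessions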
Track multi-turn conversations:
chat.py
from vertexai.generative_models import GenerativeModel
model = GenerativeModel("gemini-1.5-pro")
chat = model.start_chat()
# Each message in the conversation is traced
response1 = chat.send_message("Hi! I'm building an AI agent.")
response2 = chat.send_message("What frameworks should I consider?")
response3 = chat.send_message("Tell me more about LangGraph")
# Full conversation history is tracked with:
# - Session ID linking all messages
# - Turn-by-turn latency and tokens
# - Cumulative conversation metrics
💡 Cost Tracking
TuringPulse automatically estimates costs based on Vertex AI pricing. View cost breakdowns by model, project, or workflow in the dashboard.
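The estimate comes down to token-count arithmetic, which you can reproduce yourself from the SDK's usage metadata. A rough sketch; the per-1K-token rates below are illustrative placeholders, not current Vertex AI pricing.
costs.py
# Placeholder rates per 1K tokens - check Vertex AI pricing for real values
RATES = {"gemini-1.5-pro": {"input": 0.00125, "output": 0.005}}
def estimate_cost(response, model_name="gemini-1.5-pro"):
    usage = response.usage_metadata
    rate = RATES[model_name]
    input_cost = usage.prompt_token_count / 1000 * rate["input"]
    output_cost = usage.candidates_token_count / 1000 * rate["output"]
    return input_cost + output_cost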
Error Handling
Errors and exceptions are automatically captured:
errors.py
from vertexai.generative_models import GenerativeModel
from google.api_core.exceptions import ResourceExhausted
model = GenerativeModel("gemini-1.5-pro")
try:
response = model.generate_content("...")
except ResourceExhausted as e:
# Error is automatically captured in trace with:
# - Error type and message
# - Stack trace
# - Request that caused the error
    raise
Environment Variables
Configure via environment variables:
.env
# TuringPulse configuration
export TURINGPULSE_API_KEY="sk_live_..."
export TURINGPULSE_PROJECT_ID="my-project"
# Vertex AI configuration
export GOOGLE_CLOUD_PROJECT="your-gcp-project"
export GOOGLE_CLOUD_REGION="us-central1"
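With these variables set, no credentials need to appear in code. A minimal sketch, assuming init() falls back to the TURINGPULSE_* variables when called without arguments (that fallback is an assumption):
main.py
import vertexai
from turingpulse import init
from turingpulse.integrations.vertexai import instrument_vertexai
# Assumption: with no arguments, init() reads TURINGPULSE_API_KEY
# and TURINGPULSE_PROJECT_ID from the environment
init()
instrument_vertexai()
# The Vertex AI SDK reads GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_REGION
vertexai.init()
Next Steps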
- Quickstart Guide - Get up and running in 5 minutes
- Python SDK Reference - Full API documentation
- Runs & Traces - Explore your traces
- KPI Thresholds - Set up alerts for latency and costs
- Governance - Configure human oversight