Google Vertex AI Integration

Full observability and governance for Google Vertex AI and Gemini models. Track token usage, latency, and costs, and ensure quality across your Vertex AI deployments.

📦 Requirements: turingpulse ≥ 0.5.0 · Vertex AI SDK ≥ 1.38.0 · Gemini Pro / Ultra / Flash

Installation

Install TuringPulse with Vertex AI support:

terminal
pip install turingpulse[vertexai]

Or install with all integrations:

terminal
pip install turingpulse[all]
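
To confirm the install, print the package version (assuming turingpulse exposes the conventional __version__ attribute, as most packages do):

terminal
python -c "import turingpulse; print(turingpulse.__version__)"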

Quick Start

Instrument Vertex AI with a single function call:

main.py
import vertexai
from vertexai.generative_models import GenerativeModel
from turingpulse import init
from turingpulse.integrations.vertexai import instrument_vertexai

# Initialize TuringPulse
init(
    api_key="sk_live_...",
    project_id="my-project"
)

# Instrument Vertex AI - this wraps all Gemini calls
instrument_vertexai()

# Initialize Vertex AI
vertexai.init(project="your-gcp-project", location="us-central1")

# Your code works exactly the same - now with full tracing!
model = GenerativeModel("gemini-1.5-pro")
response = model.generate_content("Explain quantum computing in simple terms")
print(response.text)

Configuration Options

Customize the instrumentation behavior:

config.py
from turingpulse.integrations.vertexai import instrument_vertexai

instrument_vertexai(
    # Capture full prompts and responses (default: True)
    capture_content=True,
    
    # Track token counts and costs (default: True)
    capture_tokens=True,
    
    # Add custom metadata to all traces
    default_metadata={
        "environment": "production",
        "team": "ml-platform"
    },
    
    # Filter which models to trace (default: all)
    model_filter=lambda model: "gemini" in model.lower(),
)
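
For privacy-sensitive deployments, the same options can drop prompt and response bodies while keeping metrics. A minimal sketch using only the parameters shown above:

config_prod.py
from turingpulse.integrations.vertexai import instrument_vertexai

# Keep token and cost metrics, but never store prompt/response content
instrument_vertexai(
    capture_content=False,  # omit prompts and completions from traces
    capture_tokens=True,    # still record token counts and costs
    default_metadata={"environment": "production"},
)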

Supported Features

Models

  • Gemini 1.5 Pro
  • Gemini 1.5 Flash
  • Gemini 1.0 Pro
  • Gemini Pro Vision
  • PaLM 2 (text-bison)
  • Codey (code-bison)

Capabilities

  • Text generation
  • Multi-modal (images)
  • Streaming responses
  • Function calling
  • Chat sessions
  • Embeddings (see the sketch below)

Tracked Metrics

  • Token usage (in/out)
  • Latency (TTFB, total)
  • Cost estimation
  • Safety ratings
  • Finish reasons
  • Error rates

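Embedding calls are traced through the same instrumentation, per the Capabilities list above. A sketch using the Vertex AI text-embedding API (the model name is an example; use whichever embedding model your project has enabled):

embeddings.py
from vertexai.language_models import TextEmbeddingModel

# Embedding requests are traced like generation calls
model = TextEmbeddingModel.from_pretrained("text-embedding-004")
embeddings = model.get_embeddings(["What is quantum computing?"])

for embedding in embeddings:
    print(len(embedding.values))  # dimensionality of each vector
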
Streaming Support

TuringPulse automatically tracks streaming responses:

streaming.py
from vertexai.generative_models import GenerativeModel

model = GenerativeModel("gemini-1.5-flash")

# Streaming is automatically instrumented
response = model.generate_content(
    "Write a haiku about AI",
    stream=True
)

for chunk in response:
    print(chunk.text, end="", flush=True)

# Full trace is captured including:
# - Time to first token
# - Total streaming duration
# - Complete token counts

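For reference, time to first token is simply the gap between issuing the request and receiving the first chunk. Measured by hand, with plain Python and no TuringPulse API involved, it looks like this:

ttfb.py
import time

from vertexai.generative_models import GenerativeModel

model = GenerativeModel("gemini-1.5-flash")

start = time.monotonic()
response = model.generate_content("Write a haiku about AI", stream=True)

first_token_at = None
for chunk in response:
    if first_token_at is None:
        first_token_at = time.monotonic()  # first chunk arrived
    print(chunk.text, end="", flush=True)

print(f"\nTTFB: {first_token_at - start:.3f}s, total: {time.monotonic() - start:.3f}s")
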
Multi-modal Tracing

Track vision model interactions with images:

vision.py
from vertexai.generative_models import GenerativeModel, Part

model = GenerativeModel("gemini-1.5-pro")

# Image input is tracked (image content stored as reference)
response = model.generate_content([
    Part.from_uri("gs://your-bucket/image.jpg", mime_type="image/jpeg"),
    "Describe what you see in this image"
])

# Trace captures:
# - Image URI reference
# - Text prompt
# - Model response
# - Multi-modal token counts

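Local images work the same way via Part.from_data, which takes raw bytes instead of a GCS URI (the file path here is a placeholder):

vision_local.py
from vertexai.generative_models import GenerativeModel, Part

model = GenerativeModel("gemini-1.5-pro")

# Read a local file and pass the bytes directly
with open("photo.jpg", "rb") as f:
    image = Part.from_data(data=f.read(), mime_type="image/jpeg")

response = model.generate_content([image, "Describe what you see in this image"])
print(response.text)
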
Function Calling

Instrument function calls and tool use:

functions.py
from vertexai.generative_models import GenerativeModel, FunctionDeclaration, Tool

# Define your functions
get_weather = FunctionDeclaration(
    name="get_weather",
    description="Get the current weather for a location",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
    }
)

weather_tool = Tool(function_declarations=[get_weather])
model = GenerativeModel("gemini-1.5-pro", tools=[weather_tool])

# Function calls are automatically traced
response = model.generate_content("What's the weather in San Francisco?")

# Trace captures:
# - Function call request
# - Function arguments
# - Function response (when provided)

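To complete the tool loop, read the call the model requested and hand your result back with Part.from_function_response. A sketch using a chat session with the tool-equipped model from above (the weather data is a stand-in):

functions_roundtrip.py
from vertexai.generative_models import Part

chat = model.start_chat()  # `model` is the tool-equipped instance defined above
response = chat.send_message("What's the weather in San Francisco?")

# Inspect the function call the model requested
call = response.candidates[0].content.parts[0].function_call
print(call.name, dict(call.args))  # e.g. get_weather {'location': 'San Francisco'}

# Run your real function, then return the result to the model
result = {"temperature_c": 18, "conditions": "foggy"}  # stand-in data
final = chat.send_message(
    Part.from_function_response(name="get_weather", response={"content": result})
)
print(final.text)
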
Chat Sessions

Track multi-turn conversations:

chat.py
from vertexai.generative_models import GenerativeModel

model = GenerativeModel("gemini-1.5-pro")
chat = model.start_chat()

# Each message in the conversation is traced
response1 = chat.send_message("Hi! I'm building an AI agent.")
response2 = chat.send_message("What frameworks should I consider?")
response3 = chat.send_message("Tell me more about LangGraph")

# Full conversation history is tracked with:
# - Session ID linking all messages
# - Turn-by-turn latency and tokens
# - Cumulative conversation metrics
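
The SDK keeps the running transcript on the chat object, which is a quick way to inspect locally what the session trace records:

history.py
# chat.history holds the alternating user/model turns from the session above
for content in chat.history:
    print(content.role, ":", content.parts[0].text)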

💡 Cost Tracking
TuringPulse automatically estimates costs based on Vertex AI pricing. View cost breakdowns by model, project, or workflow in the dashboard.

Error Handling

Errors and exceptions are automatically captured:

errors.py
from vertexai.generative_models import GenerativeModel
from google.api_core.exceptions import ResourceExhausted

model = GenerativeModel("gemini-1.5-pro")

try:
    response = model.generate_content("...")
except ResourceExhausted as e:
    # Error is automatically captured in trace with:
    # - Error type and message
    # - Stack trace
    # - Request that caused the error
    raise
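
ResourceExhausted is Vertex AI's quota/rate-limit error, so a backoff-and-retry wrapper is a common companion; each failed attempt still appears as an errored trace. A minimal sketch using only the standard library:

retry.py
import random
import time

from google.api_core.exceptions import ResourceExhausted
from vertexai.generative_models import GenerativeModel

model = GenerativeModel("gemini-1.5-pro")

def generate_with_retry(prompt, max_attempts=4):
    for attempt in range(max_attempts):
        try:
            return model.generate_content(prompt)
        except ResourceExhausted:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter: ~1s, ~2s, ~4s
            time.sleep(2 ** attempt + random.random())

response = generate_with_retry("Explain quantum computing in simple terms")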

Environment Variables

Configure via environment variables:

.env
# TuringPulse configuration
export TURINGPULSE_API_KEY="sk_live_..."
export TURINGPULSE_PROJECT_ID="my-project"

# Vertex AI configuration  
export GOOGLE_CLOUD_PROJECT="your-gcp-project"
export GOOGLE_CLOUD_REGION="us-central1"
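
If init() falls back to these variables when called without arguments (an assumption; verify against your TuringPulse version), the setup code shrinks to:

main_env.py
import vertexai

from turingpulse import init
from turingpulse.integrations.vertexai import instrument_vertexai

# Assumes TURINGPULSE_API_KEY / TURINGPULSE_PROJECT_ID are read from the
# environment when init() gets no arguments -- verify for your version
init()
instrument_vertexai()

# vertexai.init() does fall back to GOOGLE_CLOUD_PROJECT / GOOGLE_CLOUD_REGION
vertexai.init()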

Next Steps