KPI Rules
Define threshold rules to monitor key performance indicators and get alerted when those thresholds are breached.
What are KPI Rules?
KPI (Key Performance Indicator) rules let you define thresholds for important metrics and automatically create alerts or incidents when those thresholds are breached.
Built-in Metrics
- Latency - End-to-end response time (p50, p95, p99, avg)
- Token Usage - Input/output tokens per run
- Cost - Estimated cost per run or total
- Error Rate - Percentage of failed runs
- Throughput - Runs per minute/hour
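To make the last two built-in metrics concrete, here is a rough sketch of how an error rate and a throughput figure can be derived from the runs observed in one window (illustrative only, not the platform's internal implementation; the RunRecord shape is an assumption):

```python
from dataclasses import dataclass

@dataclass
class RunRecord:
    """One workflow run observed inside an aggregation window (illustrative)."""
    failed: bool
    duration_ms: float

def error_rate(runs):
    """Percentage of failed runs in the window."""
    if not runs:
        return 0.0
    return 100.0 * sum(r.failed for r in runs) / len(runs)

def throughput_per_minute(runs, window_minutes):
    """Runs per minute over the window."""
    return len(runs) / window_minutes

runs = [
    RunRecord(failed=False, duration_ms=820),
    RunRecord(failed=True, duration_ms=4100),
    RunRecord(failed=False, duration_ms=950),
    RunRecord(failed=False, duration_ms=1200),
]
print(error_rate(runs))                # 25.0 (1 of 4 runs failed)
print(throughput_per_minute(runs, 5))  # 0.8 runs/min over a 5m window
```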
Custom Metrics
You can also track custom metrics defined in your code via the SDK.
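A custom metric boils down to a function that reads back a value your instrumented code recorded on the run context, the same extractor pattern used in the SDK examples later on this page. A minimal sketch (the RunContext class and the retrieval_score key are illustrative stand-ins, not SDK objects):

```python
class RunContext:
    """Illustrative stand-in for the context object handed to a value
    extractor; only the metadata attribute is assumed here."""
    def __init__(self, metadata):
        self.metadata = metadata

def extract_retrieval_score(ctx):
    # Read a value your instrumented code wrote during the run
    # (retrieval_score is a made-up key for illustration).
    return ctx.metadata.get("retrieval_score", 0.0)

ctx = RunContext({"retrieval_score": 0.72})
print(extract_retrieval_score(ctx))  # 0.72
```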
Creating KPI Rules via UI
Step 1: Navigate to Thresholds
Go to Controls → Thresholds in the sidebar.
Step 2: Create New Rule
Click Create Rule and fill in:
- Name - Descriptive name (e.g., "High Latency Alert")
- Workflow - Select specific workflow or "All Workflows"
- Metric - Choose from dropdown or enter custom metric ID
- Aggregation - avg, sum, p50, p95, p99, max, min
- Comparator - Greater than, Less than, etc.
- Threshold - Numeric value
- Window - Time window for aggregation (5m, 15m, 1h, etc.)
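As a rough illustration of how the Aggregation and Window fields interact, here is a simplified evaluator that applies one aggregation to the samples collected in a single window (nearest-rank percentiles; a sketch, not the platform's exact math):

```python
import statistics

def aggregate(values, agg):
    """Apply one rule aggregation to the samples in a window."""
    if agg == "avg":
        return statistics.fmean(values)
    if agg in ("p50", "p95", "p99"):
        pct = int(agg[1:])
        ordered = sorted(values)
        # Nearest-rank percentile (simplified).
        idx = max(0, round(pct / 100 * len(ordered)) - 1)
        return ordered[idx]
    if agg == "max":
        return max(values)
    if agg == "min":
        return min(values)
    if agg == "sum":
        return sum(values)
    raise ValueError(f"unknown aggregation: {agg}")

# Latencies (ms) observed during one 5-minute window:
window = [420, 530, 610, 800, 950, 1200, 4800, 510, 640, 700]
p95 = aggregate(window, "p95")
print(p95, p95 > 5000)  # a "p95 latency_ms > 5000 over 5m" rule would not fire
```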
Step 3: Configure Alerts
- Severity - Warning or Critical
- Auto-create Incident - Toggle to create incidents automatically
- Alert Channels - Select notification channels
Step 4: Enable and Save
Toggle the rule to enabled and click Save.
Creating KPI Rules via SDK
Inline with @instrument
agent.py
from turingpulse import instrument, KPIConfig
@instrument(
    agent_id="customer-support",
    kpis=[
        # Alert if latency > 5 seconds
        KPIConfig(
            kpi_id="latency_ms",
            use_duration=True,
            alert_threshold=5000,
            comparator="gt",
            severity="warning",
        ),
        # Alert if cost > $1 per run
        KPIConfig(
            kpi_id="cost_usd",
            value=lambda ctx: ctx.metadata.get("total_cost", 0),
            alert_threshold=1.0,
            comparator="gt",
            severity="critical",
            auto_create_incident=True,
        ),
        # Alert if token count > 4000
        KPIConfig(
            kpi_id="total_tokens",
            value=lambda ctx: (
                ctx.metadata.get("input_tokens", 0) +
                ctx.metadata.get("output_tokens", 0)
            ),
            alert_threshold=4000,
            comparator="gt",
        ),
    ]
)
def handle_query(query: str):
    return agent.run(query)

Via API
create_rule.py
import requests
response = requests.post(
    "https://api.turingpulse.ai/v1/kpi-rules",
    headers={"Authorization": "Bearer sk_live_..."},
    json={
        "name": "High Latency Alert",
        "workflow_id": "customer-support",  # or "*" for all
        "metric": "latency_ms",
        "aggregation": "p95",
        "comparator": "gt",
        "threshold": 5000,
        "window": "5m",
        "severity": "warning",
        "auto_create_incident": False,
        "alert_channels": ["email:ops@company.com"],
        "enabled": True,
    },
)

KPI Configuration Reference
KPIConfig Options
| Option | Type | Description |
|---|---|---|
| kpi_id | str | Unique identifier for the KPI |
| use_duration | bool | Use execution duration as the value |
| value | callable | Function to extract the value from the run context |
| alert_threshold | float | Threshold value for alerting |
| comparator | str | gt, lt, gte, lte, eq |
| severity | str | warning, critical |
| auto_create_incident | bool | Create an incident on breach |
Comparators
| Comparator | Description | Example |
|---|---|---|
| gt | Greater than | Alert if latency > 5000ms |
| lt | Less than | Alert if accuracy < 0.8 |
| gte | Greater than or equal | Alert if errors >= 10 |
| lte | Less than or equal | Alert if throughput <= 5/min |
| eq | Equal to | Alert if status == "failed" |
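The comparators map directly onto ordinary comparison operators; a minimal evaluator sketch of the table above (the breached helper is illustrative, not an SDK function):

```python
import operator

# Each rule comparator is just a standard comparison.
COMPARATORS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
    "eq": operator.eq,
}

def breached(value, comparator, threshold):
    """True when the aggregated metric value crosses the rule threshold."""
    return COMPARATORS[comparator](value, threshold)

print(breached(6200, "gt", 5000))  # True: latency 6200ms > 5000ms
print(breached(0.85, "lt", 0.8))   # False: accuracy 0.85 is not below 0.8
```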
Viewing KPI Alerts
When a KPI threshold is breached:
- An alert is created and visible in Operations → Overview → KPIs tab
- Notifications are sent to configured alert channels
- If auto_create_incident is enabled, an incident is created
ℹ️ Alert Deduplication
Alerts are deduplicated within the configured window. Multiple breaches within the same window won't create duplicate alerts.
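The deduplication behavior can be pictured as suppressing any breach that falls within the window of the last alert that actually fired (an illustrative sketch of the rule stated above, not the platform's code):

```python
def fire_alerts(breach_times, window_s):
    """Return the breach timestamps (seconds) that produce alerts: the first
    one, then only breaches at least window_s after the last fired alert."""
    fired = []
    for t in sorted(breach_times):
        if not fired or t - fired[-1] >= window_s:
            fired.append(t)
    return fired

# Five breaches in quick succession, 5-minute (300s) window:
print(fire_alerts([0, 60, 120, 400, 410], 300))  # [0, 400]
```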
Best Practices
- Start with warnings - Use warning severity first, then escalate to critical once you understand normal behavior.
- Use appropriate windows - Short windows (5m) for real-time alerts, longer windows (1h) for trend-based alerts.
- Set realistic thresholds - Base thresholds on historical data, not arbitrary values.
- Group related KPIs - Create rules for related metrics together (e.g., latency + error rate).
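The third practice above can be automated: derive a starting threshold from recent history instead of guessing. A sketch, where the nearest-rank p99 and the 20% headroom factor are arbitrary starting points to tune:

```python
def suggest_threshold(historical, headroom=1.2):
    """Place the threshold a bit above the historical p99 so normal
    variation doesn't page anyone; adjust headroom to taste."""
    ordered = sorted(historical)
    idx = max(0, round(0.99 * len(ordered)) - 1)  # nearest-rank p99
    return round(ordered[idx] * headroom)

# Recent per-run latencies (ms) pulled from your history:
latencies_ms = [400, 520, 610, 700, 830, 950, 1100, 1400, 2100, 3200]
print(suggest_threshold(latencies_ms))  # 3840
```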