Alerts - oleander

oleander monitors every pipeline run and fires alerts when a job fails or when a pipeline metric deviates significantly from its historical baseline. Alerts appear in the oleander UI, trigger Slack notifications, and can be forwarded via webhooks.

Failure alerts

When a run transitions to FAIL or ABORT, oleander opens a CRITICAL failure alert on the corresponding pipeline. Each alert carries context about how the failure rate has changed relative to recent history:

Error rate — the percentage of all runs for that pipeline that have failed
Failure increase factor — the ratio of failures in the most recent 20 runs compared to the 20 runs before that

If a pipeline that usually runs cleanly suddenly starts failing, the increase factor surfaces that signal immediately rather than burying it in a long-run error rate.

Anomaly alerts

oleander tracks the following metrics per pipeline run and detects anomalies using a windowed statistical model:

Metric	Description
`bytes_read`	Total bytes read across all inputs
`rows_read`	Total rows read across all inputs
`bytes_written`	Total bytes written across all outputs
`rows_written`	Total rows written across all outputs
`duration_avg`	Average job duration in seconds

When a metric falls outside the expected range for a run, oleander creates an anomaly alert with:

The observed value and the expected value
Upper and lower bounds from the detection window
An anomaly score
Severity: INFO, WARN, or CRITICAL

Multiple anomalies in a single run are grouped into one alert. The alert severity is promoted to the highest severity across all anomalies in that run.

Alert lifecycle

Status	Meaning
`NEW`	Alert just opened, not yet acknowledged
`ACTIVE`	Alert is ongoing
`RESOLVED`	Manually resolved by a team member

Alerts for the same job and run type are deduplicated. Repeated failures on the same run update last_detected_at and re-evaluate severity rather than creating duplicate alerts.

Notifications

When an alert opens, oleander sends a Slack notification (if connected) and creates a notification entry visible in the platform. Webhooks configured in Settings receive all alert events as OpenLineage-formatted payloads. To receive alerts for a pipeline, add it to your watch list in Settings.

AI investigations

For every failure alert, oleander can run an automated investigation. The AI correlates:

Logs, OTel spans, and raw telemetry from the failed run
Lineage events and dataset I/O from the run
Iceberg commit and scan metrics
Spark logical plan diffs against the previous successful run

The investigation produces a root cause summary and a remediation suggestion. Trigger an investigation from the alert detail view or ask Lea directly in the chat interface.

​Failure alerts

​Anomaly alerts

​Alert lifecycle

​Notifications

​AI investigations

Failure alerts

Anomaly alerts

Alert lifecycle

Notifications

AI investigations