Skip to main content
oleander monitors every pipeline run and fires alerts when a job fails or when a pipeline metric deviates significantly from its historical baseline. Alerts appear in the oleander UI, trigger Slack notifications, and can be forwarded via webhooks.

Failure alerts

When a run transitions to FAIL or ABORT, oleander opens a CRITICAL failure alert on the corresponding pipeline. Each alert carries context about how the failure rate has changed relative to recent history:
  • Error rate — the percentage of all runs for that pipeline that have failed
  • Failure increase factor — the ratio of failures in the most recent 20 runs compared to the 20 runs before that
If a pipeline that usually runs cleanly suddenly starts failing, the increase factor surfaces that signal immediately rather than burying it in a long-run error rate.

Anomaly alerts

oleander tracks the following metrics per pipeline run and detects anomalies using a windowed statistical model:
MetricDescription
bytes_readTotal bytes read across all inputs
rows_readTotal rows read across all inputs
bytes_writtenTotal bytes written across all outputs
rows_writtenTotal rows written across all outputs
duration_avgAverage job duration in seconds
When a metric falls outside the expected range for a run, oleander creates an anomaly alert with:
  • The observed value and the expected value
  • Upper and lower bounds from the detection window
  • An anomaly score
  • Severity: INFO, WARN, or CRITICAL
Multiple anomalies in a single run are grouped into one alert. The alert severity is promoted to the highest severity across all anomalies in that run.

Alert lifecycle

StatusMeaning
NEWAlert just opened, not yet acknowledged
ACTIVEAlert is ongoing
RESOLVEDManually resolved by a team member
Alerts for the same job and run type are deduplicated. Repeated failures on the same run update last_detected_at and re-evaluate severity rather than creating duplicate alerts.

Notifications

When an alert opens, oleander sends a Slack notification (if connected) and creates a notification entry visible in the platform. Webhooks configured in Settings receive all alert events as OpenLineage-formatted payloads. To receive alerts for a pipeline, add it to your watch list in Settings.

AI investigations

For every failure alert, oleander can run an automated investigation. The AI correlates:
  • Logs, OTel spans, and raw telemetry from the failed run
  • Lineage events and dataset I/O from the run
  • Iceberg commit and scan metrics
  • Spark logical plan diffs against the previous successful run
The investigation produces a root cause summary and a remediation suggestion. Trigger an investigation from the alert detail view or ask Lea directly in the chat interface.