# Alerts

> Automatic failure detection and metric anomaly detection for your pipelines.

oleander monitors every pipeline run and fires alerts when a job fails or when a pipeline metric deviates significantly from its historical baseline. Alerts appear in the oleander UI, trigger Slack notifications, and can be forwarded via [webhooks](/features/webhooks).

## Failure alerts

When a run transitions to `FAIL` or `ABORT`, oleander opens a `CRITICAL` failure alert on the corresponding pipeline. Each alert carries context about how the failure rate has changed relative to recent history:

* **Error rate**: the percentage of all runs for that pipeline that have failed
* **Failure increase factor**: the number of failures in the most recent 20 runs divided by the number of failures in the 20 runs before that

If a pipeline that usually runs cleanly suddenly starts failing, the increase factor surfaces that signal immediately rather than burying it in a long-run error rate.
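As a rough illustration of the two figures above, the following sketch computes them from a list of run states ordered oldest to newest. The function names, the `COMPLETE` state, and the handling of a failure-free previous window are assumptions made for this example, not oleander's actual implementation.

```python
from typing import Sequence

# States that count as a failure, per the section above.
FAILED_STATES = {"FAIL", "ABORT"}
WINDOW = 20  # size of each comparison window


def count_failures(states: Sequence[str]) -> int:
    return sum(1 for state in states if state in FAILED_STATES)


def error_rate(states: Sequence[str]) -> float:
    """Percentage of all runs that ended in a failed state."""
    if not states:
        return 0.0
    return 100.0 * count_failures(states) / len(states)


def failure_increase_factor(states: Sequence[str], window: int = WINDOW) -> float:
    """Failures in the latest window divided by failures in the window before it."""
    recent = states[-window:]
    previous = states[-2 * window:-window]
    # Assumption: floor the denominator at 1 so a clean previous window
    # does not cause a division by zero.
    return count_failures(recent) / max(count_failures(previous), 1)


# Hypothetical history: 60 clean runs, then 1 failure in 20, then 5 failures in 20.
history = (
    ["COMPLETE"] * 60
    + ["FAIL"] + ["COMPLETE"] * 19
    + ["FAIL"] * 5 + ["COMPLETE"] * 15
)
rate = error_rate(history)
factor = failure_increase_factor(history)
```

In this made-up history the long-run error rate stays low while the increase factor jumps, which is the situation the increase factor is meant to surface.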

## Anomaly alerts

oleander tracks the following metrics per pipeline run and detects anomalies using a windowed statistical model:

| Metric          | Description                            |
| --------------- | -------------------------------------- |
| `bytes_read`    | Total bytes read across all inputs     |
| `rows_read`     | Total rows read across all inputs      |
| `bytes_written` | Total bytes written across all outputs |
| `rows_written`  | Total rows written across all outputs  |
| `duration_avg`  | Average job duration in seconds        |

When a metric falls outside the expected range for a run, oleander creates an anomaly alert with:

* The observed value and the expected value
* Upper and lower bounds from the detection window
* An anomaly score
* Severity: `INFO`, `WARN`, or `CRITICAL`
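The fields listed above can be pictured with a simple windowed check. This is only a sketch: it assumes a mean and standard deviation band over a window of recent values, with a z-score as the anomaly score. The band width `k` and the severity thresholds are hypothetical, and the statistical model oleander actually uses is not described here.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional, Sequence


@dataclass
class Anomaly:
    metric: str
    observed: float
    expected: float
    lower: float
    upper: float
    score: float
    severity: str


def detect(metric: str, window: Sequence[float], observed: float,
           k: float = 3.0) -> Optional[Anomaly]:
    """Return an Anomaly if `observed` falls outside mean +/- k * stddev."""
    if len(window) < 2:
        return None  # not enough history to form a baseline
    expected = mean(window)
    spread = pstdev(window)
    lower, upper = expected - k * spread, expected + k * spread
    if lower <= observed <= upper:
        return None
    # Score: distance from the expected value in standard deviations.
    score = abs(observed - expected) / spread if spread else float("inf")
    # Hypothetical thresholds for mapping a score to a severity.
    if score >= 6:
        severity = "CRITICAL"
    elif score >= 4:
        severity = "WARN"
    else:
        severity = "INFO"
    return Anomaly(metric, observed, expected, lower, upper, score, severity)


recent_rows = [100, 102, 98, 101, 99]
normal = detect("rows_read", recent_rows, 101)
spike = detect("rows_read", recent_rows, 110)
```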

Multiple anomalies in a single run are grouped into one alert. The alert severity is promoted to the highest severity across all anomalies in that run.
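The grouping rule can be sketched as taking the maximum severity over the anomalies in a run. The dictionary shapes and the `group_anomalies` helper are hypothetical and serve only to illustrate the promotion logic.

```python
from typing import Dict, List

# Ordering used to compare severities; higher rank wins.
SEVERITY_RANK = {"INFO": 0, "WARN": 1, "CRITICAL": 2}


def group_anomalies(run_id: str, anomalies: List[Dict]) -> Dict:
    """Fold all anomalies from one run into a single alert record."""
    if not anomalies:
        raise ValueError("an alert needs at least one anomaly")
    severity = max(
        (a["severity"] for a in anomalies),
        key=lambda s: SEVERITY_RANK[s],
    )
    return {"run_id": run_id, "severity": severity, "anomalies": list(anomalies)}


alert = group_anomalies(
    "run-123",  # hypothetical run identifier
    [
        {"metric": "rows_read", "severity": "INFO"},
        {"metric": "duration_avg", "severity": "CRITICAL"},
        {"metric": "bytes_written", "severity": "WARN"},
    ],
)
```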

## Alert lifecycle

| Status     | Meaning                                 |
| ---------- | --------------------------------------- |
| `NEW`      | Alert just opened, not yet acknowledged |
| `ACTIVE`   | Alert is ongoing                        |
| `RESOLVED` | Manually resolved by a team member      |

Alerts for the same job and run type are deduplicated. Repeated failures on the same run update `last_detected_at` and re-evaluate severity rather than creating duplicate alerts.
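A minimal sketch of this deduplication behavior, assuming open alerts are keyed by job and run type. The `Alert` and `AlertStore` classes, the `first_detected_at` field, and the choice to open a fresh alert after one has been resolved are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Tuple


@dataclass
class Alert:
    job: str
    run_type: str
    severity: str
    status: str
    first_detected_at: datetime
    last_detected_at: datetime


class AlertStore:
    def __init__(self) -> None:
        self._alerts: Dict[Tuple[str, str], Alert] = {}

    def record(self, job: str, run_type: str, severity: str,
               detected_at: datetime) -> Alert:
        key = (job, run_type)
        existing = self._alerts.get(key)
        if existing is not None and existing.status != "RESOLVED":
            # Repeat detection: refresh the existing alert instead of duplicating it.
            existing.last_detected_at = detected_at
            existing.severity = severity  # severity is re-evaluated on each detection
            return existing
        # Assumption: a resolved alert does not absorb new failures.
        alert = Alert(job, run_type, severity, "NEW", detected_at, detected_at)
        self._alerts[key] = alert
        return alert


store = AlertStore()
t1 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
first = store.record("daily_orders", "batch", "WARN", t1)
second = store.record("daily_orders", "batch", "CRITICAL", t2)
```

Both calls return the same alert object, with the later timestamp and severity applied to it.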

## Notifications

When an alert opens, oleander sends a Slack notification (if connected) and creates a notification entry visible in the platform. Webhooks configured in [Settings](/features/webhooks) receive all alert events as OpenLineage-formatted payloads.
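On the receiving side, a webhook consumer might start by pulling the standard OpenLineage run event fields (`eventType`, `job`, `run`) out of the request body. The sketch below assumes the alert payload carries those fields. The sample payload is invented for illustration, and any alert-specific fields oleander adds are not covered.

```python
import json


def summarize_event(body: str) -> str:
    """Build a one-line summary from an OpenLineage-style JSON payload."""
    event = json.loads(body)
    job = event.get("job", {})
    run = event.get("run", {})
    return "{} {}/{} run={}".format(
        event.get("eventType", "UNKNOWN"),
        job.get("namespace", "?"),
        job.get("name", "?"),
        run.get("runId", "?"),
    )


# Hypothetical payload using only standard OpenLineage run event fields.
sample = json.dumps({
    "eventType": "FAIL",
    "eventTime": "2024-01-01T12:00:00Z",
    "job": {"namespace": "analytics", "name": "daily_orders"},
    "run": {"runId": "0f5e8c1a-0000-0000-0000-000000000000"},
})
summary = summarize_event(sample)
```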

To receive alerts for a pipeline, add it to your watch list in [Settings](https://oleander.dev/app/settings/watchers).

## AI investigations

For every failure alert, oleander can run an automated investigation. The AI correlates:

* Logs, OTel spans, and raw telemetry from the failed run
* Lineage events and dataset I/O from the run
* Iceberg commit and scan metrics
* Spark logical plan diffs against the previous successful run

The investigation produces a root cause summary and a remediation suggestion. Trigger an investigation from the alert detail view or ask Lea directly in the chat interface.
