oleander is built in three layers: a telemetry ingestion layer that collects OpenLineage and OpenTelemetry events from your stack, an Iceberg catalog that stores raw telemetry and user data in open formats, and a context graph that unifies runs, datasets, costs, and lineage into a single versioned, queryable object model. Serverless Spark and Tasks run on oleander’s managed infrastructure, isolated and fully provisioned, with no clusters to configure. You can write to the Iceberg catalog from a Spark job, run a backfill script with Tasks, or trigger a pipeline from a webhook. Every read and write captures lineage automatically, so the context graph stays current without manual instrumentation.

Telemetry

oleander collects telemetry through OpenLineage and OpenTelemetry, the two open standards already running in your stack. No agents are deployed in your environment, and no direct access to your infrastructure is required. Point your existing tools at oleander’s ingest endpoint and telemetry starts flowing immediately. Every event is written to the `oleander.telemetry` namespace in the Iceberg catalog within seconds of emission.
| Standard | What it captures |
|---|---|
| OpenLineage | Job runs, dataset inputs and outputs, lineage edges, schema versions |
| OpenTelemetry | Spans, traces, logs, and resource attributes from pipeline execution |
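To make the OpenLineage side concrete, the sketch below builds a single `RunEvent` using only the Python standard library. Field names follow the OpenLineage spec; the ingest URL, the bearer-token header, the producer URI, and the job and dataset names are hypothetical placeholders, not oleander's documented values. The request is built but never sent. In practice an existing OpenLineage integration (for example, for Airflow or Spark) emits these events for you.

```python
import json
import uuid
from datetime import datetime, timezone
from urllib.request import Request

# Hypothetical placeholders: use the ingest endpoint and key from your oleander settings.
INGEST_URL = "https://ingest.example.invalid/api/v1/lineage"
API_KEY = "YOUR_API_KEY"

# Schema URL for OpenLineage 2-0-2 RunEvents (spec version is an assumption).
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"


def build_run_event(event_type, job_namespace, job_name, inputs, outputs, run_id=None):
    """Return an OpenLineage RunEvent dict; inputs/outputs are (namespace, name) pairs."""
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        "producer": "https://example.invalid/my-pipeline",  # hypothetical producer URI
        "schemaURL": SCHEMA_URL,
    }


def build_request(event):
    """Wrap the event in a JSON POST request (built only; pass it to urlopen() to send)."""
    return Request(
        INGEST_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical job and dataset names.
    event = build_run_event(
        "COMPLETE",
        job_namespace="my_namespace",
        job_name="daily_orders",
        inputs=[("s3://example-bucket", "raw/orders")],
        outputs=[("oleander", "default.orders_daily")],
    )
    print(json.dumps(event, indent=2))
```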

Catalog

Every oleander organization is provisioned with a private Apache Iceberg REST catalog named `oleander`, backed by Lakekeeper and stored on S3. Because it exposes a standard Iceberg REST endpoint, it works natively with Spark, DuckDB, and any other Iceberg-aware tool. The catalog contains two namespaces out of the box:
| Namespace | Purpose |
|---|---|
| `oleander.telemetry` | Platform events written automatically: run_events, traces, logs |
| `oleander.default` | Your own tables, created via SQL, Parquet upload, or S3 sync |
You can also register an external AWS S3 Tables catalog and query it alongside your own data.
```sql
SELECT * FROM oleander.telemetry.run_events
WHERE event_type = 'FAIL'
  AND event_time > now() - interval '24 hours'
ORDER BY event_time DESC;
```
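Because the catalog speaks the standard Iceberg REST protocol, an external Spark session should be able to register it with Iceberg's stock `SparkCatalog`. The sketch below only assembles the Spark properties. The catalog URI, warehouse name, and bearer-token auth are assumptions (check your organization's connection details), and the Iceberg Spark runtime JAR must already be on the classpath. Registering the catalog under the name `oleander` keeps three-part names such as `oleander.telemetry.run_events` resolving the same way as in the query above.

```python
def iceberg_rest_catalog_conf(catalog, uri, warehouse=None, token=None):
    """Spark properties that register an Iceberg REST catalog under the name `catalog`."""
    prefix = f"spark.sql.catalog.{catalog}"
    conf = {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
    }
    if warehouse:
        conf[f"{prefix}.warehouse"] = warehouse
    if token:
        conf[f"{prefix}.token"] = token  # bearer token sent to the REST catalog
    return conf


def to_spark_submit_args(conf):
    """Flatten properties into spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical URI, warehouse, and token.
    conf = iceberg_rest_catalog_conf(
        "oleander",
        "https://catalog.example.invalid/catalog",
        warehouse="oleander",
        token="YOUR_TOKEN",
    )
    print(" ".join(to_spark_submit_args(conf)))
```

The same dictionary can be passed to `SparkSession.builder.config(...)` one key at a time instead of on the command line.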

Graph

oleander reads from the catalog and builds a versioned context graph: a unified object model where every run, dataset, schema, query, cost, and lineage edge is a first-class, interconnected node. The graph is updated automatically after every execution. You can query it as it exists now or as it existed at any point in time, diff any two runs, and trace exactly what changed and when.
| Object | What it contains |
|---|---|
| Runs | Every DAG run, Spark job, and dbt model: duration, status, logs, lineage |
| Datasets | Tables and schemas, versioned at every write, with upstream and downstream lineage |
| Costs | Every warehouse query, attributed to the workflow and dataset that drove it |
| Changes | Schema diffs and PR-level impact, surfaced before changes cascade downstream |
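The graph's own query interface is not covered on this page, so the following is only a conceptual sketch in plain Python of three operations described above: reading a dataset's schema as it existed at a point in time, diffing two versions, and walking downstream lineage to see what a change would reach. `ContextGraph`, its methods, and the integer timestamps are hypothetical and do not reflect oleander's data model.

```python
from bisect import bisect_right
from collections import defaultdict, deque


class ContextGraph:
    """Toy versioned lineage graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.versions = defaultdict(list)   # dataset -> [(timestamp, schema)], in time order
        self.downstream = defaultdict(set)  # dataset -> datasets that read from it

    def record_write(self, dataset, ts, schema):
        """Store a new schema version; writes are assumed to arrive in time order."""
        self.versions[dataset].append((ts, dict(schema)))

    def add_edge(self, upstream, downstream):
        self.downstream[upstream].add(downstream)

    def schema_as_of(self, dataset, ts):
        """Return the latest schema written at or before `ts`, or None if none existed."""
        history = self.versions[dataset]
        i = bisect_right([t for t, _ in history], ts)
        return history[i - 1][1] if i else None

    def impacted(self, dataset):
        """Breadth-first walk collecting everything downstream of `dataset`."""
        seen, queue = set(), deque([dataset])
        while queue:
            for child in self.downstream[queue.popleft()]:
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen


def diff_schemas(old, new):
    """Columns added, removed, or retyped between two {column: type} dicts."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(c for c in old.keys() & new.keys() if old[c] != new[c]),
    }


if __name__ == "__main__":
    g = ContextGraph()
    g.record_write("orders", 1, {"id": "int", "amount": "float"})
    g.record_write("orders", 5, {"id": "int", "amount": "decimal", "region": "str"})
    g.add_edge("orders", "orders_daily")
    g.add_edge("orders_daily", "revenue_dashboard")
    print(diff_schemas(g.schema_as_of("orders", 3), g.schema_as_of("orders", 5)))
    # {'added': ['region'], 'removed': [], 'retyped': ['amount']}
    print(sorted(g.impacted("orders")))
    # ['orders_daily', 'revenue_dashboard']
```

Keeping every version rather than overwriting is what makes both the point-in-time read and the diff possible; the downstream walk is what turns a schema diff into an impact list.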

Compute

oleander provides two managed compute surfaces that read from and write to the Iceberg catalog. Every read and write is captured in the context graph as lineage, with no instrumentation required.

Serverless Spark

Submit PySpark applications to fully managed infrastructure. No clusters to provision, no configuration required. Spark jobs can read from and write to both oleander.default and external Iceberg catalogs. Lineage and cost flow into the context graph from the first run.
```shell
oleander spark submit my_job.py
```

Tasks

Run arbitrary Python scripts in oleander’s managed environment. Tasks can connect to the Iceberg catalog and external data sources, and can be triggered by webhooks or run on demand. Every read and write to the catalog generates lineage automatically.
```shell
oleander tasks run my_script.py
```
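Since a task is marked FAILED on an error or a non-zero exit code, a task script should report failure through its exit status rather than only logging it. The following is a minimal sketch of what a backfill `my_script.py` might look like, assuming plain Python with no oleander-specific imports; `process_day` is a hypothetical placeholder for real work such as reading from and writing to the catalog.

```python
import sys
from datetime import date, timedelta


def days(start, end):
    """Yield each date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def process_day(day):
    """Hypothetical placeholder for one day of backfill work."""
    print(f"backfilling {day.isoformat()}")


def main(start, end, process=process_day):
    """Run the backfill and return a process exit code (0 = success)."""
    failed = []
    for day in days(start, end):
        try:
            process(day)
        except Exception as exc:  # log and continue so one bad day doesn't hide the rest
            print(f"{day.isoformat()} failed: {exc}", file=sys.stderr)
            failed.append(day)
    if failed:
        print(f"{len(failed)} day(s) failed", file=sys.stderr)
        return 1  # non-zero exit: the task is reported as FAILED
    return 0
```

As a task entry point, the file would end with `if __name__ == "__main__": sys.exit(main(date(2024, 1, 1), date(2024, 1, 31)))`. How a task receives parameters such as the date range is not described here, so the fixed range is illustrative. Continuing past a failed day and exiting non-zero at the end means one run reports every bad partition instead of stopping at the first.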
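Each task run reports one of the following statuses:

| Status | Description |
|---|---|
| `QUEUED` | Waiting to execute |
| `WORKING` | Currently running |
| `COMPLETED` | Finished successfully |
| `FAILED` | Encountered an error or exited with a non-zero code |
| `CANCELED` | Canceled manually |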
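These statuses form a small lifecycle in which COMPLETED, FAILED, and CANCELED are final. The sketch below encodes that in Python and adds a generic polling loop. The allowed transitions are an assumption inferred from the status names, not documented behavior, and `get_status` stands in for whatever CLI, API, or SDK call you use to read a run's status; oleander does not define such a function here.

```python
import time
from enum import Enum


class TaskStatus(Enum):
    QUEUED = "QUEUED"
    WORKING = "WORKING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELED = "CANCELED"


TERMINAL = {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELED}

# Assumed transitions: a queued task starts or is canceled; a running task ends in a final state.
TRANSITIONS = {
    TaskStatus.QUEUED: {TaskStatus.WORKING, TaskStatus.CANCELED},
    TaskStatus.WORKING: TERMINAL,
}


def can_transition(current, new):
    """True if moving from `current` to `new` is allowed under the assumed lifecycle."""
    return new in TRANSITIONS.get(current, set())


def status_for_exit(code):
    """Map a finished script's exit code to its final status, as the table describes."""
    return TaskStatus.COMPLETED if code == 0 else TaskStatus.FAILED


def wait_until_done(get_status, poll_seconds=5.0, timeout=3600.0):
    """Poll `get_status()` (hypothetical callable returning a status string) until final."""
    deadline = time.monotonic() + timeout
    while True:
        status = TaskStatus(get_status())
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status.value} after {timeout}s")
        time.sleep(poll_seconds)
```

For example, `wait_until_done(lambda: fetch_status(run_id))`, where `fetch_status` is your own hypothetical wrapper. Using `time.monotonic()` for the deadline keeps the timeout unaffected by wall-clock adjustments.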

Query

Engineers and AI agents query the same context graph and catalog from any surface. The query layer exposes four access patterns, all backed by the same underlying graph and catalog.
| Surface | Use case |
|---|---|
| MCP server | AI agents (Cursor, Copilot, Claude, or bring-your-own) query the context graph directly |
| API | Programmatic access to the graph, catalog, and SQL proxy |
| CLI | Terminal-based queries, Spark submission, and lake management |
| SDK | Custom integrations and automated workflows |
The SQL proxy lets you query BigQuery, Snowflake, and other warehouses directly via DuckDB, without leaving oleander. Queries are attributed to the workflow that triggered them and appear in the context graph.
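The MCP server surface uses the Model Context Protocol, whose messages are JSON-RPC 2.0. Clients such as Cursor or Claude build these for you, but the raw shape is useful when wiring up a bring-your-own agent. This sketch only constructs the messages: it omits the `initialize` handshake and the transport (stdio or HTTP), and the tool name `find_failed_runs` and its arguments are hypothetical. A `tools/list` call is how a client discovers the tools oleander actually exposes.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object as MCP clients send it."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools():
    return jsonrpc_request("tools/list")


def call_tool(name, arguments):
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    # "find_failed_runs" and its arguments are hypothetical placeholders.
    for msg in (list_tools(), call_tool("find_failed_runs", {"since_hours": 24})):
        print(json.dumps(msg))
```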