oleander is built in three layers: a telemetry ingestion layer that collects OpenLineage and OpenTelemetry events from your stack, an Iceberg catalog that stores raw telemetry and user data in open formats, and a context graph that unifies runs, datasets, costs, and lineage into a single versioned, queryable object model.
Serverless Spark and Tasks run on oleander’s managed infrastructure: isolated, fully provisioned, with no clusters to configure. Write to the Iceberg catalog from a Spark job, run a backfill script with Tasks, or trigger a pipeline on a webhook. Every read and write captures lineage automatically, so the context graph stays current without any instrumentation.
Telemetry
oleander collects telemetry via OpenLineage and OpenTelemetry, the two open standards already running in your stack. No agents are deployed in your environment. No direct access to your infrastructure is required.
Point your existing tools at oleander’s ingest endpoint and telemetry starts flowing immediately. Every event is written to the oleander.telemetry namespace in the Iceberg catalog within seconds of emission.
| Standard | What it captures |
|---|---|
| OpenLineage | Job runs, dataset inputs and outputs, lineage edges, schema versions |
| OpenTelemetry | Spans, traces, logs, and resource attributes from pipeline execution |
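To make the OpenLineage row concrete, here is a minimal sketch of the kind of run event an emitter produces. The field names follow the OpenLineage RunEvent shape. The job names, dataset names, producer URI, and the spec version in `schemaURL` are assumptions for illustration. The event is only built and printed, since oleander's ingest endpoint URL and authentication are not specified here.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type: str, job_namespace: str, job_name: str,
                    inputs: list[str], outputs: list[str]) -> dict:
    """Build a minimal OpenLineage-style RunEvent as a plain dict."""

    def dataset(name: str) -> dict:
        # The job namespace is reused for datasets to keep the sketch short;
        # real emitters usually set it to the data source (e.g. a warehouse URI).
        return {"namespace": job_namespace, "name": name}

    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [dataset(n) for n in inputs],
        "outputs": [dataset(n) for n in outputs],
        # Hypothetical producer URI identifying the emitting code.
        "producer": "https://example.com/my-pipeline",
        # Spec version is an assumption; use the one your emitter targets.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


event = build_run_event("COMPLETE", "my_namespace", "daily_orders",
                        inputs=["raw.orders"], outputs=["analytics.orders_daily"])
print(json.dumps(event, indent=2))
```

In practice an OpenLineage client or integration (Airflow, Spark, dbt) builds and sends these events for you; the sketch only shows what ends up in the payload.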
Catalog
Every oleander organization is provisioned a private Apache Iceberg REST catalog named oleander, backed by Lakekeeper and stored on S3. It exposes a standard Iceberg REST endpoint, making it natively compatible with Spark, DuckDB, and any Iceberg-aware tool.
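Because the catalog speaks the standard Iceberg REST protocol, a Spark session can be pointed at it with the usual Iceberg catalog properties. The sketch below only assembles those properties. The endpoint URI and token are placeholders, not oleander's real values, and the helper name is hypothetical.

```python
def iceberg_rest_catalog_conf(catalog: str, uri: str, token: str) -> dict[str, str]:
    """Return Spark properties that register an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,      # placeholder endpoint (assumption)
        f"{prefix}.token": token,  # bearer token placeholder (assumption)
    }


conf = iceberg_rest_catalog_conf(
    "oleander",
    "https://catalog.example.invalid/iceberg",
    "<your-token>",
)
for key, value in conf.items():
    print(f"{key}={value}")
```

Each pair can be passed to `SparkSession.builder.config(key, value)` or as `--conf key=value`. A self-managed Spark installation would also need the Iceberg Spark runtime package on its classpath.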
The catalog contains two namespaces out of the box:
| Namespace | Purpose |
|---|---|
| oleander.telemetry | Platform events written automatically: run_events, traces, logs |
| oleander.default | Your own tables, created via SQL, Parquet upload, or S3 sync |
You can also register an external AWS S3 Tables catalog and query it alongside your own data.
SELECT * FROM oleander.telemetry.run_events
WHERE event_type = 'FAIL'
AND event_time > now() - interval '24 hours'
ORDER BY event_time DESC;
Graph
oleander reads from the catalog and builds a versioned context graph: a unified object model where every run, dataset, schema, query, cost, and lineage edge is a first-class, interconnected node.
The graph is updated automatically after every execution. You can query it as it exists now or as it existed at any point in time, diff any two runs, and trace exactly what changed and when.
| Object | What it contains |
|---|---|
| Runs | Every DAG run, Spark job, dbt model: duration, status, logs, lineage |
| Datasets | Tables and schemas versioned at every write, with upstream and downstream lineage |
| Costs | Every warehouse query attributed to the workflow and dataset that drove it |
| Changes | Schema diffs and PR-level impact before they cascade downstream |
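As an illustration of what a schema diff between two dataset versions involves, the sketch below compares two column-to-type mappings. This is not oleander's implementation or API; the function name and the example schemas are hypothetical.

```python
def diff_schemas(old: dict[str, str], new: dict[str, str]) -> dict[str, dict]:
    """Compare two {column: type} mappings and report what changed."""
    added = {col: typ for col, typ in new.items() if col not in old}
    removed = {col: typ for col, typ in old.items() if col not in new}
    # Columns present in both versions whose type differs: (old_type, new_type).
    changed = {
        col: (old[col], new[col])
        for col in old.keys() & new.keys()
        if old[col] != new[col]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical schemas for two versions of the same table.
v1 = {"order_id": "long", "amount": "double", "region": "string"}
v2 = {"order_id": "long", "amount": "decimal(18,2)", "currency": "string"}

diff = diff_schemas(v1, v2)
print(diff)
```

Removed columns and type changes are the cases most likely to break downstream consumers, which is why they are reported separately from additions.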
Compute
oleander provides two managed compute surfaces that read from and write to the Iceberg catalog with automatic lineage. Every read and write is captured in the context graph, with no instrumentation required.
Serverless Spark
Submit PySpark applications to fully managed infrastructure. No clusters to provision, no configuration required. Spark jobs can read from and write to both oleander.default and external Iceberg catalogs. Lineage and cost flow into the context graph from the first run.
oleander spark submit my_job.py
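A job file such as `my_job.py` might look roughly like the sketch below. It assumes the `oleander` catalog is already configured on the session by the managed environment, and that `run_events` has the `event_type` and `event_time` columns used in the catalog query earlier. The target table name is hypothetical.

```python
def failed_runs_sql(hours: int = 24) -> str:
    """Spark SQL that counts failed run events per hour over a recent window."""
    return f"""
        SELECT date_trunc('hour', event_time) AS hour, count(*) AS failures
        FROM oleander.telemetry.run_events
        WHERE event_type = 'FAIL'
          AND event_time > current_timestamp() - INTERVAL {int(hours)} HOURS
        GROUP BY 1
    """


def run(spark) -> None:
    """Read from the telemetry namespace and write a summary to oleander.default."""
    summary = spark.sql(failed_runs_sql(24))
    # Hypothetical target table; createOrReplace() overwrites it on each run.
    summary.writeTo("oleander.default.failed_runs_hourly").createOrReplace()


def main() -> None:
    # Imported here so the helpers above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("failed-runs-summary").getOrCreate()
    run(spark)
    spark.stop()
```

A real job file would end with a call to `main()`; it is left out here so the helpers can be imported and checked on their own.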
Tasks
Run arbitrary Python scripts in oleander’s managed environment. Tasks can connect to the Iceberg catalog and external data sources, and can be triggered by webhooks or on demand. Every read and write to the catalog generates automatic lineage.
oleander tasks run my_script.py
| Status | Description |
|---|---|
| QUEUED | Waiting to execute |
| WORKING | Currently running |
| COMPLETED | Finished successfully |
| FAILED | Encountered an error or non-zero exit code |
| CANCELED | Manually canceled |
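When automating around these statuses, it helps to treat them as an enum and wait until a run reaches a final state. The sketch below assumes that COMPLETED, FAILED, and CANCELED are terminal, which the table implies but does not state. `get_status` is a hypothetical callable standing in for however you look up a run's status.

```python
import time
from enum import Enum
from typing import Callable


class TaskStatus(str, Enum):
    QUEUED = "QUEUED"
    WORKING = "WORKING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELED = "CANCELED"


# Assumption: these three statuses never transition to anything else.
TERMINAL = {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELED}


def wait_for_terminal(get_status: Callable[[], str], poll_seconds: float = 2.0,
                      max_polls: int = 100) -> TaskStatus:
    """Poll get_status() until it returns a terminal status or polls run out."""
    for _ in range(max_polls):
        status = TaskStatus(get_status())  # raises ValueError on unknown values
        if status in TERMINAL:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("task did not reach a terminal status")


# Simulated status source standing in for a real status lookup.
_states = iter(["QUEUED", "WORKING", "WORKING", "COMPLETED"])
final = wait_for_terminal(lambda: next(_states), poll_seconds=0.0)
print(final.value)  # COMPLETED
```

Bounding the number of polls keeps a stuck run from blocking the caller indefinitely.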
Query
Engineers and AI agents query the same context graph and catalog from any surface. The query layer exposes four access patterns, all backed by the same underlying graph and catalog.
| Surface | Use case |
|---|---|
| MCP server | AI agents (Cursor, Copilot, Claude, BYO) query the context graph directly |
| API | Programmatic access to the graph, catalog, and SQL proxy |
| CLI | Terminal-based queries, Spark submission, lake management |
| SDK | Custom integrations and automated workflows |
The SQL proxy lets you query BigQuery, Snowflake, and other warehouses directly via DuckDB, without leaving oleander. Queries are attributed to the workflow that triggered them and appear in the context graph.