Architecture - oleander

oleander is built in three layers: a telemetry ingestion layer that collects OpenLineage and OpenTelemetry events from your stack; a managed lake that stores telemetry and user data in open formats; and a context graph that unifies runs, datasets, costs, and lineage into a single versioned, queryable object model with built-in time travel and a knowledge layer that learns from your data team.

Serverless Spark runs on oleander’s managed infrastructure: isolated and fully provisioned, with no clusters to configure. Write to the Iceberg catalog from a Spark job, or trigger a job on a dataset refresh, a partition write, or an upstream change via a webhook. Every read and write captures lineage automatically in real time, so agents always have the full picture when investigating failures, optimizing pipelines, or writing code.

Telemetry

oleander collects telemetry via OpenLineage and OpenTelemetry, the two open standards already running in your stack. No agents are deployed in your environment. No direct access to your infrastructure is required. Point your existing data tools at oleander’s ingest endpoint to start collecting telemetry. Every event, log, and trace is written to oleander.telemetry.

Standard	What it captures
`OpenLineage`	Job runs, dataset inputs and outputs, lineage edges, schema versions
`OpenTelemetry`	Spans, traces, logs, and resource attributes from pipeline execution
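
As a rough illustration of what an OpenLineage run event carries, the sketch below builds one as a plain Python dict using only the standard library. The field names follow the public OpenLineage run event shape; the producer URL, the spec version in the schema URL, and the job and dataset names are assumptions made for this example, and the step of sending the payload to oleander’s ingest endpoint is left out because the endpoint details are not given here.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical values for illustration only.
PRODUCER = "https://example.com/my-pipeline"  # identifies the emitting tool
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"  # assumed spec version


def build_run_event(event_type, job_namespace, job_name, inputs, outputs, run_id=None):
    """Build an OpenLineage-style run event as a plain dict.

    inputs and outputs are lists of (namespace, name) tuples.
    """
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
    }


# A failed run of a hypothetical job that reads one table and writes another.
event = build_run_event(
    "FAIL",
    job_namespace="airflow",
    job_name="daily_orders",
    inputs=[("warehouse", "raw.orders")],
    outputs=[("warehouse", "analytics.orders_daily")],
)
payload = json.dumps(event)  # JSON body an OpenLineage client would send
```

In practice an OpenLineage integration for your scheduler or engine emits these events for you; the sketch only shows which fields give rise to job runs, dataset inputs and outputs, and lineage edges.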

Catalog

Every oleander organization is provisioned a fully managed Iceberg catalog named oleander.default. It exposes a standard Iceberg REST endpoint, making it natively compatible with Spark, DuckDB, and any Iceberg-aware tool. The catalog contains two namespaces out of the box, and you can also register an external catalog of your own:

Namespace	Purpose
`oleander.telemetry`	Telemetry data written automatically: `run_events`, `traces`, `logs`
`oleander.default`	Your data, queryable and writable as Iceberg
`<your catalog>`	Register an external catalog (AWS S3 Tables or Snowflake Horizon) and query it alongside your data. For more details, see Catalogs.

For example, register a Snowflake Horizon catalog and join pipeline failures against your customers table to understand which customers were impacted by a critical outage and may be owed credits.

SELECT
  r.job_name,
  r.event_time,
  r.error_message,
  c.customer_name,
  c.tier,
  c.sla_threshold_minutes
FROM oleander.telemetry.run_events AS r
JOIN snowflake.my_database.customers AS c
  ON r.job_name = c.pipeline_name
WHERE r.event_type = 'FAIL'
  AND r.event_time > NOW() - INTERVAL '24 hours'
  AND c.tier = 'enterprise'
ORDER BY r.event_time DESC;

Graph

oleander reads from the catalog and builds a versioned context graph: a unified object model where every run, dataset, schema, query, cost, and lineage edge is a first-class, interconnected node. The graph is updated automatically after every execution. You can query it as it exists now or as it existed at any point in time, diff any two runs, and trace exactly what changed and when.

Object	What it contains
Runs	Every DAG run, Spark job, and dbt model: `duration`, `status`, `logs`, `lineage`
Datasets	Tables and schemas versioned at every write, with upstream and downstream lineage
Costs	Every warehouse query attributed to the workflow and dataset
Changes	Schema diffs and PR-level impact before they cascade downstream
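
To make the idea of a versioned node concrete, here is a minimal conceptual sketch of time travel and diffing. It is not oleander’s implementation or API: the `VersionedNode` class, the `diff` helper, the integer timestamps, and the sample dataset are all hypothetical, chosen only to show how "as it existed at any point in time" can be answered from an append-only version history.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class VersionedNode:
    """A graph node that keeps every version of its properties."""

    node_id: str
    timestamps: list = field(default_factory=list)  # write times, ascending
    versions: list = field(default_factory=list)    # one property dict per write

    def write(self, ts, properties):
        # Assumes writes arrive in ascending timestamp order.
        self.timestamps.append(ts)
        self.versions.append(dict(properties))

    def as_of(self, ts):
        """Return the properties as they existed at time ts, or None."""
        i = bisect_right(self.timestamps, ts)
        return self.versions[i - 1] if i else None


def diff(old, new):
    """Return {key: (old_value, new_value)} for every key that changed."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Hypothetical dataset node written by two successive runs.
orders = VersionedNode("dataset:orders")
orders.write(100, {"schema": ["id", "amount"], "row_count": 10})
orders.write(200, {"schema": ["id", "amount", "currency"], "row_count": 12})

before = orders.as_of(150)     # state after the first write
after = orders.as_of(250)      # state after the second write
changes = diff(before, after)  # what changed between the two
```

Keeping every version append-only is what makes both point-in-time reads and run-to-run diffs cheap lookups rather than reconstructions.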

Knowledge

Every node in the context graph is versioned at the time it is written, and the graph maintains full history. The graph learns from two sources: telemetry captured automatically from every run, cost, and schema change; and annotations your team adds during investigations to record context that never appears in a log. Any node in the graph can be annotated. Annotations are attached to the node as memories and surfaced automatically the next time oleander, an agent, or an engineer investigates that pipeline, dataset, or cost.
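
The annotation flow described above can be pictured with a small sketch. This is a conceptual model only, not oleander’s API: `MemoryStore`, `Annotation`, the node ID format, and the sample notes are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    author: str
    note: str


class MemoryStore:
    """Keeps annotations keyed by the graph node they are attached to."""

    def __init__(self):
        self._by_node = defaultdict(list)

    def annotate(self, node_id, author, note):
        # Attach a note to a node; notes accumulate in insertion order.
        self._by_node[node_id].append(Annotation(author, note))

    def recall(self, node_id):
        """Return the annotations for a node, oldest first."""
        return list(self._by_node.get(node_id, []))


store = MemoryStore()
store.annotate("dataset:orders", "dana", "Upstream vendor feed is late on the first of each month.")
store.annotate("dataset:orders", "lee", "Row count drops are expected after the dedup job runs.")

# The next investigation of the same node sees both notes.
memories = store.recall("dataset:orders")
```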

Compute

oleander provides a managed compute environment that reads from and writes to the Iceberg catalog. Lineage and cost are attributed automatically on every read and write, with no configuration required.

Serverless Spark

Submit PySpark applications to fully managed infrastructure. No clusters to provision, no configuration required. Spark jobs read from and write to both oleander.default and external Iceberg catalogs. Lineage and cost flow into the context graph from the first run.

oleander spark submit my_job.py

Bring your own Spark cluster. Connect Databricks, EMR, or Dataproc to oleander’s Iceberg catalog endpoint, and telemetry, Spark execution plans, and lineage are captured in the context graph automatically.
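
For a bring-your-own cluster, pointing Spark at an Iceberg REST catalog usually comes down to a handful of Spark properties. The sketch below assembles them in Python and turns them into `--conf` flags. The property keys are the standard Apache Iceberg Spark catalog settings; the catalog name `oleander`, the endpoint URL, and the `OLEANDER_TOKEN` environment variable are placeholders assumed for this example, so check your oleander settings for the real values. The Iceberg Spark runtime JAR is assumed to be on the cluster classpath already, and telemetry configuration is not shown.

```python
import os


def iceberg_rest_conf(catalog_name, uri, token):
    """Return Spark properties that register an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.token": token,  # bearer token for the REST catalog
    }


def to_submit_flags(conf):
    """Flatten a property dict into spark-submit style --conf arguments."""
    flags = []
    for key, value in conf.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


# Placeholder endpoint and token source; both are assumptions.
conf = iceberg_rest_conf(
    catalog_name="oleander",
    uri="https://catalog.example.com",
    token=os.environ.get("OLEANDER_TOKEN", "<token>"),
)
flags = to_submit_flags(conf)
```

With a catalog registered under the name `oleander`, tables would be addressed from Spark SQL as `oleander.<namespace>.<table>`, matching the naming used in the query earlier on this page.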

Query

Engineers and agents query the same context graph and catalog from wherever they work.

Interface	What it does
MCP server	Coding agents (Cursor, Copilot, Claude, BYO) query the context graph directly
API	Programmatic access to the graph, catalog, and SQL proxy
CLI	Terminal-based queries, Spark submission, and lake management
SDK	Custom integrations and automated workflows
SQL	Query the telemetry lake and user tables directly

The SQL proxy lets you query BigQuery, Snowflake, and other warehouses directly. Queries are attributed to the workflow that triggered them and appear in the context graph.
