Lake

Overview

oleander’s lake provides a powerful environment for data analysis, collaboration, and discovery. Execute SQL queries, manage your tables, and contribute to the community with automatic lineage tracking and observability built-in. We use DuckLake as our open table format and DuckDB as the query engine. You can also bring your own compute to connect to your lake from any DuckDB terminal, PySpark application, or client library, or bring your own (Iceberg) catalog to use your existing Iceberg catalog within the oleander lake.

Portable

The lake is portable and can be used with any DuckDB environment. This means you can run queries, import and export data, and leverage all the features of oleander’s lake wherever DuckDB is supported—including on your local machine, in cloud environments, or as part of larger data workflows.

For full lineage coverage, always use the oleander lake API or the oleander SAAS (not a direct DuckDB connection) for queries. API usage ensures all operations are tracked and observable through the lake interface.Direct DuckDB connections are supported for power users; however, operations performed outside the API will not be traceable in lineage or workflows.

Automatic lineage

Every operation in oleander automatically captures lineage metadata using the OpenLineage specification. No manual configuration id required.

Serverless SQL execution

Run SQL queries directly in oleander’s lake environment. Whether you’re analyzing your own data or exploring public tables, the SQL interface provides:

Fast Query Execution - Execute queries with optimized performance
Interactive Results - View and export query results instantly
Query History - Access your past queries and results
Automatic Lineage - Every query automatically captures lineage metadata

Sync your data

There are several ways to sync your own data into oleander’s lake:

Managed upload

Use the dialog to sync parquet files directly into your lake

DuckDB import operations

Connect to your lake and run any DuckDB import operation to load data from various sources

-- Insert flower data from a garden API
INSERT INTO flowers
SELECT
  flower_data->>'name' AS name,
  flower_data->>'species' AS species,
  flower_data->>'color' AS color,
  (flower_data->>'evergreen')::BOOLEAN AS evergreen,
  (flower_data->>'petals')::INTEGER AS petals,
  flower_data->>'notes' AS notes
FROM read_json('https://api.garden.example.com/flowers', format='newline_delimited');

CREATE TABLE statements

Run CREATE TABLE statements to define and populate tables with your data

CREATE TABLE flowers AS
SELECT
  'oleander' AS name,
  'nerium' AS species,
  'pink' AS color,
  TRUE AS evergreen,
  5 AS petals,
  'toxic but beautiful' AS notes;

API-based queries

Use API-based queries to pull data from external sources directly into your lake (see example below)

S3 scheduled sync

Configure a task to import parquet files to your lake.

Access public tables

Explore and utilize public tables contributed by the oleander community. These tables are:

Curated - oleander staff ensures high quality datasets to work with
Indexed - Find interesting datasets contributed by other companies & individuals
Joinable - Easily join with your own data

Queries can be run on any url as well like below:

-- query data from any api
WITH
  src AS (
    SELECT
      *
    FROM
      read_json (
        'https://api.tvmaze.com/search/shows?q=pushing+daisies',
        auto_detect = TRUE
      )
  )
SELECT
  src -> 'show' ->> 'name' AS show_name,
  src -> 'show' ->> 'premiered' AS premiered,
  src -> 'show' -> 'rating' ->> 'average' AS rating,
  src -> 'show' -> 'network' ->> 'name' AS network_name,
  src -> 'show' -> 'image' ->> 'medium' AS image_medium
FROM
  src;

Scheduled queries with time-travel

Schedule queries to run automatically at specified intervals with time-travel capabilities. Time-travel queries allow you to write queries that reference specific versions or timestamps of your data, ensuring reproducibility and enabling historical analysis without needing to maintain separate historical copies of your data. Please note that these do incur additional storage as each version is kept to preserve time-traveling capabilities.

SELECT * 
FROM sch_nyc_taxi_daily 
AT (TIMESTAMP => now() - INTERVAL '24 HOURS');

Bring your own (Iceberg) catalog

We support adding Iceberg via S3 table buckets. This lets you use your existing Iceberg catalog within the oleander lake and consume the oleander compute layer.

This allows the oleander aws account to generate temporary credentials to access your catalog.

Create a role that has access to S3Tables. You can scope this down to specific tables.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3tables:*"
            ],
            "Resource": "*"
        },
        {
            "Sid": "PassRoleToS3TablesReplication",
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": [
                        "replication.s3tables.amazonaws.com"
                    ]
                }
            }
        }
    ]
}

Configure the trust relationship for an your IAM role with the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::579897423473:role/oleander-job-execution-role-dev"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Navigate to the lake settings and your aws account ID, role ARN, and bucket name. Next your catalog should be available listed on the lake page.

Bring your own compute

Connect to your oleander lake from any DuckDB terminal, PySpark application, or client library. You can easily connect to your lake in our settings here.

DuckDB terminal

Connect directly from the DuckDB CLI:

-- Set S3 credentials
SET s3_access_key_id='YOUR_ACCESS_KEY_ID';
SET s3_secret_access_key='YOUR_SECRET_ACCESS_KEY';
SET s3_region='YOUR_REGION';

-- Attach DuckLake catalog
ATTACH IF NOT EXISTS 'ducklake:postgres:dbname=YOUR_DB_NAME host=YOUR_HOST port=5432 user=YOUR_USER password=YOUR_PASSWORD' AS oleander
    (DATA_PATH 's3://oleander-lake/orgs/YOUR_ORG_ID/', CREATE_IF_NOT_EXISTS true);

-- Use the catalog
USE oleander;

-- Query your data
SELECT * FROM flowers LIMIT 10;

Once connected, you can query your lake data directly from your local environment, enabling powerful integrations with your existing data workflows and tools.

Client libraries

You can also connect using any DuckDb library, including the TypeScript one, like below:

import duckdb from "duckdb";

// Create or open a local DuckDB database (in-memory or file)
const db = new duckdb.Database(":memory:");

// Create a connection
const conn = db.connect();

// 1. Set S3 credentials
await conn.run(`
  SET s3_access_key_id='YOUR_ACCESS_KEY_ID';
  SET s3_secret_access_key='YOUR_SECRET_ACCESS_KEY';
  SET s3_region='YOUR_REGION';
`);

// 2. Attach DuckLake catalog
await conn.run(`
  ATTACH IF NOT EXISTS 'ducklake:postgres:dbname=YOUR_DB_NAME host=YOUR_HOST port=5432 user=YOUR_USER password=YOUR_PASSWORD' AS oleander
  (DATA_PATH 's3://oleander-lake/orgs/YOUR_ORG_ID/', CREATE_IF_NOT_EXISTS true);
`);

// 3. Switch to the catalog
await conn.run(`USE oleander;`);

// 4. Query the catalog
const result = await conn.all(`SELECT * FROM flowers;`);
console.log(result);

We provide you wish a serverless Postgres instance that serves as your data catalog. Our pricing model expects this Postgres instance to contain only metadata.

Getting started

Get started with oleander’s lake in just a few steps:

Navigate to the Lake in your oleander dashboard to access the SQL interface
Run a SQL query on a public table to get familiar with the interface:

SELECT * FROM public.iris_dataset LIMIT 10;

Sync your own data to start analyzing your tables. Create a new table with sample data:

CREATE TABLE flowers AS
SELECT
  'oleander' AS name,
  'nerium' AS species,
  'pink' AS color,
  TRUE AS evergreen,
  5 AS petals,
  'toxic but beautiful' AS notes;

Follow up with an insertion if you’d like as well:

INSERT INTO flowers (name, species, color, evergreen, petals, notes)
VALUES (
  'camellia',
  'camellia japonica',
  'white',
  TRUE,
  8,
  'winter-blooming, glossy leaves'
);

Explore lineage to see how data flows through your queries. Every operation automatically captures lineage metadata for complete observability.

The lake provides a complete environment for data analysis, from quick SQL queries to complex Spark jobs, all with observability built in.

FAQs

Why do clock time and query time differ?

Clock time and query time can differ for several reasons:

Clock time measures the total wall-clock time from when a query is submitted until it comes back to you. This also includes lineage extraction and creation.
Query time measures the actual execution time spent processing the query in the database engine running your query.

The difference typically occurs due to:

Network latency
Lineage extraction

You only pay for the query time.

Why doesn't the direct DuckDb connection instrument lineage?

Direct DuckDB connections bypass the oleander API layer, which is responsible for capturing and instrumenting lineage metadata. When you connect directly to DuckDB, your queries execute directly against the database engine without going through oleander’s observability infrastructure.To capture lineage:

Use the oleander lake API or the oleander SaaS interface for queries
All operations performed through the API are automatically tracked and observable
Lineage metadata is captured using the OpenLineage specification

Direct DuckDB connections are supported for power users who need direct database access, but they operate outside the observability layer. If you need lineage tracking for your queries, use the oleander API instead of a direct connection.

Get Started

Integrations

Features

Overview

Portable

Automatic lineage

Serverless SQL execution

Sync your data

Access public tables

Scheduled queries with time-travel

Bring your own (Iceberg) catalog

Bring your own compute

DuckDB terminal

Client libraries

Getting started

FAQs

Get Started

Integrations

Features

​Overview

​Portable

​Automatic lineage

​Serverless SQL execution

​Sync your data

​Access public tables

​Scheduled queries with time-travel

​Bring your own (Iceberg) catalog

​Bring your own compute

​DuckDB terminal

​Client libraries

​Getting started

​FAQs

Overview

Portable

Automatic lineage

Serverless SQL execution

Sync your data

Access public tables

Scheduled queries with time-travel

Bring your own (Iceberg) catalog

Bring your own compute

DuckDB terminal

Client libraries

Getting started

FAQs