Polars - oleander

Polars support in oleander is in beta. Observability (lineage, traces, logs) is not yet available for Polars workloads.

The oleander polars command runs Polars workloads against your Iceberg lake tables. It supports two modes:

Query mode — execute a Polars SQL expression across one or more tables
Script mode — run a Python script that uses the Polars API directly

By default, workloads run in an isolated Daytona sandbox. Pass --distributed to offload execution to Polars Cloud instead.

Query mode

Run a Polars SQL query against one or more lake tables. Register each table with --table.

oleander polars "SELECT * FROM flowers LIMIT 25" \
  --table flowers=default.flowers

Each --table value takes the form alias=namespace.table, where the alias is the name the query refers to. Repeat --table to register several tables:

oleander polars "SELECT f.species, m.mean_sepal_length FROM flowers f JOIN measurements m USING (species) LIMIT 50" \
  --table flowers=default.flowers \
  --table measurements=default.flower_measurements
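When the query and table list come from another program, the command line can be assembled programmatically. The sketch below is a minimal, hypothetical helper (build_query_command is not part of oleander). It relies only on the --table alias=namespace.table form shown above and on Python's standard shlex module for shell quoting.

```python
import shlex


def build_query_command(sql: str, tables: dict[str, str]) -> list[str]:
    """Build an argv list for `oleander polars` in query mode.

    `tables` maps an alias to a `namespace.table` identifier.
    This helper is illustrative only; it is not part of the oleander CLI.
    """
    argv = ["oleander", "polars", sql]
    for alias, identifier in tables.items():
        # One --table flag per registered table, in alias=namespace.table form.
        argv += ["--table", f"{alias}={identifier}"]
    return argv


argv = build_query_command(
    "SELECT * FROM flowers LIMIT 25",
    {"flowers": "default.flowers"},
)

# shlex.join quotes the SQL string so the line is safe to paste into a shell.
print(shlex.join(argv))
```

Passing the list form (argv) to subprocess.run avoids shell quoting altogether; the joined string is mainly useful for logging or copy-pasting.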

Script mode

Pass a Python file with --script. oleander injects the Polars runtime, authentication, and catalog, so your script only needs to assign its output to result.

oleander polars --script ./my_analysis.py \
  --param table=default.flowers \
  --param limit=100

Python template

The following names are available in your script without any imports:

Variable	Description
`pl`	The `polars` module
`scan(table)`	Returns a `LazyFrame` for the given `namespace.table` with credentials already wired in
`params`	`dict` of values passed via `--param key=value`
`catalog`	The underlying pyiceberg catalog (advanced use)

Your script must assign its output to result, which must be a Polars LazyFrame or DataFrame. Do not call .collect() or .remote() yourself; oleander handles execution.

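# my_analysis.py
# `pl`, `scan`, and `params` are injected by oleander; no imports are needed.

# Read parameters passed with --param, falling back to defaults.
table = params.get("table", "default.flowers")
limit = int(params.get("limit", 50))  # convert explicitly to an integer

# Get a LazyFrame for the lake table.
flowers = scan(table)

# Per-species row count and mean measurements, largest mean sepal length first.
result = (
    flowers.group_by("species")
    .agg(
        pl.len().alias("count"),
        pl.col("sepal_length").mean().round(2).alias("avg_sepal_length"),
        pl.col("petal_length").mean().round(2).alias("avg_petal_length"),
        pl.col("sepal_width").mean().round(2).alias("avg_sepal_width"),
    )
    .sort("avg_sepal_length", descending=True)
    .head(limit)
)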

Run it:

oleander polars --script ./my_analysis.py \
  --param table=default.flowers \
  --param limit=25
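Since the script runs in a sandbox, it can help to catch a missing result assignment locally first. The following is a rough sketch of such a check, not a description of how oleander executes scripts. The helper check_assigns_result and the stand-in values are hypothetical, and the pl and scan stand-ins would need to be replaced with real objects to exercise actual Polars code. It also assumes that --param values arrive as strings, which is consistent with the int(...) conversion in the template above.

```python
def check_assigns_result(source: str, namespace: dict) -> object:
    """Execute script source with stand-in globals and return its `result`.

    Raises ValueError if the script never assigns `result`.
    """
    scope = dict(namespace)  # copy so the caller's dict is not mutated
    exec(compile(source, "<script>", "exec"), scope)
    if "result" not in scope:
        raise ValueError("script did not assign `result`")
    return scope["result"]


# Stand-ins for the injected names; assumed shapes, for local checking only.
stand_ins = {
    "pl": None,                     # replace with the real `polars` module
    "scan": lambda table: [table],  # replace with something returning a LazyFrame
    "params": {"table": "default.flowers", "limit": "25"},
}

good_script = 'result = scan(params.get("table", "default.flowers"))'
bad_script = 'frame = scan(params["table"])'  # forgets to assign `result`

print(check_assigns_result(good_script, stand_ins))

try:
    check_assigns_result(bad_script, stand_ins)
except ValueError as err:
    print(err)
```

To check a real file, read it with pathlib.Path("my_analysis.py").read_text() and pass the text as source.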

Saving results

Use --save to write the result to a table in your lake. The destination is a namespace.table identifier.

oleander polars --script ./my_analysis.py --save default.flower_summary

By default, --save overwrites the destination table. Pass --save-mode append to append to an existing table instead:

oleander polars --script ./my_analysis.py \
  --save default.flower_summary \
  --save-mode append

Distributed execution

Pass --distributed to run the workload on Polars Cloud instead of in the local sandbox. Use --instance-type and --cluster-size to control the compute.

oleander polars --script ./my_analysis.py \
  --distributed \
  --instance-type t4g.large \
  --cluster-size 4

Distributed execution incurs Polars Cloud compute costs in addition to your oleander usage. Large --cluster-size values or long-running scripts can accumulate significant charges. Start with the default instance type and a small cluster size, then scale up as needed.
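As a back-of-the-envelope illustration of why cluster size and runtime matter, compute cost scales roughly with nodes times hours times the per-node hourly rate. The sketch below makes that arithmetic explicit. The rate is a made-up placeholder, not a Polars Cloud price, and the linear model is an assumption; check actual pricing before relying on any figure.

```python
def estimate_compute_cost(cluster_size: int, hours: float, hourly_rate: float) -> float:
    """Rough estimate: nodes x hours x per-node hourly rate (assumed linear)."""
    if cluster_size < 1 or hours < 0 or hourly_rate < 0:
        raise ValueError("cluster_size must be >= 1; hours and rate must be >= 0")
    return cluster_size * hours * hourly_rate


PLACEHOLDER_RATE = 0.10  # hypothetical per-node hourly rate, not a real price

small = estimate_compute_cost(cluster_size=2, hours=1.5, hourly_rate=PLACEHOLDER_RATE)
large = estimate_compute_cost(cluster_size=16, hours=1.5, hourly_rate=PLACEHOLDER_RATE)

# Same runtime, eight times the nodes: eight times the estimated cost.
print(f"2 nodes: {small:.2f}, 16 nodes: {large:.2f}")
```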

Options

Option	Description
`--script <file>`	Path to a Python script (activates script mode)
`--table <alias=ns.table>`	Register a table for query mode. Repeatable.
`--param <key=value>`	Pass a parameter to the script via `params`. Repeatable.
`--distributed`	Run on Polars Cloud instead of a local sandbox
`--instance-type <type>`	Polars Cloud instance type (default: `t4g.medium`)
`--cluster-size <n>`	Number of nodes for distributed execution
`--vcpus <n>`	Sandbox vCPU tier for local execution: `2`, `4`, `8`, `16`, or `32`
`--save <ns.table>`	Write results to a lake table
`--save-mode <mode>`	`overwrite` (default) or `append`
`--catalog <name>`	Catalog name (default: `oleander`)
