Polars support in oleander is in beta. Observability (lineage, traces, logs) is not yet available for Polars workloads.
The oleander polars command runs Polars workloads against your Iceberg lake tables. It supports two modes:
- Query mode — execute a Polars SQL expression across one or more tables
- Script mode — run a Python script that uses the Polars API directly
By default, workloads run in an isolated Daytona sandbox. Pass --distributed to offload execution to Polars Cloud instead.
Query mode
Run a Polars SQL query against one or more lake tables. Register each table with --table.
oleander polars "SELECT * FROM flowers LIMIT 25" \
  --table flowers=default.flowers
Use alias=namespace.table to register a table under a short alias. Multiple --table flags are supported:
oleander polars "SELECT f.species, m.mean_sepal_length FROM flowers f JOIN measurements m USING (species) LIMIT 50" \
  --table flowers=default.flowers \
  --table measurements=default.flower_measurements
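When the table list is assembled programmatically, the repeated --table flags can be generated from a mapping. The following is a minimal Python sketch, not part of oleander: the helper name build_query_command is hypothetical, and it relies only on the --table alias=namespace.table form shown above. It checks that each target looks like namespace.table before building the argument list.

```python
import shlex


def build_query_command(sql: str, tables: dict[str, str]) -> list[str]:
    """Build an argv list for `oleander polars` in query mode.

    `tables` maps alias -> "namespace.table". This helper is hypothetical;
    only the `--table alias=namespace.table` flag comes from the docs.
    """
    cmd = ["oleander", "polars", sql]
    for alias, target in tables.items():
        # Split on the first dot: everything before it is the namespace.
        namespace, sep, table = target.partition(".")
        if not (alias and namespace and sep and table):
            raise ValueError(f"expected alias=namespace.table, got {alias}={target}")
        cmd += ["--table", f"{alias}={target}"]
    return cmd


cmd = build_query_command(
    "SELECT * FROM flowers LIMIT 25",
    {"flowers": "default.flowers"},
)
# Print a shell-quoted version for inspection or copy-pasting.
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True), which avoids shell-quoting problems with the SQL string.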
Script mode
Pass a Python file with --script. oleander injects the Polars runtime, authentication, and catalog, so your script only needs to assign its output to result.
oleander polars --script ./my_analysis.py \
  --param table=default.flowers \
  --param limit=100
Python template
The following variables are available in your script without any imports:
| Variable | Description |
|---|---|
| pl | The polars module |
| scan(table) | Returns a LazyFrame for the given namespace.table with credentials already wired in |
| params | dict of values passed via --param key=value |
| catalog | The underlying pyiceberg catalog (advanced use) |
Your script must assign its output to result — a Polars LazyFrame or DataFrame. Do not call .collect() or .remote() yourself; oleander handles execution.
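# my_analysis.py
# pl, scan, and params are injected by oleander; no imports are needed.
table = params.get("table", "default.flowers")
limit = int(params.get("limit", 50))  # cast explicitly so the value is an int

flowers = scan(table)  # LazyFrame over the lake table

# Per-species row count and mean measurements, largest mean sepal length first.
result = (
    flowers.group_by("species")
    .agg(
        pl.len().alias("count"),
        pl.col("sepal_length").mean().round(2).alias("avg_sepal_length"),
        pl.col("petal_length").mean().round(2).alias("avg_petal_length"),
        pl.col("sepal_width").mean().round(2).alias("avg_sepal_width"),
    )
    .sort("avg_sepal_length", descending=True)
    .head(limit)
)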
Run it:
oleander polars --script ./my_analysis.py \
  --param table=default.flowers \
  --param limit=25
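Because a script relies on injected names rather than imports, it cannot be run directly with python. For local experimentation, one option is a small harness that supplies stand-ins for those names. The sketch below is an assumption about how such injection could be emulated, not a description of oleander's runtime: run_script and parse_params are hypothetical helpers, and it assumes --param values arrive as strings, which the int(...) conversion in the template suggests.

```python
import runpy
from typing import Any, Callable


def parse_params(pairs: list[str]) -> dict[str, str]:
    """Turn ["key=value", ...] into a dict, keeping values as strings."""
    params = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if not key or not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key] = value
    return params


def run_script(path: str, *, pl: Any, scan: Callable[[str], Any],
               params: dict[str, str], catalog: Any = None) -> Any:
    """Execute a script with stand-in globals and return its `result`."""
    namespace = runpy.run_path(
        path,
        init_globals={"pl": pl, "scan": scan, "params": params, "catalog": catalog},
    )
    if "result" not in namespace:
        raise RuntimeError(f"{path} did not assign `result`")
    return namespace["result"]
```

With Polars installed locally, passing the polars module as pl and a scan stand-in that returns a small LazyFrame lets you confirm that the script assigns result before submitting it. The harness never collects the frame, so call .collect() on the returned value yourself when testing locally.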
Saving results
Use --save to write the result to a table in your lake. The destination is a namespace.table identifier.
oleander polars --script ./my_analysis.py --save default.flower_summary
Append to an existing table instead of replacing it:
oleander polars --script ./my_analysis.py \
  --save default.flower_summary \
  --save-mode append
Distributed execution
Pass --distributed to run the workload on Polars Cloud instead of in the local sandbox. Use --instance-type and --cluster-size to control the compute.
oleander polars --script ./my_analysis.py \
  --distributed \
  --instance-type t4g.large \
  --cluster-size 4
Distributed execution incurs Polars Cloud compute costs in addition to your oleander usage. Large --cluster-size values or long-running scripts can accumulate significant charges. Start with the default instance type and a small cluster size, then scale up as needed.
Options
| Option | Description |
|---|---|
| --script <file> | Path to a Python script (activates script mode) |
| --table <alias=ns.table> | Register a table for query mode. Repeatable. |
| --param <key=value> | Pass a parameter to the script via params. Repeatable. |
| --distributed | Run on Polars Cloud instead of a local sandbox |
| --instance-type <type> | Polars Cloud instance type (default: t4g.medium) |
| --cluster-size <n> | Number of nodes for distributed execution |
| --vcpus <n> | Sandbox vCPU tier for local execution: 2, 4, 8, 16, or 32 |
| --save <ns.table> | Write results to a lake table |
| --save-mode <mode> | overwrite (default) or append |
| --catalog <name> | Catalog name (default: oleander) |
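The value constraints listed in the options table can be checked before a run is submitted. The following Python sketch is illustrative only: check_options is a hypothetical helper, the allowed values are copied from the table rather than queried from the CLI, and the requirement that --cluster-size be at least 1 is an assumption.

```python
ALLOWED_VCPUS = {2, 4, 8, 16, 32}             # from the --vcpus row
ALLOWED_SAVE_MODES = {"overwrite", "append"}  # from the --save-mode row


def check_options(*, vcpus=None, save=None, save_mode=None, cluster_size=None):
    """Return a list of human-readable problems; empty means none were found."""
    problems = []
    if vcpus is not None and vcpus not in ALLOWED_VCPUS:
        problems.append(f"--vcpus must be one of {sorted(ALLOWED_VCPUS)}, got {vcpus}")
    if save_mode is not None and save_mode not in ALLOWED_SAVE_MODES:
        problems.append(f"--save-mode must be overwrite or append, got {save_mode!r}")
    if save is not None:
        # The destination should look like namespace.table.
        namespace, sep, table = save.partition(".")
        if not (namespace and sep and table):
            problems.append(f"--save expects namespace.table, got {save!r}")
    # Assumption: a cluster needs at least one node.
    if cluster_size is not None and cluster_size < 1:
        problems.append(f"--cluster-size must be a positive integer, got {cluster_size}")
    return problems


print(check_options(vcpus=3, save="flower_summary", save_mode="append"))
```

Returning a list instead of raising on the first problem lets a wrapper report every issue at once.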