Polars support is in beta. Observability (lineage, traces, logs) is not yet available for Polars workloads.
The oleander polars command runs Polars workloads against your Iceberg lake tables. It supports two modes:
  • Query mode — execute a Polars SQL expression across one or more tables
  • Script mode — run a Python script that uses the Polars API directly
By default, workloads run in an isolated Daytona sandbox. Pass --distributed to offload execution to Polars Cloud.

Query mode

Run a Polars SQL query against one or more lake tables. Register each table with --table.
oleander polars "SELECT * FROM flowers LIMIT 25" \
  --table flowers=default.flowers
Use alias=namespace.table to register a table under a short alias. Multiple --table flags are supported:
oleander polars "SELECT f.species, m.mean_sepal_length FROM flowers f JOIN measurements m USING (species) LIMIT 50" \
  --table flowers=default.flowers \
  --table measurements=default.flower_measurements
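When the command is launched from Python automation rather than typed in a shell, building the argument list directly avoids shell-quoting the SQL string. The sketch below uses only the --table flag described above; build_query_command is a hypothetical helper, not part of oleander, and its format checks are this helper's own assumptions about what a valid alias=namespace.table pair looks like.

```python
from typing import Dict, List


def build_query_command(query: str, tables: Dict[str, str]) -> List[str]:
    """Return an argv list for `oleander polars` in query mode.

    `tables` maps an alias to a namespace.table identifier.
    """
    argv = ["oleander", "polars", query]
    for alias, target in tables.items():
        # Split on the last dot so the final part is the table name.
        namespace, dot, table = target.rpartition(".")
        if not alias or "=" in alias:
            raise ValueError(f"invalid alias: {alias!r}")
        if not (namespace and dot and table):
            raise ValueError(f"expected namespace.table, got {target!r}")
        argv += ["--table", f"{alias}={target}"]
    return argv


cmd = build_query_command(
    "SELECT * FROM flowers LIMIT 25",
    {"flowers": "default.flowers"},
)
print(cmd)
```

The resulting list can be handed to subprocess.run(cmd, check=True); because no shell is involved, quotes and parentheses inside the query need no escaping.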

Script mode

Pass a Python file with --script. oleander injects the Polars runtime, authentication, and catalog, so your script only needs to assign its output to a variable named result.
oleander polars --script ./my_analysis.py \
  --param table=default.flowers \
  --param limit=100
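Values passed with --param reach the script through the params dict described in the Python template section. The int(...) cast in that template suggests they arrive as strings; that is an assumption here, not documented behavior. A small helper along these lines can coerce and validate a value at the top of a script (int_param is a hypothetical name, not something oleander provides):

```python
def int_param(params, name, default, minimum=1):
    """Read an integer parameter, falling back to `default` when it is absent."""
    raw = params.get(name, default)
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise ValueError(f"--param {name} must be an integer, got {raw!r}") from None
    if value < minimum:
        raise ValueError(f"--param {name} must be >= {minimum}, got {value}")
    return value


# Stand-in for the injected dict, as if the script were run with
# --param table=default.flowers --param limit=100.
# In a real script, `params` already exists and this line is not needed.
params = {"table": "default.flowers", "limit": "100"}

limit = int_param(params, "limit", default=50)
fallback = int_param({}, "limit", default=50)  # key absent, default used
```

Failing early with a clear message is cheaper than discovering a bad value after the workload has started.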

Python template

The following variables are available in your script without any imports:
Variable       Description
pl             The polars module
scan(table)    Returns a LazyFrame for the given namespace.table with credentials already wired in
params         dict of values passed via --param key=value
catalog        The underlying pyiceberg catalog (advanced use)
Your script must assign its output to result — a Polars LazyFrame or DataFrame. Do not call .collect() or .remote() yourself; oleander handles execution.
# my_analysis.py

table = params.get("table", "default.flowers")
limit = int(params.get("limit", 50))

flowers = scan(table)

result = (
    flowers.group_by("species")
    .agg(
        pl.len().alias("count"),
        pl.col("sepal_length").mean().round(2).alias("avg_sepal_length"),
        pl.col("petal_length").mean().round(2).alias("avg_petal_length"),
        pl.col("sepal_width").mean().round(2).alias("avg_sepal_width"),
    )
    .sort("avg_sepal_length", descending=True)
    .head(limit)
)
Run it:
oleander polars --script ./my_analysis.py \
  --param table=default.flowers \
  --param limit=25
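The script contract (names available without imports, output assigned to result) can be pictured with a short stand-alone sketch. This is not how oleander is implemented: run_script and FakeFrame are hypothetical names used only to illustrate the idea, and exec stands in for whatever mechanism the sandbox actually uses.

```python
def run_script(source, *, pl, scan, params, catalog=None):
    """Execute script source with the documented names pre-bound and return `result`."""
    namespace = {"pl": pl, "scan": scan, "params": dict(params), "catalog": catalog}
    exec(compile(source, "<script>", "exec"), namespace)
    if "result" not in namespace:
        raise RuntimeError("script did not assign `result`")
    return namespace["result"]


class FakeFrame:
    """Stub that records the table name and row limit instead of reading data."""

    def __init__(self, table, limit=None):
        self.table = table
        self.limit = limit

    def head(self, n):
        return FakeFrame(self.table, n)


script = '''
table = params.get("table", "default.flowers")
limit = int(params.get("limit", 50))
result = scan(table).head(limit)
'''

# FakeFrame doubles as `scan`: calling it with a table name returns a stub frame.
result = run_script(script, pl=None, scan=FakeFrame, params={"limit": "25"})
print(result.table, result.limit)  # prints: default.flowers 25
```

For a local dry run against sample data, the same pattern could be given the real polars module as pl and a scan function that returns a LazyFrame over a local file (for example via pl.scan_parquet), so that a script can be checked before it is submitted.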

Saving results

Use --save to write the result to a table in your lake. The destination is a namespace.table identifier.
oleander polars --script ./my_analysis.py --save default.flower_summary
By default, --save overwrites the destination table. Pass --save-mode append to add rows to an existing table instead:
oleander polars --script ./my_analysis.py \
  --save default.flower_summary \
  --save-mode append

Distributed execution

Pass --distributed to run the workload on Polars Cloud instead of in the local sandbox. Use --instance-type and --cluster-size to control the compute.
oleander polars --script ./my_analysis.py \
  --distributed \
  --instance-type t4g.large \
  --cluster-size 4
Distributed execution incurs Polars Cloud compute costs in addition to your oleander usage. Large --cluster-size values or long-running scripts can accumulate significant charges. Start with the default instance type (t4g.medium) and a small cluster size, then scale up as needed.

Options

Option                      Description
--script <file>             Path to a Python script (activates script mode)
--table <alias=ns.table>    Register a table for query mode. Repeatable.
--param <key=value>         Pass a parameter to the script via params. Repeatable.
--distributed               Run on Polars Cloud instead of a local sandbox
--instance-type <type>      Polars Cloud instance type (default: t4g.medium)
--cluster-size <n>          Number of nodes for distributed execution
--vcpus <n>                 Sandbox vCPU tier for local execution: 2, 4, 8, 16, or 32
--save <ns.table>           Write results to a lake table
--save-mode <mode>          overwrite (default) or append
--catalog <name>            Catalog name (default: oleander)
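Several options accept only a fixed set of values. When script runs are launched from automation, those values can be checked before the command is issued. The sketch below covers sandbox runs only and uses the values listed in the table; script_command is a hypothetical helper, and rejecting --save-mode without --save is this helper's own policy, not documented CLI behavior.

```python
from typing import List, Optional

VCPU_TIERS = (2, 4, 8, 16, 32)        # values listed for --vcpus
SAVE_MODES = ("overwrite", "append")  # values listed for --save-mode


def script_command(
    script: str,
    *,
    save: Optional[str] = None,
    save_mode: Optional[str] = None,
    vcpus: Optional[int] = None,
) -> List[str]:
    """Build argv for a sandbox script run, validating option values first."""
    if save_mode is not None and save_mode not in SAVE_MODES:
        raise ValueError(f"--save-mode must be one of {SAVE_MODES}, got {save_mode!r}")
    if save_mode is not None and save is None:
        # Helper policy (assumption): a save mode without a destination is a mistake.
        raise ValueError("--save-mode given without --save")
    if vcpus is not None and vcpus not in VCPU_TIERS:
        raise ValueError(f"--vcpus must be one of {VCPU_TIERS}, got {vcpus!r}")

    argv = ["oleander", "polars", "--script", script]
    if save is not None:
        argv += ["--save", save]
    if save_mode is not None:
        argv += ["--save-mode", save_mode]
    if vcpus is not None:
        argv += ["--vcpus", str(vcpus)]
    return argv


cmd = script_command(
    "./my_analysis.py",
    save="default.flower_summary",
    save_mode="append",
    vcpus=4,
)
print(cmd)
```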