# Polars

> Run Polars workloads against your lake — SQL queries or Python scripts — from the CLI.

<Warning>
  Polars is in **beta**. Observability (lineage, traces, logs) is not yet available for Polars workloads.
</Warning>

The `oleander polars` command runs Polars workloads against your Iceberg lake tables. It supports two modes:

* **Query mode**: execute a Polars SQL query against one or more tables
* **Script mode**: run a Python script that uses the Polars API directly

By default, workloads run in an isolated Daytona sandbox. Pass `--distributed` to offload execution to Polars Cloud instead.

## Query mode

Run a Polars SQL query against one or more lake tables. Register each table with `--table`.

```bash
oleander polars "SELECT * FROM flowers LIMIT 25" \
  --table flowers=default.flowers
```

Use `alias=namespace.table` to register a table under a short alias. Multiple `--table` flags are supported:

```bash
oleander polars "SELECT f.species, m.mean_sepal_length FROM flowers f JOIN measurements m USING (species) LIMIT 50" \
  --table flowers=default.flowers \
  --table measurements=default.flower_measurements
```
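When the table list is generated programmatically, the repeated `--table` flags can be assembled from a mapping. The following is a minimal sketch, assuming the CLI is driven from Python. `build_query_command` is a hypothetical helper, not part of oleander; it only assembles flags documented on this page and does not run anything.

```python
import shlex


def build_query_command(sql: str, tables: dict[str, str]) -> list[str]:
    """Build the argv for `oleander polars` in query mode.

    `tables` maps an alias to a `namespace.table` identifier.
    Hypothetical helper: it only assembles flags documented on this page.
    """
    argv = ["oleander", "polars", sql]
    for alias, identifier in tables.items():
        # Each table becomes its own repeated `--table alias=namespace.table` flag.
        argv += ["--table", f"{alias}={identifier}"]
    return argv


command = build_query_command(
    "SELECT * FROM flowers LIMIT 25",
    {"flowers": "default.flowers"},
)

# shlex.join quotes the SQL string so the line can be pasted into a shell.
print(shlex.join(command))
```

Passing the list form to `subprocess.run(command, check=True)` avoids shell quoting issues with the SQL string altogether.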

## Script mode

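Pass a Python file with `--script`. oleander injects the Polars runtime, authentication, and catalog, so your script only needs to assign its output to `result`. Values passed with `--param key=value` are available to the script through `params`.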

```bash
oleander polars --script ./my_analysis.py \
  --param table=default.flowers \
  --param limit=100
```
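The `--param` flags above reach the script as the `params` dict. The sketch below shows one way `key=value` pairs could map onto that dict. It assumes values arrive as strings: this page does not state the type, although the template that follows casts `limit` with `int(...)`. `parse_params` is a hypothetical helper for illustration, not oleander's implementation.

```python
def parse_params(pairs: list[str]) -> dict[str, str]:
    """Turn ["key=value", ...] into a dict of strings.

    Hypothetical helper; assumes values are passed through as strings.
    """
    params: dict[str, str] = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key] = value
    return params


params = parse_params(["table=default.flowers", "limit=100"])

# Under the string assumption, numeric parameters need an explicit cast.
limit = int(params.get("limit", 50))
```

Casting at the point of use, with a default in `params.get(...)`, keeps the script runnable when a parameter is omitted.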

### Python template

The following variables are available in your script without any imports:

| Variable      | Description                                                                             |
| ------------- | --------------------------------------------------------------------------------------- |
| `pl`          | The `polars` module                                                                     |
| `scan(table)` | Returns a `LazyFrame` for the given `namespace.table` with credentials already wired in |
| `params`      | `dict` of values passed via `--param key=value`                                         |
| `catalog`     | The underlying pyiceberg catalog (advanced use)                                         |

Your script must assign its output to `result` — a Polars `LazyFrame` or `DataFrame`. Do not call `.collect()` or `.remote()` yourself; oleander handles execution.

```python
# my_analysis.py

table = params.get("table", "default.flowers")
limit = int(params.get("limit", 50))

flowers = scan(table)

result = (
    flowers.group_by("species")
    .agg(
        pl.len().alias("count"),
        pl.col("sepal_length").mean().round(2).alias("avg_sepal_length"),
        pl.col("petal_length").mean().round(2).alias("avg_petal_length"),
        pl.col("sepal_width").mean().round(2).alias("avg_sepal_width"),
    )
    .sort("avg_sepal_length", descending=True)
    .head(limit)
)
```

Run it:

```bash
oleander polars --script ./my_analysis.py \
  --param table=default.flowers \
  --param limit=25
```
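Because the script relies on injected names rather than imports, it cannot be run directly with `python my_analysis.py`. One way to test it before submitting is to emulate the injection locally. This is a rough sketch under stated assumptions: `run_script` and `fake_scan` are hypothetical, the real runtime may inject names differently, and the demo uses plain Python lists in place of Polars frames so that it has no dependencies.

```python
def run_script(source, *, scan, params, pl=None, catalog=None):
    """Execute script source with the documented names pre-defined.

    Hypothetical local harness; it mirrors the variable table above
    (`pl`, `scan`, `params`, `catalog`) and returns the script's `result`.
    """
    namespace = {"pl": pl, "scan": scan, "params": dict(params), "catalog": catalog}
    exec(compile(source, "<script>", "exec"), namespace)
    if "result" not in namespace:
        raise NameError("script must assign its output to `result`")
    return namespace["result"]


# Stand-in data: a list per table instead of a LazyFrame.
FIXTURES = {"default.flowers": ["setosa", "versicolor", "virginica"]}


def fake_scan(table):
    """Stand-in for the injected scan(); looks up a local fixture."""
    return FIXTURES[table]


# A tiny stand-in script that follows the same contract as my_analysis.py.
script = """
limit = int(params.get("limit", 50))
result = scan(params["table"])[:limit]
"""

result = run_script(
    script,
    scan=fake_scan,
    params={"table": "default.flowers", "limit": "2"},
)
print(result)
```

To exercise a real script, pass the `polars` module as `pl` and a `scan` function that returns a `pl.LazyFrame` built from local fixture data, then call `.collect()` on the returned `result` in the test rather than in the script.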

## Saving results

Use `--save` to write the result to a table in your lake. The destination is a `namespace.table` identifier.

```bash
oleander polars --script ./my_analysis.py --save default.flower_summary
```

By default, `--save` overwrites the destination table. To append to an existing table instead, set `--save-mode append`:

```bash
oleander polars --script ./my_analysis.py \
  --save default.flower_summary \
  --save-mode append
```

## Distributed execution

Pass `--distributed` to run the workload on Polars Cloud instead of in the local sandbox. Use `--instance-type` and `--cluster-size` to control the compute.

```bash
oleander polars --script ./my_analysis.py \
  --distributed \
  --instance-type t4g.large \
  --cluster-size 4
```

<Warning>
  Distributed execution incurs Polars Cloud compute costs in addition to your oleander usage. Large `--cluster-size` values or long-running scripts can accumulate significant charges. Start with the default instance type and a small cluster size, then scale up as needed.
</Warning>

## Options

| Option                     | Description                                                         |
| -------------------------- | ------------------------------------------------------------------- |
| `--script <file>`          | Path to a Python script (activates script mode)                     |
| `--table <alias=ns.table>` | Register a table for query mode. Repeatable.                        |
| `--param <key=value>`      | Pass a parameter to the script via `params`. Repeatable.            |
| `--distributed`            | Run on Polars Cloud instead of a local sandbox                      |
| `--instance-type <type>`   | Polars Cloud instance type (default: `t4g.medium`)                  |
| `--cluster-size <n>`       | Number of nodes for distributed execution                           |
| `--vcpus <n>`              | Sandbox vCPU tier for local execution: `2`, `4`, `8`, `16`, or `32` |
| `--save <ns.table>`        | Write results to a lake table                                       |
| `--save-mode <mode>`       | `overwrite` (default) or `append`                                   |
| `--catalog <name>`         | Catalog name (default: `oleander`)                                  |
