
Documentation Index

Fetch the complete documentation index at: https://docs.oleander.dev/llms.txt

Use this file to discover all available pages before exploring further.

Run your Spark applications on oleander-managed infrastructure or on your own registered Spark clusters. Upload artifacts to oleander for the managed cluster, or keep jobs in your environment for registered clusters. Manage runs and capture lineage metadata for full observability of your data transformations.

Installation

Using Homebrew

Install the oleander CLI:
brew tap OleanderHQ/tap
brew install oleander-cli
Upgrade the oleander CLI:
brew update
brew upgrade oleander-cli

Configuration

Authenticate with your API key. Find it in your oleander settings.
oleander configure --api-key <YOUR_API_KEY>

Oleander Managed Spark

Artifact upload, listing, and deletion are available only on the oleander-managed cluster.

Initialize a PySpark workspace

Create a new PySpark job workspace:
oleander spark init <dirname>
Example:
oleander spark init my-job
You can also initialize the current directory:
oleander spark init .
The initialized workspace includes:
  • entrypoint.py as the Spark job entrypoint
  • mylib/ for Python modules packaged as pyFiles
  • pyproject.toml and uv.lock for project and dependency management with uv
  • Makefile targets for building deployable artifacts
Use uv to manage dependencies:
uv sync --dev
uv add <package>
uv add --dev <package>
Use make to build the deployment artifacts:
make
This builds:
  • out/pyfiles.zip
  • out/environment.tar.gz
You can also build artifacts individually, or force a rebuild:
make pyfiles
make environment
make rebuild
After building, upload and submit from the initialized workspace:
oleander spark jobs upload entrypoint.py \
  --py-files out/pyfiles.zip \
  --virtualenv out/environment.tar.gz
oleander spark jobs submit entrypoint.py \
  --namespace <namespace> \
  --name <job-name> \
  --wait
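For reference, a minimal entrypoint.py might look like the sketch below. The argument names (--input, --output), the paths, and the appName are illustrative assumptions, not part of the generated template; the pyspark import is deferred into main() so the module can be inspected without Spark installed.

```python
# entrypoint.py -- illustrative sketch, not the template generated by `oleander spark init`
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Example Spark job")
    parser.add_argument("--input", required=True, help="input data path (hypothetical)")
    parser.add_argument("--output", required=True, help="output data path (hypothetical)")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Deferred import so the module can be imported without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("my-job").getOrCreate()
    df = spark.read.parquet(args.input)              # read source data
    df.write.mode("overwrite").parquet(args.output)  # write transformed output
    spark.stop()


if __name__ == "__main__":
    main()
```

Pass job arguments at submit time with --args, as described under the submit options below.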

List your Spark artifacts

List your uploaded Spark artifacts:
oleander spark jobs list
This lists all Spark artifacts available to run.

Upload your Spark artifact

Upload a local .py or .jar artifact to oleander:
oleander spark jobs upload <your_artifact_path>
Example:
oleander spark jobs upload ./transformations/process_sales_data.py
JAR example:
oleander spark jobs upload ./jobs/process_sales_data.jar
Every upload creates a new artifact version on the backend.

Include Python dependencies

If your Python artifact needs additional Python modules, package them in a ZIP and include them with --py-files:
oleander spark jobs upload <your_artifact_path> --py-files <module_archive_zip>
Example:
oleander spark jobs upload ./etl_pipeline.py --py-files ./dependencies.zip
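If you are not using the Makefile from an initialized workspace, you can build the module archive yourself with the standard library. In the sketch below, the package directory name and output filename are assumptions; any ZIP whose entries mirror your import paths will work.

```python
import pathlib
import zipfile


def build_pyfiles_zip(src_dir: str, out_path: str) -> str:
    """Zip a Python package directory for use with --py-files.

    Entries are stored relative to the package's parent directory, so a
    package like mylib/ stays importable as `mylib` on the executors.
    """
    src = pathlib.Path(src_dir)
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for py in sorted(src.rglob("*.py")):
            zf.write(py, py.relative_to(src.parent))
    return out_path
```

After building, pass the archive to the upload command with --py-files.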

Include a virtual environment

If your Python artifact depends on a packaged virtual environment, include it with --virtualenv:
oleander spark jobs upload <your_artifact_path> --virtualenv <virtualenv_archive>
Example:
oleander spark jobs upload ./etl_pipeline.py --virtualenv ./venv.tar.gz
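Virtual environment archives for Spark are typically produced with a tool such as venv-pack or conda-pack, which also rewrite interpreter paths so the environment is relocatable. Purely to illustrate the expected layout (environment contents at the archive root, e.g. bin/python at the top level), here is a standard-library sketch; the directory and output names are assumptions, and it does not perform the path rewriting those tools do.

```python
import pathlib
import tarfile


def pack_environment(venv_dir: str, out_path: str) -> str:
    """Archive a virtual environment's contents at the tar root.

    Spark unpacks the archive into a working directory, so entries like
    bin/python must sit at the top level rather than under the venv name.
    """
    with tarfile.open(out_path, "w:gz") as tf:
        for item in sorted(pathlib.Path(venv_dir).iterdir()):
            tf.add(item, arcname=item.name)
    return out_path
```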

Delete a Spark artifact

Delete a Spark artifact:
oleander spark jobs delete <artifact_name>
Example:
oleander spark jobs delete process_sales_data.py
Use the exact uploaded artifact name, including the file extension.

Submit and execute a Spark job

Submit your uploaded artifact to the oleander-managed cluster. Use the exact uploaded filename without the path, such as process_sales_data.py or analytics-batch.jar. The --wait flag keeps the command running until the job finishes.
oleander spark jobs submit <entrypoint> --namespace <namespace> --name <run_name> --wait
Example:
oleander spark jobs submit process_sales_data.py --namespace finance --name process-sales-data --wait

Common submit options

  • --cluster: Cluster name. Defaults to the oleander-managed cluster when omitted.
  • --namespace (required): Namespace for the job, a logical group such as a team or project.
  • --name (required): Job name. Runs with the same namespace and name are grouped under the same job.
  • --args: Spark job entrypoint arguments.
  • --sparkConf: Spark configuration properties without the --conf prefix, for example spark.default.parallelism=8. Separate multiple properties with whitespace.
  • --packages: Extra package coordinates.
  • --jobTags: Job-specific tags in key=value form. Separate multiple tags with whitespace.
  • --runTags: Run-specific tags.
  • --wait: Wait until the job finishes.

Oleander-managed submit options

  • --driverMachineType: oleander Spark driver machine type.
  • --executorMachineType: oleander Spark executor machine type.
  • --executorNumbers: Number of executor instances.

Registered EMR Serverless Spark

Register your EMR Serverless cluster and target it by name when submitting jobs. Include --cluster <name> and pass an S3 URI to a .py or .jar artifact as the entrypoint.

Register an EMR Serverless cluster

oleander spark clusters register <name> \
  --type emr-serverless \
  --region <region> \
  --account-id <awsAccountId> \
  --controller-role-arn <controllerRoleArn> \
  --execution-role-arn <executionRoleArn> \
  --application-id <applicationId> \
  --log-bucket <logBucket>

Register options

  • --region: AWS region of the EMR Serverless application.
  • --account-id: AWS account ID of the EMR Serverless application.
  • --controller-role-arn: IAM role ARN oleander assumes to start job runs. Add this to the role’s trust policy so oleander can assume it:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::579897423473:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "oleander"
                }
            }
        }
    ]
}
Add this permissions policy to the controller role so oleander can run the job:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowStartJobRun",
            "Effect": "Allow",
            "Action": "emr-serverless:StartJobRun",
            "Resource": "arn:aws:emr-serverless:<REGION>:<ACCOUNT_ID>:/applications/<APPLICATION_ID>"
        },
        {
            "Sid": "AllowGetJobRun",
            "Effect": "Allow",
            "Action": "emr-serverless:GetJobRun",
            "Resource": "arn:aws:emr-serverless:<REGION>:<ACCOUNT_ID>:/applications/<APPLICATION_ID>/jobruns/*"
        },
        {
            "Sid": "PassExecutionRole",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<ACCOUNT_ID>:role/<JOB_EXECUTION_ROLE_NAME>"
        },
        {
            "Sid": "ReadLogFromS3",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<LOG_BUCKET_NAME>/*"
        }
    ]
}
  • --execution-role-arn: IAM role ARN the job uses; the Spark application runs with this role’s permissions.
  • --application-id: EMR Serverless application ID.
  • --log-bucket: S3 bucket for job logs.

Submit a job to EMR Serverless

oleander spark jobs submit <entrypoint_s3_uri> --cluster <cluster_name> --namespace <namespace> --name <run_name> --wait
Example:
oleander spark jobs submit s3://my-bucket/jobs/process_sales_data.py --cluster my-emr --namespace finance --name process-sales-data --wait

Submit options

  • --cluster (required): Name of the registered cluster.
  • --namespace (required): Namespace for the job, a logical group such as a team or project.
  • --name (required): Job name. Runs with the same namespace and name are grouped under the same job.
  • --args: Spark job entrypoint arguments.
  • --sparkConf: Spark configuration properties without the --conf prefix, for example spark.default.parallelism=8. Separate multiple properties with whitespace.
  • --packages: Extra package coordinates.
  • --jobTags: Job-specific tags in key=value form. Separate multiple tags with whitespace.
  • --runTags: Run-specific tags.
  • --executionIamPolicy: IAM policy for job permissions. Final permissions are the intersection of the job execution role and this policy.
  • --pyFiles: Extra pyFiles for the PySpark job. Mutually exclusive with --mainClass.
  • --virtualenv: Virtual environment archive for Python jobs.
  • --mainClass: Entrypoint main class for the Java/Scala Spark job. Use instead of Python-specific options such as --pyFiles and --virtualenv.
  • --wait: Wait until the job finishes.

Registered Glue Spark

Register your Glue cluster and target it by name when submitting jobs. Include --cluster <name> and pass the name of an existing Glue job in your environment as the entrypoint.

Register a Glue cluster

oleander spark clusters register <name> \
  --type glue \
  --controller-role-arn <controllerRoleArn>

Register options

  • --controller-role-arn: IAM role ARN oleander assumes to start job runs. Add this to the role’s trust policy so oleander can assume it:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::579897423473:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "oleander"
                }
            }
        }
    ]
}
Add this permissions policy to the controller role so oleander can run the job:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowStartAndJobRun",
            "Effect": "Allow",
            "Action": [
                "glue:StartJobRun",
                "glue:GetJobRun"
            ],
            "Resource": "arn:aws:glue:<REGION>:<ACCOUNT_ID>:job/*"
        },
        {
            "Sid": "PassExecutionRole",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<ACCOUNT_ID>:role/<JOB_EXECUTION_ROLE_NAME>"
        },
        {
            "Sid": "ReadGlueLogs",
            "Effect": "Allow",
            "Action": [
                "logs:DescribeLogStreams"
            ],
            "Resource": "arn:aws:logs:<REGION>:<ACCOUNT_ID>:log-group:/aws-glue/jobs/output"
        },
        {
            "Sid": "CloudWatchLogsGetLogEvents",
            "Effect": "Allow",
            "Action": [
                "logs:GetLogEvents"
            ],
            "Resource": "arn:aws:logs:<REGION>:<ACCOUNT_ID>:log-group:/aws-glue/jobs/output:log-stream:*"
        }
    ]
}

Submit a job to Glue

Use --cluster to select the registered cluster:
oleander spark jobs submit <job_name> --cluster <cluster_name> --namespace <namespace> --name <run_name> --wait
Example:
oleander spark jobs submit process-sales-data --cluster my-glue --namespace finance --name process-sales-data --wait

Submit options

  • --cluster (required): Name of the registered cluster.
  • --namespace (required): Namespace for the job, a logical group such as a team or project.
  • --name (required): Job name. Runs with the same namespace and name are grouped under the same job.
  • --args: Spark job entrypoint arguments.
  • --sparkConf: Spark configuration properties without the --conf prefix, for example spark.default.parallelism=8. Separate multiple properties with whitespace.
  • --packages: Extra package coordinates.
  • --jobTags: Job-specific tags in key=value form. Separate multiple tags with whitespace.
  • --runTags: Run-specific tags.
  • --executionIamPolicy: IAM policy for job permissions. Final permissions are the intersection of the job execution role and this policy.
  • --workerType: Glue worker type.
  • --numberOfWorkers: Number of Glue workers.
  • --enableAutoScaling: Set to true for auto scaling, false otherwise.
  • --executionClass: Glue execution class. Either STANDARD or FLEX.
  • --timeoutMinutes: Glue job timeout in minutes.
  • --wait: Wait until the job finishes.

When your Spark job runs, oleander captures OpenLineage metadata for lineage and dependencies. View results and the lineage graph in your oleander dashboard.
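As background, an OpenLineage run event is a small JSON document describing one state change of a run. The example below is an abridged, illustrative START event following the public OpenLineage spec; the runId, namespace, and job name are placeholders, and oleander's actual payloads may carry additional facets and input/output datasets.

```json
{
  "eventType": "START",
  "eventTime": "2024-01-01T00:00:00Z",
  "run": { "runId": "d46e465b-d358-4d32-83d4-df660ff614dd" },
  "job": { "namespace": "finance", "name": "process-sales-data" },
  "inputs": [],
  "outputs": [],
  "producer": "https://github.com/OpenLineage/OpenLineage",
  "schemaURL": "https://openlineage.io/spec/OpenLineage.json"
}
```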