Run your Spark applications on oleander-managed infrastructure or on your own registered clusters. For a full overview of Spark on oleander, see the Spark documentation.
Oleander-managed Spark
Initialize a PySpark workspace
Create a new PySpark job workspace:
oleander spark init <dirname>
Example:
oleander spark init my-job
You can also initialize the current directory:
oleander spark init .
The initialized workspace includes:
entrypoint.py as the Spark job entrypoint
mylib/ for Python modules packaged as pyFiles
pyproject.toml and uv.lock for project and dependency management with uv
Makefile targets for building deployable artifacts
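As a sketch of how the pieces fit together, a module under mylib/ typically holds pure-Python logic that entrypoint.py imports, so the transformation code ships inside out/pyfiles.zip and stays easy to unit test. The module and function names below are illustrative, not files generated by oleander spark init:

```python
# mylib/transforms.py -- hypothetical module packaged into out/pyfiles.zip.
# Keeping pure-Python logic out of entrypoint.py makes it testable without Spark.

def normalize_region(region: str) -> str:
    """Map a free-form region string to a canonical lowercase code."""
    return region.strip().lower().replace(" ", "-")


# entrypoint.py would then import and apply it, for example as a Spark UDF:
#   from mylib.transforms import normalize_region
#   normalize_udf = udf(normalize_region, StringType())

if __name__ == "__main__":
    print(normalize_region("  North America "))  # -> north-america
```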
Use uv to manage dependencies:
uv sync --dev
uv add <package>
uv add --dev <package>
Use make to build the deployment artifacts:
make
This builds:
out/pyfiles.zip
out/environment.tar.gz
You can also build individual artifacts, or force a full rebuild:
make pyfiles
make environment
make rebuild
After building, upload and submit from the initialized workspace:
oleander spark jobs upload entrypoint.py \
--py-files out/pyfiles.zip \
--virtualenv out/environment.tar.gz
oleander spark jobs submit entrypoint.py \
--namespace <namespace> \
--name <job-name> \
--wait
List your Spark artifacts
oleander spark jobs list
Upload a Spark artifact
Upload a local .py or .jar artifact to oleander:
oleander spark jobs upload <your_artifact_path>
Example:
oleander spark jobs upload ./transformations/process_sales_data.py
JAR example:
oleander spark jobs upload ./jobs/process_sales_data.jar
Every upload creates a new artifact version on the backend.
Upload options:
| Flag | Description |
|---|---|
| --namespace | Namespace used when uploading. Defaults to default. |
| --py-files | Local .zip or .egg dependency archive uploaded alongside a Python artifact |
| --virtualenv | Local virtual environment archive uploaded alongside a Python artifact |
| --dry-run | Show what would be uploaded without uploading anything |
Delete a Spark artifact
oleander spark jobs delete <artifact_name>
Use the exact uploaded artifact name, including the file extension.
Example:
oleander spark jobs delete process_sales_data.py
Submit a job
Submit an uploaded artifact to the oleander-managed cluster. Use the uploaded artifact name as the entrypoint.
oleander spark jobs submit <entrypoint> --namespace <namespace> --name <run_name> --wait
Example:
oleander spark jobs submit process_sales_data.py --namespace finance --name process-sales-data --wait
Common submit options:
| Flag | Description |
|---|---|
| --cluster | Cluster name. Defaults to the oleander-managed cluster when omitted |
| --namespace | Required job namespace |
| --name | Required job name |
| --args | Entrypoint arguments |
| --sparkConf | Spark configurations, for example spark.default.parallelism=8 |
| --packages | Extra package coordinates |
| --jobTags | Job-specific tags |
| --runTags | Run-specific tags |
| --wait | Block until the job finishes |
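Values passed via --args reach the entrypoint as ordinary command-line arguments. Assuming that forwarding behavior (verify it against your own submitted command), an entrypoint might parse them with argparse; the flag names --input and --output here are illustrative:

```python
# Hypothetical entrypoint argument handling: parse values forwarded via
# --args as regular command-line flags. Flag names are examples only.
import argparse


def parse_job_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="process_sales_data")
    parser.add_argument("--input", required=True, help="Input data path")
    parser.add_argument("--output", required=True, help="Output data path")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # In a real job you would pass sys.argv[1:] instead of a literal list.
    args = parse_job_args(["--input", "s3://bucket/in", "--output", "s3://bucket/out"])
    print(args.input, args.output)
```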
Oleander-managed submit options:
| Flag | Description |
|---|---|
| --driverMachineType | Driver machine type |
| --executorMachineType | Executor machine type |
| --executorNumbers | Number of executor instances |
Registered clusters
List registered clusters
oleander spark clusters list
Registered EMR Serverless
Register your EMR Serverless cluster, then target it by name when submitting jobs.
Register a cluster
oleander spark clusters register <name> \
--type emr-serverless \
--region <region> \
--account-id <aws_account_id> \
--controller-role-arn <controller_role_arn> \
--execution-role-arn <execution_role_arn> \
--application-id <application_id> \
--log-bucket <log_bucket>
See the Spark documentation for IAM policy details.
Submit a job
Use an EMR-compatible entrypoint, such as an S3 URI to a .py or .jar artifact.
oleander spark jobs submit <entrypoint> \
--cluster <cluster_name> \
--namespace <namespace> \
--name <run_name> \
--wait
Example:
oleander spark jobs submit s3://my-bucket/jobs/process_sales_data.py --cluster my-emr --namespace finance --name process-sales-data --wait
Additional EMR Serverless flags:
| Flag | Description |
|---|---|
| --pyFiles | Zip archive of Python dependencies |
| --virtualenv | Virtual environment archive for Python jobs |
| --mainClass | Main class for JVM jobs; use instead of Python-specific options such as --pyFiles and --virtualenv |
| --executionIamPolicy | IAM policy applied to execution |
Registered Glue
Register your Glue cluster, then target it by name.
Register a cluster
oleander spark clusters register <name> \
--type glue \
--controller-role-arn <controller_role_arn>
See the Spark documentation for IAM policy details.
Submit a job
For Glue, the entrypoint is the Glue job name.
oleander spark jobs submit <job_name> --cluster <cluster_name> --namespace <namespace> --name <run_name> --wait
Additional Glue flags:
| Flag | Description |
|---|---|
| --workerType | Glue worker type |
| --numberOfWorkers | Number of Glue workers |
| --enableAutoScaling | Enable Glue auto scaling |
| --timeoutMinutes | Timeout in minutes |
| --executionClass | Execution class, such as STANDARD or FLEX |
| --executionIamPolicy | IAM policy applied to execution |
For Glue jobs, --args must be passed as alternating key-value pairs, for example --args source s3://bucket/in target s3://bucket/out.
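Inside the job, alternating key-value pairs of that shape can be folded into a dictionary. A minimal sketch (the pairing convention comes from the note above; the helper name is illustrative):

```python
# Fold alternating key-value arguments, as passed via --args for Glue jobs
# (e.g. ["source", "s3://bucket/in", "target", "s3://bucket/out"]), into a dict.

def pairs_to_dict(args: list[str]) -> dict[str, str]:
    if len(args) % 2 != 0:
        raise ValueError("expected alternating key-value pairs")
    # Even-indexed entries are keys, odd-indexed entries are values.
    return dict(zip(args[0::2], args[1::2]))


if __name__ == "__main__":
    opts = pairs_to_dict(["source", "s3://bucket/in", "target", "s3://bucket/out"])
    print(opts["source"], opts["target"])
```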