Run your Spark applications on oleander-managed infrastructure or on your own registered clusters. For a full overview of Spark on oleander, see the Spark documentation.

Oleander-managed Spark

Initialize a PySpark workspace

Create a new PySpark job workspace:
oleander spark init <dirname>
Example:
oleander spark init my-job
You can also initialize the current directory:
oleander spark init .
The initialized workspace includes:
  • entrypoint.py as the Spark job entrypoint
  • mylib/ for Python modules packaged as pyFiles
  • pyproject.toml and uv.lock for project and dependency management with uv
  • Makefile targets for building deployable artifacts
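Putting those pieces together, a freshly initialized workspace looks roughly like this (a sketch based on the contents listed above; the exact template may differ):

```text
my-job/
├── entrypoint.py    # Spark job entrypoint
├── mylib/           # Python modules, packaged into out/pyfiles.zip
├── pyproject.toml   # project metadata and dependencies (managed with uv)
├── uv.lock          # locked dependency versions
└── Makefile         # builds out/pyfiles.zip and out/environment.tar.gz
```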
Use uv to manage dependencies:
uv sync --dev
uv add <package>
uv add --dev <package>
Use make to build the deployment artifacts:
make
This builds:
  • out/pyfiles.zip
  • out/environment.tar.gz
You can also build artifacts individually, or force a full rebuild:
make pyfiles
make environment
make rebuild
After building, upload and submit from the initialized workspace:
oleander spark jobs upload entrypoint.py \
  --py-files out/pyfiles.zip \
  --virtualenv out/environment.tar.gz
oleander spark jobs submit entrypoint.py \
  --namespace <namespace> \
  --name <job-name> \
  --wait

List your Spark artifacts

oleander spark jobs list

Upload a Spark artifact

Upload a local .py or .jar artifact to oleander:
oleander spark jobs upload <your_artifact_path>
Example:
oleander spark jobs upload ./transformations/process_sales_data.py
JAR example:
oleander spark jobs upload ./jobs/process_sales_data.jar
Every upload creates a new artifact version on the backend. Upload options:
  • --namespace: Namespace used when uploading. Defaults to default.
  • --py-files: Local .zip or .egg dependency archive uploaded alongside a Python artifact.
  • --virtualenv: Local virtual environment archive uploaded alongside a Python artifact.
  • --dry-run: Show what would be uploaded without uploading anything.

Delete a Spark artifact

oleander spark jobs delete <artifact_name>
Use the exact uploaded artifact name, including the file extension. Example:
oleander spark jobs delete process_sales_data.py

Submit a job

Submit an uploaded artifact to the oleander-managed cluster. Use the uploaded artifact name as the entrypoint.
oleander spark jobs submit <entrypoint> --namespace <namespace> --name <run_name> --wait
Example:
oleander spark jobs submit process_sales_data.py --namespace finance --name process-sales-data --wait
Common submit options:
  • --cluster: Cluster name. Defaults to the oleander-managed cluster when omitted.
  • --namespace: Required job namespace.
  • --name: Required job name.
  • --args: Entrypoint arguments.
  • --sparkConf: Spark configurations, for example spark.default.parallelism=8.
  • --packages: Extra package coordinates.
  • --jobTags: Job-specific tags.
  • --runTags: Run-specific tags.
  • --wait: Block until the job finishes.
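Each --sparkConf value uses the key=value form shown above. As a pure-Python illustration of that format only (not of the CLI's internals), splitting each setting on the first = recovers the key and value:

```python
# Illustrative only: how key=value Spark settings split into a mapping.
def parse_spark_conf(pairs):
    """Split each 'key=value' string on the first '=' into a dict."""
    conf = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        conf[key] = value
    return conf

print(parse_spark_conf(["spark.default.parallelism=8"]))
# {'spark.default.parallelism': '8'}
```

Note that splitting on the first = keeps any later = characters inside the value, which matters for settings whose values themselves contain =.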
Oleander-managed submit options:
  • --driverMachineType: Driver machine type.
  • --executorMachineType: Executor machine type.
  • --executorNumbers: Number of executor instances.

Registered clusters

List registered clusters

oleander spark clusters list

Registered EMR Serverless

Register your EMR Serverless cluster, then target it by name when submitting jobs.

Register a cluster

oleander spark clusters register <name> \
  --type emr-serverless \
  --region <region> \
  --account-id <aws_account_id> \
  --controller-role-arn <controller_role_arn> \
  --execution-role-arn <execution_role_arn> \
  --application-id <application_id> \
  --log-bucket <log_bucket>
See the Spark documentation for IAM policy details.

Submit a job

Use an EMR-compatible entrypoint, such as an S3 URI to a .py or .jar artifact.
oleander spark jobs submit <entrypoint> \
  --cluster <cluster_name> \
  --namespace <namespace> \
  --name <run_name> \
  --wait
Example:
oleander spark jobs submit s3://my-bucket/jobs/process_sales_data.py --cluster my-emr --namespace finance --name process-sales-data --wait
Additional EMR Serverless flags:
  • --pyFiles: Zip archive of Python dependencies.
  • --virtualenv: Virtual environment archive for Python jobs.
  • --mainClass: Main class for JVM jobs; use instead of Python-specific options such as --pyFiles and --virtualenv.
  • --executionIamPolicy: IAM policy applied to execution.

Registered Glue

Register your Glue cluster, then target it by name.

Register a cluster

oleander spark clusters register <name> \
  --type glue \
  --controller-role-arn <controller_role_arn>
See the Spark documentation for IAM policy details.

Submit a job

For Glue, the entrypoint is the Glue job name.
oleander spark jobs submit <job_name> --cluster <cluster_name> --namespace <namespace> --name <run_name> --wait
Additional Glue flags:
  • --workerType: Glue worker type.
  • --numberOfWorkers: Number of Glue workers.
  • --enableAutoScaling: Enable Glue auto scaling.
  • --timeoutMinutes: Timeout in minutes.
  • --executionClass: Execution class, such as STANDARD or FLEX.
  • --executionIamPolicy: IAM policy applied to execution.
For Glue jobs, --args must be passed as alternating key-value pairs, for example --args source s3://bucket/in target s3://bucket/out.
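The alternating key-value convention can be sketched in pure Python (an illustration of the pairing rule only, not the CLI's own code):

```python
# Sketch: pair alternating key-value --args into a parameter mapping,
# e.g. --args source s3://bucket/in target s3://bucket/out.
def pair_glue_args(args):
    """['source', 's3://in', 'target', 's3://out'] -> {'source': 's3://in', ...}"""
    if len(args) % 2 != 0:
        raise ValueError("Glue --args must alternate key and value")
    # Even-indexed items are keys, odd-indexed items are their values.
    return dict(zip(args[0::2], args[1::2]))

print(pair_glue_args(["source", "s3://bucket/in", "target", "s3://bucket/out"]))
# {'source': 's3://bucket/in', 'target': 's3://bucket/out'}
```

An odd number of items means some key is missing its value, which is why the sketch rejects it up front.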