Skip to main content
Run your Spark applications on oleander-managed infrastructure or on your own registered clusters. For a full overview of Spark on oleander, see the Spark documentation.

Oleander Managed Spark

List your Spark jobs

oleander spark jobs list

Upload a Spark script

Upload your PySpark application to oleander:
oleander spark jobs upload <your_script_path>
Example:
oleander spark jobs upload ./transformations/process_sales_data.py
Include Python dependencies with --py-files:
oleander spark jobs upload <your_script_path> --py-files <module_archive_zip>
Replace an existing script with --overwrite:
oleander spark jobs upload <your_script_path> --overwrite

Delete a Spark job

oleander spark jobs delete <script_name>

Submit a job

Submit your uploaded script to the oleander managed cluster. Use the exact uploaded file name without the path.
oleander spark submit <script_name> --namespace <namespace> --name <run_name> --wait
Example:
oleander spark submit process_sales_data.py --namespace finance --name process-sales-data --wait

Submit options

FlagDescription
--namespace(required) Logical group such as a team or project
--name(required) Job name. Runs with the same namespace and name are grouped
--argsEntrypoint arguments
--sparkConfSpark configurations, e.g. spark.default.parallelism=8
--jobTagsJob-specific tags in key=value form
--runTagsRun-specific tags
--executionIamPolicyIAM policy for job permissions
--driverMachineTypeDriver machine type
--executorMachineTypeExecutor machine type
--executorNumbersNumber of executor instances
--waitBlock until the job finishes

Registered EMR Serverless

Register your EMR Serverless cluster, then target it by name when submitting jobs.

Register a cluster

oleander spark cluster register <name> \
  --type emr-serverless \
  --region <region> \
  --account-id <awsAccountId> \
  --controller-role-arn <controllerRoleArn> \
  --execution-role-arn <executionRoleArn> \
  --application-id <applicationId> \
  --log-bucket <logBucket>
See the Spark documentation for IAM policy details.

Submit a job

oleander spark submit <entrypoint_s3_uri> --cluster <cluster_name> --namespace <namespace> --name <run_name> --wait
Example:
oleander spark submit s3://my-bucket/jobs/process_sales_data.py --cluster my-emr --namespace finance --name process-sales-data --wait
Additional flags: --pyFiles, --mainClass.

Registered Glue

Register your Glue cluster, then target it by name.

Register a cluster

oleander spark cluster register <name> \
  --type glue \
  --controller-role-arn <controllerRoleArn>
See the Spark documentation for IAM policy details.

Submit a job

oleander spark submit <job_name> --cluster <cluster_name> --namespace <namespace> --name <run_name> --wait
Additional flags: --workerType, --numberOfWorkers, --enableAutoScaling, --executionClass, --timeoutMinutes.