Run your Spark applications on Oleander-managed infrastructure or on your own registered clusters. For a full overview of Spark on Oleander, see the Spark documentation.

Oleander-managed Spark

List your Spark jobs

oleander spark jobs list

Upload a Spark script

Upload a local .py file to Oleander:
oleander spark jobs upload <your_script_path>
Example:
oleander spark jobs upload ./transformations/process_sales_data.py
Upload options:
Flag          Description
--namespace   Namespace used when uploading. Defaults to "default".
--py-files    Local .zip or .egg dependency archive uploaded alongside the script.
--overwrite   Replace an existing uploaded script.
--dry-run     Show what would be uploaded without uploading anything.
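For instance, these flags can be combined; the namespace name and dependency archive path below are illustrative, not defaults:

```shell
# Upload the script with a bundled dependency archive, replacing any
# previously uploaded copy (namespace and paths are hypothetical).
oleander spark jobs upload ./transformations/process_sales_data.py \
  --namespace finance \
  --py-files ./transformations/deps.zip \
  --overwrite
```

Adding --dry-run to the same command would preview the upload without sending anything.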

Delete a Spark job

oleander spark jobs delete <script_name>

Submit a job

Submit an uploaded script to the Oleander-managed cluster. Use the uploaded script name as the entrypoint.
oleander spark jobs submit <entrypoint> --namespace <namespace> --name <run_name> --wait
Example:
oleander spark jobs submit process_sales_data.py --namespace finance --name process-sales-data --wait
Common submit options:
Flag          Description
--cluster     Cluster name. Defaults to the Oleander-managed cluster when omitted.
--namespace   Job namespace (required).
--name        Job name (required).
--args        Entrypoint arguments.
--sparkConf   Spark configuration, for example spark.default.parallelism=8.
--packages    Extra package coordinates.
--jobTags     Job-specific tags.
--runTags     Run-specific tags.
--wait        Block until the job finishes.
Oleander-managed submit options:
Flag                   Description
--driverMachineType    Driver machine type.
--executorMachineType  Executor machine type.
--executorNumbers      Number of executor instances.
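As a sketch of how the submit and sizing flags combine (the machine type names and executor count below are hypothetical examples, not documented defaults):

```shell
# Submit with explicit Spark config and Oleander-managed sizing.
# Machine type names here are illustrative placeholders.
oleander spark jobs submit process_sales_data.py \
  --namespace finance \
  --name process-sales-data \
  --sparkConf spark.default.parallelism=8 \
  --driverMachineType m5.xlarge \
  --executorMachineType m5.2xlarge \
  --executorNumbers 4 \
  --wait
```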

Registered clusters

List registered clusters

oleander spark clusters list

Registered EMR Serverless

Register your EMR Serverless cluster, then target it by name when submitting jobs.

Register a cluster

oleander spark clusters register <name> \
  --type emr-serverless \
  --region <region> \
  --account-id <aws_account_id> \
  --controller-role-arn <controller_role_arn> \
  --execution-role-arn <execution_role_arn> \
  --application-id <application_id> \
  --log-bucket <log_bucket>
See the Spark documentation for IAM policy details.
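A filled-in registration might look like the following; every identifier (account ID, role ARNs, application ID, log bucket) is a placeholder:

```shell
# Register an EMR Serverless application as a cluster named "my-emr".
# All AWS identifiers below are illustrative placeholders.
oleander spark clusters register my-emr \
  --type emr-serverless \
  --region us-east-1 \
  --account-id 123456789012 \
  --controller-role-arn arn:aws:iam::123456789012:role/oleander-controller \
  --execution-role-arn arn:aws:iam::123456789012:role/emr-serverless-execution \
  --application-id 00f1example2abc3 \
  --log-bucket my-emr-logs
```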

Submit a job

Use an EMR-compatible entrypoint, such as an S3 URI.
oleander spark jobs submit <entrypoint> \
  --cluster <cluster_name> \
  --namespace <namespace> \
  --name <run_name> \
  --wait
Example:
oleander spark jobs submit s3://my-bucket/jobs/process_sales_data.py --cluster my-emr --namespace finance --name process-sales-data --wait
Additional EMR Serverless flags:
Flag                  Description
--pyFiles             Zip archive of Python dependencies.
--mainClass           Main class for JVM jobs.
--executionIamPolicy  IAM policy applied to execution.
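For example, a Python job with a dependency archive (this assumes --pyFiles accepts an S3 URI like the entrypoint does; paths are illustrative):

```shell
# Submit a Python job to the registered EMR Serverless cluster,
# with dependencies in a zip archive (S3 paths are hypothetical).
oleander spark jobs submit s3://my-bucket/jobs/process_sales_data.py \
  --cluster my-emr \
  --namespace finance \
  --name process-sales-data \
  --pyFiles s3://my-bucket/jobs/deps.zip \
  --wait
```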

Registered Glue

Register your Glue cluster, then target it by name.

Register a cluster

oleander spark clusters register <name> \
  --type glue \
  --controller-role-arn <controller_role_arn>
See the Spark documentation for IAM policy details.
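A filled-in registration might look like this (the cluster name and role ARN are placeholders):

```shell
# Register a Glue cluster named "my-glue"; the ARN is illustrative.
oleander spark clusters register my-glue \
  --type glue \
  --controller-role-arn arn:aws:iam::123456789012:role/oleander-controller
```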

Submit a job

For Glue, the entrypoint is the Glue job name.
oleander spark jobs submit <job_name> --cluster <cluster_name> --namespace <namespace> --name <run_name> --wait
Additional Glue flags:
Flag                  Description
--workerType          Glue worker type.
--numberOfWorkers     Number of Glue workers.
--enableAutoScaling   Enable Glue auto scaling.
--timeoutMinutes      Timeout in minutes.
--executionClass      Execution class, such as STANDARD or FLEX.
--executionIamPolicy  IAM policy applied to execution.
For Glue jobs, --args must be passed as alternating key-value pairs, for example --args source s3://bucket/in target s3://bucket/out.
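Putting the Glue flags together, a sketch of a full submission; the Glue job name, worker sizing, and bucket paths are illustrative:

```shell
# "nightly-sales-etl" is a hypothetical Glue job name; G.1X is one of
# Glue's standard worker types. Note the alternating key-value --args.
oleander spark jobs submit nightly-sales-etl \
  --cluster my-glue \
  --namespace finance \
  --name nightly-sales-etl \
  --workerType G.1X \
  --numberOfWorkers 10 \
  --executionClass FLEX \
  --timeoutMinutes 60 \
  --args source s3://bucket/in target s3://bucket/out \
  --wait
```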