Run your Spark applications on Oleander-managed infrastructure or on your own registered clusters. For a full overview of Spark on Oleander, see the Spark documentation.
Oleander-managed Spark
List your Spark jobs
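The listing subcommand is not spelled out here; assuming it mirrors the registered-clusters command (oleander spark clusters list) shown below, it would look like:

```shell
# Hypothetical, by analogy with `oleander spark clusters list`
oleander spark jobs list
```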
Upload a Spark script
Upload a local .py file to Oleander:
oleander spark jobs upload <your_script_path>
Example:
oleander spark jobs upload ./transformations/process_sales_data.py
Upload options:
| Flag | Description |
|---|---|
| --namespace | Namespace used when uploading. Defaults to default. |
| --py-files | Local .zip or .egg dependency archive uploaded alongside the script |
| --overwrite | Replace an existing uploaded script |
| --dry-run | Show what would be uploaded without uploading anything |
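Combining the flags above, a dependency archive can be uploaded alongside the script; the paths below are illustrative:

```shell
# Upload the script together with a dependency archive,
# replacing any previous upload (paths are illustrative)
oleander spark jobs upload ./transformations/process_sales_data.py \
  --namespace finance \
  --py-files ./transformations/deps.zip \
  --overwrite
```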
Delete a Spark job
oleander spark jobs delete <script_name>
Submit a job
Submit an uploaded script to the Oleander-managed cluster. Use the uploaded script name as the entrypoint.
oleander spark jobs submit <entrypoint> --namespace <namespace> --name <run_name> --wait
Example:
oleander spark jobs submit process_sales_data.py --namespace finance --name process-sales-data --wait
Common submit options:
| Flag | Description |
|---|---|
| --cluster | Cluster name. Defaults to the Oleander-managed cluster when omitted |
| --namespace | Required job namespace |
| --name | Required job name |
| --args | Entrypoint arguments |
| --sparkConf | Spark configurations, for example spark.default.parallelism=8 |
| --packages | Extra package coordinates |
| --jobTags | Job-specific tags |
| --runTags | Run-specific tags |
| --wait | Block until the job finishes |
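Putting several of these flags together in one submission (the entrypoint argument and configuration value are illustrative):

```shell
# Pass an entrypoint argument and tune parallelism in one submission
oleander spark jobs submit process_sales_data.py \
  --namespace finance \
  --name process-sales-data \
  --args 2024-Q1 \
  --sparkConf spark.default.parallelism=8 \
  --wait
```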
Oleander-managed submit options:
| Flag | Description |
|---|---|
| --driverMachineType | Driver machine type |
| --executorMachineType | Executor machine type |
| --executorNumbers | Number of executor instances |
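For example, to size the driver and executors explicitly (valid machine-type names depend on your deployment, so placeholders are used here):

```shell
# Explicitly size driver and executors on the Oleander-managed cluster
# (machine-type values are deployment-specific placeholders)
oleander spark jobs submit process_sales_data.py \
  --namespace finance \
  --name process-sales-data \
  --driverMachineType <driver_machine_type> \
  --executorMachineType <executor_machine_type> \
  --executorNumbers 4 \
  --wait
```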
Registered clusters
List registered clusters
oleander spark clusters list
Registered EMR Serverless
Register your EMR Serverless cluster, then target it by name when submitting jobs.
Register a cluster
oleander spark clusters register <name> \
--type emr-serverless \
--region <region> \
--account-id <aws_account_id> \
--controller-role-arn <controller_role_arn> \
--execution-role-arn <execution_role_arn> \
--application-id <application_id> \
--log-bucket <log_bucket>
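A filled-in registration might look like the following; the account ID, role names, application ID, and bucket are all illustrative:

```shell
# Illustrative values only -- substitute your own account, roles,
# EMR Serverless application ID, and log bucket
oleander spark clusters register my-emr \
  --type emr-serverless \
  --region us-east-1 \
  --account-id 123456789012 \
  --controller-role-arn arn:aws:iam::123456789012:role/oleander-controller \
  --execution-role-arn arn:aws:iam::123456789012:role/emr-execution \
  --application-id <application_id> \
  --log-bucket my-emr-logs
```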
See the Spark documentation for IAM policy details.
Submit a job
Use an EMR-compatible entrypoint, such as an S3 URI.
oleander spark jobs submit <entrypoint> \
--cluster <cluster_name> \
--namespace <namespace> \
--name <run_name> \
--wait
Example:
oleander spark jobs submit s3://my-bucket/jobs/process_sales_data.py --cluster my-emr --namespace finance --name process-sales-data --wait
Additional EMR Serverless flags:
| Flag | Description |
|---|---|
| --pyFiles | Zip archive of Python dependencies |
| --mainClass | Main class for JVM jobs |
| --executionIamPolicy | IAM policy applied to execution |
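For instance, a Python dependency archive can be shipped alongside the S3-hosted script (bucket and paths are illustrative):

```shell
# Ship a zip of Python dependencies alongside the S3-hosted script
# (bucket name and paths are illustrative)
oleander spark jobs submit s3://my-bucket/jobs/process_sales_data.py \
  --cluster my-emr \
  --namespace finance \
  --name process-sales-data \
  --pyFiles s3://my-bucket/jobs/deps.zip \
  --wait
```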
Registered Glue
Register your Glue cluster, then target it by name.
Register a cluster
oleander spark clusters register <name> \
--type glue \
--controller-role-arn <controller_role_arn>
See the Spark documentation for IAM policy details.
Submit a job
For Glue, the entrypoint is the Glue job name.
oleander spark jobs submit <job_name> --cluster <cluster_name> --namespace <namespace> --name <run_name> --wait
Additional Glue flags:
| Flag | Description |
|---|---|
| --workerType | Glue worker type |
| --numberOfWorkers | Number of Glue workers |
| --enableAutoScaling | Enable Glue auto scaling |
| --timeoutMinutes | Timeout in minutes |
| --executionClass | Execution class, such as STANDARD or FLEX |
| --executionIamPolicy | IAM policy applied to execution |
For Glue jobs, --args must be passed as alternating key-value pairs, for example --args source s3://bucket/in target s3://bucket/out.
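Putting the Glue-specific flags and the key-value --args convention together (the job name, cluster name, and bucket paths are illustrative):

```shell
# Submit a Glue job with explicit sizing and key-value arguments
# (job name, cluster name, and S3 paths are illustrative)
oleander spark jobs submit nightly-sales-etl \
  --cluster my-glue \
  --namespace finance \
  --name nightly-sales-etl \
  --workerType G.1X \
  --numberOfWorkers 10 \
  --executionClass FLEX \
  --args source s3://bucket/in target s3://bucket/out \
  --wait
```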