Run your Spark applications on oleander-managed infrastructure or on your own registered Spark clusters. Upload artifacts to oleander for the managed cluster, or keep jobs in your environment for registered clusters. Manage runs and capture lineage metadata for full observability of your data transformations.

Documentation Index
Fetch the complete documentation index at: https://docs.oleander.dev/llms.txt
Use this file to discover all available pages before exploring further.
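For example, fetch the index from a terminal with curl:

```shell
# Download the documentation index for discovery
curl -s https://docs.oleander.dev/llms.txt
```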
Installation
Using Homebrew
Install the oleander CLI with Homebrew.

Configuration
Authenticate with your API key. Find it in your oleander settings.

Oleander Managed Spark
Upload, list, and delete artifacts only on the oleander-managed cluster.

Initialize a PySpark workspace
Create a new PySpark job workspace. The generated workspace contains:

- entrypoint.py as the Spark job entrypoint
- mylib/ for Python modules packaged as pyFiles
- pyproject.toml and uv.lock for project and dependency management with uv
- Makefile targets for building deployable artifacts
Use uv to manage dependencies:
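For example (the package name is illustrative; add whatever your job imports):

```shell
# Record a dependency in pyproject.toml and uv.lock
uv add pyspark
# Sync the local environment with the lockfile
uv sync
```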
Use make to build the deployment artifacts:
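A minimal invocation, assuming the generated Makefile's default target builds both artifacts (check the Makefile for the exact target names):

```shell
# Build the deployable artifacts into out/
make
```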
The build produces:

- out/pyfiles.zip
- out/environment.tar.gz
List your Spark artifacts
List your uploaded Spark artifacts with the oleander CLI.

Upload your Spark artifact
Upload a local .py or .jar artifact to oleander:
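A sketch of an upload; the subcommand name is an assumption (consult the CLI help for the exact spelling), and the file name is illustrative:

```shell
# Upload a local artifact to the oleander-managed cluster
# (subcommand name is an assumption)
oleander spark upload ./process_sales_data.py
```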
Include Python dependencies
If your Python artifact needs additional Python modules, package them in a ZIP and include them with --py-files:
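For example, using the pyfiles ZIP produced by the workspace build (subcommand name is an assumption):

```shell
# Attach extra Python modules packaged as a ZIP
oleander spark upload ./process_sales_data.py --py-files ./out/pyfiles.zip
```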
Include a virtual environment
If your Python artifact depends on a packaged virtual environment, include it with --virtualenv:
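For example, using the environment archive produced by the workspace build (subcommand name is an assumption):

```shell
# Attach a packaged virtual environment
oleander spark upload ./process_sales_data.py --virtualenv ./out/environment.tar.gz
```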
Delete a Spark artifact
Delete an uploaded Spark artifact.

Submit and execute a Spark job
Submit your uploaded artifact to the oleander-managed cluster. Use the exact uploaded filename without the path, such as process_sales_data.py or analytics-batch.jar. The --wait flag keeps the command running until the job finishes.
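A sketch of a managed submission; the subcommand name is an assumption, the namespace and name values are illustrative, and the flags come from the options documented in this section:

```shell
# Submit an uploaded artifact to the oleander-managed cluster
# (subcommand name is an assumption; namespace/name values are illustrative)
oleander spark submit process_sales_data.py \
  --namespace sales \
  --name daily-aggregation \
  --sparkConf spark.default.parallelism=8 \
  --wait
```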
Common submit options
- --cluster: Cluster name. Defaults to the oleander-managed cluster when omitted.
- --namespace (required): Namespace for the job, a logical group such as a team or project.
- --name (required): Job name. Runs with the same namespace and name are grouped under the same job.
- --args: Spark job entrypoint arguments.
- --sparkConf: Spark configurations without --conf, for example spark.default.parallelism=8. Separate multiple configurations with whitespace.
- --packages: Extra package coordinates.
- --jobTags: Job-specific tags in key=value form. Separate multiple tags with whitespace.
- --runTags: Run-specific tags.
- --wait: Wait until the job finishes.
Oleander-managed submit options
- --driverMachineType: oleander Spark driver machine type.
- --executorMachineType: oleander Spark executor machine type.
- --executorNumbers: Number of executor instances.
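A sketch combining the common and managed options; the subcommand name and the machine-type values are assumptions (check your oleander plan for the available machine types):

```shell
# Size the managed cluster resources for a run
# (subcommand name and machine-type values are assumptions)
oleander spark submit process_sales_data.py \
  --namespace sales \
  --name daily-aggregation \
  --driverMachineType standard-4 \
  --executorMachineType standard-8 \
  --executorNumbers 4 \
  --wait
```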
Registered EMR Serverless Spark
Register your EMR Serverless cluster and target it by name when submitting jobs. Include --cluster <name> and provide the S3 entrypoint to a .py or .jar artifact.
Register an EMR Serverless cluster
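A registration sketch; the subcommand name is an assumption, the cluster name is illustrative, and every AWS value is a placeholder. The flags are the register options documented below:

```shell
# Register an EMR Serverless application as a named cluster
# (subcommand name is an assumption; all AWS values are placeholders)
oleander spark register my-emr-cluster \
  --region us-east-1 \
  --account-id 123456789012 \
  --controller-role-arn arn:aws:iam::123456789012:role/oleander-controller \
  --execution-role-arn arn:aws:iam::123456789012:role/emr-execution \
  --application-id 00f0example00001 \
  --log-bucket my-emr-logs
```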
Register options
- --region: AWS region of the EMR Serverless application.
- --account-id: AWS account ID of the EMR Serverless application.
- --controller-role-arn: IAM role ARN oleander assumes to start job runs. Add this to the role's trust policy so oleander can assume it.
- --execution-role-arn: IAM role ARN the job uses; the Spark application runs with this role's permissions.
- --application-id: EMR Serverless application ID.
- --log-bucket: S3 bucket for job logs.
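The trust policy for the controller role typically grants sts:AssumeRole to oleander's principal. The sketch below follows the standard IAM trust-policy shape; the exact principal oleander assumes from is an assumption, so substitute the value from your oleander onboarding details:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<OLEANDER_ACCOUNT_ID>:root" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```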
Submit a job to EMR Serverless
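A submission sketch with an S3 entrypoint; the subcommand name is an assumption, and the bucket, path, and namespace/name values are placeholders. The flags come from the options documented below:

```shell
# Submit an S3-hosted entrypoint to the registered EMR Serverless cluster
# (subcommand name is an assumption; bucket and paths are placeholders)
oleander spark submit s3://my-bucket/jobs/process_sales_data.py \
  --cluster my-emr-cluster \
  --namespace sales \
  --name daily-aggregation \
  --wait
```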
Submit options
- --cluster (required): Name of the registered cluster.
- --namespace (required): Namespace for the job, a logical group such as a team or project.
- --name (required): Job name. Runs with the same namespace and name are grouped under the same job.
- --args: Spark job entrypoint arguments.
- --sparkConf: Spark configurations without --conf, for example spark.default.parallelism=8. Separate multiple configurations with whitespace.
- --packages: Extra package coordinates.
- --jobTags: Job-specific tags in key=value form. Separate multiple tags with whitespace.
- --runTags: Run-specific tags.
- --executionIamPolicy: IAM policy for job permissions. Final permissions are the intersection of the job execution role and this policy.
- --pyFiles: Extra pyFiles for the PySpark job. Mutually exclusive with --mainClass.
- --virtualenv: Virtual environment archive for Python jobs.
- --mainClass: Entrypoint main class for the Java/Scala Spark job. Use instead of Python-specific options such as --pyFiles and --virtualenv.
- --wait: Wait until the job finishes.
Registered Glue Spark
Register your Glue cluster and target it by name when submitting jobs. Include --cluster <name>. Submit uses the existing Glue job name in your environment.
Register a Glue cluster
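A registration sketch; the subcommand name is an assumption, the cluster name is illustrative, and the ARN is a placeholder. Only the options documented below are shown:

```shell
# Register a Glue environment as a named cluster
# (subcommand name is an assumption; the ARN is a placeholder)
oleander spark register my-glue-cluster \
  --controller-role-arn arn:aws:iam::123456789012:role/oleander-controller
```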
Register options
- --controller-role-arn: IAM role ARN oleander assumes to start job runs. Add this to the role's trust policy so oleander can assume it.
Submit a job to Glue
Use --cluster to select the registered cluster:
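A sketch of a Glue submission; the subcommand name is an assumption, and the submitted name must match an existing Glue job in your environment. G.1X is a standard Glue worker type; the other values are illustrative:

```shell
# Start a run of an existing Glue job on the registered cluster
# (subcommand name is an assumption; the job name must exist in Glue)
oleander spark submit daily-aggregation \
  --cluster my-glue-cluster \
  --namespace sales \
  --name daily-aggregation \
  --workerType G.1X \
  --numberOfWorkers 4 \
  --wait
```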
Submit options
- --cluster (required): Name of the registered cluster.
- --namespace (required): Namespace for the job, a logical group such as a team or project.
- --name (required): Job name. Runs with the same namespace and name are grouped under the same job.
- --args: Spark job entrypoint arguments.
- --sparkConf: Spark configurations without --conf, for example spark.default.parallelism=8. Separate multiple configurations with whitespace.
- --packages: Extra package coordinates.
- --jobTags: Job-specific tags in key=value form. Separate multiple tags with whitespace.
- --runTags: Run-specific tags.
- --executionIamPolicy: IAM policy for job permissions. Final permissions are the intersection of the job execution role and this policy.
- --workerType: Glue worker type.
- --numberOfWorkers: Number of Glue workers.
- --enableAutoScaling: Set to true for auto scaling, false otherwise.
- --executionClass: Glue execution class. Either STANDARD or FLEX.
- --timeoutMinutes: Glue job timeout in minutes.
- --wait: Wait until the job finishes.