## getSparkCluster({ name })

Look up Spark cluster details before submitting a job. Use `oleander` for the built-in managed cluster, or the name of a registered external cluster.

```ts
const cluster = await client.getSparkCluster({ name: "emr-prod" });
console.log(cluster.type, cluster.properties);
```
### Return shape

| Field | Type | Description |
|---|---|---|
| `name` | `string` | Cluster name |
| `type` | `string` | `oleander`, `emr-serverless`, or `glue` |
| `properties` | `unknown` | Cluster-specific configuration |
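Because the `entrypoint` conventions differ by cluster type (see `submitSparkJob` below), it can be useful to sanity-check an entrypoint against the cluster's `type` before submitting. The following helper is a sketch, not part of the client API; the validation rules simply mirror the conventions documented here.

```ts
type ClusterType = "oleander" | "emr-serverless" | "glue";

// Hypothetical pre-submit check: does this entrypoint look right for the
// given cluster type? oleander expects an uploaded script name, EMR
// Serverless an S3 URI, and Glue a plain job name.
function entrypointLooksValid(type: ClusterType, entrypoint: string): boolean {
  switch (type) {
    case "oleander":
      // Uploaded script name, e.g. "etl_pipeline.py" -- no URI scheme.
      return entrypoint.length > 0 && !entrypoint.includes("://");
    case "emr-serverless":
      // S3 URI, e.g. "s3://bucket/jobs/script.py".
      return entrypoint.startsWith("s3://");
    case "glue":
      // Glue job name -- no URI scheme.
      return entrypoint.length > 0 && !entrypoint.includes("://");
  }
}
```

A caller could pair this with `getSparkCluster` to fail fast before a submission that the target cluster would reject.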
## listSparkJobs(options?)

List your uploaded Spark job scripts with pagination.

```ts
const { scripts, hasMore } = await client.listSparkJobs();

// Page through all scripts:
let offset = 0;
const allScripts: string[] = [];
while (true) {
  const page = await client.listSparkJobs({ limit: 50, offset });
  allScripts.push(...page.scripts);
  if (!page.hasMore) break;
  offset += 50;
}
```
### Parameters

- `limit`: Number of scripts to return per page.
- `offset`: Number of scripts to skip for pagination.
### Return type: `ListSparkJobsResult`

| Field | Type | Description |
|---|---|---|
| `scripts` | `string[]` | Script names for the current page |
| `hasMore` | `boolean` | Whether more scripts are available |
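The `hasMore`/`offset` loop above can be wrapped in a small reusable helper. This is a sketch, not part of the client API; it works against any object exposing a `listSparkJobs` method with the shape documented here.

```ts
interface ListSparkJobsResult {
  scripts: string[];
  hasMore: boolean;
}

interface SparkJobLister {
  listSparkJobs(options?: { limit?: number; offset?: number }): Promise<ListSparkJobsResult>;
}

// Collect every script name by paging until hasMore is false.
async function listAllSparkJobs(client: SparkJobLister, pageSize = 50): Promise<string[]> {
  const all: string[] = [];
  let offset = 0;
  for (;;) {
    const page = await client.listSparkJobs({ limit: pageSize, offset });
    all.push(...page.scripts);
    if (!page.hasMore) break;
    offset += pageSize;
  }
  return all;
}
```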
## submitSparkJob(options)

Submit a Spark job for execution. `cluster` defaults to `"oleander"`.

For `oleander`-managed Spark, `entrypoint` is the uploaded script name. For external clusters, `entrypoint` is cluster-specific, such as an S3 URI for EMR Serverless or a Glue job name for Glue.

```ts
const { runId } = await client.submitSparkJob({
  namespace: "my-namespace",
  name: "daily-etl",
  entrypoint: "etl_pipeline.py",
  args: ["--date", "2026-03-11"],
  executorNumbers: 4,
});

const run = await client.getRun(runId);
```
### Common options

| Option | Type | Default | Description |
|---|---|---|---|
| `cluster` | `string` | `"oleander"` | Managed cluster or registered cluster name |
| `namespace` | `string` | | Job namespace |
| `name` | `string` | | Job name |
| `entrypoint` | `string` | | Script name, S3 URI, or Glue job name, depending on cluster |
| `args` | `string[]` | `[]` | Entrypoint arguments |
| `sparkConf` | `string[]` | `[]` | Spark configuration values |
| `packages` | `string[]` | `[]` | Extra package coordinates |
| `jobTags` | `string[]` | `[]` | Tags applied to the job |
| `runTags` | `string[]` | `[]` | Tags applied to this run |
### Oleander-managed options

| Option | Type | Default | Description |
|---|---|---|---|
| `driverMachineType` | `SparkMachineType` | `spark.1.b` | Driver machine type |
| `executorMachineType` | `SparkMachineType` | `spark.1.b` | Executor machine type |
| `executorNumbers` | `number` | `2` | Number of executors, from 1 to 20 |
### EMR Serverless options

| Option | Type | Description |
|---|---|---|
| `pyFiles` | `string` | Zip archive of Python dependencies |
| `mainClass` | `string` | Main class for JVM jobs |
| `executionIamPolicy` | `string` | IAM policy applied to execution |
### Glue options

| Option | Type | Default | Description |
|---|---|---|---|
| `workerType` | `string` | | Glue worker type |
| `numberOfWorkers` | `number` | `1` | Number of Glue workers |
| `enableAutoScaling` | `boolean` | | Enable Glue auto scaling |
| `timeoutMinutes` | `number` | | Timeout in minutes |
| `executionClass` | `string` | | Glue execution class, such as `STANDARD` or `FLEX` |
| `executionIamPolicy` | `string` | | IAM policy applied to execution |
For Glue jobs, `args` are converted into key-value pairs. Pass them as alternating entries such as `["--source", "s3://bucket/input", "--target", "s3://bucket/output"]`.
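If your Glue parameters live in a plain object, a tiny helper can flatten them into the alternating array form described above. The helper is a hypothetical convenience, not part of the client API.

```ts
// Flatten a params object into the alternating key/value args array that
// Glue jobs expect, preserving insertion order.
function toGlueArgs(params: Record<string, string>): string[] {
  return Object.entries(params).flatMap(([key, value]) => [key, value]);
}
```

For example, `toGlueArgs({ "--source": "s3://bucket/input", "--target": "s3://bucket/output" })` produces the alternating array shown above, ready to pass as `args`.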
### External cluster example

```ts
const { runId } = await client.submitSparkJob({
  cluster: "emr-prod",
  namespace: "finance",
  name: "daily-etl",
  entrypoint: "s3://my-bucket/jobs/etl_pipeline.py",
  args: ["--date", "2026-03-11"],
  pyFiles: "s3://my-bucket/jobs/deps.zip",
  packages: ["org.example:my-lib:1.0.0"],
});
```
### Machine types

The `SparkMachineType` enum covers compute-optimized (`c`), balanced (`b`), and memory-optimized (`m`) options:

| Type | vCPUs | Category |
|---|---|---|
| `spark.1.c` / `spark.1.b` / `spark.1.m` | 1 | Compute / Balanced / Memory |
| `spark.2.c` / `spark.2.b` / `spark.2.m` | 2 | Compute / Balanced / Memory |
| `spark.4.c` / `spark.4.b` / `spark.4.m` | 4 | Compute / Balanced / Memory |
| `spark.8.c` / `spark.8.b` / `spark.8.m` | 8 | Compute / Balanced / Memory |
| `spark.16.c` / `spark.16.b` / `spark.16.m` | 16 | Compute / Balanced / Memory |
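The naming pattern in the table is regular (`spark.<vCPUs>.<category>`), so a machine type string can be composed programmatically. This helper is an illustration of that pattern, not part of the client API.

```ts
type MachineCategory = "c" | "b" | "m"; // compute / balanced / memory

// Compose a SparkMachineType string from a vCPU count and a category letter,
// following the spark.<vCPUs>.<category> pattern in the table above.
function sparkMachineType(vcpus: 1 | 2 | 4 | 8 | 16, category: MachineCategory): string {
  return `spark.${vcpus}.${category}`;
}
```

For instance, `sparkMachineType(8, "m")` yields `"spark.8.m"`, suitable for `executorMachineType` on a memory-heavy job.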
## submitSparkJobAndWait(options)

Submit a Spark job and poll until it reaches a terminal state (`COMPLETE`, `FAIL`, or `ABORT`). Throws an error if the timeout is exceeded.

```ts
const { runId, state, run } = await client.submitSparkJobAndWait({
  namespace: "my-namespace",
  name: "daily-etl",
  entrypoint: "etl_pipeline.py",
  pollIntervalMs: 5000,
  timeoutMs: 300000,
});

if (state === "COMPLETE") {
  const elapsed = run.duration;
  // proceed with downstream work ...
} else {
  throw new Error(`Run ${runId} ended with state: ${state}`);
}
```
Accepts all `submitSparkJob` options plus:

- `pollIntervalMs`: Milliseconds between status polls.
- `timeoutMs`: Maximum time to wait, in milliseconds, before throwing a timeout error.
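Conceptually, `submitSparkJobAndWait` behaves like a polling loop over `getRun`. The sketch below illustrates those semantics against the documented `getRun` shape; it is not the actual implementation, and the `RunLike`/`RunClient` names are invented here.

```ts
interface RunLike {
  state: string | null;
}

interface RunClient {
  getRun(runId: string): Promise<RunLike>;
}

const TERMINAL_STATES = new Set(["COMPLETE", "FAIL", "ABORT"]);

// Poll getRun until the run reaches a terminal state, or throw once the
// timeout budget is exhausted.
async function waitForRun(
  client: RunClient,
  runId: string,
  { pollIntervalMs = 5000, timeoutMs = 300000 } = {},
): Promise<RunLike> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const run = await client.getRun(runId);
    if (run.state !== null && TERMINAL_STATES.has(run.state)) return run;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for run ${runId}`);
    }
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }
}
```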
## getRun(runId)

Get the current status of a run. Use this to poll a job submitted with `submitSparkJob()`.

```ts
const run = await client.getRun(runId);

if (run.state === "COMPLETE") {
  const duration = run.duration;
  const jobName = run.job.name;
  // handle completion ...
} else if (run.state === "FAIL") {
  const error = run.error;
  // handle failure ...
}
```
### Return type: `RunResponse`

| Field | Type | Description |
|---|---|---|
| `id` | `string` | Run ID |
| `state` | `string \| null` | Current state |
| `started_at` | `string \| null` | ISO timestamp when the run started |
| `queued_at` | `string \| null` | ISO timestamp when the run was queued |
| `scheduled_at` | `string \| null` | ISO timestamp when the run was scheduled |
| `ended_at` | `string \| null` | ISO timestamp when the run ended |
| `duration` | `number \| null` | Run duration in seconds |
| `error` | `unknown` | Error details if the run failed |
| `tags` | `array` | Array of `{ key, value, source }` objects |
| `job` | `object` | Job info with `id`, `name`, `namespace` |
| `pipeline` | `object` | Pipeline info with `id`, `name`, `namespace` |
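Because `duration` is nullable, a caller may want to fall back to deriving elapsed seconds from the timestamps. A minimal sketch, assuming only the nullable fields documented in the table above:

```ts
interface RunTimestamps {
  duration: number | null;
  started_at: string | null;
  ended_at: string | null;
}

// Prefer the reported duration; otherwise derive seconds from the ISO
// timestamps. Returns null when neither source is available.
function runDurationSeconds(run: RunTimestamps): number | null {
  if (run.duration !== null) return run.duration;
  if (run.started_at && run.ended_at) {
    return (Date.parse(run.ended_at) - Date.parse(run.started_at)) / 1000;
  }
  return null;
}
```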