getSparkCluster({ name })

Look up Spark cluster details before submitting a job. Use oleander for the built-in managed cluster or the name of a registered external cluster.
const cluster = await client.getSparkCluster({ name: "emr-prod" });
console.log(cluster.type, cluster.properties);

Return shape

| Field | Type | Description |
| --- | --- | --- |
| name | string | Cluster name |
| type | string | oleander, emr-serverless, or glue |
| properties | unknown | Cluster-specific configuration |

listSparkJobs(options?)

List your uploaded Spark job scripts with pagination.
const { scripts, hasMore } = await client.listSparkJobs();

let offset = 0;
const allScripts: string[] = [];
while (true) {
  const page = await client.listSparkJobs({ limit: 50, offset });
  allScripts.push(...page.scripts);
  if (!page.hasMore) break;
  offset += 50;
}
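
If you page through scripts in several places, the loop above can be wrapped in a small convenience helper. This is a sketch, not part of the SDK; iterateScripts and the ListFn shape are illustrative, built on the scripts/hasMore return shape documented below.

```typescript
// Illustrative helper (not part of the SDK): wraps listSparkJobs-style
// pagination in an async generator so callers can for-await script names.
type ListPage = { scripts: string[]; hasMore: boolean };
type ListFn = (opts: { limit: number; offset: number }) => Promise<ListPage>;

async function* iterateScripts(list: ListFn, limit = 50): AsyncGenerator<string> {
  let offset = 0;
  while (true) {
    const page = await list({ limit, offset });
    yield* page.scripts;       // emit each script name in this page
    if (!page.hasMore) return; // stop once the server reports no more pages
    offset += limit;
  }
}

// Usage: for await (const name of iterateScripts((o) => client.listSparkJobs(o))) { ... }
```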

Parameters

options.limit
number
default: 20
Number of scripts to return per page.

options.offset
number
default: 0
Number of scripts to skip for pagination.

Return type: ListSparkJobsResult

| Field | Type | Description |
| --- | --- | --- |
| scripts | string[] | Script names for the current page |
| hasMore | boolean | Whether more scripts are available |

submitSparkJob(options)

Submit a Spark job for execution. cluster defaults to oleander. For oleander-managed Spark, entrypoint is the uploaded script name. For external clusters, entrypoint is cluster-specific, such as an S3 URI for EMR Serverless or a Glue job name for Glue.
const { runId } = await client.submitSparkJob({
  namespace: "my-namespace",
  name: "daily-etl",
  entrypoint: "etl_pipeline.py",
  args: ["--date", "2026-03-11"],
  executorNumbers: 4,
});

const run = await client.getRun(runId);
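
Because the entrypoint form depends on the target cluster type, it can help to branch on the type returned by getSparkCluster before submitting. A minimal sketch; entrypointKindFor is a hypothetical helper, not an SDK function, and the example strings are illustrative:

```typescript
// Hypothetical helper: map a cluster type (from getSparkCluster) to the
// entrypoint form that submitSparkJob expects for that cluster.
function entrypointKindFor(clusterType: string): string {
  switch (clusterType) {
    case "oleander":
      return "uploaded script name";     // e.g. "etl_pipeline.py"
    case "emr-serverless":
      return "S3 URI of the job script"; // e.g. "s3://my-bucket/jobs/etl_pipeline.py"
    case "glue":
      return "Glue job name";
    default:
      throw new Error(`Unknown cluster type: ${clusterType}`);
  }
}
```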

Common options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| cluster | string | "oleander" | Managed cluster or registered cluster name |
| namespace | string | | Job namespace |
| name | string | | Job name |
| entrypoint | string | | Script name, S3 URI, or Glue job name depending on cluster |
| args | string[] | [] | Entrypoint arguments |
| sparkConf | string[] | [] | Spark configuration values |
| packages | string[] | [] | Extra package coordinates |
| jobTags | string[] | [] | Tags applied to the job |
| runTags | string[] | [] | Tags applied to this run |

Oleander-managed options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| driverMachineType | SparkMachineType | spark.1.b | Driver machine type |
| executorMachineType | SparkMachineType | spark.1.b | Executor machine type |
| executorNumbers | number | 2 | Number of executors, from 1 to 20 |

EMR Serverless options

| Option | Type | Description |
| --- | --- | --- |
| pyFiles | string | Zip archive of Python dependencies |
| mainClass | string | Main class for JVM jobs |
| executionIamPolicy | string | IAM policy applied to execution |

Glue options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| workerType | string | | Glue worker type |
| numberOfWorkers | number | 1 | Number of Glue workers |
| enableAutoScaling | boolean | | Enable Glue auto scaling |
| timeoutMinutes | number | | Timeout in minutes |
| executionClass | string | | Glue execution class such as STANDARD or FLEX |
| executionIamPolicy | string | | IAM policy applied to execution |

For Glue jobs, args are converted into key-value pairs. Pass them as alternating entries such as ["--source", "s3://bucket/input", "--target", "s3://bucket/output"].
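
To visualize that conversion, here is a sketch of the transformation. toGlueArguments is illustrative only; the client performs the equivalent internally when targeting Glue.

```typescript
// Illustrative helper mirroring the Glue conversion: alternating
// ["--key", "value", ...] entries become a key-value map.
function toGlueArguments(args: string[]): Record<string, string> {
  if (args.length % 2 !== 0) {
    throw new Error("Glue args must be alternating key-value pairs");
  }
  const result: Record<string, string> = {};
  for (let i = 0; i < args.length; i += 2) {
    result[args[i]] = args[i + 1];
  }
  return result;
}

// toGlueArguments(["--source", "s3://bucket/input", "--target", "s3://bucket/output"])
// → { "--source": "s3://bucket/input", "--target": "s3://bucket/output" }
```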

External cluster example

const { runId } = await client.submitSparkJob({
  cluster: "emr-prod",
  namespace: "finance",
  name: "daily-etl",
  entrypoint: "s3://my-bucket/jobs/etl_pipeline.py",
  args: ["--date", "2026-03-11"],
  pyFiles: "s3://my-bucket/jobs/deps.zip",
  packages: ["org.example:my-lib:1.0.0"],
});

Machine types

The SparkMachineType enum covers compute-optimized (c), balanced (b), and memory-optimized (m) options:
| Type | vCPUs | Category |
| --- | --- | --- |
| spark.1.c / spark.1.b / spark.1.m | 1 | Compute / Balanced / Memory |
| spark.2.c / spark.2.b / spark.2.m | 2 | Compute / Balanced / Memory |
| spark.4.c / spark.4.b / spark.4.m | 4 | Compute / Balanced / Memory |
| spark.8.c / spark.8.b / spark.8.m | 8 | Compute / Balanced / Memory |
| spark.16.c / spark.16.b / spark.16.m | 16 | Compute / Balanced / Memory |
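
The spark.&lt;vCPUs&gt;.&lt;category&gt; naming scheme is regular enough to decode programmatically. A hypothetical parser (parseMachineType is not an SDK export; the enum values themselves come from the SDK):

```typescript
// Hypothetical parser for the spark.<vcpus>.<category> naming scheme
// described in the table above.
type MachineCategory = "compute" | "balanced" | "memory";

function parseMachineType(value: string): { vcpus: number; category: MachineCategory } {
  const match = /^spark\.(1|2|4|8|16)\.([cbm])$/.exec(value);
  if (!match) throw new Error(`Unrecognized machine type: ${value}`);
  const categories: Record<string, MachineCategory> = {
    c: "compute",
    b: "balanced",
    m: "memory",
  };
  return { vcpus: Number(match[1]), category: categories[match[2]] };
}

// parseMachineType("spark.4.m") → { vcpus: 4, category: "memory" }
```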

submitSparkJobAndWait(options)

Submit a Spark job and poll until it reaches a terminal state (COMPLETE, FAIL, or ABORT). Throws an error if the timeout is exceeded.
const { runId, state, run } = await client.submitSparkJobAndWait({
  namespace: "my-namespace",
  name: "daily-etl",
  entrypoint: "etl_pipeline.py",
  pollIntervalMs: 5000,
  timeoutMs: 300000,
});

if (state === "COMPLETE") {
  const elapsed = run.duration;
  // proceed with downstream work ...
} else {
  throw new Error(`Run ${runId} ended with state: ${state}`);
}
Accepts all submitSparkJob options plus:

options.pollIntervalMs
number
default: 10000
Milliseconds between status polls.

options.timeoutMs
number
default: 600000
Maximum time to wait in milliseconds before throwing a timeout error.
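
Roughly, this helper is equivalent to calling submitSparkJob yourself and polling getRun until a terminal state. A sketch of that loop, under stated assumptions: waitForRun and the minimal RunClient interface are illustrative, not SDK exports, and the real implementation may differ.

```typescript
// Sketch of the poll-until-terminal loop: check the run state, return on a
// terminal state, throw once the timeout deadline passes.
type RunLike = { state: string | null };
interface RunClient {
  getRun(runId: string): Promise<RunLike>;
}

const TERMINAL = new Set(["COMPLETE", "FAIL", "ABORT"]);

async function waitForRun(
  client: RunClient,
  runId: string,
  pollIntervalMs = 10_000,
  timeoutMs = 600_000,
): Promise<RunLike> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const run = await client.getRun(runId);
    if (run.state && TERMINAL.has(run.state)) return run;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for run ${runId}`);
    }
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }
}
```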

getRun(runId)

Get the current status of a run. Use this to poll a job submitted with submitSparkJob().
const run = await client.getRun(runId);

if (run.state === "COMPLETE") {
  const duration = run.duration;
  const jobName = run.job.name;
  // handle completion ...
} else if (run.state === "FAIL") {
  const error = run.error;
  // handle failure ...
}

Return type: RunResponse

| Field | Type | Description |
| --- | --- | --- |
| id | string | Run ID |
| state | nullable string | Current state |
| started_at | nullable string | ISO timestamp when the run started |
| queued_at | nullable string | ISO timestamp when the run was queued |
| scheduled_at | nullable string | ISO timestamp when the run was scheduled |
| ended_at | nullable string | ISO timestamp when the run ended |
| duration | nullable number | Run duration in seconds |
| error | unknown | Error details if the run failed |
| tags | array | Array of { key, value, source } objects |
| job | object | Job info with id, name, namespace |
| pipeline | object | Pipeline info with id, name, namespace |
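
Since tags arrive as an array of { key, value, source } objects rather than a map, a small lookup helper can be convenient. findTag is illustrative only, not part of the SDK:

```typescript
// Illustrative helper: look up a tag value on a RunResponse-style tags array.
type RunTag = { key: string; value: string; source: string };

function findTag(tags: RunTag[], key: string): string | undefined {
  return tags.find((tag) => tag.key === key)?.value;
}

// findTag([{ key: "team", value: "finance", source: "job" }], "team") → "finance"
```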