Job
About
A job is an execution of a pipeline triggered by new data detected in an input repository.
When a commit is made to the input repository of a pipeline, jobs are created for all downstream pipelines in a directed acyclic graph (DAG), but they do not run until the prior pipelines they depend on produce their output. Each job runs the user’s code against the current commit in a repository at a specified branch and then submits the results to the output repository of the pipeline as a single output commit.
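As a rough illustration of the paragraph above, the following sketch works out which pipelines would receive a job after a commit to a given repository, and the order in which those jobs could run. It is not the scheduler's actual implementation. The pipeline names (`images`, `edges`, `montage`), the `PIPELINE_INPUTS` mapping, and the `downstream_pipelines` helper are hypothetical, and the sketch assumes that each pipeline's output repository has the same name as the pipeline.

```python
from collections import deque

# Hypothetical DAG: each pipeline maps to the repositories it reads from.
# Assumption: a pipeline's output repository shares the pipeline's name.
PIPELINE_INPUTS = {
    "edges": ["images"],
    "montage": ["images", "edges"],
}


def downstream_pipelines(repo, pipeline_inputs):
    """Return every pipeline downstream of `repo`, upstream pipelines first."""
    # Step 1: collect all pipelines affected by a commit to `repo`.
    affected = set()
    queue = deque([repo])
    while queue:
        current = queue.popleft()
        for pipeline, inputs in pipeline_inputs.items():
            if current in inputs and pipeline not in affected:
                affected.add(pipeline)
                queue.append(pipeline)

    # Step 2: order them so that a job only appears after the affected
    # pipelines it depends on (it cannot run before their output exists).
    ordered = []
    remaining = set(affected)
    while remaining:
        ready = sorted(
            p for p in remaining
            if not any(i in remaining for i in pipeline_inputs[p])
        )
        if not ready:
            raise ValueError("pipeline graph contains a cycle")
        ordered.extend(ready)
        remaining.difference_update(ready)
    return ordered


print(downstream_pipelines("images", PIPELINE_INPUTS))
```

In this sketch a commit to `images` yields jobs for both `edges` and `montage`, but `montage` is listed after `edges` because it also reads the output of `edges`.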
Each job has a unique alphanumeric identifier (ID) that users can reference in the `<pipeline>@<jobID>` format.
Jobs have the following states:
State | Description |
---|---|
CREATED | An input commit exists, but the job has not been started by a worker yet. |
STARTING | The worker has allocated resources for the job (that is, the job counts towards parallelism), but it is still waiting on the inputs to be ready. |
UNRUNNABLE | The job could not be run because one or more of its inputs is the result of a failed or unrunnable job. For example, suppose pipelines Y and Z both depend on the output of pipeline X. If pipeline X fails, the jobs for pipelines Y and Z both pass from STARTING to UNRUNNABLE to signify that they were cancelled because of the upstream failure. |
RUNNING | The worker is processing datums. |
EGRESS | The worker has completed all the datums and is uploading the output to the egress endpoint. |
FINISHING | After all of the datum processing and egress (if any) is done, the job transitions to a finishing state where all of the post-processing tasks such as compaction are performed. |
FAILURE | The worker encountered too many errors when processing a datum. |
KILLED | The job timed out, or a user called StopJob. |
SUCCESS | The job finished all of its processing without failing, being killed, or becoming unrunnable. |
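
The job reference format and the UNRUNNABLE rule from the table can be modelled in a few lines. The sketch below is illustrative only and does not use any client library. `JobState`, `parse_job_ref`, and `mark_unrunnable` are hypothetical names, and the job ID `abc123` is made up. The state names are taken from the table, and pipelines X, Y, and Z mirror the example in the UNRUNNABLE row.

```python
from enum import Enum


class JobState(Enum):
    """Job states, named as in the table above."""
    CREATED = "CREATED"
    STARTING = "STARTING"
    UNRUNNABLE = "UNRUNNABLE"
    RUNNING = "RUNNING"
    EGRESS = "EGRESS"
    FINISHING = "FINISHING"
    FAILURE = "FAILURE"
    KILLED = "KILLED"
    SUCCESS = "SUCCESS"


def parse_job_ref(ref):
    """Split a '<pipeline>@<jobID>' reference into (pipeline, job_id)."""
    pipeline, sep, job_id = ref.partition("@")
    if not sep or not pipeline or not job_id:
        raise ValueError(f"expected <pipeline>@<jobID>, got {ref!r}")
    return pipeline, job_id


def mark_unrunnable(states, pipeline_inputs):
    """Return a copy of `states` in which every STARTING job whose upstream
    job failed or is unrunnable has been moved to UNRUNNABLE."""
    blocked = {JobState.FAILURE, JobState.UNRUNNABLE}
    result = dict(states)
    changed = True
    # Repeat until stable so the state propagates through chains of pipelines.
    while changed:
        changed = False
        for pipeline, inputs in pipeline_inputs.items():
            if result.get(pipeline) is JobState.STARTING and any(
                result.get(upstream) in blocked for upstream in inputs
            ):
                result[pipeline] = JobState.UNRUNNABLE
                changed = True
    return result


# Pipelines Y and Z both read the output of pipeline X, which has failed.
inputs = {"Y": ["X"], "Z": ["X"]}
states = {"X": JobState.FAILURE, "Y": JobState.STARTING, "Z": JobState.STARTING}
after = mark_unrunnable(states, inputs)
print(after["Y"].value, after["Z"].value)
print(parse_job_ref("X@abc123"))
```

The loop in `mark_unrunnable` repeats until nothing changes, so in this model a pipeline that reads from Y would also end up UNRUNNABLE.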