Update a Pipeline
While working with your data, you often need to modify an existing pipeline with new transformation code or pipeline parameters.
Use the pachctl update pipeline command to make changes to a pipeline, whether you have rebuilt a Docker image after a code change, need to update pipeline parameters in the pipeline specification file, or both.
Alternatively, you can update a pipeline using jsonnet pipeline specification files.
After You Changed Your Specification File #
Run the pachctl update pipeline command to apply any change to your pipeline specification JSON file, such as a change to the parallelism settings, the image tag, or an input repository.
By default, a pipeline update does not trigger the reprocessing of the data
that has already been processed. Instead,
it processes only the new data you submit to the input repo.
If you want to run the changes in your pipeline against the data in your input repo’s HEAD commit, use the --reprocess flag.
The updated pipeline will then continue to process new input data only.
Previous results remain accessible through the corresponding commit IDs.
To update a pipeline, run the following command after you have updated your pipeline specification JSON file.
pachctl update pipeline -f pipeline.json
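The two update modes described above (new data only, or also the HEAD commit of the input repo) can be sketched as a small shell helper. This is a minimal sketch, not part of the product: the function name update_pipeline and the PACHCTL variable are hypothetical, and PACHCTL defaults to printing the command rather than running it.

```shell
# Sketch only. PACHCTL defaults to "echo pachctl" so the commands are printed,
# not executed; set PACHCTL=pachctl to run them against a real cluster.
PACHCTL="${PACHCTL:-echo pachctl}"

# update_pipeline SPEC [reprocess]   (hypothetical helper)
#   SPEC       path to the pipeline specification JSON file
#   reprocess  optional literal word; adds --reprocess so the updated pipeline
#              also runs against the HEAD commit of the input repo
update_pipeline() {
  spec="$1"
  if [ "${2:-}" = "reprocess" ]; then
    $PACHCTL update pipeline -f "$spec" --reprocess
  else
    $PACHCTL update pipeline -f "$spec"
  fi
}
```

For example, update_pipeline pipeline.json applies the new spec to future data only, while update_pipeline pipeline.json reprocess also reruns the pipeline on the current HEAD commit.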
As with create pipeline, update pipeline with the -f flag can take a URL if your JSON manifest is hosted on GitHub or another remote location.
Using Jsonnet Pipeline Specification Files #
Jsonnet pipeline specs allow you to bypass the “update-your-specification-file” step and apply your changes at once by running:
pachctl update pipeline --jsonnet <your jsonnet pipeline specs path or URL> --arg <param 1>=<value 1> --arg <param 2>=<value 2>
Example #
pachctl update pipeline --jsonnet jsonnet/edges.jsonnet --arg suffix=1 --arg tag=1.0.2
Update the Code in a Pipeline #
To update the code in your pipeline, complete the following steps:
-
Make the code changes.
-
Verify that the Docker daemon is running. The steps for enabling it vary by operating system and Docker distribution. To check, run:
docker ps
If you get an error message similar to the following:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
enable the Docker daemon (see the Docker documentation for your operating system and platform). For example, if you use minikube on macOS, run the following command:
eval $(minikube docker-env)
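The daemon check above can be wrapped in a guard that a build script calls before building an image. This is a minimal sketch: the function name check_docker_daemon and the DOCKER override variable are hypothetical, and the only Docker command it relies on is docker ps.

```shell
# Sketch only. DOCKER is a hypothetical override so the check can be exercised
# without Docker installed; it defaults to the real docker CLI.
DOCKER="${DOCKER:-docker}"

# Returns 0 when `docker ps` succeeds, which requires a reachable daemon.
check_docker_daemon() {
  if $DOCKER ps >/dev/null 2>&1; then
    echo "Docker daemon is reachable"
  else
    echo "Docker daemon is not reachable; start it before building" >&2
    return 1
  fi
}
```

In a build script, a line such as check_docker_daemon || exit 1 stops the script before the build if the daemon is down.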
Then build, tag, and push the new image to your image registry and update the pipeline. This step comes in 3 flavors:
If you prefer to use instructions from your image registry #
- Build, tag, and push a new image as described in your image registry documentation. For example, if you use DockerHub, see Docker Documentation.
- Update the transform.image field of your pipeline spec with your new tag.
Tip: Make sure to update your tag every time you rebuild. Our pull policy is IfNotPresent (only pull the image if it does not already exist on the node). Failing to update your tag will result in your pipeline running on a previous version of your code.
- Update the pipeline:
pachctl update pipeline -f <pipeline.json>
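The three steps above can be chained so that the pipeline is updated only if the build and push succeed. This is a minimal sketch under stated assumptions: the helper name publish_and_update and the DOCKER and PACHCTL variables are hypothetical, the image name and tag are placeholders, and the spec file is assumed to already name the new image and tag in transform.image. By default the commands are printed rather than executed.

```shell
# Sketch only. DOCKER and PACHCTL default to "echo ..." so each command is
# printed instead of executed; set DOCKER=docker and PACHCTL=pachctl to run them.
DOCKER="${DOCKER:-echo docker}"
PACHCTL="${PACHCTL:-echo pachctl}"

# publish_and_update IMAGE TAG SPEC   (hypothetical helper)
#   IMAGE  image repository, for example pachyderm/opencv
#   TAG    a tag that has not been used before, for example 1.0.2
#   SPEC   pipeline spec whose transform.image already names IMAGE:TAG
publish_and_update() {
  image="$1:$2"
  # Each step runs only if the previous one succeeded.
  $DOCKER build -t "$image" . &&
    $DOCKER push "$image" &&
    $PACHCTL update pipeline -f "$3"
}
```

Because the pull policy is IfNotPresent, reusing an old TAG here would leave nodes running the previously pulled image, so pass a fresh tag on every rebuild.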
If you choose to use a jsonnet version of your pipeline specs #
-
Pass the tag of your image to your jsonnet specs.
As an example, see the tag parameter in this jsonnet version of opencv’s edges pipeline (edges.jsonnet):
////
// Template arguments:
//
// suffix : An arbitrary suffix appended to the name of this pipeline, for
//          disambiguation when multiple instances are created.
// src : the repo from which this pipeline will read the images to which
//       it applies edge detection. Defaults to "images".
// tag : the tag of the pachyderm/opencv image that runs the pipeline code.
//       Defaults to "0.0.1".
////
function(suffix, src="images", tag="0.0.1")
{
  pipeline: { name: "edges-"+suffix },
  description: "OpenCV edge detection on "+src,
  input: {
    pfs: {
      name: "images",
      glob: "/*",
      repo: src,
    }
  },
  transform: {
    cmd: [ "python3", "/edges.py" ],
    image: "pachyderm/opencv:"+tag
  }
}
-
Once your pipeline code is updated and your image is built, tagged, and pushed, update your pipeline with the following command. There is no need to edit the pipeline specification file to set the new tag; the --arg tag value takes care of it:
pachctl update pipeline --jsonnet jsonnet/edges.jsonnet --arg suffix=1 --arg tag=1.0.2
If you use HPE Machine Learning Data Management commands #
- Build your new image using docker build (for example, in a makefile: @docker build --platform linux/amd64 -t $(DOCKER_ACCOUNT)/$(CONTAINER_NAME) .). No tag is needed; the --push-images flag in the next step takes care of it.
- Run the following command:
pachctl update pipeline -f <pipeline name> --push-images --registry <registry> --username <registry user>
If you use DockerHub, omit the --registry flag. Example:
pachctl update pipeline -f edges.json --push-images --username testuser
- When prompted, type your image registry password:
Example:
Password for docker.io/testuser:
Building pachyderm/opencv:f1e0239fce5441c483b09de425f06b40, this may take a while.