Canary Rollout

About

A canary rollout is a deployment strategy that lets you test new features or changes on a small subset of users before rolling them out to the entire user base. It helps you identify potential issues and gather feedback before the changes become available to everyone.

Rollout Behavior

How a canary rollout behaves depends on the traffic percentage (--canary-traffic-percent) set for the new model version. The traffic percentage you provide maps to the following behaviors:

  • 100%: Triggers a full, immediate rollout to the new version instead of performing a canary rollout. The prior model version stops serving traffic as soon as the new version is properly serving requests.
  • 1-99%: Triggers a canary rollout in which both versions serve requests. Once the new version is ready, it receives the defined canary traffic percentage.
  • 0%: Cancels the canary rollout and returns all traffic to the previously deployed model version.
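
The mapping above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the platform's implementation: the names RolloutBehavior, rollout_behavior, and traffic_split are hypothetical, and both the rejection of values outside 0-100 and the assumption that the prior version receives the remaining traffic are ours rather than documented behavior.

```python
from enum import Enum


class RolloutBehavior(Enum):
    """Hypothetical labels for the three documented behaviors."""
    FULL_ROLLOUT = "full rollout"
    CANARY = "canary rollout"
    CANCEL = "cancel canary rollout"


def rollout_behavior(canary_traffic_percent: int) -> RolloutBehavior:
    """Classify a canary traffic percentage into one of the behaviors above."""
    # Assumption: values outside 0-100 are rejected; the text does not say.
    if not 0 <= canary_traffic_percent <= 100:
        raise ValueError("canary traffic percent must be between 0 and 100")
    if canary_traffic_percent == 100:
        return RolloutBehavior.FULL_ROLLOUT
    if canary_traffic_percent == 0:
        return RolloutBehavior.CANCEL
    return RolloutBehavior.CANARY


def traffic_split(canary_traffic_percent: int) -> dict[str, int]:
    """Share of requests per version once the new version is ready.

    Assumption: the prior version receives whatever the new one does not.
    """
    rollout_behavior(canary_traffic_percent)  # reuse the range check
    return {
        "new": canary_traffic_percent,
        "prior": 100 - canary_traffic_percent,
    }
```

For example, rollout_behavior(20) returns RolloutBehavior.CANARY, and traffic_split(20) reports 20% of requests going to the new version and 80% to the prior one.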

Once you are satisfied with the new model version’s performance, complete the rollout by updating the canary traffic percentage to 100%.

Canceled Canary Rollouts

If you cancel a canary rollout by setting its traffic to 0%, the new version's inference service remains present but handles no requests. It continues to consume any GPU resources assigned to it until you perform another rollout. That rollout can be either a 100% rollout of the prior version or a new rollout of another model version.
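
The consequence described above can be pictured with a small model. This is a hedged sketch rather than a view into the platform: ServiceState, idle_but_allocated, and cancel_canary are hypothetical names, and the version labels and GPU counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    """Illustrative snapshot of one model version's inference service."""
    version: str
    traffic_percent: int
    gpus_assigned: int

    @property
    def idle_but_allocated(self) -> bool:
        # True when the service handles no requests yet still holds GPUs.
        return self.traffic_percent == 0 and self.gpus_assigned > 0


def cancel_canary(prior: ServiceState, canary: ServiceState) -> None:
    """Model a cancellation: all traffic returns to the prior version."""
    prior.traffic_percent = 100
    canary.traffic_percent = 0
    # gpus_assigned is deliberately left unchanged: the resources stay
    # allocated until another rollout is performed.


# Made-up example values.
prior = ServiceState("v1", traffic_percent=80, gpus_assigned=2)
canary = ServiceState("v2", traffic_percent=20, gpus_assigned=2)
cancel_canary(prior, canary)
print(canary.idle_but_allocated)  # True
```

A check like idle_but_allocated is one way to spot canceled canaries that still hold GPUs and should be followed by another rollout.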