Heterogeneous GPU Support

This guide explains how to set up HPE Machine Learning Inferencing Software to support heterogeneous GPU environments.

With heterogeneous GPU support enabled, users can specify the desired GPU type for their deployed inference service by passing the --gpu-type argument when defining a packaged model. The platform will then schedule the inference service on a node equipped with the specified GPU type.

  • If the Kubernetes cluster is configured with taints that prevent scheduling on a particular GPU type, then the GPU type (--gpu-type) is required to enable access to that GPU (see the example taint command after this list).
  • If the specified GPU type is not available, the platform will fail to schedule the inference service.
  • If the Kubernetes cluster is not configured to use taints and no GPU type is specified, the platform will schedule the inference service on any node that has the requested number of GPUs.
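
For reference, clusters that restrict GPU access this way typically taint the GPU nodes so that only workloads requesting that GPU type are scheduled there. The following is an illustrative sketch, assuming a taint key of accelerator and effect NoSchedule to match the toleration shown in the Taints & Tolerations section below; your cluster's taint key and value may differ:

kubectl taint nodes <NODE_NAME> accelerator=nvidia-tesla-a100:NoSchedule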

How to Add Heterogeneous GPU Support

Label Nodes with GPU Type Names

Use the following command to label the cluster’s nodes with the GPU type names you want to surface to your users:

kubectl label nodes <NODE_NAME> cloud.google.com/gke-accelerator=<GPU_TYPE_NAME>

Label Requirements

  • Label Name: Must be cloud.google.com/gke-accelerator
  • Value: The GPU type name you want to surface to your users

Example

If a node named MYNODE has an NVIDIA A100 GPU, you can enable selection of the GPU type name nvidia-tesla-a100 with the following command:

kubectl label nodes MYNODE cloud.google.com/gke-accelerator=nvidia-tesla-a100
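
To confirm that the label was applied, you can list the nodes with that label displayed as a column (a standard kubectl option, shown here only as a convenience check):

kubectl get nodes -L cloud.google.com/gke-accelerator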

Configure Helm Chart

Configure the Helm chart’s gpuSelector section. This section’s required configuration changes based on the environment in which you are deploying the platform.
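
For example, the GKE test later in this guide assumes gpuSelector.gke: true. A minimal values.yaml sketch for that case might look like the following; the keys required for other environments may differ, so consult the chart's values documentation:

gpuSelector:
  gke: true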

Install or Upgrade via Helm

After configuring the Helm chart, perform a Helm upgrade to apply the changes:

helm upgrade mlis \
  --set 'global.imagePullSecrets[0].name=regcred' \
  --set 'global.imagePullSecrets[1].name=hpe-mlis-registry' \
  --set imageRegistry=hub.myenterpriselicense.hpe.com/hpe-mlis/<SKU> \
  --set defaultPassword=<CREATE_ADMIN_PASSWORD> \
  --values values.yaml \
  <SKU>_aioli-helm-chart<release/majorMinorPatchNumber>.tgz
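
After the upgrade completes, you can confirm that the release picked up the new values, including the gpuSelector section, with a standard Helm command (shown as a convenience check):

helm get values mlis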

Test

GKE

The following assumes that gpuSelector.gke: true is set in the Helm chart’s values.yaml file.

Test the setup by deploying a packaged model configured with the --gpu-type nvidia-tesla-a100 argument. The platform should then schedule the inference service on a node labeled with cloud.google.com/gke-accelerator=nvidia-tesla-a100.
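
As a quick check, you can see which node each pod actually landed on (a standard kubectl option, shown here as a convenience):

kubectl get pods -o wide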

You can see which nodes the pod may be scheduled on by checking the pod’s nodeSelector. For the example above, the pod spec includes:

  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-a100

To see which nodes satisfy that selector, check the node labels:

kubectl get nodes --show-labels

You can display the pod names and their node selectors together with the following command:

kubectl get pods -Ao jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.nodeSelector}{"\n"}{end}' | grep nvidia-tesla-a100
starcoder-predictor-00004-deployment-5c75bfbdbb-mcbz6 {"cloud.google.com/gke-accelerator":"nvidia-tesla-a100"}

Taints & Tolerations

You can check the tolerations for a specific inference service (or pod) by running the following command:

kubectl get inferenceservices.serving.kserve.io starcoder -o yaml | grep -A4 tolerations
tolerations:
- effect: NoSchedule
  key: accelerator
  operator: Equal
  value: nvidia-tesla-a100

Alternatively, you can check the tolerations for all pods by running the following command:

kubectl get pods -Ao jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.tolerations}{"\n"}{end}' | grep nvidia-tesla-a100
starcoder-predictor-00004-deployment-5c75bfbdbb-mcbz6 [{"effect":"NoSchedule","key":"accelerator","operator":"Equal","value":"nvidia-tesla-a100"},{"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300},{"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":300},{"effect":"NoSchedule","key":"nvidia.com/gpu","operator":"Exists"}]
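
To confirm that these tolerations line up with the taints on the GPU nodes, you can list each node's taints using the same jsonpath pattern (a sketch; the accelerator taint key matches the toleration above but may differ in your cluster):

kubectl get nodes -o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.taints}{"\n"}{end}' | grep nvidia-tesla-a100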