Heterogeneous GPU Support
This guide explains how to set up HPE Machine Learning Inferencing Software to support heterogeneous GPU environments.
With heterogeneous GPU support enabled, users can specify the desired GPU type for their deployed inference service by passing the --gpu-type argument when defining a packaged model. The platform then schedules the inference service on a node equipped with the specified GPU type.
- If the Kubernetes cluster is configured to use taints to restrict scheduling on nodes with a particular GPU type, then the gpuType is required to enable access to that GPU (you can check a node’s taints as shown after this list).
- If the specified GPU type is not available, the platform will fail to schedule the inference service.
- If the Kubernetes cluster is not configured to use taints and no GPU type is specified, the platform will schedule the inference service on any node with the requested number of GPUs.
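If you are not sure whether your cluster taints its GPU nodes, you can inspect a node’s taints with a standard kubectl command, for example:
kubectl describe node <NODE_NAME> | grep -A2 Taints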
How to Add Heterogeneous GPU Support #
Label Nodes with GPU Type Names #
Use the following command to label the cluster’s nodes with the GPU type names you want to surface to your users:
kubectl label nodes <NODE_NAME> cloud.google.com/gke-accelerator=<GPU_TYPE_NAME>
Label Requirements #
- Label Name: Must be cloud.google.com/gke-accelerator
- Value: The GPU type name you want to surface to your users
Example #
If a node named MYNODE has an NVIDIA A100 GPU, you could enable selection of the GPU type name nvidia-tesla-a100 with the following command:
kubectl label nodes MYNODE cloud.google.com/gke-accelerator=nvidia-tesla-a100
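To confirm that the label was applied, you can list nodes with the accelerator label shown as a column (this uses a standard kubectl option, not a platform-specific command):
kubectl get nodes -L cloud.google.com/gke-accelerator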
Configure Helm Chart #
Configure the Helm chart’s gpuSelector section. The required configuration for this section depends on the environment in which you are deploying the platform.
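As a minimal sketch for a GKE deployment, based on the gpuSelector.gke setting referenced in the Test section below (other environments may require different keys under gpuSelector), the relevant values.yaml excerpt could look like this:
gpuSelector:
  gke: true   # setting assumed by the GKE test steps later in this guide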
Install or Upgrade via Helm #
After configuring the Helm chart, perform a Helm upgrade to apply the changes:
helm upgrade mlis \
--set 'global.imagePullSecrets[0].name=regcred' \
--set 'global.imagePullSecrets[1].name=hpe-mlis-registry' \
--set imageRegistry=hub.myenterpriselicense.hpe.com/hpe-mlis/<SKU> \
--set defaultPassword=<CREATE_ADMIN_PASSWORD> \
--values values.yaml \
<SKU>_aioli-helm-chart<release/majorMinorPatchNumber>.tgz
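After the upgrade completes, you can confirm that the release picked up your values.yaml settings (including the gpuSelector section) by inspecting the user-supplied values for the mlis release:
helm get values mlis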
Test #
GKE #
The following assumes that gpuSelector.gke: true is set in the Helm chart’s values.yaml file.
Test the setup by deploying a packaged model configured with the --gpu-type nvidia-tesla-a100 argument. The platform should then schedule the inference service on a node labeled with cloud.google.com/gke-accelerator=nvidia-tesla-a100.
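One way to confirm where the inference service pod actually landed is to list pods together with the nodes they are running on:
kubectl get pods -o wide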
You can list pods together with their labels:
kubectl get pods --show-labels
To see which nodes a pod may be scheduled on, inspect the pod’s nodeSelector. A pod that targets the labeled GPU nodes includes a selector such as:
nodeSelector:
  cloud.google.com/gke-accelerator: nvidia-tesla-a100
You can display the pod names and their nodeSelectors together with the following command:
kubectl get pods -Ao jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.nodeSelector}{"\n"}{end}' | grep nvidia-tesla-a100
starcoder-predictor-00004-deployment-5c75bfbdbb-mcbz6 {"cloud.google.com/gke-accelerator":"nvidia-tesla-a100"}
Taints & Tolerations #
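The tolerations shown below correspond to a node taint of the following form (an illustrative example; the actual taint key and value depend on how your cluster’s GPU nodes were tainted):
kubectl taint nodes <NODE_NAME> accelerator=nvidia-tesla-a100:NoSchedule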
You can check the tolerations for a specific inference service (or pod) by running the following command:
kubectl get inferenceservices.serving.kserve.io -o yaml starcoder | grep -A4 tolerations
tolerations:
- effect: NoSchedule
key: accelerator
operator: Equal
value: nvidia-tesla-a100
Alternatively, you can check the tolerations for all pods by running the following command:
kubectl get pods -Ao jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.tolerations}{"\n"}{end}' | grep nvidia-tesla-a100
starcoder-predictor-00004-deployment-5c75bfbdbb-mcbz6 [{"effect":"NoSchedule","key":"accelerator","operator":"Equal","value":"nvidia-tesla-a100"},{"effect":"NoExecute","key":"node.kubernetes.io/not-ready","operator":"Exists","tolerationSeconds":300},{"effect":"NoExecute","key":"node.kubernetes.io/unreachable","operator":"Exists","tolerationSeconds":300},{"effect":"NoSchedule","key":"nvidia.com/gpu","operator":"Exists"}]