Azure Deployment

Before You Start

This guide assumes that you have already tried HPE Machine Learning Data Management locally and have all of the following installed:

  • pachctl
  • Helm
  • kubectl
  • The Azure CLI (az)
  • jq

Kubernetes & Openshift Version Support
  • Kubernetes: HPE Machine Learning Data Management supports the three most recent minor release versions of Kubernetes. Older versions are End of Life (EOL) and unsupported. This policy ensures that HPE Machine Learning Data Management users have access to the latest Kubernetes features and bug fixes.
  • OpenShift: HPE Machine Learning Data Management is compatible with OpenShift versions within the “Full Support” window.
Hardened Security and Dependency Considerations

If you are deploying in a hardened security environment, such as within the DoD community or other regulated sectors, consider downloading and installing HPE Machine Learning Data Management from Iron Bank, a hardened container registry.

MLDM images may be pulled from Iron Bank by updating the global registry setting in the MLDM Helm chart's values.yaml to use registry1.dso.mil/, for example:

global:
  ...
  image:
    registry: registry1.dso.mil/
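Note that Iron Bank generally requires authenticated image pulls. If your cluster does not already have registry credentials configured, one common approach is a Kubernetes image pull secret, sketched below; the secret name and how it is wired into the chart (for example, via an imagePullSecrets value) are assumptions to verify against your environment:

# Create a pull secret for registry1.dso.mil (the name 'registry1' is illustrative)
kubectl create secret docker-registry registry1 \
  --docker-server=registry1.dso.mil \
  --docker-username=<your_username> \
  --docker-password=<your_cli_token>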

Additionally, note that the MLDM Helm chart relies on the Bitnami PostgreSQL image and its associated sub-chart. If the Bitnami image is unavailable, or if your available PostgreSQL image cannot be managed through the Bitnami sub-chart, you will need to install PostgreSQL separately. Refer to Global Helm Chart Values for details on specifying your separate PostgreSQL instance, and to Non-Bundled Database Setup for more detail on using your own PostgreSQL instance with MLDM. A sketch of the relevant values follows.
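
For orientation, pointing MLDM at a separate PostgreSQL instance typically means disabling the bundled sub-chart and supplying connection details in values.yaml. The sketch below uses key names commonly seen in the chart's global section; treat them as assumptions and confirm the exact names in Global Helm Chart Values:

postgresql:
  # Disable the bundled Bitnami PostgreSQL sub-chart
  enabled: false

global:
  postgresql:
    # Connection details for your separately managed PostgreSQL instance
    postgresqlHost: "<your-postgres-host>"
    postgresqlPort: "5432"
    postgresqlUsername: "pachyderm"
    postgresqlPassword: "<your-password>"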

If you have questions, please reach out to your Customer Support Engineer for assistance before proceeding.


1. Create an AKS Cluster

You can deploy Kubernetes on Azure by following the official Azure Kubernetes Service documentation, using the quickstart walkthrough, or following the steps in this section.

At a minimum, you will need to specify the parameters below:

  • RESOURCE_GROUP: A unique name for the resource group where HPE Machine Learning Data Management is deployed. For example, pach-resource-group.
  • LOCATION: An Azure region where AKS is available. For example, centralus.
  • NODE_SIZE: The size of the Kubernetes virtual machine (VM) instances. To avoid performance issues, HPE Machine Learning Data Management recommends that you set this value to at least Standard_DS4_v2, which provides 8 vCPUs, 28 GiB of memory, and a 56 GiB SSD. In any case, use VMs that support Premium Storage; see Azure VM sizes for details on which sizes support it.
  • CLUSTER_NAME: A unique name for the HPE Machine Learning Data Management cluster. For example, pach-aks-cluster.
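
For convenience, you can export these as environment variables so that the commands below can be copied as-is. The values here are placeholders; substitute your own:

export RESOURCE_GROUP="pach-resource-group"  # resource group for the deployment
export LOCATION="centralus"                  # Azure region where AKS is available
export NODE_SIZE="Standard_DS4_v2"           # VM size for cluster nodes
export CLUSTER_NAME="pach-aks-cluster"       # name of the AKS cluster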

You can choose to follow the guided steps in the Azure Portal's Kubernetes services or use the Azure CLI, as described below.

  1. Log in to Azure:

    az login

    This command opens a browser window. Log in with your Azure credentials. Resources can now be provisioned on the Azure subscription linked to your account.

  2. Create an Azure resource group or retrieve an existing group.

    az group create --name ${RESOURCE_GROUP} --location ${LOCATION}

    Example:

    az group create --name test-group --location centralus

    System Response:

    {
      "id": "/subscriptions/6c9f2e1e-0eba-4421-b4cc-172f959ee110/resourceGroups/pach-resource-group",
      "location": "centralus",
      "managedBy": null,
      "name": "pach-resource-group",
      "properties": {
        "provisioningState": "Succeeded"
      },
      "tags": null,
      "type": null
    }
  3. Create an AKS cluster in the resource group/location:

    For more configuration options, see the full list of available flags for the az aks create command.

    az aks create --resource-group ${RESOURCE_GROUP} --name ${CLUSTER_NAME} --node-vm-size ${NODE_SIZE} --node-count <node_pool_count> --location ${LOCATION}

    Example:

    az aks create --resource-group test-group --name test-cluster --generate-ssh-keys --node-vm-size Standard_DS4_v2 --location centralus
  4. Confirm the version of the Kubernetes server by running kubectl version.
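
    If kubectl is not yet pointed at the new cluster, fetch its credentials first with az aks get-credentials, then check the version:

    az aks get-credentials --resource-group ${RESOURCE_GROUP} --name ${CLUSTER_NAME}
    kubectl version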

Note

Once your Kubernetes cluster is up and your infrastructure is configured, you are ready to prepare for the installation of HPE Machine Learning Data Management. Some of the steps below require you to keep updating the values.yaml file you started during the setup of the recommended infrastructure.

2. Create a Storage Container

HPE Machine Learning Data Management needs an Azure Storage Container (Object store) to store your data.

To access your data, HPE Machine Learning Data Management uses a Storage Account with permissioned access to your desired container. You can either use an existing account or create a new one in your default subscription, then pass the JSON key associated with that account to HPE Machine Learning Data Management.

  1. Set up the following variables:

    • STORAGE_ACCOUNT: The name of the storage account where you store your data.
    • CONTAINER_NAME: The name of the Azure blob container where you store your data.
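
    For example (placeholder values; Azure storage account names must be globally unique and use 3–24 lowercase letters and numbers):

    export STORAGE_ACCOUNT="pachstorageaccount"  # storage account name
    export CONTAINER_NAME="pach-container"       # blob container name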
  2. Create an Azure storage account:

    az storage account create \
      --resource-group="${RESOURCE_GROUP}" \
      --location="${LOCATION}" \
      --sku=Premium_LRS \
      --name="${STORAGE_ACCOUNT}" \
      --kind=BlockBlobStorage

    System response:

    {
      "accessTier": null,
      "creationTime": "2019-06-20T16:05:55.616832+00:00",
      "customDomain": null,
      "enableAzureFilesAadIntegration": null,
      "enableHttpsTrafficOnly": false,
      "encryption": {
        "keySource": "Microsoft.Storage",
        "keyVaultProperties": null,
        "services": {
          "blob": {
            "enabled": true,
      ...

    Make sure that the Stock Keeping Unit (SKU) is set to Premium_LRS and that the kind parameter is set to BlockBlobStorage. This configuration results in storage that uses SSDs rather than standard hard disk drives (HDDs). If you set this parameter to an HDD-based storage option, your HPE Machine Learning Data Management cluster will be too slow and might malfunction.
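
    To double-check the SKU after creation, you can query it with az; the --query path below follows the structure of the az storage account show output:

    az storage account show \
      --name="${STORAGE_ACCOUNT}" \
      --resource-group="${RESOURCE_GROUP}" \
      --query "sku.name" --output tsv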

  3. Verify that your storage account has been successfully created:

    az storage account list
  4. Obtain the key for the storage account (STORAGE_ACCOUNT) and the resource group to be used to deploy HPE Machine Learning Data Management:

    STORAGE_KEY="$(az storage account keys list \
                  --account-name="${STORAGE_ACCOUNT}" \
                  --resource-group="${RESOURCE_GROUP}" \
                  --output=json \
                  | jq '.[0].value' -r
                )"
    Note
    Find the generated key in the Storage accounts > Access keys section in the Azure Portal, or by running az storage account keys list --account-name=${STORAGE_ACCOUNT}.
  5. Create a new storage container within your storage account:

    az storage container create --name ${CONTAINER_NAME} \
              --account-name ${STORAGE_ACCOUNT} \
              --account-key "${STORAGE_KEY}"
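
    Optionally, confirm that the container now exists; az storage container exists returns a small JSON document with an "exists" field:

    az storage container exists --name ${CONTAINER_NAME} \
              --account-name ${STORAGE_ACCOUNT} \
              --account-key "${STORAGE_KEY}"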

3. Create a values.yaml

Tip
If your Postgres deployment requires SSL, you may need to set up additional parameters in the global section of your Helm Chart Values (HCVs).
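
As a starting point, a minimal values.yaml for an Azure deployment might look like the sketch below. The key names (deployTarget and the pachd.storage.microsoft block) reflect the layout commonly used by the Pachyderm Helm chart; verify them against the values reference for your chart version before deploying:

deployTarget: MICROSOFT

pachd:
  storage:
    microsoft:
      # Container and storage account created in the previous section
      container: "<CONTAINER_NAME>"
      id: "<STORAGE_ACCOUNT>"
      # The account key retrieved earlier (STORAGE_KEY)
      secret: "<STORAGE_KEY>"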

4. Configure Helm

Run the following commands to add the HPE Machine Learning Data Management repo to Helm and install the chart using your values file:

helm repo add pachyderm https://helm.pachyderm.com
helm repo update
helm install pachyderm pachyderm/pachyderm -f my_pachyderm_values.yaml 
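
If you prefer not to store the account key in the values file, one option is to inject it at install time with --set; this assumes the pachd.storage.microsoft.secret key from the sketch above:

helm install pachyderm pachyderm/pachyderm -f my_pachyderm_values.yaml \
  --set pachd.storage.microsoft.secret="${STORAGE_KEY}"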

5. Verify Installation

  1. In a new terminal, run the following command to check the status of your pods:
    kubectl get pods
    NAME                                           READY   STATUS      RESTARTS   AGE
    pod/console-5b67678df6-s4d8c                   1/1     Running     0          2m8s
    pod/etcd-0                                     1/1     Running     0          2m8s
    pod/pachd-c5848b5c7-zwb8p                      1/1     Running     0          2m8s
    pod/pg-bouncer-7b855cb797-jqqpx                1/1     Running     0          2m8s
    pod/postgres-0                                 1/1     Running     0          2m8s
  2. Re-run this command after a few minutes if pachd is not ready.
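
    Rather than polling manually, you can also block until the pods are ready; the selector below assumes the pachd pods carry an app=pachd label, so adjust it if your chart's labels differ:

    kubectl wait --for=condition=ready pod -l app=pachd --timeout=300s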

6. Connect to Cluster

You’ll need your organization’s cluster URL (proxy.host) value to connect.

  1. Run the following command to get your cluster URL:

    kubectl get services | grep pachyderm-proxy | awk '{print $4}'
  2. Connect to your cluster:

    The proxy listens on port 80 rather than pachctl's default of 30650, so specify the port explicitly:

    pachctl connect <proxy ip>:80
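
    To confirm the connection, run pachctl version; it prints the pachctl client version and, once connected, the pachd server version:

    pachctl version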