GCP Deployment

Before You Start

This guide assumes that:

Kubernetes & OpenShift Version Support
  • Kubernetes: HPE Machine Learning Data Management supports the three most recent minor release versions of Kubernetes. If your Kubernetes version is not among these, it is End of Life (EOL) and unsupported. This policy ensures that HPE Machine Learning Data Management users have access to the latest Kubernetes features and bug fixes.
  • OpenShift: HPE Machine Learning Data Management is compatible with OpenShift versions within the “Full Support” window.

Hardened Security and Dependency Considerations

If you are deploying in a hardened security environment, such as within the DoD community or other regulated sectors, consider downloading and installing HPE Machine Learning Data Management from Iron Bank, a hardened container registry.

MLDM images may be pulled from Iron Bank by updating the global registry setting in the MLDM Helm chart values.yaml to use registry1.dso.mil/, e.g.

global:
  ...
  image:
    registry: registry1.dso.mil/

Additionally, note that the MLDM Helm chart relies on the Bitnami PostgreSQL image and its associated sub-chart. If the Bitnami image is unavailable, or if your available PostgreSQL image cannot be managed through the Bitnami sub-chart, you will need to install PostgreSQL separately. Refer to Global Helm Chart Values for details on specifying your separate PostgreSQL instance, and to Non-Bundled Database Setup for more detail on using your own PostgreSQL instance with MLDM.

If you have questions, please reach out to your Customer Support Engineer for assistance before proceeding.

Configure Variables

  1. Configure the following variables.

    Optional Variables for Scripting

  2. Save them to an .env file.
  3. Load them by running source .env in your terminal before you start the installation steps.
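As a rough sketch, an .env file for this guide might look like the following. The variable names are the ones referenced by the commands in the steps below. Every value is a hypothetical placeholder. K8S_NAMESPACE, the three role names, and the Kubernetes service account names other than pachyderm are assumptions that you should check against your own chart values and security requirements.

```shell
# Sketch of an .env file. All values are hypothetical placeholders.
PROJECT_ID="pachyderm-quickstart-project"
PROJECT_NAME="pachyderm-quickstart"
BILLING_ACCOUNT_ID="000000-000000-000000"
GCP_REGION="us-central1"
GCP_ZONE="us-central1-a"
K8S_NAMESPACE="default"            # assumption: namespace of the Helm release
NAME="mldm-gcp"                    # prefix of the Helm values file name

STATIC_IP_NAME="${NAME}-ip"
CLUSTER_NAME="${NAME}-gke"
CLUSTER_MACHINE_TYPE="n2-standard-4"
LOGGING="SYSTEM"
BUCKET_NAME="${PROJECT_ID}-data"
LOKI_BUCKET_NAME="${PROJECT_ID}-loki"

CLOUDSQL_INSTANCE_NAME="${NAME}-sql"
SQL_CPU="2"
SQL_MEM="7680MB"
SQL_ADMIN_PASSWORD="change-me"     # placeholder; never commit a real password

GSA_NAME="${NAME}-gsa"
LOKI_GSA_NAME="${NAME}-loki-gsa"

# Assumed roles: the guide only states that the accounts need access to
# Cloud SQL and to the storage buckets.
ROLE1="roles/cloudsql.client"
ROLE2="roles/storage.admin"
ROLE3="roles/storage.objectCreator"

# Derived values. Service account emails and Workload Identity members
# follow the standard GCP formats; the Kubernetes service account names
# pachyderm-worker and k8s-cloudsql-auth-proxy are assumptions.
SERVICE_ACCOUNT="${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
LOKI_SERVICE_ACCOUNT="${LOKI_GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
PACH_WI="serviceAccount:${PROJECT_ID}.svc.id.goog[${K8S_NAMESPACE}/pachyderm]"
SIDECAR_WI="serviceAccount:${PROJECT_ID}.svc.id.goog[${K8S_NAMESPACE}/pachyderm-worker]"
CLOUDSQLAUTHPROXY_WI="serviceAccount:${PROJECT_ID}.svc.id.goog[${K8S_NAMESPACE}/k8s-cloudsql-auth-proxy]"
```

Deriving the service account emails and Workload Identity members from PROJECT_ID and the account names keeps them consistent if you rename the project or the accounts.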

The following steps use a template to create a GKE cluster, a Cloud SQL instance, and a static IP address. The template also creates service accounts for HPE Machine Learning Data Management and Loki, and grants them the permissions needed to access the Cloud SQL instance and storage buckets. You do not have to use this template, but it is a good outline for creating your own setup.

1. Create a New Project

  1. Create a new project (e.g., pachyderm-quickstart-project). You can pre-define the project ID; it must be 6 to 30 characters long and start with a lowercase letter. This ID is used to set up the cluster and is referenced throughout this guide.

    gcloud projects create ${PROJECT_ID} --name=${PROJECT_NAME} --set-as-default
    gcloud alpha billing projects link ${PROJECT_ID} --billing-account=${BILLING_ACCOUNT_ID}
  2. Enable the following APIs:

    gcloud services enable container.googleapis.com
    gcloud services enable sqladmin.googleapis.com
    gcloud services enable compute.googleapis.com
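Because a rejected project ID only surfaces when gcloud projects create fails, it can help to check the ID locally first. The following is a minimal sketch of such a check. The function name is_valid_project_id is hypothetical. Beyond the length and first-character rules stated in step 1, the sketch assumes the usual GCP constraints that an ID contains only lowercase letters, digits, and hyphens and does not end with a hyphen.

```shell
# Return 0 if the argument looks like a valid GCP project ID:
# 6 to 30 characters, starts with a lowercase letter, contains only
# lowercase letters, digits, and hyphens, and does not end with a hyphen.
is_valid_project_id() {
  printf '%s' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Usage (commented out so the sketch has no side effects when sourced):
# is_valid_project_id "${PROJECT_ID}" || echo "invalid project ID: ${PROJECT_ID}" >&2
```

The check is purely syntactic. It does not tell you whether the ID is already taken, which only the create command can report.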

2. Create a Static IP Address

  1. Create the static IP address:
    gcloud compute addresses create ${STATIC_IP_NAME} --region=${GCP_REGION}
  2. Get the static IP address:
    STATIC_IP_ADDR=$(gcloud compute addresses describe ${STATIC_IP_NAME} --region=${GCP_REGION} --format="json" --flatten="address" | jq -r '.[]')
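If jq is not installed, the same lookup can be sketched with gcloud's built-in value() output format. The wrapper below is illustrative only: the function name get_static_ip is hypothetical, and the sketch assumes gcloud is installed and authenticated against the project that owns the address.

```shell
# Print the reserved address of a named regional static IP.
# Fails if gcloud fails or if no address is returned.
get_static_ip() {
  ip_name="$1"
  ip_region="$2"
  ip_addr="$(gcloud compute addresses describe "${ip_name}" \
    --region="${ip_region}" --format="value(address)")" || return 1
  if [ -z "${ip_addr}" ]; then
    echo "no address found for ${ip_name} in ${ip_region}" >&2
    return 1
  fi
  printf '%s\n' "${ip_addr}"
}

# Usage (commented out so the sketch has no side effects when sourced):
# STATIC_IP_ADDR="$(get_static_ip "${STATIC_IP_NAME}" "${GCP_REGION}")"
```

Failing on an empty result avoids carrying a blank STATIC_IP_ADDR into later steps.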

3. Create a GKE Cluster

  1. Create a GKE cluster with the following command:

    gcloud container clusters create ${CLUSTER_NAME} \
      --region=${GCP_REGION} \
      --machine-type=${CLUSTER_MACHINE_TYPE} \
      --workload-pool=${PROJECT_ID}.svc.id.goog \
      --enable-ip-alias \
      --create-subnetwork="" \
      --logging=${LOGGING} \
      --enable-dataplane-v2 \
      --enable-shielded-nodes \
      --release-channel="regular" \
      --workload-metadata="GKE_METADATA" \
      --enable-autorepair \
      --enable-autoupgrade \
      --disk-type="pd-ssd" \
      --image-type="COS_CONTAINERD"
  2. Grant your user account the privileges needed for the helm install to work properly:

    # By default, GKE clusters have RBAC enabled. To allow the 'helm install' to give the 'pachyderm' service account
    # the requisite privileges via clusterrolebindings, you will need to grant *your user account* the privileges
    # needed to create those clusterrolebindings.
    #
    # Note that this command is simple and concise, but gives your user account more privileges than necessary. See
    # https://docs.pachyderm.io/en/latest/deploy-manage/deploy/rbac/ for the complete list of privileges that the
    # Pachyderm service account needs.
    kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=$(gcloud config get-value account)
  3. Connect to the cluster:

    gcloud container clusters get-credentials ${CLUSTER_NAME} --region=${GCP_REGION}
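The cluster commands above depend on several variables, and an unset one typically produces a confusing gcloud error rather than a clear message. A small preflight check, sketched below, can catch this before anything is created. The function name require_vars is hypothetical and not part of any tool used in this guide.

```shell
# Return non-zero and list each variable name that is unset or empty.
require_vars() {
  rv_missing=0
  for rv_name in "$@"; do
    # Indirect lookup: read the value of the variable named by rv_name.
    eval "rv_value=\${${rv_name}:-}"
    if [ -z "${rv_value}" ]; then
      echo "required variable is unset or empty: ${rv_name}" >&2
      rv_missing=1
    fi
  done
  return "${rv_missing}"
}

# Usage (commented out so the sketch has no side effects when sourced):
# require_vars CLUSTER_NAME GCP_REGION CLUSTER_MACHINE_TYPE PROJECT_ID LOGGING || exit 1
```

Only pass variable names you control to this function, because it uses eval for the indirect lookup.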

4. Create Storage Buckets

gsutil mb -l ${GCP_REGION} gs://${BUCKET_NAME}
gsutil mb -l ${GCP_REGION} gs://${LOKI_BUCKET_NAME}

5. Create a Cloud SQL Instance

  1. Create a Cloud SQL instance with the following command:
    gcloud sql instances create ${CLOUDSQL_INSTANCE_NAME} \
      --database-version=POSTGRES_14 \
      --cpu=${SQL_CPU} \
      --memory=${SQL_MEM} \
      --zone=${GCP_ZONE} \
      --availability-type=ZONAL \
      --storage-size=50GB \
      --storage-type=SSD \
      --storage-auto-increase \
      --root-password=${SQL_ADMIN_PASSWORD}
  2. Create databases for HPE Machine Learning Data Management and Dex:
    gcloud sql databases create pachyderm -i ${CLOUDSQL_INSTANCE_NAME}
    gcloud sql databases create dex -i ${CLOUDSQL_INSTANCE_NAME}
  3. Get the Cloud SQL connection name:
    CLOUDSQL_CONNECTION_NAME=$(gcloud sql instances describe ${CLOUDSQL_INSTANCE_NAME} --format=json | jq -r '.connectionName')
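A Cloud SQL connection name normally has the form project:region:instance. The sketch below is a quick sanity check on the captured value. It rejects empty strings and values that still carry JSON quotes, which can happen when jq prints a string without raw output. The function name is_connection_name is hypothetical, and the check assumes a project ID that does not itself contain a colon.

```shell
# Return 0 if the argument has three non-empty, colon-separated fields
# and contains no double quotes.
is_connection_name() {
  case "$1" in
    *\"*) return 1 ;;   # stray JSON quotes around the value
  esac
  printf '%s' "$1" | grep -Eq '^[^:]+:[^:]+:[^:]+$'
}

# Usage (commented out so the sketch has no side effects when sourced):
# is_connection_name "${CLOUDSQL_CONNECTION_NAME}" || echo "unexpected connection name" >&2
```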

6. Create Service Accounts

Create one service account for HPE Machine Learning Data Management and one for Loki, then grant each the roles it needs.

gcloud iam service-accounts create ${GSA_NAME}

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${SERVICE_ACCOUNT}" \
    --role="${ROLE1}"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${SERVICE_ACCOUNT}" \
    --role="${ROLE2}"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${SERVICE_ACCOUNT}" \
    --role="${ROLE3}"

gcloud iam service-accounts add-iam-policy-binding ${SERVICE_ACCOUNT} \
    --role roles/iam.workloadIdentityUser \
    --member "${PACH_WI}"

gcloud iam service-accounts add-iam-policy-binding ${SERVICE_ACCOUNT} \
    --role roles/iam.workloadIdentityUser \
    --member "${SIDECAR_WI}"

gcloud iam service-accounts add-iam-policy-binding ${SERVICE_ACCOUNT} \
    --role roles/iam.workloadIdentityUser \
    --member "${CLOUDSQLAUTHPROXY_WI}"

gcloud iam service-accounts create ${LOKI_GSA_NAME}

gcloud iam service-accounts keys create "${LOKI_GSA_NAME}-key.json" --iam-account="${LOKI_SERVICE_ACCOUNT}"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${LOKI_SERVICE_ACCOUNT}" \
    --role="${ROLE2}"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${LOKI_SERVICE_ACCOUNT}" \
    --role="${ROLE3}"
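The project-level bindings above repeat the same command with a different role each time. As a sketch, they could be expressed as a loop. The function name bind_project_roles is hypothetical, and the sketch assumes PROJECT_ID is set and that gcloud is installed and authenticated.

```shell
# Grant each listed role on the project to one service account.
# First argument: service account email. Remaining arguments: role names.
bind_project_roles() {
  bpr_member="$1"
  shift
  for bpr_role in "$@"; do
    gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
      --member="serviceAccount:${bpr_member}" \
      --role="${bpr_role}" || return 1
  done
}

# Usage (commented out so the sketch has no side effects when sourced):
# bind_project_roles "${SERVICE_ACCOUNT}" "${ROLE1}" "${ROLE2}" "${ROLE3}"
# bind_project_roles "${LOKI_SERVICE_ACCOUNT}" "${ROLE2}" "${ROLE3}"
```

The loop stops at the first failed binding, so a typo in a role name does not go unnoticed among later successful calls.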

7. Create a Loki Secret

kubectl create secret generic loki-service-account --from-file="${LOKI_GSA_NAME}-key.json"

8. Build a Helm Values File

Warning

Setting up Authentication?

Do not use mockIDP for clusters that will be deployed into production. If you do upgrade a cluster with mockIDP enabled, you must revoke the default mockIDP admin user by running the following command:

pachctl auth revoke --user kilgore@kilgore.trout
  1. Create a values.yaml file, inserting the variables we’ve created in the previous steps:

    Values.yaml

    Info
    If your PostgreSQL deployment requires SSL, you may need to update the parameters in the global section of your Helm Chart Values (HCVs).

  2. Install using the following command:

    helm repo add pachyderm https://helm.pachyderm.com
    helm repo update
    helm install pachyderm -f ./${NAME}.values.yaml pachyderm/pachyderm

9. Connect to Cluster

You’ll need your organization’s cluster URL (the proxy.host value) to connect.

  1. Run the following command to get your cluster URL:

    kubectl get services | grep pachyderm-proxy | awk '{print $4}'
  2. Connect to your cluster:

  3. Optionally, run port-forward to access Console in your browser at http://localhost:4000/.
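The grep and awk pipeline in step 1 depends on the column layout of kubectl get services. As an alternative sketch, the external IP can be read directly with a JSONPath query. The function name get_proxy_address is hypothetical, and the sketch assumes the service is named exactly pachyderm-proxy, is of type LoadBalancer, and exposes an IP address rather than a hostname.

```shell
# Print the external IP of the pachyderm-proxy service.
# Optional first argument: namespace (defaults to "default").
get_proxy_address() {
  gpa_ip="$(kubectl get service pachyderm-proxy \
    --namespace "${1:-default}" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')" || return 1
  if [ -z "${gpa_ip}" ]; then
    echo "pachyderm-proxy has no external IP yet" >&2
    return 1
  fi
  printf '%s\n' "${gpa_ip}"
}

# Usage (commented out so the sketch has no side effects when sourced):
# CLUSTER_URL="$(get_proxy_address)"
```

An empty result usually means the load balancer is still being provisioned, so retrying after a short wait is reasonable.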