GCP Deployment

Before You Start

This guide assumes that your environment meets the following version requirements:

Kubernetes & Openshift Version Support
  • Kubernetes: HPE Machine Learning Data Management supports the three most recent minor release versions of Kubernetes. Older minor versions are End of Life (EOL) and unsupported. This policy ensures that HPE Machine Learning Data Management users have access to the latest Kubernetes features and bug fixes.
  • Openshift: HPE Machine Learning Data Management is compatible with OpenShift versions within the “Full Support” window.

Configure Variables

  1. Configure the following variables.

    Optional Variables for Scripting

  2. Save to an .env file.
  3. Source them by running source .env in your terminal before starting the installation guide.
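
The exact variable list depends on your environment. As a sketch, a minimal .env might look like the following; every value here is a placeholder, not a default, and additional variables (roles, Workload Identity members, SQL sizing) are introduced in later steps:

```shell
# .env -- example values only; substitute your own.
PROJECT_ID="pachyderm-quickstart-project"   # 6-30 chars, starts with a lowercase letter
PROJECT_NAME="Pachyderm Quickstart"
GCP_REGION="us-central1"
GCP_ZONE="us-central1-a"
CLUSTER_NAME="pachyderm-cluster"
CLUSTER_MACHINE_TYPE="n2-standard-4"
STATIC_IP_NAME="pachyderm-ip"
BUCKET_NAME="${PROJECT_ID}-pachyderm"
LOKI_BUCKET_NAME="${PROJECT_ID}-loki"
CLOUDSQL_INSTANCE_NAME="pachyderm-sql"
GSA_NAME="pachyderm-sa"
LOKI_GSA_NAME="loki-sa"
```

Bucket names are derived from the project ID here only for uniqueness; any globally unique names work.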

The following steps use a template to create a GKE cluster, a Cloud SQL instance, and a static IP address. The template also creates a service account for HPE Machine Learning Data Management and Loki, and grants the service account the necessary permissions to access the Cloud SQL instance and storage buckets. You do not have to use this template, but it’s a good outline for understanding how to create your own setup.

1. Create a New Project

  1. Create a new project (e.g., pachyderm-quickstart-project). You can pre-define the project ID, which must be 6-30 characters long and start with a lowercase letter. This ID is used to set up the cluster and is referenced throughout this guide.

    gcloud projects create ${PROJECT_ID} --name=${PROJECT_NAME} --set-as-default
    gcloud alpha billing projects link ${PROJECT_ID} --billing-account=${BILLING_ACCOUNT_ID}
  2. Enable the following APIs:

    gcloud services enable container.googleapis.com
    gcloud services enable sqladmin.googleapis.com
    gcloud services enable compute.googleapis.com
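
The project-ID constraint from step 1 (6-30 characters, starting with a lowercase letter) can be pre-checked in the shell before calling gcloud. This validation helper is illustrative only, not part of the official tooling:

```shell
# Validate a candidate project ID against the naming rules stated above:
# 6-30 characters total; lowercase letters, digits, and hyphens; must start with a letter.
valid_project_id() {
  echo "$1" | grep -Eq '^[a-z][a-z0-9-]{5,29}$'
}

valid_project_id "pachyderm-quickstart-project" && echo "ok"        # → ok
valid_project_id "9starts-with-digit" || echo "rejected"            # → rejected
```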

2. Create a Static IP Address

  1. Create the static IP Address:
    gcloud compute addresses create ${STATIC_IP_NAME} --region=${GCP_REGION} 
  2. Get the static IP address:
    STATIC_IP_ADDR=$(gcloud compute addresses describe ${STATIC_IP_NAME} --region=${GCP_REGION} --format="json" --flatten="address" | jq -r '.[]')
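
The jq pipeline above pulls the flattened address field out of gcloud's JSON output. Its behavior can be sketched against a canned payload; the IP below is a documentation address (203.0.113.0/24), not a real reservation, and the payload shape is an assumption about what the describe command emits:

```shell
# Simulate the JSON array emitted by
# `gcloud compute addresses describe --format="json" --flatten="address"`.
PAYLOAD='["203.0.113.10"]'

# jq -r '.[]' unwraps the array and prints the raw (unquoted) value.
STATIC_IP_ADDR=$(printf '%s' "$PAYLOAD" | jq -r '.[]')
echo "${STATIC_IP_ADDR}"   # → 203.0.113.10
```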

3. Create a GKE Cluster

  1. Create a GKE cluster with the following command:

    gcloud container clusters create ${CLUSTER_NAME} \
      --region=${GCP_REGION} \
      --machine-type=${CLUSTER_MACHINE_TYPE} \
      --workload-pool=${PROJECT_ID}.svc.id.goog \
      --enable-ip-alias \
      --create-subnetwork="" \
      --logging=${LOGGING} \
      --enable-dataplane-v2 \
      --enable-shielded-nodes \
      --release-channel="regular" \
      --workload-metadata="GKE_METADATA" \
      --enable-autorepair \
      --enable-autoupgrade \
      --disk-type="pd-ssd" \
      --image-type="COS_CONTAINERD"
  2. Grant your user account the privileges needed for the helm install to work properly:

    # By default, GKE clusters have RBAC enabled. To allow the 'helm install' to give the 'pachyderm' service account
    # the requisite privileges via clusterrolebindings, you will need to grant *your user account* the privileges
    # needed to create those clusterrolebindings.
    #
    # Note that this command is simple and concise, but gives your user account more privileges than necessary. See
    # https://docs.pachyderm.io/en/latest/deploy-manage/deploy/rbac/ for the complete list of privileges that the
    # Pachyderm service account needs.
    kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=$(gcloud config get-value account)
  3. Connect to the cluster:

    gcloud container clusters get-credentials ${CLUSTER_NAME} --region=${GCP_REGION}

4. Create Storage Buckets

gsutil mb -l ${GCP_REGION} gs://${BUCKET_NAME}
gsutil mb -l ${GCP_REGION} gs://${LOKI_BUCKET_NAME}

5. Create a Cloud SQL Instance

  1. Create a Cloud SQL instance with the following command:
    gcloud sql instances create ${CLOUDSQL_INSTANCE_NAME} \
      --database-version=POSTGRES_14 \
      --cpu=${SQL_CPU} \
      --memory=${SQL_MEM} \
      --zone=${GCP_ZONE} \
      --availability-type=ZONAL \
      --storage-size=50GB \
      --storage-type=SSD \
      --storage-auto-increase \
      --root-password=${SQL_ADMIN_PASSWORD}
  2. Create databases for HPE Machine Learning Data Management and Dex:
    gcloud sql databases create pachyderm -i ${CLOUDSQL_INSTANCE_NAME}
    gcloud sql databases create dex -i ${CLOUDSQL_INSTANCE_NAME}
  3. Get the Cloud SQL connection name:
    CLOUDSQL_CONNECTION_NAME=$(gcloud sql instances describe ${CLOUDSQL_INSTANCE_NAME} --format=json | jq -r .connectionName)
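
Cloud SQL connection names always follow the fixed form <project>:<region>:<instance>, which makes a quick sanity check possible before wiring the value into later steps. A pure-shell sketch with placeholder values:

```shell
# Placeholder values for illustration only:
PROJECT_ID="pachyderm-quickstart-project"
GCP_REGION="us-central1"
CLOUDSQL_INSTANCE_NAME="pachyderm-sql"

# A Cloud SQL connection name is <project>:<region>:<instance>.
EXPECTED_CONNECTION_NAME="${PROJECT_ID}:${GCP_REGION}:${CLOUDSQL_INSTANCE_NAME}"
echo "${EXPECTED_CONNECTION_NAME}"   # → pachyderm-quickstart-project:us-central1:pachyderm-sql
```

Comparing this against the describe output is a cheap way to catch a region or project mix-up early.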

6. Create Service Accounts

Create service accounts for HPE Machine Learning Data Management and Loki, and grant them the required roles.

gcloud iam service-accounts create ${GSA_NAME}
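
The ${SERVICE_ACCOUNT} variable used in the policy bindings is the service account's full email address, which GCP derives from the account name and project. A sketch with placeholder values:

```shell
# A GCP service-account email has the form <name>@<project>.iam.gserviceaccount.com.
GSA_NAME="pachyderm-sa"                     # placeholder
PROJECT_ID="pachyderm-quickstart-project"   # placeholder

SERVICE_ACCOUNT="${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
echo "${SERVICE_ACCOUNT}"   # → pachyderm-sa@pachyderm-quickstart-project.iam.gserviceaccount.com
```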

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${SERVICE_ACCOUNT}" \
    --role="${ROLE1}"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${SERVICE_ACCOUNT}" \
    --role="${ROLE2}"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member=serviceAccount:${SERVICE_ACCOUNT} \
    --role="${ROLE3}"

gcloud iam service-accounts add-iam-policy-binding ${SERVICE_ACCOUNT} \
    --role roles/iam.workloadIdentityUser \
    --member "${PACH_WI}"

gcloud iam service-accounts add-iam-policy-binding ${SERVICE_ACCOUNT} \
    --role roles/iam.workloadIdentityUser \
    --member "${SIDECAR_WI}"

gcloud iam service-accounts add-iam-policy-binding ${SERVICE_ACCOUNT} \
    --role roles/iam.workloadIdentityUser \
    --member "${CLOUDSQLAUTHPROXY_WI}"
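
The ${PACH_WI}, ${SIDECAR_WI}, and ${CLOUDSQLAUTHPROXY_WI} members follow GKE's Workload Identity member syntax, serviceAccount:<project>.svc.id.goog[<namespace>/<ksa-name>]. A sketch assuming the default namespace and illustrative Kubernetes service-account names (verify both against your actual deployment):

```shell
PROJECT_ID="pachyderm-quickstart-project"   # placeholder
K8S_NAMESPACE="default"                     # placeholder; use your install namespace

# Workload Identity member syntax: serviceAccount:<pool>[<namespace>/<ksa-name>].
# The KSA names below are assumptions for illustration.
PACH_WI="serviceAccount:${PROJECT_ID}.svc.id.goog[${K8S_NAMESPACE}/pachyderm]"
SIDECAR_WI="serviceAccount:${PROJECT_ID}.svc.id.goog[${K8S_NAMESPACE}/pachyderm-worker]"
CLOUDSQLAUTHPROXY_WI="serviceAccount:${PROJECT_ID}.svc.id.goog[${K8S_NAMESPACE}/k8s-cloudsql-auth-proxy]"

echo "${PACH_WI}"
```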

gcloud iam service-accounts create ${LOKI_GSA_NAME}

gcloud iam service-accounts keys create "${LOKI_GSA_NAME}-key.json" --iam-account="${LOKI_SERVICE_ACCOUNT}"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${LOKI_SERVICE_ACCOUNT}" \
    --role="${ROLE2}"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${LOKI_SERVICE_ACCOUNT}" \
    --role="${ROLE3}"

7. Create a Loki Secret

kubectl create secret generic loki-service-account --from-file="${LOKI_GSA_NAME}-key.json"

8. Build a Helm Values File

Warning

Setting up Authentication?

Do not use mockIDP for clusters that will be deployed into production. If you upgrade a cluster that has mockIDP enabled, you must revoke the default mockIDP admin user by running the following command:

pachctl auth revoke --user kilgore@kilgore.trout

  1. Create a values.yaml file, inserting the variables we’ve created in the previous steps:

    Values.yaml

    Info
    If your Postgres deployment requires SSL, you may need to update the parameters in the global section of your Helm Chart Values (HCVs).

  2. Install using the following command:

    helm repo add pachyderm https://helm.pachyderm.com
    helm repo update
    helm install pachyderm -f ./${NAME}.values.yaml pachyderm/pachyderm

9. Connect to Cluster

You’ll need your organization’s cluster URL (proxy.host) value to connect.

  1. Run the following command to get your cluster URL:

    kubectl get services | grep pachyderm-proxy | awk '{print $4}'
  2. Connect to your cluster:

    # Without TLS:
    pachctl connect grpc://<your-proxy.host-value>:80
    # With TLS enabled:
    pachctl connect grpcs://<your-proxy.host-value>:443
  3. Optionally, run pachctl port-forward to connect to Console in your browser at http://localhost:4000/.