Backups

This page walks you through the main steps required to manually back up the state of an HPE Machine Learning Data Management cluster in production. How you perform those steps may vary depending on your infrastructure and setup; refer to your provider’s documentation where applicable.

Before You Start

  • Make sure to retain a copy of the Helm values used to deploy your cluster
  • Suspend any state-mutating operations
  • Make sure that you have a bucket for backup use, separate from the object store used by your cluster

Downtime Considerations

  • Backups incur downtime until operations are resumed
  • Operational best practices include notifying HPE Machine Learning Data Management users of the outage and providing an estimated time when downtime will cease
  • Downtime duration is dependent on the size of the data to be backed up and the networks involved
  • Testing backups before going into production, and monitoring backup times on an ongoing basis, helps you make accurate predictions
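One lightweight way to monitor backup times is to wrap the backup command and append its duration to a log. A minimal sketch, assuming your backup runs as a single command (`run_backup` below is a hypothetical placeholder):

```shell
# Append each backup's duration and exit status to a log so trends
# are visible over time. Wraps any command passed as arguments.
log_duration() {
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  printf '%s %ss exit=%s cmd=%s\n' \
    "$(date -u +%FT%TZ)" "$((end - start))" "$status" "$*" >> backup-times.log
  return "$status"
}

# Example: log_duration run_backup
```

Reviewing the resulting log over several runs gives you a defensible downtime estimate to share with users.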

How to Create a Backup

HPE Machine Learning Data Management state is stored in two main places:

  • An object store holding HPE Machine Learning Data Management’s data.
  • A PostgreSQL instance made up of one or two databases:
    • pachyderm holding HPE Machine Learning Data Management’s metadata
    • dex holding authentication data

Backing up an HPE Machine Learning Data Management cluster involves snapshotting both the object store and the PostgreSQL database(s), in a consistent state, at a given point in time. Restoring a cluster involves re-populating the database(s) and the object store using those backups, then recreating an HPE Machine Learning Data Management cluster.

  1. Review any cloud-specific backup and restore procedures for your PostgreSQL instance.
  2. Retain a copy of the Helm values file used to deploy your cluster.
     helm get values <release-name> > /path/to/values.yaml
  3. Pause or queue/divert any external automated process ingressing data to Pachyderm input repos.
  4. Suspend all mutation of state by scaling pachd and the worker pods down.
    1. Ensure you are using the right context.
      kubectl config get-contexts
      kubectl config use-context <context-name>
    2. Scale down the pachd deployment and the worker pods.
      kubectl scale deployment pachd --replicas 0 
      kubectl scale rc --replicas 0 -l suite=pachyderm,component=worker
    3. Monitor the state of pachd and the worker pods.
      watch -n 5 kubectl get pods
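Rather than watching manually, the wait can be scripted. A minimal sketch, assuming the pod names contain `pachd` or `worker` as implied by the selectors above:

```shell
# Count pachd/worker lines in `kubectl get pods` output read from stdin.
count_pachyderm_pods() {
  grep -c -E 'pachd|worker' || true   # grep -c prints 0 (and exits 1) on no match
}

# Example polling loop (uncomment to run against a live cluster):
# while [ "$(kubectl get pods --no-headers | count_pachyderm_pods)" -gt 0 ]; do
#   sleep 5
# done
```

Only proceed to the database dump once the count reaches zero, so no writes are in flight.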
  5. Dump your PostgreSQL state using pg_dumpall (dumps the whole instance; suitable when it is used solely by Pachyderm) or pg_dump (dumps one database at a time; suitable when the instance is shared with other applications).
    pg_dumpall -U postgres > /path/to/backup.sql
    pg_dump -U postgres -d pachyderm > /path/to/backup.sql
    pg_dump -U postgres -d dex > /path/to/backup.sql
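To keep successive dumps from overwriting each other, a date-stamped filename helps. A small sketch; the backup directory below is a placeholder:

```shell
# Build a date-stamped dump path under a backup directory ($1).
backup_filename() {
  printf '%s/pachyderm-%s.sql' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# Example, against a live instance (path is hypothetical):
# pg_dumpall -U postgres > "$(backup_filename /path/to/backups)"
```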
  6. Back up your object store. Refer to your cloud provider’s documentation for details.
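For S3-compatible stores, copying into the separate backup bucket might look like the sketch below. Both bucket names are assumptions, and the command is printed as a dry run rather than executed:

```shell
# Copy the cluster's object store into the dedicated backup bucket.
# SRC_BUCKET/DST_BUCKET are hypothetical names; override via the environment.
SRC_BUCKET=${SRC_BUCKET:-s3://pachyderm-data}
DST_BUCKET=${DST_BUCKET:-s3://pachyderm-backup}
sync_cmd="aws s3 sync $SRC_BUCKET $DST_BUCKET"
echo "Would run: $sync_cmd"   # drop the echo to perform the copy
```

Remember that the backup bucket should be separate from the object store your cluster uses, as noted in Before You Start.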
On-Premises
For on-premises Kubernetes deployments, check the vendor documentation for backup and restore procedures regarding both your PostgresSQL instance and object store.

How to Resume Operations

Once your backup is completed, resume your normal operations by scaling pachd back up. It will take care of restoring the worker pods:

  • Enterprise: pachctl enterprise unpause
  • CE: kubectl scale deployment pachd --replicas 1
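A small dry-run helper can print the right resume command for your edition before you run it against the cluster; the helper name is an illustration, not part of the product:

```shell
# Print the resume command for a given edition (enterprise|ce).
resume_cmd() {
  case "$1" in
    enterprise) echo "pachctl enterprise unpause" ;;
    ce)         echo "kubectl scale deployment pachd --replicas 1" ;;
    *)          echo "unknown edition: $1" >&2; return 1 ;;
  esac
}

# Example (CE): run the command, then wait for pachd to become ready:
# eval "$(resume_cmd ce)" && kubectl rollout status deployment pachd
```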