Disaster Recovery
This page outlines the disaster scenarios that can affect your HPE Machine Learning Data Management instance and provides mitigation strategies to help you recover from them. Note that the strategies provided are generalized and may need to be adapted to your specific setup/environment.
Disaster Scenarios
Postgres Database Loss
A lost or corrupted Postgres database results in the loss of all of your data, because the keys needed to decrypt your data chunks are stored in the Postgres database. If the Postgres database is lost, the data cannot be recovered, even if the object store is still intact.
Mitigation Strategy
- Back up your Postgres database before performing minor release upgrades.
- Periodically back up your Postgres database to ensure that you have a checkpoint to restore from in the event of a failure.
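A periodic backup could be scripted around the standard `pg_dump` tool. The sketch below is a minimal illustration, not the product's own backup procedure: the host, port, user, database name, and backup directory are hypothetical placeholders, and it assumes `pg_dump` is on the `PATH` with credentials supplied through `PGPASSWORD` or a `.pgpass` file.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_pg_dump_command(host, port, user, dbname, backup_dir, now=None):
    """Return (command, output_path) for a timestamped custom-format dump."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    output_path = Path(backup_dir) / f"{dbname}-{stamp}.dump"
    command = [
        "pg_dump",
        f"--host={host}",
        f"--port={port}",
        f"--username={user}",
        f"--dbname={dbname}",
        "--format=custom",  # archive format that pg_restore can read
        f"--file={output_path}",
    ]
    return command, output_path


def run_backup(command):
    """Run pg_dump; raises CalledProcessError if the dump fails."""
    subprocess.run(command, check=True)


# Example (all values are placeholders; adjust to your deployment):
# command, path = build_pg_dump_command(
#     "localhost", 5432, "postgres", "exampledb", "/backups")
# run_backup(command)
```

Timestamped file names keep each checkpoint separate, so an older dump remains available if the most recent one turns out to be unusable.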
Object Store Loss
The object store is typically very large, which makes it difficult to back up. However, if you lose the object store, you lose all of your data.
Mitigation Strategy
Cloud Object Store
Object storage is maintained by your chosen cloud provider, so you should be able to rely on their durability guarantees. It may be worth considering a multi-region object store for additional redundancy. You can also look into cold storage options for long-term retention of routine snapshots.
To back up the object store, you can either download all objects or use the object store provider’s backup method. See your cloud provider’s documentation for more information.
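As a rough sketch of the "download all objects" option, the function below copies every object in a bucket to a local directory. It assumes an S3-compatible store accessed through a `boto3` S3 client. The bucket name and destination path in the example are hypothetical. For a very large store, a provider-native backup or replication feature is likely to be more practical than a full download.

```python
from pathlib import Path


def download_all_objects(client, bucket, dest_dir):
    """Download every object in `bucket` under `dest_dir`; return the keys copied."""
    dest_root = Path(dest_dir)
    downloaded = []
    # list_objects_v2 returns results in pages, so iterate with a paginator.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip "folder" placeholder keys
            target = dest_root / key
            target.parent.mkdir(parents=True, exist_ok=True)
            client.download_file(bucket, key, str(target))
            downloaded.append(key)
    return downloaded


# Example (assumes boto3 is installed and credentials are configured):
# import boto3
# client = boto3.client("s3")
# download_all_objects(client, "example-bucket", "/backups/objects")
```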
On-Premises Object Store
If you are using an on-premises object store, you should have a backup strategy in place. This could involve replicating the object store to another location or taking regular snapshots of the object store. See your vendor’s documentation for more information on how to back up your object store.
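If you replicate the object store to another location, it can help to check periodically that the replica is complete. The sketch below assumes you can export a manifest from each store as a mapping of object key to size in bytes. How that manifest is produced depends on your vendor, and the function name is illustrative only. Comparing sizes catches missing or truncated objects but does not detect corruption that leaves the size unchanged.

```python
def find_replication_gaps(primary, replica):
    """Compare {key: size} manifests and report keys the replica lacks or holds at a different size."""
    missing = sorted(key for key in primary if key not in replica)
    size_mismatch = sorted(
        key for key in primary
        if key in replica and primary[key] != replica[key]
    )
    return {"missing": missing, "size_mismatch": size_mismatch}


# Example with made-up manifests:
primary_manifest = {"chunks/a": 1024, "chunks/b": 2048, "chunks/c": 512}
replica_manifest = {"chunks/a": 1024, "chunks/b": 1000}
gaps = find_replication_gaps(primary_manifest, replica_manifest)
```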
Restoration
See the Restoration guide for a high-level overview of the cluster restoration process.