Blob/Object Storage
You can enable blob/object storage for HPE Machine Learning Data Management by updating the pachd.storage section in your Helm chart. The necessary configuration options depend on your chosen blob storage provider.
Before You Start
- Ensure you have a blob storage provider account (e.g., AWS S3, Google Cloud Storage, Azure Blob Storage).
- Ensure you have the necessary credentials and permissions to create and access a bucket or container in your blob storage provider.
- For production use, make sure your cloud provider credentials are properly configured in your environment.
Query Parameters for Storage URLs
The available query parameters for each storage provider are not exhaustively documented in a single location. They are implemented in the Go CDK source code and can change with new releases. However, you can find the most commonly used parameters in the following locations:
- AWS S3: s3blob.URLOpener and aws.ConfigFromURLParams
- Azure Blob: azureblob.URLOpener
- Google Cloud Storage: gcsblob.URLOpener
Common Query Parameters
Provider | Parameter | Description |
---|---|---|
AWS S3 | region | The AWS region (e.g., “us-west-1”) |
AWS S3 | endpoint | Custom endpoint for S3-compatible storage |
AWS S3 | disableSSL | Set to “true” to disable SSL |
AWS S3 | s3ForcePathStyle | Set to “true” to force path-style addressing |
AWS S3 | awssdk | Set to “v2” to use the AWS SDK v2 |
Azure Blob | domain | Custom domain for the storage account |
Azure Blob | protocol | Set to “http” for local development with Azurite |
Azure Blob | cdn | Can be set to “true” when using a CDN URL pointing to a blob storage account |
Azure Blob | localemu | Set to “true” to use the Azurite emulator |
Google Cloud Storage | access_id | HMAC Access ID for non-OAuth authentication |
Google Cloud Storage | private_key_path | Path to service account JSON key file |
If you need to use a parameter not listed here, consult the Go CDK source code or reach out to Pachyderm support for guidance.
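As an illustration of how these parameters combine into a storage URL, the following minimal Python sketch joins a scheme, a bucket or container name, and a set of query parameters. The helper name build_storage_url is hypothetical and not part of any Pachyderm or Go CDK tooling; the bucket and container names are placeholders.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode


def build_storage_url(scheme: str, bucket: str,
                      params: Optional[Mapping[str, str]] = None) -> str:
    """Join a scheme, a bucket/container name, and query parameters.

    Hypothetical helper for illustration only; it does not check that
    the parameter names are ones the storage provider accepts.
    """
    url = f"{scheme}://{bucket}"
    if params:
        # urlencode percent-escapes values and joins pairs with "&".
        # ":" is kept literal so host:port values stay readable.
        url += "?" + urlencode(params, safe=":")
    return url


print(build_storage_url("s3", "my-bucket",
                        {"region": "us-west-1", "awssdk": "v2"}))
print(build_storage_url("azblob", "my-container",
                        {"protocol": "http", "domain": "localhost:10001"}))
```

Building the URL programmatically avoids hand-escaping mistakes when a value, such as a custom endpoint, contains characters that are reserved in query strings.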
How to Set Up Blob Storage
- Navigate to your values.yaml file or obtain your current Helm values.yaml overrides:

      helm get values pachyderm > values.yaml

- Add the following pachd.storage fields to your values.yaml file:

      pachd:
        storage:
          gocdkEnabled: true
          storageURL: # The URL for your blob storage provider

- Update the storageURL to include provider-specific configuration options as needed; for options, see the related Go CDK packages. For example:

      "s3://my-bucket?region=us-west-1&awssdk=v2"
      "gs://${BUCKET_NAME}"
      "azblob://my-container?protocol=http&domain=localhost:10001"

- Save your changes and upgrade your cluster:

      helm upgrade pachyderm pachyderm/pachyderm -f values.yaml
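
Before running the upgrade, it can help to sanity-check the storageURL for typos. The sketch below is one possible approach, not something Pachyderm or the Go CDK performs: it parses the URL and flags any scheme or parameter name that does not appear in the Common Query Parameters table. The function name check_storage_url and the KNOWN_PARAMS mapping are hypothetical, and a flagged parameter is not necessarily invalid, because the table is not exhaustive.

```python
from urllib.parse import parse_qsl, urlsplit

# Parameter names copied from the Common Query Parameters table.
# This list is an assumption for illustration and is not exhaustive.
KNOWN_PARAMS = {
    "s3": {"region", "endpoint", "disableSSL", "s3ForcePathStyle", "awssdk"},
    "azblob": {"domain", "protocol", "cdn", "localemu"},
    "gs": {"access_id", "private_key_path"},
}


def check_storage_url(url: str) -> list:
    """Return warnings for a storage URL; an empty list means nothing was flagged."""
    parts = urlsplit(url)
    warnings = []
    if parts.scheme not in KNOWN_PARAMS:
        warnings.append(f"unrecognized scheme: {parts.scheme!r}")
        return warnings
    if not parts.netloc:
        warnings.append("missing bucket or container name")
    for name, _value in parse_qsl(parts.query, keep_blank_values=True):
        if name not in KNOWN_PARAMS[parts.scheme]:
            warnings.append(f"parameter not in the table above: {name!r}")
    return warnings


for problem in check_storage_url("s3://my-bucket?regoin=us-west-1"):
    print(problem)
```

A check like this only catches misspelled names locally; whether the provider accepts a given value is still determined when the cluster connects to the storage backend.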
Limitations
Some configuration settings, such as verifySSL, may not be configurable through storageURL query parameters. In such cases, set these options in the pachd.storage section of your values.yaml file instead.