Blob/Object Storage

You can enable blob/object storage for HPE Machine Learning Data Management by updating the pachd.storage section in your Helm chart. The necessary configuration options depend on your chosen blob storage provider.

Before You Start

  • Ensure you have a blob storage provider account (e.g., AWS S3, Google Cloud Storage, Azure Blob Storage).
  • Ensure you have the necessary credentials and permissions to create and access a bucket or container in your blob storage provider.
  • For production use, make sure your cloud provider credentials are properly configured in your environment.

Query Parameters for Storage URLs

The query parameters available for each storage provider are not exhaustively documented in a single location. They are implemented in the Go CDK source code and can change with new releases. The most commonly used parameters are listed in the table below.

Common Query Parameters

Provider              Parameter         Description
AWS S3                region            The AWS region (e.g., “us-west-1”)
AWS S3                endpoint          Custom endpoint for S3-compatible storage
AWS S3                disableSSL        Set to “true” to disable SSL
AWS S3                s3ForcePathStyle  Set to “true” to force path-style addressing
AWS S3                awssdk            Set to “v2” to use the AWS SDK v2
Azure Blob            domain            Custom domain for the storage account
Azure Blob            protocol          Set to “http” for local development with Azurite
Azure Blob            cdn               Can be set to “true” when using a CDN URL pointing to a blob storage account
Azure Blob            localemu          Set to “true” to use the Azurite emulator
Google Cloud Storage  access_id         HMAC Access ID for non-OAuth authentication
Google Cloud Storage  private_key_path  Path to service account JSON key file

If you need to use a parameter not listed here, consult the Go CDK source code or reach out to Pachyderm support for guidance.
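
To illustrate how the parameters in the table combine into a single storage URL, here is a minimal Python sketch. The helper name build_storage_url is hypothetical and is not part of HPE Machine Learning Data Management or the Go CDK; it only shows the general shape scheme://bucket?key=value&key=value, using parameter names taken from the table above.

```python
from urllib.parse import urlencode


def build_storage_url(scheme, bucket, params=None):
    """Assemble a storage URL such as s3://my-bucket?region=us-west-1.

    Hypothetical helper for illustration only; not part of any official tooling.
    """
    url = f"{scheme}://{bucket}"
    if params:
        # urlencode joins key=value pairs with "&"; safe=":" keeps the colon
        # in host:port values (for example "localhost:10001") unescaped.
        url += "?" + urlencode(params, safe=":")
    return url


# Parameter names and values below are examples based on the table above.
s3_url = build_storage_url("s3", "my-bucket", {"region": "us-west-1", "awssdk": "v2"})
azure_url = build_storage_url(
    "azblob", "my-container", {"protocol": "http", "domain": "localhost:10001"}
)
print(s3_url)
print(azure_url)
```

Building the URL from a dictionary keeps each parameter visible on its own and avoids hand-written "?" and "&" separators, which are easy to get wrong when options are added or removed.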


How to Set Up Blob Storage

  1. Navigate to your values.yaml file or obtain your current Helm values.yaml overrides:
    helm get values pachyderm > values.yaml
  2. Add the following pachd.storage fields to your values.yaml file:
    pachd:
      storage:
        gocdkEnabled: true 
        storageURL: # The URL for your blob storage provider
  3. Update the storageURL to include provider-specific configuration options as needed; for the available options, see the related Go CDK packages. For example:
    "s3://my-bucket?region=us-west-1&awssdk=v2" 
    "gs://${BUCKET_NAME}"
    "azblob://my-container?protocol=http&domain=localhost:10001"
  4. Save your changes and upgrade your cluster:
    helm upgrade pachyderm pachyderm/pachyderm -f values.yaml
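
Before running the upgrade, it can help to confirm that the storageURL value is well formed. The following Python sketch splits a storage URL into its scheme, bucket or container name, and query parameters. The function name inspect_storage_url and the KNOWN_SCHEMES set are assumptions made for this example (the set covers only the schemes shown in the examples above), and the check does not confirm that a given parameter is accepted by the Go CDK.

```python
from urllib.parse import parse_qs, urlsplit

# Schemes used in the examples above; an assumption, not an exhaustive list.
KNOWN_SCHEMES = {"s3", "gs", "azblob"}


def inspect_storage_url(storage_url):
    """Split a storage URL into (scheme, bucket, params).

    Hypothetical pre-flight check for illustration only.
    """
    parts = urlsplit(storage_url)
    if parts.scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("missing bucket or container name")
    # parse_qs returns a list per key; keep the last value given for each key.
    params = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return parts.scheme, parts.netloc, params


scheme, bucket, params = inspect_storage_url(
    "azblob://my-container?protocol=http&domain=localhost:10001"
)
print(scheme, bucket, params)
```

A malformed value, such as a URL with a missing bucket name or a mistyped scheme, raises ValueError here instead of surfacing later as a failed upgrade.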

Limitations

Some configuration settings, such as verifySSL, may not be supported as query parameters in the storageURL. In such cases, set these options in the pachd.storage section of your values.yaml file instead.