Client Initialization (Start Here)

The Pachyderm SDK lets you work with HPE Machine Learning Data Management’s API, client, and configuration directly from Python.

1. Installation

Before using the Pachyderm SDK, make sure you have it installed. You can install the SDK using pip:

pip install pachyderm_sdk

2. Import the Client & API

To use the Client class and APIs, you need to import them from pachyderm_sdk:

from pachyderm_sdk import Client
from pachyderm_sdk.api import pfs, pps 

3. Create & Connect a Client Instance

To interact with a Pachyderm cluster, you need to create an instance of the Client class. The Client class provides multiple ways to create a client instance based on your requirements.

Basic Initialization

The simplest way to create a client instance is by calling the constructor without any parameters. This creates a client that connects to the local Pachyderm cluster running on localhost:30650 with default authentication settings.

client = Client()

You can customize the client settings by providing the relevant parameters to the Client constructor. Here’s an example:

client = Client(
    host='localhost',
    port=8080,
    auth_token='your-auth-token',
    root_certs=None,
    transaction_id=None,
    tls=False
)

In the above example, the client is configured to connect to the local Pachyderm cluster running on localhost:8080 without TLS encryption.

  • The auth_token parameter allows you to specify an authentication token for accessing the cluster.
  • The root_certs parameter can be used to provide custom root certificates for secure connections.
  • The transaction_id parameter allows you to specify a transaction ID to run operations on.
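As a rough illustration of the root_certs parameter, the sketch below reads a certificate bundle from disk before it is handed to the client. It assumes that root_certs expects PEM-encoded bytes, which you should confirm against the SDK reference. The helper name load_root_certs and the file name ca.pem are hypothetical.

```python
from pathlib import Path


def load_root_certs(pem_path: str) -> bytes:
    """Read a PEM bundle from disk and return its raw bytes.

    Assumption: the client accepts PEM-encoded bytes for root_certs.
    """
    data = Path(pem_path).read_bytes()
    # Cheap sanity check so a wrong file fails here rather than during the TLS handshake.
    if b"-----BEGIN CERTIFICATE-----" not in data:
        raise ValueError(f"{pem_path} does not look like a PEM certificate bundle")
    return data


# Possible usage (hypothetical file name):
# client = Client(host="localhost", port=8080, tls=True, root_certs=load_root_certs("ca.pem"))
```

Checking the file up front keeps certificate mistakes separate from network or authentication errors.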

Tip
By default, the client attempts to read the authentication token from the AUTH_TOKEN_ENV environment variable. You can also set the authentication token after creating the client by assigning to its auth_token property.
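The token lookup described in the tip can be sketched with the standard library alone. This is a minimal sketch, not the SDK's own code. It assumes that an explicitly passed token takes precedence over the environment. The variable name PACH_PYTHON_AUTH_TOKEN is a placeholder for whatever AUTH_TOKEN_ENV refers to in your installed version, and resolve_auth_token is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Assumption: placeholder for the variable that AUTH_TOKEN_ENV refers to.
TOKEN_ENV_VAR = "PACH_PYTHON_AUTH_TOKEN"


def resolve_auth_token(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    env_var: str = TOKEN_ENV_VAR,
) -> Optional[str]:
    """Return the explicit token if given, else the environment value, else None."""
    if explicit:
        return explicit
    # Treat an empty string the same as an unset variable.
    return env.get(env_var) or None


# Possible usage with an existing client, via the auth_token property named in the tip:
# client.auth_token = resolve_auth_token()
```

Passing the environment in as a mapping keeps the helper easy to exercise without touching the real process environment.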

From Config File

If you have a config.json configuration file, you can create a client instance using the from_config method:

import json
import os

def read_config(config_file):
    # Load the JSON configuration file into a dictionary.
    with open(config_file, "r") as f:
        return json.load(f)

def setup_config(config_file, repo, pipeline, project, job_id=None, fileset_id=None, datum_id=None):
    config = read_config(config_file)
    # Connection details come from the environment; os.getenv returns None for unset variables.
    config["data"]["pachyderm"]["host"] = os.getenv("PACHD_PEER_SERVICE_HOST")
    config["data"]["pachyderm"]["port"] = os.getenv("PACHD_PEER_SERVICE_PORT")
    config["data"]["pachyderm"]["repo"] = repo
    # The value passed as job_id is stored as the branch.
    config["data"]["pachyderm"]["branch"] = job_id
    config["data"]["pachyderm"]["token"] = os.getenv("PACH_TOKEN")
    config["data"]["pachyderm"]["project"] = project
    # Explicit arguments take precedence; otherwise fall back to the environment.
    config["data"]["pachyderm"]["fileset_id"] = fileset_id or os.getenv("FILESET_ID")
    config["data"]["pachyderm"]["datum_id"] = datum_id or os.getenv("PACH_DATUM_ID")

    config["labels"] = [repo, job_id, pipeline]

    return config

# Defines project, repo, branch, and pipeline objects (local objects only; nothing is created on the cluster here)
project = pfs.Project(name="sdk-basic-pipeline-6")
repo = pfs.Repo(name="housing_data", project=project)
branch = pfs.Branch.from_uri(f"{repo}@main")
pipeline = pps.Pipeline(name="pipeline-001", project=project)

# Creates a config object
config = setup_config("config.json", repo.name, pipeline.name, project.name, branch.name)
client = Client.from_config(config)

# Checks the version of Pachyderm and the address of pachd
version = client.get_version()
print("Pachyderm Version:", version)
print("Pachd Address:", client.pfs.client.address)
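The setup_config function above reads the host, port, and token with os.getenv, which returns None when a variable is unset. The following is a minimal sketch of a stricter lookup that fails early with a readable message. The helpers require_env and pachd_endpoint are hypothetical and not part of pachyderm_sdk; the variable names are the ones used in setup_config.

```python
import os
from typing import Mapping, Tuple


def require_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Required environment variable {name!r} is not set")
    return value


def pachd_endpoint(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Read the pachd host and port, converting the port to an integer."""
    host = require_env("PACHD_PEER_SERVICE_HOST", env)
    port = int(require_env("PACHD_PEER_SERVICE_PORT", env))
    return host, port
```

With this in place, a missing variable surfaces as a RuntimeError naming the variable, rather than as a None value buried in the config dictionary.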

From Within a Cluster

If your code runs inside a Pachyderm cluster, you can use the new_in_cluster method to create a client instance. This method reads the cluster connection settings from the environment and builds the client from them.

client = Client.new_in_cluster(auth_token='your-auth-token', transaction_id='your-transaction-id')
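Code that runs both on a workstation and inside the cluster may need to choose between constructors. The sketch below shows one way to detect the in-cluster case. It assumes that the PACHD_PEER_SERVICE_HOST and PACHD_PEER_SERVICE_PORT variables from the config example are only set inside the cluster; whether new_in_cluster relies on exactly these variables is also an assumption. The helper running_in_cluster is hypothetical.

```python
import os
from typing import Mapping

# Assumption: these variables are present only when running inside the cluster.
IN_CLUSTER_VARS = ("PACHD_PEER_SERVICE_HOST", "PACHD_PEER_SERVICE_PORT")


def running_in_cluster(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when every in-cluster variable is present and non-empty."""
    return all(env.get(name) for name in IN_CLUSTER_VARS)


# Possible usage:
# client = Client.new_in_cluster() if running_in_cluster() else Client()
```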

Via PachD Address

If you know the pachd address (host:port) of the HPE Machine Learning Data Management cluster, you can create a client instance using the from_pachd_address method:

client = Client.from_pachd_address('pachd-address', auth_token='your-auth-token', root_certs='your-root-certs', transaction_id='your-transaction-id')
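To make the host:port form concrete, here is a small standard-library sketch that splits an address into its parts. It is an illustration only, and the SDK's own parsing may differ. The grpc and grpcs scheme handling is an assumption, the default port is the 30650 mentioned earlier, and split_pachd_address is a hypothetical helper.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_pachd_address(address: str, default_port: int = 30650) -> Tuple[str, int, bool]:
    """Split 'host:port' or 'scheme://host:port' into (host, port, tls)."""
    # urlsplit only recognises the host when a scheme separator is present.
    if "://" not in address:
        address = "grpc://" + address
    parts = urlsplit(address)
    if not parts.hostname:
        raise ValueError(f"No host found in address {address!r}")
    # Assumption: a 'grpcs' or 'https' scheme signals a TLS connection.
    tls = parts.scheme in ("grpcs", "https")
    return parts.hostname, parts.port or default_port, tls
```

Validating the address before constructing the client can make a malformed value easier to spot than a connection error at first use.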

Test Connection

If you’d like to quickly test out working with the Pachyderm SDK on your local machine (e.g., using a locally deployed Docker Desktop instance), try out the following:

from pachyderm_sdk import Client 

client = Client(host="localhost", port=80)
version = client.get_version()
print(version)

Example Output

Version(major=2, minor=6, micro=4, git_commit='358bd1229130eb262c22caf82ed87b3cc91ec81c', git_tree_modified='false', build_date='2023-06-22T14:49:32Z', go_version='go1.20.5', platform='arm64')

If you see output like this, the connection works and you are ready to start working with the SDK.
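A freshly deployed local cluster may take a moment before it accepts connections, so the version check can fail on the first try. The sketch below retries any zero-argument callable, such as client.get_version, a few times before giving up. The helper wait_until_ready and its defaults are hypothetical; the broad exception handling is a simplification because the exact error type raised by the client is not assumed here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until_ready(check: Callable[[], T], attempts: int = 5, delay: float = 1.0) -> T:
    """Call `check` until it succeeds, sleeping `delay` seconds between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return check()
        except Exception as error:  # broad on purpose; narrow this in real code
            last_error = error
            if attempt < attempts:
                time.sleep(delay)
    raise ConnectionError(f"not ready after {attempts} attempts") from last_error


# Possible usage:
# version = wait_until_ready(client.get_version, attempts=10, delay=2.0)
```

The original exception is kept as the cause of the final ConnectionError, so the underlying failure remains visible in the traceback.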