Cluster Usage Reference
Consult this resource when setting up your cluster to determine how many concurrent users and models each instance can support.
Calculating Concurrent Usage Per Instance
This section shows you how to calculate concurrent usage per instance using a Llama-2-7b model as an example.
GPUs Required Per Job Type
Model | Job Type | Number of GPUs Required |
---|---|---|
Llama-2-7b | Prompt Engineering | 1 A100 or 2 V100s |
Llama-2-7b | Fine-Tuning | 8 A100 or 16 V100s |
How to Calculate Concurrent Usage Per Instance
To calculate the number of concurrent users each instance can support, divide the number of cluster GPUs of each type by the number of GPUs that each job type requires. Each scenario below shows the maximum number of concurrently supported users for a given set of cluster GPUs and job types. In Scenario A, for example, 8 A100 GPUs support 8 Prompt Engineering jobs and 8 V100 GPUs support 4 more, for a total of 12.
Scenario | Cluster GPUs | Job Type(s) | Number of Concurrently Supported Users |
---|---|---|---|
A | 8 A100 GPUs, 8 V100 GPUs | Prompt Engineering | 12 |
B | 8 A100 GPUs, 8 V100 GPUs | Prompt Engineering | 4 |
B | 8 A100 GPUs, 8 V100 GPUs | Fine-Tuning | 1 |
C | 16 A100 GPUs | Fine-Tuning | 2 |
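The arithmetic behind the scenarios can be sketched in a few lines of Python. This is an illustrative sketch, not part of any product API: the names `GPUS_PER_JOB`, `max_concurrent_jobs`, and `reserve` are hypothetical, and the code assumes that each job serves one user, that a single job never mixes A100 and V100 GPUs, and that jobs are placed on A100s before V100s.

```python
# GPUs needed per job, by GPU type, copied from the
# "GPUs Required Per Job Type" table for Llama-2-7b.
GPUS_PER_JOB = {
    "Prompt Engineering": {"A100": 1, "V100": 2},
    "Fine-Tuning": {"A100": 8, "V100": 16},
}


def max_concurrent_jobs(cluster, job_type):
    """Count the jobs of one type that fit in the cluster.

    Assumes a job runs entirely on one GPU type, so each GPU type
    contributes floor(available / required) jobs.
    """
    needs = GPUS_PER_JOB[job_type]
    return sum(count // needs[gpu] for gpu, count in cluster.items())


def reserve(cluster, job_type, jobs):
    """Return the GPUs left after placing `jobs` jobs of `job_type`.

    Assumed placement policy: fill A100s first, then V100s.
    """
    remaining = dict(cluster)
    needs = GPUS_PER_JOB[job_type]
    for gpu in ("A100", "V100"):
        while jobs > 0 and remaining.get(gpu, 0) >= needs[gpu]:
            remaining[gpu] -= needs[gpu]
            jobs -= 1
    if jobs > 0:
        raise ValueError("not enough GPUs for the requested jobs")
    return remaining


# Scenario A: only Prompt Engineering jobs on 8 A100s and 8 V100s.
scenario_a = max_concurrent_jobs({"A100": 8, "V100": 8}, "Prompt Engineering")

# Scenario B: one Fine-Tuning job is placed first, then the leftover
# GPUs are filled with Prompt Engineering jobs.
leftover = reserve({"A100": 8, "V100": 8}, "Fine-Tuning", 1)
scenario_b = max_concurrent_jobs(leftover, "Prompt Engineering")

# Scenario C: only Fine-Tuning jobs on 16 A100s.
scenario_c = max_concurrent_jobs({"A100": 16}, "Fine-Tuning")

print(scenario_a, scenario_b, scenario_c)
```

Under these assumptions the sketch reproduces the table: 12 users in Scenario A, 4 Prompt Engineering users alongside 1 Fine-Tuning user in Scenario B, and 2 users in Scenario C. For other models, replace the per-job GPU counts with the values that apply to them.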