Cluster Usage Reference

Consult this resource when setting up your cluster to determine how many concurrent users and models each instance can support.

Calculating Concurrent Usage Per Instance

This section shows you how to calculate concurrent usage per instance using a Llama-2-7b model as an example.

GPUs Required Per Job Type

Model        Job Type              GPUs Required per Job
Llama-2-7b   Prompt Engineering    1 A100 or 2 V100s
Llama-2-7b   Fine-Tuning           8 A100s or 16 V100s

How to Calculate Concurrent Usage Per Instance

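To calculate the number of concurrent users each instance can support, find the number of GPUs of each type in the cluster and the number of GPUs each job type requires. In the example scenarios below, each user runs one job, and the maximum number of concurrent users comes from dividing the available GPUs of each type by the number a job needs. When job types are mixed, as in Scenario B, GPUs allocated to one job (here, the 8 A100s used for fine-tuning) are no longer available to the others.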
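As a worked form of this calculation, assuming each job runs on a single GPU type and holds its GPUs exclusively, the number of concurrent users N_j for one job type j is given by the first expression below, where G_t is the number of GPUs of type t in the cluster and r_{j,t} is the number of type-t GPUs one job of type j needs. The second expression applies it to prompt engineering on a cluster with 8 A100 and 8 V100 GPUs.

```latex
N_j = \sum_{t} \left\lfloor \frac{G_t}{r_{j,t}} \right\rfloor,
\qquad
N_{\text{prompt}} = \left\lfloor \frac{8}{1} \right\rfloor + \left\lfloor \frac{8}{2} \right\rfloor = 8 + 4 = 12
```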

Scenario   Cluster GPUs               Job Type(s)           Number of Concurrently Supported Users
A          8 A100 GPUs, 8 V100 GPUs   Prompt Engineering    12
B          8 A100 GPUs, 8 V100 GPUs   Prompt Engineering    4
                                      Fine-Tuning           1
C          16 A100 GPUs               Fine-Tuning           2
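The scenarios can be reproduced with a short Python sketch. It assumes each job runs on a single GPU type and holds its GPUs exclusively, and that in a mixed workload the fine-tuning job is placed on the A100s first, as Scenario B implies. The names GPUS_PER_JOB, max_users, and reserve are hypothetical, chosen for illustration, and are not part of any product API.

```python
# GPUs one Llama-2-7b job needs on each GPU type (from the requirements table).
# Assumption: a job runs entirely on one GPU type and does not share its GPUs.
GPUS_PER_JOB = {
    "prompt_engineering": {"A100": 1, "V100": 2},
    "fine_tuning": {"A100": 8, "V100": 16},
}


def max_users(cluster: dict[str, int], job_type: str) -> int:
    """Hypothetical helper: concurrent users of one job type the cluster supports."""
    needs = GPUS_PER_JOB[job_type]
    # Whole jobs per GPU type (integer division), summed across GPU types.
    return sum(count // needs[gpu] for gpu, count in cluster.items())


def reserve(cluster: dict[str, int], job_type: str, jobs: int, gpu: str) -> dict[str, int]:
    """Hypothetical helper: set aside GPUs of one type for `jobs` jobs; return what is left."""
    needed = GPUS_PER_JOB[job_type][gpu] * jobs
    if cluster.get(gpu, 0) < needed:
        raise ValueError(f"not enough {gpu} GPUs for {jobs} {job_type} job(s)")
    left = dict(cluster)
    left[gpu] -= needed
    return left


if __name__ == "__main__":
    mixed = {"A100": 8, "V100": 8}
    print(max_users(mixed, "prompt_engineering"))  # Scenario A -> 12
    left = reserve(mixed, "fine_tuning", 1, "A100")  # Scenario B: 1 fine-tuning job
    print(max_users(left, "prompt_engineering"))  # remaining V100s -> 4
    print(max_users({"A100": 16}, "fine_tuning"))  # Scenario C -> 2
```

Reserving the fine-tuning GPUs before counting prompt-engineering users mirrors how Scenario B is derived; a real scheduler may place jobs differently, so treat the result as an upper bound under these assumptions.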