# Supported Hardware Reference
Consult this resource when setting up your cluster to determine how many concurrent users and models each instance can support.
## Calculating Concurrent Models Per Instance Type
### Prompt Engineering Requirements
This table shows how many models can run concurrently on a single instance of each type. For example, one p4d.24xlarge instance (8 A100 GPUs) can support eight Llama-2-7b models running prompt engineering at the same time.
| Model | 8 A100 GPUs (p4d.24xlarge) | 8 V100 GPUs (p3.16xlarge) | 4 T4 GPUs (g4dn.12xlarge) |
| --- | --- | --- | --- |
| Mistral-7b | 8 models | 4 models | 2 models |
| Llama-2-7b | 8 models | 4 models | 2 models |
| Llama-2-13b | 4 models | 2 models | 2 models |
| Llama-2-70b | 2 models | N/A | N/A |
| falcon-7b | 8 models | N/A | N/A |
| falcon-40b | 2 models | 1 model | N/A |
| mpt-7b | 8 models | 4 models | 2 models |
| mpt-30b | 2 models | 1 model | N/A |
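
To size a cluster from this table, divide the desired number of concurrent models by the per-instance capacity and round up. Below is a minimal sketch of that arithmetic in Python; the `PROMPT_CAPACITY` dictionary (abridged to a few rows from the table above) and the `instances_needed` helper are hypothetical names used for illustration, not part of any published API:

```python
import math

# Concurrent prompt-engineering models per instance, copied from the table
# above (None = not supported on that instance type). Rows abridged; the
# dictionary layout and function name are illustrative only.
PROMPT_CAPACITY = {
    "Llama-2-7b":  {"p4d.24xlarge": 8, "p3.16xlarge": 4,    "g4dn.12xlarge": 2},
    "Llama-2-70b": {"p4d.24xlarge": 2, "p3.16xlarge": None, "g4dn.12xlarge": None},
    "falcon-40b":  {"p4d.24xlarge": 2, "p3.16xlarge": 1,    "g4dn.12xlarge": None},
}

def instances_needed(model: str, instance_type: str, concurrent_models: int) -> int:
    """How many instances of `instance_type` it takes to serve
    `concurrent_models` copies of `model` for prompt engineering."""
    per_instance = PROMPT_CAPACITY[model][instance_type]
    if per_instance is None:
        raise ValueError(f"{model} is not supported on {instance_type}")
    return math.ceil(concurrent_models / per_instance)

# Example: 12 concurrent Llama-2-7b models on p3.16xlarge -> ceil(12 / 4) = 3.
print(instances_needed("Llama-2-7b", "p3.16xlarge", 12))
```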
### Fine-Tuning Requirements
This table shows how many hardware instances are required to run one fine-tuning job for each model. For example, one p4d.24xlarge instance can support one Llama-2-7b fine-tuning job, while fine-tuning the same model on V100 hardware requires two p3.16xlarge instances (16 V100 GPUs in total).
| Model | 8 A100 GPUs (p4d.24xlarge) | 8 V100 GPUs (p3.16xlarge) | 4 T4 GPUs (g4dn.12xlarge) |
| --- | --- | --- | --- |
| Mistral-7b | 1 instance / job | 3 instances / job | 4 instances / job |
| Llama-2-7b | 1 instance / job | 2 instances / job | 4 instances / job |
| Llama-2-13b | 1 or 2 instances / job (depending on context_window length) | 4 instances / job | 8 instances / job |
| Llama-2-70b | N/A | N/A | N/A |
| falcon-7b | 0.5 instance / job | 2 instances / job | 4 instances / job |
| falcon-40b | 4 instances / job | 16 instances / job | N/A |
| mpt-7b | 0.5 instance / job | 2 instances / job | 4 instances / job |
| mpt-30b | 2 or 4 instances / job (depending on context_window length) | 16 instances / job | 4 instances / job |
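
The fractional entries are the one subtlety here: 0.5 instance / job presumably means two fine-tuning jobs can share a single instance. Below is a minimal sketch of the resulting arithmetic, with hypothetical names (`FINETUNE_INSTANCES_PER_JOB`, `instances_for_jobs`) and the context_window-dependent rows omitted for simplicity:

```python
import math

# Hardware instances required per fine-tuning job, from the table above.
# Fractional values mean jobs can share an instance (0.5 = two jobs per
# instance). Names and layout are illustrative only.
FINETUNE_INSTANCES_PER_JOB = {
    "Llama-2-7b": {"p4d.24xlarge": 1,   "p3.16xlarge": 2, "g4dn.12xlarge": 4},
    "falcon-7b":  {"p4d.24xlarge": 0.5, "p3.16xlarge": 2, "g4dn.12xlarge": 4},
}

def instances_for_jobs(model: str, instance_type: str, jobs: int) -> int:
    """Total instances needed to run `jobs` concurrent fine-tuning jobs."""
    per_job = FINETUNE_INSTANCES_PER_JOB[model][instance_type]
    return math.ceil(jobs * per_job)

# Example: 3 concurrent falcon-7b jobs on p4d.24xlarge -> ceil(3 * 0.5) = 2.
print(instances_for_jobs("falcon-7b", "p4d.24xlarge", 3))
```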