Fine-Tuning Reference

GenAI Studio supports the following model and hardware combinations for fine-tuning.

Mistral-7b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  T4       | 16              | 2048           | 1          | True      | True                   | "float16"
  V100     | 24              | 2048           | 1          | True      | True                   | "float16"
  A100     | 8               | 4096           | 1          | True      | True                   | "float16"
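
For orientation, the sketch below expresses the Mistral-7b on A100 row as a plain Python dictionary. The key names mirror this reference; the model identifier and the overall layout are illustrative assumptions, not the exact schema your GenAI Studio installation uses.

  # Illustrative sketch only: the Mistral-7b / A100 row as a plain Python dict.
  # Key names mirror this reference; the "model" value and overall layout are
  # assumptions, not the exact GenAI Studio schema.
  mistral_a100_config = {
      "model": "mistralai/Mistral-7B-v0.1",  # assumed Hugging Face model ID
      "slots_per_trial": 8,                  # 8 A100 GPUs per fine-tuning trial
      "context_window": 4096,
      "batch_size": 1,
      "deepspeed": True,
      "gradient_checkpointing": True,
      "torch_dtype": "float16",
  }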

Llama-2-7b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  V100     | 16              | 2048           | 1          | True      | True                   | "float16"
  T4       | 16              | 2048           | 1          | True      | True                   | "float16"
  A100     | 4               | 2048           | 2          | True      | True                   | "float16"
  A100     | 8               | 4096           | 2          | True      | True                   | "float16"

Llama-2-13b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  V100     | 32              | 2048           | 1          | True      | True                   | "float16"
  T4       | 32              | 2048           | 1          | True      | True                   | "float16"
  A100     | 16              | 4096           | 1          | True      | True                   | "float16"
  A100     | 8               | 1024           | 1          | True      | True                   | "float16"

falcon-7b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  T4       | 16              | 2048           | 1          | True      | True                   | "float16"
  V100     | 16              | 2048           | 1          | True      | True                   | "float16"
  A100     | 4               | 2048           | 2          | True      | True                   | "float16"

falcon-40b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  V100     | 128             | 1024           | 2          | True      | True                   | "float16"
  A100     | 32              | 2048           | 4          | True      | True                   | "float16"

mpt-7b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  T4       | 16              | 2048           | 1          | True      | True                   | "float16"
  V100     | 16              | 2048           | 1          | True      | True                   | "float16"
  A100     | 4               | 2048           | 1          | True      | True                   | "float16"

mpt-30b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  T4       | 64              | 1024           | 1          | True      | True                   | "float16"
  V100     | 128             | 1024           | 1          | True      | True                   | "float16"
  A100     | 16              | 1024           | 2          | True      | True                   | "float16"
  A100     | 32              | 2048           | 2          | True      | True                   | "float16"
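
Every configuration above sets deepspeed: True together with a small per-GPU batch size. A DeepSpeed configuration consistent with these settings might look roughly like the sketch below; the ZeRO stage and gradient accumulation value are assumptions, not a description of what GenAI Studio generates internally.

  # Rough sketch of a DeepSpeed configuration consistent with the tables above.
  # The ZeRO stage and gradient accumulation value are assumptions.
  ds_config = {
      "train_micro_batch_size_per_gpu": 1,  # corresponds to batch_size above
      "gradient_accumulation_steps": 1,     # assumed; not part of this reference
      "fp16": {"enabled": True},            # matches torch_dtype: "float16"
      "zero_optimization": {"stage": 2},    # assumed ZeRO stage
  }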

Configuration Description

  • slots_per_trial: The number of slots (GPUs) each trial (such as a run of fine-tuning) will use. For example, if slots_per_trial is set to 16 and the hardware type is V100, then one fine-tuning run will need 16 V100 GPUs.

  • context_window: The maximum number of tokens the model can consider at once. For example, a context_window of 2048 means the model can process sequences of up to 2048 tokens.

  • batch_size: The number of samples the model processes at a time.

  • deepspeed: When set to True, the training process uses DeepSpeed optimizations to reduce memory consumption and potentially increase training speed (a minimal example is sketched after the configuration tables above).

  • gradient_checkpointing: A technique that reduces memory usage by recomputing activations during the backward pass instead of storing them, allowing larger models to be trained on hardware with limited memory; see the sketch after this list.

  • torch_dtype: Specifies the data type used during training. For example, float16 reduces memory usage and can speed up computation on GPUs that support half precision.
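
Several of these parameters (context_window, gradient_checkpointing, torch_dtype) correspond to standard Hugging Face Transformers settings. The sketch below shows one way they might be applied when loading a model manually; the model identifier and prompt are assumptions for illustration, not GenAI Studio internals.

  # Illustrative only: applying context_window, gradient_checkpointing, and
  # torch_dtype with Hugging Face Transformers. The model ID and prompt are
  # assumptions for illustration.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_name = "meta-llama/Llama-2-7b-hf"  # assumed model ID
  context_window = 2048                    # e.g. the Llama-2-7b V100/T4 rows

  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(
      model_name,
      torch_dtype=torch.float16,           # torch_dtype: "float16"
  )
  model.gradient_checkpointing_enable()    # gradient_checkpointing: True

  # context_window caps the tokenized sequence length seen during fine-tuning.
  batch = tokenizer(
      ["An example fine-tuning prompt ..."],
      truncation=True,
      max_length=context_window,
      return_tensors="pt",
  )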