Fine-Tuning Reference

GenAI Studio supports the following model and hardware combinations for fine-tuning.

Mistral-7b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  T4       | 16              | 2048           | 1          | True      | True                   | "float16"
  V100     | 24              | 2048           | 1          | True      | True                   | "float16"
  A100     | 8               | 4096           | 1          | True      | True                   | "float16"
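
For orientation, the sketch below expresses the Mistral-7b on A100 row as a plain Python dictionary. The key names mirror this reference; the model identifier and the overall layout are illustrative assumptions, not the exact schema your GenAI Studio installation uses.

  # Illustrative sketch only: the Mistral-7b / A100 row as a plain Python dict.
  # Key names mirror this reference; the "model" value and overall layout are
  # assumptions, not the exact GenAI Studio schema.
  mistral_a100_config = {
      "model": "mistralai/Mistral-7B-v0.1",  # assumed Hugging Face model ID
      "slots_per_trial": 8,                  # 8 A100 GPUs per fine-tuning trial
      "context_window": 4096,
      "batch_size": 1,
      "deepspeed": True,
      "gradient_checkpointing": True,
      "torch_dtype": "float16",
  }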

Llama-2-7b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  V100     | 16              | 2048           | 1          | True      | True                   | "float16"
  T4       | 16              | 2048           | 1          | True      | True                   | "float16"
  A100     | 4               | 2048           | 2          | True      | True                   | "float16"
  A100     | 8               | 4096           | 2          | True      | True                   | "float16"

Llama-2-13b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  V100     | 32              | 2048           | 1          | True      | True                   | "float16"
  T4       | 32              | 2048           | 1          | True      | True                   | "float16"
  A100     | 16              | 4096           | 1          | True      | True                   | "float16"
  A100     | 8               | 1024           | 1          | True      | True                   | "float16"

falcon-7b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  T4       | 16              | 2048           | 1          | True      | True                   | "float16"
  V100     | 16              | 2048           | 1          | True      | True                   | "float16"
  A100     | 4               | 2048           | 2          | True      | True                   | "float16"

falcon-40b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  V100     | 128             | 1024           | 2          | True      | True                   | "float16"
  A100     | 32              | 2048           | 4          | True      | True                   | "float16"

mpt-7b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  T4       | 16              | 2048           | 1          | True      | True                   | "float16"
  V100     | 16              | 2048           | 1          | True      | True                   | "float16"
  A100     | 4               | 2048           | 1          | True      | True                   | "float16"

mpt-30b Configuration

  Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype
  T4       | 64              | 1024           | 1          | True      | True                   | "float16"
  V100     | 128             | 1024           | 1          | True      | True                   | "float16"
  A100     | 16              | 1024           | 2          | True      | True                   | "float16"
  A100     | 32              | 2048           | 2          | True      | True                   | "float16"
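
Every configuration above sets deepspeed: True together with a small per-GPU batch size. A DeepSpeed configuration consistent with these settings might look roughly like the sketch below; the ZeRO stage and gradient accumulation value are assumptions, not a description of what GenAI Studio generates internally.

  # Rough sketch of a DeepSpeed configuration consistent with the tables above.
  # The ZeRO stage and gradient accumulation value are assumptions.
  ds_config = {
      "train_micro_batch_size_per_gpu": 1,  # corresponds to batch_size above
      "gradient_accumulation_steps": 1,     # assumed; not part of this reference
      "fp16": {"enabled": True},            # matches torch_dtype: "float16"
      "zero_optimization": {"stage": 2},    # assumed ZeRO stage
  }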

Configuration Description

  • slots_per_trial: The number of slots (GPUs) each trial (such as a run of fine-tuning) will use. For example, if slots_per_trial is set to 16 and the hardware type is V100, then one fine-tuning run will need 16 V100 GPUs.

  • context_window: The maximum number of tokens the model can consider at once. For example, a context_window of 2048 means the model can process sequences of up to 2048 tokens.

  • batch_size: The number of samples the model processes at a time.

  • deepspeed: When set to True, the training process uses DeepSpeed optimizations to reduce memory consumption and potentially increase training speed (a minimal example is sketched after the configuration tables above).

  • gradient_checkpointing: A technique that reduces memory usage by recomputing activations during the backward pass instead of storing them, allowing larger models to be trained on hardware with limited memory; see the sketch after this list.

  • torch_dtype: Specifies the data type used during training. For example, float16 reduces memory usage and can speed up computation on GPUs that support half precision.
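
Several of these parameters (context_window, gradient_checkpointing, torch_dtype) correspond to standard Hugging Face Transformers settings. The sketch below shows one way they might be applied when loading a model manually; the model identifier and prompt are assumptions for illustration, not GenAI Studio internals.

  # Illustrative only: applying context_window, gradient_checkpointing, and
  # torch_dtype with Hugging Face Transformers. The model ID and prompt are
  # assumptions for illustration.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_name = "meta-llama/Llama-2-7b-hf"  # assumed model ID
  context_window = 2048                    # e.g. the Llama-2-7b V100/T4 rows

  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(
      model_name,
      torch_dtype=torch.float16,           # torch_dtype: "float16"
  )
  model.gradient_checkpointing_enable()    # gradient_checkpointing: True

  # context_window caps the tokenized sequence length seen during fine-tuning.
  batch = tokenizer(
      ["An example fine-tuning prompt ..."],
      truncation=True,
      max_length=context_window,
      return_tensors="pt",
  )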