A dataset is a collection of data that is used to train a machine learning model. Datasets are usually composed of two parts: the input data and the expected output data.

Dataset types


Evaluation datasets are used to measure the performance of a model. These are great for testing the accuracy, speed, and other metrics before training it on a larger, more relevant dataset. You’ll use an evaluation dataset during the model evaluation process.


Domain datasets are used to train a model on a specific domain. These are great for training a model to perform a specific task, such as generating code or writing a story. You’ll use a domain dataset during the model augmentation process.