Manage Datasets #
Datasets are the secret sauce of generative AI. The data you train with, and how it is incorporated into your prompt, is what ultimately makes your fine-tuned model unique. GenAI Studio supports CSV datasets and HuggingFace datasets.
All uploaded datasets are scoped to the project level.
Linked Data Templating #
Once you have uploaded a dataset, you can link it into your prompt input using a templating syntax that maps to the column names in your dataset. For example, if you have a dataset with the columns title, text, and label, you can link the dataset to your prompt using the following templating syntax:
Title: {{title}}
Text: {{text}}
Label: {{label}}
This templating syntax supports auto-completion and is case-sensitive, so inputs must match the column names in your dataset exactly.
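To make the case-sensitive matching concrete, here is a minimal Python sketch of how a `{{column}}` template could be filled from one CSV row. It is not GenAI Studio's implementation: the `render_template` helper, the `PLACEHOLDER` pattern, and the sample row are hypothetical and exist only to illustrate the behavior described above.

```python
import csv
import io
import re

# Hypothetical pattern for {{name}} placeholders. Tolerating spaces inside
# the braces is an assumption, not documented GenAI Studio behavior.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def render_template(template: str, row: dict) -> str:
    """Replace each {{column}} with the row's value, using an exact, case-sensitive match."""

    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            # "{{Title}}" does not match a column named "title".
            raise KeyError(f"No column named {column!r}; available: {sorted(row)}")
        return row[column]

    return PLACEHOLDER.sub(substitute, template)


# Illustrative data only: a one-row CSV with the columns from the example above.
SAMPLE_CSV = "title,text,label\nGreat phone,Battery lasts two days.,positive\n"
TEMPLATE = "Title: {{title}}\nText: {{text}}\nLabel: {{label}}"

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rendered = render_template(TEMPLATE, rows[0])
print(rendered)
```

In this sketch, a placeholder such as `{{Title}}` raises a `KeyError` because the column is named `title`, which mirrors the exact-match requirement.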
Model Creation Journey #
Upload datasets to your project at the beginning of your journey so that they are available for comparing base models, creating snapshots, fine-tuning models, and testing an output model.
graph LR;
  A(Create Project) --> B(Import Data);
  B --> C(Snapshot Model);
  C --> D(Fine-Tune Model);
  D --> E(Export Model);
  style A fill:#fff,stroke:#333,stroke-width:2px;
  style B fill:#7FF9E2,stroke:#333,stroke-width:4px;
  style C fill:#fff,stroke:#333,stroke-width:2px;
  style D fill:#fff,stroke:#333,stroke-width:2px;
  style E fill:#fff,stroke:#333,stroke-width:2px;