Create Project and Load Data
Introduction #
Welcome to the tutorial on classifying medical transcripts! In this guide, you’ll learn how to create a fine-tuned model that categorizes different types of medical reports, such as radiology, pathology, or discharge summaries. Your model’s classification accuracy depends heavily on the quality of your data and on how well the model is trained.
Get Set Up #
Before we begin:
- Ensure you’re signed in.
- Confirm your Hugging Face access token is set.
Part 1: Create a Project & Load Data #
Create a Project #
- Select Create New Project from the GenAI Studio home page.
- Name your project `Medical Transcripts Classification Tutorial` and then select Create. You’ll see the new project in your project list.
- Add this project to your favorites by clicking Favorites + in the sidebar for easy access.
- Open the project by selecting the project’s name from the sidebar.
Load Dataset #
Next, let’s load the medical-cases-classification-tutorial dataset, which contains a variety of medical transcripts, each labeled according to its report type.
Preparing Your Dataset #
It’s best to have your dataset already prepared before you start working with your model. If you’re new to this, learn about limitations for uploading local files. Here’s what your dataset might look like:
| Transcript | Class |
|---|---|
| Full medical transcript | Type of medical report |
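If you are preparing your own file, a quick local check can catch missing columns or empty labels before you upload. The following is a minimal sketch, not part of GenAI Studio: it assumes the data is a CSV whose header uses the column names from the table above (`Transcript` and `Class`), and the helper name `check_dataset` and the sample rows are made up for illustration.

```python
import csv
import io

# Assumed header names, taken from the example table above.
REQUIRED_COLUMNS = ("Transcript", "Class")


def check_dataset(csv_text, required=REQUIRED_COLUMNS):
    """Return a list of problems found in a two-column classification CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    # Rows are numbered with the header counted as row 1.
    for row_no, row in enumerate(reader, start=2):
        for col in required:
            if not (row[col] or "").strip():
                problems.append(f"row {row_no}: empty {col}")
    return problems


# Illustrative sample only: the second data row has no label.
sample = (
    "Transcript,Class\n"
    '"Chest X-ray shows no acute findings.",Radiology\n'
    '"Patient discharged in stable condition.",\n'
)
print(check_dataset(sample))
```

An empty list means every row has both a transcript and a label. Adjust `REQUIRED_COLUMNS` if your file uses different header names.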
Load the Dataset #
For this tutorial, we’ll use a comprehensive dataset from Hugging Face:
- Navigate to the Datasets tab in your project.
- Select New Dataset.
- Ensure you are on the Hugging Face tab.
- Enter `hpe-ai/medical-cases-classification-tutorial` for the dataset name.
- For the description, type `Example medical transcripts and classifications`.
- Select Create Dataset.
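If you would like to inspect the same dataset outside GenAI Studio, you can pull it with the Hugging Face `datasets` library and count how many examples fall under each label. This is an optional sketch under a few assumptions: the `datasets` package is installed, the dataset has a `train` split, and you look up the label column name yourself, since this tutorial does not state it. The helper names `label_counts` and `load_rows` are hypothetical.

```python
from collections import Counter

# Dataset ID as entered in the steps above.
DATASET_ID = "hpe-ai/medical-cases-classification-tutorial"


def label_counts(rows, label_column):
    """Count how many rows carry each value of label_column."""
    return Counter(row[label_column] for row in rows)


def load_rows(dataset_id=DATASET_ID, split="train"):
    """Download one split from the Hugging Face Hub.

    Assumes `pip install datasets` and that the split exists.
    """
    # Imported lazily so label_counts stays usable without the package.
    from datasets import load_dataset

    return load_dataset(dataset_id, split=split)


# Typical use (requires network access):
#   rows = load_rows()
#   print(rows.column_names)          # find the label column first
#   print(label_counts(rows, "<label column>"))
```

A heavily skewed label distribution is worth knowing about early, because it affects how you interpret classification accuracy later in the tutorial.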
Recap #
- You’ve successfully created a new project and added it to your favorites.
- You’ve loaded a comprehensive dataset to train your model.