Medical Transcripts Classification: Create Project and Load Data


Welcome to the tutorial on classifying medical transcripts! In this guide, you'll learn how to create a fine-tuned model that categorizes different types of medical reports, such as radiology reports, pathology reports, or discharge summaries. Both the quality of your data and how you train the model strongly influence classification accuracy.

Get Set Up

Before we begin:

Part 1: Create a Project & Load Data

Create a Project

  1. Select Create New Project from the GenAI Studio home page.
  2. Name your project Medical Transcripts Classification Tutorial and then select Create. You’ll see the new project in your project list.
  3. Add this project to your favorites by clicking Favorites + in the sidebar for easy access.
  4. Open the project by selecting the project’s name from the sidebar.

Load Dataset

Next, load the medical-cases-classification-tutorial dataset, which contains a variety of medical transcripts, each labeled with its report type.

Preparing Your Dataset

It's best to have your dataset prepared before you start working with your model. If you're new to this, review the limitations for uploading local files. Your dataset might look like this:

Transcript                 | Class
Full medical transcript    | Type of medical report
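Before uploading a local file, a quick sanity check can catch missing columns or empty rows. The sketch below is a minimal example, not part of GenAI Studio: it assumes a CSV file whose headers are the hypothetical names `transcript` and `class`, matching the two columns above, and uses only the Python standard library.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; change them to match the headers in your file.
TEXT_COLUMN = "transcript"
LABEL_COLUMN = "class"


def check_dataset(csv_text: str) -> Counter:
    """Validate a transcript/class CSV and count how many rows each class has.

    Raises ValueError if a required column is missing or if any row has an
    empty transcript or an empty class label.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {TEXT_COLUMN, LABEL_COLUMN} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    counts: Counter = Counter()
    for row_no, row in enumerate(reader, start=1):
        # DictReader fills absent trailing fields with None, so guard for it.
        text = (row[TEXT_COLUMN] or "").strip()
        label = (row[LABEL_COLUMN] or "").strip()
        if not text or not label:
            raise ValueError(f"empty transcript or class in data row {row_no}")
        counts[label] += 1
    return counts


# Illustrative rows only; they are not taken from the tutorial dataset.
sample = """transcript,class
"CT of the chest without contrast shows no acute findings.",Radiology
"The patient was discharged home in stable condition.",Discharge Summary
"The specimen was received in formalin and labeled with the patient name.",Pathology
"""
print(check_dataset(sample))
```

The per-class counts also give a quick view of class balance, which affects how well a classifier learns the rarer report types.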

Load the Dataset

For this tutorial, we’ll use a comprehensive dataset from Hugging Face:

  1. Navigate to the Datasets tab in your project.
  2. Select New Dataset.
  3. Ensure you are on the Hugging Face tab.
  4. Enter hpe-ai/medical-cases-classification-tutorial for the dataset name.
  5. For description, type Example medical transcripts and classifications.
  6. Select Create Dataset.

Preview Features
Now, preview the dataset features to confirm they meet your needs. The dataset is divided into training, validation, and test splits, and each record contains fields such as a description, the transcription, and the medical specialty.
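If you also work with the dataset in Python, a small check can confirm the splits and fields described above. This is a minimal sketch under stated assumptions: the split names `train`, `validation`, and `test` and the field names `description`, `transcription`, and `medical_specialty` are guesses based on this description, not a confirmed schema, so compare them with what the preview shows. The function accepts any mapping from split name to a list of records; a `datasets.DatasetDict` returned by the Hugging Face `load_dataset` function should also work, since each split supports `len()` and yields rows as dictionaries.

```python
from typing import Any, Mapping

# Assumed names based on the description above; verify against the preview.
EXPECTED_SPLITS = {"train", "validation", "test"}
EXPECTED_FIELDS = {"description", "transcription", "medical_specialty"}


def summarize_splits(splits: Mapping[str, Any]) -> dict:
    """Check that expected splits and fields exist; return {split: row_count}."""
    missing_splits = EXPECTED_SPLITS - set(splits)
    if missing_splits:
        raise ValueError(f"missing splits: {sorted(missing_splits)}")

    summary = {}
    for name, rows in splits.items():
        # Every record must carry all the expected fields.
        for i, row in enumerate(rows):
            missing = EXPECTED_FIELDS - set(row)
            if missing:
                raise ValueError(f"{name}[{i}] is missing {sorted(missing)}")
        summary[name] = len(rows)
    return summary


# Toy records for illustration only.
record = {"description": "d", "transcription": "t", "medical_specialty": "Radiology"}
toy = {"train": [record, record], "validation": [record], "test": [record]}
print(summarize_splits(toy))
```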


  • You’ve successfully created a new project and added it to your favorites.
  • You’ve loaded a comprehensive dataset to train your model.