Batch Inference

The process of running a set of data through an AI model all at once, as opposed to one item at a time.
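To make the contrast concrete, here is a minimal sketch in Python, where a hypothetical `score` function stands in for a model's forward pass:

```python
# Hypothetical stand-in for a model: scores a single input.
def score(text: str) -> int:
    return len(text)

def batch_inference(inputs: list[str]) -> list[int]:
    # The whole batch is handed over in one call, amortizing per-call
    # overhead, instead of invoking the model once per item.
    return [score(text) for text in inputs]

results = batch_inference(["hello", "batch", "inference"])  # [5, 5, 9]
```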



Checkpoint

A checkpoint is a saved version of a model, captured at regular intervals during training. Checkpoints are used to resume training from a specific point, or to evaluate a model at a specific point.
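A minimal sketch of the idea, using a JSON file in place of a real framework's checkpoint format (the field names here are illustrative):

```python
import json
import os
import tempfile

def save_checkpoint(path, step, weights):
    # Persist the training step and parameters so training can resume later.
    with open(path, "w") as f:
        json.dump({"step": step, "weights": weights}, f)

def load_checkpoint(path):
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.gettempdir(), "ckpt_step_100.json")
save_checkpoint(path, step=100, weights=[0.1, -0.2])
ckpt = load_checkpoint(path)  # resume training from ckpt["step"]
```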

Context Understanding

The ability of an LLM to interpret and respond based on the given text input or prompt. It involves understanding the nuances, intent, and subtleties within the language.



Dataset

A dataset is a collection of data that is used to train a machine learning model. Datasets are usually composed of two parts: the input data and the expected output data.


Evaluation Experiment

An Evaluation Experiment is used to evaluate the performance of a model on a test or training dataset.


Few-Shot Prompting

Few-shot prompting shows the model what kind of output you are expecting by providing a few examples (at least two, but typically fewer than five).
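For example, a few-shot sentiment prompt might be assembled like this (the review/label layout is just one possible format):

```python
# Example input/output pairs shown to the model before the real query.
examples = [
    ("The movie was fantastic!", "positive"),
    ("I want my money back.", "negative"),
    ("An instant classic.", "positive"),
]

def build_few_shot_prompt(examples, query):
    # Each example demonstrates the expected output format; the final
    # line leaves "Sentiment:" blank for the model to complete.
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(examples, "Two hours I will never get back.")
```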

Fine-Tuned Model

A fine-tuned model is a model that has been trained on a specific dataset to perform a specific task.

Foundation Model

A foundation model is a large-scale model that has been trained on diverse data. It is versatile and adaptable and provides a good starting point from which specialized abilities can be developed through further training and fine-tuning.


Generative AI

A type of AI that can generate novel content, including text, images, or music, based on learned patterns and data inputs.

Generative AI Chatbot

An artificial intelligence-powered conversational agent that uses generative models to produce human-like responses in text or speech.


Hugging Face

Hugging Face is a leading platform and community for natural language processing (NLP) and AI. They provide a wide range of pre-trained language models, including BERT and GPT, as well as tools and libraries for developers to work with these models. Developers use Hugging Face's Transformers library for fine-tuning models and integrating them into various applications.






Large Language Model (LLM)

Large Language Models (LLMs) are a category of artificial intelligence models designed to process and generate human-like text based on extensive training on vast datasets.


Max New Tokens

The maximum number of tokens to generate, excluding the prompt.

Model Properties

Model Properties in the playground view are used to guide LLM generation to meet the specific needs of your use case.


Natural Language Processing (NLP)

A field of computer science and artificial intelligence focused on enabling computers to understand, interpret, and respond to human language in a meaningful way.


Online Inference

Online inference is when an AI model processes each input as soon as it arrives, rather than waiting to process data in batches.



Prompt

The input text that you feed to a model. It's like telling your AI friend what you want to chat about.

Prompt Engineering

The process of strategically designing and refining prompts to effectively guide an LLM's responses towards a desired outcome or to improve its performance on specific tasks.

Prompt Template

A reusable prompt structure containing placeholders that are filled in at runtime, so the same instructions can be applied to many different inputs.
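A minimal sketch of the idea using Python's built-in `str.format`; the placeholder names are illustrative:

```python
# A prompt template with named placeholders, filled in at run time.
TEMPLATE = (
    "You are a helpful assistant.\n"
    "Summarize the following {document_type} in {num_sentences} sentences:\n\n"
    "{text}"
)

prompt = TEMPLATE.format(
    document_type="news article",
    num_sentences=2,
    text="The city council approved the new transit plan on Tuesday.",
)
```

The same template can then be reused for any document type or length requirement by changing only the placeholder values.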

Retrieval Augmented Generation (RAG)

RAG, or Retrieval-Augmented Generation, combines the capabilities of two types of AI models—a retriever and a generator—to create text that's both informative and accurate. RAG uses the retriever model to find relevant information from a large database, and the generator model then uses this information to construct coherent and contextually appropriate responses.
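A toy sketch of the two stages: the retriever here uses simple word overlap (real systems typically use vector embeddings), and the generator step is a stub that merely builds the augmented prompt an LLM would receive:

```python
# Toy corpus standing in for a large document store.
corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the highest mountain on Earth.",
    "Python is a popular programming language.",
]

def retrieve(query: str, docs: list[str]) -> str:
    # Retriever: pick the document sharing the most words with the query.
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def generate(query: str, context: str) -> str:
    # Generator stub: a real system would send this augmented prompt to an LLM.
    return f"Answer using this context: {context}\nQuestion: {query}"

query = "Where is the Eiffel Tower?"
prompt = generate(query, retrieve(query, corpus))
```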



Session

A session is a single, continuous interaction with the model. It groups together the related prompts, parameters, and outputs produced during that interaction.


Snapshot

A snapshot is a record captured at a point in time. It contains a collection of elements, including the prompt, parameters, and model output. In other words, a snapshot collects the model parameters and other elements that make system generation possible.

System Output

System output is the text generated by the LLM in response to the provided prompt. It is also shaped by the LLM's training data, training settings, and training progress.



Temperature

The degree of randomness in the model's output. A higher temperature produces more diverse and creative output; a lower temperature produces more focused and predictable output. The key is to find a middle ground.
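Mechanically, temperature divides the model's logits before the softmax; a sketch:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Dividing logits by temperature before softmax: T < 1 sharpens the
    # distribution (more deterministic), T > 1 flattens it (more random).
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)  # peaked: top token dominates
hot = softmax_with_temperature(logits, 2.0)   # flatter: more randomness
```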


Token

The smallest unit of data processed by an LLM. In language models, tokens usually represent words, but they can also represent parts of words or punctuation.
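As a rough illustration, a naive tokenizer can be written with a regular expression; real LLM tokenizers use learned subword schemes such as BPE, so their splits differ:

```python
import re

def naive_tokenize(text):
    # Split into runs of word characters and individual punctuation marks.
    # Real tokenizers operate on learned subword units instead.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = naive_tokenize("Tokens aren't always words!")
# ['Tokens', 'aren', "'", 't', 'always', 'words', '!']
```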

Top K

A parameter that gives the AI more (higher top k) or fewer (lower top k) words to choose from, by limiting sampling to the k most likely next tokens.

Top P

A parameter that gives the AI more freedom to choose less obvious words (higher top p) or keeps it to more likely, predictable words (lower top p), by sampling only from the smallest set of tokens whose combined probability reaches p.
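A sketch of both filters over a toy next-token distribution:

```python
def top_k_filter(probs, k):
    # Keep only the k most likely tokens (higher k = more choices).
    cutoff = sorted(probs.values(), reverse=True)[k - 1]
    return {t: p for t, p in probs.items() if p >= cutoff}

def top_p_filter(probs, p):
    # Keep the smallest set of tokens whose cumulative probability reaches p.
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept

probs = {"the": 0.5, "a": 0.3, "banana": 0.15, "quark": 0.05}
top_k_filter(probs, 2)    # keeps "the" and "a"
top_p_filter(probs, 0.9)  # keeps "the", "a", and "banana"
```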

Training Experiment

A Training Experiment is used to train a model. Training experiments are composed of a dataset, a model, and a training configuration.
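Those three components might be described by a configuration like the following; all names and values here are illustrative, not a specific product's schema:

```python
# Hypothetical description of a training experiment's three components.
experiment = {
    "dataset": {
        "path": "reviews.csv",
        "input_column": "text",
        "label_column": "sentiment",
    },
    "model": {"base": "example-base-model"},
    "training_config": {"epochs": 3, "learning_rate": 2e-5, "batch_size": 16},
}
```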




Variant

A Variant is a specific version of a Model. A Model can have multiple Variants, each with different parameters.