Essential Guide to Foundation Models and Large Language Models

15 min read · Feb 6, 2023

The term Foundation Model (FM) was coined by Stanford researchers to introduce a new category of ML models. They defined FMs as models trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks.

The Stanford team took care to note that FMs are NOT "foundational" models: they are not the foundation of AI, and the name does not imply that such models are AGI.

There are 5 key characteristics of Foundation Models:

Pretrained (using large data and massive compute so that it is ready to be used without any additional training)
Generalized (one model for many tasks, unlike traditional AI, which was built for a specific task such as image recognition)
Adaptable (through prompting, that is, the input given to the model, for example as text)
Large (in terms of model size and data size, e.g. GPT-3 has 175B parameters and was trained on about 500 billion words, the equivalent of over 10 human lifetimes of nonstop reading!)
Self-supervised (see footnote 1): no explicit labels are provided, and the model has to learn from the patterns in the data it is given (see the cake illustration below).
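To make the self-supervised idea concrete, here is a minimal sketch of how training examples can be derived from raw text with no human labeling: the "label" for each example is simply the next word in the text. The function name `next_word_pairs` and the `context_size` parameter are hypothetical, chosen for illustration. Real language models work on subword tokens and far larger contexts, so treat this as a toy version of the idea rather than how any particular model is trained.

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, target) pairs.

    No labels are supplied by a human: the target for each example
    is just the word that follows the context in the original text.
    """
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])  # the preceding words
        target = words[i]                           # the word to predict
        pairs.append((context, target))
    return pairs


pairs = next_word_pairs("the model learns patterns from the data itself")
for context, target in pairs:
    print(" ".join(context), "->", target)
```

Each printed line is one training example: the model sees the context and is asked to predict the target. Because the targets come from the text itself, the same recipe scales to any amount of unlabeled data.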


Written by Babar M Bhatti