Essential Guide to Foundation Models and Large Language Models
15 min read · Feb 6, 2023
The term Foundation Model (FM) was coined by Stanford researchers to introduce a new category of ML models. They defined FMs as models trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks.
The Stanford team made a point of noting that FMs are not "foundational" models: the term does not claim they are the foundation of AI itself, and it does not imply that such models are, or will become, AGI.
There are 5 key characteristics of Foundation Models:
- Pretrained (using large amounts of data and massive compute, so the model is ready to use without any additional training)
- Generalized — one model serves many tasks (unlike traditional AI models, which were built for a single task such as image recognition)
- Adaptable (through prompting — the input, such as text, that steers the model toward a task; see the sketch after this list)
- Large (in terms of both model size and data size, e.g. GPT-3 has 175B parameters and was trained on roughly 500 billion words, the equivalent of more than 10 human lifetimes of nonstop reading!)
- Self-supervised (see footnote 1) — no explicit labels are provided; the model has to learn from patterns in the data itself (see the code sketch and the cake illustration below)
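To make "adaptable through prompting" concrete, here is a minimal sketch using the Hugging Face `transformers` library and the small `gpt2` checkpoint (both chosen purely for illustration; the article does not prescribe any particular library or model). The same pretrained model is pointed at different tasks simply by changing the prompt:

```python
# One pretrained model, many tasks, selected purely by the prompt.
# Assumes: pip install transformers torch (an illustrative choice, not from the article).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = [
    "Translate English to French: cheese =>",     # translation
    "Q: Who wrote Pride and Prejudice? A:",       # question answering
    "The sentiment of 'I loved this movie' is",   # classification via completion
]

for prompt in prompts:
    result = generator(prompt, max_new_tokens=15, do_sample=False)
    print(result[0]["generated_text"])
```

A small model like GPT-2 will handle these prompts poorly; the point is only that task selection happens at the input, with no retraining, which is exactly what makes Foundation Models adaptable.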
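The "self-supervised" point can be shown in a few lines: in language modelling, the training targets are simply the next tokens of the raw text, so the data labels itself. The whitespace tokenizer below is a deliberate simplification (real FMs use subword tokenizers), and the example sentence is made up:

```python
# Self-supervised next-token prediction: the "labels" come from the text itself,
# not from human annotation. Whitespace tokenization is a simplification.
text = "foundation models learn from broad unlabeled data"
tokens = text.split()

# Each training pair is (context so far, next token to predict).
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in pairs:
    print(f"context: {' '.join(context):45} -> predict: {target}")
```

Every "label" above is derived from the input sequence itself, which is why Foundation Models can be trained on web-scale text without human annotation.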