The term Foundation Model (FM) was coined by Stanford researchers to introduce a new category of ML models. They defined FMs as models trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks.
The Stanford team took care to note that FMs are not “foundational” models: they are not the foundation of AI, and the term does not imply that such models are AGI.
There are five key characteristics of Foundation Models:
- Pretrained (on large datasets with massive compute, so the model is ready to use without any additional training)
- Generalized: one model for many tasks (unlike traditional AI, which was built for a specific task such as image recognition)
- Adaptable (through prompting, that is, the input given to the model, for example as text)
- Large (in terms of both model size and data size, e.g. GPT-3 has 175B parameters and was trained on a corpus of roughly 500 billion tokens, more text than a person could get through in 10 lifetimes of nonstop reading!)
- Self-supervised (see footnote 1): no explicit labels are provided, and the model has to learn from the patterns in the data it is given. See the cake illustration below.
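To make the self-supervised idea concrete, here is a minimal sketch of how training examples can come from raw text alone, with no human labeling. It is a toy illustration, not how any particular FM is built: the function name `next_word_pairs`, the whitespace word splitting, and the `context_size` parameter are all assumptions made for this example (real models work on subword tokens and far longer contexts).

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, target) training pairs.

    The "label" for each example is just the next word in the text,
    so no human annotation is needed (hypothetical toy setup).
    """
    words = text.split()  # naive whitespace split; real models use subword tokenizers
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])  # the words the model sees
        target = words[i]                           # the word it must predict
        pairs.append((context, target))
    return pairs


pairs = next_word_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

Every sentence in the corpus yields its own supervision signal this way, which is why this kind of training can scale to very large amounts of unlabeled data.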
Examples of FMs include GPT-3 and DALL-E 2, which allow non-developers and ordinary users to perform impressive tasks by providing “prompts.”
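As a rough sketch of what “adapting through prompts” can look like in practice, the snippet below builds different prompts for the same underlying model, one per task. The template wording, the `PROMPT_TEMPLATES` dictionary, and the `build_prompt` helper are hypothetical and purely illustrative; the snippet only constructs the prompt text and deliberately makes no call to any model API.

```python
# Hypothetical prompt templates: one model, many tasks, selected by the prompt alone.
PROMPT_TEMPLATES = {
    "summarize": "Summarize the following text in one sentence:\n{text}",
    "translate": "Translate the following text into French:\n{text}",
    "classify": "Is the sentiment of the following text positive or negative?\n{text}",
}


def build_prompt(task, text):
    """Return the prompt string for a task; raises ValueError for unknown tasks."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    return PROMPT_TEMPLATES[task].format(text=text)


review = "The battery lasts all day and the screen is gorgeous."
for task in PROMPT_TEMPLATES:
    print(build_prompt(task, review))
    print("---")
```

The same input text is routed to three different tasks without retraining anything, which is the sense in which the model is adapted by its input rather than by its weights.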
“Transfer learning is what makes foundation models possible, but scale is what makes them powerful.” 
Modalities of Foundation Models
FMs can handle many types of data and modalities. Once trained, a single Foundation Model can serve a variety of downstream tasks. Chained together in the right sequence of steps, these capabilities can automate complex workflows.
One important point to note: Foundation Models can do more than generate content (text, images, audio, video). They can also be used for prediction and classification (known as discriminative modeling). Here’s a variation of the above view which illustrates…