Essential Guide to Foundation Models and Large Language Models

15 min read · Feb 6, 2023

The term Foundation Model (FM) was coined by Stanford researchers to introduce a new category of ML models. They defined FMs as models trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks.

The Stanford team made a point to note that FMs are NOT foundational models in the sense that they are not the foundation for AI — that is, such models are not implied to be AGI.

There are 5 key characteristics of Foundation Models:

Pretrained (on large datasets with massive compute, so the model is ready to use without any additional training)
Generalized: one model serves many tasks (unlike traditional AI models, which were each built for a specific task such as image recognition)
Adaptable (through prompting, that is, steering the model with an input such as text)
Large (in both model size and data size; for example, GPT-3 has 175B parameters and was trained on about 500 billion words, the equivalent of over 10 human lifetimes of nonstop reading!)
Self-supervised (see footnote 1): no explicit labels are provided, so the model has to learn from patterns in the data itself; see the cake illustration below.
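
To put the "lifetimes of reading" comparison in perspective, here is a rough back-of-the-envelope check. It uses the training-data figure quoted for GPT-3 above; the reading speed (250 words per minute) and the reading span (80 years, with no breaks) are illustrative assumptions rather than numbers from the original source.

```python
# Back-of-the-envelope check of the "lifetimes of reading" comparison.
# The corpus size is the figure quoted in the text; the reading speed and
# reading span are assumptions chosen only for illustration.

CORPUS_WORDS = 500_000_000_000   # about 500 billion words (from the text)
WORDS_PER_MINUTE = 250           # assumed reading speed
READING_YEARS = 80               # assumed span of nonstop reading

MINUTES_PER_YEAR = 60 * 24 * 365


def lifetimes_to_read(corpus_words, words_per_minute, years):
    """Return how many nonstop-reading lifetimes the corpus represents."""
    words_per_lifetime = words_per_minute * MINUTES_PER_YEAR * years
    return corpus_words / words_per_lifetime


if __name__ == "__main__":
    n = lifetimes_to_read(CORPUS_WORDS, WORDS_PER_MINUTE, READING_YEARS)
    print(f"{n:.1f} lifetimes of nonstop reading")
```

Under these assumptions the result comes out at roughly 48 lifetimes, which is consistent with the "over 10 lifetimes" claim. A slower reader or a shorter span only pushes the number higher.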

Examples of FMs include GPT-3 and DALL-E 2, which allow non-developers and ordinary users to perform impressive tasks by providing “prompts.”
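
As a minimal sketch of what "adapting through prompts" can look like in practice, the snippet below builds few-shot prompts for two different tasks from a single template. The `build_prompt` helper, the `Input:`/`Output:` layout, and the example texts are all hypothetical choices made for illustration. No real model or API is called; the resulting strings are simply what one might send to a text-completion model.

```python
# Sketch: steering one pretrained model toward different tasks via prompts.
# `build_prompt` and the task examples are hypothetical, for illustration only.


def build_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, new input."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # left open for the model to complete
    return "\n".join(lines)


# Task 1: classification, expressed purely as text.
sentiment_prompt = build_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [
        ("Great battery life.", "positive"),
        ("Stopped working in a week.", "negative"),
    ],
    "The screen is gorgeous.",
)

# Task 2: translation, using the same template and the same (assumed) model.
translation_prompt = build_prompt(
    "Translate each sentence from English to French.",
    [("Good morning.", "Bonjour.")],
    "Thank you.",
)

if __name__ == "__main__":
    print(sentiment_prompt)
    print("-" * 40)
    print(translation_prompt)
```

The point of the design is that nothing about the model changes between the two tasks. Only the instruction and the worked examples in the prompt differ.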

“Transfer learning is what makes foundation models possible, but scale is what makes them powerful.” [1]

Modalities of Foundation Models

FMs can handle many data types and modalities. Once trained, a single Foundation Model can serve a variety of downstream tasks. Chained together in the right sequence, these capabilities can automate complex workflows.

One important point to note: Foundation Models can do more than generate content (text, images, audio, video); they can also be used for prediction and classification (known as discriminative modeling). Here’s a variation of the above view which illustrates…


Written by Babar M Bhatti