Essential Guide to Foundation Models and Large Language Models

Babar M Bhatti
15 min read · Feb 6

The term Foundation Model (FM) was coined by Stanford researchers to introduce a new category of ML models. They defined FMs as models trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks.

The Stanford team took care to note that FMs are NOT "foundational" models: they are not the foundation of AI, and the term does not imply that such models are AGI.

There are 5 key characteristics of Foundation Models:

  1. Pretrained (using large data and massive compute, so that the model is ready to be used without any additional training)
  2. Generalized: one model for many tasks (unlike traditional AI, which was built for a specific task such as image recognition)
  3. Adaptable (through prompting, that is, the input given to the model, for example as text)
  4. Large (in terms of model size and data size, e.g. GPT-3 has 175B parameters and was trained on about 500 billion words, the equivalent of over 10 lifetimes of humans reading nonstop!)
  5. Self-supervised (see footnote 1): no specific labels are provided, and the model has to learn from the patterns in the data it is given (see the cake illustration below).
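To make the self-supervised idea concrete, here is a minimal sketch of how training examples can be derived from raw text with no human labels: each word serves as the "label" for the words that precede it. The function name `next_word_pairs`, the `context_size` parameter, and the toy sentence are illustrative assumptions. Real models work on sub-word tokens and far larger corpora, so this is not how any particular FM is actually trained.

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, target) pairs for next-word prediction.

    No human-written labels are needed: the target is simply the word
    that follows each context window in the text itself.
    """
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])  # the preceding words
        target = words[i]                           # the word to predict
        pairs.append((context, target))
    return pairs


# Hypothetical toy corpus, for illustration only.
corpus = "foundation models learn patterns from unlabeled text"
pairs = next_word_pairs(corpus)

for context, target in pairs:
    print(" ".join(context), "->", target)
```

Every pair comes from the data itself, which is why this style of training can scale to very large text collections without manual annotation.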

Examples of FMs include GPT-3 and DALL-E 2, which allow non-developers and ordinary users to perform impressive tasks by providing “prompts.”
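As a rough illustration of what a prompt can look like, the sketch below assembles a few-shot prompt as plain text: an instruction, a couple of worked examples, and a new input for the model to complete. The helper `build_prompt`, the "Review:" / "Sentiment:" layout, and the sample reviews are all assumptions made for this example. No model or API is called here; the resulting string is what you would pass to a text model of your choice.

```python
def build_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, new input."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final entry is left unfinished so the model fills in the answer.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical examples, for illustration only.
examples = [
    ("I loved this movie.", "positive"),
    ("The plot was dull.", "negative"),
]
prompt = build_prompt(
    "Classify the sentiment of each review.",
    examples,
    "A wonderful surprise.",
)
print(prompt)
```

Changing the instruction and the examples adapts the same model to a different task, with no retraining involved.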

“Transfer learning is what makes foundation models possible, but scale is what makes them powerful.” [1]

Modalities of Foundation Models

FMs can handle a multitude of data types and modalities. Once trained, a Foundation Model can handle a variety of downstream tasks. Chained together in the right sequence of steps, such capabilities can automate complex workflows.

One important point to note: foundation models can do more than generate content (text, images, audio, video). They can also be used for predictions and classifications (known as discriminative modeling). Here’s a variation of the above view which illustrates…

