Essential Guide to Foundation Models and Large Language Models

Babar M Bhatti
15 min read · Feb 6, 2023

The term Foundation Model (FM) was coined by Stanford researchers to introduce a new category of ML models. They defined FMs as models trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks.

The Stanford team made a point of noting that FMs are not "foundational" models: they are not the foundation of AI itself, and the term does not imply that such models are, or will become, AGI.

There are 5 key characteristics of Foundation Models:

  1. Pretrained (using large data and massive compute so that it is ready to be used without any additional training)
  2. Generalized: one model for many tasks (unlike traditional AI, where a model was built for a specific task such as image recognition)
  3. Adaptable (through prompting, i.e. the text or other input given to the model to steer it toward a task; see the prompting sketch after this list)
  4. Large (in terms of model size and data size, e.g. GPT-3 has 175B parameters and was trained on about 500 billion words, the equivalent of over 10 human lifetimes of nonstop reading!)
  5. Self-supervised (see footnote 1): no explicit labels are provided; the model learns from patterns in the data itself (see the cake illustration below and the training-objective sketch after this list)
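
To make points 1 and 3 concrete, here is a minimal sketch using the open-source Hugging Face transformers library (my choice of library and of the small gpt2 checkpoint; the Stanford definition is library-agnostic). The same frozen, pretrained model is pointed at different tasks purely by changing the prompt, with no additional training. A small model like gpt2 will follow these prompts only roughly; much larger models such as GPT-3 do so far more reliably.

```python
from transformers import pipeline

# Load a small pretrained language model (gpt2 is just an illustrative choice).
generator = pipeline("text-generation", model="gpt2")

# The same frozen model is steered toward different tasks purely by the prompt.
summary = generator(
    "Summarize: Foundation models are trained on broad data and adapted to many tasks.\nSummary:",
    max_new_tokens=30,
)
translation = generator(
    "Translate English to French: cheese =>",
    max_new_tokens=10,
)

print(summary[0]["generated_text"])
print(translation[0]["generated_text"])
```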
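
For point 5, the sketch below (written in PyTorch as an assumption on my part) shows the core idea behind the usual self-supervised language-modeling objective: the "labels" are not human annotations, just the same token sequence shifted by one position, so the model learns to predict each next token from the raw data itself.

```python
import torch
import torch.nn.functional as F

# A toy batch of already-tokenized text (the token IDs are made up for illustration).
tokens = torch.tensor([[12, 87, 3, 45, 9, 61]])

inputs = tokens[:, :-1]   # the model sees every token except the last
targets = tokens[:, 1:]   # the "labels" are the same sequence shifted left by one

# Stand-in for a causal language model: any network that outputs one logit per
# vocabulary entry at every position would go here.
vocab_size = 100
logits = torch.randn(inputs.shape[0], inputs.shape[1], vocab_size)

# Standard next-token cross-entropy loss: no human-provided labels anywhere.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())
```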
