Most engineers have seen this formula.
P(A|B) = P(B|A) × P(A) / P(B)
Almost none can explain what it actually does.
Here's Bayes' Theorem in plain English, and where it's hiding inside systems you use every day.
The core idea in one sentence:
Bayes' Theorem updates your belief about something after seeing new evidence.
That's it. Four terms:
Prior, P(A) → what you believed before the evidence
Likelihood, P(B|A) → how probable the evidence is, given your hypothesis
Evidence, P(B) → how common the evidence is overall
Posterior, P(A|B) → your updated belief after seeing the evidence
A concrete example:
Say 40% of all emails are spam (your prior).
You see a new email containing the word "lottery."
10% of spam emails contain "lottery." Only 1% of legitimate emails do.
Plug into Bayes:
P("lottery") = (0.10 × 0.40) + (0.01 × 0.60) = 0.046
P(spam | "lottery") = (0.10 × 0.40) / 0.046 ≈ 87%
The word "lottery" updated your belief from 40% → 87%.
That's Bayes in action. Prior belief + new evidence = updated belief.
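If you'd rather check the arithmetic in code, here is a minimal Python sketch of the spam example. The function name `posterior` and its parameter names are made up for illustration, and it assumes only two possibilities (spam or not spam).

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | evidence) for a yes/no hypothesis H."""
    # Evidence term: total probability of seeing the evidence at all,
    # summed over both cases (H is true, H is false).
    evidence = (p_evidence_given_h * prior
                + p_evidence_given_not_h * (1 - prior))
    # Bayes' Theorem: likelihood × prior / evidence.
    return p_evidence_given_h * prior / evidence


# Numbers from the example: 40% spam, "lottery" appears in 10% of spam
# and 1% of legitimate email.
p_spam = posterior(prior=0.40,
                   p_evidence_given_h=0.10,
                   p_evidence_given_not_h=0.01)
print(f"P(spam | 'lottery') = {p_spam:.1%}")
```

The denominator is the part most people skip: it is just the two ways the word can show up (in spam, in legitimate mail) added together.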
Where it lives in AI:
1/ Spam filters
The Naive Bayes classifier, the algorithm behind many classic spam filters, applies this same calculation word by word across an entire email. Each word shifts the probability up or down. It's called "naive" because it assumes each word appears independently of the others once you know whether the email is spam. That isn't realistic, but it works remarkably well in practice.
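Here is a rough sketch of that word-by-word update in Python. The word table `WORD_PROBS` and every number in it except the "lottery" row are invented for illustration. A real filter would learn these values from labeled mail and smooth them, and this toy version simply skips words it has never seen.

```python
import math

# Hypothetical table: word -> (P(word | spam), P(word | legitimate)).
WORD_PROBS = {
    "lottery": (0.10, 0.01),
    "winner": (0.08, 0.02),
    "meeting": (0.01, 0.10),
}


def spam_probability(words, prior_spam=0.40):
    """Naive Bayes estimate of P(spam | words) under the independence assumption."""
    # Work in log space so multiplying many small numbers doesn't underflow.
    log_spam = math.log(prior_spam)
    log_ham = math.log(1 - prior_spam)
    for word in words:
        if word not in WORD_PROBS:
            continue  # unknown words are ignored in this sketch
        p_word_spam, p_word_ham = WORD_PROBS[word]
        log_spam += math.log(p_word_spam)  # each word nudges the spam score
        log_ham += math.log(p_word_ham)    # ...and the legitimate score
    # Turn the two log scores back into a probability that sums to 1.
    top = max(log_spam, log_ham)
    e_spam = math.exp(log_spam - top)
    e_ham = math.exp(log_ham - top)
    return e_spam / (e_spam + e_ham)


print(spam_probability(["lottery"]))
print(spam_probability(["lottery", "winner"]))
print(spam_probability(["lottery", "meeting"]))
```

With a single word this reduces to the "lottery" calculation above. Adding a spammy word pushes the probability up, and adding an office word like "meeting" pulls it back down.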
2/ Medical diagnosis AI
A patient has symptom X. What's the probability of disease Y? Bayes combines the base rate (how common the disease is) with the likelihood of seeing that symptom in patients who have it, weighed against how often the symptom shows up in patients who don't. Same formula, different domain.
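To see why the base rate matters so much, here is a small Python sketch that counts patients instead of multiplying probabilities. All three input numbers are hypothetical, chosen only to make the effect visible. They are not real clinical figures.

```python
# Hypothetical inputs, for illustration only.
population = 10_000
prevalence = 0.01            # base rate of disease Y (the prior)
p_symptom_if_sick = 0.90     # likelihood: symptom X given the disease
p_symptom_if_healthy = 0.05  # symptom X in patients without the disease

# Split the population into sick and healthy, then count who shows the symptom.
sick = population * prevalence
healthy = population - sick
sick_with_symptom = sick * p_symptom_if_sick
healthy_with_symptom = healthy * p_symptom_if_healthy

# Posterior: of everyone with the symptom, what fraction is actually sick?
p_disease_given_symptom = sick_with_symptom / (
    sick_with_symptom + healthy_with_symptom
)
print(f"P(disease | symptom) = {p_disease_given_symptom:.1%}")
```

Under these assumptions, about 90 sick patients and 495 healthy patients show the symptom, so the posterior lands near 15% even though the symptom appears in 90% of people with the disease. A rare disease stays fairly unlikely after one piece of evidence.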
3/ Your LLM's uncertainty
Modern language models don't just predict the next token; they assign a probability to every possible token, conditioned on the text so far. The sampling process (temperature, top-p) works directly on that probability distribution. The model isn't applying Bayes' formula explicitly, but every response it generates comes out of the same kind of conditional-probability reasoning.
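For a feel of what temperature and top-p do to those probabilities, here is a simplified Python sketch. It is not how any particular model library implements sampling. The function names and the three-token example scores are assumptions made up for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the most probable tokens until their total reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample_token(logits, temperature=1.0, top_p=0.9, rng=random):
    """Sample one token index from the filtered distribution."""
    filtered = top_p_filter(softmax(logits, temperature), top_p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
print(softmax(logits, temperature=1.0))
print(softmax(logits, temperature=0.5))
print(sample_token(logits, temperature=0.8, top_p=0.9))
```

Lower temperature concentrates probability on the top token, and top-p drops the unlikely tail before sampling. Both are operations on a probability distribution, never on a single "answer."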
The insight most engineers miss:
Bayes doesn't give you certainty. It gives you a rational way to update uncertainty.
That's exactly why it's foundational to AI - real-world systems are never certain. They're always working with incomplete, noisy, probabilistic information.
Loosely speaking, every model that learns from data is doing some version of this:
Start with a belief. See evidence. Update the belief.
That's Bayes. That's machine learning.