Designed by Md Nazmus Sakib, CFA · Made with Claude
Live Simulation · Drag · Flip · Roll
10 Tabs · Coin → Cards → Bayes → Monty Hall
Open Workshop · Free Forever
What is probability?
Let's go slowly. Probability is simpler than it looks. By the end of this workshop, you'll see it clearly — but only if we take it one small step at a time.
Step One
Probability is just counting.
Count what can happen. Count what you want. Divide. That's it.
Step Two — Watch this coin
HEADS
TAILS
Two possible outcomes. Each equally likely. The probability of heads is one out of two.
Step Three — Three ways to say the same thing
½
FRACTION
0.5
DECIMAL
50%
PERCENT
All three mean the same thing. Use whichever feels easiest.
Step Four — The boundaries
Every probability lives somewhere on this line — between 0 (won't happen) and 1 (will definitely happen). Nothing higher, nothing lower.
Step Five — Our three tools
The Coin (H / T) · 2 outcomes
The Die · 6 outcomes
The Deck (A♥) · 52 outcomes
Every concept in this workshop will be shown with these three. Familiar objects. Honest math.
▸ One promise before we start
Probability is the most counter-intuitive branch of math. Even experts get it wrong. Your gut feelings will sometimes mislead you — and that's fine. We'll move slowly, one small idea at a time. By Tab VIII, you'll be solving problems that have fooled mathematicians for decades.
Ready when you are
Up next: a single coin. Flip it 100 times. Watch the math come alive.
Single Events
One coin. One roll. One card. Let's start with the simplest possible question — and earn our way up from there.
The one formula you need
P(event) = favorable ÷ total
Count the outcomes you want. Count all possible outcomes. Divide.
Try it with a coin
A coin has 2 possible outcomes. 1 is heads. So P(heads) = 1/2.
Now Let's Flip 100 Times
HEADS
TAILS
Total Flips: 0 · Heads: 0 (0%) · Tails: 0 (0%)
▸ What just happened?
Flip 10 times — you might get 7 heads and 3 tails. Surprising? Don't worry. Flip 100 times — you'll land closer to 50/50, maybe 54/46. Probability is a long-run statement, not a guarantee about the next flip. The fewer flips, the more the result can wander. That's normal.
Now try a die
6 outcomes. Each equally likely.
So P(rolling a 6) = 1 ÷ 6 ≈ 16.7%.
Last roll: 5
Roll History · 0 rolls
Let's count a few die questions together
P(rolling a 6): 1 favorable face, 6 total → 1/6 ≈ 16.7%
P(rolling an even number): faces 2, 4, 6 work → 3 favorable, 6 total → 3/6 = 1/2 = 50%
P(rolling 4 or higher): faces 4, 5, 6 work → 3/6 = 1/2
P(rolling a 7): no face has 7 → 0/6 = 0 (impossible)
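The four die questions above all follow the same counting recipe. Here is a minimal Python sketch of it; the helper name `probability` is our own choice for illustration, not something from the workshop.

```python
from fractions import Fraction

def probability(outcomes, is_favorable):
    """Favorable outcomes divided by total outcomes (all equally likely)."""
    favorable = [o for o in outcomes if is_favorable(o)]
    return Fraction(len(favorable), len(outcomes))

die = [1, 2, 3, 4, 5, 6]

p_six = probability(die, lambda face: face == 6)         # 1 favorable face
p_even = probability(die, lambda face: face % 2 == 0)    # faces 2, 4, 6
p_four_plus = probability(die, lambda face: face >= 4)   # faces 4, 5, 6
p_seven = probability(die, lambda face: face == 7)       # no face works

print(p_six, p_even, p_four_plus, p_seven)
```

Using `Fraction` keeps the answers exact, so 3/6 shows up as 1/2 rather than 0.5.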
Now a deck of cards
52 cards. 4 suits × 13 ranks.
Hearts ♥ and Diamonds ♦ are red. Spades ♠ and Clubs ♣ are black.
Draw a Card
Four card probabilities worth knowing
P(any specific card) = 1/52 ≈ 1.9%
P(an Ace) — there are 4 aces → 4/52 = 1/13 ≈ 7.7%
P(a Heart) — there are 13 hearts → 13/52 = 1/4 = 25%
P(a Face card — J, Q, K) — 3 face cards × 4 suits → 12/52 ≈ 23.1%
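If you'd rather let a computer do the counting, one possible sketch is to build the 52-card deck and count matches. The names `RANKS`, `SUITS` and `card_probability` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["♠", "♥", "♦", "♣"]
deck = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def card_probability(is_favorable):
    """Count the matching cards and divide by the deck size."""
    favorable = sum(1 for card in deck if is_favorable(card))
    return Fraction(favorable, len(deck))

p_specific = card_probability(lambda c: c == ("A", "♠"))      # one exact card
p_ace = card_probability(lambda c: c[0] == "A")               # 4 aces
p_heart = card_probability(lambda c: c[1] == "♥")             # 13 hearts
p_face = card_probability(lambda c: c[0] in {"J", "Q", "K"})  # 12 face cards

print(p_specific, p_ace, p_heart, p_face)
```

Note that 12/52 reduces to 3/13, which is the same 23.1%.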
Quick check
Two questions. Take your time. The answers will appear after you tap.
If you roll a die, what's the probability of rolling a number less than 3?
1/3 is right. "Less than 3" means rolling a 1 or 2 — that's 2 favorable outcomes out of 6 possible. 2/6 = 1/3 ≈ 33.3%.
Drawing one card from a deck — what's the probability it's a black face card?
6/52 is correct. Black suits are spades (♠) and clubs (♣). Face cards are J, Q, K. So there are 3 face cards × 2 black suits = 6 black face cards out of 52.
Take a breath
You've now got the foundation. Next: what happens when you want two things at once?
OR — When You Want One Thing Or Another
Sometimes you don't care exactly what happens — you just need any of a few things to work out. That's an OR question. Let's start with the easiest version.
A simple question
Roll a die. What's the probability of getting a 5 or a 6?
Just count it out
2 faces (the 5 and the 6) out of 6 total. So P = 2/6 = 1/3.
Can both happen at once? If no → just add the probabilities. If yes → add them, then subtract the overlap. That's the whole rule, in two short questions.
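The two-question rule can be written as a single line: P(A or B) = P(A) + P(B) − P(A and B), where the overlap is simply zero when the events can't both happen. A small sketch with die events as Python sets (the function names are ours):

```python
from fractions import Fraction

DIE = {1, 2, 3, 4, 5, 6}

def p(event):
    """Probability of a set of die faces."""
    return Fraction(len(event & DIE), len(DIE))

def p_or(a, b):
    """Add the two probabilities, then subtract the overlap (zero if disjoint)."""
    return p(a) + p(b) - p(a & b)

five, six = {5}, {6}                     # cannot both happen
even, four_plus = {2, 4, 6}, {4, 5, 6}   # overlap on faces 4 and 6

no_overlap = p_or(five, six)
with_overlap = p_or(even, four_plus)

print(no_overlap, with_overlap)
```

The second answer matches a direct count: faces 2, 4, 5, 6 work, so 4 out of 6.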
Try it yourself
Pick two events and the simulator will work out whether they overlap, and apply the right formula.
Event A
Event B
Quick check
Two questions to make sure it stuck.
From one card draw — probability of drawing a King OR a Queen?
8/52. A card can't be both a King and a Queen — these don't overlap. So just add: 4 Kings + 4 Queens = 8 favorable out of 52.
Probability of drawing a Heart OR an Ace?
16/52. These overlap — the Ace of Hearts is both. So: 13 Hearts + 4 Aces − 1 (overlap) = 16.
Take a breath
You've got OR. Up next: what if you want two things to both happen?
AND — When You Want Both Things To Happen
Sometimes one thing isn't enough — you want both events to work out. The math for this is just as simple as OR, but it goes the other direction: instead of adding, you multiply.
A simple question
Flip a coin twice. What's the probability of getting two heads?
Let's list every possibility
ALL 4 POSSIBLE OUTCOMES — EACH EQUALLY LIKELY
H H ✓ WIN
H T
T H
T T
1 out of 4 outcomes is HH. So P(two heads) = 1/4 = 25%.
Notice the pattern
P(H) = 1/2. P(H) = 1/2. And 1/2 × 1/2 = 1/4.
When two events don't affect each other — they're independent — you multiply their probabilities.
The AND rule
P(A and B) = P(A) × P(B)
Works when A and B are independent — neither one changes the other.
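To see that multiplying and counting agree, here is a short sketch that lists every two-flip outcome and compares both routes. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Every ordered result of two flips: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# Route 1: count the winning outcome among all outcomes
wins = sum(1 for o in outcomes if o == ("H", "H"))
p_by_counting = Fraction(wins, len(outcomes))

# Route 2: the AND rule for independent events
p_by_multiplying = Fraction(1, 2) * Fraction(1, 2)

print(p_by_counting, p_by_multiplying)
```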
What "independent" means
Two events are independent when one happening tells you nothing about whether the other will happen. Coin flips are independent — flipping one heads doesn't change the next flip's probability. Rolling a die and drawing a card are independent. Two consecutive lottery draws are independent. Most "different objects, different actions" pairs are independent.
Let's stretch it
Flip a coin n times. What's P(all heads)?
Each flip is independent. Just multiply 1/2 by itself, n times.
P(All Heads): 12.5% · Shrinks fast as n grows
Total Outcomes: 8 · = 2 × 2 × 2 × ... (n times)
P(At Least One Tails): 87.5% · = 1 − P(All Heads)
▸ A useful trick
Look at the third stat. P(at least one tails) = 1 − P(no tails) = 1 − P(all heads). When "at least one" is hard to count, flip the question: ask "what's the chance of none?" instead, then subtract from 1. This is called the complement trick, and it'll save you hours of calculation in harder problems.
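The complement trick is easy to check by brute force. This sketch computes "at least one tails" both ways for small n; the function names are our own.

```python
from fractions import Fraction
from itertools import product

def p_all_heads(n):
    return Fraction(1, 2) ** n

def p_at_least_one_tails(n):
    """Complement trick: 1 minus the chance of no tails at all."""
    return 1 - p_all_heads(n)

def p_at_least_one_tails_by_counting(n):
    """Slow route: list all 2**n sequences and count those containing a T."""
    sequences = list(product("HT", repeat=n))
    hits = sum(1 for s in sequences if "T" in s)
    return Fraction(hits, len(sequences))

print(p_at_least_one_tails(3), p_at_least_one_tails_by_counting(3))
```

For n = 3 both routes give 7/8, the 87.5% shown in the stats. The counting route has to visit 2ⁿ sequences, which is exactly the work the trick lets you skip.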
One more — mixing different objects
Flip a coin AND roll a die AND draw a card. What's P(Heads and a 6 and an Ace)?
Each event was reasonable on its own: 50%, 16.7%, 7.7%. Combined, less than 1%. This is why predicting compound futures is so hard — each step multiplies the chances down rapidly. A weather forecast that needs three independent conditions to all line up, each at 70%, is really 0.7 × 0.7 × 0.7 ≈ 34%. Always weaker than you'd think.
Quick check
A coin flipped 4 times — what's the probability of getting HHHH?
1/16. Each flip is independent at 1/2. Multiply four times: 1/2 × 1/2 × 1/2 × 1/2 = 1/16.
Roll two dice. What's the probability that both show a 6?
1/36. Each die is independent. 1/6 × 1/6 = 1/36. Or: 6 × 6 = 36 possible outcomes when rolling two dice, only one of which is (6,6).
Ready for something subtler
Up next: what happens when one event does affect another? When information changes what we should believe?
When Information Changes The Picture
Up to now, all our probabilities have been about events seen fresh, with no prior knowledge. But what happens when you learn something partway through? That's where probability becomes truly useful.
A story with marbles
A jar holds 10 marbles. Six red. Four blue.
You reach in without looking, take one out, set it aside. Then take another. The second pick is the interesting one.
10 marbles · 6 red · 4 blue
Notice what just happened
The probability of the second pick depended on the first.
Two different worlds for the second pick
If the first pick was Red: 5 red + 4 blue remain. P(Red on second) = 5/9 ≈ 55.6%
If the first pick was Blue: 6 red + 3 blue remain. P(Red on second) = 6/9 ≈ 66.7%
Same jar. Same second action. But the probability is different depending on what we learned from the first pick. That's conditional probability.
The notation
P(A | B) means "probability of A, given B."
The vertical bar is read as "given that". P(Red on 2nd | Red on 1st) = 5/9.
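The two "worlds" can be checked by listing every ordered pair of picks from the jar and keeping only the pairs where the first pick matches. A minimal sketch (names are illustrative):

```python
from fractions import Fraction
from itertools import permutations

jar = ["R"] * 6 + ["B"] * 4
draws = list(permutations(jar, 2))  # 10 * 9 = 90 ordered (first, second) picks

def p_second_red_given_first(first_color):
    """Shrink to the world where the first pick is known, then count reds."""
    world = [d for d in draws if d[0] == first_color]
    hits = [d for d in world if d[1] == "R"]
    return Fraction(len(hits), len(world))

print(p_second_red_given_first("R"), p_second_red_given_first("B"))
```

The second result prints as 2/3, which is 6/9 in lowest terms.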
A new way to think about it
When you learn that B happened, you've entered a smaller world. You're no longer asking "out of all possible outcomes..." — you're asking "out of the outcomes where B is true, what fraction also have A?" The denominator changes.
A worked example with cards
You drew a card. A friend peeks and says: "It's a face card."
Given that info, what's the chance it's a King?
Count inside the smaller world
Normally, P(King) = 4/52 = 1/13. But now we know it's a face card. That means it's one of these 12 cards:
J, Q, K of ♠ · J, Q, K of ♥ · J, Q, K of ♦ · J, Q, K of ♣
Of those 12 face cards, 4 are Kings. So:
P(King | Face) = 4/12 = 1/3 ≈ 33.3%
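That count can be double-checked in two ways: directly inside the smaller world of face cards, and as a ratio of two whole-deck probabilities. A sketch, with set names chosen for illustration:

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = set(product(RANKS, "♠♥♦♣"))

kings = {c for c in deck if c[0] == "K"}
faces = {c for c in deck if c[0] in {"J", "Q", "K"}}

def p(event):
    """Probability of an event measured against the full deck."""
    return Fraction(len(event), len(deck))

# Route 1: count Kings inside the 12-card world of face cards
p_direct = Fraction(len(kings & faces), len(faces))

# Route 2: whole-deck ratio, P(King and Face) divided by P(Face)
p_ratio = p(kings & faces) / p(faces)

print(p_direct, p_ratio)
```

Both routes land on 1/3 because the 52 in each whole-deck probability cancels out.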
The formula
P(A | B) = P(A and B) ÷ P(B)
"Of all the times B happens, what fraction also have A?"
A tool for these problems — the tree diagram
Draw the branches. Multiply along each path. Every path's probability tells the whole sequential story.
How to read this tree
You're drawing two cards without replacement. The first branch splits into Heart (1/4) or Not-Heart (3/4). Each of those splits again — but the second-pick odds depend on the first. After a Heart, only 12 Hearts remain in 51 cards (12/51). After a Not-Heart, all 13 Hearts still remain in 51 cards (13/51). Multiply along any path to get that outcome's probability. All four end-probabilities sum to 100%.
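The tree described above can be written as two small lookup tables, one per level. This is only a sketch; "H" and "N" are our shorthand for Heart and Not-Heart.

```python
from fractions import Fraction

# First branch: Heart or Not-Heart from a full deck
first = {"H": Fraction(13, 52), "N": Fraction(39, 52)}

# Second branch: the odds depend on what the first card was (51 cards left)
second = {
    "H": {"H": Fraction(12, 51), "N": Fraction(39, 51)},
    "N": {"H": Fraction(13, 51), "N": Fraction(38, 51)},
}

# Multiply along each path of the tree
paths = {a + b: first[a] * second[a][b] for a in first for b in second[a]}

for name, prob in paths.items():
    print(name, prob, f"{float(prob):.1%}")
print("total:", sum(paths.values()))
```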
▸ Why this matters
Conditional probability is the bridge from counting to reasoning. Every Bayesian inference (next tab), every diagnostic test, every "given X, what's the chance of Y" question rests on this single idea: information changes the picture. Master this tab and Bayes will click easily.
Quick check
A die was rolled. Someone tells you the result was even. Given that, what's the probability it was a 6?
1/3. Given the result was even, there are only 3 possibilities: 2, 4, or 6. The 6 is one of them. P(6 | even) = 1/3.
Draw two cards without replacement. Probability that both are Aces?
4/52 × 3/51. Without replacement, the second probability changes. After drawing one Ace, 3 Aces remain in 51 cards.
You're ready
Next: the most famous use of conditional probability. The one that fooled mathematicians for centuries. Bayes' theorem.
Bayes — When Evidence Updates Belief
A medical test with 99% accuracy comes back positive. What's the chance you actually have the disease? Probably not what you think. Slow down here. This tab rewires intuition.
A small story to begin
A rare disease affects 1 in 1,000 people.
The test for it is 99% accurate. You take it. It comes back positive.
Pause and guess
What's the chance you actually have the disease?
Most people guess 95% or 99%. The real answer will surprise you. Keep going.
Let's count instead
Forget percentages for a moment. Imagine 10,000 actual people, and let's count what would happen.
1 · Picture 10,000 people taking the test
Each square = 1 person
True Positive
False Positive
Now the answer
Of people who test positive, 9.0% actually have the disease.
Of people who test positive, 91.0% are false alarms.
A positive result on a 99%-accurate test for a rare disease means you still probably don't have it.
Why?
Because the disease is rare. Even a small false-positive rate (1%), applied to thousands of healthy people, produces more false alarms than there are real cases for the test to catch.
2 · The story in plain numbers
The formula (only after the intuition)
Now that the idea makes sense, here's the official name for what we just did. It's called Bayes' Theorem.
Bayes' theorem solves a problem that comes up everywhere: you have some evidence, what should you now believe? Doctors face it with test results. Courts face it with DNA evidence. Spam filters face it with new emails. Self-driving cars face it with sensor readings. The lesson, repeated: base rates matter enormously. Rare things stay rare even when evidence points to them.
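In symbols, the counting we did is P(disease | positive) = P(positive | disease) × P(disease) ÷ P(positive). A minimal sketch follows. It assumes "99% accurate" means the test catches 99% of real cases and clears 99% of healthy people, which is the reading that reproduces the 9.0% above. The parameter names `sensitivity` and `specificity` are the usual medical terms, not the workshop's.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem.

    prior:        share of people who have the disease
    sensitivity:  P(positive | disease)
    specificity:  P(negative | no disease)
    """
    true_positives = prior * sensitivity
    false_positives = (1 - prior) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

rare = posterior(prior=0.001, sensitivity=0.99, specificity=0.99)
common = posterior(prior=0.10, sensitivity=0.99, specificity=0.99)

print(f"1 in 1,000 prevalence: {rare:.1%}")
print(f"1 in 10 prevalence:    {common:.1%}")
```

Same test, different base rate, very different answer. That is the base-rate lesson in two lines of output.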
Quick check
Two questions to lock it in.
A test is 95% accurate. The condition affects 1% of the population. You test positive. The probability you have it is closest to:
About 16%. Of 10,000 people: 100 have it (95 test positive). 9,900 don't have it (495 false positives). Total positives: 590. True positives: 95. 95/590 ≈ 16%. Most positives are false.
What change shifts the Bayesian conclusion the most?
Base rate. A 100× increase in prevalence (0.1% → 10%) shifts the conclusion enormously. This is why screening rare conditions in healthy populations often produces more false alarms than true cases — and why doctors are skeptical of positives in low-risk patients.
Almost there
You've just learned the formula that powers modern AI, medicine, and law. Next: probability for many outcomes at once.
When You Want To See All Outcomes At Once
So far, we've asked about specific events: "what's the chance of X?" Now we'll ask something bigger: "what's the chance of every possible result?" That gives us a distribution.
A natural question
Flip a coin 4 times. How many heads will you get?
Could be 0. Could be 1, 2, 3, or 4. Some are more likely than others.
Let's count carefully
There are 16 possible sequences of 4 flips.
Each is equally likely. But several sequences give the same total of heads. Let's group them.
Heads   Ways     Probability
0       1 way    6.25%
1       4 ways   25%
2       6 ways   37.5%
3       4 ways   25%
4       1 way    6.25%
2 heads is the most likely (37.5%) because more sequences add up to that — like HHTT, HTHT, HTTH, THHT, THTH, TTHH.
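The grouping above can be reproduced by listing all 16 sequences and tallying the heads in each. A short sketch (variable names are ours):

```python
from collections import Counter
from itertools import product

sequences = list(product("HT", repeat=4))          # all 16 equally likely sequences
ways = Counter(seq.count("H") for seq in sequences)  # heads count -> number of sequences

for heads in range(5):
    share = ways[heads] / len(sequences)
    print(f"{heads} heads: {ways[heads]} ways, {share:.2%}")
```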
That's a distribution
A distribution just lists every possible outcome and how likely each is.
When the experiment is repeated independent yes/no trials (like coin flips), the distribution has a special name: the binomial distribution.
Play with it
Try changing the number of flips and the coin's bias. Watch how the shape moves.
P(getting exactly k heads in n flips) · P(5) = 24.6%
Most Likely: 5 heads · The peak of the distribution
Average (Mean): 5.0 · = n × p · long-run average
Spread (Std Dev): 1.58 · = √(n·p·(1−p))
▸ What to notice
Crank n up to 30. Notice how the shape becomes a bell. Drag p away from 0.5 — the peak shifts left or right. Distributions are more useful than single probabilities because they tell you not just the average outcome but how much it varies. A model that gives only "the average" is hiding most of the truth.
Asking real questions
"What's the chance of at least 7 heads?"
For ranges, we add up the relevant bars.
The formula (after the picture)
Now that you've seen the shape, here's the formula behind it.
Binomial Probability
P(exactly k heads in n flips) = C(n,k) × p^k × (1−p)^(n−k)
C(n,k) counts how many ways you can arrange k heads among n flips. p^k handles the heads. (1−p)^(n−k) handles the tails.
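Here is the formula as a few lines of Python, using the standard library's `math.comb` for C(n,k). The function name `binomial_pmf` is our own. With n = 10 and p = 0.5 it should reproduce the numbers from the interactive chart.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

most_likely = max(range(n + 1), key=lambda k: pmf[k])  # peak of the distribution
mean = n * p
std_dev = sqrt(n * p * (1 - p))

print(f"P(5) = {pmf[5]:.1%}, peak at {most_likely}, mean {mean}, std dev {std_dev:.2f}")
```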
A real-world reading
A bank approves 70% of loan applications. They get 10 applications today. What's the chance they approve at least 8?
This is binomial with n = 10, p = 0.70, k ≥ 8. Add up P(8) + P(9) + P(10) — roughly 38%.
Even with a 70% approval rate, getting all 10 approved is unusual (0.7¹⁰ ≈ 2.8%). The shape of the distribution matters more than the average rate.
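"Add up the relevant bars" for the loan example looks like this in code. It is a sketch that treats each application as an independent 70% trial, as the example does; `prob_at_least` is a name we made up.

```python
from math import comb

def prob_at_least(k_min, n, p):
    """Sum the binomial bars from k_min up to n."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

p_at_least_8 = prob_at_least(8, n=10, p=0.70)   # P(8) + P(9) + P(10)
p_all_10 = prob_at_least(10, n=10, p=0.70)      # just the last bar

print(f"at least 8 approved: {p_at_least_8:.1%}")
print(f"all 10 approved:     {p_all_10:.1%}")
```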
Distributions have shape — and sometimes they lean
Not every distribution is symmetric.
Real-world data is rarely a clean bell. Most distributions lean one way or the other. That lean has a name: skewness.
Three shapes to recognize
LEFT-SKEWED: Long tail on the left. Mean < Median.
SYMMETRIC: No lean. Mean = Median. The bell.
RIGHT-SKEWED: Long tail on the right. Mean > Median.
Legend: median · mean
A simple way to remember
The tail tells you the direction.
If the tail stretches to the right, it's right-skewed (also called positive skew). If the tail stretches left, it's left-skewed (negative skew). The peak is on the opposite side.
Why mean and median pull apart
In a symmetric distribution, mean and median sit in the same place — right in the middle. But in a skewed distribution, the extreme values in the long tail pull the mean toward them. The median, being just the middle ranked value, stays closer to the bulk. So in right-skewed data, mean > median. In left-skewed data, mean < median. Memorize this — it's how you spot skew without seeing the chart.
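A tiny made-up dataset shows the pull. The five-number lists below are invented purely for illustration: move one value far out into a tail and watch which summary follows it.

```python
from statistics import mean, median

symmetric = [40, 45, 50, 55, 60]
right_skewed = [40, 45, 50, 55, 160]   # one extreme value on the high side
left_skewed = [-60, 45, 50, 55, 60]    # one extreme value on the low side

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(f"{name:>12}: mean = {mean(data)}, median = {median(data)}")
```

The median stays at 50 in all three lists. Only the mean chases the tail.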
Play with it
Drag the slider. Pull the tail.
Watch what happens to the shape — and especially to the mean (red) versus the median (black). One chases the tail. The other holds its ground.
← LEFT-SKEWED · SYMMETRIC · RIGHT-SKEWED →
SHAPE: Symmetric · MEAN: 50.0 · MEDIAN: 50.0 · GAP: 0.0
Mean and median are equal. This is what a perfectly symmetric distribution looks like — like a bell curve, or coin flips.
JUMP TO A REAL EXAMPLE
▸ What you should notice
Start with the slider at zero — mean and median are right on top of each other. Now drag it to the right. The mean (red line) slides toward the tail. The median (black line) barely moves. Now drag it back to the left — the mean swings the other way. The further the slider goes, the bigger the gap between mean and median. This is exactly what skewness is — quantified asymmetry between the bulk of the data and its extremes.
Where skewness shows up — example 1
Long-term stock returns
Here's something counterintuitive: most individual stocks lose money over their lifetime. The market as a whole still wins — because a few stocks win enormously. This is positive skewness in its starkest form.
The Bessembinder study (2018)
Hendrik Bessembinder of Arizona State University looked at every single U.S. stock traded between 1926 and 2016 — about 26,000 stocks in total. He asked one question: did each stock's lifetime return beat one-month Treasury bills (essentially the risk-free rate)? The result was startling.
Lifetime Returns Of ~26,000 US Stocks (1926–2016)
Hover any bar to inspect it · Try the three views below
HOVER A BAR FOR DETAIL
Default view: ~26,000 US stocks bucketed by their lifetime compound return. The vertical dashed line marks the T-bill benchmark. Bars left of the line are stocks that lost to T-bills. Bars right are winners.
The numbers that changed how people think about stocks
Bessembinder's findings, in plain numbers:
57.4% of all stocks (4 out of every 7) had lifetime returns worse than 1-month Treasury bills. The median stock was a money-loser.
Just 4.3% of stocks — about 1,092 companies out of 26,000 — accounted for the entire net wealth gain of the U.S. stock market over T-bills: roughly $35 trillion.
The remaining ~24,900 stocks combined contributed approximately zero excess return over T-bills.
The skewness coefficient of the lifetime return distribution: 154.8 — an extraordinarily high number. (A normal distribution has skewness of 0.)
▸ Why this matters — the case for indexing
If you pick a stock at random, you have a 57% chance of underperforming T-bills. Holding only a few stocks leaves you badly exposed to that risk. The market's celebrated long-run returns aren't from typical stocks. They come from a tiny handful of mega-winners (Apple, Microsoft, Amazon, and the like) whose extreme positive returns drag the average up. Miss those few, and you'd underperform.
This is the strongest statistical case for index investing ever made. An index fund guarantees you own the small subset of stocks that drive market gains. Stock pickers, by contrast, must correctly identify the rare winners — a task at which even professionals routinely fail.
Bessembinder's own words
From a 2024 interview: "The essence of a positively skewed distribution is that most of the individual outcomes are lower than the average outcome. The median outcome is less than the average outcome. The only way that can be true is that you've got a few, relatively few, outcomes that are much bigger than the average. So in some sense, this paper is really about skewness in compound returns."
Play it for real
Step into a stock picker's shoes.
Below is a simulated bag of 1,000 stocks whose proportions and shape are calibrated to mimic the Bessembinder distribution. Pick a few. See what happens.
The Bag · 1,000 Stocks
574 Losers (below T-bills)
383 Modest (small wins over T-bills)
40 Winners (big wins)
3 Mega-Winners (life-changing)
⌬ How this works
This is a random generator, not real stock data. Each "stock" in the bag is a synthetic number with a randomly drawn return, and the tickers (CMA, REZ, BNX, etc.) are made up.
What is faithful to Bessembinder (2018) is the shape and proportions: 57.4% lose to T-bills, 38.3% post modest wins, 4% are winners, and 0.3% are mega-winners whose extreme returns drag the average up. The point of the simulator isn't to mimic specific companies — it's to let you feel what positive skewness does to a small portfolio.
How many stocks will you pick?
Click a button above to pick stocks at random from the bag. Your portfolio will appear here.
Your Portfolio History
Total portfolios tried: 0
Beat T-bills: 0 · Lost to T-bills: 0
▸ Try this experiment
Click "Pick 1 Stock" ten times. Most of the time you'll land on a loser — about 57% of attempts. Now click "Pick 5" a few times. Still risky, but better. Now click "Pick All 1,000" — that's an index fund. You captured the mega-winners by force, simply by owning everything.
This is the Bessembinder argument made tactile. You can pick stocks and hope to find the Apple. Or you can own everything and guarantee you own the Apple. Math says the second strategy wins.
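How likely is a small portfolio to hold one of the bag's 3 mega-winners? With the bag counts above, that is a counting question about drawing without replacement. The sketch below uses only those counts and no return data; the function name is hypothetical.

```python
from math import comb

BAG = {"losers": 574, "modest": 383, "winners": 40, "mega": 3}
TOTAL = sum(BAG.values())  # 1,000 stocks

def p_holds_mega_winner(k):
    """Chance that k stocks picked at random include at least one mega-winner.

    Complement trick: 1 minus the chance that all k picks come from
    the 997 stocks that are not mega-winners.
    """
    non_mega = TOTAL - BAG["mega"]
    return 1 - comb(non_mega, k) / comb(TOTAL, k)

for k in (1, 5, 50, 1000):
    print(f"pick {k:>4}: {p_holds_mega_winner(k):.1%}")
```

Picking all 1,000 gives exactly 100%, which is the index-fund point: owning everything is the only pick size that makes holding the mega-winners certain.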
Where skewness shows up — example 2
In biology and human populations
Many natural and social distributions are right-skewed. The bulk clusters near a normal value — but a long tail of outliers stretches out.
Right-Skewed Distributions In Real Life — Stylized
Three quick examples
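Human lifespan. Most people live 70–85 years. A few reach over 100. Nobody on record has reached 130; the oldest verified age is 122. That natural ceiling caps the right tail, so age at death in modern populations actually leans the other way: it is left-skewed, with the long tail made up of early deaths. It is the exception in this list.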
Body weight, plant height, organ size. Biological measurements typically cluster around a typical value with a longer tail on the upper end. Growth processes (multiplicative, not additive) naturally produce right-skewed shapes.
Income. The most famously skewed of all. The median household income is far below the mean, because billionaires pull the average up while doing nothing to the middle. This is why statistical agencies usually headline median income rather than the mean: the mean would overstate what a typical household earns.
A surprising universal pattern
Why do so many real-world distributions skew right rather than left? Often because the underlying process is multiplicative rather than additive. Wealth compounds. Bacteria multiply. Cities grow proportionally. When small things tend to grow by a percentage (not a fixed amount), the result is a log-normal-ish distribution — symmetric only when you look at the log of the values. Stock prices behave this way too. Stock returns are nearly symmetric in log terms — but stock price changes in dollars are right-skewed.
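You can watch multiplication create skew. In this sketch every value starts at 100 and is repeatedly multiplied by a random growth factor. The range 0.8 to 1.25 and the 50 steps are arbitrary assumptions chosen only to make the effect visible.

```python
import random
from statistics import mean, median

random.seed(0)  # fixed seed so the run is repeatable

def grow(start=100.0, steps=50):
    """Apply a random percentage change at every step (multiplicative growth)."""
    value = start
    for _ in range(steps):
        value *= random.uniform(0.8, 1.25)  # hypothetical shock: -20% to +25%
    return value

values = [grow() for _ in range(10_000)]

print(f"mean   = {mean(values):.0f}")
print(f"median = {median(values):.0f}")
print(f"max    = {max(values):.0f}")
```

The mean should land well above the median, the signature of a right-skewed result. A handful of lucky runs compound into very large values and pull the average up.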
Why skewness matters
A bell curve is the exception, not the rule.
Most real-world data is skewed. Treating skewed data as if it were symmetric leads to wrong averages, mispriced risk, and bad decisions. Always check the shape.
▸ Three practical habits
1. Always plot a histogram before summarizing. The mean alone hides skew.
2. Report median for skewed data. Income, response times, lifetimes, stock losses — these all need medians, not means.
3. Watch the tails. In finance, biology, and operations, the long tail is where the rare-but-huge events live. Standard deviation underweights them. Tools like Value-at-Risk and CVaR exist because of skewness.
Quick check
In 4 coin flips, the most likely number of heads is:
2 heads. For a fair coin, n × p = 4 × 0.5 = 2. The probabilities go 6.25%, 25%, 37.5%, 25%, 6.25% for k = 0, 1, 2, 3, 4. More ways to get 2 heads than any other count.
Household income data: the mean is $85,000 and the median is $52,000. What does this tell you?
Right-skewed. Mean > Median means the long tail is on the right — high-income outliers pull the average up while leaving the median (the middle person) unchanged. Classic income shape.
In Bessembinder's study of 26,000 U.S. stocks, lifetime returns were positively skewed. What does that imply?
A few extreme winners pull the mean above the median. 57.4% of stocks underperformed Treasury bills over their lifetimes. The median stock lost to T-bills. But the mean was positive — because just 4.3% of stocks (1,092 companies out of 26,000) generated the entire $35 trillion in net wealth above T-bills. The market's celebrated returns come from a tiny tail of mega-winners.
A natural next question
You know probabilities. You know distributions. So how should you actually use them to make decisions? That's expected value.
Expected Value — How To Use Probability
Probability is interesting. Decisions are useful. Expected value is the bridge — the simplest, most powerful tool for asking: "should I take this bet?"
A friend offers you a deal
"Let's flip a coin. Heads, I give you $20. Tails, you give me $15."
Take it? Skip it? How would you even decide?
Imagine playing 100 times
Roughly 50 heads, 50 tails. So 50 × $20 = $1,000 gained. And 50 × $15 = $750 lost. Net: +$250 over 100 flips. That's +$2.50 per flip on average.
That number has a name
It's called expected value.
The average outcome of a random event if you played it many times. Positive EV → take the bet. Negative EV → walk away.
Expected Value
EV = (probability of A × outcome A) + (probability of B × outcome B) + ...
For the coin bet: EV = (0.5 × +$20) + (0.5 × −$15) = $10 − $7.50 = +$2.50
A few quick EV calculations
Roll a die. Win $30 on a 6, lose $5 otherwise.
EV = (1/6 × $30) + (5/6 × −$5) = $5 − $4.17 = +$0.83 per roll. Take it.
Same game but $10 win on a 6, lose $5 otherwise.
EV = (1/6 × $10) + (5/6 × −$5) = $1.67 − $4.17 = −$2.50 per roll. Skip it.
Coin flip. Win $1, lose $1.
EV = (0.5 × $1) + (0.5 × −$1) = $0. Fair bet. Neither side has an edge.
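All of these calculations are the same weighted sum. A minimal sketch, where each bet is a list of (probability, payoff) pairs and `expected_value` is our own helper name:

```python
def expected_value(outcomes):
    """Sum of probability × payoff over every possible outcome."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)

coin_bet = expected_value([(0.5, 20), (0.5, -15)])
die_30 = expected_value([(1/6, 30), (5/6, -5)])
die_10 = expected_value([(1/6, 10), (5/6, -5)])
fair_flip = expected_value([(0.5, 1), (0.5, -1)])

print(f"{coin_bet:+.2f}  {die_30:+.2f}  {die_10:+.2f}  {fair_flip:+.2f}")
```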
Build your own bet
Dial the numbers. Watch the EV update.
A two-outcome bet: probability of winning, payout when you win, cost when you lose. Find the edge — or the trap.
▸ Notice something subtle
A bet can have high probability of winning and still be a bad deal. Or low probability and still be great. Probability alone doesn't tell you what to do — the outcomes matter just as much. Try setting the probability to 99% with a $1 payout and a $200 loss. You'll win almost every time. But the EV is brutally negative.
A real-world bet
The lottery.
A ticket costs $2. The jackpot is $50 million. Odds of winning: 1 in 300 million. Should you buy a ticket?
Expected Value of one lottery ticket
EV = (1/300,000,000 × $50,000,000) + (≈1 × −$2)
= $0.167 − $2.00
= −$1.83 per ticket
▸ The lottery is mathematically a terrible bet
Every $2 you spend on lottery tickets, on average, returns 17 cents. You lose $1.83 per ticket in expected value. The jackpot is so massive it sounds appealing — but it's far smaller than 300 million times the ticket price. The math doesn't lie.
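The same arithmetic also tells you how big the jackpot would have to be before the ticket breaks even. This is a sketch using only the numbers in this example. Like the calculation above, it ignores taxes, shared jackpots and smaller prizes.

```python
TICKET = 2.0
JACKPOT = 50_000_000
P_WIN = 1 / 300_000_000

# Win the jackpot with probability P_WIN, otherwise lose the ticket price
ev = P_WIN * JACKPOT - (1 - P_WIN) * TICKET

# Jackpot at which the expected winnings roughly equal the ticket price
breakeven_jackpot = TICKET / P_WIN

print(f"EV per ticket: {ev:+.2f}")
print(f"break-even jackpot: ${breakeven_jackpot:,.0f}")
```

A $50 million jackpot is a twelfth of the roughly $600 million needed just to break even.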
Why do people still buy? Two real reasons. Entertainment value — $2 for a few days of imagining yourself rich isn't crazy if you don't buy many. And the math fails for "ruin avoidance" — most life-changing events require some capital, and the lottery is, for some, the only conceivable path to that capital, however unlikely.
Now the opposite puzzle
Insurance.
You pay $1,200/year for home insurance. Your chance of a claim is roughly 4% per year. The average claim payout is $15,000. What's the EV? And should you still buy it?
EV of buying insurance (from your perspective)
EV = (0.04 × +$15,000) + (0.96 × $0) − $1,200
= $600 − $1,200
= −$600 per year
So why on earth buy insurance?
Because ruin isn't in the EV calculation.
When EV is the wrong question
If you lose your home and have no insurance, you don't just lose what it was worth (say, $300,000). You lose your ability to keep playing. You can't average that loss over a lifetime because there's no lifetime left. Mathematicians call this ruin risk: one bad outcome ends the game.
For ruin-risk decisions, smart players willingly accept negative EV in exchange for variance reduction. Insurance, diversification, seatbelts, vaccines. You're paying to avoid the catastrophic tail, not to maximize the average.
▸ The deeper rule
EV is the right tool when you can play the bet many times and small losses don't end you. Coin flips, dice games, business decisions, investments at small position size — all EV-driven. EV is the wrong tool when one loss is catastrophic. Then variance, not average, is what to manage.
See the variance for yourself
Play the same bet 1,000 times in a row.
A bet with positive EV should make you money long-term. But you'll see along the way that the path is bumpy — sometimes brutally so.
Cumulative Profit Over 1,000 Bets · EV per bet: +$4.00
FINAL P&L: $0 · WORST DRAWDOWN: $0 · BEST PEAK: $0 · RUINED? —
Press Run 1,000 Bets to watch a single path. Press Run 3 Paths to compare how wildly the same bet can play out depending on luck.
▸ The lesson hidden inside
Even with a clearly positive EV (+$4 per bet), the path is wild. A single run might end up with $4,000 profit, or with $1,500 loss, or anywhere in between. Positive expected value is a long-run promise, not a short-run guarantee. The Law of Large Numbers, in the next tab, teaches exactly the same lesson. The bigger your bet size, the wilder the swings. The more bets you play, the more reliably EV is realized.
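To try a path yourself, here is a rough sketch. The page doesn't spell out the bet behind the chart, so this one is an assumption: a 50% chance to win $28 and a 50% chance to lose $20, which also works out to +$4 per bet. Its swings will not match the chart's exactly.

```python
import random

def run_path(n_bets=1000, p_win=0.5, win=28.0, loss=20.0, seed=None):
    """Play the same bet n_bets times; track profit, best peak and worst drawdown."""
    rng = random.Random(seed)
    profit, peak, worst_drawdown = 0.0, 0.0, 0.0
    path = []
    for _ in range(n_bets):
        profit += win if rng.random() < p_win else -loss
        peak = max(peak, profit)
        worst_drawdown = max(worst_drawdown, peak - profit)  # fall from the best peak so far
        path.append(profit)
    return path, peak, worst_drawdown

ev_per_bet = 0.5 * 28.0 - 0.5 * 20.0  # hypothetical bet, EV = +4.00

for seed in (1, 2, 3):
    path, peak, drawdown = run_path(seed=seed)
    print(f"seed {seed}: final {path[-1]:+,.0f}, peak {peak:,.0f}, worst drawdown {drawdown:,.0f}")
```

Same bet, same EV, three different journeys. Change the seed and the path changes with it.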
One last subtle idea
Even on a positive-EV bet, you can bet too much.
If you bet 100% of your bankroll on every positive-EV flip, you'll go bust. Sounds wrong? Watch.
The math of going all-in
Say you have $1,000, and you bet it all on a coin flip that pays 2-to-1 (you win $2,000 on heads, lose $1,000 on tails). EV is positive: (0.5 × $2,000) + (0.5 × −$1,000) = +$500 per bet.
Play once: 50/50 you have $3,000 or $0.
Play twice (if you survived round one): 50/50 you have $9,000 or $0.
Probability of being ruined after 10 rounds: 1 − 0.5¹⁰ = 99.9%.
Positive EV did not save you. The size of your bet, relative to your bankroll, determines whether you survive long enough for EV to play out.
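The round-by-round numbers can be generated with a short sketch. It assumes, as above, that you stake the entire bankroll every round, so each win triples it and a single loss ends the game. The function name `all_in` is ours.

```python
from fractions import Fraction

def all_in(rounds, bankroll=1000, p_win=Fraction(1, 2)):
    """Return (probability of ruin, bankroll if you somehow survive every round)."""
    p_survive = p_win ** rounds                  # must win every single round
    bankroll_if_alive = bankroll * 3 ** rounds   # each 2-to-1 win triples the stake
    return 1 - p_survive, bankroll_if_alive

for rounds in (1, 2, 10):
    p_ruin, wealth = all_in(rounds)
    print(f"{rounds:>2} rounds: ruin {float(p_ruin):.1%}, survivors hold ${wealth:,}")
```

The survivors' bankroll is enormous, which is why the average still looks good. Almost nobody is a survivor.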
The Kelly criterion in one sentence
For positive-EV bets, the optimal fraction of your bankroll to bet isn't 100%. It's something smaller — driven by how strong your edge is and how big your potential loss is. Named after John Kelly (1956), the rule maximizes long-run growth while avoiding ruin. Professional poker players, hedge fund managers, and sports bettors all think in Kelly fractions, not raw dollar amounts.
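For the simplest kind of bet, where you either win b times your stake or lose the stake, the Kelly fraction has a standard closed form: f = p − (1 − p) ÷ b. The workshop doesn't state this formula, so treat the sketch below as a supplement under that simple-bet assumption.

```python
def kelly_fraction(p_win, net_odds):
    """Kelly fraction f = p - (1 - p) / b for a bet paying b-to-1 that loses the stake.

    A result of zero or less means there is no edge: don't bet.
    """
    return p_win - (1 - p_win) / net_odds

two_to_one_coin = kelly_fraction(p_win=0.5, net_odds=2.0)   # the all-in example's bet
even_money_coin = kelly_fraction(p_win=0.5, net_odds=1.0)   # a fair flip, no edge

print(f"2-to-1 coin flip: bet {two_to_one_coin:.0%} of bankroll")
print(f"even-money flip:  bet {even_money_coin:.0%} of bankroll")
```

For the 2-to-1 coin flip the answer is a quarter of the bankroll, not all of it.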
▸ Where this shows up in real life
Investing: position sizing matters more than stock picking. A 90% confident trade still shouldn't be your entire portfolio.
Poker: the entire game is EV calculation, hand by hand, plus pot management.
Business decisions: startup founders who go all-in on one bet have higher returns and higher ruin rates than those who diversify. EV is necessary. Sizing is what makes EV realizable.
Quick check
A bet: 70% chance of winning $10, 30% chance of losing $30. What's the EV?
−$2. EV = (0.7 × +$10) + (0.3 × −$30) = $7 − $9 = −$2. You'd win 70% of the time but the rare losses are too big — skip this bet.
Why do people buy insurance even though its expected value is negative?
Catastrophic-loss avoidance. EV assumes you can play many times — but if one bad outcome ends the game (house burns down, you can't rebuild), then variance is what matters, not average. Insurance trades expected dollars for survival.
You have a clearly positive-EV bet. To maximize long-run growth, you should bet:
A fraction of your bankroll. Betting 100% means one loss takes you to zero — and zero is permanent. The Kelly criterion gives the math, but the intuition is: positive EV only helps if you stay alive long enough to realize it.
Now you can act on probability
You've seen what EV is and why it works — sometimes. Next: the deep law that makes EV reliable in the first place. Why averages converge.
Why Probability Actually Works
Probability says "a coin lands heads 50% of the time." But your last 10 flips might have been 7 heads and 3 tails. Who's right? Both. Let's see why.
The puzzle to solve
If P(heads) = 50%, why don't 10 flips give exactly 5 heads?
The short answer: probability is a long-run truth, not a short-run guarantee. Let's watch this play out.
Watch it live
Press the button. We'll flip a coin 1,000 times and plot the running percentage of heads after each flip.
Running Proportion Of Heads
▸ What you'll see
The first 50 flips: wild. The line jumps between 30% and 70%. By flip 200 it's tightening. By flip 1000 it's hovering near 50% with tiny wiggles. The convergence is real but slow. Probability is a promise about the long run — and the long run takes a while to arrive.
This is called
The Law of Large Numbers.
As you run more trials, the observed frequency tends to settle closer and closer to the true probability. Not in a straight line, but reliably.
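If you can't press the button, the same experiment can be sketched in a few lines. This is an illustrative simulation, assuming a fair coin and Python's built-in random generator; the seed value is arbitrary and only makes the run repeatable.

```python
import random


def running_proportion_of_heads(n_flips, seed=None):
    """Flip a fair coin n_flips times; return the proportion of heads after each flip."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for flip_number in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # True counts as 1
        proportions.append(heads / flip_number)
    return proportions


proportions = running_proportion_of_heads(1000, seed=42)
for checkpoint in (10, 50, 200, 1000):
    print(f"after {checkpoint:>4} flips: {proportions[checkpoint - 1]:.1%} heads")
```

Change the seed and the early checkpoints will jump around, while the last one stays close to 50%.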
Where this law shows up in real life
Insurance. An insurer can't predict whether you will crash your car. But across millions of customers, the average claim rate is rock-solid predictable. That's how they set premiums.
Polling. Ask 50 people who they'll vote for — the result is noisy. Ask 5,000 — the result tracks the true population very closely.
Casinos. American (double-zero) roulette has a 5.26% house edge. On any single spin, the house might lose big. Across millions of spins, that edge shows up with near-mathematical certainty. The casino isn't lucky. It's patient.
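Two of the numbers above can be worked out directly. The sketch below assumes an American wheel (38 pockets: 18 red, 18 black, 2 green) and an even-money bet on red, and it uses the usual standard-error formula for a simple random poll. The function names and the 50/50 population split are assumptions for illustration.

```python
from fractions import Fraction
from math import sqrt

# American roulette wheel: 18 red, 18 black, 2 green pockets (0 and 00).
POCKETS = 38
RED_POCKETS = 18


def even_money_ev_per_unit():
    """Player's expected profit per 1 unit bet on red (win +1, lose -1)."""
    p_win = Fraction(RED_POCKETS, POCKETS)
    return p_win * 1 + (1 - p_win) * (-1)


def poll_standard_error(n_people, true_share=0.5):
    """Typical sampling error of a simple random poll of n_people."""
    return sqrt(true_share * (1 - true_share) / n_people)


house_edge = -even_money_ev_per_unit()  # what the player loses, the house gains
print(f"House edge on red: {float(house_edge):.2%}")
for n in (50, 5000):
    print(f"poll of {n:>4}: standard error about {poll_standard_error(n):.1%}")
```

The house edge comes out to 2/38, and the poll's error shrinks with the square root of the sample size, so 100 times more people gives about 10 times less noise.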
One more thing to watch
Why does the bell curve show up everywhere?
Let's drop some balls down a pegboard and find out.
The Galton Board, explained
A ball drops down through rows of pegs. At every peg it bounces left or right, equally likely — like a coin flip. After 10 rows of bounces, where does it land? Almost always near the middle, because it takes a long unbroken run of "all lefts" or "all rights" to drift to the edges.
▸ The bell curve — born from coin flips
Look at what just happened. Each ball took 10 independent random bounces. The collection of where they landed forms a bell curve. This is exactly the binomial from the last tab, seen in physical form. The bell shape isn't a coincidence. It's forced by independence and counting: far more left-right paths end in the middle bins than at the edges. The general result behind this is called the Central Limit Theorem, and it explains why bell curves appear so often in nature and business.
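The board is easy to imitate in code. This sketch assumes each bounce is an independent 50/50 left-or-right choice, counts how many times each ball goes right, and compares the simulated bins with the exact binomial probabilities. The ball count and seed are arbitrary choices.

```python
import random
from math import comb

ROWS = 10  # number of peg rows, as in the description above


def drop_balls(n_balls, rows=ROWS, seed=None):
    """Return counts per bin; bin k means the ball bounced right k times."""
    rng = random.Random(seed)
    bins = [0] * (rows + 1)
    for _ in range(n_balls):
        rights = sum(rng.random() < 0.5 for _ in range(rows))
        bins[rights] += 1
    return bins


def binomial_pmf(k, rows=ROWS):
    """Exact probability of k rights in `rows` fair bounces."""
    return comb(rows, k) / 2 ** rows


n_balls = 5000
bins = drop_balls(n_balls, seed=1)
for k, count in enumerate(bins):
    print(f"bin {k:>2}: simulated {count / n_balls:.3f}  exact {binomial_pmf(k):.3f}")
```

Only 1 path out of 1,024 reaches each edge bin, while 252 paths reach the center, which is why the pile builds up in the middle.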
The deepest law in probability
Take almost any random process: weird, skewed, lumpy, it doesn't matter, as long as its spread is finite. Take many independent samples and compute their average. Do that again and again, and the averages form an approximate bell curve, centered at the true mean. This is why polling works, why insurance works, why averaging investments reduces risk, and why bell curves appear, at least roughly, when you measure heights, errors, or returns. It is arguably the most consequential result in all of probability.
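A quick experiment shows the idea with a deliberately lopsided process. The sketch below assumes an exponential distribution with mean 1 (heavily skewed, chosen only as an example), averages 50 draws at a time, and checks that the averages cluster around the true mean with a spread close to 1/√50. The sample sizes and seed are arbitrary.

```python
import random
import statistics
from math import sqrt


def sample_means(n_samples, sample_size, seed=None):
    """Average `sample_size` exponential draws, repeated `n_samples` times."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


SAMPLE_SIZE = 50
means = sample_means(2000, SAMPLE_SIZE, seed=7)
print(f"mean of the averages  : {statistics.fmean(means):.3f} (true mean is 1)")
print(f"spread of the averages: {statistics.stdev(means):.3f} "
      f"(theory: {1 / sqrt(SAMPLE_SIZE):.3f})")
```

Plotting a histogram of `means` would show the bell shape, even though the individual draws look nothing like a bell.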
Final stretch
You now have all the tools. Up next: three famous puzzles that have humbled mathematicians, doctors, and Nobel laureates. Time to apply what you've learned.
Three Famous Puzzles
Three problems that have tripped up mathematicians, doctors, and Nobel laureates. You have all the tools to solve them now. Let's take them one at a time — slowly.
Before we begin
Probability is counterintuitive.
Your first guesses on these puzzles will probably be wrong. That's normal. The cure isn't intelligence — it's counting carefully.
Puzzle One — The Monty Hall Problem
You're on a game show. Three doors. One car. Two goats.
Here's how it works
1. You pick a door (say Door 1).
2. The host — who knows where the car is — opens another door (say Door 3) to reveal a goat.
3. The host then asks: "Do you want to switch to Door 2?"
Pause and guess
Should you stay, switch, or does it not matter?
Most people say "it doesn't matter — two doors left, 50/50." That answer is wrong. Play the game and see.
Three doors: 1, 2, 3. Pick a door to begin.
The game keeps two tallies: Wins by Staying and Wins by Switching, each with its win rate.
▸ The reveal — switching wins 2/3 of the time
Here's why. When you first picked, you had a 1/3 chance of being right. That means 2/3 of the probability lived behind the other two doors combined. When the host opens one of those (knowing it's a goat), all that 2/3 probability collects on the remaining unopened door. So switching gives you 2/3. Staying gives you 1/3. The host's knowledge is what makes this work.
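If the argument still feels slippery, a simulation settles it. This is a sketch under the rules stated above: the host always knows where the car is and always opens a goat door you didn't pick. When the host has two goat doors to choose from, the code assumes he picks one at random. The function names, game count, and seed are illustrative.

```python
import random


def play_monty_hall(switch, rng):
    """Play one round; return True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    first_pick = rng.choice(doors)
    # The host knows where the car is and opens a goat door the player did not pick.
    host_opens = rng.choice([d for d in doors if d != first_pick and d != car])
    if switch:
        final_pick = next(d for d in doors if d != first_pick and d != host_opens)
    else:
        final_pick = first_pick
    return final_pick == car


def win_rate(switch, n_games=10_000, seed=None):
    """Fraction of games won when always switching (or always staying)."""
    rng = random.Random(seed)
    wins = sum(play_monty_hall(switch, rng) for _ in range(n_games))
    return wins / n_games


stay_rate = win_rate(switch=False, seed=3)
switch_rate = win_rate(switch=True, seed=3)
print(f"stay:   {stay_rate:.1%}")
print(f"switch: {switch_rate:.1%}")
```

Over thousands of games the two rates should land near 1/3 and 2/3, matching the counting argument.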
Puzzle Two — The Birthday Paradox
How many people do you need in a room before it's more likely than not that two of them share a birthday?
Pause and guess
There are 365 days. So intuitively, you might guess around 180 people. The real answer is 23. Let's see why.
With 23 people in the room:
P(Shared Birthday): 50.7% (at least one pair shares)
Number of Pairs: 253 = n × (n−1) / 2
P(Shared Birthday) vs People In Room
▸ Why so few people?
The trick: you're not asking "does someone share my birthday?" You're asking "do any two people share?" With 23 people, there are 253 different pairs being compared, and 253 chances of a match add up fast. The pair count grows quickly: 10 people = 45 pairs. 30 people = 435 pairs. "Some pair, anywhere" probabilities compound much faster than intuition expects.
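The exact number comes from flipping the question around: work out the chance that everyone's birthday is different, then subtract from 1. The sketch below assumes 365 equally likely birthdays and ignores leap years and seasonal birth patterns. The function names are illustrative.

```python
def p_shared_birthday(n_people, days=365):
    """P(at least two of n_people share a birthday), assuming equally likely days."""
    p_all_different = 1.0
    for k in range(n_people):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different


def number_of_pairs(n_people):
    """How many distinct pairs can be formed: n * (n - 1) / 2."""
    return n_people * (n_people - 1) // 2


for n in (10, 23, 30):
    print(f"{n:>2} people: {number_of_pairs(n):>3} pairs, "
          f"P(shared) = {p_shared_birthday(n):.1%}")
```

At 22 people the probability is still under one half. The 23rd person is the one who tips it over.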
Puzzle Three — The Gambler's Fallacy
A coin lands Heads ten times in a row. What's the chance the next flip is Tails?
Most people think
"It's due for tails. Maybe 80% chance of tails now." Wrong.
The answer
Still 50%.
The coin has no memory. Past flips don't influence future flips. Believing otherwise is the gambler's fallacy, and it is one of the most expensive probability errors a gambler can make.
Why the fallacy feels right
You know that in the long run, heads happen 50% of the time. So when 10 heads happen in a row, your brain thinks "the universe will balance it out." But the universe doesn't correct individual streaks. It averages across much larger samples. The 10 heads happened. They're done. The next flip starts fresh at 50/50. Long-run convergence is not retroactive correction: the streak simply gets diluted by the thousands of ordinary flips that follow.
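You can test the "it's due" feeling directly. The sketch below flips a simulated fair coin many times and looks only at the flips that come right after a run of heads. It uses runs of 5 rather than 10 so that a modest number of flips yields plenty of examples. That choice, the flip count, and the seed are all assumptions made for illustration.

```python
import random


def tails_rate_after_streak(streak_length, n_flips, seed=None):
    """Among flips that follow `streak_length` heads in a row, the share that are tails."""
    rng = random.Random(seed)
    current_streak = 0        # consecutive heads immediately before this flip
    followups = 0             # flips that came right after a long-enough streak
    tails_after_streak = 0
    for _ in range(n_flips):
        is_heads = rng.random() < 0.5
        if current_streak >= streak_length:
            followups += 1
            tails_after_streak += not is_heads
        current_streak = current_streak + 1 if is_heads else 0
    return tails_after_streak / followups if followups else float("nan")


rate = tails_rate_after_streak(streak_length=5, n_flips=200_000, seed=11)
print(f"tails right after 5 heads in a row: {rate:.1%}")
```

If tails were ever "due", this rate would sit well above 50%. It doesn't.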
▸ The mirror error — Hot Hand Fallacy
The opposite error: assuming streaks continue. A trader sees three winning days and decides he's "in the zone." A gambler sees red hit four times and bets red again because it's "running hot." Both fallacies come from the same misunderstanding: independent events have no memory. Each independent flip or spin starts fresh, and a short run of winning trades is no proof of a lasting edge.
You've reached the end
From one coin flip to Bayes' theorem and Monty Hall. You now have the conceptual machinery that statistics, finance, science, and machine learning are built on.
Come back when you want to refresh — these ideas reward repeated visits.