Learn Basics of Probability — through simulation

Live Simulation · Drag · Flip · Roll

10 Tabs · Coin → Cards → Bayes → Monty Hall


What is probability?

Let's go slowly. Probability is simpler than it looks. By the end of this workshop, you'll see it clearly — but only if we take it one small step at a time.

Step One

Probability is just counting.

Count what can happen. Count what you want. Divide. That's it.

Step Two — Watch this coin

HEADS · TAILS

Two possible outcomes. Each equally likely. The probability of heads is one out of two.

Step Three — Three ways to say the same thing

1/2 (fraction) · 0.5 (decimal) · 50% (percent)

All three mean the same thing. Use whichever feels easiest.

Step Four — The boundaries

Every probability lives somewhere on a line that runs from 0 (won't happen) to 1 (will definitely happen). Nothing higher, nothing lower.

Step Five — Our three tools

The Coin: 2 outcomes

The Die: 6 outcomes

The Deck: 52 outcomes

Every concept in this workshop will be shown with these three. Familiar objects. Honest math.

▸ One promise before we start

Probability is one of the most counter-intuitive branches of math. Even experts get it wrong. Your gut feelings will sometimes mislead you, and that's fine. We'll move slowly, one small idea at a time. By Tab VIII, you'll be solving problems that have fooled mathematicians for decades.

Single Events

One coin. One roll. One card. Let's start with the simplest possible question — and earn our way up from there.

The one formula you need

P(event) = favorable ÷ total

Count the outcomes you want. Count all possible outcomes. Divide.

Try it with a coin

A coin has 2 possible outcomes. 1 is heads. So P(heads) = 1/2.

Now let's flip 100 times

(Simulator: running counts of total flips, heads, and tails.)

▸ What just happened?

Flip 10 times and you might get 7 heads and 3 tails. Surprising? Don't worry. Flip 100 times and you'll usually land closer to 50/50, maybe 54/46. Probability is a long-run statement, not a guarantee about the next flip. The fewer flips, the more the result can wander. That's normal.
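The wandering is easy to reproduce. Here is a minimal sketch in Python, assuming a fair coin simulated with the standard `random` module and a fixed seed so runs are repeatable; `share_near_half` is an illustrative name, not part of the workshop. It repeats each experiment 10,000 times and reports how often the heads proportion lands between 40% and 60%.

```python
import random

def share_near_half(flips_per_run: int, runs: int = 10_000, seed: int = 1) -> float:
    """Fraction of runs whose heads proportion falls within [0.4, 0.6]."""
    rng = random.Random(seed)  # seeded for repeatable output
    near = 0
    for _ in range(runs):
        heads = sum(rng.random() < 0.5 for _ in range(flips_per_run))
        if 0.4 <= heads / flips_per_run <= 0.6:
            near += 1
    return near / runs

share_10 = share_near_half(10)
share_100 = share_near_half(100)
print(f"10 flips:  {share_10:.1%} of runs land within 40-60% heads")
print(f"100 flips: {share_100:.1%} of runs land within 40-60% heads")
```

Expect roughly two thirds of the 10-flip runs to land in that band, against more than 90% of the 100-flip runs. More flips, less wandering.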

Now try a die

6 outcomes. Each equally likely.

So P(rolling a 6) = 1 ÷ 6 ≈ 16.7%.


Let's count a few die questions together

P(rolling a 6): 1 favorable face, 6 total → 1/6 ≈ 16.7%

P(rolling an even number): faces 2, 4, 6 work → 3 favorable, 6 total → 3/6 = 1/2 = 50%

P(rolling 4 or higher): faces 4, 5, 6 work → 3/6 = 1/2

P(rolling a 7): no face has 7 → 0/6 = 0 (impossible)
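The counting recipe (favorable ÷ total) translates directly into code. A minimal sketch, assuming Python's standard `fractions.Fraction` for exact answers; `prob` and `DIE` are illustrative names. Each event is a yes/no test applied to every face.

```python
from fractions import Fraction

DIE = range(1, 7)  # faces 1..6, each equally likely

def prob(event) -> Fraction:
    """P(event) = favorable / total, counted over the six faces."""
    favorable = sum(1 for face in DIE if event(face))
    return Fraction(favorable, len(DIE))

print(prob(lambda f: f == 6))      # rolling a 6
print(prob(lambda f: f % 2 == 0))  # rolling an even number
print(prob(lambda f: f >= 4))      # rolling 4 or higher
print(prob(lambda f: f == 7))      # rolling a 7 (impossible)
```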

Now a deck of cards

52 cards. 4 suits × 13 ranks.

Hearts ♥ and Diamonds ♦ are red. Spades ♠ and Clubs ♣ are black.


Five card probabilities worth knowing

P(any specific card) = 1/52 ≈ 1.9%

P(an Ace) — there are 4 aces → 4/52 = 1/13 ≈ 7.7%

P(a Heart) — there are 13 hearts → 13/52 = 1/4 = 25%

P(a Red card) — 13 hearts + 13 diamonds → 26/52 = 1/2 = 50%

P(a Face card — J, Q, K) — 3 face cards × 4 suits → 12/52 ≈ 23.1%
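Building the 52-card deck explicitly lets you check each of these by counting. A sketch, assuming cards are represented as (rank, suit) tuples; `RANKS`, `SUITS`, `DECK` and `prob` are illustrative names, not part of the workshop.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["♠", "♥", "♦", "♣"]
DECK = list(product(RANKS, SUITS))  # 13 ranks x 4 suits = 52 cards

def prob(event) -> Fraction:
    """Count cards where event(rank, suit) is true, divide by 52."""
    favorable = sum(1 for rank, suit in DECK if event(rank, suit))
    return Fraction(favorable, len(DECK))

p_ace = prob(lambda r, s: r == "A")               # 4/52 = 1/13
p_heart = prob(lambda r, s: s == "♥")             # 13/52 = 1/4
p_red = prob(lambda r, s: s in ("♥", "♦"))        # 26/52 = 1/2
p_face = prob(lambda r, s: r in ("J", "Q", "K"))  # 12/52 = 3/13
print(p_ace, p_heart, p_red, p_face, f"{float(p_face):.1%}")
```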

Quick check

Two questions. Take your time before reading each answer.

If you roll a die, what's the probability of rolling a number less than 3?

1/3 is right. "Less than 3" means rolling a 1 or 2 — that's 2 favorable outcomes out of 6 possible. 2/6 = 1/3 ≈ 33.3%.

Drawing one card from a deck — what's the probability it's a black face card?

6/52 is correct. Black suits are spades (♠) and clubs (♣). Face cards are J, Q, K. So there are 3 face cards × 2 black suits = 6 black face cards out of 52.

OR — When You Want One Thing Or Another

Sometimes you don't care exactly what happens — you just need any of a few things to work out. That's an OR question. Let's start with the easiest version.

A simple question

Roll a die. What's the probability of getting a 5 or a 6?

Just count it out

2 faces (the 5 and the 6) out of 6 total. So P = 2/6 = 1/3.

Notice what we just did

P(5) = 1/6. P(6) = 1/6. We added them: 1/6 + 1/6 = 2/6.

When two things can't both happen (a die shows one face at a time), you just add the probabilities together.

The simplest OR rule: P(A or B) = P(A) + P(B)

Works when A and B can't both happen.

More easy examples

P(coin lands Heads OR Tails) = 1/2 + 1/2 = 1 = 100%. One of them must happen. Certain.

P(rolling 1, 2, or 3) = 1/6 + 1/6 + 1/6 = 1/2 = 50%. Works with three options too.

P(drawing a Heart or a Club) = 13/52 + 13/52 = 26/52 = 1/2. A card has only one suit at a time.

But what if two things can happen together?

What's the probability of drawing a Heart OR a Face card?

There are 13 hearts. There are 12 face cards. So 13 + 12 = 25 favorable cards?
Let's see all 52 and find out.

(Interactive grid of all 52 cards. Hearts: 13. Face cards: 12. Both: J♥, Q♥, K♥. Total unique: 13 + 12 − 3 = 22.)

The same idea in classic form — a Venn diagram

Two overlapping circles. Each card lives in exactly one region. The middle holds the cards counted twice.

The fuller OR rule: P(A or B) = P(A) + P(B) − P(both)

P(Heart or Face) = 13/52 + 12/52 − 3/52 = 22/52 ≈ 42.3%
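Python's set operations make the overlap visible. A sketch, assuming cards are (rank, suit) tuples; the variable names are illustrative. The union `|` counts each card once, `&` finds the overlap, and the fuller OR rule gives the same answer. Kings and Queens show the simple case, where the overlap is empty.

```python
from fractions import Fraction
from itertools import product

DECK = set(product(["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"],
                   ["♠", "♥", "♦", "♣"]))

hearts = {card for card in DECK if card[1] == "♥"}
faces = {card for card in DECK if card[0] in ("J", "Q", "K")}

# Counting the union directly never double-counts.
direct = Fraction(len(hearts | faces), len(DECK))
# The fuller OR rule: add both, then subtract the overlap once.
by_rule = Fraction(len(hearts) + len(faces) - len(hearts & faces), len(DECK))
print(len(hearts & faces), direct, f"{float(direct):.1%}")

kings = {card for card in DECK if card[0] == "K"}
queens = {card for card in DECK if card[0] == "Q"}
# Events that can't both happen have an empty overlap, so plain addition works.
print(len(kings & queens), Fraction(len(kings) + len(queens), len(DECK)))
```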

A two-question rule for choosing

Can both happen at once? If no → just add the probabilities. If yes → add them, then subtract the overlap. That's the whole rule, in two short questions.

Try it yourself

Pick two events, and the simulator will work out whether they overlap and apply the right formula.


Quick check

Two questions to make sure it stuck.

From one card draw — probability of drawing a King OR a Queen?

8/52. A card can't be both a King and a Queen — these don't overlap. So just add: 4 Kings + 4 Queens = 8 favorable out of 52.

Probability of drawing a Heart OR an Ace?

16/52. These overlap — the Ace of Hearts is both. So: 13 Hearts + 4 Aces − 1 (overlap) = 16.

AND — When You Want Both Things To Happen

Sometimes one thing isn't enough — you want both events to work out. The math for this is just as simple as OR, but it goes the other direction: instead of adding, you multiply.

A simple question

Flip a coin twice. What's the probability of getting two heads?

Let's list every possibility

All 4 possible outcomes, each equally likely: HH (✓ win) · HT · TH · TT

1 out of 4 outcomes is HH. So P(two heads) = 1/4 = 25%.

Notice the pattern

P(H on the first flip) = 1/2. P(H on the second flip) = 1/2. And 1/2 × 1/2 = 1/4.

When two events don't affect each other — they're independent — you multiply their probabilities.

The AND rule: P(A and B) = P(A) × P(B)

Works when A and B are independent — neither one changes the other.
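Listing outcomes and multiplying probabilities should always agree for independent events. A small sketch, assuming Python's `itertools.product` to enumerate every equally likely combination; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome of two flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
p_both_heads = Fraction(sum(1 for o in outcomes if o == ("H", "H")), len(outcomes))
p_rule = Fraction(1, 2) * Fraction(1, 2)  # AND rule: multiply

# Different objects work the same way: a coin and a die give 2 x 6 = 12 outcomes.
coin_die = list(product("HT", range(1, 7)))
p_h_and_6 = Fraction(sum(1 for c, d in coin_die if c == "H" and d == 6), len(coin_die))

print(p_both_heads, p_rule, p_h_and_6, Fraction(1, 2) * Fraction(1, 6))
```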

What "independent" means

Two events are independent when one happening tells you nothing about whether the other will happen. Coin flips are independent — flipping one heads doesn't change the next flip's probability. Rolling a die and drawing a card are independent. Two consecutive lottery draws are independent. Most "different objects, different actions" pairs are independent.

Let's stretch it

Flip a coin n times. What's P(all heads)?

Each flip is independent. Just multiply 1/2 by itself, n times.

With n = 3 flips:

P(all heads) = (1/2)³ = 12.5%. It shrinks fast as n grows.

Total outcomes = 2 × 2 × 2 × ... (n times) = 2ⁿ, which is 8 for n = 3.

P(at least one tails) = 1 − P(all heads) = 87.5%.

▸ A useful trick

Look at the third stat. P(at least one tails) = 1 − P(no tails) = 1 − P(all heads). When "at least one" is hard to count, flip the question: ask "what's the chance of none?" instead, then subtract from 1. This is called the complement trick, and it'll save you hours of calculation in harder problems.
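Both routes to P(at least one tails) can be checked against each other. A sketch with illustrative function names: one lists all 2ⁿ sequences and counts, the other applies the complement trick.

```python
from fractions import Fraction
from itertools import product

def at_least_one_tails_by_counting(n: int) -> Fraction:
    """Brute force: list all 2**n sequences and count those containing a T."""
    seqs = list(product("HT", repeat=n))
    return Fraction(sum(1 for s in seqs if "T" in s), len(seqs))

def at_least_one_tails_by_complement(n: int) -> Fraction:
    """Complement trick: 1 - P(no tails) = 1 - P(all heads) = 1 - (1/2)**n."""
    return 1 - Fraction(1, 2) ** n

for n in (1, 3, 10):
    print(n, at_least_one_tails_by_counting(n), at_least_one_tails_by_complement(n))
```

For n = 10 the brute-force version already walks through 1,024 sequences, while the complement trick is a single subtraction.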

One more — mixing different objects

Flip a coin AND roll a die AND draw a card.
What's P(Heads and a 6 and an Ace)?

P(H) × P(6) × P(Ace)
= 1/2 × 1/6 × 4/52
= 1/156 ≈ 0.64%

Watch how small probabilities get

Each event was reasonable on its own: 50%, 16.7%, 7.7%. Combined, less than 1%. This is why predicting compound futures is so hard — each step multiplies the chances down rapidly. A weather forecast that needs three independent conditions to all line up, each at 70%, is really 0.7 × 0.7 × 0.7 ≈ 34%. Always weaker than you'd think.
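Exact fractions show the shrinkage without rounding. A sketch assuming independence, as the text does; `p_combo` and `p_weather` are illustrative names.

```python
import math
from fractions import Fraction

# Coin, die and card are independent, so the AND rule multiplies.
p_combo = Fraction(1, 2) * Fraction(1, 6) * Fraction(4, 52)
print(p_combo, f"{float(p_combo):.2%}")

# Three independent conditions, each 70% likely.
p_weather = math.prod([0.7, 0.7, 0.7])
print(f"{p_weather:.1%}")
```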

Quick check

A coin flipped 4 times — what's the probability of getting HHHH?

1/16. Each flip is independent at 1/2. Multiply four times: 1/2 × 1/2 × 1/2 × 1/2 = 1/16.

Roll two dice. What's the probability that both show a 6?

1/36. Each die is independent. 1/6 × 1/6 = 1/36. Or: 6 × 6 = 36 possible outcomes when rolling two dice, only one of which is (6,6).

When Information Changes The Picture

Up to now, all our probabilities have been about events seen fresh, with no prior knowledge. But what happens when you learn something partway through? That's where probability becomes truly useful.

A story with marbles

A jar holds 10 marbles. Six red. Four blue.

You reach in without looking, take one out, set it aside. Then take another. The second pick is the interesting one.

10 marbles · 6 red · 4 blue

Notice what just happened

The probability of the second pick depended on the first.

Two different worlds for the second pick

If the first pick was Red: 5 red + 4 blue remain. P(Red on second) = 5/9 ≈ 55.6%

If the first pick was Blue: 6 red + 3 blue remain. P(Red on second) = 6/9 ≈ 66.7%

Same jar. Same second action. But the probability is different depending on what we learned from the first pick. That's conditional probability.
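Conditional probability can be checked by brute force: list every ordered pair of draws, keep only the pairs whose first pick matches what you learned, and count red among the second picks. A sketch with illustrative names, assuming each marble is equally likely to be drawn.

```python
from fractions import Fraction
from itertools import permutations

jar = ["R"] * 6 + ["B"] * 4  # 10 marbles: 6 red, 4 blue

# Every ordered pair of two different marbles: 10 x 9 = 90 equally likely draws.
draws = [(jar[i], jar[j]) for i, j in permutations(range(len(jar)), 2)]

def p_red_second_given(first_color: str) -> Fraction:
    """Restrict to draws whose first marble matches, then count red seconds."""
    world = [d for d in draws if d[0] == first_color]
    return Fraction(sum(1 for d in world if d[1] == "R"), len(world))

print(p_red_second_given("R"), p_red_second_given("B"))
```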

The notation

P(A | B) means "probability of A, given B."

The vertical bar is read as "given that". P(Red on 2nd | Red on 1st) = 5/9.

A new way to think about it

When you learn that B happened, you've entered a smaller world. You're no longer asking "out of all possible outcomes..." — you're asking "out of the outcomes where B is true, what fraction also have A?" The denominator changes.

A worked example with cards

You drew a card. A friend peeks and says: "It's a face card."

Given that info, what's the chance it's a King?

Count inside the smaller world

Normally, P(King) = 4/52 = 1/13. But now we know it's a face card. That means it's one of these 12 cards:

J, Q, K of ♠ · J, Q, K of ♥ · J, Q, K of ♦ · J, Q, K of ♣

Of those 12 face cards, 4 are Kings. So:

P(King | Face) = 4/12 = 1/3 ≈ 33.3%

The formula: P(A | B) = P(A and B) ÷ P(B)

"Of all the times B happens, what fraction also have A?"
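The formula and direct counting should give the same answer. A sketch, assuming cards are (rank, suit) tuples; `p`, `is_king` and `is_face` are illustrative names.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
DECK = list(product(RANKS, ["♠", "♥", "♦", "♣"]))

def p(event) -> Fraction:
    """Unconditional probability: matching cards out of all 52."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def is_king(card) -> bool:
    return card[0] == "K"

def is_face(card) -> bool:
    return card[0] in ("J", "Q", "K")

# P(A | B) = P(A and B) / P(B)
p_king_given_face = p(lambda c: is_king(c) and is_face(c)) / p(is_face)

# Same answer by counting only among the 12 face cards.
face_cards = [c for c in DECK if is_face(c)]
p_by_counting = Fraction(sum(1 for c in face_cards if is_king(c)), len(face_cards))

print(p_king_given_face, p_by_counting)
```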

A tool for these problems — the tree diagram

Draw the branches. Multiply along each path. Every path's probability tells the whole sequential story.

How to read this tree

You're drawing two cards without replacement. The first branch splits into Heart (1/4) or Not-Heart (3/4). Each of those splits again — but the second-pick odds depend on the first. After a Heart, only 12 Hearts remain in 51 cards (12/51). After a Not-Heart, all 13 Hearts still remain in 51 cards (13/51). Multiply along any path to get that outcome's probability. All four end-probabilities sum to 100%.
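The tree can be written out as nested tables and multiplied path by path. A sketch using exact fractions; the branch labels follow the text and the variable names are illustrative.

```python
from fractions import Fraction

# First draw: Heart (13 of 52) or Not-Heart (39 of 52).
first = {"Heart": Fraction(13, 52), "Not-Heart": Fraction(39, 52)}

# Second draw from the 51 remaining cards depends on the first branch.
second = {
    "Heart": {"Heart": Fraction(12, 51), "Not-Heart": Fraction(39, 51)},
    "Not-Heart": {"Heart": Fraction(13, 51), "Not-Heart": Fraction(38, 51)},
}

# Multiply along each path to get that path's probability.
paths = {(a, b): first[a] * second[a][b] for a in first for b in second[a]}
for path, prob in paths.items():
    print(path, prob, f"{float(prob):.1%}")
print("total:", sum(paths.values()))
```

The four paths come to 1/17, 13/68, 13/68 and 19/34, which sum to exactly 1.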

▸ Why this matters

Conditional probability is the bridge from counting to reasoning. Every Bayesian inference (next tab), every diagnostic test, every "given X, what's the chance of Y" question rests on this single idea: information changes the picture. Master this tab and Bayes will click easily.

Quick check

A die was rolled. Someone tells you the result was even. Given that, what's the probability it was a 6?

1/3. Given the result was even, there are only 3 possibilities: 2, 4, or 6. The 6 is one of them. P(6 | even) = 1/3.

Draw two cards without replacement. Probability that both are Aces?

4/52 × 3/51 = 1/221 ≈ 0.45%. Without replacement, the second probability changes: after drawing one Ace, 3 Aces remain in 51 cards.

Bayes — When Evidence Updates Belief

A medical test with 99% accuracy comes back positive. What's the chance you actually have the disease? Probably not what you think. Slow down here. This tab rewires intuition.

A small story to begin

A rare disease affects 1 in 1,000 people.

The test for it is 99% accurate. You take it. It comes back positive.

Pause and guess

What's the chance you actually have the disease?

Most people guess 95% or 99%. The real answer will surprise you. Keep going.

Let's count instead

Forget percentages for a moment. Imagine 10,000 actual people, and let's count what would happen.

1. Picture 10,000 people taking the test

Base rate (disease prevalence): 0.1%

Test sensitivity (catches the disease): 99%

Test specificity (correctly clears healthy people): 99%

(Grid: each square is 1 person, marked as a true positive or a false positive.)

Now the answer

Of the people who test positive, 9.0% actually have the disease and 91.0% are false alarms.

A positive result on a 99%-accurate test for a rare disease means you still probably don't have it.

Why?

Because the disease is rare. Out of 10,000 people, only about 10 are sick, and the test catches nearly all of them. But a 1% false-positive rate applied to the roughly 9,990 healthy people produces about 100 false alarms, far more than the real cases.

2. The story in plain numbers

The formula (only after the intuition)

Now that the idea makes sense, here's the official name for what we just did. It's called Bayes' Theorem.

Bayes' Theorem: P(Disease | Positive) = P(Positive | Disease) × P(Disease) ÷ P(Positive)

Read as: "Of all positive tests, what fraction actually have the disease?"

Plugging in numbers — the original problem

Given: P(Disease) = 0.001 · P(Positive | Disease) = 0.99 · P(Positive | NoDisease) = 0.01

Step 1 — find P(Positive):
= (0.99 × 0.001) + (0.01 × 0.999)
= 0.00099 + 0.00999 = 0.01098

Step 2 — apply the formula:
P(Disease | Positive) = (0.99 × 0.001) ÷ 0.01098
= 9.0%

▸ The deeper lesson

Bayes' theorem solves a problem that comes up everywhere: you have some evidence, what should you now believe? Doctors face it with test results. Courts face it with DNA evidence. Spam filters face it with new emails. Self-driving cars face it with sensor readings. The lesson, repeated: base rates matter enormously. Rare things stay rare even when evidence points to them.
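The two steps of the worked example (total probability, then Bayes' theorem) fit in a few lines. A sketch, assuming "99% accurate" means both sensitivity and specificity are 99%, as in the worked example; `posterior` is an illustrative name. Varying only the prevalence shows how strongly the base rate drives the answer.

```python
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """P(Disease | Positive) via Bayes' theorem."""
    false_positive_rate = 1 - specificity
    # Step 1: total probability of a positive test.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # Step 2: Bayes' theorem.
    return sensitivity * prior / p_positive

print(f"{posterior(0.001, 0.99, 0.99):.1%}")  # the original problem

# Base rates matter: same test, different prevalence.
for prior in (0.001, 0.01, 0.1):
    print(f"prevalence {prior:.1%}: P(disease | positive) = {posterior(prior, 0.99, 0.99):.1%}")
```

With the same 99% test, raising prevalence from 0.1% to 1% lifts the answer to 50%, and at 10% it passes 90%.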

Quick check

Two questions to lock it in.

A test is 95% accurate. The condition affects 1% of the population. You test positive. The probability you have it is closest to:

About 16%. Of 10,000 people: 100 have it (95 test positive). 9,900 don't have it (495 false positives). Total positives: 590. True positives: 95. 95/590 ≈ 16%. Most positives are false.

What change shifts the Bayesian conclusion the most?

Base rate. A 100× increase in prevalence (0.1% → 10%) shifts the conclusion enormously. This is why screening rare conditions in healthy populations often produces more false alarms than true cases — and why doctors are skeptical of positives in low-risk patients.

When You Want To See All Outcomes At Once

So far, we've asked about specific events: "what's the chance of X?" Now we'll ask something bigger: "what's the chance of every possible result?" That gives us a distribution.

A natural question

Flip a coin 4 times. How many heads will you get?

Could be 0. Could be 1, 2, 3, or 4. Some are more likely than others.

Let's count carefully

There are 16 possible sequences of 4 flips.

Each is equally likely. But several sequences give the same total of heads. Let's group them.

0 heads: 1 way (6.25%)

1 head: 4 ways (25%)

2 heads: 6 ways (37.5%)

3 heads: 4 ways (25%)

4 heads: 1 way (6.25%)

2 heads is the most likely (37.5%) because more sequences produce it than any other count: HHTT, HTHT, HTTH, THHT, THTH, TTHH.
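Python can do the grouping for you. A sketch using `itertools.product` to list all 16 sequences and `collections.Counter` to tally them by number of heads; the names are illustrative.

```python
from collections import Counter
from itertools import product

sequences = ["".join(s) for s in product("HT", repeat=4)]  # all 16, equally likely
by_heads = Counter(s.count("H") for s in sequences)

for k in range(5):
    print(f"{k} heads: {by_heads[k]} way(s), {by_heads[k] / len(sequences):.2%}")

print([s for s in sequences if s.count("H") == 2])
```

The counts come out 1, 4, 6, 4, 1, and the six two-head sequences print in the same order as the list above.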

That's a distribution

A distribution just lists every possible outcome and how likely each is.

When the experiment is repeated independent yes/no trials (like coin flips), the distribution has a special name: the binomial distribution.

Play with it

Try changing the number of flips and the coin's bias. Watch how the shape moves.

Default settings: number of flips n = 10, probability of heads p = 0.50, highlighted outcome k = 5.

P(getting exactly k heads in n flips): P(5) = 24.6%

Most likely: 5 heads (the peak of the distribution)

Average (mean): 5.0 = n × p, the long-run average

Spread (std dev): 1.58 = √(n·p·(1−p))

▸ What to notice

Crank n up to 30. Notice how the shape becomes a bell. Drag p away from 0.5 — the peak shifts left or right. Distributions are more useful than single probabilities because they tell you not just the average outcome but how much it varies. A model that gives only "the average" is hiding most of the truth.

Asking real questions

"What's the chance of at least 7 heads?"

For ranges, we add up the relevant bars.

The formula (after the picture)

Now that you've seen the shape, here's the formula behind it.

Binomial probability: P(exactly k heads in n flips) = C(n,k) × p^k × (1−p)^(n−k)

C(n,k) counts how many ways you can arrange k heads among n flips. p^k handles the heads. (1−p)^(n−k) handles the tails.
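The formula is short enough to type in directly. A sketch using Python's `math.comb` (Python 3.8+); `binom_pmf` is an illustrative name. It also answers the earlier range question ("at least 7 heads") by adding up the bars for k = 7 to 10.

```python
from math import comb, sqrt

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
print(f"P(5)          = {binom_pmf(5, n, p):.1%}")
print(f"P(at least 7) = {sum(binom_pmf(k, n, p) for k in range(7, n + 1)):.1%}")
print(f"mean = {n * p:.1f}, std dev = {sqrt(n * p * (1 - p)):.2f}")
```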

A real-world reading

A bank approves 70% of loan applications. They get 10 applications today. What's the chance they approve at least 8?

This is binomial with n = 10, p = 0.70, k ≥ 8. Add up P(8) + P(9) + P(10) — roughly 38%.

Even with a 70% approval rate, getting all 10 approved is unusual (0.7¹⁰ ≈ 2.8%). The shape of the distribution matters more than the average rate.
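The bank example is the same formula with p = 0.70. A sketch with an illustrative `pmf` helper that hard-codes n = 10 and p = 0.70 from the example.

```python
from math import comb

n, p = 10, 0.70  # 10 applications, 70% approval rate (from the example)

def pmf(k: int) -> float:
    """P(exactly k approvals out of n)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_at_least_8 = pmf(8) + pmf(9) + pmf(10)
print(f"P(at least 8 approved) = {p_at_least_8:.1%}")
print(f"P(all 10 approved)     = {pmf(10):.1%}")
```

It prints about 38.3% for at least 8 approvals and about 2.8% for all 10.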

Distributions have shape — and sometimes they lean

Not every distribution is symmetric.

Real-world data is rarely a clean bell. Most distributions lean one way or the other. That lean has a name: skewness.

Three shapes to recognize

LEFT-SKEWED

Long tail on the left. Mean < Median.

SYMMETRIC

No lean. Mean = Median. The bell.

RIGHT-SKEWED

Long tail on the right. Mean > Median.


A simple way to remember

The tail tells you the direction.

If the tail stretches to the right, it's right-skewed (also called positive skew). If the tail stretches left, it's left-skewed (negative skew). The peak is on the opposite side.

Why mean and median pull apart

In a symmetric distribution, mean and median sit in the same place — right in the middle. But in a skewed distribution, the extreme values in the long tail pull the mean toward them. The median, being just the middle ranked value, stays closer to the bulk. So in right-skewed data, mean > median. In left-skewed data, mean < median. Memorize this — it's how you spot skew without seeing the chart.
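A tiny made-up dataset shows the pull. A sketch using Python's `statistics` module; the numbers are hypothetical, chosen only to have one long-tail outlier.

```python
import statistics

# Hypothetical right-skewed data: most values small, one long-tail outlier.
values = [1, 2, 2, 3, 3, 3, 4, 50]

mean = statistics.mean(values)
median = statistics.median(values)
print(mean, median)  # the outlier drags the mean far above the median

# Remove the outlier: the mean collapses, the median holds its ground.
trimmed = values[:-1]
print(statistics.mean(trimmed), statistics.median(trimmed))
```

With the outlier, the mean is 8.5 while the median is 3. Drop the outlier and the mean falls to about 2.6, while the median stays at 3.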

Play with it

Drag the slider. Pull the tail.

Watch what happens to the shape — and especially to the mean (red) versus the median (black). One chases the tail. The other holds its ground.

Skewness: 0.0 (slider runs from left-skewed, through symmetric, to right-skewed)

Shape: symmetric · Mean: 50.0 · Median: 50.0 · Gap: 0.0

Mean and median are equal. This is what a perfectly symmetric distribution looks like: a bell curve, or the number of heads in many fair coin flips.


▸ What you should notice

Start with the slider at zero — mean and median are right on top of each other. Now drag it to the right. The mean (red line) slides toward the tail. The median (black line) barely moves. Now drag it back to the left — the mean swings the other way. The further the slider goes, the bigger the gap between mean and median. This is exactly what skewness is — quantified asymmetry between the bulk of the data and its extremes.

Where skewness shows up — example 1

Long-term stock returns

Here's something counterintuitive: most individual stocks lose money over their lifetime. The market as a whole still wins — because a few stocks win enormously. This is positive skewness in its starkest form.

The Bessembinder study (2018)

Hendrik Bessembinder of Arizona State University looked at every single U.S. stock traded between 1926 and 2016 — about 26,000 stocks in total. He asked one question: did each stock's lifetime return beat one-month Treasury bills (essentially the risk-free rate)? The result was startling.

Figure: Lifetime returns of ~26,000 U.S. stocks (1926–2016)

The stocks are bucketed by lifetime compound return. A vertical dashed line marks the T-bill benchmark: bars left of the line are stocks that lost to T-bills, and bars to the right are winners.

The numbers that changed how people think about stocks

Bessembinder's findings, in plain numbers:

57.4% of all stocks (roughly 4 out of every 7) had lifetime returns worse than 1-month Treasury bills. The median stock was a money-loser.

Just 4.3% of stocks — about 1,092 companies out of 26,000 — accounted for the entire net wealth gain of the U.S. stock market over T-bills: roughly $35 trillion.

The remaining ~24,900 stocks combined contributed approximately zero excess return over T-bills.

The skewness coefficient of the lifetime return distribution: 154.8 — an extraordinarily high number. (A normal distribution has skewness of 0.)

▸ Why this matters — the case for indexing

If you pick a stock at random, you have a 57% chance of underperforming T-bills. Holding only a few stocks doesn't solve this: a small portfolio will most likely miss the rare mega-winners. The market's celebrated long-run returns aren't from typical stocks; they come from a tiny handful of mega-winners (Apple, Microsoft, Amazon, and the like) whose extreme positive returns drag the average up. Miss those few, and you'd underperform.

This is one of the strongest statistical cases for index investing. A broad-market index fund ensures you own the small subset of stocks that drive market gains. Stock pickers, by contrast, must correctly identify the rare winners, a task at which even professionals routinely fail.

Bessembinder's own words

From a 2024 interview: "The essence of a positively skewed distribution is that most of the individual outcomes are lower than the average outcome. The median outcome is less than the average outcome. The only way that can be true is that you've got a few, relatively few, outcomes that are much bigger than the average. So in some sense, this paper is really about skewness in compound returns."

Play it for real

Step into a stock picker's shoes.

Below is a simulated bag of 1,000 stocks whose proportions and shape are calibrated to mimic the Bessembinder distribution. Pick a few. See what happens.

The Bag · 1,000 Stocks

574 Losers (below T-bills)

383 Modest (small wins over T-bills)

40 Winners (big wins)

3 Mega-Winners (life-changing)

⌬ How this works

This is a random generator, not real stock data. Each "stock" in the bag is a synthetic number with a randomly drawn return, and the tickers (CMA, REZ, BNX, etc.) are made up.

What is faithful to Bessembinder (2018) is the shape and proportions: 57.4% lose to T-bills, 38.3% post modest wins, 4% are winners, and 0.3% are mega-winners whose extreme returns drag the average up. The point of the simulator isn't to mimic specific companies — it's to let you feel what positive skewness does to a small portfolio.

How many stocks will you pick?


▸ Try this experiment

Click "Pick 1 Stock" ten times. Most of the time you'll land on a loser — about 57% of attempts. Now click "Pick 5" a few times. Still risky, but better. Now click "Pick All 1,000" — that's an index fund. You captured the mega-winners by force, simply by owning everything.

This is the Bessembinder argument made tactile. You can pick stocks and hope to find the Apple. Or you can own everything and guarantee you own the Apple. Math says the second strategy wins.
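The index-fund argument can be put in numbers with the complement trick. A sketch, assuming the bag's stated composition (40 winners plus 3 mega-winners, so 43 of 1,000) and picks made without replacement; `p_hold_a_winner` is a hypothetical helper. It computes the chance that a random portfolio of k stocks contains at least one of those 43.

```python
from math import comb

BAG = 1_000
BIG_WINNERS = 40 + 3  # "Winners" plus "Mega-Winners" in the simulated bag

def p_hold_a_winner(k: int, big: int = BIG_WINNERS, total: int = BAG) -> float:
    """1 - P(all k picks avoid the big winners), drawing without replacement."""
    return 1 - comb(total - big, k) / comb(total, k)

for k in (1, 5, 20, 100, 1_000):
    print(f"{k:>5} stocks: {p_hold_a_winner(k):.1%} chance of holding a big winner")
```

Passing `big=3` gives the odds of holding one of the mega-winners alone. Only the full 1,000-stock portfolio reaches certainty.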

Where skewness shows up — example 2

In biology and human populations

Many natural and social distributions are right-skewed. The bulk clusters near a typical value, but a long tail of outliers stretches out to the right.

Right-Skewed Distributions In Real Life — Stylized

Three quick examples

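Human lifespan, a useful contrast. Most people live 70–85 years. A few reach over 100, and almost nobody lives to 130, so natural ceilings keep the upper tail short. Deaths at younger ages stretch a longer tail to the left, which is why, in developed countries, the distribution of age at death is usually left-skewed rather than right-skewed.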

Body weight, plant height, organ size. Biological measurements usually cluster around a central value with a longer tail on the upper end. Growth processes (multiplicative, not additive) naturally produce right-skewed shapes.

Income. The most famously skewed of all. Median household income is far below the mean, because billionaires pull the average up while doing nothing to the middle. This is why governments usually headline median income rather than the mean. The mean would mislead.

A surprising universal pattern

Why do so many real-world distributions skew right rather than left? Often because the underlying process is multiplicative rather than additive. Wealth compounds. Bacteria multiply. Cities grow proportionally. When things tend to grow by a percentage (not a fixed amount), the result is a roughly log-normal distribution, which looks symmetric only when you take the log of the values. Stock prices behave this way too: log returns are nearly symmetric, but price changes in dollars are right-skewed.
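The multiplicative story can be simulated directly. A sketch under a hypothetical model (not fitted to any real data): each of 10,000 units starts at 1.0 and is multiplied by a random factor between 0.8 and 1.25 in each of 50 periods. The seed only makes the run repeatable.

```python
import math
import random
import statistics

rng = random.Random(7)  # seeded so results are repeatable

# Hypothetical model: each unit changes by a random -20% to +25% per period.
values = []
for _ in range(10_000):
    v = 1.0
    for _ in range(50):
        v *= rng.uniform(0.8, 1.25)
    values.append(v)

logs = [math.log(v) for v in values]
print(f"raw values: mean {statistics.mean(values):.2f} vs median {statistics.median(values):.2f}")
print(f"log values: mean {statistics.mean(logs):.2f} vs median {statistics.median(logs):.2f}")
```

The raw values come out clearly right-skewed, with the mean well above the median, while the logs have mean and median almost on top of each other.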

Why skewness matters

A bell curve is the exception, not the rule.

Most real-world data is skewed. Treating skewed data as if it were symmetric leads to wrong averages, mispriced risk, and bad decisions. Always check the shape.

▸ Three practical habits

1. Always plot a histogram before summarizing. The mean alone hides skew.

2. Report median for skewed data. Income, response times, lifetimes, stock losses — these all need medians, not means.

3. Watch the tails. In finance, biology, and operations, the long tail is where the rare-but-huge events live. Standard deviation treats both tails alike and says nothing about asymmetry. Tail-focused measures like Value-at-Risk and CVaR exist for exactly this reason.

Quick check

In 4 coin flips, the most likely number of heads is:

2 heads. For a fair coin, n × p = 4 × 0.5 = 2. The probabilities go 6.25%, 25%, 37.5%, 25%, 6.25% for k = 0, 1, 2, 3, 4. More ways to get 2 heads than any other count.

Household income data: the mean is $85,000 and the median is $52,000. What does this tell you?

Right-skewed. Mean > Median means the long tail is on the right — high-income outliers pull the average up while leaving the median (the middle person) unchanged. Classic income shape.

In Bessembinder's study of 26,000 U.S. stocks, lifetime returns were positively skewed. What does that imply?

A few extreme winners pull the mean above the median. 57.4% of stocks underperformed Treasury bills over their lifetimes. The median stock lost to T-bills. But the mean was positive — because just 4.3% of stocks (1,092 companies out of 26,000) generated the entire $35 trillion in net wealth above T-bills. The market's celebrated returns come from a tiny tail of mega-winners.

Expected Value — How To Use Probability

Probability is interesting. Decisions are useful. Expected value is the bridge — the simplest, most powerful tool for asking: "should I take this bet?"

A friend offers you a deal

"Let's flip a coin. Heads, I give you $20. Tails, you give me $15."

Take it? Skip it? How would you even decide?

Imagine playing 100 times

Roughly 50 heads, 50 tails. So 50 × $20 = $1,000 gained. And 50 × $15 = $750 lost.
Net: +$250 over 100 flips. That's +$2.50 per flip on average.

That number has a name

It's called expected value.

The average outcome of a random event if you played it many times. Positive EV → take the bet. Negative EV → walk away.

Expected Value: EV = (probability of A × outcome A) + (probability of B × outcome B) + ...

For the coin bet: EV = (0.5 × +$20) + (0.5 × −$15) = $10 − $7.50 = +$2.50

A few quick EV calculations

Roll a die. Win $30 on a 6, lose $5 otherwise.
EV = (1/6 × $30) + (5/6 × −$5) = $5 − $4.17 = +$0.83 per roll. Take it.

Same game but $10 win on a 6, lose $5 otherwise.
EV = (1/6 × $10) + (5/6 × −$5) = $1.67 − $4.17 = −$2.50 per roll. Skip it.

Coin flip. Win $1, lose $1.
EV = (0.5 × $1) + (0.5 × −$1) = $0. Fair bet. Neither side has an edge.
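The EV formula is a one-line sum over outcomes. A sketch that encodes each bet as (probability, payoff) pairs with exact fractions; `expected_value` and the bet names are illustrative. It reproduces the four calculations above.

```python
from fractions import Fraction

def expected_value(outcomes):
    """EV = sum of probability x payoff over every outcome."""
    return sum(p * payoff for p, payoff in outcomes)

coin_bet = [(Fraction(1, 2), 20), (Fraction(1, 2), -15)]
die_30 = [(Fraction(1, 6), 30), (Fraction(5, 6), -5)]
die_10 = [(Fraction(1, 6), 10), (Fraction(5, 6), -5)]
fair = [(Fraction(1, 2), 1), (Fraction(1, 2), -1)]

for name, bet in [("coin", coin_bet), ("die $30", die_30), ("die $10", die_10), ("fair", fair)]:
    print(f"{name:8} EV = {float(expected_value(bet)):+.2f} per play")
```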

Build your own bet

Dial the numbers. Watch the EV update.

A two-outcome bet: probability of winning, payout when you win, cost when you lose. Find the edge — or the trap.

Default settings: probability of winning 50%, payout if you win $20, cost if you lose $15.

▸ Notice something subtle

A bet can have high probability of winning and still be a bad deal. Or low probability and still be great. Probability alone doesn't tell you what to do; the outcomes matter just as much. Try setting the probability to 99% with a $1 payout and a $200 loss. You'll win almost every time. But the EV is (0.99 × $1) + (0.01 × −$200) = −$1.01 per bet, more than you gain on a typical win.

A real-world bet

The lottery.

A ticket costs $2. The jackpot is $50 million. Odds of winning: 1 in 300 million.
Should you buy a ticket?

Expected Value of one lottery ticket: EV = (1/300,000,000 × $50,000,000) + (≈1 × −$2)
= $0.167 − $2.00
= −$1.83 per ticket

▸ The lottery is mathematically a terrible bet

Every $2 you spend on lottery tickets, on average, returns 17 cents. You lose $1.83 per ticket in expected value. The jackpot is so massive it sounds appealing — but it's far smaller than 300 million times the ticket price. The math doesn't lie.
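The same calculation with exact fractions, as a sketch. It assumes the ticket price is paid on every ticket, winner or not, so the loss term is exactly −$2 rather than ≈1 × −$2; the names are illustrative.

```python
from fractions import Fraction

ticket = 2
jackpot = 50_000_000
p_win = Fraction(1, 300_000_000)

# You always pay for the ticket; the jackpot arrives only on a win.
ev = p_win * jackpot - ticket
print(f"EV per ticket: {float(ev):+.2f}")
print(f"Average return per $2 ticket: {float(p_win * jackpot):.2f}")
print(f"Jackpot needed to break even: {int(ticket / p_win):,}")
```

It prints −1.83 per ticket and shows the jackpot would need to reach $600,000,000 just to break even.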

Why do people still buy? Two real reasons. First, entertainment value: $2 for a few days of imagining yourself rich isn't crazy if you don't buy many. Second, EV doesn't capture what a life-changing sum means to a particular person. For some, the lottery feels like the only conceivable path to that kind of capital, however unlikely.

Now the opposite puzzle

Insurance.

You pay $1,200/year for home insurance. Your chance of a claim is roughly 4% per year. The average claim payout is $15,000.
What's the EV? And should you still buy it?

EV of buying insurance (from your perspective): EV = (0.04 × +$15,000) + (0.96 × $0) − $1,200
= $600 − $1,200
= −$600 per year

So why on earth buy insurance?

Because the EV calculation doesn't account for ruin.

When EV is the wrong question

If you lose your home and have no insurance, you don't just lose $300,000; you may lose your ability to keep playing. Averaging only works if you survive to play again. Mathematicians call this ruin risk: the chance that one bad outcome ends the game.

For ruin-risk decisions, smart players willingly accept negative EV in exchange for variance reduction. Insurance, diversification, seatbelts, vaccines. You're paying to avoid the catastrophic tail, not to maximize the average.
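Comparing the mean and the spread of each choice shows the trade. A sketch that treats the example as a single yearly gamble with exactly one possible $15,000 claim (a simplifying assumption); the variable names are illustrative.

```python
import math

p_claim, claim, premium = 0.04, 15_000, 1_200

# Uninsured: lose $15,000 with 4% probability, otherwise nothing.
mean_uninsured = -p_claim * claim
sd_uninsured = math.sqrt(p_claim * (1 - p_claim)) * claim

# Insured: always pay the premium; the claim is covered.
mean_insured, sd_insured = -premium, 0.0

print(f"uninsured: mean {mean_uninsured:+,.0f}, std dev {sd_uninsured:,.0f}")
print(f"insured:   mean {mean_insured:+,.0f}, std dev {sd_insured:,.0f}")
```

Buying insurance lowers the mean by $600 a year, matching the EV above, and shrinks the spread from roughly $2,900 to zero.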

▸ The deeper rule

EV is the right tool when you can play the bet many times and small losses don't end you. Coin flips, dice games, business decisions, investments at small position size — all EV-driven. EV is the wrong tool when one loss is catastrophic. Then variance, not average, is what to manage.

See the variance for yourself

Play the same bet 1,000 times in a row.

A bet with positive EV should make you money long-term. But you'll see along the way that the path is bumpy — sometimes brutally so.

Win probability: 52%

Bet size: $100

Chart: cumulative profit over 1,000 bets. EV per bet: +$4.00

Stats after each run: final P&L, worst drawdown, best peak, and whether the bankroll was ruined.

Press Run 1,000 Bets to watch a single path. Press Run 3 Paths to compare how wildly the same bet can play out depending on luck.

▸ The lesson hidden inside

Even with a clearly positive EV (+$4 per bet), the path is wild. A single run might end up with a $4,000 profit, a $1,500 loss, or anywhere in between. Positive expected value is a long-run promise, not a short-run guarantee; it's the same lesson the law of large numbers (LLN) teaches in the next tab. The bigger your bet size, the wilder the swings. The more bets you play, the more reliably your average result per bet approaches the EV.
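A single path like the one in the chart can be generated in a few lines. A sketch, assuming even-money bets (win or lose the stake) as in the default settings; `run_bets` and `worst_drawdown` are hypothetical helpers, and the seeds only make runs repeatable. Different seeds give very different paths for the same +$4 EV per bet.

```python
import random

def run_bets(n_bets: int = 1_000, p_win: float = 0.52, bet: int = 100, seed: int = 3):
    """Return the cumulative profit after each of n_bets even-money bets."""
    rng = random.Random(seed)
    profit, path = 0, []
    for _ in range(n_bets):
        profit += bet if rng.random() < p_win else -bet
        path.append(profit)
    return path

def worst_drawdown(path):
    """Largest drop from a running peak (starting from the initial 0)."""
    peak, worst = 0, 0
    for value in path:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst

for seed in (1, 2, 3):
    path = run_bets(seed=seed)
    print(f"seed {seed}: final {path[-1]:+,}, best peak {max(path):+,}, "
          f"worst drawdown {worst_drawdown(path):,}")
```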

One last subtle idea

Even on a positive-EV bet, you can bet too much.

If you bet 100% of your bankroll on every positive-EV flip, you'll almost certainly go bust. Sounds wrong? Watch.

The math of doubling down

Say you have $1,000, and you bet it all on a coin flip that pays 2-to-1 (you win $2,000 on heads, lose $1,000 on tails). EV is positive: (0.5 × $2,000) + (0.5 × −$1,000) = +$500 per bet.

Play once: 50/50 you have $3,000 or $0.
Play twice (if you survived round one): 50/50 you have $9,000 or $0.

Probability of being ruined after 10 rounds: 1 − 0.5¹⁰ = 99.9%.

Positive EV did not save you. The size of your bet, relative to your bankroll, determines whether you survive long enough for EV to play out.
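The numbers in this example can be reproduced exactly. A sketch using exact fractions; `all_in` is a hypothetical helper. It shows expected wealth growing by 1.5× per round while the chance of surviving halves each round.

```python
from fractions import Fraction

def all_in(rounds: int, start: int = 1_000, p_heads: Fraction = Fraction(1, 2)):
    """Bet the whole bankroll each round on a 2-to-1 coin flip (win 2x the stake)."""
    p_survive = p_heads ** rounds
    wealth_if_survived = start * 3 ** rounds  # each win triples the bankroll
    expected_wealth = p_survive * wealth_if_survived
    return 1 - p_survive, wealth_if_survived, expected_wealth

p_ruin, jackpot, expected = all_in(10)
print(f"P(ruined after 10 rounds) = {float(p_ruin):.1%}")
print(f"expected wealth = {float(expected):,.0f}, yet you almost surely hold $0")
```

Expected wealth after 10 rounds is about $57,665, yet that average is carried entirely by the 1-in-1,024 path that ends at $59,049,000.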

The Kelly criterion in one sentence

For positive-EV bets, the optimal fraction of your bankroll to bet isn't 100%. It's something smaller — driven by how strong your edge is and how big your potential loss is. Named after John Kelly (1956), the rule maximizes long-run growth while avoiding ruin. Professional poker players, hedge fund managers, and sports bettors all think in Kelly fractions, not raw dollar amounts.
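The text doesn't give the formula, so the sketch below uses the standard textbook form for a simple win/lose bet, stated here as an assumption: for a bet that pays b times the stake with probability p and loses the stake otherwise, the Kelly fraction is f* = p − (1 − p)/b. It also computes the expected log growth per bet, which Kelly betting maximizes; the function names are illustrative.

```python
import math

def kelly_fraction(p: float, b: float) -> float:
    """Kelly fraction f* = p - (1 - p) / b for a bet paying b-to-1 (clamped at 0)."""
    return max(0.0, p - (1 - p) / b)

def log_growth(f: float, p: float, b: float) -> float:
    """Expected log growth per bet when staking fraction f of the bankroll."""
    if f >= 1:
        return -math.inf  # one loss wipes out the bankroll
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

p, b = 0.5, 2  # the 2-to-1 coin flip from the example above
print(f"Kelly fraction: {kelly_fraction(p, b):.0%}")
for f in (0.1, 0.25, 0.5, 1.0):
    print(f"bet {f:>4.0%} each time: log growth {log_growth(f, p, b):+.4f} per bet")
print(f"52% even-money bet: {kelly_fraction(0.52, 1):.0%}")
```

For the 2-to-1 coin flip the sketch gives 25% of the bankroll. Betting 50% each time already drives long-run growth to zero, and betting 100% gives minus infinity: eventual ruin. The 52% even-money bet from the variance simulator comes out at 4%.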

▸ Where this shows up in real life

Investing: position sizing matters more than stock picking. A 90% confident trade still shouldn't be your entire portfolio.

Poker: the entire game is EV calculation, hand by hand, plus bankroll management.

Business decisions: startup founders who go all-in on one bet face both bigger potential returns and a higher risk of ruin than those who diversify. EV is necessary. Sizing is what makes EV realizable.

Quick check

A bet: 70% chance of winning $10, 30% chance of losing $30. What's the EV?

−$2. EV = (0.7 × +$10) + (0.3 × −$30) = $7 − $9 = −$2. You'd win 70% of the time but the rare losses are too big — skip this bet.

Why do people buy insurance even though its expected value is negative?

Catastrophic-loss avoidance. EV assumes you can play many times — but if one bad outcome ends the game (house burns down, you can't rebuild), then variance is what matters, not average. Insurance trades expected dollars for survival.

You have a clearly positive-EV bet. To maximize long-run growth, you should bet:

A fraction of your bankroll. Betting 100% means one loss takes you to zero — and zero is permanent. The Kelly criterion gives the math, but the intuition is: positive EV only helps if you stay alive long enough to realize it.
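EV is just a probability-weighted sum, so it's easy to compute exactly. This sketch uses Python's `fractions.Fraction` to avoid rounding. `expected_value` is a hypothetical helper that takes (probability, payoff) pairs. It reproduces the first quick-check bet and the 2-to-1 coin flip from earlier in this tab.

```python
from fractions import Fraction

def expected_value(outcomes):
    """Sum of probability x payoff over (probability, payoff) pairs."""
    if sum(p for p, _ in outcomes) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)

# Quick check: 70% chance of +$10, 30% chance of -$30.
quick_check = [(Fraction(7, 10), 10), (Fraction(3, 10), -30)]
# All-in coin flip: 50% chance of +$2,000, 50% chance of -$1,000.
coin_flip = [(Fraction(1, 2), 2000), (Fraction(1, 2), -1000)]

print(expected_value(quick_check))  # -2
print(expected_value(coin_flip))    # 500
```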

Why Probability Actually Works

Probability says "a coin lands heads 50% of the time." But your last 10 flips might have been 7 heads and 3 tails. Who's right? Both. Let's see why.

The puzzle to solve

If P(heads) = 50%, why don't 10 flips give exactly 5 heads?

The short answer: probability is a long-run truth, not a short-run guarantee. Let's watch this play out.

Watch it live

Press the button. We'll flip a coin 1,000 times and plot the running percentage of heads after each flip.

Chart: running proportion of heads after each flip.

▸ What you'll see

The first 50 flips: wild. The line jumps between 30% and 70%. By flip 200 it's tightening. By flip 1000 it's hovering near 50% with tiny wiggles. The convergence is real but slow. Probability is a promise about the long run — and the long run takes a while to arrive.

This is called

The Law of Large Numbers.

As you do more trials, the observed frequency gets closer and closer to the true probability. Not on every single flip, but over enough trials the chance of a big gap shrinks toward zero.
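The running-proportion experiment is easy to reproduce. A minimal sketch, assuming a fair coin simulated with Python's `random` module and a fixed seed so runs are repeatable; `running_proportion` is a hypothetical name.

```python
import random

def running_proportion(n_flips=1000, seed=None):
    """Proportion of heads after each of n_flips fair coin flips."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for flip in range(1, n_flips + 1):
        if rng.random() < 0.5:
            heads += 1
        proportions.append(heads / flip)
    return proportions

props = running_proportion(100_000, seed=42)
for n in (10, 50, 200, 1_000, 10_000, 100_000):
    gap = abs(props[n - 1] - 0.5)
    print(f"after {n:>7,} flips: {props[n - 1]:.3f} heads (off by {gap:.3f})")
```

Individual checkpoints can still wander. The typical gap shrinks roughly like 1/√n, so each extra decimal place of accuracy costs about 100 times more flips. That is why the convergence feels slow.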

Where this law shows up in real life

Insurance. An insurer can't predict whether you will crash your car. But across millions of customers, the average claim rate is rock-solid predictable. That's how they set premiums.

Polling. Ask 50 people who they'll vote for — the result is noisy. Ask 5,000 — the result tracks the true population very closely.

Casinos. American roulette (with both 0 and 00) has a 5.26% house edge. On any single spin, the house might lose big. Across millions of spins, its edge realizes with near certainty. The casino isn't lucky. It's patient.
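The 5.26% figure comes straight from counting. A sketch, assuming a single-number (straight-up) bet that pays 35 to 1. An American wheel has 38 pockets (1 to 36, 0, 00), and a European wheel, shown for comparison, has 37. `house_edge` is a hypothetical helper.

```python
from fractions import Fraction

def house_edge(pockets, payout=35):
    """Expected loss per $1 on a single-number bet paying payout-to-1."""
    p_win = Fraction(1, pockets)
    ev = p_win * payout + (1 - p_win) * (-1)
    return -ev

for name, pockets in (("American", 38), ("European", 37)):
    edge = house_edge(pockets)
    print(f"{name}: edge = {edge} = {float(edge):.2%}")
```

On the American wheel you win 35 in 38 spins' worth of payout but lose 37, a net of −2/38 = −1/19 per dollar. The extra 00 pocket is what roughly doubles the edge over the European wheel's 1/37.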

One more thing to watch

Why does the bell curve show up everywhere?

Let's drop some balls down a pegboard and find out.

The Galton Board, explained

A ball drops down through rows of pegs. At every peg it bounces left or right, equally likely — like a coin flip. After 10 rows of bounces, where does it land? Almost always near the middle, because it takes a long unbroken run of "all lefts" or "all rights" to drift to the edges.
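Because each bounce is an independent 50/50 choice, the landing slot is just the number of right bounces in 10 tries. That count follows a binomial distribution: P(slot k) = C(10, k) / 2¹⁰. This sketch computes those exact probabilities and also simulates drops. `ROWS`, `slot_probabilities`, and `drop_balls` are hypothetical names, and slots are numbered 0 (all lefts) to 10 (all rights).

```python
import math
import random
from collections import Counter

ROWS = 10

def slot_probabilities(rows=ROWS):
    """Exact P(ball lands in slot k), where k = number of right bounces."""
    return [math.comb(rows, k) / 2 ** rows for k in range(rows + 1)]

def drop_balls(n_balls, rows=ROWS, seed=None):
    """Simulate n_balls drops; return how many landed in each slot."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_balls):
        rights = sum(1 for _ in range(rows) if rng.random() < 0.5)
        counts[rights] += 1
    return [counts[k] for k in range(rows + 1)]

probs = slot_probabilities()
counts = drop_balls(10_000, seed=7)
for k, (p, c) in enumerate(zip(probs, counts)):
    print(f"slot {k:>2}: exact {p:6.2%}  simulated {c / 10_000:6.2%}  {'#' * (c // 100)}")
```

The middle three slots (4, 5, 6) together take 672/1024, about 66% of the balls, while each edge slot needs 10 identical bounces in a row and gets only 1/1024.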


▸ The bell curve — born from coin flips

Look at what just happened. Each ball took 10 independent random bounces. The collection of where they landed forms a bell curve. This is exactly the binomial from the last tab — but seen in physical form. The bell shape isn't a coincidence. It's forced by independence and counting. This is called the Central Limit Theorem, and it explains why bell curves appear so often in nature and business.

The deepest law in probability

Take any random process: weird, skewed, lumpy, doesn't matter, as long as its spread (its variance) is finite. Take many independent samples and compute their average. Repeat that many times, and the averages form a bell curve centered at the true mean, getting narrower as each average includes more samples. This is why polling works, why insurance works, why averaging investments reduces risk, why bell curves appear when you measure heights, errors, returns. It is the most consequential result in all of probability.
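A quick way to see this for yourself is to start from a strongly skewed process and watch the averages. This sketch assumes an exponential distribution with mean 1 (most values small, a long tail of large ones) and averages of 50 draws, both arbitrary choices. If the averages really are bell-shaped, about 95% of them should land within 1.96 standard errors (σ/√n) of the true mean.

```python
import math
import random
import statistics

def sample_means(n_per_sample=50, n_samples=10_000, seed=None):
    """Average n_per_sample exponential(mean 1) draws, n_samples times."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n_per_sample))
            for _ in range(n_samples)]

n = 50
means = sample_means(n, seed=11)
std_error = 1.0 / math.sqrt(n)  # the exponential(1) has standard deviation 1
within = sum(abs(m - 1.0) <= 1.96 * std_error for m in means) / len(means)

print("mean of the averages:", round(statistics.fmean(means), 3))
print("spread of the averages:", round(statistics.stdev(means), 3),
      "vs predicted", round(std_error, 3))
print("share within 1.96 standard errors:", round(within, 3))
```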

Three Famous Puzzles

Three problems that have tripped up mathematicians, doctors, and Nobel laureates. You have all the tools to solve them now. Let's take them one at a time — slowly.

Before we begin

Probability is counterintuitive.

Your first guesses on these puzzles will probably be wrong. That's normal. The cure isn't intelligence — it's counting carefully.

Puzzle One — The Monty Hall Problem

You're on a game show.
Three doors. One car. Two goats.

Here's how it works

1. You pick a door (say Door 1).

2. The host — who knows where the car is — opens another door (say Door 3) to reveal a goat.

3. The host then asks: "Do you want to switch to Door 2?"

Pause and guess

Should you stay, switch, or does it not matter?

Most people say "it doesn't matter, two doors left, 50/50." That answer is wrong. Let's count it out.


▸ The reveal — switching wins 2/3 of the time

Here's why. When you first picked, you had a 1/3 chance of being right. That means 2/3 of the probability lived behind the other two doors combined. When the host opens one of those (knowing it's a goat), all that 2/3 probability collects on the remaining unopened door. So switching gives you 2/3. Staying gives you 1/3. The host's knowledge is what makes this work.
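If the argument still feels slippery, simulating many games settles it. A sketch under the rules above: the car is placed uniformly at random, you pick uniformly at random, and the host, who knows where the car is, opens a goat door you didn't pick (choosing at random if two qualify). The function names are hypothetical.

```python
import random

DOORS = (0, 1, 2)

def play(switch, rng):
    """Play one game; return True if the player ends up with the car."""
    car = rng.choice(DOORS)
    pick = rng.choice(DOORS)
    # The host knows where the car is and opens a goat door you didn't pick.
    opened = rng.choice([d for d in DOORS if d not in (pick, car)])
    if switch:
        # Move to the only door that is neither your pick nor the opened one.
        pick = next(d for d in DOORS if d not in (pick, opened))
    return pick == car

def win_rate(switch, n_games=100_000, seed=None):
    """Fraction of n_games won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(n_games)) / n_games

print("stay:  ", round(win_rate(switch=False, seed=1), 3))
print("switch:", round(win_rate(switch=True, seed=2), 3))
```

Change the host line to open a random unpicked door (sometimes revealing the car, and discarding those games) and the advantage disappears, which is the sense in which the host's knowledge makes this work.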

Puzzle Two — The Birthday Paradox

How many people need to be in a room before it's more likely than not that two share a birthday?

Pause and guess

There are 365 days. So intuitively, you might guess around 180 people. The real answer is 23. Let's see why.

With 23 people in the room, P(shared birthday) = 50.7%: at least one pair shares a birthday slightly more often than not.

Number of pairs = n × (n − 1) / 2 = 23 × 22 / 2 = 253.

▸ Why so few people?

The trick: you're not asking "does someone share my birthday?" You're asking "do any two people share?" With 23 people, there are 253 different pairs being compared, and 253 chances for a match add up fast. The pair count grows quickly: 10 people = 45 pairs. 30 people = 435 pairs. "Some pair, anywhere" probabilities compound much faster than intuition expects.
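The exact numbers come from the complement: compute the chance that all n birthdays are different, then subtract from 1. A sketch, assuming 365 equally likely birthdays (no leap days, no seasonal clustering); the helper names are hypothetical.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), assuming every one of
    `days` birthdays is equally likely and ignoring February 29."""
    if n > days:
        return 1.0  # pigeonhole: more people than days guarantees a match
    p_all_different = 1.0
    for i in range(n):
        p_all_different *= (days - i) / days  # person i+1 avoids the first i birthdays
    return 1.0 - p_all_different

def n_pairs(n):
    """Number of distinct pairs among n people: n x (n - 1) / 2."""
    return n * (n - 1) // 2

for n in (10, 23, 30, 50):
    print(f"{n:>2} people: {n_pairs(n):>4} pairs, P(shared) = {p_shared_birthday(n):.1%}")

smallest = next(n for n in range(1, 366) if p_shared_birthday(n) > 0.5)
print("smallest group with better-than-even odds:", smallest)
```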

Puzzle Three — The Gambler's Fallacy

A coin lands Heads ten times in a row.
What's the chance the next flip is Tails?

Most people think

"It's due for tails. Maybe 80% chance of tails now." Wrong.

The answer

Still 50%.

The coin has no memory. Past flips don't influence future flips. Believing otherwise is the gambler's fallacy, and it is one of the costliest mistakes a gambler can make.

Why the fallacy feels right

You know that in the long run, heads happen 50% of the time. So when 10 heads happen in a row, your brain thinks "the universe will balance it out." But the universe doesn't correct individual streaks; it averages across much larger samples. The 10 heads happened. They're done. The next flip starts fresh at 50/50. Long-run convergence isn't a retroactive correction, it's dilution: the 10 extra heads never get cancelled, but they become a vanishing fraction of the total as thousands of ordinary 50/50 flips pile up after them.
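Two small checks make this concrete. The first simulates a fair coin and looks only at flips that come right after three heads in a row (three rather than ten, so such streaks occur often enough to measure quickly). The share of tails stays near 50%. The second is the dilution arithmetic: after a 10-heads streak, the expected overall share of heads drifts back toward 50% only because new 50/50 flips outnumber the streak. The function names are hypothetical.

```python
import random

def tails_after_streak(streak=3, n_flips=1_000_000, seed=None):
    """Share of tails among flips that immediately follow `streak` heads in a row."""
    rng = random.Random(seed)
    run = 0        # consecutive heads just before the current flip
    followups = 0  # flips that came right after a qualifying streak
    tails = 0
    for _ in range(n_flips):
        is_heads = rng.random() < 0.5
        if run >= streak:
            followups += 1
            if not is_heads:
                tails += 1
        run = run + 1 if is_heads else 0
    return tails / followups

def expected_heads_share(extra_flips, streak=10):
    """Expected share of heads after `streak` heads plus extra_flips fair flips."""
    return (streak + 0.5 * extra_flips) / (streak + extra_flips)

print("P(tails | 3 heads just happened) ~", round(tails_after_streak(seed=3), 3))
for extra in (0, 10, 100, 1_000, 100_000):
    print(f"{extra:>7,} more flips: expected heads share {expected_heads_share(extra):.4f}")
```

The surplus of 10 heads is still there after 100,000 more flips; it just no longer moves the percentage.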

▸ The mirror error — Hot Hand Fallacy

The opposite error: assuming streaks continue. A trader sees three winning days and decides he's "in the zone." A gambler sees red hit four times and bets red again because it's "running hot." Both fallacies come from the same misunderstanding: independent events have no memory. Each flip, spin, or trade resets the probability.

You've reached the end

From one coin flip to Bayes' theorem and Monty Hall. You now have the conceptual machinery that statistics, finance, science, and machine learning are built on.

Come back when you want to refresh — these ideas reward repeated visits.