The Game That Seems Impossible
Think of any object. A cat, a skyscraper, a slice of pizza. Now imagine an AI asks you just 4-6 yes/no questions and correctly guesses what you were thinking.
Sounds like magic? It's actually math. Specifically, it's Shannon entropy and information theory — the same principles that power compression algorithms, cryptography, and modern machine learning.
Let's break down exactly how it works.
Shannon Entropy: Measuring Uncertainty
In 1948, Claude Shannon published "A Mathematical Theory of Communication" and forever changed how we think about information. His key insight: information is surprise.
If I tell you "the sun rose today," that carries almost zero information — you already expected it. But if I tell you "it snowed in the Sahara," that's highly informative because it's unexpected.
Shannon formalized this with entropy, a measure of uncertainty:

H = -Σ p(x) × log2(p(x)), summed over every possible outcome x
The higher the entropy, the more uncertain you are, and the more information you need to resolve that uncertainty.
A Simple Example
Imagine a bag with 8 equally likely items. The entropy is:
H = -8 × (1/8 × log2(1/8))
H = -8 × (1/8 × -3)
H = 3 bits
This means you need exactly 3 yes/no questions to identify the item. Each perfect question cuts the remaining possibilities in half.
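The 8-item calculation above can be checked directly; a minimal sketch in Python:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A bag of 8 equally likely items:
H = entropy([1/8] * 8)
print(H)  # 3.0 bits -> exactly 3 perfect yes/no questions
```

The `if p > 0` guard skips impossible outcomes, which contribute zero entropy by convention.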
Binary Search on the Property Space
Here's how the AI actually works under the hood:
- Property database — Every object has a vector of properties: is it alive? is it bigger than a breadbox? can you eat it?
- Entropy calculation — For each possible question, the AI calculates how much entropy it would eliminate.
- Greedy selection — It picks the question that maximizes information gain — the question that most evenly splits the remaining candidates.
- Update beliefs — After your answer, the AI eliminates all objects inconsistent with it and recalculates.
This is essentially binary search on a property space rather than a sorted list.
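The four steps above can be sketched in a few lines of Python. The property names and toy objects here are illustrative assumptions, not any real system's database:

```python
import math

def entropy(n):
    """Entropy in bits of n equally likely candidates."""
    return math.log2(n) if n > 0 else 0.0

def best_question(candidates, questions):
    """Greedy selection: pick the question whose yes/no split of the
    remaining candidates gives the highest expected entropy reduction."""
    n = len(candidates)
    best, best_gain = None, -1.0
    for q in questions:
        yes = sum(1 for obj in candidates if obj[q])
        no = n - yes
        # Expected entropy after asking q, weighted by answer probability.
        after = (yes / n) * entropy(yes) + (no / n) * entropy(no)
        gain = entropy(n) - after
        if gain > best_gain:
            best, best_gain = q, gain
    return best, best_gain

# Hypothetical toy database: object -> property vector.
candidates = [
    {"alive": True,  "edible": False},   # cat
    {"alive": False, "edible": True},    # pizza
    {"alive": False, "edible": False},   # skyscraper
    {"alive": True,  "edible": True},    # chicken
]
q, gain = best_question(candidates, ["alive", "edible"])
print(q, gain)  # either property splits 2/2, so the gain is a full 1.0 bit
```

After each answer, you would filter `candidates` to those consistent with it and call `best_question` again on the survivors.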
Why 4 Questions, Not 20?
Humans playing 20 Questions waste questions. They ask things like "Is it a type of food?" when only 12% of objects are food. A "yes" eliminates 88% of the possibilities, but a "no" eliminates just 12%: a very uneven split.
The AI always asks questions that split possibilities close to 50/50, which is mathematically optimal.
| Strategy | Avg. Questions | Efficiency |
|---|---|---|
| Random human | 15-20 | ~30% |
| Experienced human | 8-12 | ~55% |
| Entropy-optimal AI | 4-6 | ~90% |
| Theoretical minimum | log2(N) | 100% |
With a database of ~30 common objects, log2(30) ≈ 4.9 bits. The AI typically needs 4-6 questions because real-world properties aren't perfectly binary and some objects share many properties.
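The theoretical floor for any database size follows from the same arithmetic; a quick sketch:

```python
import math

def questions_needed(n_objects):
    """Theoretical minimum number of perfect yes/no questions
    to pin down one of n equally likely objects: ceil(log2(n))."""
    return math.ceil(math.log2(n_objects))

print(round(math.log2(30), 2))   # ~4.91 bits of uncertainty
print(questions_needed(30))      # 5 perfect questions
```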
Information Gain: Picking the Best Question
The secret sauce is information gain, the expected reduction in entropy after asking a question:

IG(Q) = H(before) - [p(yes) × H(after yes) + p(no) × H(after no)]
A perfect question has information gain of exactly 1 bit — it cuts uncertainty in half. The AI greedily picks the highest-gain question at each step.
Example Walkthrough
Say we have 16 candidate objects. Entropy = 4 bits.
- Q1: "Is it alive?" — Splits 8/8. Gain: 1 bit. Remaining: 3 bits.
- Q2: "Can you hold it in your hand?" — Splits 4/4. Gain: 1 bit. Remaining: 2 bits.
- Q3: "Is it found indoors?" — Splits 2/2. Gain: 1 bit. Remaining: 1 bit.
- Q4: "Is it used daily?" — Splits 1/1. Gain: 1 bit. Solved!
4 questions, 16 objects, zero wasted bits.
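The walkthrough above can be replayed as a tiny simulation, assuming every question splits the field exactly in half:

```python
# 16 candidates; each perfect 50/50 question keeps exactly half.
candidates = 16
questions = 0
while candidates > 1:
    candidates //= 2   # halve the remaining field
    questions += 1
print(questions)  # 4 questions for 16 objects: log2(16) = 4, zero wasted bits
```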
Interactive: Entropy Calculator
Adjust the number of objects and see how many questions the AI needs. For a default database of ~30 objects, the answer is about 5.
Try It Yourself!
Think of something and see if the AI can guess it in under 6 questions.
Beyond Games: Where This Math Lives
Shannon entropy isn't just a party trick. It's the backbone of:
- Data compression (ZIP, MP3, JPEG) — remove bits that carry no information
- Decision trees in machine learning — split on features with highest information gain
- Cryptography — maximize entropy to make messages unpredictable
- Natural language processing — predict next words by measuring surprise
- Medical diagnosis — ask the most discriminating tests first
The 20 Questions game is, in a way, the purest expression of how information theory works. Every question is a measurement. Every answer reduces uncertainty. Optimality means asking the right questions.
Key Takeaways
- Entropy measures uncertainty — more possible outcomes = more bits needed.
- Optimal questions split 50/50 — anything else wastes information capacity.
- log2(N) is the floor — you can't beat math, but you can get close.
- Real-world noise adds 1-2 extra questions — property overlap and ambiguity cost bits.
- This is how all AI "thinks" — from ChatGPT to medical AI, reducing uncertainty is the game.