AI · Games · Information Theory

How AI Guesses What You're Thinking in Just 4 Questions

March 26, 2026 · 8 min read · pure-flon

The Game That Seems Impossible

Think of any object. A cat, a skyscraper, a slice of pizza. Now imagine an AI asks you just 4-6 yes/no questions and correctly guesses what you were thinking.

Sounds like magic? It's actually math. Specifically, it's Shannon entropy and information theory — the same principles that power compression algorithms, cryptography, and modern machine learning.

Let's break down exactly how it works.

Shannon Entropy: Measuring Uncertainty

In 1948, Claude Shannon published "A Mathematical Theory of Communication" and forever changed how we think about information. His key insight: information is surprise.

If I tell you "the sun rose today," that carries almost zero information — you already expected it. But if I tell you "it snowed in the Sahara," that's highly informative because it's unexpected.

Shannon formalized this with entropy, a measure of uncertainty:

H(X) = -Σ p(x) · log2 p(x)
Shannon Entropy — measured in bits

The higher the entropy, the more uncertain you are, and the more information you need to resolve that uncertainty.
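The formula translates directly into code. A minimal Python sketch (the function name is my own, not from any particular library):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum p(x) * log2 p(x)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # a fair coin: exactly 1.0 bit
print(entropy([0.9, 0.1]))  # a biased coin: much less surprise
```

A certain outcome (`entropy([1.0])`) carries 0 bits: no surprise, no information.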

A Simple Example

Imagine a bag with 8 equally likely items. The entropy is:

H = -8 × (1/8 × log2(1/8))
H = -8 × (1/8 × -3)
H = 3 bits

This means you need exactly 3 yes/no questions to identify the item. Each perfect question cuts the remaining possibilities in half.
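The arithmetic is easy to check in code:

```python
import math

# 8 equally likely items, each with probability 1/8
H = -sum((1/8) * math.log2(1/8) for _ in range(8))
print(H)  # 3.0
```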

Binary Search on the Property Space

Here's how the AI actually works under the hood:

  1. Property database — Every object has a vector of properties: is it alive? is it bigger than a breadbox? can you eat it?
  2. Entropy calculation — For each possible question, the AI calculates how much entropy it would eliminate.
  3. Greedy selection — It picks the question that maximizes information gain — the question that most evenly splits the remaining candidates.
  4. Update beliefs — After your answer, the AI eliminates all objects inconsistent with it and recalculates.

This is essentially binary search on a property space rather than a sorted list.
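The selection loop above can be sketched in a few lines of Python. The objects and properties below are a made-up toy database, just to make the greedy step concrete:

```python
import math

def h(n):
    """Entropy in bits of n equally likely candidates."""
    return math.log2(n) if n > 0 else 0.0

# Hypothetical property database: object -> yes/no property vector
objects = {
    "cat":   {"alive": True,  "edible": False, "handheld": True},
    "pizza": {"alive": False, "edible": True,  "handheld": True},
    "tower": {"alive": False, "edible": False, "handheld": False},
    "apple": {"alive": False, "edible": True,  "handheld": True},
}

def best_question(candidates):
    """Greedily pick the property whose yes/no split gains the most bits."""
    n = len(candidates)
    best, best_gain = None, -1.0
    for prop in next(iter(candidates.values())):
        yes = sum(1 for props in candidates.values() if props[prop])
        no = n - yes
        # Expected entropy after the answer, weighted by branch probability
        after = (yes / n) * h(yes) + (no / n) * h(no)
        gain = h(n) - after
        if gain > best_gain:
            best, best_gain = prop, gain
    return best, best_gain

print(best_question(objects))  # ('edible', 1.0): the only even 2/2 split
```

After each answer, the AI would filter `objects` down to the consistent candidates and call `best_question` again, which is exactly the update-beliefs step above.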

Why 4 Questions, Not 20?

Humans playing 20 Questions waste questions. They ask things like "Is it a type of food?" when only 12% of objects are food. A "yes" eliminates 88% of possibilities, but the far more likely "no" eliminates just 12%: a very uneven split.

The AI always asks questions that split possibilities close to 50/50, which is mathematically optimal.

Strategy              Avg. Questions   Efficiency
Random human          15-20            ~30%
Experienced human     8-12             ~55%
Entropy-optimal AI    4-6              ~90%
Theoretical minimum   log2(N)          100%

With a database of ~30 common objects, log2(30) = 4.9 bits. The AI typically needs 4-6 questions because real-world properties aren't perfectly binary and some objects share many properties.

Information Gain: Picking the Best Question

The secret sauce is information gain — the reduction in entropy after asking a question:

IG(Q) = H(before) - H(after | Q)
Information Gain — higher is better

A perfect question has an information gain of exactly 1 bit: it halves the number of remaining candidates. The AI greedily picks the highest-gain question at each step.
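For a uniform pool of candidates, the gain of a question depends only on how it splits them. A small sketch (the function name is mine):

```python
import math

def split_gain(n_yes, n_no):
    """Information gain in bits of a yes/no question over uniform candidates."""
    h = lambda k: math.log2(k) if k > 0 else 0.0
    n = n_yes + n_no
    return h(n) - (n_yes / n) * h(n_yes) - (n_no / n) * h(n_no)

print(split_gain(8, 8))   # 1.0: a perfect 50/50 split
print(split_gain(2, 14))  # ~0.54: the lopsided "is it food?" kind of question
```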

Example Walkthrough

Say we have 16 candidate objects. Entropy = 4 bits.

  • Q1: "Is it alive?" — Splits 8/8. Gain: 1 bit. Remaining: 3 bits.
  • Q2: "Can you hold it in your hand?" — Splits 4/4. Gain: 1 bit. Remaining: 2 bits.
  • Q3: "Is it found indoors?" — Splits 2/2. Gain: 1 bit. Remaining: 1 bit.
  • Q4: "Is it used daily?" — Splits 1/1. Gain: 1 bit. Solved!

4 questions, 16 objects, zero wasted bits.

Interactive: Entropy Calculator

Adjust the number of objects and see how many questions the AI needs:

Example: with 30 objects, entropy = log2(30) ≈ 4.91 bits, so the AI needs ~5 questions to identify the object.
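The calculator's logic fits in a one-line function:

```python
import math

def questions_needed(n_objects):
    """Minimum number of yes/no questions for n equally likely objects."""
    return math.ceil(math.log2(n_objects))

print(questions_needed(30))  # 5
print(questions_needed(16))  # 4
```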

Try It Yourself!

Think of something and see if the AI can guess it in under 6 questions.

Play 20 Questions AI

Beyond Games: Where This Math Lives

Shannon entropy isn't just a party trick. It's the backbone of:

  • Data compression (ZIP, MP3, JPEG) — remove bits that carry no information
  • Decision trees in machine learning — split on features with highest information gain
  • Cryptography — maximize entropy to make messages unpredictable
  • Natural language processing — predict next words by measuring surprise
  • Medical diagnosis — ask the most discriminating tests first

The 20 Questions game is, in a way, the purest expression of how information theory works. Every question is a measurement. Every answer reduces uncertainty. Optimality means asking the right questions.

Key Takeaways

  1. Entropy measures uncertainty — more possible outcomes = more bits needed.
  2. Optimal questions split 50/50 — anything else wastes information capacity.
  3. log2(N) is the floor — you can't beat math, but you can get close.
  4. Real-world noise adds 1-2 extra questions — property overlap and ambiguity cost bits.
  5. This is how all AI "thinks" — from ChatGPT to medical AI, reducing uncertainty is the game.