TL;DR: If you follow every hint, you’ll (almost) always be left with only one word by your sixth guess… based on my 100,000 computer simulations.
One day I started noticing colored squares appearing on my social media timelines. Like many people, I was curious what the word “wordle” meant, and I quickly got into the routine of completing the daily Wordle every morning. The creators were even considerate enough to include a color-blind mode for people like me.
If you haven’t played it, this is it.
Wordle reminds me of Mastermind, where you try to guess the order of colors in as few tries as possible. However, Wordle gives you an important piece of information that Mastermind does not: which letters are wrong guesses.
It’s after your first five-letter guess in Wordle that you start getting important information. The letters light up in different colors to mean:
- The letter is not in the word
- The letter is in the word but in the wrong place
- The letter is in the word and in the correct place
If I know there is no ‘e’ in the word, that rules out a lot of possible words, since ‘e’ is the most common letter in the English language. And if you get other information, for example that the third letter must be ‘k’, that narrows down the words even further.
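As a rough illustration of how much these two hints prune the candidates, here is a small sketch. The eight-word list and the helper name `filter_candidates` are assumptions made up for this example; they are not the real Wordle list or part of the simulation code.

```python
# Hypothetical mini word list, only for illustration.
words = ["house", "baked", "token", "sulky", "laksa", "crumb", "hokum", "joker"]


def filter_candidates(words, absent=(), fixed=None):
    """Keep words that contain none of the `absent` letters and that
    match every (position -> letter) pair in `fixed`."""
    fixed = fixed or {}
    return [
        w for w in words
        if not any(letter in w for letter in absent)
        and all(w[pos] == letter for pos, letter in fixed.items())
    ]


# Hint 1: there is no 'e' in the word.
no_e = filter_candidates(words, absent="e")
# Hint 2: additionally, the third letter (index 2) is 'k'.
no_e_k_third = filter_candidates(words, absent="e", fixed={2: "k"})

print(no_e)          # ['sulky', 'laksa', 'crumb', 'hokum']
print(no_e_k_third)  # ['laksa', 'hokum']
```

Even on this toy list, the first hint halves the candidates and the second halves them again.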
Tyler Glaiel recently wrote an interesting article about the mathematically optimal first word to solve a Wordle quickly.
However, I wanted to know what the probability of solving it would be if I guessed at random. I mean, when I’m on my sixth attempt, the problem is often trying to think of a word that follows all the hints about letters and placement from the previous five attempts.
(results below if you want to skip code)
I found that the easiest way to test this was to create a Wordle in Python, have my computer play 100,000 games, and keep track of the stats. I tracked the number of guesses needed to solve each game and the number of words left when the correct word was guessed. This is the basic logical flow:
- Get a list of valid words (12,972 words from Glaiel’s article, see his GitHub).
import numpy as np

# Read one word per line, dropping newlines and blank lines
with open("wordlist_guesses.txt", 'r') as f:
    all_words = f.read().split()
- Make a function to score a guess. Instead of colors, I used numbers (more computer friendly and, as I said earlier, I’m color blind): -1 means the letter is not in the word, 0 means the letter is in the word but in the wrong position, and 1 means the correct letter in the correct position.
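def score_word(chosen, guess):
    # Start by assuming every letter is absent from the chosen word
    score = [-1, -1, -1, -1, -1]
    for i, w in enumerate(guess):
        if w in chosen:
            if w == chosen[i]:
                score[i] = 1  # right letter, right position
            else:
                score[i] = 0  # right letter, wrong position
    return score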
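To make the scoring concrete, here is a self-contained sketch that repeats the scoring rule in compact form and applies it to two guesses from the “house” example further down. One caveat worth stating: this simplified rule marks every copy of a letter that appears anywhere in the chosen word as 0 or 1, whereas the real game treats repeated letters more strictly, so the scores can differ from the colors Wordle would show.

```python
def score_word(chosen, guess):
    """Score `guess` against `chosen`:
    -1 = letter not in the word, 0 = in the word but wrong position,
     1 = correct letter in the correct position."""
    score = [-1] * len(guess)
    for i, letter in enumerate(guess):
        if letter in chosen:
            score[i] = 1 if letter == chosen[i] else 0
    return score


print(score_word("house", "stent"))  # [0, -1, 0, -1, -1]
print(score_word("house", "hexes"))  # [1, 0, -1, 0, 0]
print(score_word("house", "house"))  # [1, 1, 1, 1, 1]
```

In “hexes”, both copies of ‘e’ score 0 even though “house” contains only one ‘e’; that is the simplification mentioned above.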
- Make a function to remove words based on the result. (For example, if there is no ‘e’, remove all words containing ‘e’.)
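def remove_words(score, guess, word_list):
    to_remove = []
    for word in word_list:
        # w1 is the candidate's letter, w2 is the guessed letter
        for s, w1, w2 in zip(score, word, guess):
            if s == 1:
                if w1 != w2:
                    to_remove.append(word)
                    break
            elif s == -1:
                if w2 in word:
                    to_remove.append(word)
                    break
            elif s == 0:
                if w1 == w2:
                    to_remove.append(word)
                    break
                elif w2 not in word:
                    to_remove.append(word)
                    break
    word_list = list(set(word_list) - set(to_remove))
    return word_list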
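The same filter can also be written as a per-word consistency check, which some readers may find easier to follow. This is only a sketch: the helper name `is_consistent` and the six-word candidate list are assumptions for illustration, and unlike the set-based version it keeps the original word order.

```python
def is_consistent(word, guess, score):
    """Return True if `word` could still be the answer given the score
    that `guess` received."""
    for s, w1, w2 in zip(score, word, guess):
        if s == 1 and w1 != w2:
            return False  # a confirmed letter must match at this position
        if s == -1 and w2 in word:
            return False  # an absent letter must not appear anywhere
        if s == 0 and (w1 == w2 or w2 not in word):
            return False  # a misplaced letter must appear, but elsewhere
    return True


def remove_words(score, guess, word_list):
    return [w for w in word_list if is_consistent(w, guess, score)]


# Hypothetical candidates; the score is what "horse" gets against "house".
candidates = ["house", "horse", "hoise", "hexes", "stent", "mouse"]
remaining = remove_words([1, 1, -1, 1, 1], "horse", candidates)
print(remaining)  # ['house', 'hoise']
```

Note that the guess itself (“horse”) is removed as well, because its ‘r’ was scored as absent. A wrong guess always eliminates at least itself, which is why the game loop is guaranteed to finish.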
- Play Wordle! (starting with a randomly selected word)
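def play_wordle(word_list):
    # Play one game; return (number of guesses, words left at the final guess)
    count = 0
    chosen_word = np.random.choice(word_list)
    score = [-1, -1, -1, -1, -1]
    left = None
    while score != [1, 1, 1, 1, 1]:
        count += 1
        # Guess at random among the words still consistent with all hints
        guess = np.random.choice(word_list)
        score = score_word(chosen_word, guess)
        if score == [1, 1, 1, 1, 1]:
            left = len(word_list)
        word_list = remove_words(score, guess, word_list)
    return count, left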
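To go from a single game to summary statistics, the game can be repeated many times and the results aggregated. The sketch below is one possible way to do that, not necessarily how the original run was set up: it uses the standard library’s `random` and `statistics` modules instead of NumPy, and the names `simulate`, `is_consistent`, and the eight-word list are assumptions for illustration. The numbers it prints for this toy list say nothing about the real 12,972-word results.

```python
import random
import statistics


def score_word(chosen, guess):
    """-1 = letter absent, 0 = present but misplaced, 1 = correct spot."""
    score = [-1] * len(guess)
    for i, letter in enumerate(guess):
        if letter in chosen:
            score[i] = 1 if letter == chosen[i] else 0
    return score


def is_consistent(word, guess, score):
    """True if `word` could still be the answer given this guess and score."""
    for s, w1, w2 in zip(score, word, guess):
        if s == 1 and w1 != w2:
            return False
        if s == -1 and w2 in word:
            return False
        if s == 0 and (w1 == w2 or w2 not in word):
            return False
    return True


def play_wordle(words, rng):
    """Play one game; return (guesses used, candidates left at the last guess)."""
    candidates = list(words)
    chosen = rng.choice(candidates)
    count = 0
    while True:
        count += 1
        guess = rng.choice(candidates)
        if guess == chosen:
            return count, len(candidates)
        score = score_word(chosen, guess)
        candidates = [w for w in candidates if is_consistent(w, guess, score)]


def simulate(words, n_games, seed=0):
    """Play `n_games` games and summarize guesses and words left."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    results = [play_wordle(words, rng) for _ in range(n_games)]
    guesses = [count for count, _ in results]
    left = [n_left for _, n_left in results]
    return {
        "mean_guesses": statistics.mean(guesses),
        "max_guesses": max(guesses),
        "median_left": statistics.median(left),
        "mean_left": statistics.mean(left),
        "only_option_share": sum(1 for n in left if n == 1) / n_games,
    }


# Hypothetical mini word list; swap in the full list for meaningful numbers.
words = ["house", "horse", "hoise", "hexes", "stent", "mouse", "hokum", "laksa"]
stats = simulate(words, n_games=1000)
print(stats)
```

Reporting both the median and the mean of the words left is deliberate: a handful of lucky early guesses, made while many candidates remain, pull the mean well above the median.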
- After 100,000 simulations, the puzzle was solved in 5.21 guesses on average. The correct word was guessed on the first try 8 times; the worst result was 18 guesses, which happened twice. This is the distribution:
- When the correct guess was made, the median number of words left was 1, while the mean was 5.28. The last guess was the only remaining choice in 54,073 out of 100,000 simulations, or 54%.
- Here is a concrete example of how quickly options are narrowed. I chose the word “house” and each guess greatly reduced the possibilities:
Selected word: house (chosen from 12,972 words)
Guess   Score                 Words left
stent   [0, -1, 0, -1, -1]    941
hexes   [1, 0, -1, 0, 0]      6
horse   [1, 1, -1, 1, 1]      2
hoise   [1, 1, -1, 1, 1]      1
house   [1, 1, 1, 1, 1]       (it was the only option!)
The word list includes words like “xylol” and “alaii” which makes me wonder if Wordle’s creators were really picking a word at random, or if they picked a more common word.
Let’s say they really choose from a “mental list” of 5,000 words. Since adults know somewhere around 42,000 words, the number of 5-letter words known to the general population is likely much smaller than the full list of 12,972 words. If so, the average number of guesses to solve is 4.55 while instances of the final guess as the only option rise to nearly 60%.
Like any good game designer, the makers of Wordle know how to balance the probability of guessing the correct word so that players stay in the “flow” zone between too easy and too frustrating, where it’s fun… that, plus the great ideas of sharing color-coded squares on social media and offering only one puzzle a day.
So you will (probably, almost) always solve it. And it will always be fun.