Wordle was created by Josh Wardle [link] as a fun word-guessing game. The rules are simple: you have six tries to guess a five-letter word. If a guessed letter is in the correct position, it is colored green. If a letter is present in the word but not in that position, it is colored yellow. Otherwise, the letter is colored gray. It's a conceptually simple game, but it has spawned extensive discussion over how it can be optimized.
In this article, I discuss an automated solver I created, which is capable of beating Wordle efficiently and in a near-optimal manner.
The GitHub repo is at hand.

Let's go!

What does information theory tell us about Wordle?
Information theory is, as the name suggests, the study of information [link]. More specifically, it concerns the quantification, storage, and communication of information.
Information is generally measured in "Shannons", or bits, a relatively abstract unit of measurement. Essentially, if some event carries one Shannon of information, then the probability of that event is 50%.
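As a quick illustration (this snippet is mine, not from the solver), the information content of an event with probability p is -log2(p) bits, so a 50% event carries exactly one Shannon:

```python
import math

def information_content(p: float) -> float:
    """Shannon information (in bits) of an event with probability p."""
    return -math.log2(p)

print(information_content(0.5))   # 1.0 bit: a 50% event carries one Shannon
print(information_content(0.25))  # 2.0 bits: rarer events tell us more
```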
Clearly, when a guess fails, we want to maximize the amount of information we get. A word that has a low likelihood of providing a lot of information is far less useful than a word that has a high likelihood of providing a small amount of new information. We therefore want to maximize the information we expect to gain, which is what minimizes the entropy remaining in the puzzle.
Since most letters will end up being gray, one approximation is to estimate information by the probability that we discover a letter that is present in the word. Because we expect letters to be gray, knowing that a letter is gray tells us very little. However, only up to 5 letters can be green or yellow; that's only about 20% of the alphabet. Knowing that a letter is green or yellow therefore tells us much more than knowing that it is gray.
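As a rough sketch of this approximation (the function name and structure are my own, not necessarily the repo's), we can estimate P(get information) as the probability that at least one guessed letter appears in the hidden word, using letter frequencies over the remaining candidates:

```python
from collections import Counter

def p_information(guess: str, candidates: list[str]) -> float:
    """Approximate P(get information): the chance that at least one
    guessed letter comes back green or yellow."""
    # For each letter, count how many candidates contain it at least once.
    presence = Counter()
    for word in candidates:
        for letter in set(word):
            presence[letter] += 1
    n = len(candidates)
    # Treat letters as roughly independent:
    # P(at least one hit) = 1 - P(every guessed letter is gray).
    p_all_gray = 1.0
    for letter in set(guess):
        p_all_gray *= 1 - presence[letter] / n
    return 1 - p_all_gray
```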
A probabilistic approach
Of course, we're not just trying to find information. We want to win this game.
Most simple Wordle-solving algorithms try to maximize the probability that a guess is correct. This is a good approach, especially late in the game when information is ample. Early in the game, however, any given word is very unlikely to win, so it is much more useful to optimize for gathering information while it is scarce.
To accommodate both facts, I chose to have the algorithm maximize a synthetic "utility factor". This factor represents the probability that the guess is useful - that is, that the guess is either correct or earns information. We can represent this as P(win) + P(not win) * P(get information | not win).
Notably, P(win) and P(get information) can be treated as approximately independent. For the most part, knowing that a certain guess will not win does not significantly impact the probability that the guess earns new information. We can therefore simplify P(useful) to P(win) + P(not win) * P(get information).
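Concretely, a sketch of the utility factor might look like this, reusing p_information from the earlier snippet; the uniform 1/N estimate for P(win) is my assumption, not necessarily the repo's:

```python
def utility(guess: str, candidates: list[str]) -> float:
    """P(useful) = P(win) + P(not win) * P(get information)."""
    n = len(candidates)
    # Assume every remaining candidate is equally likely to be the answer.
    p_win = 1 / n if guess in candidates else 0.0
    # Treat winning and gathering information as independent (see above).
    return p_win + (1 - p_win) * p_information(guess, candidates)
```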
The solution
By simply maximizing the utility of each guess, we're able to win very efficiently. Every time I've tested the tool, it has won in five tries or fewer. It almost always succeeds in four tries and frequently takes as few as three.
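Putting it together, the overall loop might look like the sketch below. It reuses utility from above; the feedback helper is deliberately simplified (it ignores duplicate-letter edge cases), and all names are mine rather than the repo's:

```python
def feedback(guess: str, answer: str) -> str:
    """Color pattern: 'g' = green, 'y' = yellow, '.' = gray.
    Simplified: duplicate-letter edge cases are not handled."""
    return "".join(
        "g" if g == a else ("y" if g in answer else ".")
        for g, a in zip(guess, answer)
    )

def solve(answer: str, words: list[str], max_tries: int = 6) -> int:
    """Greedily play the highest-utility guess; return the number of tries."""
    candidates = list(words)
    for attempt in range(1, max_tries + 1):
        # Break utility ties in favor of words that could actually win.
        guess = max(words, key=lambda w: (utility(w, candidates), w in candidates))
        if guess == answer:
            return attempt
        pattern = feedback(guess, answer)
        # Keep only candidates that would have produced the same pattern.
        candidates = [w for w in candidates if feedback(guess, w) == pattern]
    return -1  # did not solve within the allotted tries
```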
Useful takeaways
One thing to note is that seeking information tends to have diminishing returns. If you know a lot, there is very little left to learn, and any subsequent attempt is unlikely to yield new data. On the other hand, if you know very little, you have a lot to learn. It is therefore useful to start by optimizing for information gathering and gradually shift your focus toward producing immediate value (maximizing the probability of winning) as your information increases.
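To make the diminishing returns concrete: with N equally likely candidates, the remaining uncertainty is log2(N) bits, so early guesses against a large pool (such as the commonly cited ~2,315-word Wordle answer list) have far more to earn than guesses near the end:

```python
import math

# Remaining uncertainty for N equally likely candidates is log2(N) bits.
for n in (2315, 256, 16, 2):
    print(f"{n:4d} candidates -> {math.log2(n):4.1f} bits left to learn")
```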