Cheating at Wordle

2022-09-06
/code /rust#wordle

Everything you really need to know

tl;dr

Your best pair of words to start with is SOARE and CLINT. Even using simplified look-up tables to pick a word to use after SOARE, you'll pick CLINT 80% of the time. If you want an unsimplified cheat-sheet, I've made one you can fit in your wallet. It assumes that your first guess is SOARE.

You can find the source code on my GitHub.

What is Wordle?

Wordle is a logic game where the goal is to guess a chosen 5-letter word within six attempts. In the original game, there is one word chosen per day, allowing people to compare how quickly they were able to guess the day's word.

There is a pool of 2315 words that the game might choose as a target, and an additional 10657 words that the game recognizes, for a total of 12972 words. Guesses must be on one of these two lists to be accepted. That is, AAAAA will not be accepted as a guess by the game.

After each guess, the game responds by indicating which guessed letters were correct (green squares), which letters appear in the word but in a different location (yellow squares), and which letters do not appear in the word at all (gray squares).
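To make the scoring rule concrete, here's a minimal sketch of how a single guess might be scored against a single target (the function name and byte-array representation are illustrative, not from the game's code). Note the two passes: Wordle's handling of duplicate letters, which we'll return to later, means yellows must consume a letter budget:

fn score(guess: &[u8; 5], target: &[u8; 5]) -> [u8; 5] {
    let mut result = [0u8; 5]; // 0 = miss, 1 = yellow, 2 = green
    let mut remaining = [0u8; 26];
    // First pass: mark greens and count the target's unmatched letters.
    for i in 0..5 {
        if guess[i] == target[i] {
            result[i] = 2;
        } else {
            remaining[(target[i] - b'a') as usize] += 1;
        }
    }
    // Second pass: mark yellows, consuming unmatched letters so that
    // duplicate letters in the guess can't double-count.
    for i in 0..5 {
        let letter = (guess[i] - b'a') as usize;
        if result[i] == 0 && remaining[letter] > 0 {
            result[i] = 1;
            remaining[letter] -= 1;
        }
    }
    result
}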

Strategies

The optimal strategy is easy to describe, although much harder for a human to execute in practice: each guess should aim to maximize the amount of information gained about the target word. Intuitively, it's clear that a guess like ARISE is much better than a guess of XYSTS: while learning that the target word contains an X would greatly reduce the number of candidate words, there is only a 37/2315 or 1.6% chance of this occurring.

For each guess, we can compute the hint the game will give us for each possible target word. We can then count how many words will give us each possible hint. Where ! represents a letter in the correct position, ? represents a letter in the wrong position, and _ represents a letter that does not appear in the target, we can contrast the set of possible results for ARISE:

_____: 168    ____?: 121    ____!:  61    ___?_:  80    ___??:  41    
___?!:  17    ___!_:  17    ___!?:   9    ___!!:  20    __?__: 107    
__?_?:  35    __?_!:  25    __??_:  21    __???:   4    __??!:   5
__?!_:   6    __!__:  51    __!_?:  15    __!_!:  23    __!?_:  29
__!??:   3    __!?!:  15    __!!_:   9    __!!?:   2    __!!!:   3
_?___:  64    _?__?: 100    _?__!:  25    _?_?_:  25    _?_??:  20
_?_?!:  10    _?_!_:   4    _?_!?:   1    _?_!!:   9    _??__:  20
_??_?:  34    _??_!:   5    _???_:   5    _????:   7    _??!_:   1
_??!!:   1    _?!__:   7    _?!_?:   4    _?!?_:   5    _?!??:   1
_?!?!:   2    _!___:  49    _!__?:  22    _!__!:  21    _!_!_:  10
_!_!?:   6    _!_!!:   1    _!?__:   7    _!!__:  22    _!!_?:   9
_!!_!:  17    _!!!_:   5    ?____: 154    ?___?:  79    ?___!:  47
?__?_:  61    ?__??:  10    ?__?!:  29    ?__!_:  25    ?__!?:   5
?__!!:  10    ?_?__:  35    ?_?_?:   3    ?_?_!:   2    ?_??_:  12
?_???:   1    ?_?!_:   1    ?_!__:   9    ?_!_!:   3    ?_!?_:   1
?_!!_:   2    ??___:  62    ??__?:  61    ??__!:   8    ??_?_:  23
??_??:   6    ??_?!:   5    ??_!_:   3    ??_!!:   1    ???__:  17
????_:   1    ??!__:   5    ??!!!:   1    ?!___:  35    ?!__?:  11
?!__!:  18    ?!_!_:   7    ?!_!!:   1    ?!?__:   9    ?!?_!:   1
?!!__:   4    !____:  29    !___?:  10    !___!:  20    !__?_:   2
!__??:   3    !__!_:   4    !__!!:   3    !_?__:  13    !_??!:   1
!_!__:   9    !_!_?:   1    !_!_!:   5    !_!?!:   1    !_!!_:   1
!?___:  14    !?__?:   6    !?__!:   4    !??__:   1    !??_?:   1
!?!_!:   1    !!___:   6    !!__?:   1    !!__!:   1    !!_?_:   1
!!_!_:   1    !!_!!:   1    !!!!!:   1

with those for XYSTS:

_____: 939    ___?_: 332    ___!_:  68    __?__: 282    __?_?:  10    
__?_!:  30    __??_: 138    __???:   1    __??!:   4    __?!_:  13
__!__:  23    __!_?:   5    __!_!:   1    __!?_:   7    __!??:   1    
__!!_:  10    _?___: 249    _?_?_:  22    _?_!_:  33    _??__:  40
_??_!:   1    _???_:  11    _??!_:   2    _?!__:  12    _?!_?:  10
_?!!_:  11    _!___:  19    _!_?_:   1    _!?__:   3    ?____:  22
?__?_:  10    ?_??_:   1    ?_?!_:   1    ??___:   2    ???!_:   1

The two obvious differences are that ARISE has many more possible outcomes, and that while the most common outcome for ARISE has 168 matches (_____), the most common outcome for XYSTS has 939. In fact, there are three additional XYSTS outcomes with more than 168 matches! If we guess XYSTS, there's a 40.56% chance that we'll only be able to eliminate 59.44% of our possible targets. By contrast, there's a 100% chance that we'll be able to eliminate at least 92.74% of possible words by guessing ARISE.

A quick segue on information theory

We can compute the "surprise" of a given outcome with probability $p$ by computing $-\log_2(p)$. Using the negative logarithm here has some nice properties: the surprise of a certain event is $-\log_2(1) = 0$; the surprise of an impossible event is $\infty$. Given two independent events $a$ and $b$ with probabilities $p_a$ and $p_b$, their combined surprise is $-\log_2(p_a \cdot p_b) = -\log_2(p_a) - \log_2(p_b)$.

We can talk about the entropy of a random variable $\mathcal{X}$ by computing its expected surprise: $H(\mathcal{X}) = -\sum_{x \in \mathcal{X}} p(x) \log_2(p(x))$. The entropy (expected information) from a fair coin flip would be $-\left(\frac{1}{2} \log_2 \frac{1}{2} + \frac{1}{2} \log_2\frac{1}{2} \right) = -(\frac{1}{2}\cdot -1 + \frac{1}{2} \cdot -1) = 1$. By contrast, consider a coin that comes up heads $\frac{3}{4}$ of the time: the entropy of that system is $-\left( \frac{1}{4}\log_2 \frac{1}{4} + \frac{3}{4}\log_2 \frac{3}{4}\right) = -(\frac{1}{4} \cdot -2 + \frac{3}{4} \cdot -0.415) \approx 0.811$. What does this mean? It means that given a perfect compression algorithm, if you wanted to encode the outcome of 1000 fair coin flips, it would take (on average) 1000 bits, but encoding the outcome of 1000 flips of the weighted coin would only take (on average) about 811 bits.

Entropy is maximized when all outcomes are equally probable.
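As a quick sanity check, here's a sketch of the entropy computation in Rust (the same language as the rest of the solver), applied to the two coins above:

fn entropy(probabilities: &[f64]) -> f64 {
    probabilities
        .iter()
        .filter(|&&p| p > 0.0)
        .map(|&p| -p * p.log2())
        .sum()
}

fn main() {
    println!("{}", entropy(&[0.5, 0.5]));   // fair coin: 1
    println!("{}", entropy(&[0.75, 0.25])); // weighted coin: ~0.811
}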

Back to Wordle

We can use a similar approach to judge guesses: we wish to choose the word that maximizes the expected information gained about our target word. It takes around 11.177 bits to specify which of the 2315 words on the list today's target word is. Our theoretical best word would make each outcome equally probable. Since there are $243 = 3^5$ possible outcomes, each outcome would have $2315 / 243 = 9.526$ associated words, yielding $-\log_2 \frac{9.526}{2315} = 7.925$ bits of expected information. Unfortunately, no such perfect word exists: we need to pick one from our word lists.

Computing the entropy of a word is straightforward: $-\sum_{x \in O} \frac{x}{2315} \log_2 \frac{x}{2315}$, where $O$ is the set of outcomes. The entropy for ARISE is 5.82 bits, while that of XYSTS is just 2.93. Looking at our entire word list, our best options are

soare (5.885960110378853)
roate (5.882779324291968)
raise (5.877909690821479)
raile (5.865709709951878)
reast (5.865457142861989)
slate (5.855775376955964)
crate (5.834874004263517)
salet (5.834581525865644)
irate (5.831396980440786)
trace (5.83054871385944)
arise (5.820939700886002)
orate (5.817161175806047)
stare (5.807280035160927)
carte (5.794557247295689)
raine (5.786709827456228)
caret (5.776713196624415)
ariel (5.775166510376183)
taler (5.770612389804919)
carle (5.770479071966417)
slane (5.770181104822014)

and our worst are

jazzy (2.3090773855798954)
fuzzy (2.305734237551105)
hyphy (2.2876424232410244)
mamma (2.268546116937502)
susus (2.2585120097284097)
gyppy (2.24670809344563)
kibbi (2.2404886429069535)
kudzu (2.237366851910215)
fuffy (2.2308746901052157)
jeeze (2.22198569876406)
pzazz (2.2096828801746575)
zoppo (2.1711585877791446)
yukky (2.1650967455771433)
jaffa (2.0942089868043468)
xylyl (2.0836690728843283)
jujus (2.027654407041208)
immix (1.982649829302991)
qajaq (1.7693053339048657)

If we wish to cheat entirely, we could filter our word lists based on the hint returned by the game, and repeat this process until the word is identified. Failing that, one could memorize the best follow-up word for each of the 127 possible outcomes of guessing SOARE. The former feels entirely contrary to the spirit of the game, while the latter is far more effort than most people would choose to exert. Instead, let's consider some simpler options.

Best pair

Rather than memorizing a single best word, we can almost as easily memorize a pair of starting words that, together, will maximize expected information. This isn't just a matter of picking the top two words from the list above, since the information they provide will be correlated. It could even be that some pair of words will outperform SOARE paired with any other second word!

A naïve approach would be to consider each of the $\frac{12972 (12972 + 1)}{2} = 84,142,878$ possible pairs of words, and for each pair, compute their combined entropy using a similar process to above, but now with $243^2 = 59049$ possible outcomes.

best_pair = None
best_entropy = 0
for i, word_a in enumerate(words):
    for word_b in words[i+1:]:
        entropy = h(outcomes(word_a), outcomes(word_b))
        if entropy > best_entropy:
            best_entropy = entropy
            best_pair = (word_a, word_b)
return best_pair

This approach works! It's also inefficient!

One trick we can use is to take advantage of the fact that joint entropy is subadditive: $H(A, B) \le H(A) + H(B)$. Since we know the entropy of each word on its own, we can compare the sum of the entropies of our two words with our best candidate so far, and skip the pair if it's less. If we order our words by descending entropy, we have a good chance of establishing a nice high floor early on, and we can also skip large chunks of the word list: if h(word_a) + h(word_b) < best_entropy, all later word_bs in the list will be worse; and if h(word_a) * 2 < best_entropy, we can stop searching entirely, since h(word_b) <= h(word_a).

best_pair = None
best_entropy = 0
for i, (word_a, h_a) in enumerate(words_with_entropy):
    if h_a * 2 < best_entropy:
        break
    for (word_b, h_b) in words_with_entropy[i+1:]:
        if h_a + h_b < best_entropy:
            break
        entropy = h(outcomes(word_a), outcomes(word_b))
        if entropy > best_entropy:
            best_entropy = entropy
            best_pair = (word_a, word_b)
return best_pair

Our best pair of words is SOARE CLINT (9.632 bits); our second best is ROAST CLINE (9.614 bits). The top 20 pairs are:

soare clint (9.632905965015306)
roast cline (9.614523773473017)
riant socle (9.613840808046314)
riant close (9.608770632076666)
crine loast (9.600137646174272)
saint ceorl (9.599224792545456)
trail sonce (9.592853771195916)
trice salon (9.584356906898694)
lance roist (9.584181179504565)
liart sonce (9.583972722337034)
sarin clote (9.583858296022964)
liane crost (9.58377798387247)
slice toran (9.582054286210674)
toile carns (9.574356593322323)
clart noise (9.57363017794734)
train socle (9.571990832784907)
crane toils (9.568852406814505)
trine coals (9.568852406814505)
toile crans (9.568852406814505)
carse doilt (9.566036004901079)

Unsurprisingly, the best single word appears in the best pair, but despite that, it shows up in only one of the top pairs: SOARE doesn't appear again until the 111th-best pair.

Our 20 worst pairs are

jujus susus (2.351484613734753)
fuffy fuzzy (2.3620325345888133)
qajaq jaffa (2.4310605536723955)
jujus jukus (2.4692401329040248)
bubby buzzy (2.4790667183370108)
yummy mummy (2.4801981388321543)
zocco cocco (2.4822094945844793)
kukus jukus (2.4995449697145027)
jujus kukus (2.5025240612705453)
kukus kuzus (2.529102496555728)
yuppy puppy (2.5625704894567303)
jujus fuffs (2.583375090159306)
jujus kuzus (2.5961280846763035)
muzzy mummy (2.602530127147494)
yobby bobby (2.6097028581448374)
yummy muzzy (2.612361584488129)
jukus kuzus (2.616692884807879)
zanza zanja (2.6453169267612076)
qajaq jazzy (2.6500644008660785)
jiffy fizzy (2.650663180466939)

Unsurprisingly, the words in the worst pairs tend to be very similar to one another.

Can we do better?

Seven words

What if, instead of memorizing a one-size-fits-all pair of words, we could count the number of hits (yellow or green squares) and use that to pick a second word?

The first step is to filter the list of answer words, creating 6 new lists, one for each number of possible hits. That is, if we saw 2 hits against SOARE, the list of answers would include HEAVY, but not HOLLY. Note that we're just filtering the answers, but leaving the list of guessable words unchanged.

For each filtered answer list, we then consider the joint entropy of our first word (SOARE) alongside every other word. Because we've reduced our set of answers, the joint entropy will be lower: this is additional information on top of the fact that we got, e.g., 2 hits. The set of possible outcomes for our first word has decreased. We then pick the word with the highest joint entropy.
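Here's a sketch of the bucketing step (the helper names are illustrative), assuming each outcome is encoded as a base-3 code as described in the implementation notes below:

fn hits(outcome: u8) -> usize {
    // Count the non-miss trits in a base-3 outcome code.
    let (mut remaining, mut n) = (outcome, 0);
    for _ in 0..5 {
        if remaining % 3 != 0 {
            n += 1;
        }
        remaining /= 3;
    }
    n
}

fn bucket_by_hits(outcomes: &[u8]) -> [Vec<usize>; 6] {
    // outcomes[i] is the first guess's outcome against answer i.
    let mut buckets: [Vec<usize>; 6] = std::array::from_fn(|_| Vec::new());
    for (i, &outcome) in outcomes.iter().enumerate() {
        buckets[hits(outcome)].push(i);
    }
    buckets
}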

Using this against SOARE, we get

0 hits: clint (5.508503429732798 bits; 183 answers remain)
1 hits: clint (7.8278920413862725 bits; 686 answers remain)
2 hits: clint (8.454302787632313 bits; 970 answers remain)
3 hits: clipt (7.295997807476919 bits; 438 answers remain)
4 hits: punch (4.918780730435343 bits; 37 answers remain)
5 hits: arose (0 bits; 1 answer remains)

Curiously, it seems that we only need to memorize five words, not seven!

I don't know if SOARE is still the best word to lead with in this case, but I suspect that it is.

(Up to) twenty-two words

What if we count green and yellow squares separately? We could have up to 21 words to memorize, in addition to SOARE! We can separate out our answer list just as in the seven-words case above, but based on our new criteria. Things are otherwise identical.
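The only change from the seven-words sketch is the bucketing key, which now tallies yellows and greens separately:

fn yellows_and_greens(outcome: u8) -> (usize, usize) {
    let (mut remaining, mut yellows, mut greens) = (outcome, 0, 0);
    for _ in 0..5 {
        match remaining % 3 {
            1 => yellows += 1, // letter present, wrong position
            2 => greens += 1,  // letter in the correct position
            _ => {}
        }
        remaining /= 3;
    }
    (yellows, greens)
}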

Here are the results for SOARE:

0Y, 0G: clint (5.508503429732798 bits, 183 answers remain)
0Y, 1G: clint (6.628967920208901 bits, 282 answers remain)
0Y, 2G: clint (6.117460038647271 bits, 145 answers remain)
0Y, 3G: thilk (4.552197126358234 bits, 41 answers remain)
0Y, 4G: notch (2.321928094887362 bits, 5 answers remain)
1Y, 0G: clint (7.005691776499278 bits, 404 answers remain)
1Y, 1G: clint (7.3817745808835875 bits, 405 answers remain)
1Y, 2G: mulct (5.613291828937477 bits, 98 answers remain)
1Y, 3G: notch (2.2516291673878226 bits, 6 answers remain)
2Y, 0G: clint (6.9260182124975564 bits, 420 answers remain)
2Y, 1G: clipt (6.127816972324382 bits, 159 answers remain)
2Y, 2G: wheen (3.321928094887362 bits, 10 answers remain)
3Y, 0G: talar (5.3373043296318965 bits, 140 answers remain)
3Y, 1G: hanap (3.6644977792004614 bits, 14 answers remain)
4Y, 0G: jaspe (1 bits, 2 answers remain)
4Y, 1G: arose (only answer left)

Impossible outcomes are omitted. CLINT remains the best second word just shy of 80% of the time (79.44%)!

I will be honest that these were not the results I was expecting, although perhaps they should have been: when searching for a single best second word, our criteria favor the word that provides the most information across the greatest number of circumstances. Certainly, being the best choice 80% of the time would be hard to beat.

Pretty much cheating

What if we gave up on trying to memorize things, and computed the best follow-up word for every possible response to SOARE? This isn't really practical (maybe you could keep it on a laminated card in your wallet?), but I'm curious to see how well CLINT continues to perform.

_____: clint (5.509 bits, 183 answers remain)    ____?: denet (5.161 bits, 120 answers remain)
____!: guilt (5.068 bits, 79 answers remain)     ___?_: glint (4.656 bits, 57 answers remain)
___??: tined (4.387 bits, 117 answers remain)    ___?!: pudic (4.006 bits, 39 answers remain)
___!_: tichy (3.700 bits, 13 answers remain)     ___!?: feted (3.522 bits, 14 answers remain)
___!!: tawie (2.000 bits, 4 answers remain)      __?__: clint (5.171 bits, 138 answers remain)
__?_?: natal (4.537 bits, 68 answers remain)     __?_!: gault (4.012 bits, 40 answers remain)
__??_: riyal (4.223 bits, 42 answers remain)     __???: talar (3.694 bits, 61 answers remain)
__??!: larch (2.807 bits, 7 answers remain)      __?!_: humic (3.096 bits, 11 answers remain)
__?!?: taels (2.322 bits, 5 answers remain)      __?!!: wings (1.000 bits, 2 answers remain)
__!__: clink (4.596 bits, 40 answers remain)     __!_?: depth (3.749 bits, 21 answers remain)
__!_!: glitz (3.182 bits, 28 answers remain)     __!?_: clint (4.275 bits, 40 answers remain)
__!??: cadet (2.000 bits, 4 answers remain)      __!?!: ditch (2.971 bits, 19 answers remain)
__!!_: dhuti (3.547 bits, 13 answers remain)     __!!?: lathy (2.948 bits, 9 answers remain)
__!!!: flabs (2.000 bits, 4 answers remain)      _?___: clint (4.888 bits, 56 answers remain)
_?__?: lento (4.004 bits, 25 answers remain)     _?__!: clink (3.784 bits, 20 answers remain)
_?_?_: cutin (4.060 bits, 59 answers remain)     _?_??: trued (3.239 bits, 13 answers remain)
_?_?!: pwned (2.807 bits, 14 answers remain)     _?_!_: fifty (3.000 bits, 8 answers remain)
_?_!?: dsomo (1.585 bits, 3 answers remain)      _?_!!: gybed (1.000 bits, 2 answers remain)
_??__: cloot (4.406 bits, 44 answers remain)     _??_?: boxen (2.000 bits, 4 answers remain)
_??_!: bundt (3.000 bits, 8 answers remain)      _???_: maron (3.658 bits, 27 answers remain)
_??!_: blond (2.585 bits, 6 answers remain)      _??!?: opera (only answer left)
_??!!: adore (only answer left)                  _?!__: piano (only answer left)
_?!_!: ovate (only answer left)                  _?!?_: bravo (only answer left)
_?!!_: ovary (only answer left)                  _!___: culty (4.448 bits, 87 answers remain)
_!__?: meynt (3.533 bits, 26 answers remain)     _!__!: guild (3.322 bits, 10 answers remain)
_!_?_: cyton (3.840 bits, 26 answers remain)     _!_??: rewth (2.492 bits, 22 answers remain)
_!_?!: fungi (2.948 bits, 9 answers remain)      _!_!_: adult (2.585 bits, 6 answers remain)
_!?__: liman (3.690 bits, 21 answers remain)     _!??_: malty (3.000 bits, 8 answers remain)
_!?!_: cobra (only answer left)                  _!!__: lousy (2.322 bits, 5 answers remain)
_!!?_: roach (only answer left)                  _!!!_: belay (1.000 bits, 2 answers remain)
?____: hists (4.271 bits, 33 answers remain)     ?___?: teugh (3.787 bits, 19 answers remain)
?___!: cunit (3.000 bits, 8 answers remain)      ?__?_: cruft (3.750 bits, 16 answers remain)
?__??: richt (3.374 bits, 15 answers remain)     ?__?!: puton (2.807 bits, 7 answers remain)
?__!_: usurp (only answer left)                  ?_?__: linty (3.936 bits, 23 answers remain)
?_?_?: nates (2.322 bits, 5 answers remain)      ?_?_!: thump (3.140 bits, 15 answers remain)
?_??_: hoise (2.000 bits, 4 answers remain)      ?_??!: tarok (1.585 bits, 3 answers remain)
?_!__: chugs (3.325 bits, 14 answers remain)     ?_!_?: flyte (2.322 bits, 5 answers remain)
?_!_!: teuch (2.807 bits, 7 answers remain)      ?_!?_: chibs (2.807 bits, 7 answers remain)
?_!?!: erase (only answer left)                  ??___: fiscs (2.807 bits, 7 answers remain)
??__?: jaspe (1.585 bits, 3 answers remain)      ??__!: coths (2.322 bits, 5 answers remain)
??_?_: fidge (2.322 bits, 5 answers remain)      ??_??: verso (only answer left)
??_?!: prose (only answer left)                  ???__: genic (1.585 bits, 3 answers remain)
????_: arson (only answer left)                  ????!: arose (only answer left)
??!__: chaos (only answer left)                  ?!___: jumby (3.457 bits, 15 answers remain)
?!__?: wings (1.000 bits, 2 answers remain)      ?!__!: pilum (3.278 bits, 11 answers remain)
?!_?_: terns (2.000 bits, 4 answers remain)      ?!_??: ephor (1.000 bits, 2 answers remain)
?!_?!: hussy (1.585 bits, 3 answers remain)      ?!!__: tubby (1.585 bits, 3 answers remain)
?!!?_: roast (only answer left)                  !____: thilk (4.894 bits, 63 answers remain)
!___?: clipt (4.272 bits, 35 answers remain)     !___!: pling (3.816 bits, 23 answers remain)
!__?_: butch (3.170 bits, 9 answers remain)      !__??: newie (3.239 bits, 13 answers remain)
!__?!: cuppa (2.000 bits, 4 answers remain)      !__!_: thump (3.000 bits, 8 answers remain)
!__!?: jaspe (1.000 bits, 2 answers remain)      !__!!: chiro (1.000 bits, 2 answers remain)
!_?__: dault (3.892 bits, 19 answers remain)     !_?_?: dempt (2.948 bits, 9 answers remain)
!_?_!: cirls (1.585 bits, 3 answers remain)      !_??_: porty (3.000 bits, 8 answers remain)
!_???: mawns (2.252 bits, 6 answers remain)      !_!__: thilk (4.364 bits, 41 answers remain)
!_!_!: thilk (3.221 bits, 19 answers remain)     !_!?_: stair (only answer left)
!_!!_: ketch (2.732 bits, 11 answers remain)     !_!!!: notch (2.322 bits, 5 answers remain)
!?___: cloot (3.901 bits, 37 answers remain)     !?__!: knelt (2.918 bits, 12 answers remain)
!?_?_: scour (only answer left)                  !?_!_: nicht (2.846 bits, 10 answers remain)
!?_!!: notch (2.252 bits, 6 answers remain)      !??__: vampy (2.000 bits, 4 answers remain)
!???_: savor (only answer left)                  !!___: dusty (2.807 bits, 7 answers remain)
!!__!: solve (only answer left)                  !!_??: baits (1.000 bits, 2 answers remain)
!!_!_: sorry (only answer left)                  !!??_: wings (1.000 bits, 2 answers remain)
!!!__: soapy (only answer left)

CLINT was best 417 times (18.01%)

There we have it! CLINT is only best 18% of the time if we're willing to just cheat horribly!

Getting into the weeds

Single-word optimizations

The code to pick single words, or second words conditioned on a first word, all runs pretty quickly even without heavy optimizations. To speed up computing the result of guessing every possible word against every possible answer, I built a look-up table: 5×26×N, where N is the number of answers. For my initial approach, every table cell held either 0 (a miss), 1 (yellow), or 2 (green). To build this, I computed a bitmap for each answer, recording which letters occurred in it. E.g., given the word ARISE, the bitmap would be

  ZYXWVUTSRQPONMLKJIHGFEDCBA
0b00000001100000000100010001
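Computing that bitmap is a short fold (a sketch with an illustrative name; it assumes lowercase ASCII words):

fn letter_bitmask(word: &str) -> u32 {
    word.bytes().fold(0, |mask, b| mask | 1 << (b - b'a'))
}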

Building the table looked like

type AnswersTable = [[Vec<i8>; 26]; 5];

fn build_answers_table(answers: &[AugmentedAnswer<'_>]) -> AnswersTable {
    // One column per (position, letter) pair, with one entry per answer.
    let mut rv = [(); 5].map(|_| [(); 26].map(|_| Vec::with_capacity(answers.len())));
    for (pos, word_pos) in rv.iter_mut().enumerate() {
        for (letter, column) in word_pos.iter_mut().enumerate().take(26) {
            let ch = letter as u8 + b'a';
            for answer in answers {
                let answer_ch = answer.word.as_bytes()[pos];
                let result = if answer_ch == ch {
                    2 // green: guessing `ch` at `pos` is an exact match
                } else if answer.bitmask & (1 << letter) > 0 {
                    1 // yellow: the letter occurs elsewhere in the answer
                } else {
                    0 // miss: the letter doesn't occur at all
                };
                column.push(result);
            }
        }
    }
    rv
}

Given this table and a word, we could quickly compute the outcomes, encoded as a base-3 value between 0 and 242 inclusive:

fn outcomes_from_answers_table(guess: &str, answers_table: &AnswersTable) -> Vec<u8> {
    let mut outcomes = vec![0; answers_table[0][0].len()];
    for (b, position_table) in guess.chars().zip(answers_table) {
        // Shift the accumulated trits left, then add this position's digit.
        outcomes.iter_mut().for_each(|o| *o *= 3);
        for (outcome, cell) in outcomes.iter_mut().zip(&position_table[b.idx()]) {
            *outcome += *cell as u8; // 0 = miss, 1 = yellow, 2 = green
        }
    }

    outcomes
}

Then it was just a matter of building a histogram and computing the resulting info.
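That step looks roughly like this (the function name is illustrative; the real code is in the repository):

fn entropy_of_guess(outcomes: &[u8]) -> f64 {
    // Histogram over the 243 possible outcome codes.
    let mut histogram = [0u32; 243];
    for &outcome in outcomes {
        histogram[outcome as usize] += 1;
    }
    let n = outcomes.len() as f64;
    histogram
        .iter()
        .filter(|&&count| count > 0)
        .map(|&count| {
            let p = count as f64 / n;
            -p * p.log2()
        })
        .sum()
}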

Unfortunately, these outcomes disagree with Wordle! Given a guess of DOGGY and a target of GOOSE, the response will be _!?__, rather than the _!??_ returned by the algorithm above: if a letter appears in the guess more times than in the answer, the excess occurrences are counted as misses!

The new approach was to track the occurrence count of each letter instead of a bitmask. When building the table, I used -1 to represent a direct hit, and otherwise stored the occurrence count of the letter (which could be 0!):

fn build_answers_table(answers: &[AugmentedAnswer<'_>]) -> AnswersTable {
    let mut rv = [(); 5].map(|_| [(); 26].map(|_| Vec::with_capacity(answers.len())));
    for (pos, word_pos) in rv.iter_mut().enumerate() {
        for (letter, column) in word_pos.iter_mut().enumerate().take(26) {
            let ch = letter as u8 + b'a';
            for answer in answers {
                let answer_ch = answer.word.as_bytes()[pos];
                let result = if answer_ch == ch {
                    -1 // a direct (green) hit
                } else {
                    // how many times this letter occurs in the answer
                    answer.counts[letter] as i8
                };
                column.push(result);
            }
        }
    }
    rv
}

Then when computing the outcomes list:

fn uncompressed_outcomes(guess: &str, answers_table: &AnswersTable) -> Vec<u8> {
    // counts[i] tracks how many times letter i has appeared in the guess so far.
    let mut counts = [0; 26];
    let mut outcomes = vec![0; answers_table[0][0].len()];
    for (b, answers_table) in guess.chars().zip(answers_table) {
        outcomes.iter_mut().for_each(|o| *o *= 3);

        counts[b.idx()] += 1;
        for (outcome, answer_count) in outcomes.iter_mut().zip(&answers_table[b.idx()]) {
            if *answer_count == -1 {
                *outcome += 2; // green
            } else if *answer_count >= counts[b.idx()] {
                // The answer holds at least as many copies of this letter
                // as we've guessed so far, so this occurrence is yellow.
                *outcome += 1;
            }
        }
    }

    outcomes
}

Is building the AnswersTable worth it?

After writing this article, I wondered the same thing, but when I tried generating outcomes directly, things immediately seemed slower. After adding a few timing statements, I found that generating outcomes directly from the AugmentedAnswers was 50% slower than building the AnswersTable first. With the AnswersTable, memory is accessed very linearly, and the values being read sit one after another. This plays much more nicely with caches than working directly with the AugmentedAnswers.

Word-pair optimizations

When it came time to evaluate pairs of words, I needed to employ a few more tricks. I mentioned one of them earlier: using subadditivity to skip candidate pairs. This helped significantly, but I felt I could do better. My main bottlenecks were the bucket_two and info functions, which compute the outcome histogram and the histogram's entropy, respectively. I was originally using a hashmap to store the histogram. I found that replacing it with an array actually decreased performance, even though we were no longer hashing keys. This made me realize that if I wanted to make things faster, I would need to improve the cache locality of the data. Unfortunately, 243 * 243 u16s require 118,098 bytes, which was larger than my L1 cache.

To work around this, I realized that I could compress the outcome values: if there are only 127 distinct outcomes for a guess, I would relabel them 0..127 and record 127 as the max value. Given compressed outcomes a and b, with max values max_a and max_b, I could compute a joint outcome a[i] * max_b + b[i]. In practice, the range of joint outcomes was small enough that I could efficiently store them in an array which would (usually) fit in my L1 cache. This sped up computations significantly.
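A sketch of the compression and the joint histogram (function names are illustrative):

fn compress(outcomes: &[u8]) -> (Vec<u8>, usize) {
    // Relabel the distinct outcome codes as 0..k, preserving equality.
    let mut relabel = [u8::MAX; 243];
    let mut next = 0u8;
    let compressed = outcomes
        .iter()
        .map(|&o| {
            if relabel[o as usize] == u8::MAX {
                relabel[o as usize] = next;
                next += 1;
            }
            relabel[o as usize]
        })
        .collect();
    (compressed, next as usize)
}

fn joint_entropy(a: &[u8], max_a: usize, b: &[u8], max_b: usize) -> f64 {
    // max_a * max_b buckets: small enough to (usually) stay in L1.
    let mut histogram = vec![0u32; max_a * max_b];
    for (&x, &y) in a.iter().zip(b) {
        histogram[x as usize * max_b + y as usize] += 1;
    }
    let n = a.len() as f64;
    histogram
        .iter()
        .filter(|&&count| count > 0)
        .map(|&count| {
            let p = count as f64 / n;
            -p * p.log2()
        })
        .sum()
}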

I tried several tricks to speed up info: I built a look-up table (it didn't help); I reordered operations so that we only had to perform one division instead of one per bucket; and I tried to write things in a way that I hoped would be more auto-vectorizable. My big speed-up came from using a faster but less accurate log function. I would use the fast_info function first, and only if the result came within some margin of beating the current best candidate would I use the regular info function to compute its entropy more accurately.
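The actual function is in the source; as an illustration of the kind of approximation available (a classic bit trick, not necessarily the one I used), an f32's bit pattern read as an integer is approximately an affine function of its log2:

fn fast_log2(x: f32) -> f32 {
    // Valid for x > 0: the IEEE-754 bits, read as an integer, are
    // roughly 2^23 * (log2(x) + 127); the fractional constant trims
    // the average error.
    x.to_bits() as f32 * (1.0 / (1 << 23) as f32) - 126.94269
}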