Applying Wordle-Solving Algorithms to a Finnish Morphophonemic Space
October 23, 2025
Abstract
Sanuli is a Finnish adaptation of Wordle: the task is to identify a hidden five-letter lemma using up to six guesses with positional feedback. Unlike English Wordle, Sanuli draws from Finnish lexicons and includes diacritics (Ä, Ö), which change both letter distributions and phonotactic constraints. This paper formalizes optimal play for Sanuli as a sequential decision problem; compares objective functions used by state-of-the-art solvers (entropy maximization, minimax, expected-remaining); and incorporates Finnish-specific structure such as vowel harmony, gemination, and restricted onset clusters to tighten the search. Using public documentation that Sanuli relies on open Kotus resources, and community analyses that extracted the in-game list and computed first-move entropies, we synthesize a practical, near-optimal strategy. In particular, high-entropy openers such as KASTI, KILTA, KILSA, KARSI, and SILTA emerge, followed by a shift to a minimax policy once the candidate set is small. We also argue that explicitly testing for duplicate letters earlier than in English Wordle is beneficial in Finnish due to productive length contrasts.
1. Introduction
The Wordle family of games is well modeled as sequential information gathering with deterministic feedback constraints; exact optimization is tractable at moderate scale and has been demonstrated for the English instance. Sanuli inherits the rules but operates over Finnish lemmata and orthography (including Ä, Ö), altering priors and legal string shapes. We treat "optimal" as minimizing either the expected number of guesses or the worst-case depth under the standard six-guess budget.
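The deterministic feedback that this formulation relies on can be sketched as a small function. This is a minimal illustration, assuming Sanuli scores repeated letters the way standard Wordle does (exact matches are consumed first, then misplaced marks are limited by the remaining letter counts); the function name `feedback` and the `G`/`Y`/`-` encoding are hypothetical choices, not taken from Sanuli's code.

```python
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """Return a pattern string: 'G' = right place, 'Y' = wrong place, '-' = absent.

    Assumption: duplicates are handled as in standard Wordle.
    """
    guess, answer = guess.upper(), answer.upper()
    pattern = ["-"] * len(guess)
    # Answer letters that were not matched exactly stay available for 'Y' marks.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "G"
    for i, g in enumerate(guess):
        if pattern[i] == "-" and remaining[g] > 0:
            pattern[i] = "Y"
            remaining[g] -= 1
    return "".join(pattern)


print(feedback("KASTI", "SILTA"))  # -YYGY
```

Every objective discussed later only needs this mapping from (guess, answer) to a pattern, so a different duplicate rule would change the partitions but not the surrounding machinery.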
Sanuli's word sources are grounded in open materials from the Institute for the Languages of Finland (Kotus). Public statements and documentation point to Nykysuomen sanalista as a backbone resource, which means solver dictionaries should prefer Kotus-derived lists over ad-hoc corpus scraps.
2. Prior work: solvers and complexity
Exact optimization. For English Wordle, Bertsimas & Paskov gave an exact dynamic-programming solution over the official word lists, proving a globally optimal policy and identifying SALET as the best opener under their objective (≤5 guesses worst-case; mean ≈3.421) [1][2].
Information-theoretic heuristics. A widely used approach scores guesses by expected Shannon information of the feedback partition (entropy) [3].
State space and hardness. General Wordle optimization is NP-hard or NP-complete in the worst case, so exact global optimization is infeasible without structural leverage; this motivates heuristics that exploit language-specific constraints [2].
Community result relevant to Sanuli. A public analysis that extracted Sanuli's internal list from the WASM bundle computed first-move entropies and reported top openers in Finnish (KASTI, KILTA, KILSA, KARSI, SILTA) and an approximate candidate-set size (~3.3k) [4].
3. Objectives and algorithms
We compare three objectives used by Sanuli Solver+ and the literature:
- Entropy maximization. Choose guess g to maximize the expected information of the feedback partition $\mathcal{P}(g)$: $H(g) = -\sum_{r \in \mathcal{P}(g)} p(r) \log_2 p(r)$, where $p(r)$ is the probability of feedback pattern $r$. Under a uniform prior over the candidate set $C$, $p(r) = |C_r| / |C|$, with $C_r$ the candidates consistent with $r$. High-entropy guesses "flatten" the bins and shrink the hypothesis space quickly.
- Minimax (worst-case pruning). Choose g to minimize the largest post-feedback bucket size $\max_r |C_r|$. This gives strong guarantees in late game when counts are already small.
- Expected remaining. Choose g to minimize $\sum_r p(r)\,|C_r|$. This correlates with mean depth and is cheaper to compute than full dynamic programming.
For English, dynamic programming can certify an exact policy; for Sanuli, the same principle applies provided we adopt the correct lexicon and scoring rules. In practice, a hybrid policy — entropy first, then minimax — performs near-optimally across languages [1].
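The three objectives can be computed from the same feedback partition. The sketch below assumes a uniform prior over candidates and standard Wordle duplicate handling; all function names are hypothetical, and the four-word list is only a toy stand-in for the real Sanuli list.

```python
import math
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """'G' = right place, 'Y' = wrong place, '-' = absent (Wordle-style duplicates assumed)."""
    pattern = ["-"] * len(guess)
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "G"
    for i, g in enumerate(guess):
        if pattern[i] == "-" and remaining[g] > 0:
            pattern[i] = "Y"
            remaining[g] -= 1
    return "".join(pattern)


def partition(guess: str, candidates: list[str]) -> Counter:
    """Bucket sizes |C_r| for each feedback pattern r."""
    return Counter(feedback(guess, c) for c in candidates)


def entropy(guess: str, candidates: list[str]) -> float:
    """H(g) in bits under a uniform prior, p(r) = |C_r| / |C|."""
    n = len(candidates)
    return -sum((k / n) * math.log2(k / n) for k in partition(guess, candidates).values())


def worst_bucket(guess: str, candidates: list[str]) -> int:
    """Minimax score: size of the largest bucket."""
    return max(partition(guess, candidates).values())


def expected_remaining(guess: str, candidates: list[str]) -> float:
    """Sum over r of p(r) * |C_r|, which equals sum |C_r|^2 / |C| under a uniform prior."""
    n = len(candidates)
    return sum(k * k for k in partition(guess, candidates).values()) / n


toy = ["KASTI", "KILTA", "SILTA", "KARSI"]  # toy list, not the Sanuli list
print(entropy("KASTI", toy), worst_bucket("KASTI", toy), expected_remaining("KASTI", toy))
```

On this toy list KASTI separates all four words, so its entropy is 2 bits and both size-based scores equal 1. On a real list the three objectives can rank guesses differently, which is why the choice of objective matters.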
4. Finnish structure that changes the game
4.1 Vowel harmony and neutral vowels.
In Finnish, non-compound roots generally avoid mixing back vowels { a, o, u } with front vowels { ä, ö, y }; e and i are neutral [6][8]. Harmony constraints sharply reduce plausible co-occurrences, so guesses that quickly identify the harmony set tend to produce more informative partitions.
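A harmony class can be assigned to each candidate with a few set operations. This is a minimal sketch using the vowel sets stated above; the function name `harmony_class` and the four labels are hypothetical. Compounds and loanwords can legitimately mix sets, so "mixed" is a category here, not an error.

```python
BACK = set("AOU")
FRONT = set("ÄÖY")  # E and I are neutral and do not affect the class


def harmony_class(word: str) -> str:
    """Classify a word as 'back', 'front', 'neutral' (only E/I vowels), or 'mixed'."""
    letters = set(word.upper())
    has_back = bool(letters & BACK)
    has_front = bool(letters & FRONT)
    if has_back and has_front:
        return "mixed"
    if has_back:
        return "back"
    if has_front:
        return "front"
    return "neutral"


for w in ["KASTI", "PÖYTÄ", "KIELI"]:
    print(w, harmony_class(w))
```

Counting candidates per class before and after a guess gives a quick view of how much of the reduction came from harmony alone.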
4.2 Gemination and long vowels.
Finnish phonology contrasts short vs. long vowels and consonants, written with doubled graphemes; these are frequent and lexically meaningful [7]. This raises the prior probability that the hidden word contains duplicate letters, making it rational to test for duplication earlier than in English.
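Whether duplicates are actually common in a given answer list is an empirical question that is easy to check. The sketch below measures two shares: words with any repeated letter, and words with an adjacent doubled letter (the orthographic form of a long vowel or geminate). The function names are hypothetical and the sample list is a toy, so no conclusion about Sanuli's real list should be read from its output.

```python
from typing import Callable, Iterable


def has_repeat(word: str) -> bool:
    """True if any letter occurs more than once anywhere in the word."""
    return len(set(word)) < len(word)


def has_adjacent_double(word: str) -> bool:
    """True if two identical letters are adjacent, as in a long vowel or geminate."""
    return any(a == b for a, b in zip(word, word[1:]))


def share(words: Iterable[str], predicate: Callable[[str], bool]) -> float:
    """Fraction of words satisfying the predicate (0.0 for an empty list)."""
    words = list(words)
    return sum(predicate(w) for w in words) / len(words) if words else 0.0


sample = ["TAKKA", "KASTI", "SILTA", "KUKKA"]  # toy list, not the Sanuli list
print(share(sample, has_repeat), share(sample, has_adjacent_double))
```

Running both measures on the real candidate list, and again on the remaining candidates mid-game, would indicate when a duplicate probe is worth a turn.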
4.3 Consonant clusters and syllable templates.
Native Finnish strongly prefers simple onsets; multi-consonant onsets are mainly loan-word phenomena. The dominant syllable template is (C)V(C)(C), with tighter restrictions than English [7].
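One simple way to use the onset preference is to flag words that begin with two consonants and see how rare they are in the list at hand. This is a rough orthographic check, not a syllabifier; the vowel inventory and the function name `starts_with_cluster` are assumptions made for illustration.

```python
VOWELS = set("AEIOUYÄÖ")  # assumed orthographic vowel inventory


def starts_with_cluster(word: str) -> bool:
    """True if the first two letters are both consonants (a CC onset)."""
    w = word.upper()
    return len(w) >= 2 and w[0] not in VOWELS and w[1] not in VOWELS


def cluster_share(words: list[str]) -> float:
    """Fraction of words with a CC onset (0.0 for an empty list)."""
    return sum(starts_with_cluster(w) for w in words) / len(words) if words else 0.0


print(starts_with_cluster("KLAANI"), starts_with_cluster("KASTI"))  # True False
```

If the measured share on the actual list is small, probes with CC onsets can be deprioritized as tie-breaks rather than excluded outright.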
4.4 Alphabet and letter frequencies.
Sanuli keyboards include Ä and Ö; letter frequency profiles for Finnish differ from English: vowels dominate and the ranking of common consonants shifts, affecting both opener choice and tie-breaks.
Implication. Early identification of (i) the harmony class and (ii) any duplication often shrinks the search faster in Finnish than in English, because many legal English co-occurrences are phonotactically implausible in Finnish.
5. Lexicons and priors for Sanuli
Kotus's Nykysuomen sanalista enumerates >100k lemmata with inflectional metadata and is the canonical open resource underlying many Finnish tools; Sanuli draws on Kotus materials and uses a curated daily-answer subset. For solver design: use a Kotus-based candidate list for answers and allow a broader guess list for probes if permitted by the UI.
Community reverse-engineering of Sanuli's deployed bundle reports a working five-letter list on the order of ~3.3k items and provides measured first-move entropies for all candidates.
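A back-of-envelope information budget follows from that list size. Assuming roughly 3,300 equiprobable candidates (the community figure above, taken as approximate) and five positions with three feedback colours each:

```latex
\log_2 3300 \approx 11.7 \text{ bits}, \qquad
\log_2 3^5 = 5 \log_2 3 \approx 7.92 \text{ bits}, \qquad
\left\lceil \frac{3300}{3^5} \right\rceil = 14 .
```

No single guess can therefore isolate the answer: even an ideal opener leaves at least 14 candidates in its largest bucket, and real openers leave more because many of the 243 patterns never occur.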
Note: If your solver weights answers by non-uniform priors (e.g., daily selection biases) then expected-remaining or entropy with priors can outperform uniform policies.
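Entropy with priors only changes how $p(r)$ is computed: each pattern's probability becomes the summed prior mass of the candidates that produce it. The sketch below assumes the priors are given as a word-to-weight mapping; how such weights would be estimated for Sanuli is not specified in the source, and the names here are hypothetical.

```python
import math
from collections import Counter, defaultdict


def feedback(guess: str, answer: str) -> str:
    """'G' = right place, 'Y' = wrong place, '-' = absent (Wordle-style duplicates assumed)."""
    pattern = ["-"] * len(guess)
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "G"
    for i, g in enumerate(guess):
        if pattern[i] == "-" and remaining[g] > 0:
            pattern[i] = "Y"
            remaining[g] -= 1
    return "".join(pattern)


def weighted_entropy(guess: str, weights: dict[str, float]) -> float:
    """Entropy in bits of the feedback distribution under non-uniform answer priors."""
    total = sum(weights.values())
    mass: dict[str, float] = defaultdict(float)
    for word, w in weights.items():
        mass[feedback(guess, word)] += w
    return -sum((m / total) * math.log2(m / total) for m in mass.values() if m > 0)


uniform = {"KASTI": 1, "KILTA": 1, "SILTA": 1, "KARSI": 1}   # toy weights
skewed = {"KASTI": 97, "KILTA": 1, "SILTA": 1, "KARSI": 1}   # toy weights
print(weighted_entropy("KASTI", uniform), weighted_entropy("KASTI", skewed))
```

With uniform weights this reduces to the ordinary entropy score. With skewed weights the same guess is worth fewer bits, because most of the probability mass already sits on one outcome.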
6. Opening move design for Finnish
Empirical result. The top-scoring Sanuli openers by measured entropy are: KASTI, KILTA, KILSA, KARSI, SILTA. These cover high-frequency vowels (A, I) and productive consonants (K, S, T, L, R), and they rapidly signal the harmony set.
Why they work. Finnish monogram statistics rank A, I, T, E among the most common graphemes; consonants like N, S, L, K are also frequent. Position-agnostic priors favour guesses that combine these letters while straddling potential vowel-harmony classes (e.g., A vs. Ä only if diacritics are permitted in guesses and the list supports them).
Speculative refinement. Because Ä/Ö words are common in Finnish but occupy a distinct harmony class, one could play a second diagnostic guess containing front vowels (Ä/Ö/Y) when the opener's feedback rules out the back vowels it tested. This may collapse the search faster than spending the guess on more consonants, because harmony eliminates whole classes of candidates at once.
7. Mid-game policy: from entropy to minimax
Phase A (entropy-first). While |C| is large (>100), maximize expected information; a good probe evenly splits candidates across feedback patterns.
Phase B (duplication test). When feedback suggests a full vowel frame or a stable skeleton but progress stalls, inject a duplicate-letter probe (e.g., testing AA, EE, LL, KK) because Finnish frequently encodes length with doubling; this avoids wasting turns chasing non-duplicated hypotheses.
Phase C (minimax). Once |C| ≲ 20, switch to minimax: pick the guess that minimizes the worst remaining bucket. This shrinks the maximum depth and improves the guarantee of solving inside six guesses. If the rules allow, a non-answer probe that yields a finer partition can be the better choice even though it cannot itself be the solution.
Phase D (exact resolution). With |C| ≤ 4, resolve with dictionary-consistent hard-mode play (always reuse confirmed letters and positions). Dynamic programming over the tiny residual is trivial and mirrors English exact-solve behavior.
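The phase switch can be expressed as a single selection function. This is a sketch under stated assumptions: the thresholds 20 and 4 come from the text, entropy is used for every |C| above 20 (the text only specifies >100 explicitly), Phase B's duplicate probe is a judgment call and is left out, and Phase D is approximated by restricting guesses to remaining candidates. All names are hypothetical.

```python
import math
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """'G' = right place, 'Y' = wrong place, '-' = absent (Wordle-style duplicates assumed)."""
    pattern = ["-"] * len(guess)
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "G"
    for i, g in enumerate(guess):
        if pattern[i] == "-" and remaining[g] > 0:
            pattern[i] = "Y"
            remaining[g] -= 1
    return "".join(pattern)


def partition(guess: str, candidates: list[str]) -> Counter:
    return Counter(feedback(guess, c) for c in candidates)


def entropy(guess: str, candidates: list[str]) -> float:
    n = len(candidates)
    return -sum((k / n) * math.log2(k / n) for k in partition(guess, candidates).values())


def worst_bucket(guess: str, candidates: list[str]) -> int:
    return max(partition(guess, candidates).values())


def choose_guess(candidates: list[str], guesses: list[str],
                 small: int = 20, tiny: int = 4) -> str:
    """Pick the next guess according to the current size of the candidate set."""
    n = len(candidates)
    if n > small:
        # Phase A: maximize expected information over all allowed guesses.
        return max(guesses, key=lambda g: entropy(g, candidates))
    # Phase C uses all allowed guesses; Phase D plays only remaining candidates.
    pool = candidates if n <= tiny else guesses
    in_candidates = set(candidates)
    # Minimize the worst bucket; on ties prefer a guess that could be the answer.
    return min(pool, key=lambda g: (worst_bucket(g, candidates), g not in in_candidates))


toy = ["KASTI", "KILTA", "SILTA", "KARSI"]  # toy list, not the Sanuli list
print(choose_guess(toy, toy))
```

The tie-break toward possible answers costs nothing in worst-case bucket size and occasionally wins a turn early.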
8. Handling duplicates and positional traps
Finnish makes length phonemic: pairs like taka / takka / taakka are distinct. Consequently, a common failure mode is assuming all letters are unique; in Sanuli, that assumption is weaker than in English. Heuristically, if you have strong evidence for four positions by guess 3–4 but several candidates remain, schedule a duplicate probe next rather than chasing a rare consonant. This leverages language structure, not only corpus counts.
9. Putting it together: a Sanuli policy
Policy sketch.
- Opener. Play KASTI (or KILTA/KILSA) to maximize entropy over the Finnish list and read the harmony signal.
- Harmony confirm. If no front vowels are indicated and only back vowels remain plausible, keep { A, O, U }; otherwise pivot to a front-vowel probe containing Ä/Ö/Y if the UI and list allow them. Neutral E/I remain active in both cases.
- Entropy until small. Continue entropy-maximizing probes while |C| is large, favouring high-frequency Finnish letters and avoiding unlikely onsets.
- Duplicate test. If progress stalls or patterns suggest long phonemes, schedule an explicit double-letter test.
- Switch to minimax. At |C| ≲ 20, minimise the worst-case bucket. Use a non-answer partitioning guess if it reduces the maximum branch below the alternative.
- Finish with consistency. Apply hard-mode consistency to force convergence within six. English results show exact resolution is straightforward once the frontier is small.
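The policy sketch above can be exercised end to end with a small simulation loop. This is an illustrative simplification, not a reproduction of any existing solver: it always guesses from the remaining candidates (a stricter restriction than the non-answer probes the policy allows), omits the harmony and duplicate heuristics, and uses a toy word list. Function names and the `switch_at` parameter are hypothetical.

```python
import math
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """'G' = right place, 'Y' = wrong place, '-' = absent (Wordle-style duplicates assumed)."""
    pattern = ["-"] * len(guess)
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "G"
    for i, g in enumerate(guess):
        if pattern[i] == "-" and remaining[g] > 0:
            pattern[i] = "Y"
            remaining[g] -= 1
    return "".join(pattern)


def pick(candidates: list[str], switch_at: int) -> str:
    """Entropy while the set is large, minimax once it is small."""
    def buckets(g: str) -> Counter:
        return Counter(feedback(g, c) for c in candidates)

    n = len(candidates)
    if n > switch_at:
        return max(candidates, key=lambda g: -sum(
            (k / n) * math.log2(k / n) for k in buckets(g).values()))
    return min(candidates, key=lambda g: max(buckets(g).values()))


def solve(answer: str, words: list[str], opener: str = "KASTI",
          max_guesses: int = 6, switch_at: int = 20) -> list[tuple[str, str]]:
    """Play one game and return the (guess, pattern) history."""
    candidates = list(words)
    guess, history = opener, []
    for _ in range(max_guesses):
        pattern = feedback(guess, answer)
        history.append((guess, pattern))
        if pattern == "G" * len(answer):
            break
        # Keep only words that would have produced the same feedback.
        candidates = [w for w in candidates if feedback(guess, w) == pattern]
        if not candidates:  # answer was not in the list
            break
        guess = pick(candidates, switch_at)
    return history


toy = ["KASTI", "KILTA", "SILTA", "KARSI"]  # toy list, not the Sanuli list
print(solve("SILTA", toy))
```

Running `solve` over every word in a frozen answer list gives the mean and worst-case guess counts for a policy, which is the kind of measurement needed to compare openers or switch thresholds.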
This entropy→minimax hybrid aligns with formal treatments and with what Sanuli Solver+ exposes to players, while explicitly employing Finnish phonology to choose better probes than English-centric heuristics.
10. Differences from English Wordle that matter
- Harmony partitions the space early; testing front vs. back sets can eliminate large swaths of candidates with a single guess.
- Duplicates are common and must be tested sooner. English heuristics that delay duplication checks underperform on Finnish.
- Onset clusters are rarer natively; probes that exploit this (e.g., avoiding improbable initial CC sequences unless the list is rich in loans) waste fewer turns.
- Diacritics (Ä, Ö) occur with meaningful frequency and should appear in diagnostic probes when feedback suggests the front set.
11. Practical notes on lists and tooling
- Prefer Kotus-based lists for answer candidates; they align with Sanuli's stated sources and reflect Finnish morphology better than arbitrary web scrapes.
- Community repos exist that (i) parse Sanuli's deployed list and (ii) implement generic solvers using Kotus dictionaries; these are useful for reproducing the entropy/minimax rankings.
- If you weight answers by non-uniform priors (e.g., curated "daily" subset), expected-remaining or entropy with priors can outperform uniform policies, as shown for English variants.
12. Limitations and open questions
- The exact Sanuli answer list and sampling policy evolve; daily curation implies non-uniform priors that are hard to infer externally. This affects the strict notion of "optimal."
- Position-specific Finnish letter distributions for five-letter lemmas (not running text) are under-documented publicly; using running-text frequencies is an approximation.
- Formal certification of optimality for Sanuli (as achieved for English) likely requires freezing the exact answer/guess sets and reproducing the DP solve. This is computationally feasible but dataset-dependent.
13. Conclusion
Optimal Sanuli play combines information-theoretic partitioning with Finnish-specific phonotactics. Start with a measured high-entropy opener (e.g., KASTI), detect the harmony set early, probe duplication sooner than you would in English, and switch to minimax once the frontier is small. This yields a strategy that is close to optimal in expectation and tight in the worst case, grounded in both solver theory and the structural facts of Finnish.
References
[1] Bertsimas, D. & Paskov, A. An Exact and Interpretable Solution to Wordle. MIT Sloan, 2022.
https://www.dbertsim.mit.edu/papers/an-exact-and-interpretable-solution-to-wordle
[2] Rosenbaum, W. Finding a Winning Strategy for Wordle is NP-complete. arXiv preprint arXiv:2204.04104, 2022.
https://arxiv.org/abs/2204.04104
[3] Bhambri, S., Bhattacharjee, A., & Bertsekas, D. Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control Approach. arXiv preprint arXiv:2211.10298, 2022.
https://arxiv.org/abs/2211.10298
[4] NipaGames. Sanuli Moguli – Sanuli Solver. GitHub repository.
https://github.com/NipaGames/sanulimoguli
[5] Neill, J. How to Beat Your Friends at Wordle Using Maths. Lancaster University, 2023.
https://www.lancaster.ac.uk/study/undergraduate/news/how-to-beat-your-friends-at-wordle-using-maths
[6] Ringen, C. & Heinämäki, O. Variation in Finnish Vowel Harmony: An OT Account. Nordic Journal of Linguistics, 1999.
https://doi.org/10.1017/S0332586500001739
[7] Sulkala, H. & Karjalainen, M. Finnish Sound Structure: Phonetics, Phonology, Phonotactics and Prosody. Oulu University Press, 2008.
https://oulurepo.oulu.fi/bitstream/handle/10024/36099/isbn978-951-42-8984-2.pdf
[8] Uusi Kielemme. Vowel Harmony – Finnish Grammar.
https://uusikielemme.fi/finnish-grammar/vowel-harmony-vokaaliharmonia-finnish-grammar