Help Wanted: Word List Quality Improvements for Multiple Languages #112

@Hugo0

Overview

We need help improving word lists for several languages. High-quality word lists are crucial for a good Wordle experience - our analytics show that word quality is the #1 user complaint (50% of feedback).

Languages Needing Help

🔴 Priority: Large Lists (Likely Over-Inflected)

These languages have unusually large word lists that likely include too many verb conjugations and other inflected forms:

| Language | Code | Words | Issue |
| --- | --- | --- | --- |
| Serbian | sr | 17,956 | Likely includes many conjugations |
| Slovenian | sl | 11,730 | Needs review |
| Persian | fa | 11,252 | Needs review |
| Norwegian Nynorsk | nn | 10,522 | Needs review |
| Greek | el | 10,208 | Needs review |
| Arabic | ar | 10,163 | May include conjugations |
| Korean | ko | 8,921 | Needs review |
| Georgian | ka | 8,826 | Needs review |
| Icelandic | is | 8,284 | Needs review |

🟡 Priority: Small Lists (Need Expansion)

These languages have too few words:

| Language | Code | Words | Issue |
| --- | --- | --- | --- |
| Vietnamese | vi | 738 | Needs more words |
| Latgalian | ltg | 387 | Very limited |
| Klingon | tlh | 269 | Canonical vocabulary is limited |
| Kinyarwanda | rw | 20 | Critically small |

What Makes a Good Word List?

Include:

  • ✅ Common nouns
  • ✅ Common adjectives
  • ✅ Common verbs (base forms)
  • ✅ Words native speakers would reasonably guess
  • ✅ Target list size: 2,000-5,000 words is ideal

Exclude:

  • ❌ Obscure verb conjugations
  • ❌ Proper nouns (names, places, brands)
  • ❌ Foreign loanwords that feel out of place
  • ❌ Vulgar/offensive words
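
As a rough illustration of the 2,000-5,000 target, here is a minimal sketch that classifies a list by its word count. The function name and the hard thresholds are assumptions made for this example, not part of the project's tooling; the counts are taken from the tables above.

```python
# Hypothetical helper: thresholds mirror the "2,000-5,000 words is ideal" guideline.
IDEAL_MIN = 2_000
IDEAL_MAX = 5_000


def classify_list_size(word_count: int) -> str:
    """Label a word list as too small, too large, or ok by its size."""
    if word_count < IDEAL_MIN:
        return "too small"
    if word_count > IDEAL_MAX:
        return "too large"
    return "ok"


# Word counts copied from the tables in this issue.
counts = {"sr": 17_956, "is": 8_284, "vi": 738, "rw": 20}
for code, count in counts.items():
    print(code, count, classify_list_size(count))
```

Size is only a first signal: a large list may still be fine if the language simply has many common 5-letter words, so flagged lists still need human review.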

How to Contribute

Option 1: Create a Blocklist

List the words to remove, one per line, in webapp/data/languages/{lang}/{lang}_blocklist.txt:

# Comment explaining why
word1
word2
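
To show how a blocklist in this format could be interpreted, here is a minimal sketch. It is not the actual logic of scripts/curate_words.py; the function names are hypothetical, and it assumes that lines starting with `#` are comments, blank lines are ignored, and matching is exact.

```python
from typing import Iterable, List, Set


def parse_blocklist(lines: Iterable[str]) -> Set[str]:
    """Collect blocked words, skipping blank lines and '#' comment lines."""
    blocked = set()
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        blocked.add(entry)
    return blocked


def apply_blocklist(words: Iterable[str], blocked: Set[str]) -> List[str]:
    """Return the words that are not blocked, preserving their order."""
    return [w for w in words if w not in blocked]


blocklist_text = """# Comment explaining why
word1
word2
"""
blocked = parse_blocklist(blocklist_text.splitlines())
remaining = apply_blocklist(["word1", "house", "word2", "table"], blocked)
print(remaining)
```

Keeping the comment line above each group of words makes it easier for reviewers to see why an entry was blocked.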

Option 2: Suggest Better Sources

Know of a good frequency list or dictionary for a language? Comment below!

Option 3: Manual Curation

Review the word list and suggest changes via PR.

Technical Notes

  • Word lists are in webapp/data/languages/{lang}/{lang}_5words.txt
  • One word per line
  • Words should be exactly 5 characters (how characters are counted varies by language script)
  • Run python scripts/curate_words.py apply-all-blocklists to apply blocklists
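
Before opening a PR, a quick format check along these lines may help. This is a sketch under stated assumptions, not the project's validator: the function name is hypothetical, and counting characters as Unicode code points after NFC normalization is only one possible rule, which may not match how the project counts characters for every script.

```python
import unicodedata
from typing import Iterable, List, Tuple

WORD_LENGTH = 5  # from the technical notes: exactly 5 characters


def find_invalid_entries(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, raw_line) for lines that are not one 5-character word.

    Assumption: length is measured in code points after NFC normalization.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        word = unicodedata.normalize("NFC", line.strip())
        # Reject empty lines, lines with several tokens, and wrong lengths.
        if len(word.split()) != 1 or len(word) != WORD_LENGTH:
            problems.append((line_number, line))
    return problems


# "e\u0301clat" uses a combining accent; NFC folds it into 5 code points.
sample = ["house", "tables", "two words", "", "e\u0301clat"]
for line_number, entry in find_invalid_entries(sample):
    print(f"line {line_number}: {entry!r}")
```

For scripts where one visible character spans several code points even after normalization, a grapheme-aware count would be needed instead of `len`.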

Claiming a Language

Comment below with the language you'd like to work on to avoid duplicate effort!

| Language | Claimed By | Status |
| --- | --- | --- |
| Serbian | | |
| Persian | | |
| Greek | | |
| Vietnamese | | |
| ... | | |

Thank you for helping make Wordle Global better! 🌍
