feat(funbox): added support for cyrillic and arabic charset (@m4dd0c) #6488

Draft
wants to merge 31 commits into
base: master
a3ba897
fix: typo fixed underscare -> underscore
m4dd0c Dec 19, 2024
062fe38
Merge branch 'master' of https://github.com/monkeytypegame/monkeytype
m4dd0c Apr 19, 2025
854e546
feat: arabic funbox added
m4dd0c Apr 21, 2025
aec0659
chore: temporarily implementation of getArabic function for arabic fu…
m4dd0c Apr 21, 2025
0a9fe57
misc!: arabic funbox name added as types,
m4dd0c Apr 21, 2025
21f8075
chore: temporarily implementation of getArabic function for arabic fu…
m4dd0c Apr 21, 2025
c37ac7e
chore: Revised Arabic FunboxMetadata
m4dd0c Apr 23, 2025
22f9f6e
feat!: GetText.getArabic fn added with gibberish arabic implementation
m4dd0c Apr 23, 2025
471e3b8
chore: Forcing Arabic language, If the Arabic funbox is activated
m4dd0c Apr 23, 2025
014fbf3
feat: Arabic funbox added
m4dd0c Apr 23, 2025
04ef924
chore: Funbox added Russian
m4dd0c Apr 23, 2025
971c166
feat: Russian (cyrillic) Funbox Metadata added
m4dd0c Apr 23, 2025
4ffd4be
feat: GetText.getRussian fn added
m4dd0c Apr 23, 2025
49b4610
feat!: Russian funbox added
m4dd0c Apr 23, 2025
e292a66
fix: circular deps removed
m4dd0c Apr 23, 2025
085968b
chore: rememberSettings function added in russian funbox
m4dd0c Apr 23, 2025
f6bc7a9
fix: rememberSettings added to updateLanguage in russian funbox
m4dd0c Apr 23, 2025
a14565c
Merge branch 'master' into master
m4dd0c Apr 23, 2025
53d08a3
Merge branch 'monkeytypegame:master' into master
m4dd0c Apr 25, 2025
4b96bd8
impr: charset field added for validation
m4dd0c Apr 25, 2025
8e4af22
impr: charset field added as optional field for langugage validation
m4dd0c Apr 25, 2025
9a04dc9
feat: charsetRange added as script unicode collection
m4dd0c Apr 25, 2025
d229136
feat: gibberish funbox logic updated to support different charsets
m4dd0c Apr 25, 2025
f16ad62
mics!: refactored json-data and charsetRange to improve typesafety
m4dd0c Apr 26, 2025
c296cad
chore: added charset field to JSON language files.
m4dd0c Apr 26, 2025
c8d21b6
Merge branch 'enhance/gibberish' into HEAD
m4dd0c Apr 26, 2025
6b02c2e
chore: ran pretty-fix
m4dd0c Apr 26, 2025
8eb3e03
fix(funbox): removed unsolicited arabic and russian funboxes
m4dd0c Apr 26, 2025
d56b79f
feat(utils): enhance charset range handling in gibberish gen
m4dd0c May 8, 2025
c0a37dd
Merge branch 'master' into master
m4dd0c May 8, 2025
6714c97
fix(conflict): json-data conflict fixed
m4dd0c May 8, 2025
3 changes: 3 additions & 0 deletions frontend/scripts/json-validation.cjs
Original file line number Diff line number Diff line change
Expand Up @@ -402,6 +402,9 @@ function validateLanguages() {
rightToLeft: { type: "boolean" },
noLazyMode: { type: "boolean" },
bcp47: { type: "string" },
charset: {
type: "string",
},
words: {
type: "array",
items: { type: "string", minLength: 1 },
Expand Down
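Note on the schema change above: `type: "string"` accepts any value, including charset names that have no entry in `charsetRanges`. One way to tighten this is the standard JSON Schema `enum` keyword, with the allowed values derived from the keys of the range table. The sketch below is TypeScript rather than the CommonJS of `json-validation.cjs`, and `sampleRanges`, `buildCharsetSchema`, and `matchesCharsetSchema` are hypothetical names used only for illustration.

```typescript
// Illustrative subset of the range table; the full table is in charsetRange.ts.
const sampleRanges = {
  latin: [{ start: 97, end: 122 }],
  cyrillic: [{ start: 1072, end: 1103 }],
} as const;

type CharsetSchema = { type: "string"; enum: string[] };

// Build a JSON Schema fragment that only admits charsets with a range table.
function buildCharsetSchema(ranges: Record<string, unknown>): CharsetSchema {
  return { type: "string", enum: Object.keys(ranges).sort() };
}

// Hand-rolled check that mirrors what `type` plus `enum` would enforce.
function matchesCharsetSchema(schema: CharsetSchema, value: unknown): boolean {
  return typeof value === "string" && schema.enum.indexOf(value) !== -1;
}

const charsetSchema = buildCharsetSchema(sampleRanges);
```

With a schema like this, a language file that declares a charset missing from the table would fail validation instead of failing later at runtime.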
13 changes: 11 additions & 2 deletions frontend/src/ts/test/funbox/funbox-functions.ts
Expand Up @@ -417,8 +417,17 @@ const list: Partial<Record<FunboxName, FunboxFunctions>> = {
},
},
gibberish: {
getWord(): string {
return GetText.getGibberish();
async withWords(words): Promise<Wordset> {
if (!words || words.length === 0) {
return new Wordset([]);
}

const lang = await JSONData.getLanguage(Config.language);
const gibberishWords = words.map(() =>
GetText.getGibberish(lang?.charset || "latin")
);

return new Wordset(gibberishWords);
},
},
ascii: {
Expand Down
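The `withWords` change above swaps each incoming word for a generated one and falls back to `"latin"` when the language has no `charset`. A minimal sketch of that flow, assuming a hypothetical `resolveCharset` helper that also falls back when the declared charset is unknown; `knownCharsets`, `LanguageLike`, and `toGibberishWords` are illustrative names, not part of the PR.

```typescript
// Hypothetical subset of supported charsets, for illustration only.
const knownCharsets = ["latin", "cyrillic", "arabic"] as const;
type KnownCharset = (typeof knownCharsets)[number];

type LanguageLike = { name: string; charset?: string };

// Return the language's charset if it is known, otherwise "latin".
function resolveCharset(lang: LanguageLike | undefined): KnownCharset {
  const candidate = lang === undefined ? undefined : lang.charset;
  for (const known of knownCharsets) {
    if (known === candidate) return known;
  }
  return "latin";
}

// Replace every input word with one generated word, keeping the word count.
function toGibberishWords(
  words: readonly string[],
  lang: LanguageLike | undefined,
  makeWord: (charset: KnownCharset) => string
): string[] {
  const charset = resolveCharset(lang);
  return words.map(() => makeWord(charset));
}
```

Passing the generator in as `makeWord` keeps the sketch independent of `GetText.getGibberish` and makes the mapping easy to test with a stub.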
85 changes: 85 additions & 0 deletions frontend/src/ts/utils/charsetRange.ts
@@ -0,0 +1,85 @@
export const charsetRanges = {
arabic: [
{ start: 1569, end: 1594 }, // U+0621–U+063A (ء to غ)
{ start: 1601, end: 1608 }, // U+0641–U+0648 (ف to و)
{ start: 1610, end: 1610 }, // U+064A (ي)
],
latin: [
{ start: 97, end: 122 }, // U+0061-U+007A (a to z)
],
cyrillic: [
{ start: 1072, end: 1103 }, // U+0430-U+044F (а to я)
],
devanagari: [
{ start: 2309, end: 2361 }, // U+0905–U+0939 (अ to ह)
{ start: 2366, end: 2376 }, // U+093E–U+0948 (vowel signs ा to ै)
],
gujarati: [
{ start: 2693, end: 2708 }, // U+0A85–U+0A94 (અ to ઔ)
{ start: 2709, end: 2745 }, // U+0A95–U+0AB9 (ક to હ)
{ start: 2750, end: 2764 }, // U+0ABE–U+0ACC (vowel signs ા to ૌ)
],
geez: [
{ start: 4608, end: 4959 }, // U+1200–U+135F (ሀ to ፟)
],
tamil: [
{ start: 2949, end: 3004 }, // U+0B85–U+0BBC (அ to ஹ)
{ start: 3006, end: 3020 }, // U+0BBE–U+0BCC (vowel signs ா to ௌ)
],
telugu: [
{ start: 3077, end: 3148 }, // U+0C05–U+0C4C (అ to ౌ)
{ start: 3158, end: 3160 }, // U+0C56–U+0C58 (additional vowels ౖ to ౘ)
],
bengali: [
{ start: 2437, end: 2489 }, // U+0985–U+09B9 (অ to হ)
{ start: 2494, end: 2508 }, // U+09BE–U+09CC (vowel signs া to ৌ)
],
malayalam: [
{ start: 3333, end: 3388 }, // U+0D05–U+0D3C (അ to ഹ)
{ start: 3390, end: 3396 }, // U+0D3E–U+0D44 (vowel signs ാ to ൄ)
],
kannada: [
{ start: 3205, end: 3260 }, // U+0C85–U+0CBC (ಅ to ಹ)
{ start: 3262, end: 3268 }, // U+0CBE–U+0CC4 (vowel signs ಾ to ೄ)
],
burmese: [
{ start: 4096, end: 4138 }, // U+1000–U+102A (က to ဪ)
],
tibetan: [
{ start: 3904, end: 3911 }, // U+0F40–U+0F47 (ཀ to ཇ)
],
sinhala: [
{ start: 3461, end: 3516 }, // U+0D85–U+0DBC (අ to හ)
{ start: 3535, end: 3551 }, // U+0DCF–U+0DDF (vowel signs ා to ෟ)
],
hebrew: [
{ start: 1488, end: 1514 }, // U+05D0-U+05EA (א to ת)
],
thai: [
{ start: 3585, end: 3631 }, // U+0E01–U+0E2F (ก to ฯ)
],
greek: [
{ start: 945, end: 969 }, // U+03B1-U+03C9 (α to ω)
],
han: [
{ start: 19968, end: 27903 }, // U+4E00–U+6CFF (common CJK ideographs)
],
hangul: [
{ start: 44032, end: 55203 }, // U+AC00-U+D7A3 (가 to 힣)
],
khmer: [
{ start: 6016, end: 6067 }, // U+1780–U+17B3 (ក to ឳ)
],
ol_chiki: [
{ start: 7258, end: 7293 }, // U+1C5A–U+1C7D (ᱚ to ᱽ)
],
hiragana: [
{ start: 12353, end: 12438 }, // U+3041-U+3096 (ぁ to ゖ)
],
katakana: [
{ start: 12449, end: 12538 }, // U+30A1-U+30FA (ァ to ヺ)
],
} as const;

// Charset type
export type Charset = keyof typeof charsetRanges;
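The decimal bounds and the `U+` comments in this file are maintained by hand, so they can drift apart. A small sanity check along the following lines could be run over the table to print each range in `U+XXXX` form and to flag overlapping or inverted ranges. This is a sketch only; `toUPlus`, `describeRange`, and `rangesAreDisjoint` are hypothetical helpers, not part of the PR.

```typescript
type CodeRange = { start: number; end: number };

// Format a code point as "U+XXXX" (uppercase hex, at least four digits).
function toUPlus(codePoint: number): string {
  let hex = codePoint.toString(16).toUpperCase();
  while (hex.length < 4) hex = "0" + hex;
  return "U+" + hex;
}

// Render a range the same way the inline comments do, for easy comparison.
function describeRange(range: CodeRange): string {
  return toUPlus(range.start) + "-" + toUPlus(range.end);
}

// True when every range has start <= end and no two ranges overlap.
function rangesAreDisjoint(ranges: readonly CodeRange[]): boolean {
  const sorted = ranges.slice().sort((a, b) => a.start - b.start);
  let previousEnd = -1;
  for (const range of sorted) {
    if (range.start > range.end || range.start <= previousEnd) return false;
    previousEnd = range.end;
  }
  return true;
}
```

Comparing `describeRange` output against the comment next to each entry makes a mismatch between the decimal and hexadecimal forms visible at a glance.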
22 changes: 18 additions & 4 deletions frontend/src/ts/utils/generate.ts
@@ -1,6 +1,7 @@
import { randomIntFromRange } from "@monkeytype/util/numbers";
import * as Arrays from "./arrays";
import * as Strings from "./strings";
import { Charset, charsetRanges } from "./charsetRange";

/**
* Generates a random binary string of length 8.
Expand Down Expand Up @@ -101,15 +102,28 @@ export function getMorse(word: string): string {
* Generates a random gibberish string from the Unicode ranges of the given charset.
* @returns The generated gibberish string.
*/
export function getGibberish(): string {
export function getGibberish(charset: Charset): string {
const randLen = randomIntFromRange(1, 7);
let ret = "";
for (let i = 0; i < randLen; i++) {
ret += String.fromCharCode(97 + randomIntFromRange(0, 25));
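const ranges = charsetRanges[charset] ?? charsetRanges.latin; // fall back to latin when a charset has no range table

const chars: number[] = [];
for (const range of ranges) {
for (let i = range.start; i <= range.end; i++) { // end is inclusive
chars.push(i);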
}
}

while (ret.length < randLen) {
const ch = String.fromCharCode(Arrays.randomElementFromArray(chars));

// Keep letters (\p{L}) and spacing combining marks (\p{Mc}, which covers many vowel signs);
// drop everything else, including viramas and other nonspacing marks (\p{Mn}).
// ref: https://www.regular-expressions.info/unicode.html
if (/\p{L}|\p{Mc}/u.test(ch)) ret += ch;
}
return ret;
}

/**
* Generates a random ASCII string of printable characters.
* @returns The generated ASCII string.
Expand Down
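A self-contained sketch of the charset-aware generator, assuming `end` is inclusive (as the single-code-point Arabic range `{ start: 1610, end: 1610 }` implies) and filtering once while building the pool so the sampling loop always terminates. `sampleRanges`, `buildPool`, and `gibberish` are hypothetical names, and the injectable `random` parameter is an addition for testability, not something the PR provides.

```typescript
type CodeRange = { start: number; end: number };

// Illustrative subset; the full table is in charsetRange.ts.
const sampleRanges = {
  latin: [{ start: 97, end: 122 }],
  arabic: [
    { start: 1569, end: 1594 },
    { start: 1601, end: 1608 },
    { start: 1610, end: 1610 },
  ],
} as const;

type SampleCharset = keyof typeof sampleRanges;

// Same pattern as /\p{L}|\p{Mc}/u: letters and spacing combining marks.
const keepPattern = new RegExp("\\p{L}|\\p{Mc}", "u");

// Expand inclusive ranges into a string of allowed characters (BMP only).
function buildPool(ranges: readonly CodeRange[]): string {
  let pool = "";
  for (const range of ranges) {
    for (let code = range.start; code <= range.end; code++) {
      const ch = String.fromCharCode(code);
      if (keepPattern.test(ch)) pool += ch;
    }
  }
  return pool;
}

// Pick `length` characters uniformly at random from the charset's pool.
function gibberish(
  charset: SampleCharset,
  length: number,
  random: () => number = Math.random
): string {
  const pool = buildPool(sampleRanges[charset]);
  if (pool.length === 0) return ""; // nothing to sample from
  let out = "";
  for (let i = 0; i < length; i++) {
    out += pool.charAt(Math.floor(random() * pool.length));
  }
  return out;
}
```

Filtering up front rather than rejecting characters inside a `while` loop means an empty or fully filtered pool returns an empty string instead of spinning forever.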
7 changes: 4 additions & 3 deletions frontend/src/ts/utils/json-data.ts
@@ -1,6 +1,7 @@
import { FunboxName } from "@monkeytype/contracts/schemas/configs";
import { Language } from "@monkeytype/contracts/schemas/languages";
import { Accents } from "../test/lazy-mode";
import { Charset } from "./charsetRange";
import { FunboxName } from "@monkeytype/contracts/schemas/configs";

/**
* Fetches JSON data from the specified URL using the fetch API.
Expand Down Expand Up @@ -98,6 +99,7 @@ export type LanguageObject = {
noLazyMode?: boolean;
ligatures?: boolean;
orderedByFrequency?: boolean;
charset?: Charset;
words: string[];
additionalAccents: Accents;
bcp47?: string;
Expand All @@ -111,8 +113,7 @@ let currentLanguage: LanguageObject;
* @param lang The language code.
* @returns A promise that resolves to the language object.
*/
export async function getLanguage(lang: Language): Promise<LanguageObject> {
// try {
export async function getLanguage(lang: string): Promise<LanguageObject> {
if (currentLanguage === undefined || currentLanguage.name !== lang) {
currentLanguage = await cachedFetchJson<LanguageObject>(
`/languages/${lang}.json`
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/amharic.json
Expand Up @@ -2,6 +2,7 @@
"name": "amharic",
"ligatures": false,
"bcp47": "am-ET",
"charset": "geez",
"words": [
"እግዚአብሔር",
"መጽሐፍ",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/amharic_1k.json
Expand Up @@ -2,6 +2,7 @@
"name": "amharic_1k",
"ligatures": false,
"bcp47": "am-ET",
"charset": "geez",
"words": [
"መለየት",
"ማሰብ",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/amharic_5k.json
Expand Up @@ -2,6 +2,7 @@
"name": "amharic_5k",
"ligatures": false,
"bcp47": "am-ET",
"charset": "geez",
"words": [
"ሙዚቀኝነት",
"የሚባለው",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/arabic.json
Expand Up @@ -3,6 +3,7 @@
"rightToLeft": true,
"ligatures": true,
"bcp47": "ar-SA",
"charset": "arabic",
"words": [
"أَتَمَنَّى",
"أَثِقْ",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/arabic_10k.json
Expand Up @@ -3,6 +3,7 @@
"rightToLeft": true,
"ligatures": true,
"bcp47": "ar-SA",
"charset": "arabic",
"words": [
" اِكْتَشَفَ",
" فَيَجِبُ",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/arabic_egypt.json
Expand Up @@ -3,6 +3,7 @@
"rightToLeft": true,
"ligatures": true,
"bcp47": "ar-EG",
"charset": "arabic",
"words": [
"ازيك",
"ايه",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/arabic_egypt_1k.json
Expand Up @@ -3,6 +3,7 @@
"rightToLeft": true,
"ligatures": true,
"bcp47": "ar-EG",
"charset": "arabic",
"words": [
"بلاش",
"بسرعة",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/armenian.json
Expand Up @@ -2,6 +2,7 @@
"name": "armenian",
"noLazyMode": true,
"orderedByFrequency": false,
"charset": "armenian",
"words": [
"ազատ",
"քաղաք",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/armenian_1k.json
Expand Up @@ -2,6 +2,7 @@
"name": "armenian_1k",
"noLazyMode": true,
"orderedByFrequency": false,
"charset": "armenian",
"words": [
"ազատ",
"քաղաք",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/armenian_western.json
@@ -1,6 +1,7 @@
{
"name": "armenian_western",
"bcp47": "hyw",
"charset": "armenian",
"words": [
"կանանց",
"իրեն",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/armenian_western_1k.json
@@ -1,6 +1,7 @@
{
"name": "armenian_western_1k",
"bcp47": "hyw",
"charset": "armenian",
"words": [
"թարգմանուած",
"անոր",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/bangla.json
Expand Up @@ -3,6 +3,7 @@
"ligatures": true,
"noLazyMode": true,
"bcp47": "bn-BD",
"charset": "bengali",
"words": [
"।",
"আমি",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/bangla_10k.json
Expand Up @@ -3,6 +3,7 @@
"ligatures": true,
"noLazyMode": true,
"bcp47": "bn-BD",
"charset": "bengali",
"words": [
"।",
"আমি",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/bangla_letters.json
Expand Up @@ -3,6 +3,7 @@
"ligatures": true,
"noLazyMode": true,
"bcp47": "bn-BD",
"charset": "bengali",
"words": [
"অ",
"আ",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/bashkir.json
Expand Up @@ -2,6 +2,7 @@
"name": "bashkir",
"bcp47": "ba",
"orderedByFrequency": true,
"charset": "cyrillic",
"words": [
"бер",
"һәм",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/belarusian.json
Expand Up @@ -2,6 +2,7 @@
"name": "belarusian",
"noLazyMode": true,
"bcp47": "be-BY",
"charset": "cyrillic",
"words": [
"ён",
"і",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/belarusian_100k.json
@@ -1,6 +1,7 @@
{
"name": "belarusian_100k",
"bcp47": "be-BY",
"charset": "cyrillic",
"words": [
"а",
"аагамія",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/belarusian_10k.json
@@ -1,6 +1,7 @@
{
"name": "belarusian_10k",
"bcp47": "be-BY",
"charset": "cyrillic",
"words": [
"а",
"аазіс",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/belarusian_1k.json
Expand Up @@ -2,6 +2,7 @@
"name": "belarusian_1k",
"noLazyMode": true,
"bcp47": "be-BY",
"charset": "cyrillic",
"words": [
"ён",
"і",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/belarusian_25k.json
@@ -1,6 +1,7 @@
{
"name": "belarusian_25k",
"bcp47": "be-BY",
"charset": "cyrillic",
"words": [
"а",
"аагамія",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/belarusian_50k.json
@@ -1,6 +1,7 @@
{
"name": "belarusian_50k",
"bcp47": "be-BY",
"charset": "cyrillic",
"words": [
"а",
"аагамія",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/belarusian_5k.json
@@ -1,6 +1,7 @@
{
"name": "belarusian_5k",
"bcp47": "be-BY",
"charset": "cyrillic",
"words": [
"аазіс",
"ааліт",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/bulgarian.json
@@ -1,6 +1,7 @@
{
"name": "bulgarian",
"noLazyMode": true,
"charset": "cyrillic",
"words": [
"а",
"аз",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/chinese_simplified.json
Expand Up @@ -2,6 +2,7 @@
"name": "chinese_simplified",
"_comment": "Sourced from https://gist.github.com/indiejoseph/eae09c673460aa0b56db",
"bcp47": "zh-CN",
"charset": "han",
"words": [
"我们",
"他们",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/chinese_simplified_10k.json
Expand Up @@ -2,6 +2,7 @@
"name": "chinese_simplified_10k",
"_comment": "Sourced from https://gist.github.com/indiejoseph/eae09c673460aa0b56db",
"bcp47": "zh-CN",
"charset": "han",
"words": [
"我们",
"他们",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/chinese_simplified_1k.json
Expand Up @@ -2,6 +2,7 @@
"name": "chinese_simplified_1k",
"_comment": "Sourced from https://gist.github.com/indiejoseph/eae09c673460aa0b56db",
"bcp47": "zh-CN",
"charset": "han",
"words": [
"我们",
"他们",
Expand Down
1 change: 1 addition & 0 deletions frontend/static/languages/chinese_simplified_50k.json
Expand Up @@ -2,6 +2,7 @@
"name": "chinese_simplified_50k",
"_comment": "Sourced from https://gist.github.com/indiejoseph/eae09c673460aa0b56db",
"bcp47": "zh-CN",
"charset": "han",
"words": [
"我们",
"他们",
Expand Down