feat(funbox): added support for cyrillic and arabic charset (@m4dd0c) #6488

m4dd0c · 2025-04-23T19:20:43Z

Description

Added Arabic & Russian (cyrillic) Funboxes

Added two new funboxes, Arabic and Russian, with gibberish word generators. Also added logic to automatically force the Arabic language when the Arabic funbox is active, to prevent config issues.

Changes

Added getArabic() and getRussian() utility functions.
Integrated both into funbox-functions.ts and list.ts as Metadata.
Forced Arabic language setting in setLanguage() when needed.
Updated FunboxName types.
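
For readers unfamiliar with how such a generator works, here is a minimal sketch of a gibberish word generator over a single code point range. It is an illustration only, not the code of getArabic() or getRussian() from this PR: the names `gibberishWord` and `randomInt`, the 1 to 7 character word length, and the injectable `rng` parameter are all assumptions.

```typescript
// Illustrative sketch only; names and word length are assumptions,
// not taken from the PR's getArabic()/getRussian() implementation.
type CodePointRange = { start: number; end: number };

// Basic lowercase Cyrillic, а (U+0430) to я (U+044F).
const cyrillicLower: CodePointRange = { start: 0x0430, end: 0x044f };

// Random integer in [min, max], both ends inclusive.
function randomInt(min: number, max: number, rng: () => number): number {
  return min + Math.floor(rng() * (max - min + 1));
}

// Builds one gibberish word of 1 to 7 characters from the given range.
// String.fromCharCode is enough here because the range is inside the BMP.
function gibberishWord(
  range: CodePointRange,
  rng: () => number = Math.random
): string {
  const length = randomInt(1, 7, rng);
  let word = "";
  for (let i = 0; i < length; i++) {
    word += String.fromCharCode(randomInt(range.start, range.end, rng));
  }
  return word;
}

console.log(gibberishWord(cyrillicLower));
```

Passing a fixed `rng` makes the output reproducible, which may help when testing a funbox like this.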

Closes #6181

Let me know if any changes are required.


github-actions · 2025-04-23T19:22:43Z

Continuous integration check(s) failed. Please review the failing check's logs and make the necessary changes.

frontend/src/ts/config.ts

packages/funbox/src/list.ts

byseif21 · 2025-04-24T00:48:58Z

Also, about the names: "arabic" feels too generic here and misses the random-word vibe that gibberish and the others have. I propose "Fawda" (فوضي, Arabic for "mess") or "Harfiyya" (حرفية, from "letter"). Both are catchy and hint at the playful chaos. It would be even better if the name were an actual Arabic word, but I don't think that's an option. I don't know about Russian, but if this is agreed on, I think it should be changed too.
Nice work though!

Before all of that also wait for @Miodec to approve everything.

byseif21 · 2025-04-24T01:30:39Z

Another idea: since these are basically gibberish again, instead of adding new funboxes, shouldn't we make the gibberish mode support different languages? For example, selecting the gibberish funbox with English would generate Latin letters, with Arabic it would generate and show Arabic letters, and so on.

fehmer · 2025-04-24T07:47:01Z

> Another idea: since these are basically gibberish again, instead of adding new funboxes, shouldn't we make the gibberish mode support different languages? For example, selecting the gibberish funbox with English would generate Latin letters, with Arabic it would generate and show Arabic letters, and so on.

I was thinking the same

m4dd0c · 2025-04-24T07:58:25Z

> Another idea: since these are basically gibberish again, instead of adding new funboxes, shouldn't we make the gibberish mode support different languages? For example, selecting the gibberish funbox with English would generate Latin letters, with Arabic it would generate and show Arabic letters, and so on.

Great idea actually. I'd love to work on it.
@fehmer also supports the idea, so I believe we are good to go.

I wonder what's going to happen to the current issue #6181 and PR #6488.

frontend/src/ts/test/funbox/funbox-functions.ts

byseif21 · 2025-04-27T13:58:05Z

Hi @m4dd0c Very nice work!
Just a little curious - did you test if the Arabic is shown with connected or separated characters here?

m4dd0c · 2025-04-27T16:30:06Z

> Hi @m4dd0c Very nice work! Just a little curious - did you test if the Arabic is shown with connected or separated characters here?

Thank you @byseif21
I don't understand Arabic. Can you check and tell me?

byseif21 · 2025-04-27T20:24:52Z

I checked, and they are connected as they should be. However, I noticed another thing: the Arabic range includes the extended Arabic letters, which aren't typeable in the Arabic languages (arabic, Arabic_Egypt), as those use standard characters only. The extended letters may be used in other languages (e.g., Persian, Kurdish, Urdu), so I think we should split the range into at least two: a standard one for the language and a general one, e.g.

    standard_arabic: [ // without the extended letters
      { start: 1569, end: 1594 }, // U+0621–U+063A (ء to غ)
      { start: 1601, end: 1608 }, // U+0641–U+0648 (ف to و)
      { start: 1610, end: 1610 }, // U+064A (ي)
    ],
    arabic: { // with the extended letters
      start: 1569, // ء (U+0621)
      end: 1610, // ي (U+064A)
    },

I also think some other languages may have the same problem, so using the same approach (one range without the extended letters and one with them), or defining a range for each language separately, would be better, I guess. We could leave the rest and wait for native users to set them correctly. Or we could leave them general as they are and wait for a native user of each language to notice and, if they feel the need to change or fix the range, do it themselves or open an issue. I don't know.
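
To make the difference between the two proposed ranges concrete, the following sketch lists the code points that the single wide range contains but the three standard ranges leave out. The helper names `inRanges` and `difference` are hypothetical, and the wide range is written as a one-element array purely for convenience.

```typescript
type CodePointRange = { start: number; end: number };

// The two variants from the comment above.
const standardArabic: CodePointRange[] = [
  { start: 1569, end: 1594 },
  { start: 1601, end: 1608 },
  { start: 1610, end: 1610 },
];
const generalArabic: CodePointRange[] = [{ start: 1569, end: 1610 }];

// True if the code point falls inside any of the ranges (inclusive).
function inRanges(codePoint: number, ranges: CodePointRange[]): boolean {
  return ranges.some((r) => codePoint >= r.start && codePoint <= r.end);
}

// Code points covered by `wide` that are not covered by `narrow`.
function difference(
  wide: CodePointRange[],
  narrow: CodePointRange[]
): number[] {
  const result: number[] = [];
  for (const r of wide) {
    for (let cp = r.start; cp <= r.end; cp++) {
      if (!inRanges(cp, narrow)) result.push(cp);
    }
  }
  return result;
}

const extras = difference(generalArabic, standardArabic);
// Format as U+XXXX for readability.
const labels = extras.map(
  (cp) => "U+" + ("0000" + cp.toString(16).toUpperCase()).slice(-4)
);
console.log(labels.join(", "));
// prints "U+063B, U+063C, U+063D, U+063E, U+063F, U+0640, U+0649"
```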

m4dd0c · 2025-04-28T04:40:06Z


I intentionally kept the charset range minimal, since there are plenty of extra characters that may cause issues while typing, e.g., blank characters, unsupported characters, and letters that are not used in any language.

Furthermore, if I go with the approach you mentioned, how would someone select between the standard and non-standard versions of the different languages?

What I have in mind is that we can have 2 funboxes:

Gibberish Standard
Gibberish Extended

Thoughts?

fehmer · 2025-04-28T04:56:02Z

I think Mio's idea was not to overcomplicate this feature. #6488 (comment)

With Latin we use the most basic alphabet, a-z. For Italian, for example, we would have unused letters (like j and k) and would be missing letters (like è).

Can we do the same for the other languages: find a minimal, common set which should be typeable?

m4dd0c · 2025-04-28T06:18:15Z

> I think Mio's idea was not to overcomplicate this feature. #6488 (comment)
>
> With Latin we use the most basic alphabet, a-z. For Italian, for example, we would have unused letters (like j and k) and would be missing letters (like è).
>
> Can we do the same for the other languages: find a minimal, common set which should be typeable?

This PR currently supports only minimal, common set ranges, as @Miodec suggested.

byseif21 · 2025-04-28T08:55:58Z

> Furthermore, if I go with the approach you mentioned, how would someone select between the standard and non-standard versions of the different languages?
>
> What I have in mind is that we can have 2 funboxes:
>
> Gibberish Standard
> Gibberish Extended
>
> Thoughts?

No one will have to select anything, and we don't have to make two different modes. The languages will just use different charset names in their language files. For example, the Arabic languages with standard letters (arabic, Arabic_Egypt) will set charset: standard_arabic, and the languages that need the extended letters will set charset: arabic. Just like that.
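
A minimal sketch of that lookup, assuming each language file exposes a `charset` string. The `LanguageFile` shape, the `rangesForLanguage` name, the "persian" example entry, and the fallback to Latin are hypothetical and only illustrate the idea.

```typescript
type CodePointRange = { start: number; end: number };

// Hypothetical, trimmed-down shape of a language file; only `charset`
// matters for this sketch.
interface LanguageFile {
  name: string;
  charset: string;
}

const latin: CodePointRange[] = [{ start: 97, end: 122 }];

const charsetRanges: Record<string, CodePointRange[]> = {
  latin: latin,
  standard_arabic: [
    { start: 1569, end: 1594 },
    { start: 1601, end: 1608 },
    { start: 1610, end: 1610 },
  ],
  arabic: [{ start: 1569, end: 1610 }],
};

// Resolves the ranges for a language purely from its `charset` field,
// so the user never has to choose between standard and extended.
// Unknown charset names fall back to Latin (an assumption of this sketch).
function rangesForLanguage(language: LanguageFile): CodePointRange[] {
  const known = Object.prototype.hasOwnProperty.call(
    charsetRanges,
    language.charset
  );
  const ranges = known ? charsetRanges[language.charset] : undefined;
  return ranges !== undefined ? ranges : latin;
}

const arabicFile: LanguageFile = { name: "arabic", charset: "standard_arabic" };
const persianFile: LanguageFile = { name: "persian", charset: "arabic" };
console.log(rangesForLanguage(arabicFile).length); // prints 3
console.log(rangesForLanguage(persianFile).length); // prints 1
```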

byseif21 · 2025-04-28T09:05:57Z

> I think Mio's idea was not to overcomplicate this feature. #6488 (comment)
>
> With Latin we use the most basic alphabet, a-z. For Italian, for example, we would have unused letters (like j and k) and would be missing letters (like è).
>
> Can we do the same for the other languages: find a minimal, common set which should be typeable?

Okay, that sounds like the best solution so we don't overcomplicate things further. I did some searching, and these may be the best ranges to go with for now, all without the extended letters, rare characters, or problematic symbols:

    const charsetRanges = {
      arabic: [
        { start: 1569, end: 1594 }, // U+0621–U+063A (ء to غ)
        { start: 1601, end: 1608 }, // U+0641–U+0648 (ف to و)
        { start: 1610, end: 1610 }, // U+064A (ي)
      ],
      latin: {
        start: 97, // a (U+0061)
        end: 122, // z (U+007A)
      },
      cyrillic: {
        start: 1072, // а (U+0430)
        end: 1103, // я (U+044F)
      },
      devanagari: [
        { start: 2309, end: 2361 }, // U+0905–U+0939 (अ to ह)
        { start: 2366, end: 2376 }, // U+093E–U+0948 (vowel signs ा to ै)
      ],
      geez: [
        { start: 4768, end: 4960 }, // U+12A0–U+1360
      ],
      tamil: [
        { start: 2949, end: 3020 }, // U+0B85–U+0BCC
        { start: 3006, end: 3028 }, // U+0BBE–U+0BD4
      ],
      telugu: [
        { start: 3077, end: 3148 }, // U+0C05–U+0C4C (అ to ౌ)
        { start: 3158, end: 3160 }, // U+0C56–U+0C58 (ౖ to ౘ)
      ],
      bengali: [
        { start: 2437, end: 2489 }, // U+0985–U+09B9 (অ to হ)
        { start: 2494, end: 2508 }, // U+09BE–U+09CC (vowel signs া to ৌ)
      ],
      malayalam: [
        { start: 3333, end: 3396 }, // U+0D05–U+0D44
        { start: 3398, end: 3404 }, // U+0D46–U+0D4C (vowel signs)
      ],
      kannada: [
        { start: 3205, end: 3268 }, // U+0C85–U+0CC4
        { start: 3270, end: 3276 }, // U+0CC6–U+0CCC (vowel signs)
      ],
      burmese: [
        { start: 4096, end: 4138 }, // U+1000–U+102A (က to ဪ)
      ],
      tibetan: [
        { start: 3904, end: 3911 }, // U+0F40–U+0F47 (ཀ to ཇ)
      ],
      sinhala: [
        { start: 3461, end: 3516 }, // U+0D85–U+0DBC
        { start: 3535, end: 3551 }, // U+0DCF–U+0DDF (vowel signs)
      ],
      hebrew: {
        start: 1488, // א (U+05D0)
        end: 1514, // ת (U+05EA)
      },
      thai: [
        { start: 3585, end: 3631 }, // U+0E01–U+0E2F (ก to ฯ)
      ],
      greek: {
        start: 945, // α (U+03B1)
        end: 969, // ω (U+03C9)
      },
      han: [
        { start: 19968, end: 27903 }, // U+4E00–U+6CFF (common CJK ideographs)
      ],
      hangul: {
        start: 44032, // 가 (U+AC00)
        end: 55203, // 힣 (U+D7A3)
      },
      khmer: [
        { start: 6016, end: 6067 }, // U+1780–U+17B3 (ក to ឳ)
      ],
      ol_chiki: [
        { start: 7248, end: 7293 }, // U+1C50–U+1C7D
      ],
      hiragana: {
        start: 12353, // ぁ (U+3041)
        end: 12438, // ゖ (U+3096)
      },
      katakana: {
        start: 12449, // ァ (U+30A1)
        end: 12538, // ヺ (U+30FA)
      },
    };
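
Once a charset has several ranges, picking a random character means choosing across all of them. One possible approach, sketched below, treats the ranges as one concatenated list so that every code point is equally likely regardless of which range it sits in. The function name `pickCodePoint` and the injectable `rng` are assumptions; this is not the PR's getGibberish code.

```typescript
type CodePointRange = { start: number; end: number };

const arabic: CodePointRange[] = [
  { start: 1569, end: 1594 }, // 26 code points
  { start: 1601, end: 1608 }, // 8 code points
  { start: 1610, end: 1610 }, // 1 code point
];

// Picks one code point uniformly across all ranges by indexing into
// the ranges as if they were laid end to end.
function pickCodePoint(
  ranges: CodePointRange[],
  rng: () => number = Math.random
): number {
  let total = 0;
  for (const r of ranges) total += r.end - r.start + 1;
  if (total === 0) throw new Error("no ranges given");

  let index = Math.floor(rng() * total);
  for (const r of ranges) {
    const size = r.end - r.start + 1;
    if (index < size) return r.start + index;
    index -= size;
  }
  // Only reachable if rng() returns a value outside [0, 1).
  throw new Error("rng must return a value in [0, 1)");
}

console.log(String.fromCharCode(pickCodePoint(arabic)));
```

Choosing a range first and then a character inside it would be simpler, but it would over-represent characters from small ranges such as the single-character one above.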

m4dd0c · 2025-04-28T12:19:00Z

> > I think Mio's idea was not to overcomplicate this feature. #6488 (comment)
> >
> > With Latin we use the most basic alphabet, a-z. For Italian, for example, we would have unused letters (like j and k) and would be missing letters (like è).
> >
> > Can we do the same for the other languages: find a minimal, common set which should be typeable?
>
> Okay, that sounds like the best solution so we don't overcomplicate things further. I did some searching, and these may be the best ranges to go with for now, all without the extended letters, rare characters, or problematic symbols:
>
>     arabic: [
>       { start: 1569, end: 1594 }, // U+0621–U+063A (ء to غ)
>       { start: 1601, end: 1608 }, // U+0641–U+0648 (ف to و)
>       { start: 1610, end: 1610 }, // U+064A (ي)
>     ],
>     ....

Well, I like the idea. I'll make the proposed changes,
but I'm not sure what @Miodec's take on this is.

I will update the getGibberish logic to generate combined letters (supporting matras) for certain scripts e.g., devanagari.
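
As a rough illustration of what "combined letters" could mean for Devanagari, the sketch below builds a syllable from a consonant optionally followed by a dependent vowel sign (matra). The function name, the 50% chance of adding a matra, and the restriction of the base to consonants (U+0915 to U+0939) are assumptions made for this example, not the behavior of getGibberish.

```typescript
type CodePointRange = { start: number; end: number };

// Consonants क (U+0915) to ह (U+0939).
const CONSONANTS: CodePointRange = { start: 0x0915, end: 0x0939 };
// Dependent vowel signs (matras) ा (U+093E) to ै (U+0948).
const VOWEL_SIGNS: CodePointRange = { start: 0x093e, end: 0x0948 };

// Uniform pick inside one inclusive range.
function pick(range: CodePointRange, rng: () => number): number {
  return range.start + Math.floor(rng() * (range.end - range.start + 1));
}

// A consonant, followed by a matra about half of the time, so that
// vowel signs never appear on their own.
function devanagariSyllable(rng: () => number = Math.random): string {
  let syllable = String.fromCharCode(pick(CONSONANTS, rng));
  if (rng() < 0.5) {
    syllable += String.fromCharCode(pick(VOWEL_SIGNS, rng));
  }
  return syllable;
}

console.log(devanagariSyllable(() => 0)); // prints "का"
```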

Refactor charsetRanges to support multiple ranges per charset. Updated getGibberish to utilize the new structure for generating random gibberish strings. This improves flexibility and allows handling of complex charsets with multiple ranges.

BREAKING CHANGE: charsetRanges structure has been modified to an array of ranges instead of a single range object. Update any dependent code accordingly.
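
For dependent code that still holds the old single-object shape, a small adapter along these lines might ease the migration. The `LegacyCharset` type and the `toRangeList` name are hypothetical and not part of this PR.

```typescript
type CodePointRange = { start: number; end: number };

// Old shape: one range object. New shape: an array of ranges.
type LegacyCharset = CodePointRange | CodePointRange[];

// Wraps a single range object in an array and passes arrays through,
// so callers can always iterate over a list of ranges.
function toRangeList(charset: LegacyCharset): CodePointRange[] {
  return Array.isArray(charset) ? charset : [charset];
}

const latin = toRangeList({ start: 97, end: 122 });
const arabic = toRangeList([
  { start: 1569, end: 1594 },
  { start: 1601, end: 1608 },
  { start: 1610, end: 1610 },
]);
console.log(latin.length, arabic.length); // prints "1 3"
```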

github-actions · 2025-05-08T18:32:33Z

Continuous integration check(s) failed. Please review the failing check's logs and make the necessary changes.

m4dd0c and others added 14 commits December 19, 2024 12:31

fix: typo fixed underscare -> underscore

a3ba897

Merge branch 'master' of https://github.com/monkeytypegame/monkeytype

062fe38

feat: arabic funbox added

854e546

chore: temporarily implementation of getArabic function for arabic fu…

aec0659

…nbox

misc!: arabic funbox name added as types,

0a9fe57

Arabic funbox config added

chore: temporarily implementation of getArabic function for arabic fu…

21f8075

…nbox

chore: Revised Arabic FunboxMetadata

c37ac7e

feat!: GetText.getArabic fn added with gibberish arabic implementation

22f9f6e

chore: Forcing Arabic language, If the Arabic funbox is activated

471e3b8

feat: Arabic funbox added

014fbf3

chore: Funbox added Russian

04ef924

feat: Russian (cyrillic) Funbox Metadata added

971c166

feat: GetText.getRussian fn added

4ffd4be

feat!: Russian funbox added

49b4610

monkeytypegeorge added frontend User interface or web stuff packages Changes in local packages labels Apr 23, 2025

github-actions bot added the waiting for update Pull requests or issues that require changes/comments before continuing label Apr 23, 2025

m4dd0c changed the title ~~feat(funbox): added support for cyrillic and arabic charset @m4dd0c~~ feat(funbox): added support for cyrillic and arabic charset (@m4dd0c) Apr 23, 2025

github-actions bot removed the waiting for update Pull requests or issues that require changes/comments before continuing label Apr 23, 2025

m4dd0c and others added 4 commits April 24, 2025 01:43

fix: circular deps removed

e292a66

chore: rememberSettings function added in russian funbox

085968b

fix: rememberSettings added to updateLanguage in russian funbox

f6bc7a9

Merge branch 'master' into master

a14565c

m4dd0c commented Apr 23, 2025

frontend/src/ts/config.ts Outdated

byseif21 reviewed Apr 24, 2025

packages/funbox/src/list.ts Outdated

Merge branch 'enhance/gibberish' into HEAD

c8d21b6

monkeytypegeorge added the assets Languages, themes, layouts, etc. label Apr 26, 2025

chore: ran pretty-fix

6b02c2e

Miodec requested changes Apr 26, 2025

frontend/src/ts/test/funbox/funbox-functions.ts Outdated

github-actions bot added the waiting for update Pull requests or issues that require changes/comments before continuing label Apr 26, 2025

fix(funbox): removed unsolicited arabic and russian funboxes

8eb3e03

github-actions bot removed the waiting for update Pull requests or issues that require changes/comments before continuing label Apr 26, 2025

m4dd0c requested a review from Miodec April 26, 2025 14:40

github-actions bot added the waiting for review Pull requests that require a review before continuing label Apr 26, 2025

m4dd0c marked this pull request as draft May 5, 2025 07:43

github-actions bot removed the waiting for review Pull requests that require a review before continuing label May 5, 2025

m4dd0c and others added 2 commits May 8, 2025 23:43

Merge branch 'master' into master

c0a37dd

m4dd0c marked this pull request as ready for review May 8, 2025 18:27

github-actions bot added the waiting for review Pull requests that require a review before continuing label May 8, 2025

github-actions bot added waiting for update Pull requests or issues that require changes/comments before continuing and removed waiting for review Pull requests that require a review before continuing labels May 8, 2025

fix(conflict): json-data conflict fixed

6714c97

github-actions bot removed the waiting for update Pull requests or issues that require changes/comments before continuing label May 8, 2025

m4dd0c marked this pull request as draft May 8, 2025 18:43
