
feat(funbox): added support for cyrillic and arabic charset (@m4dd0c) #6488


Draft: wants to merge 31 commits into master

@m4dd0c (Contributor) commented Apr 23, 2025

Description

Added Arabic & Russian (Cyrillic) funboxes

Added two new funboxes, Arabic and Russian, with gibberish word generators. Also added logic to automatically force the Arabic language when the Arabic funbox is active, to prevent config issues.

Changes
  • Added getArabic() and getRussian() utility functions.
  • Integrated both into funbox-functions.ts and list.ts as metadata.
  • Forced Arabic language setting in setLanguage() when needed.
  • Updated FunboxName types.
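The PR text does not show the bodies of getArabic() and getRussian(). As a hedged sketch only, assuming such a generator picks code points uniformly from one contiguous Unicode range, it could look like the following. The names randomInt and randomWordFromRange, the 2 to 8 letter word length, and the choice of а (U+0430) to я (U+044F) are illustrative assumptions, not the PR's code.

```typescript
// Inclusive Unicode code-point range (hypothetical type for this sketch)
type CodePointRange = { start: number; end: number };

// Uniform random integer in [min, max], both ends inclusive
function randomInt(min: number, max: number): number {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Build one gibberish "word" from a single contiguous range.
// randomWordFromRange is a hypothetical name, not the PR's function.
function randomWordFromRange(
  range: CodePointRange,
  minLength = 2,
  maxLength = 8,
): string {
  const length = randomInt(minLength, maxLength);
  let word = "";
  for (let i = 0; i < length; i++) {
    word += String.fromCodePoint(randomInt(range.start, range.end));
  }
  return word;
}

// Lowercase Russian letters а (U+0430) to я (U+044F)
const cyrillicLower: CodePointRange = { start: 0x0430, end: 0x044f };

// Prints a random word of 2 to 8 lowercase Cyrillic letters
console.log(randomWordFromRange(cyrillicLower));
```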

Closes #6181

Let me know if any changes are required.

@monkeytypegeorge added the frontend (User interface or web stuff) and packages (Changes in local packages) labels Apr 23, 2025
Contributor

Continuous integration check(s) failed. Please review the failing check's logs and make the necessary changes.

@github-actions bot added the waiting for update label (Pull requests or issues that require changes/comments before continuing) Apr 23, 2025
@m4dd0c changed the title from feat(funbox): added support for cyrillic and arabic charset @m4dd0c to feat(funbox): added support for cyrillic and arabic charset (@m4dd0c) Apr 23, 2025
@github-actions bot removed the waiting for update label Apr 23, 2025
@byseif21 (Contributor)

Also, about the names: "arabic" feels too generic and misses the random-word vibe that gibberish and the others have. I propose "Fawda" (فوضي, Arabic for "mess") or "Harfiyya" (حرفية, from "letter"). Both are catchy and hint at the playful chaos. It would be even better if the name itself could be written in Arabic, but I don't think that's an option. I don't know about Russian, but if this is agreed on, I think that name should be changed too.
Nice work, though!

Before any of that, though, wait for @Miodec to approve everything.

@byseif21 (Contributor)

Another idea: since these are basically gibberish again, instead of making them new funboxes, shouldn't we make the gibberish funbox support different languages? For example, selecting the gibberish funbox while using English would generate Latin letters, while using Arabic would generate and show Arabic letters, and so on.

@fehmer (Member) commented Apr 24, 2025

I was thinking the same

@m4dd0c (Contributor, Author) commented Apr 24, 2025

Great idea, actually. I'd love to work on it.
@fehmer also supports the idea, so I believe we're good to go.

I wonder what's going to happen to the current issue #6181 and PR #6488.

@monkeytypegeorge added the assets (Languages, themes, layouts, etc.) label Apr 26, 2025
@github-actions bot added the waiting for update label Apr 26, 2025
@github-actions bot removed the waiting for update label Apr 26, 2025
@m4dd0c requested a review from Miodec April 26, 2025 14:40
@github-actions bot added the waiting for review label (Pull requests that require a review before continuing) Apr 26, 2025
@byseif21 (Contributor)

Hi @m4dd0c, very nice work!
Just a little curious: did you test whether the Arabic is shown with connected or separated characters here?

@m4dd0c (Contributor, Author) commented Apr 27, 2025

Thank you, @byseif21.
I don't understand Arabic. Can you check and tell me?

@byseif21 (Contributor)

I checked, and they connect fine, as they should. However, I noticed something else: the Arabic ranges include the extended Arabic letters, which aren't typeable in the Arabic languages (arabic, Arabic_Egypt) because those use only the standard characters. The extended letters may be used in other languages (e.g., Persian, Kurdish, Urdu), so I think we should split the range into at least two: a standard one for the language and a general one, e.g.:

    // Standard letters only, without the extended letters
    standard_arabic: [
      { start: 1569, end: 1594 }, // U+0621–U+063A (ء to غ)
      { start: 1601, end: 1608 }, // U+0641–U+0648 (ف to و)
      { start: 1610, end: 1610 }, // U+064A (ي)
    ],
    // Full U+0621–U+064A block, which also contains some extended letters
    arabic: {
      start: 1569, // ء (U+0621)
      end: 1610,   // ي (U+064A)
    },

I think some other languages may have the same problem, so either taking the same approach (one range without the extended letters and one with them) or defining a range for each language separately would be better, I guess. We could fix these and leave the rest for native speakers to set correctly. Or we could keep the ranges general for now and wait for native speakers of each language to notice; if one of them feels the range needs changing, they can fix it themselves or open an issue. I'm not sure.
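To see exactly what the proposed split changes, the sketch below expands both Arabic definitions into explicit character lists and prints the characters that the full U+0621–U+064A block has but the standard ranges lack. expandRanges and toCodePointLabel are hypothetical helpers written only for this illustration.

```typescript
type CodePointRange = { start: number; end: number };

// Expand inclusive code-point ranges into a list of characters
function expandRanges(ranges: CodePointRange[]): string[] {
  const chars: string[] = [];
  for (const { start, end } of ranges) {
    for (let cp = start; cp <= end; cp++) {
      chars.push(String.fromCodePoint(cp));
    }
  }
  return chars;
}

// Format a character as "U+XXXX"
function toCodePointLabel(char: string): string {
  return "U+" + char.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0");
}

const standardArabic: CodePointRange[] = [
  { start: 1569, end: 1594 }, // U+0621–U+063A (ء to غ)
  { start: 1601, end: 1608 }, // U+0641–U+0648 (ف to و)
  { start: 1610, end: 1610 }, // U+064A (ي)
];
const fullArabic: CodePointRange[] = [{ start: 1569, end: 1610 }];

const standardChars = expandRanges(standardArabic);
const fullChars = expandRanges(fullArabic);
const standardSet = new Set(standardChars);

// Characters present only in the full block
const onlyInFull = fullChars.filter((c) => !standardSet.has(c));

console.log(standardChars.length, fullChars.length); // 35 42
console.log(onlyInFull.map(toCodePointLabel).join(" "));
// U+063B U+063C U+063D U+063E U+063F U+0640 U+0649
```

By their Unicode names, U+063B to U+063F are extended Keheh and Farsi Yeh variants and U+0640 is the tatweel (kashida) elongation mark; the standard ranges as written also leave out U+0649 (ى, alef maksura).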

@m4dd0c (Contributor, Author) commented Apr 28, 2025

I intentionally kept the charset range minimal, since there are plenty of bloat characters that may cause issues while typing, e.g., blank characters, unsupported characters, and letters that aren't used in any language.

Furthermore, if I go with the approach you mentioned, how would someone select between the standard and non-standard versions of different languages?

What I have in mind is two funboxes:

  1. Gibberish Standard
  2. Gibberish Extended

Thoughts?

@fehmer (Member) commented Apr 28, 2025

I think Mio's idea was to not overcomplicate this feature. #6488 (comment)

With Latin we use the most basic alphabet, a-z. For a language like Italian we would have unused letters (like j and k) and be missing letters (like è).

Can we do the same for the other languages: find a minimal, common set that should be typeable?

@m4dd0c (Contributor, Author) commented Apr 28, 2025

This PR currently supports only the minimal, common set of ranges, as @Miodec suggested.

@byseif21 (Contributor)

No one will select anything, and we don't have to make two different modes. The languages will just use different charset names in their language files. For example, the Arabic languages that use only the standard letters (arabic, Arabic_Egypt) will set charset: standard_arabic, and the languages that need the extended letters will set charset: arabic. Just like that.
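A sketch of this mechanism, under the assumption that each language file gains an optional charset field naming an entry in the ranges table and that languages without one fall back to Latin. LanguageObject, resolveCharset, and the three-entry charsetRanges table are illustrative; the actual Monkeytype language schema may differ.

```typescript
type CodePointRange = { start: number; end: number };

// Hypothetical, trimmed charset table keyed by name
const charsetRanges: Record<string, CodePointRange[]> = {
  latin: [{ start: 97, end: 122 }],
  standard_arabic: [
    { start: 1569, end: 1594 },
    { start: 1601, end: 1608 },
    { start: 1610, end: 1610 },
  ],
  arabic: [{ start: 1569, end: 1610 }],
};

// Hypothetical shape of a language file; only the fields used here
interface LanguageObject {
  name: string;
  charset?: string; // e.g. "standard_arabic"
}

// Look up the ranges for a language, defaulting to Latin
function resolveCharset(language: LanguageObject): CodePointRange[] {
  const key = language.charset ?? "latin";
  const ranges = charsetRanges[key];
  if (ranges === undefined) {
    throw new Error(`Unknown charset "${key}" in language "${language.name}"`);
  }
  return ranges;
}

const arabicLanguage: LanguageObject = { name: "arabic", charset: "standard_arabic" };
const englishLanguage: LanguageObject = { name: "english" };

console.log(resolveCharset(arabicLanguage).length); // 3
console.log(resolveCharset(englishLanguage)[0]); // { start: 97, end: 122 }
```

Throwing on an unknown key, rather than silently falling back, surfaces a typo in a language file immediately instead of quietly generating Latin gibberish.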

@byseif21 (Contributor)

Okay, that sounds like the best way to avoid overcomplicating things any further. I did some research, and these may be the best ranges we can go with for now, all without the extended letters, rare characters, or problematic symbols:

    const charsetRanges = {
      arabic: [
        { start: 1569, end: 1594 }, // U+0621–U+063A (ء to غ)
        { start: 1601, end: 1608 }, // U+0641–U+0648 (ف to و)
        { start: 1610, end: 1610 }, // U+064A (ي)
      ],
      latin: {
        start: 97, // a (U+0061)
        end: 122,  // z (U+007A)
      },
      cyrillic: {
        start: 1072, // а (U+0430)
        end: 1103,   // я (U+044F)
      },
      devanagari: [
        { start: 2309, end: 2361 }, // U+0905–U+0939 (अ to ह)
        { start: 2366, end: 2376 }, // U+093E–U+0948 (vowel signs ा to ै)
      ],
      geez: [
        { start: 4608, end: 4959 }, // U+1200–U+135F (ሀ to ፟)
      ],
      tamil: [
        { start: 2949, end: 3001 }, // U+0B85–U+0BB9 (அ to ஹ)
        { start: 3006, end: 3020 }, // U+0BBE–U+0BCC (vowel signs ா to ௌ)
      ],
      telugu: [
        { start: 3077, end: 3148 }, // U+0C05–U+0C4C (అ to ౌ)
        { start: 3158, end: 3160 }, // U+0C56–U+0C58 (additional vowels ౖ to ౘ)
      ],
      bengali: [
        { start: 2437, end: 2489 }, // U+0985–U+09B9 (অ to হ)
        { start: 2494, end: 2508 }, // U+09BE–U+09CC (vowel signs া to ৌ)
      ],
      malayalam: [
        { start: 3333, end: 3396 }, // U+0D05–U+0D44 (അ to ൄ, letters through vowel signs ാ to ൄ)
        { start: 3398, end: 3404 }, // U+0D46–U+0D4C (vowel signs െ to ൌ)
      ],
      kannada: [
        { start: 3205, end: 3268 }, // U+0C85–U+0CC4 (ಅ to ೄ, letters through vowel signs ಾ to ೄ)
        { start: 3270, end: 3276 }, // U+0CC6–U+0CCC (vowel signs ೆ to ೌ)
      ],
      burmese: [
        { start: 4096, end: 4138 }, // U+1000–U+102A (က to ဪ)
      ],
      tibetan: [
        { start: 3904, end: 3911 }, // U+0F40–U+0F47 (ཀ to ཇ)
      ],
      sinhala: [
        { start: 3461, end: 3524 }, // U+0D85–U+0DC4 (අ to හ)
        { start: 3535, end: 3551 }, // U+0DCF–U+0DDF (vowel signs ා to ෟ)
      ],
      hebrew: {
        start: 1488, // א (U+05D0)
        end: 1514,   // ת (U+05EA)
      },
      thai: [
        { start: 3585, end: 3631 }, // U+0E01–U+0E2F (ก to ฯ)
      ],
      greek: {
        start: 945, // α (U+03B1)
        end: 969,   // ω (U+03C9)
      },
      han: [
        { start: 19968, end: 27903 }, // U+4E00–U+6CFF (common CJK ideographs)
      ],
      hangul: {
        start: 44032, // 가 (U+AC00)
        end: 55203,   // 힣 (U+D7A3)
      },
      khmer: [
        { start: 6016, end: 6067 }, // U+1780–U+17B3 (ក to ឳ)
      ],
      ol_chiki: [
        { start: 7258, end: 7293 }, // U+1C5A–U+1C7D (ᱚ to ᱽ)
      ],
      hiragana: {
        start: 12353, // ぁ (U+3041)
        end: 12438,   // ゖ (U+3096)
      },
      katakana: {
        start: 12449, // ァ (U+30A1)
        end: 12538,   // ヺ (U+30FA)
      },
    };
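With multi-range entries like these, the generator has to draw from several ranges at once. One straightforward approach, sketched below under the assumption that every code point across all of a charset's ranges should be equally likely, weights each range by its size. pickCodePoint and randomGibberishWord are hypothetical names, not functions from the PR.

```typescript
type CodePointRange = { start: number; end: number };

// Pick one code point uniformly across all ranges combined, so a
// range is chosen in proportion to how many code points it holds.
function pickCodePoint(ranges: CodePointRange[]): number {
  const total = ranges.reduce((sum, r) => sum + (r.end - r.start + 1), 0);
  if (total <= 0) throw new Error("charset has no code points");
  let offset = Math.floor(Math.random() * total);
  for (const r of ranges) {
    const size = r.end - r.start + 1;
    if (offset < size) return r.start + offset;
    offset -= size; // skip past this range
  }
  throw new Error("unreachable: offset exceeded total range size");
}

// Build a word of the given length from the combined ranges
function randomGibberishWord(ranges: CodePointRange[], length: number): string {
  let word = "";
  for (let i = 0; i < length; i++) {
    word += String.fromCodePoint(pickCodePoint(ranges));
  }
  return word;
}

const arabicRanges: CodePointRange[] = [
  { start: 1569, end: 1594 }, // ء to غ
  { start: 1601, end: 1608 }, // ف to و
  { start: 1610, end: 1610 }, // ي
];

// Prints a random 5-letter string of standard Arabic letters
console.log(randomGibberishWord(arabicRanges, 5));
```

Choosing a range with equal probability and then a character inside it would be simpler, but with the Arabic entry it would make ي alone as frequent as all 26 letters of the first range combined.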
 

@m4dd0c (Contributor, Author) commented Apr 28, 2025

Well, I like the idea, and I'll make the proposed changes, but I'm not sure what @Miodec's take on this is.

I will update the getGibberish logic to generate combined letters (supporting matras) for certain scripts, e.g., Devanagari.
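One possible reading of "supporting matras", offered as an assumption rather than the PR's design: build each syllable from a consonant optionally followed by one dependent vowel sign, so a sign never starts a word or follows another sign. The sketch splits the devanagari entry above into consonants क (U+0915) to ह (U+0939) and vowel signs ा (U+093E) to ै (U+0948); independent vowels अ to औ are left out for brevity, and the 0.6 sign probability is arbitrary.

```typescript
type CodePointRange = { start: number; end: number };

// Assumed split of the devanagari ranges
const CONSONANTS: CodePointRange = { start: 0x0915, end: 0x0939 }; // क to ह
const VOWEL_SIGNS: CodePointRange = { start: 0x093e, end: 0x0948 }; // ा to ै

// Uniformly pick one character from an inclusive range
function randomFrom(range: CodePointRange): string {
  const cp = range.start + Math.floor(Math.random() * (range.end - range.start + 1));
  return String.fromCodePoint(cp);
}

// One syllable: a consonant, optionally followed by a dependent vowel
// sign (matra). Without a sign the consonant keeps its inherent "a".
function randomSyllable(signProbability = 0.6): string {
  let syllable = randomFrom(CONSONANTS);
  if (Math.random() < signProbability) {
    syllable += randomFrom(VOWEL_SIGNS);
  }
  return syllable;
}

// Concatenate the requested number of syllables into one word
function randomDevanagariWord(syllables: number): string {
  let word = "";
  for (let i = 0; i < syllables; i++) {
    word += randomSyllable();
  }
  return word;
}

// Prints three random consonant(+matra) syllables
console.log(randomDevanagariWord(3));
```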

@m4dd0c marked this pull request as draft May 5, 2025 07:43
@github-actions bot removed the waiting for review label May 5, 2025
m4dd0c and others added 2 commits May 8, 2025 23:43
Refactor charsetRanges to support multiple ranges per charset. Updated
getGibberish to utilize the new structure for generating random gibberish
strings. This improves flexibility and allows handling of complex charsets
with multiple ranges.

BREAKING CHANGE: charsetRanges structure has been modified to an array
of ranges instead of a single range object. Update any dependent code
accordingly.
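Since this commit changes charsetRanges from a single range object to an array of ranges, any code or table entry still using the single-object form (like latin or hebrew in the proposal above) would need updating. A small normalizing helper, sketched here as a hypothetical migration aid rather than anything in the PR, lets both forms be accepted during the transition.

```typescript
type CodePointRange = { start: number; end: number };

// Either the old single-range form or the new array form
type CharsetDefinition = CodePointRange | CodePointRange[];

// Always return an array of ranges, rejecting malformed ones
function normalizeCharset(def: CharsetDefinition): CodePointRange[] {
  const ranges: CodePointRange[] = Array.isArray(def) ? def : [def];
  for (const { start, end } of ranges) {
    if (!Number.isInteger(start) || !Number.isInteger(end) || start > end) {
      throw new Error(`Invalid range ${start}-${end}`);
    }
  }
  return ranges;
}

const latin = normalizeCharset({ start: 97, end: 122 });
const arabic = normalizeCharset([
  { start: 1569, end: 1594 },
  { start: 1601, end: 1608 },
  { start: 1610, end: 1610 },
]);

console.log(latin.length, arabic.length); // 1 3
```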
@m4dd0c marked this pull request as ready for review May 8, 2025 18:27
@github-actions bot added the waiting for review label May 8, 2025
github-actions bot (Contributor) commented May 8, 2025

Continuous integration check(s) failed. Please review the failing check's logs and make the necessary changes.

@github-actions bot added the waiting for update label and removed the waiting for review label May 8, 2025
@github-actions bot removed the waiting for update label May 8, 2025
@m4dd0c marked this pull request as draft May 8, 2025 18:43
Labels: assets (Languages, themes, layouts, etc.), frontend (User interface or web stuff), packages (Changes in local packages)

Development: successfully merging this pull request may close these issues: Add support for different character set to funboxes

5 participants