feat(locale): Add ku locale #3441
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@           Coverage Diff           @@
##             next    #3441   +/-   ##
=======================================
  Coverage   99.97%   99.97%
=======================================
  Files        2817     2822    +5
  Lines      217493   217580   +87
  Branches      950      952    +2
=======================================
+ Hits       217438   217525   +87
  Misses         55       55
According to Wikipedia https://en.wikipedia.org/wiki/Kurdish_language
So I wonder whether we should give this a script suffix, similar to what we did for Uzbek and Serbian (https://fakerjs.dev/guide/localization.html#available-locales), to distinguish Kurdish written in Latin characters from Kurdish written in Arabic characters?
Kurdish has multiple dialects, but the main ones are Kurmanji and Sorani. Kurmanji can be written in both Latin and Arabic scripts. I plan to add these three variations:
ku_KMR_latin → Kurmanji (Latin)
Would this be the correct approach for Faker.js? I want to confirm before proceeding. @matthewmayer
Hmm, I don't think we have any other locales yet where it's necessary to disambiguate dialects by ISO 639-3 code. I think it might be confusing to put the language code where we would otherwise put the country code, so it probably makes sense to make it part of the variant suffix. Let's also check how other open source software handles this in its localisation.
I searched for a solution but couldn’t find anything that exactly fits our needs. However, I believe we can handle this using the following approach:
ku_IQ_Arab: Sorani (Arabic)
IQ represents Iraq, where most Sorani speakers live. This approach seems like our best fit. However, I don’t like using IQ, TR, and SY because Kurdish people are divided among four countries, including Iran. I wish I could use KU instead, but unfortunately, it is not a standard code.
I agree the country codes aren't ideal. It partly depends on whether you expect to add data for, e.g., the phone module, which has country codes, or the location module, which has cities, etc. If those are all from one country, then a country code would be appropriate. If not, then we should try to find a generic solution.
Wikipedia uses ku for Kurmanji (with a toggle between ku-latn and ku-arab) and ckb for Sorani.
I think the cleanest would probably be:
I thought we should use only a country code like IQ after the ku_, but if it is fine to use kmr and ckb as you said, that would be the best way. Thank you, I will start with ku_ckb.
@matthewmayer I’ve updated the changes and pushed them. Do I need to do anything else besides adding more modules?
For ease of review, please don't add any more modules for now. We prefer to get a small PR reviewed first; once it is approved, you can follow up with additional PRs for other dialects and modules. Nothing to do for now, I will let the other maintainers take a look. Please bear with us; it can take a few days or weeks to get an initial PR approved.
Hello 🙂 Thanks to @matthewmayer for driving this PR through the review process for us 🚀 Otherwise I would already give an approval ✅
I think en_BORK is the only other example of a locale with a language and variant but no country at the moment. (Admittedly a silly example, and "BORK" should probably be lowercase, but I don't want to introduce a breaking change for a silly Easter egg locale.) But in this case, where people speaking Kurdish are spread over several countries, it seems more appropriate.
These are exactly my thoughts as well 👍
I think we don't have a really good definition of what goes in "variant" at the moment, but I think it's basically "the minimum needed to disambiguate versions of a macro-language", which might include a part 3 code from https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes (e.g. ckb) and/or a script (e.g. latin, arab) and/or another descriptor if there's no standard code (en_GB_cockney_rhyming_slang)?
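To make the naming question concrete, here is a minimal TypeScript sketch of how the codes mentioned in this thread could be split into language, optional country, and optional variant. The `LocaleCode` type and `parseLocaleCode` function are hypothetical names made up for illustration; they are not part of Faker's API, and the rule "a two-letter uppercase second segment is a country" is an assumption drawn from the examples above, not an agreed definition.

```typescript
// Hypothetical shape for a parsed locale code (not a Faker type).
interface LocaleCode {
  language: string;
  country?: string;
  variant?: string;
}

// Assumed rule: language is 2-3 lowercase letters, an optional country is
// exactly two uppercase letters, and everything after that is the variant.
function parseLocaleCode(code: string): LocaleCode {
  const [language, ...rest] = code.split('_');
  if (!/^[a-z]{2,3}$/.test(language)) {
    throw new Error(`Invalid language code in "${code}"`);
  }
  const result: LocaleCode = { language };
  if (rest.length > 0 && /^[A-Z]{2}$/.test(rest[0])) {
    result.country = rest.shift();
  }
  if (rest.length > 0) {
    // Variant may hold an ISO 639-3 code, a script, or a free descriptor.
    result.variant = rest.join('_');
  }
  return result;
}

console.log(parseLocaleCode('ku_ckb'));
console.log(parseLocaleCode('ku_kmr_latin'));
console.log(parseLocaleCode('en_BORK'));
console.log(parseLocaleCode('en_GB_cockney_rhyming_slang'));
```

Under this assumed rule, `ku_ckb` and `en_BORK` both parse as language plus variant with no country, while `en_GB_cockney_rhyming_slang` keeps `GB` as the country.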
@arentalb I'll be honest with you here. We discussed this PR during the last two weekly meetings and still haven't come to a conclusion. The case you present is quite a hard one. For starters, let's have a look at our current definition of locale names from our website: https://fakerjs.dev/guide/localization.html#locale-codes.
One could argue that the "Optionally" means the country code is entirely optional, while another could say that it is optional only when considering the base locale code (from the first paragraph). By making the country code required, we could at least somewhat restrict the maximum number of possible Faker locales. Furthermore, a lot of modules (for example internet and location) are only useful if they are paired with a specific country. Having to maintain an unbounded number of locales is sadly something we need to consider as maintainers. We fully understand why you dislike the implementation that follows from the definition. But by the current definition, this might be the best option.
Thanks for reviewing the PR, much appreciated! Regarding your suggestion: yes, I can definitely work on a generic ku locale. I believe the best candidate for a base locale would be Kurdish Sorani (Central Kurdish) in Arabic script. It's my native dialect and also the most widely understood among Kurdish speakers. Do you agree with using Sorani in Arabic script for the generic ku, or would you recommend a different approach?
First of all, thank you for your kind and considered answer. 🙏 According to Wikipedia, the largest dialect group is Kurmanji (Northern Kurdish) with 15 to 20 million speakers. Sorani (Central Kurdish) has "only" 6 to 7 million speakers.
I’ll talk to some of my Kurmanji friends and let you know what makes the most sense.
Thank you for your cooperation. I'm sorry that your first contribution is so complicated 😐
In theory, could a generic Kurdish locale just contain things like phone numbers and domain names, which would be the same in both scripts?
Sorry for the late reply. I’ve translated a set of base words to compare Sorani and Kurmanji. As shown in the file, there are noticeable differences between the two dialects. While a few words are shared, the overlap doesn’t seem strong enough to build a reliable base fallback, in my opinion. So I don’t think a generic Kurdish locale is the right approach. That leaves us with two possible options:
1- Fallback to one dialect - I’d suggest Sorani. In my experience, Kurmanji speakers usually understand Sorani, but the reverse isn’t true - Sorani speakers typically understand only about 20–30% of Kurmanji.
2- Treat them as separate locales - This might be the better option overall, especially since Kurmanji has two scripts (Latin and Arabic).
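The difference between the two options can be sketched as an ordered fallback chain. This is a simplified model in TypeScript, not Faker's own implementation: `LocaleData`, `resolveEntry`, and the `kuCkb` / `kuKmrLatin` objects are hypothetical, and the word lists are placeholder strings rather than real Kurdish data.

```typescript
// Hypothetical, simplified locale data: module name -> entry name -> values.
type LocaleData = Record<string, Record<string, string[]>>;

// Walk the chain in order and return the first non-empty entry found.
function resolveEntry(
  chain: LocaleData[],
  moduleName: string,
  entryName: string
): string[] | undefined {
  for (const locale of chain) {
    const values = locale[moduleName]?.[entryName];
    if (values !== undefined && values.length > 0) {
      return values;
    }
  }
  return undefined;
}

// Placeholder data only; these strings are not real Kurdish words.
const kuCkb: LocaleData = { lorem: { word: ['ckb-word-1', 'ckb-word-2'] } };
const kuKmrLatin: LocaleData = { lorem: {} }; // lorem words not yet translated
const en: LocaleData = { lorem: { word: ['english-word-1'] } };

// Option 1: Kurmanji falls back to Sorani before anything else.
console.log(resolveEntry([kuKmrLatin, kuCkb, en], 'lorem', 'word'));

// Option 2: the dialects are separate locales, so Kurmanji skips Sorani.
console.log(resolveEntry([kuKmrLatin, en], 'lorem', 'word'));
```

With option 1, a missing Kurmanji entry silently yields Sorani text in a different script; with option 2, it yields whatever the next locale in the chain provides.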
This PR adds Kurdish support for the lorem module in Faker.js. This is the first step towards adding full Kurdish locale support, with more modules to come in future updates.
Ran pnpm run preflight with no issues
Generated locales with pnpm run generate:locales
Formatted code (pnpm run format) and passed linting (pnpm run lint)
Verified tests pass
Let me know if any changes are needed!