Skip to content

Migration from webL10n to i18next #4459

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 26 commits into from

Conversation

ac-mmi
Copy link
Contributor

@ac-mmi ac-mmi commented Feb 24, 2025

@walterbender I have successfully migrated the internationalization system in Music Blocks from webL10n to i18next. I have tested the implementation, and it is working as expected. The only remaining task is to update the test suites, which are currently tailored for webL10n, to support the new i18next standards.

I made the following changes:

  • In loader.js: Added i18next initialization with i18nextHttpBackend to load translations from JSON files stored in the locales folder. This change allows the application to switch languages dynamically.
  • In Multiple JavaScript Files: Replaced instances of webL10n functionality with i18next's function for translations. This change was made across several files(activity.js,block.js,blocks.js,ActionBlocks.js,BooleanBlocks.js and more...)
  • Created locales Folder: Added a new folder named locales containing converted PO files into JSON format, supporting i18next's structure for managing translations.

Copy link

❌ Some Jest tests failed. Please check the logs and fix the issues before merging.

Failed Tests:

synthutils.test.js
SaveInterface.test.js
VolumeActions.test.js
musicutils.test.js
ToneActions.test.js
PitchActions.test.js
MeterActions.test.js
rubrics.test.js
turtle-singer.test.js
DrumActions.test.js
languagebox.test.js
DictActions.test.js
themebox.test.js
turtledefs.test.js
macros.test.js

@walterbender
Copy link
Member

I am not sure why you changed the _() function to t(). It is quite an invasive and seemingly unnecessary change.
Is there a standard way to convert the PO files to their JSON representation?
It seems (at least) the es file has some encoding issues.

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Feb 24, 2025

@walterbender I switched from _() to t() because it's the standard function used by i18next. If preferred, I can alias _() to t() to minimize changes, make a wrapper function.
For the PO to JSON conversion, I used a custom Python script, ensuring comments were ignored and keys matched the original msgids. If there’s a preferred method, I’m open to adjustments.
I’ll check the encoding issues in the es file.

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Feb 24, 2025

@walterbender If you were referring to encoding issues in es.po (if I assumed that correctly), I ran a thorough check on the PO files, including es.po, and didn't find any encoding issues using a custom script that detects unescaped quotes and encoding errors. The only issues found were in other files, like ayc.po, quz.po, and te.po, where unescaped double quotes were present.

@walterbender
Copy link
Member

I think keeping the _() wrapper is a good idea.
I am fine with a simple Python script for conversion of the PO files, but it would be good to test some of the more language-specific PO file features which we deliberately avoided with WebL10n, such as handling plurals and word order.
Please do look into the encoding issue. I only noticed it with ES but it could be elsewhere as well.

@walterbender
Copy link
Member

look t your es.json file and you'll see that it is no longer utf-8. It is some weird encoding.

"Are you sure you want to clear the workspace?": "¿Estás segura de que quieres borrar el espacio de trabajo?",

Should be:

"Are you sure you want to clear the workspace?": "¿Estás segura de que quieres borrar el espacio de trabajo?"",

Copy link

❌ Some Jest tests failed. Please check the logs and fix the issues before merging.

Failed Tests:

synthutils.test.js
VolumeActions.test.js
SaveInterface.test.js
musicutils.test.js
ToneActions.test.js
PitchActions.test.js
MeterActions.test.js
rubrics.test.js
turtle-singer.test.js
DrumActions.test.js
languagebox.test.js
DictActions.test.js
themebox.test.js
turtledefs.test.js
macros.test.js

@walterbender
Copy link
Member

Not sure what you are doing to fix the encodings, but it is not correct. There should not be any strings with "Â" in them. It is not a glyph used in Spanish.

Copy link

❌ Some Jest tests failed. Please check the logs and fix the issues before merging.

Failed Tests:

synthutils.test.js
VolumeActions.test.js
SaveInterface.test.js
musicutils.test.js
ToneActions.test.js
PitchActions.test.js
MeterActions.test.js
rubrics.test.js
DrumActions.test.js
turtle-singer.test.js
languagebox.test.js
DictActions.test.js
themebox.test.js
turtledefs.test.js
macros.test.js

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Feb 24, 2025

@walterbender Could you please let me know the encoding format used in the PO files? This will help me improve the Python script for conversion and avoid similar issues in the future.

@walterbender
Copy link
Member

Everything should be utf-8

@omsuneri
Copy link
Member

@ac-mmi I saw many tests failures please try to resolve those along with the PR only.

@omsuneri
Copy link
Member

omsuneri commented Feb 24, 2025

@walterbender do we really need this change currently as I don't feel so
as this is part of one of the projects we have in GSOC let's wait till the best ideas we get as it is very early to decide the changes at this moment also @ac-mmi I feel we need a discussion before moving to i18n right now..

@omsuneri
Copy link
Member

@walterbender do we really need this change currently as I don't feel so currently
as this is part of one of the projects we have in GSOC let's wait till the best ideas we get as it is very early to decide the changes at this moment also @ac-mmi I feel we need a discussion before moving to i18n right now..

@walterbender
Copy link
Member

It needs work.

Copy link

✅ All Jest tests passed! This PR is ready to merge.

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Feb 24, 2025

@walterbender i have made changes in the _() functionality catering to i18next functions and also passing the test suits of the repo

@walterbender
Copy link
Member

ES is still broken. Now all of the non-ASCII characters are just missing.
We need to discuss a number of other details, including how to handle the Kana/Kanji situation. And language detection in general.

Copy link

✅ All Jest tests passed! This PR is ready to merge.

….js so that renderLanguageSelectIcon() works correctly
@walterbender
Copy link
Member

did you look at the quality report? it looks pretty nonsensical.

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Apr 4, 2025

@walterbender In the current report, I included:

  1. The original English sentence
  2. The back-translated English (from Catalan)
  3. A lexical similarity score and confidence value
  4. A flag to highlight low-confidence matches

@walterbender
Copy link
Member

The fields are correct. The data in the fields is not.

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Apr 4, 2025

@walterbender I'm flagging translations based on a combination of back-translation, lexical similarity with the original English, and spelling errors in the back-translation.

For example:

"Refresh your browser to change your language preference.": {
  "Catalan translation":"Actualitzeu el navegador per canviar la vostra preferència d’idioma."
  "back_translated": "Update your browser to change your language preference.",
  "lexical_score": 0.81,
  "confidence_score": 81.0,
  "flag_for_review": false
}

Here, it’s a semantic match, but the lexical similarity is a bit low due to the word "Refresh" → "Update".

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Apr 4, 2025

@walterbender this is the new report
translation_quality_report_ca.json

@walterbender
Copy link
Member

This report is starting to make sense.

Note that we may want to always flag anything that is ambiguous (and maybe it will help us improve the translator notes as well. For example, I noticed that "duck" was not flagged for review, but do we know if it is duck the noun or duck the verb?

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Apr 4, 2025

@walterbender for this "duck" situation i am going for POS tagging what do you say ?

@walterbender
Copy link
Member

I don't think it is reasonable to tag every string in the code with POS. It is reasonable to expect decent translator hints.

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Apr 5, 2025

@walterbender I created a separate file, en_translation_hints.json based on en.json, that pairs each English string with a short context hint. It includes things like:

  • Whether the string is a button label, an error message, or a tooltip
  • If it’s a full sentence, a short label, a placeholder, etc.
  • Clarifying terms like "Project undefined".

Context file: en_translation_hints.json

@walterbender
Copy link
Member

Is the separate file for the AI? The hints should be in the PO files.

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Apr 6, 2025

@walterbender but for i18next i am using .json files

@walterbender
Copy link
Member

I thought that we were going to use the PO framework for translators and i18next for the implementation. I assumed that the AI translation would show up in the PO files so that the translators can review them.

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Apr 6, 2025

@walterbender I initially converted all existing PO files into JSON(present in locales folder), and the AI-generated translations now work directly with JSON files.

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Apr 6, 2025

@walterbender in my ai translation code both .po and .json files are created

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Apr 6, 2025

@walterbender I have sent you the po file for catalan in the chat with the context included

@walterbender
Copy link
Member

it is fine that the AI uses JSON and that you put the hints into JSON as long as the source of truth remains the POs -- since there is extensive PO file support in the translator community.
And we can ditch altogether the localization.ini all together.

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Apr 6, 2025

@walterbender Also have you seen my proposal ,is there anything i am lacking or should research on ?

@walterbender
Copy link
Member

walterbender commented Apr 6, 2025 via email

@ac-mmi
Copy link
Contributor Author

ac-mmi commented Apr 6, 2025

@walterbender Thank you, I’ve submitted the proposal.

@ac-mmi
Copy link
Contributor Author

ac-mmi commented May 5, 2025

@walterbender this is the updated context file Also the word 'in' in the translation refers to the Japanese scale ?
en_context_updated.json

@walterbender
Copy link
Member

Not sure how you are generating your context document. Ideally it would come directly from translator hints as they same context is needed by human translators.

Yes, "in" is the name of a Japanese mode.

Lots of strings are simply tagged "General UI String". Not sure the utility of that.

@ac-mmi
Copy link
Contributor Author

ac-mmi commented May 5, 2025

@walterbender I just provided you with the basic outline of the context file. Yes, it does need human intervention. I’ve been generating the context using the GPT API, which is focused on giving basic context

@walterbender
Copy link
Member

What are you basing the prompt(s) on?

@ac-mmi
Copy link
Contributor Author

ac-mmi commented May 5, 2025

@walterbender I described the application then provided chunks of the JSON translation file to the GPT API. I asked it to generate context for each string based on the app’s purpose and usage. Since the translation file is large, I’m passing it in chunks to make the process manageable.

@ac-mmi ac-mmi closed this May 8, 2025
@ac-mmi ac-mmi reopened this May 9, 2025
Copy link

github-actions bot commented Jul 8, 2025

This pull request has been open for more than 60 days without any activity. It will be closed in 3 days unless the stale label is removed or commented on.

@github-actions github-actions bot added the Stale label Jul 8, 2025
Copy link

Closed pull request due to inactivity for more than 63 days.

@github-actions github-actions bot closed this Jul 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants