Description
Steps To Reproduce
- Navigate to a Chinese-language website
- Enable page translation with default settings (auto-detect source language)
- Observe that only portions of the page are translated while many Chinese text sections remain untranslated
Expected Result
The entire page should be translated from Chinese to the target language (e.g., English), with all Chinese text converted to the target language.
Actual Result
Only part of the page is translated; large portions of Chinese text remain untranslated. The console output shows that Google's new API (translate-pa.googleapis.com/v1/translateHtml) returns detection results in response[1], where most Chinese text chunks are incorrectly identified as "en" (English) instead of "zh-CN" (Chinese Simplified). Text detected as "en" is not translated, since the API assumes it is already in English.
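The mismatch described above can be checked mechanically. The following is a minimal sketch, assuming response[1] is an array of language codes aligned one-to-one with the chunks that were sent. That shape is inferred from the console output, not from any documented API contract, and `findMisdetectedChunks` is a hypothetical helper name, not part of the extension.

```javascript
// Hypothetical helper: lists chunks that contain Han characters but were
// reported as English. Assumes detectedLangs[i] describes sourceChunks[i].
function findMisdetectedChunks(sourceChunks, detectedLangs) {
  const hanRegex = /[\u4e00-\u9fff]/; // CJK Unified Ideographs (basic block)
  const suspects = [];
  sourceChunks.forEach((chunk, i) => {
    if (detectedLangs[i] === "en" && hanRegex.test(chunk)) {
      suspects.push({ index: i, text: chunk, detected: detectedLangs[i] });
    }
  });
  return suspects;
}

// Made-up data shaped like the assumed response[1].
const exampleChunks = ["你好，世界", "Hello", "隐私政策"];
const exampleDetected = ["en", "en", "zh-CN"];
console.log(findMisdetectedChunks(exampleChunks, exampleDetected));
```

Logging the result next to the raw response makes it easy to see how many chunks per request are affected.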
Screenshots or Videos
Additional Context
Google's new translateHtml API has faulty language detection when sourceLanguage: "auto" is used. The per-chunk detection in the response array incorrectly classifies Chinese characters as English.
Workaround that confirms the issue: Manually setting the page language to "Chinese Simplified" in extension settings completely resolves the problem, as this bypasses auto-detection.
Proposed fix: Override faulty auto-detection by implementing Chinese character detection before sending requests:
// In translationService.js, modify the Google service translate method:
async translate(sourceLanguage, targetLanguage, sourceArray2d, dontSaveInPersistentCache, dontSortResults = false) {
  // Language replacements
  const replacements = [{
    search: "prs",
    replace: "fa-AF"
  }];
  replacements.forEach(r => {
    if (targetLanguage === r.search) targetLanguage = r.replace;
    if (sourceLanguage === r.search) sourceLanguage = r.replace;
  });

  await GoogleHelper_v2.findAuth();
  if (!GoogleHelper_v2.translateAuth) return;

  // Override detection for Chinese text when sourceLanguage is "auto"
  if (sourceLanguage === "auto") {
    // Sample the first few chunks to check whether the text is Chinese
    const sampleTexts = sourceArray2d.slice(0, 3).map(arr => arr.join(" ")).join(" ");
    if (this.isChineseText(sampleTexts)) {
      console.log("Detected Chinese text, overriding sourceLanguage to zh-CN");
      sourceLanguage = "zh-CN";
    }
  }

  return await super.translate(sourceLanguage, targetLanguage, sourceArray2d, dontSaveInPersistentCache, dontSortResults);
}

// Add Chinese detection helper method
isChineseText(text) {
  // Match CJK Unified Ideographs (basic block, U+4E00 to U+9FFF)
  const chineseRegex = /[\u4e00-\u9fff]/g;
  const chineseChars = (text.match(chineseRegex) || []).length;
  const totalChars = text.replace(/\s/g, '').length;
  // If more than 30% of non-space characters are Chinese, consider it Chinese text
  return totalChars > 0 && (chineseChars / totalChars) > 0.3;
}
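To show how the 30% threshold behaves on mixed content, here is a standalone sketch of the same ratio calculation. `chineseRatio` is a hypothetical name used only for illustration, and the sample strings are made up.

```javascript
// Standalone version of the ratio used by the proposed isChineseText helper.
// Returns the share of non-whitespace characters in the U+4E00..U+9FFF block.
function chineseRatio(text) {
  const hanCount = (text.match(/[\u4e00-\u9fff]/g) || []).length;
  const total = text.replace(/\s/g, "").length;
  return total === 0 ? 0 : hanCount / total;
}

for (const sample of ["你好 world", "隐私政策 Privacy", "12345"]) {
  const ratio = chineseRatio(sample);
  console.log(sample, ratio.toFixed(2), ratio > 0.3);
}
```

"你好 world" has 2 Han characters out of 7 non-space characters, so it falls just under the threshold, while "隐私政策 Privacy" (4 of 11) passes. Pages that mix short Chinese strings with Latin text, digits, or punctuation can therefore slip through, and the proposed fix samples only the first three entries of sourceArray2d.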
Note: This issue may affect other languages with non-Latin scripts. Consider using the page's detected language (originalTabLanguage) instead of per-chunk auto-detection.
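One way the page-language idea could look is sketched below. This is an assumption-heavy illustration: `resolveSourceLanguage` is a hypothetical helper, and it assumes originalTabLanguage holds a language code such as "zh-CN", or is empty or "und" when the page language is unknown. The script check is only a fallback, and it leaves "auto" untouched when kana is present, because Japanese text also contains Han characters.

```javascript
// Hypothetical helper: pick the source language to send with a request.
// Assumption: originalTabLanguage is a code like "zh-CN", or ""/"und" if unknown.
function resolveSourceLanguage(sourceLanguage, originalTabLanguage, sampleText) {
  // An explicit user choice always wins.
  if (sourceLanguage !== "auto") return sourceLanguage;

  // Prefer the page-level language over per-chunk auto-detection.
  if (originalTabLanguage && originalTabLanguage !== "und") return originalTabLanguage;

  // Fallback: script-based check using Unicode property escapes.
  const chars = [...sampleText.replace(/\s/g, "")]; // code points, not UTF-16 units
  if (chars.length === 0) return "auto";
  const hanCount = chars.filter(c => /\p{Script=Han}/u.test(c)).length;
  const hasKana = chars.some(c => /[\p{Script=Hiragana}\p{Script=Katakana}]/u.test(c));

  // Han-dominant text without kana is treated as Chinese; anything else stays "auto".
  if (!hasKana && hanCount / chars.length > 0.3) return "zh-CN";
  return "auto";
}

console.log(resolveSourceLanguage("auto", "und", "隐私政策"));
```

The same pattern could be extended to other scripts, but a script alone does not identify a language (Cyrillic or Arabic script covers several), so the page language remains the more reliable signal where it is available.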
Operating System
Linux
Operating System Version
Linux Mint, CachyOS, Windows 10
Web Browser
Brave
Browser Version
No response
Extension Version
All versions that use the new Google API.
What are you doing?
- I am reporting a bug