[Bug] Google API incorrectly detects Chinese (and possibly other languages) #962

@Alex-Solid

Description

Steps To Reproduce

  1. Navigate to a Chinese-language website
  2. Enable page translation with default settings (auto-detect source language)
  3. Observe that only portions of the page are translated while many Chinese text sections remain untranslated

Expected Result

The entire page should be translated from Chinese into the target language (e.g., English), with no Chinese text left untranslated.

Actual Result

Only part of the page is translated; large portions of Chinese text remain untranslated. The console output shows that Google's new API (translate-pa.googleapis.com/v1/translateHtml) returns per-chunk detection results in response[1], and most Chinese chunks are identified as "en" (English) instead of "zh-CN" (Chinese Simplified). Chunks detected as "en" are not translated, because the API treats them as already being in English.
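To see exactly which chunks were misdetected, a small diagnostic can compare the per-chunk detections against the characters actually present. This is a sketch under an assumption: it treats response[1] as an array of language codes aligned one-to-one with the submitted chunks, which matches the description above but may not match the real payload exactly. The names hanRatio and findMisdetectedChunks, and the 30% threshold, are illustrative and not part of the extension.

```javascript
// Hypothetical diagnostic helper (not part of the extension).
// ASSUMPTION: response[1] is an array of detected language codes,
// one per submitted chunk, in the same order as the chunks.

// Fraction of non-whitespace characters that are Han ideographs
// (CJK Unified Ideographs U+4E00-U+9FFF plus Extension A U+3400-U+4DBF).
function hanRatio(text) {
  const nonSpace = text.replace(/\s/g, "");
  if (nonSpace.length === 0) return 0;
  const han = (nonSpace.match(/[\u3400-\u4dbf\u4e00-\u9fff]/g) || []).length;
  return han / nonSpace.length;
}

// Returns the chunks that were labeled "en" but consist mostly of Han characters.
function findMisdetectedChunks(chunks, response, threshold = 0.3) {
  const detected = Array.isArray(response[1]) ? response[1] : [];
  return chunks
    .map((text, index) => ({ index, detected: detected[index], text }))
    .filter((c) => c.detected === "en" && hanRatio(c.text) > threshold);
}

// Example with a made-up response in the assumed shape (translations omitted).
const chunks = ["欢迎访问我们的网站", "About us", "最新新闻"];
const fakeResponse = [[], ["en", "en", "zh-CN"]];
console.log(findMisdetectedChunks(chunks, fakeResponse));
// logs one entry: index 0, the Chinese chunk that was labeled "en"
```

Running it against a logged response from an affected page would show how many chunks are hit; the English chunk labeled "en" is correctly left out.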

Screenshots or Videos

Additional Context

Google's new translateHtml API has faulty language detection when sourceLanguage: "auto" is used: the per-chunk detection in the response array classifies Chinese text as English.

Workaround that confirms the issue: Manually setting the page language to "Chinese Simplified" in extension settings completely resolves the problem, as this bypasses auto-detection.

Proposed fix: override the faulty auto-detection by checking for Chinese characters before sending requests:

// In translationService.js, modify the Google service translate method:
async translate(sourceLanguage, targetLanguage, sourceArray2d, dontSaveInPersistentCache, dontSortResults = false) {
  // Remap language codes that Google expects in another form (Dari: "prs" -> "fa-AF")
  const replacements = [{
    search: "prs",
    replace: "fa-AF"
  }];
  
  replacements.forEach(r => {
    if (targetLanguage === r.search) targetLanguage = r.replace;
    if (sourceLanguage === r.search) sourceLanguage = r.replace;
  });

  await GoogleHelper_v2.findAuth();
  if (!GoogleHelper_v2.translateAuth) return;

  // Override detection for Chinese text when sourceLanguage is "auto"
  if (sourceLanguage === "auto") {
    // Join the first three rows of chunks into one sample string
    const sampleText = sourceArray2d.slice(0, 3).map(arr => arr.join(" ")).join(" ");
    if (this.isChineseText(sampleText)) {
      console.log("Detected Chinese text, overriding sourceLanguage to zh-CN");
      sourceLanguage = "zh-CN";
    }
  }

  return await super.translate(sourceLanguage, targetLanguage, sourceArray2d, dontSaveInPersistentCache, dontSortResults);
}

// Add Chinese detection helper method (in the same class as translate())
isChineseText(text) {
  // Hiragana/katakana indicate Japanese, which also uses Han ideographs; don't override
  if (/[\u3040-\u30ff]/.test(text)) return false;

  // Count CJK Unified Ideographs (U+4E00-U+9FFF) and Extension A (U+3400-U+4DBF)
  const chineseChars = (text.match(/[\u3400-\u4dbf\u4e00-\u9fff]/g) || []).length;
  const totalChars = text.replace(/\s/g, "").length;

  // If more than 30% of non-whitespace characters are Chinese, consider it Chinese text
  return totalChars > 0 && chineseChars / totalChars > 0.3;
}
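The helper above is written as a class method, so it can't be exercised outside the extension. The standalone sketch below reproduces the same decision so the 30% threshold can be checked against sample input. chooseSourceLanguage is a hypothetical name; like the proposal, it samples the first three rows of sourceArray2d and only overrides "auto". It also returns early when hiragana or katakana appear, because Japanese text mixes kana with the same Han ideographs and would otherwise be sent as zh-CN.

```javascript
// Standalone sketch of the proposed override (hypothetical function names).
const HAN = /[\u3400-\u4dbf\u4e00-\u9fff]/g; // CJK Unified Ideographs + Extension A
const KANA = /[\u3040-\u30ff]/;              // hiragana + katakana: a strong hint of Japanese

function isChineseText(text, threshold = 0.3) {
  if (KANA.test(text)) return false; // Japanese also uses Han ideographs
  const totalChars = text.replace(/\s/g, "").length;
  const hanChars = (text.match(HAN) || []).length;
  return totalChars > 0 && hanChars / totalChars > threshold;
}

function chooseSourceLanguage(sourceLanguage, sourceArray2d, sampleRows = 3) {
  if (sourceLanguage !== "auto") return sourceLanguage; // respect explicit choices
  const sampleText = sourceArray2d
    .slice(0, sampleRows)
    .map((row) => row.join(" "))
    .join(" ");
  return isChineseText(sampleText) ? "zh-CN" : sourceLanguage;
}

// Usage
console.log(chooseSourceLanguage("auto", [["首页", "新闻"], ["联系我们"]])); // returns "zh-CN"
console.log(chooseSourceLanguage("auto", [["Home", "News"], ["Contact"]])); // returns "auto"
console.log(chooseSourceLanguage("auto", [["ニュース", "日本語のページ"]])); // returns "auto"
console.log(chooseSourceLanguage("ja", [["首页"]]));                        // returns "ja"
```

Sampling only the first three rows keeps the check cheap, but a page whose first rows are English navigation text would still fall through to "auto".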

Note: This issue may affect other languages with non-Latin scripts. Consider using the page's detected language (originalTabLanguage) instead of per-chunk auto-detection.
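Reusing the page-level language could look roughly like the sketch below, which generalizes the manual workaround to any script. It assumes originalTabLanguage holds a Google-style language code (e.g. "zh-CN", "ja"), or "und"/an empty value when the page language is unknown; how the extension actually stores and passes that value is not shown in this issue, and resolveSourceLanguage is a hypothetical name.

```javascript
// Hypothetical sketch: prefer the page-level language over per-chunk "auto".
// ASSUMPTION: originalTabLanguage is a code such as "zh-CN" or "ja",
// or "und"/empty when the page language could not be determined.
function resolveSourceLanguage(requestedSource, originalTabLanguage, targetLanguage) {
  // An explicit user choice (e.g. "Chinese Simplified" in settings) always wins.
  if (requestedSource !== "auto") return requestedSource;

  const pageLang = (originalTabLanguage || "").trim();
  const known = pageLang !== "" && pageLang !== "und";

  // Keep per-chunk auto-detection when the page language is unknown, or when it
  // equals the target, so foreign-language fragments on such pages still get translated.
  if (!known || pageLang === targetLanguage) return "auto";
  return pageLang;
}

console.log(resolveSourceLanguage("auto", "zh-CN", "en"));  // zh-CN
console.log(resolveSourceLanguage("auto", "und", "en"));    // auto
console.log(resolveSourceLanguage("zh-TW", "zh-CN", "en")); // zh-TW
```

The trade-off is that a single page language is applied to every chunk, so mixed-language pages would lose per-chunk detection for their minority-language sections.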

Operating System

Linux

Operating System Version

Linux Mint, CachyOS, Windows 10

Web Browser

Brave

Browser Version

No response

Extension Version

All versions that use the new Google API.

What are you doing?

  • I am reporting a bug

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
