
feat/fix: update gemini-translations#17836

Open
wackerow wants to merge 25 commits into dev from gemini-v3

@wackerow
Member

Description

  • Code comment translation bug fixes: fence language labels that carry metadata (e.g., sh copy) now map to their comment-syntax family instead of falling through to the default JS family. Translated code comments now replace the English originals instead of being prepended above them.
  • Gemini API logging: timestamped REQUEST/RESPONSE logging with model, duration, and token counts. Per-language timing and formatted token usage summary table at pipeline end. Verbose prompt logging behind VERBOSE flag.

Bug Fixes

sh copy fence label misidentified as JS (code-block-extractor.ts)

getCommentSyntax("sh copy") fell through to the default "js" family because the full string didn't match "sh". This caused // in URLs like https://github.com/... inside shell code blocks to be treated as line comments and corrupted during translation.

Fix: Strip metadata after the language name before matching: language.split(/\s+/)[0]
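A minimal sketch of that normalization follows. The `Map` contents and every family name other than "shell" and "js" are illustrative assumptions. They are not the actual lookup table in code-block-extractor.ts.

```typescript
// Hypothetical comment-syntax families; the real table may differ.
type CommentFamily = "js" | "shell" | "python" | "sql"

const FAMILY_BY_LANGUAGE = new Map<string, CommentFamily>([
  ["sh", "shell"],
  ["bash", "shell"],
  ["shell", "shell"],
  ["zsh", "shell"],
  ["py", "python"],
  ["python", "python"],
  ["sql", "sql"],
])

function getCommentSyntax(language: string): CommentFamily {
  // Fence labels can carry metadata after the language name ("sh copy"),
  // so only the first whitespace-separated token is looked up.
  const name = language.trim().split(/\s+/)[0].toLowerCase()
  // Anything unrecognized falls back to the JS family (// and /* */).
  return FAMILY_BY_LANGUAGE.get(name) ?? "js"
}
```

With the metadata stripped, "sh copy" resolves to the shell family. A `//` inside `https://...` is then no longer a comment candidate.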

Translated comments appended instead of replacing English (gemini-translate.ts)

restoreComments() was called with the original code block content (English comments intact) instead of the stripped version (comments removed). This produced duplicate comment blocks: the translated Indonesian comments were prepended above the original English ones.

Fix: Pass strippedCode (from extractComments()) to restoreComments() instead of block.content.
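The sketch below shows why the argument matters. It uses a simplified, hypothetical pair of `extractComments()`/`restoreComments()` helpers that handle only whole-line `//` comments. The real functions in gemini-translate.ts also cover other comment families and multi-line blocks, and their signatures may differ.

```typescript
// Hypothetical, simplified helpers: only whole-line "//" comments are handled.
interface ExtractedComment {
  before: number // index in strippedCode of the line the comment preceded
  indent: string
  text: string
}

function extractComments(code: string): { strippedCode: string; comments: ExtractedComment[] } {
  const comments: ExtractedComment[] = []
  const kept: string[] = []
  for (const line of code.split("\n")) {
    const match = /^(\s*)\/\/ ?(.*)$/.exec(line)
    if (match) {
      // Remember where the comment sat, then drop it from the code.
      comments.push({ before: kept.length, indent: match[1], text: match[2] })
    } else {
      kept.push(line)
    }
  }
  return { strippedCode: kept.join("\n"), comments }
}

function restoreComments(code: string, comments: ExtractedComment[]): string {
  const lines = code.split("\n")
  // Insert back-to-front so earlier insertion indices remain valid.
  for (const c of comments.slice().reverse()) {
    lines.splice(c.before, 0, `${c.indent}// ${c.text}`)
  }
  return lines.join("\n")
}
```

Restoring translated comments into `strippedCode` yields only the translated lines. Passing the original content instead inserts each translation while the English comment stays in place, which is the duplication described above.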

Test plan

  • Verify getCommentSyntax("sh copy") returns "shell" (new test)
  • Verify multi-line comment restore into stripped code has no English duplication (new test)
  • Verify extract-translate-restore round trip produces clean NatSpec (new test)
  • Run the code-block-extractor unit tests: npx playwright test --project=unit tests/unit/sanitizer/code-block-extractor.spec.ts
  • Run translation on a test file with sh copy fence to confirm no URL corruption

🤖 Generated with Claude Code

myelinated-wackerow and others added 2 commits March 23, 2026 21:37
Add timestamped REQUEST/RESPONSE logging to Gemini
API calls with model, duration, and token counts.
Add verbose prompt logging behind VERBOSE flag.
Add per-language timing and formatted token usage
summary table at end of pipeline run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
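The sketch below shows one way such logging and a per-language summary could be structured. `loggedCall`, `formatUsageTable`, and the log line format are hypothetical. The `promptTokenCount`/`candidatesTokenCount` fields mirror the `usageMetadata` object on Gemini responses, but the interface here is a local stand-in rather than the SDK type.

```typescript
// Local stand-in for the token fields on a Gemini response's usageMetadata.
interface UsageMetadata {
  promptTokenCount?: number
  candidatesTokenCount?: number
}

interface LanguageUsage {
  requests: number
  inputTokens: number
  outputTokens: number
  ms: number // summed request durations (overlaps when requests run concurrently)
}

const usageByLanguage = new Map<string, LanguageUsage>()

// Hypothetical wrapper: logs timestamped REQUEST/RESPONSE lines and accumulates per-language totals.
async function loggedCall<T extends { usageMetadata?: UsageMetadata }>(
  lang: string,
  model: string,
  call: () => Promise<T>,
): Promise<T> {
  const start = Date.now()
  console.log(`[${new Date().toISOString()}] REQUEST  lang=${lang} model=${model}`)
  const response = await call()
  const ms = Date.now() - start
  const input = response.usageMetadata?.promptTokenCount ?? 0
  const output = response.usageMetadata?.candidatesTokenCount ?? 0
  console.log(
    `[${new Date().toISOString()}] RESPONSE lang=${lang} model=${model} ${ms}ms in=${input} out=${output}`,
  )
  const totals = usageByLanguage.get(lang) ?? { requests: 0, inputTokens: 0, outputTokens: 0, ms: 0 }
  totals.requests += 1
  totals.inputTokens += input
  totals.outputTokens += output
  totals.ms += ms
  usageByLanguage.set(lang, totals)
  return response
}

// Renders the accumulated totals as a fixed-width table for the end-of-run summary.
function formatUsageTable(): string {
  const header = ["Lang", "Requests", "Input", "Output", "Time"]
  const rows = Array.from(usageByLanguage.entries()).map(([lang, u]) => [
    lang,
    String(u.requests),
    String(u.inputTokens),
    String(u.outputTokens),
    `${(u.ms / 1000).toFixed(1)}s`,
  ])
  const widths = header.map((h, i) => Math.max(h.length, ...rows.map((r) => r[i].length)))
  const fmt = (cells: string[]) => cells.map((c, i) => c.padEnd(widths[i])).join(" | ")
  return [fmt(header), widths.map((w) => "-".repeat(w)).join("-|-"), ...rows.map(fmt)].join("\n")
}
```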
Strip metadata from fence language tag before syntax lookup so "sh copy" maps to shell, not js (avoids treating // in URLs as comments). Use strippedCode instead of original block content when restoring translated comments to prevent duplication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@netlify

netlify bot commented Mar 24, 2026

Deploy Preview for ethereumorg ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | e686b82 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/ethereumorg/deploys/69c897b181bdb00008b8eac0 |
| 😎 Deploy Preview | https://deploy-preview-17836.ethereum.it |
Lighthouse
7 paths audited
Performance: 57 (🟢 up 2 from production)
Accessibility: 94 (no change from production)
Best Practices: 100 (no change from production)
SEO: 98 (🔴 down 1 from production)
PWA: 59 (no change from production)
View the detailed breakdown and full score reports


@github-actions github-actions bot added the tooling 🔧 Changes related to tooling of the project label Mar 24, 2026
@wackerow wackerow changed the title from "Gemini v3" to "feat: update gemini-translations" Mar 24, 2026
@wackerow wackerow changed the title from "feat: update gemini-translations" to "feat/fix: update gemini-translations" Mar 24, 2026
myelinated-wackerow and others added 18 commits March 24, 2026 16:00
- Remove duplicate "Creating Pull Request" banner
- Wrap verbose prompt output in collapsible groups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Support comma-separated exclude paths to skip
specific files or directories from translation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
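One way the comma-separated list could be parsed and matched is sketched below. The helper names and matching rules are hypothetical, and the workflow's real option name is not shown in this PR. A file is excluded if it equals an entry or sits under an excluded directory.

```typescript
// Hypothetical parser: trims entries, drops a leading "./" or "/" and any
// trailing slashes, and ignores empty items from stray commas.
function parseExcludePaths(raw: string | undefined): string[] {
  if (!raw) return []
  return raw
    .split(",")
    .map((p) => p.trim().replace(/^\.?\//, "").replace(/\/+$/, ""))
    .filter((p) => p.length > 0)
}

// Exact match, or a path nested under an excluded directory.
function isExcluded(file: string, excludes: string[]): boolean {
  const normalized = file.replace(/^\.?\//, "")
  return excludes.some((ex) => normalized === ex || normalized.startsWith(`${ex}/`))
}
```

Matching on `${ex}/` rather than a bare prefix keeps a sibling such as `docs-extra/` from being excluded by `docs`.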
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Relative paths passed to runSanitizer caused English
source lookups to fail silently in GitHub Actions,
skipping all English-comparison fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
JSX attribute translation runs after the sanitizer
and can reintroduce issues. Add a second sanitizer
pass after Phase 4 to catch these.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Replace per-language sequential processing with a
single shared Gemini concurrency pool. Languages
dispatch files simultaneously; commits serialized
via SharedCommitter then squashed one-per-language.
Bump default concurrency from 6 to 16. Add parallel
JSX attribute translation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
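The shared pool can be pictured as a single limiter that every language's file tasks pass through. That way the cap (16 by default after this commit) applies to the whole run rather than to each language separately. The sketch below assumes a hand-rolled limiter similar in spirit to p-limit. `createPool` is a hypothetical name, and the SharedCommitter/squash step is not shown.

```typescript
// Hypothetical shared limiter: every language submits its file tasks here,
// so at most `concurrency` tasks are in flight across the whole run.
function createPool(concurrency: number) {
  let active = 0
  const queue: Array<() => void> = []

  const next = (): void => {
    if (active >= concurrency) return
    const start = queue.shift()
    if (!start) return
    active++
    start()
  }

  return function run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      queue.push(() => {
        // Promise.resolve().then(task) also captures synchronous throws.
        Promise.resolve()
          .then(task)
          .then(resolve, reject)
          .then(() => {
            active--
            next()
          })
      })
      next()
    })
  }
}
```

In this sketch, a language with few remaining files does not hold slots idle. Any free slot goes to the next queued task, whichever language it belongs to.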
Pass English content map (fetched from BASE_BRANCH
via GitHub API) to the sanitizer instead of reading
from disk. Ensures English comparison matches the
same branch used for translation, not whatever the
CI runner checked out. Disk fallback preserved for
local/CLI usage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Restore closing code fence indentation lost during
code block extraction/restoration. Explicitly instruct
Gemini to translate frontmatter values and transliterate
author names for non-Latin scripts. Add validation to
reject untranslated frontmatter. Collapse blank lines
left by multi-line comment restoration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
The prompt told Gemini to keep `lang` unchanged, preserving
`lang: en` from the English source. Now explicitly instructs
it to set the lang field to the target language code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Add fixFrontmatterLang() as a deterministic backup that forces
the frontmatter `lang` field to match the locale derived from
the file path (public/content/translations/LANG_CODE/**/*.md).

- 10 unit tests covering edge cases
- Exported via _testOnly for testing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
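The sketch below shows what such a deterministic fix-up might look like. The signature and regexes are assumptions; only the path convention (public/content/translations/LANG_CODE/...) comes from the commit message.

```typescript
// Matches the locale segment in public/content/translations/LANG_CODE/...
const TRANSLATION_PATH = /public\/content\/translations\/([^\/]+)\//

function fixFrontmatterLang(filePath: string, content: string): string {
  const locale = TRANSLATION_PATH.exec(filePath.replace(/\\/g, "/"))?.[1]
  if (!locale) return content // not a translated content file

  // Rewrite only the leading YAML frontmatter block; callback replacements
  // avoid "$" sequences in the content being treated as substitution patterns.
  return content.replace(
    /^(---\r?\n)([\s\S]*?)(\r?\n---)/,
    (_match: string, open: string, body: string, close: string) => {
      const fixed = /^lang:.*$/m.test(body)
        ? body.replace(/^lang:.*$/m, () => `lang: ${locale}`)
        : `${body}\nlang: ${locale}`
      return open + fixed + close
    },
  )
}
```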
Gemini was dropping CODE_BLOCK placeholders and hallucinating
replacement code from training data, producing wrong language
tags and modified code content.

- Prompt: tell Gemini placeholders are sacrosanct
- Prompt: fallback rules if a real fence slips through
- Validation: reject output with missing placeholders
- Validation: reject output with hallucinated code fences

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
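The sketch below illustrates the two validation checks. The `{{CODE_BLOCK_N}}` token format is an assumption, since the commit only says "CODE_BLOCK placeholders". The idea is that every placeholder in the pre-translation text must survive. Real code blocks were swapped out before translation, so any additional fence in the output was produced by the model.

```typescript
// Assumed placeholder token; the pipeline's real format may differ.
const PLACEHOLDER = /\{\{CODE_BLOCK_\d+\}\}/g
// A fence opener or closer: three or more backticks or tildes at line start.
const FENCE_LINE = /^\s*(`{3,}|~{3,})/gm

interface ValidationResult {
  ok: boolean
  errors: string[]
}

function validateCodeBlockPlaceholders(source: string, translated: string): ValidationResult {
  const errors: string[] = []
  const found = new Set<string>(translated.match(PLACEHOLDER) ?? [])
  for (const p of source.match(PLACEHOLDER) ?? []) {
    if (!found.has(p)) errors.push(`missing placeholder ${p}`)
  }
  // Any fence beyond those already in the source was invented by the model.
  const fences = (s: string) => (s.match(FENCE_LINE) ?? []).length
  if (fences(translated) > fences(source)) errors.push("hallucinated code fence")
  return { ok: errors.length === 0, errors }
}
```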
warnCodeFenceContentDrift was flagging every code block with
translated comments as "differs from English" -- noise that
obscured real code corruption. Now strips comments (// /* */
# and docstrings) before comparing, so only functional code
differences trigger the warning.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
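The sketch below shows a rough comment-insensitive comparison. This `stripComments` is a hypothetical regex heuristic, not the sanitizer's implementation. It does not parse string literals. It treats `//` or `#` as a comment only at line start or after whitespace, which keeps `https://` URLs intact. Both sides are stripped the same way, so over-stripping tends to cancel out in the comparison.

```typescript
// Heuristic comment stripper for drift detection only. It does not parse
// string literals; "//" or "#" counts as a comment only at line start or
// after whitespace, which leaves "https://..." URLs intact.
function stripComments(code: string): string {
  return code
    .replace(/\/\*[\s\S]*?\*\//g, "") // /* block */ comments
    .replace(/("""|''')[\s\S]*?\1/g, "") // Python docstrings
    .replace(/(^|\s)(\/\/|#).*$/gm, "$1") // // and # line comments
    .split("\n")
    .map((line) => line.replace(/\s+$/, ""))
    .filter((line) => line.length > 0)
    .join("\n")
}

// True only when the non-comment code differs, so translated comments alone don't warn.
function codeDiffersFromEnglish(english: string, translated: string): boolean {
  return stripComments(english) !== stripComments(translated)
}
```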
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
When files fail to translate, post the list as a comment
on the newly created PR for easy follow-up tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Adds batching for large JSON translation files (~100 keys
per Gemini request) and pre-translation HTML placeholder
extraction/restoration for values with embedded HTML tags.

This targets the failures in 8+ languages on glossary.json (406
keys, 595 HTML tags) and learn-quizzes.json (696 keys).

Also updates the translation roadmap with the agreed-upon
priority plan for v3 fixes and v4 infrastructure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
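The sketch below covers both techniques and assumes a flat key-to-string JSON file. `batchEntries`, `protectHtml`, `restoreHtml`, and the `{{HTML_N}}` token are hypothetical. The default batch size follows the ~100 keys mentioned above. The tag regex is a simple heuristic rather than a full HTML parser.

```typescript
// Split a flat key-to-string JSON object into batches of at most `size` keys.
function batchEntries(obj: Record<string, string>, size = 100): Array<Record<string, string>> {
  const keys = Object.keys(obj)
  const batches: Array<Record<string, string>> = []
  for (let i = 0; i < keys.length; i += size) {
    const batch: Record<string, string> = {}
    for (const key of keys.slice(i, i + size)) batch[key] = obj[key]
    batches.push(batch)
  }
  return batches
}

// Swap HTML tags for numbered tokens so the model only sees translatable text.
function protectHtml(value: string): { text: string; tags: string[] } {
  const tags: string[] = []
  const text = value.replace(/<\/?[a-zA-Z][^>]*>/g, (tag) => {
    tags.push(tag)
    return `{{HTML_${tags.length - 1}}}`
  })
  return { text, tags }
}

// Put the original tags back; unknown tokens are left as-is so validation can catch them.
function restoreHtml(text: string, tags: string[]): string {
  return text.replace(/\{\{HTML_(\d+)\}\}/g, (token: string, index: string) => tags[Number(index)] ?? token)
}
```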
Adds BLOCK_NONE safety settings for all harm categories to
prevent Gemini from silently returning empty responses for
educational blockchain content (mining, attacks, etc.).

Inspects response candidates, finishReason, and safetyRatings
before accessing response.text, logging detailed diagnostics
when non-STOP finish reasons are detected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
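The sketch below inspects a response before trusting its text. The interfaces are a local, trimmed-down mirror of the response shape (candidates, finishReason, safetyRatings, content parts), so the example runs without the SDK. `extractTextOrNull` is a hypothetical name. The safety settings themselves belong in the request config; a later commit in this PR switches them to the SDK's `HarmCategory`/`HarmBlockThreshold` enums.

```typescript
// Trimmed-down local mirror of the response fields inspected here; the real
// @google/genai response type has more fields.
interface SafetyRating {
  category?: string
  probability?: string
  blocked?: boolean
}

interface Candidate {
  finishReason?: string
  safetyRatings?: SafetyRating[]
  content?: { parts?: Array<{ text?: string }> }
}

interface ResponseLike {
  candidates?: Candidate[]
}

// Returns the generated text, or null after logging why there is none,
// instead of silently treating an empty response as a translation.
function extractTextOrNull(response: ResponseLike, file: string): string | null {
  const candidate = response.candidates?.[0]
  if (!candidate) {
    console.warn(`${file}: no candidates returned`)
    return null
  }
  const reason = candidate.finishReason ?? "UNKNOWN"
  if (reason !== "STOP") {
    const ratings = (candidate.safetyRatings ?? [])
      .map((r) => `${r.category}=${r.probability}${r.blocked ? " (blocked)" : ""}`)
      .join(", ")
    console.warn(`${file}: finishReason=${reason}${ratings ? ` safetyRatings: ${ratings}` : ""}`)
    return null
  }
  const text = (candidate.content?.parts ?? []).map((p) => p.text ?? "").join("")
  if (text.trim() === "") {
    console.warn(`${file}: finishReason=STOP but the text is empty`)
    return null
  }
  return text
}
```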
@github-actions github-actions bot added the documentation 📖 Change or add documentation label Mar 28, 2026
myelinated-wackerow and others added 3 commits March 28, 2026 04:35
Use HarmCategory and HarmBlockThreshold enums from @google/genai
instead of plain strings. Fixes TS2322 type error in CI where
the SDK types are available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
The sanitizer was pushing absolute filesystem paths into
changedFiles, causing GitHub tree API to reject them with
"tree.path cannot start with a slash". Uses path.relative()
to convert to repo-relative paths, matching the pattern
already used for logging in the same file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
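The sketch below shows a minimal version of the conversion using Node's `path` module. `toTreePath` is a hypothetical helper. It uses `path.posix` so separators are always `/`, which is what the tree API expects. It resolves relative inputs against the repo root first, which was the failure mode in the earlier relative-path commit. It also rejects files outside the repo.

```typescript
import * as path from "node:path"

// Hypothetical helper: converts any file path to the repo-relative,
// slash-separated form expected by the GitHub tree API (no leading "/").
function toTreePath(repoRoot: string, filePath: string): string {
  // Resolve first so relative inputs are anchored to the repo root, not the cwd.
  const absolute = path.posix.resolve(repoRoot, filePath)
  const relative = path.posix.relative(repoRoot, absolute)
  if (relative === ".." || relative.startsWith("../")) {
    throw new Error(`${filePath} is outside ${repoRoot}`)
  }
  return relative
}
```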
Technical titles like "Ethash", "JSON-RPC API", "PeerDAS"
are legitimately kept in English. The previous check failed
if either title or description matched English. Now only
fails when BOTH are identical, catching genuinely untranslated
output while allowing technical/proper-noun titles.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
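The relaxed rule can be written as a hypothetical predicate, shown below. The field names match the frontmatter keys mentioned above. Treating a missing English field as "not matching" is an assumption of this sketch.

```typescript
interface Frontmatter {
  title?: string
  description?: string
}

// Flags a file as untranslated only when BOTH title and description still
// match the English source, so a technical title alone ("Ethash") passes.
function isFrontmatterUntranslated(english: Frontmatter, translated: Frontmatter): boolean {
  // A field counts as unchanged only if the English side has a value.
  const same = (a?: string, b?: string) => a !== undefined && a.trim() === (b ?? "").trim()
  return same(english.title, translated.title) && same(english.description, translated.description)
}
```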
Same slash bug as the sanitizer -- absolute filesystem paths
passed to GitHub tree API. Applies path.relative() at the
commit point in jsx-translation.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>

Labels

documentation 📖 Change or add documentation
tooling 🔧 Changes related to tooling of the project

2 participants