-
Notifications
You must be signed in to change notification settings - Fork 18
Add translation support for LaTeX section, caption, item, and table commands #890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add translation support for LaTeX section, caption, item, and table commands #890
Conversation
|
Thank you so much for contributing to Blueprints! Now that you've created your pull request, please don't go away; take a look at the bottom of this page for the automated checks that should already be running. If they pass, great! If not, please click on 'Details' and see if you can fix the problem they've identified. A maintainer should be along shortly to review your pull request and help get it added! |
📝 WalkthroughWalkthroughMoved and expanded LaTeX translation logic into a new Changes
Sequence Diagram(s)(omitted — changes are internal to translation logic and do not introduce a multi-component sequential flow suitable for a diagram) Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds translation support for additional LaTeX commands beyond the existing text formatting commands. It extends the translation functionality to handle structural elements like sections, captions, list items, and table content, enabling more comprehensive document translation.
Key changes:
- Added translation support for section commands (\section, \subsection, \subsubsection, \title)
- Added translation support for captions (\caption) and list items (\item)
- Added translation support for table cell content
- Updated decimal separator logic to protect figure environments from period-to-comma conversion
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/language/test_translate.py | Added comprehensive test cases covering figure captions, section commands, table translation, and list item translation |
| blueprints/language/translate.py | Implemented extraction and replacement methods for section, caption, item, and table commands; updated decimal separator protection logic |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #890 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 404 404
Lines 12798 12933 +135
==========================================
+ Hits 12798 12933 +135 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
blueprints/language/translate.py (1)
186-211: Consider aligning method signature with other_replace_*methods.
_replace_text_commandsimplicitly usesself.original, while the new methods (_replace_section_commands,_replace_caption_commands, etc.) accepttextas a parameter. This works in the current flow but creates an inconsistent API surface.🔎 Suggested refactor for consistency
- def _replace_text_commands(self, replacements: list) -> str: + def _replace_text_commands(self, text: str, replacements: list) -> str: # ... docstring updates ... - return re.sub(r"\\(text|txt|textbf|textit)\{([^{}]*)\}", _repl, self.original) + return re.sub(r"\\(text|txt|textbf|textit)\{([^{}]*)\}", _repl, text)Then update the call site in
_translate_latex:- replaced = self._replace_text_commands(text_translations) if texts else self.original + replaced = self._replace_text_commands(self.original, text_translations) if texts else self.original
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
blueprints/language/translate.pytests/language/test_translate.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.
Applied to files:
blueprints/language/translate.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Code Checks and Tests with Minimum Dependencies (3.14)
🔇 Additional comments (11)
blueprints/language/translate.py (7)
213-240: LGTM!The implementation correctly handles section command replacement with proper regex matching for innermost content.
242-268: LGTM!The caption replacement logic is straightforward and correctly handles the
\caption{}command.
270-299: LGTM!The item replacement correctly preserves the whitespace immediately after
\item. The regex pattern appropriately stops at the next LaTeX command.
385-396: LGTM!Good addition to protect figure blocks from decimal separator conversion. This prevents unintended modifications to file paths and other content within figures.
415-425: LGTM!Clean extraction with proper handling of nested text commands by excluding content that contains nested patterns.
427-486: LGTM!The extraction methods are well-structured and consistent. Note that
_extract_table_cellscorrectly handles missing\midruleat line 470, which should be mirrored in_replace_table_cells.
501-544: LGTM!The translation orchestration correctly extracts all text types, performs bulk translation, and applies replacements in the proper sequence. The index-based slicing to split translations back to their categories is well-implemented.
tests/language/test_translate.py (4)
112-123: LGTM!Good test coverage for figure caption translation with appropriate guarding against network-dependent translation failures.
125-151: LGTM!Comprehensive test coverage for all sectioning command types (
\title,\section,\subsection,\subsubsection) with consistent structure and proper failure guards.
153-181: Good coverage, but consider adding a test for tables without\midrule.The tests cover tables with content and empty tables, but a table without a
\midruleseparator would trigger theIndexErroridentified in_replace_table_cells. Adding such a test would catch the bug.def test_table_without_midrule(self) -> None: """Test TranslateLatex with table that has no midrule.""" example_latex = ( r"\begin{tabular}{ll} A & B \\ C & D \\ \end{tabular}" ) # This should not crash result_nl = TranslateLatex(example_latex, "nl") assert result_nl.translated is not None
183-195: LGTM!Good test coverage for both
\itemizeand\enumeratelist environments. The expected output correctly reflects the whitespace handling behavior of the replacement regex.
…ze and enumerate translations
…nline math, and add corresponding tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
blueprints/language/translate.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.
📚 Learning: 2026-01-03T17:53:19.607Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.
Applied to files:
blueprints/language/translate.py
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.
Applied to files:
blueprints/language/translate.py
🔇 Additional comments (5)
blueprints/language/translate.py (5)
213-268: LGTM - Section and caption replacement methods follow consistent patterns.The
_replace_section_commandsand_replace_caption_commandsmethods follow the same clean pattern as_replace_text_commands, using regex substitution with a closure to track replacement indices. The approach is straightforward and appropriate for non-nested LaTeX commands.
365-395: LGTM - Decimal separator now correctly handles inline math and display equations.The refactored
_check_decimal_separatorproperly extends period-to-comma replacement to inline math ($...$) and\begin{equation}...\end{equation}environments, while correctly preserving periods inside\tag{}commands via the_replace_excluding_taghelper.
414-487: LGTM - Extraction methods implement consistent filtering patterns.The five extraction methods follow clean, consistent patterns:
- Use
_extract_balanced_contentfor robust brace matching- Apply appropriate filters (e.g., line 422 excludes nested text commands, lines 455 and 482 use negative lookahead to skip non-text LaTeX commands)
- Line 471 safely handles tables without
\midruleviaparts[1] if len(parts) >= 2 else tabular_contentThe filtering logic ensures only translatable plain-text content is extracted while preserving LaTeX structure.
489-545: Translation orchestration is well-structured, but sequential replacement order introduces risk.The orchestration logic is clean and efficient:
- Bulk extraction (lines 503-507): All translatable content extracted from
self.originalin one pass- Bulk translation (line 514): Single API call translates all content
- Precise slicing (lines 518-527): Translations correctly split back using index arithmetic—logic verified, no off-by-one errors
- Sequential replacement (lines 530-542): Applies translations in order
However, the sequential replacement order creates the synchronization issue flagged in the
_replace_table_cellscomment above. If items or table cells extracted fromself.originalcontain\text{}commands, those\text{}regions will already be translated when_replace_item_commandsor_replace_table_cellsruns, causing the original extracted strings to no longer match.Recommendation: Verify with tests or usage patterns that items and table cells in BP LatexReport output don't contain
\text{},\txt{}, etc., or refactor to extract only plain-text portions (excluding already-handled command content).
270-299: Item replacement regex pattern prevents capturing inline formatting, contradicting extraction filter logic.The regex pattern
[^\\]+?in line 299 only captures item content up to the first backslash. However, the extraction filter (line 453) explicitly allows whitelisted formatting commands (\text{},\txt{},\textbf{},\textit{}), creating an inconsistency: the filter suggests items can contain inline formatting, but the regex cannot properly extract it.For example,
\item Some text \text{formula}would extract only "Some text ", losing the\text{formula}part. The test cases (lines 185, 192) only contain plain text items without inline formatting, so this gap is not currently exercised. If items need to support inline formatting, the regex pattern should be adjusted to capture the full content including whitelisted commands.
…e processed before text commands
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
blueprints/language/translate.py (1)
186-211: Inconsistent method signature causes replacement chain to break.The
_replace_text_commandsmethod differs from all other_replace_*methods by not accepting atextparameter and always operating onself.original(line 211). This breaks the replacement chain in_translate_latexwhere line 537 assigns the result toreplaced, inadvertently discarding all prior replacements (table cells, processed at line 534).🔎 Proposed fix
- def _replace_text_commands(self, replacements: list) -> str: + def _replace_text_commands(self, text: str, replacements: list) -> str: r""" Replace all \text{...}, \txt{...}, \textbf{...}, and \textit{...} in the string with the corresponding replacements. Only captures the innermost text content when commands are nested. Parameters ---------- + text : str + The LaTeX string to process. replacements : list[str] The list of replacement strings. Returns ------- str The string with \text{...}, \txt{...}, \textbf{...}, and \textit{...} replaced by the corresponding replacements. """ replacement_index = 0 def _repl(match: re.Match) -> str: nonlocal replacement_index command = match.group(1) # Captures 'text', 'txt', 'textbf', or 'textit' replacement = replacements[replacement_index] replacement_index += 1 return f"\\{command}{{{replacement}}}" # Apply all replacements in one pass (since we only extract innermost text) - return re.sub(r"\\(text|txt|textbf|textit)\{([^{}]*)\}", _repl, self.original) + return re.sub(r"\\(text|txt|textbf|textit)\{([^{}]*)\}", _repl, text)And update the call site at line 537:
- replaced = self._replace_text_commands(text_translations) + replaced = self._replace_text_commands(replaced, text_translations)
♻️ Duplicate comments (1)
blueprints/language/translate.py (1)
301-363: Potential synchronization issue remains despite reordering.While lines 353-355 safely assume
\midruleis present (per learnings), a subtle duplication persists:
_extract_table_cells(line 486) extracts full cells including\text{}commands, e.g.,"Length: \text{m}"_extract_text_commands(line 423) separately extracts"m"from the same\text{m}command- Both are translated in the bulk operation (line 514-515)
- Even after the fix to line 537, when table cells are replaced first, the cell becomes
"Lengte: \text{m}"(translated)- Then
_replace_text_commandstries to match and replace\text{m}commands, but the context may have changedThis could lead to inconsistent translations or missed replacements. Consider either:
- Option A: Modify
_extract_table_cellsto extract only plain-text portions (excluding\text{}content) to avoid duplication, OR- Option B: After replacing table cells, skip
_replace_text_commandsfor content within tablesBased on learnings, BP uses
\text{}as a semantic marker, so tables generated by BP may frequently contain such commands, making this scenario non-trivial.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
blueprints/language/translate.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-03T17:53:19.607Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.
Applied to files:
blueprints/language/translate.py
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.
Applied to files:
blueprints/language/translate.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Code Checks and Tests with Minimum Dependencies (3.14)
- GitHub Check: Code coverage report
🔇 Additional comments (10)
blueprints/language/translate.py (10)
213-240: LGTM!The implementation correctly handles section command replacements with proper parameter passing and regex pattern matching. The pattern at line 240 matches the extraction pattern in
_extract_section_commands, ensuring consistency.
242-268: LGTM!The caption replacement logic is correct and follows the same pattern as other replacement methods.
270-299: LGTM!The item replacement logic correctly handles simple list items. The regex pattern at line 299 is consistent with the extraction pattern at line 449, ensuring extraction and replacement alignment. Note that the pattern treats backslashes as command delimiters, so items with escaped characters (e.g.,
\%) would be split at those points, but this appears intentional for the BP use case.
365-395: LGTM!The expanded decimal separator logic correctly handles inline math (line 394) and equation environments (line 395) while preserving
\tag{}content via the_replace_excluding_taghelper. The negative lookbehind(?<!\\)at line 394 properly excludes escaped dollar signs.
414-424: LGTM!The extraction logic correctly identifies innermost text commands using balanced brace matching and filters out nested commands at line 422. This ensures only leaf-level content is extracted for translation.
426-434: LGTM!The section extraction logic is straightforward and correct, using balanced brace matching to extract section titles.
436-444: LGTM!Caption extraction is implemented correctly using balanced brace matching.
446-460: LGTM with caveat.The item extraction logic correctly handles simple list items and preserves trailing spaces (line 453). Note that items containing
\text{}commands face the same duplication issue as table cells (already flagged separately), where both the full item text and the\text{}content are extracted for translation.
462-487: LGTM!Table cell extraction is implemented robustly, with line 470-471 gracefully handling tables without
\midrule. The negative lookahead at line 482 correctly allows\text{}commands while filtering out other LaTeX constructs. Note that the duplication concern with\text{}content (already flagged under_replace_table_cells) applies here as well.
489-549: Well-structured orchestration with one critical bug.The translation orchestration is well-designed:
- Lines 503-507: Extract all LaTeX construct types
- Line 514: Efficient bulk translation of all texts
- Lines 517-527: Clean slicing to distribute translations
- Lines 529-530: Clear comment explaining replacement order intent
However, the critical bug at line 537 (already flagged) prevents the intended ordering from working correctly. Once that's fixed, the orchestration will correctly chain replacements. The synchronization concern with
\text{}commands in table cells/items (also flagged separately) remains a broader design issue beyond this method.After applying the fix at line 537, verify that table cells containing
\text{}commands translate correctly by testing with:from blueprints.utils.report import LatexReport latex = LatexReport().add_table(headers=["Column"], rows=[["Length: \\text{m}"]]).to_document()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
blueprints/language/translate.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.
📚 Learning: 2026-01-03T17:53:19.607Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.
Applied to files:
blueprints/language/translate.py
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.
Applied to files:
blueprints/language/translate.py
🔇 Additional comments (5)
blueprints/language/translate.py (5)
186-214: LGTM! Signature change properly implements parameterized text processing.The method now correctly accepts and operates on the
textparameter instead ofself.original, which allows it to be used in the sequential replacement pipeline at line 539.
215-302: LGTM! New replacement methods follow consistent patterns.The three new replacement methods (
_replace_section_commands,_replace_caption_commands,_replace_item_commands) correctly follow the same pattern as_replace_text_commands, using closures to track replacement indices and regex patterns appropriate for each LaTeX construct.
367-397: LGTM! Decimal separator replacement correctly handles inline math and equations.The updated
_check_decimal_separatorproperly:
- Excludes
\tag{}content from period-to-comma replacement (lines 388-393)- Handles inline math with negative lookbehind for unescaped
$(line 396)- Handles equation environments with DOTALL flag (line 397)
416-462: LGTM! Extraction methods correctly extract translatable content from various LaTeX constructs.The new extraction methods properly:
- Use
_extract_balanced_contentfor handling nested braces- Filter for innermost content to avoid double-extraction (line 424)
- Use consistent negative lookahead patterns to allow
\text{}/\txt{}/\textbf{}/\textit{}while excluding other commands (lines 457, 459)- Handle edge cases like empty content and nested structures
491-551: Well-structured translation pipeline with efficient bulk processing.The updated
_translate_latexmethod correctly:
- Extracts content from all supported LaTeX constructs (lines 504-509)
- Performs a single bulk translation operation for efficiency (line 517)
- Accurately splits translations using index arithmetic (lines 520-529)
- Applies replacements in a defined order (lines 535-548)
The comment at lines 531-532 demonstrates awareness of the replacement order importance, though the critical synchronization issue flagged separately still requires resolution.
… plain text extraction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
blueprints/language/translate.py (1)
303-368: Synchronization fix looks good, but minor inconsistency with extraction method.The synchronization issue from past reviews is correctly addressed:
- Line 489 in
_extract_table_cellsextracts only plain text portions (excluding\text{}commands)- Line 335-340 here replaces only those plain text portions
- This prevents the mismatch that occurred when table cells were replaced after text commands modified the
\text{}contentHowever, there's a minor inconsistency: the extraction method at line 476 defensively checks
len(parts) >= 2before accessing the content after\midrule, but lines 358-360 here directly accessparts[1]andparts[2]without checking. While the learning confirms LatexReport tables always include\midrule, consider adding the same defensive check for consistency.🔎 Optional: Add defensive length check for consistency
# Skip header lines (before \midrule) parts = re.split(r"(\\midrule)", tabular_content, maxsplit=1) - header_part = parts[0] - midrule = parts[1] - content_part = parts[2] - - # Process rows in content part - processed_content = re.sub(r"([^\\].*?)(?=\\\\|\\bottomrule|\\end)", _repl_row, content_part, flags=re.DOTALL) - - return tabular_start + header_part + midrule + processed_content + tabular_end + if len(parts) >= 3: + header_part = parts[0] + midrule = parts[1] + content_part = parts[2] + # Process rows in content part + processed_content = re.sub(r"([^\\].*?)(?=\\\\|\\bottomrule|\\end)", _repl_row, content_part, flags=re.DOTALL) + return tabular_start + header_part + midrule + processed_content + tabular_end + else: + # No \midrule found (shouldn't happen with LatexReport, but be defensive) + return match.group(0) # Return unchangedBased on learnings about LatexReport table structure.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
blueprints/language/translate.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.
📚 Learning: 2026-01-03T17:53:19.607Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.
Applied to files:
blueprints/language/translate.py
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.
Applied to files:
blueprints/language/translate.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Code Checks and Tests with Minimum Dependencies (3.14)
🔇 Additional comments (5)
blueprints/language/translate.py (5)
186-214: Good fix: signature now accepts text parameter.The updated signature correctly addresses the past review comment about operating on a passed text parameter instead of
self.original. The replacement logic preserves the LaTeX command structure while replacing content.
215-301: LGTM: New replacement methods follow consistent patterns.The three new replacement methods (
_replace_section_commands,_replace_caption_commands,_replace_item_commands) follow the established pattern from_replace_text_commands. The item method correctly preserves whitespace (line 297), and all methods appropriately use the nonlocal replacement index to track progress through the replacements list.
370-400: LGTM: Decimal separator handling correctly extended.The update properly extends decimal separator replacement to inline math (
$...$) and display equations (\begin{equation}...\end{equation}), while correctly preserving\tag{}content via the helper function. The negative lookbehind(?<!\\)at line 399 correctly avoids matching escaped dollar signs.
419-492: LGTM: Extraction methods correctly isolate translatable content.All five extraction methods properly:
- Use
_extract_balanced_contentfor brace matching- Apply appropriate filtering (e.g., line 427 excludes nested text commands, line 460 excludes complex LaTeX in items)
- Handle the synchronization issue: line 489 extracts only plain text portions from table cells, excluding
\text{}commands that are separately handled by_extract_text_commandsThe filtering patterns are consistent between items (line 460) and tables (line 487), both using the same negative lookahead pattern.
494-554: LGTM: Orchestration correctly implements bulk translation with proper ordering.The method correctly:
- Extracts all translatable content from multiple LaTeX constructs (lines 508-512)
- Performs a single bulk translation operation (line 520) for efficiency
- Splits translations back by count (lines 523-532)
- Critically: applies replacements in the correct order (lines 538-542) - table cells before text commands to prevent the synchronization issue where text commands within cells would be replaced first, causing mismatches
The replacement order at lines 538-551 ensures that when a table cell contains
\text{m}, the entire cell content (including\text{m}) is replaced first, preventing the later text command replacement from modifying it.
…h-new-report-functions
…h-new-report-functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI Agents
In @blueprints/utils/language/translate.py:
- Around line 207-218: The coroutine handling is inverted: calling
loop.run_until_complete() when get_running_loop() succeeds will raise
RuntimeError; instead, when asyncio.iscoroutine(translations) is true, call
asyncio.get_running_loop() and if that raises RuntimeError (no running loop) use
translations = asyncio.run(translations), otherwise (a loop is running) submit
the coroutine to that loop with asyncio.run_coroutine_threadsafe(translations,
loop).result() and assign its result to translations; update the block around
the translations variable (the coroutine handling that currently uses
loop.run_until_complete) accordingly.
🧹 Nitpick comments (1)
blueprints/utils/language/translate.py (1)
346-348: Potential edge case with item content matching.The regex pattern
\\item(\s+)([^\\]+?)(?=\\|$)uses a non-greedy match that stops at any backslash. This could prematurely truncate item content if the text contains escaped characters like\%or other LaTeX escapes within the item text.Consider whether escaped characters in item content are a realistic use case. If so, the pattern may need adjustment to allow specific escape sequences.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
blueprints/utils/language/translations.csvis excluded by!**/*.csv
📒 Files selected for processing (3)
blueprints/utils/language/__init__.pyblueprints/utils/language/translate.pytests/language/test_translate.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-03T17:53:19.607Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.
Applied to files:
blueprints/utils/language/translate.py
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.
Applied to files:
blueprints/utils/language/translate.py
🧬 Code graph analysis (1)
tests/language/test_translate.py (1)
blueprints/utils/language/translate.py (2)
LatexTranslator(21-628)text(606-617)
🔇 Additional comments (16)
tests/language/test_translate.py (6)
1-16: LGTM!Imports are clean and the test class setup is appropriate. The use of
os.path.joinwith__file__for locating the test CSV is a reliable approach.
18-66: LGTM!The CSV-based translation tests are comprehensive, covering basic translations, wildcards, special characters, and edge cases like mismatched wildcard counts. The test structure is clear and assertions are appropriate.
254-276: LGTM!The itemize and enumerate translation tests are well-structured with proper mocking. The expected outputs correctly preserve the whitespace after
\itemcommands, which addresses the concerns from past review comments about inconsistent whitespace handling.
284-310: LGTM!The non-English source language test is well-designed. It properly creates a temporary CSV with multiple language pairs, tests various translation directions, and ensures cleanup in a
finallyblock. This provides good coverage for thesource_languageparameter.
312-340: LGTM!The error handling tests properly verify graceful fallback behavior when Google Translate fails. Using
side_effectto simulate exceptions is the correct approach, and asserting that the original text is preserved demonstrates robust error handling.
101-105: No issue found. The test correctly uses the wildcard pattern system.The CSV contains
"With formula **:" → "Met formule **:"with**as a wildcard placeholder. TheLatexTranslator._wildcard_match()method (lines 131-152) properly handles this pattern: it splits the source string by**, builds a regex to match fixed parts with variable content, and replaces**in the target with the captured groups. The test input"With formula 6.83:"correctly matches this pattern and translates to"Met formule 6.83:"without requiring mocks or CSV literals. The implementation works as designed.Likely an incorrect or invalid review comment.
blueprints/utils/language/translate.py (10)
1-33: LGTM!The imports are appropriate, and the
ImportErrorhandling provides clear installation instructions. The class docstring effectively communicates the translation behavior, fallback mechanism, and important warnings about Google Translate usage.
35-83: LGTM!The constructor is well-designed with sensible defaults, comprehensive documentation, and lazy translation evaluation. The CSV format documentation in the docstring is particularly helpful for users providing custom translation files.
85-129: LGTM!The CSV loading logic is robust with proper error handling for missing files, missing language columns, and short rows. The "-" handling for preserving source text is a useful feature.
131-152: LGTM!The wildcard matching implementation is solid. Using
re.escapeon the fixed parts ensures special regex characters are handled correctly. The warning for mismatched wildcard counts prevents silent failures.
404-413: Table processing assumes\midruleis always present.As discussed in past reviews and recorded in learnings, this assumption is valid because tables are generated by BP LatexReport which always includes
\midrule. The code is correctly scoped for this input format. Based on learnings, this is intentional design.
418-448: LGTM!The decimal separator conversion is well-implemented. The language lists are comprehensive, and the logic correctly preserves
\tag{}content while converting periods to commas in mathematical expressions.
450-466: LGTM!The balanced brace extraction is a clean implementation that correctly handles nested braces. This is essential for properly parsing LaTeX commands with complex content.
500-514: Consistent handling of text command exclusions.The pattern
\\(?!text\{|txt\{|textbf\{|textit\{)is used consistently across extraction and replacement methods to allow text formatting commands while filtering out other LaTeX commands. This is a reasonable approach for the use case.
583-600: LGTM!The replacement ordering is well-documented. Processing table cells before text commands prevents index misalignment when cells contain
\text{}commands. This is a thoughtful design decision.
605-628: LGTM!The lazy evaluation pattern with
_translationcaching is efficient and the__str__implementation provides a clean interface for using the translator in string contexts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 5 changed files in this pull request and generated 13 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
blueprints/utils/language/translate.py (2)
97-97: Simplify the redundant phrasing.The phrase "may probably not contain" is grammatically awkward and redundant.
🔎 Proposed fix
- This file may probably not contain translations for your specific use case and language pair. + This file may not contain translations for your specific use case and language pair.
487-503: Consider handling escaped braces in balanced content extraction.The
_extract_balanced_contentmethod doesn't distinguish between structural braces ({,}) and escaped literal braces (\{,\}). If LaTeX content contains escaped braces, the depth counting will be incorrect, potentially causing extraction to stop prematurely or continue too far.This is unlikely to cause issues with Blueprints-generated LaTeX, but could fail with user-provided content containing literal braces.
🔎 Proposed fix
def _extract_balanced_content(text: str, start_pos: int) -> tuple[str, int]: r""" Extract content from balanced braces starting at start_pos. Returns (content, end_position). """ depth = 0 i = start_pos while i < len(text): - if text[i] == "{": - depth += 1 - elif text[i] == "}": - depth -= 1 - if depth == 0: - break + ch = text[i] + if ch in "{}" : + # Check if this brace is escaped (preceded by odd number of backslashes) + backslash_count = 0 + j = i - 1 + while j >= 0 and text[j] == "\\": + backslash_count += 1 + j -= 1 + # If odd number of backslashes, the brace is escaped + if backslash_count % 2 == 0: # Not escaped + if ch == "{": + depth += 1 + else: # ch == "}" + depth -= 1 + if depth == 0: + break i += 1 return text[start_pos + 1 : i], i
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
blueprints/utils/language/translate.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-03T17:53:19.607Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.
Applied to files:
blueprints/utils/language/translate.py
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.
Applied to files:
blueprints/utils/language/translate.py
🔇 Additional comments (1)
blueprints/utils/language/translate.py (1)
240-255: No issue found—googletrans API behavior is correct.The code correctly assumes that
translator.translate(missing, dest=...)returns a list ofTranslatedobjects, each with a.textattribute. This matches the documented googletrans 4.0.2+ behavior and is confirmed by the existing test mocks in the codebase. Line 255's list comprehension[tr.text for tr in translations]is the correct way to extract translated text.
Description
update for new report functions. also deleted one logging as there is no need to give a warning when csv doesnt have header; Translate will do just fine.
Fixes #889
Type of change
Checklist:
Summary by CodeRabbit
New Features
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.