Skip to content

Conversation

@GerjanDorgelo
Copy link
Contributor

@GerjanDorgelo GerjanDorgelo commented Jan 3, 2026

Description

update for new report functions. also deleted one logging as there is no need to give a warning when csv doesnt have header; Translate will do just fine.

Fixes #889

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

  • I have added tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • New and existing unit tests pass locally with my changes

Summary by CodeRabbit

  • New Features

    • LaTeX translation now covers sections, subsections, captions, list items, table cells and equations via a coordinated bulk extract→translate→replace flow.
    • New public translator class provides a simple API to translate LaTeX using a CSV dictionary (wildcard support) with fallback translation.
  • Bug Fixes

    • Removed noisy missing-language warning and improved decimal-separator handling inside inline/display math (preserves tags).
  • Tests

    • Expanded tests for captions, titles/sections, lists, tables (including empty), equations, non-English CSVs and failure modes.

✏️ Tip: You can customize this high-level summary in your review settings.

@GerjanDorgelo GerjanDorgelo self-assigned this Jan 3, 2026
Copilot AI review requested due to automatic review settings January 3, 2026 17:44
@GerjanDorgelo GerjanDorgelo linked an issue Jan 3, 2026 that may be closed by this pull request
2 tasks
@github-actions
Copy link

github-actions bot commented Jan 3, 2026

Thank you so much for contributing to Blueprints!
Your contributions help thousands of engineers work more efficiently and accurately.

Now that you've created your pull request, please don't go away; take a look at the bottom of this page for the automated checks that should already be running. If they pass, great! If not, please click on 'Details' and see if you can fix the problem they've identified. A maintainer should be along shortly to review your pull request and help get it added!

@coderabbitai
Copy link

coderabbitai bot commented Jan 3, 2026

📝 Walkthrough

Walkthrough

Moved and expanded LaTeX translation logic into a new LatexTranslator utility, added extractors/replacers for sections, captions, items, and table cells, rewired bulk-translation flow to collect/translate/reapply constructs (including math-aware decimal handling), and removed the CSV-header missing-language warning.

Changes

Cohort / File(s) Summary
New utility + public API
blueprints/utils/language/translate.py
Added LatexTranslator class with __init__, text property, and __str__. Implements CSV loading (optional custom_csv), wildcard support, bulk translation across LaTeX constructs, fallback/error handling, and decimal-separator adjustments.
Core translation updates
blueprints/language/translate.py
Updated _replace_text_commands signature to (_self, text: str, replacements: list), removed CSV-header missing-language warning, and added multiple _extract_* and _replace_* helpers for sections, captions, items, and table cells; _translate_latex now collects constructs, translates in bulk, and reapplies translations in a defined order.
Tests — adapter and expanded coverage
tests/language/test_translate.py
Replaced TranslateLatex usage with LatexTranslator (constructor arg renamed custom_csv_pathcustom_csv), renamed test class, and expanded tests to cover captions, titles/sections/subsections, tables (including empty), itemize/enumerate, equations, non-English CSVs, and mocked translation failures.

Sequence Diagram(s)

(omitted — changes are internal to translation logic and do not introduce a multi-component sequential flow suitable for a diagram)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 Hop, hop, translate we go,
Sections, captions, cells in tow,
Bulk of phrases whirl and weave,
Math and commas tucked up at eve,
Rabbit smiles — LaTeX in flow!

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding translation support for LaTeX section, caption, item, and table commands, which matches the core functionality additions documented in the raw summary.
Description check ✅ Passed The description is mostly complete with issue reference, type of change selection, and checklist items marked as done, though the description itself is brief and could provide more technical detail about the implementation.
Linked Issues check ✅ Passed The PR implementation fully addresses the requirements from issue #889: adds translation support for (sub)sections, titles [#889], captions [#889], table content [#889], and list items [#889], as documented in the raw summary changes.
Out of Scope Changes check ✅ Passed All changes are scoped to the stated objectives: new LaTeX translation helpers, updated test coverage, and removed unnecessary CSV header warning. No unrelated modifications detected outside the scope of supporting new report functions.
Docstring Coverage ✅ Passed Docstring coverage is 80.95% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds translation support for additional LaTeX commands beyond the existing text formatting commands. It extends the translation functionality to handle structural elements like sections, captions, list items, and table content, enabling more comprehensive document translation.

Key changes:

  • Added translation support for section commands (\section, \subsection, \subsubsection, \title)
  • Added translation support for captions (\caption) and list items (\item)
  • Added translation support for table cell content
  • Updated decimal separator logic to protect figure environments from period-to-comma conversion

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
tests/language/test_translate.py Added comprehensive test cases covering figure captions, section commands, table translation, and list item translation
blueprints/language/translate.py Implemented extraction and replacement methods for section, caption, item, and table commands; updated decimal separator protection logic

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link

codecov bot commented Jan 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (143c2d4) to head (d97c8fe).

Additional details and impacted files
@@            Coverage Diff             @@
##              main      #890    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files          404       404            
  Lines        12798     12933   +135     
==========================================
+ Hits         12798     12933   +135     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
blueprints/language/translate.py (1)

186-211: Consider aligning method signature with other _replace_* methods.

_replace_text_commands implicitly uses self.original, while the new methods (_replace_section_commands, _replace_caption_commands, etc.) accept text as a parameter. This works in the current flow but creates an inconsistent API surface.

🔎 Suggested refactor for consistency
-    def _replace_text_commands(self, replacements: list) -> str:
+    def _replace_text_commands(self, text: str, replacements: list) -> str:
         # ... docstring updates ...
-        return re.sub(r"\\(text|txt|textbf|textit)\{([^{}]*)\}", _repl, self.original)
+        return re.sub(r"\\(text|txt|textbf|textit)\{([^{}]*)\}", _repl, text)

Then update the call site in _translate_latex:

-        replaced = self._replace_text_commands(text_translations) if texts else self.original
+        replaced = self._replace_text_commands(self.original, text_translations) if texts else self.original
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef25ce2 and e0473a0.

📒 Files selected for processing (2)
  • blueprints/language/translate.py
  • tests/language/test_translate.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.

Applied to files:

  • blueprints/language/translate.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Code Checks and Tests with Minimum Dependencies (3.14)
🔇 Additional comments (11)
blueprints/language/translate.py (7)

213-240: LGTM!

The implementation correctly handles section command replacement with proper regex matching for innermost content.


242-268: LGTM!

The caption replacement logic is straightforward and correctly handles the \caption{} command.


270-299: LGTM!

The item replacement correctly preserves the whitespace immediately after \item. The regex pattern appropriately stops at the next LaTeX command.


385-396: LGTM!

Good addition to protect figure blocks from decimal separator conversion. This prevents unintended modifications to file paths and other content within figures.


415-425: LGTM!

Clean extraction with proper handling of nested text commands by excluding content that contains nested patterns.


427-486: LGTM!

The extraction methods are well-structured and consistent. Note that _extract_table_cells correctly handles missing \midrule at line 470, which should be mirrored in _replace_table_cells.


501-544: LGTM!

The translation orchestration correctly extracts all text types, performs bulk translation, and applies replacements in the proper sequence. The index-based slicing to split translations back to their categories is well-implemented.

tests/language/test_translate.py (4)

112-123: LGTM!

Good test coverage for figure caption translation with appropriate guarding against network-dependent translation failures.


125-151: LGTM!

Comprehensive test coverage for all sectioning command types (\title, \section, \subsection, \subsubsection) with consistent structure and proper failure guards.


153-181: Good coverage, but consider adding a test for tables without \midrule.

The tests cover tables with content and empty tables, but a table without a \midrule separator would trigger the IndexError identified in _replace_table_cells. Adding such a test would catch the bug.

def test_table_without_midrule(self) -> None:
    """Test TranslateLatex with table that has no midrule."""
    example_latex = (
        r"\begin{tabular}{ll} A & B \\ C & D \\ \end{tabular}"
    )
    # This should not crash
    result_nl = TranslateLatex(example_latex, "nl")
    assert result_nl.translated is not None

183-195: LGTM!

Good test coverage for both \itemize and \enumerate list environments. The expected output correctly reflects the whitespace handling behavior of the replacement regex.

Copilot AI review requested due to automatic review settings January 3, 2026 18:22
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between def39b2 and 3a9feb1.

📒 Files selected for processing (1)
  • blueprints/language/translate.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.
📚 Learning: 2026-01-03T17:53:19.607Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.

Applied to files:

  • blueprints/language/translate.py
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.

Applied to files:

  • blueprints/language/translate.py
🔇 Additional comments (5)
blueprints/language/translate.py (5)

213-268: LGTM - Section and caption replacement methods follow consistent patterns.

The _replace_section_commands and _replace_caption_commands methods follow the same clean pattern as _replace_text_commands, using regex substitution with a closure to track replacement indices. The approach is straightforward and appropriate for non-nested LaTeX commands.


365-395: LGTM - Decimal separator now correctly handles inline math and display equations.

The refactored _check_decimal_separator properly extends period-to-comma replacement to inline math ($...$) and \begin{equation}...\end{equation} environments, while correctly preserving periods inside \tag{} commands via the _replace_excluding_tag helper.


414-487: LGTM - Extraction methods implement consistent filtering patterns.

The five extraction methods follow clean, consistent patterns:

  • Use _extract_balanced_content for robust brace matching
  • Apply appropriate filters (e.g., line 422 excludes nested text commands, lines 455 and 482 use negative lookahead to skip non-text LaTeX commands)
  • Line 471 safely handles tables without \midrule via parts[1] if len(parts) >= 2 else tabular_content

The filtering logic ensures only translatable plain-text content is extracted while preserving LaTeX structure.


489-545: Translation orchestration is well-structured, but sequential replacement order introduces risk.

The orchestration logic is clean and efficient:

  • Bulk extraction (lines 503-507): All translatable content extracted from self.original in one pass
  • Bulk translation (line 514): Single API call translates all content
  • Precise slicing (lines 518-527): Translations correctly split back using index arithmetic—logic verified, no off-by-one errors
  • Sequential replacement (lines 530-542): Applies translations in order

However, the sequential replacement order creates the synchronization issue flagged in the _replace_table_cells comment above. If items or table cells extracted from self.original contain \text{} commands, those \text{} regions will already be translated when _replace_item_commands or _replace_table_cells runs, causing the original extracted strings to no longer match.

Recommendation: Verify with tests or usage patterns that items and table cells in BP LatexReport output don't contain \text{}, \txt{}, etc., or refactor to extract only plain-text portions (excluding already-handled command content).


270-299: Item replacement regex pattern prevents capturing inline formatting, contradicting extraction filter logic.

The regex pattern [^\\]+? in line 299 only captures item content up to the first backslash. However, the extraction filter (line 453) explicitly allows whitelisted formatting commands (\text{}, \txt{}, \textbf{}, \textit{}), creating an inconsistency: the filter suggests items can contain inline formatting, but the regex cannot properly extract it.

For example, \item Some text \text{formula} would extract only "Some text ", losing the \text{formula} part. The test cases (lines 185, 192) only contain plain text items without inline formatting, so this gap is not currently exercised. If items need to support inline formatting, the regex pattern should be adjusted to capture the full content including whitelisted commands.

Copilot AI review requested due to automatic review settings January 3, 2026 19:06
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
blueprints/language/translate.py (1)

186-211: Inconsistent method signature causes replacement chain to break.

The _replace_text_commands method differs from all other _replace_* methods by not accepting a text parameter and always operating on self.original (line 211). This breaks the replacement chain in _translate_latex where line 537 assigns the result to replaced, inadvertently discarding all prior replacements (table cells, processed at line 534).

🔎 Proposed fix
-    def _replace_text_commands(self, replacements: list) -> str:
+    def _replace_text_commands(self, text: str, replacements: list) -> str:
         r"""
         Replace all \text{...}, \txt{...}, \textbf{...}, and \textit{...} in the string with the corresponding replacements.
         Only captures the innermost text content when commands are nested.
 
         Parameters
         ----------
+        text : str
+            The LaTeX string to process.
         replacements : list[str]
             The list of replacement strings.
 
         Returns
         -------
         str
             The string with \text{...}, \txt{...}, \textbf{...}, and \textit{...} replaced by the corresponding replacements.
         """
         replacement_index = 0
 
         def _repl(match: re.Match) -> str:
             nonlocal replacement_index
             command = match.group(1)  # Captures 'text', 'txt', 'textbf', or 'textit'
             replacement = replacements[replacement_index]
             replacement_index += 1
             return f"\\{command}{{{replacement}}}"
 
         # Apply all replacements in one pass (since we only extract innermost text)
-        return re.sub(r"\\(text|txt|textbf|textit)\{([^{}]*)\}", _repl, self.original)
+        return re.sub(r"\\(text|txt|textbf|textit)\{([^{}]*)\}", _repl, text)

And update the call site at line 537:

-            replaced = self._replace_text_commands(text_translations)
+            replaced = self._replace_text_commands(replaced, text_translations)
♻️ Duplicate comments (1)
blueprints/language/translate.py (1)

301-363: Potential synchronization issue remains despite reordering.

While lines 353-355 safely assume \midrule is present (per learnings), a subtle duplication persists:

  1. _extract_table_cells (line 486) extracts full cells including \text{} commands, e.g., "Length: \text{m}"
  2. _extract_text_commands (line 423) separately extracts "m" from the same \text{m} command
  3. Both are translated in the bulk operation (line 514-515)
  4. Even after the fix to line 537, when table cells are replaced first, the cell becomes "Lengte: \text{m}" (translated)
  5. Then _replace_text_commands tries to match and replace \text{m} commands, but the context may have changed

This could lead to inconsistent translations or missed replacements. Consider either:

  • Option A: Modify _extract_table_cells to extract only plain-text portions (excluding \text{} content) to avoid duplication, OR
  • Option B: After replacing table cells, skip _replace_text_commands for content within tables

Based on learnings, BP uses \text{} as a semantic marker, so tables generated by BP may frequently contain such commands, making this scenario non-trivial.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a9feb1 and 39680a2.

📒 Files selected for processing (1)
  • blueprints/language/translate.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-03T17:53:19.607Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.

Applied to files:

  • blueprints/language/translate.py
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.

Applied to files:

  • blueprints/language/translate.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Code Checks and Tests with Minimum Dependencies (3.14)
  • GitHub Check: Code coverage report
🔇 Additional comments (10)
blueprints/language/translate.py (10)

213-240: LGTM!

The implementation correctly handles section command replacements with proper parameter passing and regex pattern matching. The pattern at line 240 matches the extraction pattern in _extract_section_commands, ensuring consistency.


242-268: LGTM!

The caption replacement logic is correct and follows the same pattern as other replacement methods.


270-299: LGTM!

The item replacement logic correctly handles simple list items. The regex pattern at line 299 is consistent with the extraction pattern at line 449, ensuring extraction and replacement alignment. Note that the pattern treats backslashes as command delimiters, so items with escaped characters (e.g., \%) would be split at those points, but this appears intentional for the BP use case.


365-395: LGTM!

The expanded decimal separator logic correctly handles inline math (line 394) and equation environments (line 395) while preserving \tag{} content via the _replace_excluding_tag helper. The negative lookbehind (?<!\\) at line 394 properly excludes escaped dollar signs.


414-424: LGTM!

The extraction logic correctly identifies innermost text commands using balanced brace matching and filters out nested commands at line 422. This ensures only leaf-level content is extracted for translation.


426-434: LGTM!

The section extraction logic is straightforward and correct, using balanced brace matching to extract section titles.


436-444: LGTM!

Caption extraction is implemented correctly using balanced brace matching.


446-460: LGTM with caveat.

The item extraction logic correctly handles simple list items and preserves trailing spaces (line 453). Note that items containing \text{} commands face the same duplication issue as table cells (already flagged separately), where both the full item text and the \text{} content are extracted for translation.


462-487: LGTM!

Table cell extraction is implemented robustly, with line 470-471 gracefully handling tables without \midrule. The negative lookahead at line 482 correctly allows \text{} commands while filtering out other LaTeX constructs. Note that the duplication concern with \text{} content (already flagged under _replace_table_cells) applies here as well.


489-549: Well-structured orchestration with one critical bug.

The translation orchestration is well-designed:

  • Lines 503-507: Extract all LaTeX construct types
  • Line 514: Efficient bulk translation of all texts
  • Lines 517-527: Clean slicing to distribute translations
  • Lines 529-530: Clear comment explaining replacement order intent

However, the critical bug at line 537 (already flagged) prevents the intended ordering from working correctly. Once that's fixed, the orchestration will correctly chain replacements. The synchronization concern with \text{} commands in table cells/items (also flagged separately) remains a broader design issue beyond this method.

After applying the fix at line 537, verify that table cells containing \text{} commands translate correctly by testing with:

from blueprints.utils.report import LatexReport
latex = LatexReport().add_table(headers=["Column"], rows=[["Length: \\text{m}"]]).to_document()

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39680a2 and 89bf486.

📒 Files selected for processing (1)
  • blueprints/language/translate.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.
📚 Learning: 2026-01-03T17:53:19.607Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.

Applied to files:

  • blueprints/language/translate.py
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.

Applied to files:

  • blueprints/language/translate.py
🔇 Additional comments (5)
blueprints/language/translate.py (5)

186-214: LGTM! Signature change properly implements parameterized text processing.

The method now correctly accepts and operates on the text parameter instead of self.original, which allows it to be used in the sequential replacement pipeline at line 539.


215-302: LGTM! New replacement methods follow consistent patterns.

The three new replacement methods (_replace_section_commands, _replace_caption_commands, _replace_item_commands) correctly follow the same pattern as _replace_text_commands, using closures to track replacement indices and regex patterns appropriate for each LaTeX construct.


367-397: LGTM! Decimal separator replacement correctly handles inline math and equations.

The updated _check_decimal_separator properly:

  • Excludes \tag{} content from period-to-comma replacement (lines 388-393)
  • Handles inline math with negative lookbehind for unescaped $ (line 396)
  • Handles equation environments with DOTALL flag (line 397)

416-462: LGTM! Extraction methods correctly extract translatable content from various LaTeX constructs.

The new extraction methods properly:

  • Use _extract_balanced_content for handling nested braces
  • Filter for innermost content to avoid double-extraction (line 424)
  • Use consistent negative lookahead patterns to allow \text{}/\txt{}/\textbf{}/\textit{} while excluding other commands (lines 457, 459)
  • Handle edge cases like empty content and nested structures

491-551: Well-structured translation pipeline with efficient bulk processing.

The updated _translate_latex method correctly:

  • Extracts content from all supported LaTeX constructs (lines 504-509)
  • Performs a single bulk translation operation for efficiency (line 517)
  • Accurately splits translations using index arithmetic (lines 520-529)
  • Applies replacements in a defined order (lines 535-548)

The comment at lines 531-532 demonstrates awareness of the replacement order importance, though the critical synchronization issue flagged separately still requires resolution.

Copilot AI review requested due to automatic review settings January 3, 2026 19:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
blueprints/language/translate.py (1)

303-368: Synchronization fix looks good, but minor inconsistency with extraction method.

The synchronization issue from past reviews is correctly addressed:

  • Line 489 in _extract_table_cells extracts only plain text portions (excluding \text{} commands)
  • Line 335-340 here replaces only those plain text portions
  • This prevents the mismatch that occurred when table cells were replaced after text commands modified the \text{} content

However, there's a minor inconsistency: the extraction method at line 476 defensively checks len(parts) >= 2 before accessing the content after \midrule, but lines 358-360 here directly access parts[1] and parts[2] without checking. While the learning confirms LatexReport tables always include \midrule, consider adding the same defensive check for consistency.

🔎 Optional: Add defensive length check for consistency
             # Skip header lines (before \midrule)
             parts = re.split(r"(\\midrule)", tabular_content, maxsplit=1)
-            header_part = parts[0]
-            midrule = parts[1]
-            content_part = parts[2]
-
-            # Process rows in content part
-            processed_content = re.sub(r"([^\\].*?)(?=\\\\|\\bottomrule|\\end)", _repl_row, content_part, flags=re.DOTALL)
-
-            return tabular_start + header_part + midrule + processed_content + tabular_end
+            if len(parts) >= 3:
+                header_part = parts[0]
+                midrule = parts[1]
+                content_part = parts[2]
+                # Process rows in content part
+                processed_content = re.sub(r"([^\\].*?)(?=\\\\|\\bottomrule|\\end)", _repl_row, content_part, flags=re.DOTALL)
+                return tabular_start + header_part + midrule + processed_content + tabular_end
+            else:
+                # No \midrule found (shouldn't happen with LatexReport, but be defensive)
+                return match.group(0)  # Return unchanged

Based on learnings about LatexReport table structure.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89bf486 and 650e524.

📒 Files selected for processing (1)
  • blueprints/language/translate.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.
📚 Learning: 2026-01-03T17:53:19.607Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.

Applied to files:

  • blueprints/language/translate.py
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.

Applied to files:

  • blueprints/language/translate.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Code Checks and Tests with Minimum Dependencies (3.14)
🔇 Additional comments (5)
blueprints/language/translate.py (5)

186-214: Good fix: signature now accepts text parameter.

The updated signature correctly addresses the past review comment about operating on a passed text parameter instead of self.original. The replacement logic preserves the LaTeX command structure while replacing content.


215-301: LGTM: New replacement methods follow consistent patterns.

The three new replacement methods (_replace_section_commands, _replace_caption_commands, _replace_item_commands) follow the established pattern from _replace_text_commands. The item method correctly preserves whitespace (line 297), and all methods appropriately use the nonlocal replacement index to track progress through the replacements list.


370-400: LGTM: Decimal separator handling correctly extended.

The update properly extends decimal separator replacement to inline math ($...$) and display equations (\begin{equation}...\end{equation}), while correctly preserving \tag{} content via the helper function. The negative lookbehind (?<!\\) at line 399 correctly avoids matching escaped dollar signs.


419-492: LGTM: Extraction methods correctly isolate translatable content.

All five extraction methods properly:

  • Use _extract_balanced_content for brace matching
  • Apply appropriate filtering (e.g., line 427 excludes nested text commands, line 460 excludes complex LaTeX in items)
  • Handle the synchronization issue: line 489 extracts only plain text portions from table cells, excluding \text{} commands that are separately handled by _extract_text_commands

The filtering patterns are consistent between items (line 460) and tables (line 487), both using the same negative lookahead pattern.


494-554: LGTM: Orchestration correctly implements bulk translation with proper ordering.

The method correctly:

  • Extracts all translatable content from multiple LaTeX constructs (lines 508-512)
  • Performs a single bulk translation operation (line 520) for efficiency
  • Splits translations back by count (lines 523-532)
  • Critically: applies replacements in the correct order (lines 538-542) - table cells before text commands to prevent the synchronization issue where text commands within cells would be replaced first, causing mismatches

The replacement order at lines 538-551 ensures that when a table cell contains \text{m}, the entire cell content (including \text{m}) is replaced first, preventing the later text command replacement from modifying it.

Copilot AI review requested due to automatic review settings January 6, 2026 19:03
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @blueprints/utils/language/translate.py:
- Around line 207-218: The coroutine handling is inverted: calling
loop.run_until_complete() when get_running_loop() succeeds will raise
RuntimeError; instead, when asyncio.iscoroutine(translations) is true, call
asyncio.get_running_loop() and if that raises RuntimeError (no running loop) use
translations = asyncio.run(translations), otherwise (a loop is running) submit
the coroutine to that loop with asyncio.run_coroutine_threadsafe(translations,
loop).result() and assign its result to translations; update the block around
the translations variable (the coroutine handling that currently uses
loop.run_until_complete) accordingly.
🧹 Nitpick comments (1)
blueprints/utils/language/translate.py (1)

346-348: Potential edge case with item content matching.

The regex pattern \\item(\s+)([^\\]+?)(?=\\|$) uses a non-greedy match that stops at any backslash. This could prematurely truncate item content if the text contains escaped characters like \% or other LaTeX escapes within the item text.

Consider whether escaped characters in item content are a realistic use case. If so, the pattern may need adjustment to allow specific escape sequences.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 650e524 and d57ad46.

⛔ Files ignored due to path filters (1)
  • blueprints/utils/language/translations.csv is excluded by !**/*.csv
📒 Files selected for processing (3)
  • blueprints/utils/language/__init__.py
  • blueprints/utils/language/translate.py
  • tests/language/test_translate.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-03T17:53:19.607Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.

Applied to files:

  • blueprints/utils/language/translate.py
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.

Applied to files:

  • blueprints/utils/language/translate.py
🧬 Code graph analysis (1)
tests/language/test_translate.py (1)
blueprints/utils/language/translate.py (2)
  • LatexTranslator (21-628)
  • text (606-617)
🔇 Additional comments (16)
tests/language/test_translate.py (6)

1-16: LGTM!

Imports are clean and the test class setup is appropriate. The use of os.path.join with __file__ for locating the test CSV is a reliable approach.


18-66: LGTM!

The CSV-based translation tests are comprehensive, covering basic translations, wildcards, special characters, and edge cases like mismatched wildcard counts. The test structure is clear and assertions are appropriate.


254-276: LGTM!

The itemize and enumerate translation tests are well-structured with proper mocking. The expected outputs correctly preserve the whitespace after \item commands, which addresses the concerns from past review comments about inconsistent whitespace handling.


284-310: LGTM!

The non-English source language test is well-designed. It properly creates a temporary CSV with multiple language pairs, tests various translation directions, and ensures cleanup in a finally block. This provides good coverage for the source_language parameter.


312-340: LGTM!

The error handling tests properly verify graceful fallback behavior when Google Translate fails. Using side_effect to simulate exceptions is the correct approach, and asserting that the original text is preserved demonstrates robust error handling.


101-105: No issue found. The test correctly uses the wildcard pattern system.

The CSV contains "With formula **:" → "Met formule **:" with ** as a wildcard placeholder. The LatexTranslator._wildcard_match() method (lines 131-152) properly handles this pattern: it splits the source string by **, builds a regex to match fixed parts with variable content, and replaces ** in the target with the captured groups. The test input "With formula 6.83:" correctly matches this pattern and translates to "Met formule 6.83:" without requiring mocks or CSV literals. The implementation works as designed.

Likely an incorrect or invalid review comment.

blueprints/utils/language/translate.py (10)

1-33: LGTM!

The imports are appropriate, and the ImportError handling provides clear installation instructions. The class docstring effectively communicates the translation behavior, fallback mechanism, and important warnings about Google Translate usage.


35-83: LGTM!

The constructor is well-designed with sensible defaults, comprehensive documentation, and lazy translation evaluation. The CSV format documentation in the docstring is particularly helpful for users providing custom translation files.


85-129: LGTM!

The CSV loading logic is robust with proper error handling for missing files, missing language columns, and short rows. The "-" handling for preserving source text is a useful feature.


131-152: LGTM!

The wildcard matching implementation is solid. Using re.escape on the fixed parts ensures special regex characters are handled correctly. The warning for mismatched wildcard counts prevents silent failures.


404-413: Table processing assumes \midrule is always present.

As discussed in past reviews and recorded in learnings, this assumption is valid because tables are generated by BP LatexReport which always includes \midrule. The code is correctly scoped for this input format. Based on learnings, this is intentional design.


418-448: LGTM!

The decimal separator conversion is well-implemented. The language lists are comprehensive, and the logic correctly preserves \tag{} content while converting periods to commas in mathematical expressions.


450-466: LGTM!

The balanced brace extraction is a clean implementation that correctly handles nested braces. This is essential for properly parsing LaTeX commands with complex content.


500-514: Consistent handling of text command exclusions.

The pattern \\(?!text\{|txt\{|textbf\{|textit\{) is used consistently across extraction and replacement methods to allow text formatting commands while filtering out other LaTeX commands. This is a reasonable approach for the use case.


583-600: LGTM!

The replacement ordering is well-documented. Processing table cells before text commands prevents index misalignment when cells contain \text{} commands. This is a thoughtful design decision.


605-628: LGTM!

The lazy evaluation pattern with _translation caching is efficient and the __str__ implementation provides a clean interface for using the translator in string contexts.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 5 changed files in this pull request and generated 13 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
blueprints/utils/language/translate.py (2)

97-97: Simplify the redundant phrasing.

The phrase "may probably not contain" is grammatically awkward and redundant.

🔎 Proposed fix
-            This file may probably not contain translations for your specific use case and language pair.
+            This file may not contain translations for your specific use case and language pair.

487-503: Consider handling escaped braces in balanced content extraction.

The _extract_balanced_content method doesn't distinguish between structural braces ({, }) and escaped literal braces (\{, \}). If LaTeX content contains escaped braces, the depth counting will be incorrect, potentially causing extraction to stop prematurely or continue too far.

This is unlikely to cause issues with Blueprints-generated LaTeX, but could fail with user-provided content containing literal braces.

🔎 Proposed fix
     def _extract_balanced_content(text: str, start_pos: int) -> tuple[str, int]:
         r"""
         Extract content from balanced braces starting at start_pos.
         Returns (content, end_position).
         """
         depth = 0
         i = start_pos
         while i < len(text):
-            if text[i] == "{":
-                depth += 1
-            elif text[i] == "}":
-                depth -= 1
-                if depth == 0:
-                    break
+            ch = text[i]
+            if ch in "{}" :
+                # Check if this brace is escaped (preceded by odd number of backslashes)
+                backslash_count = 0
+                j = i - 1
+                while j >= 0 and text[j] == "\\":
+                    backslash_count += 1
+                    j -= 1
+                # If odd number of backslashes, the brace is escaped
+                if backslash_count % 2 == 0:  # Not escaped
+                    if ch == "{":
+                        depth += 1
+                    else:  # ch == "}"
+                        depth -= 1
+                        if depth == 0:
+                            break
             i += 1
         return text[start_pos + 1 : i], i
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d57ad46 and d97c8fe.

📒 Files selected for processing (1)
  • blueprints/utils/language/translate.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-03T17:53:19.607Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 890
File: blueprints/language/translate.py:351-360
Timestamp: 2026-01-03T17:53:19.607Z
Learning: Tables generated by Blueprints LatexReport always include a `\midrule` separator, so the `_replace_table_cells` method in `blueprints/language/translate.py` can safely assume `parts[1]` and `parts[2]` exist when splitting tabular content by `\midrule`.

Applied to files:

  • blueprints/utils/language/translate.py
📚 Learning: 2025-12-27T08:33:34.952Z
Learnt from: GerjanDorgelo
Repo: Blueprints-org/blueprints PR: 882
File: blueprints/codes/report.py:49-85
Timestamp: 2025-12-27T08:33:34.952Z
Learning: In the ReportCheck class (blueprints/codes/report.py), the add_text method intentionally wraps plain text in LaTeX \text{} command even in text mode. While not standard LaTeX semantics, this serves as a semantic marker for the Word converter (ReportToWordConverter) and translation tools to distinguish plain text from equations and other content types.

Applied to files:

  • blueprints/utils/language/translate.py
🔇 Additional comments (1)
blueprints/utils/language/translate.py (1)

240-255: No issue found—googletrans API behavior is correct.

The code correctly assumes that translator.translate(missing, dest=...) returns a list of Translated objects, each with a .text attribute. This matches the documented googletrans 4.0.2+ behavior and is confirmed by the existing test mocks in the codebase. Line 255's list comprehension [tr.text for tr in translations] is the correct way to extract translated text.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[✨ Feature request]: Update the translate with new report functions

3 participants