
HYLcool
Collaborator

@HYLcool HYLcool commented Oct 11, 2025

Optimize the OP doc building logic: reuse the previous Chinese description when

  1. the English description is unchanged, or
  2. the translation fails.
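The two-case rule above can be sketched as a small per-operator helper. This is a minimal illustration, not the code merged in this PR. The name `pick_zh_desc` and its parameters are hypothetical, and `translate` stands in for whatever translation call the hook actually makes. It is only assumed to return `None` on failure.

```python
from typing import Callable, Optional


def pick_zh_desc(
    en_desc: str,
    prev_en_desc: Optional[str],
    prev_zh_desc: Optional[str],
    translate: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the Chinese description to write for one operator (hypothetical helper)."""
    # Case 1: the English description is unchanged, so reuse the previous
    # Chinese description without calling the translator at all.
    if prev_zh_desc is not None and en_desc == prev_en_desc:
        return prev_zh_desc
    # Otherwise the description is new or changed, so try to translate it.
    translated = translate(en_desc)
    # Case 2: the translation failed (None), so fall back to the previous
    # Chinese description. For a brand-new operator this is still None.
    if translated is None:
        return prev_zh_desc
    return translated
```

Because the unchanged case returns before `translate` is called, an unchanged description costs no translation request at all.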

…iption when the English description is unchanged or the translation failed
@HYLcool HYLcool requested review from Qirui-jiao, cmgzn and yxdyc October 11, 2025 07:45
@HYLcool HYLcool self-assigned this Oct 11, 2025
@HYLcool HYLcool added documentation Improvements or additions to documentation dj:ci/cd issues/PRs about CI/CD of Data-Juicer labels Oct 11, 2025
Contributor

Summary of Changes

Hello @HYLcool, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request optimizes the build_op_doc pre-commit hook, which generates the operator documentation in both English and Chinese. Instead of re-translating every description on each run, the hook now reuses the previous Chinese description when the English description has not changed or when a new translation fails. This removes redundant translation calls and keeps a usable Chinese description even when the translation service errors out.

Highlights

  • Optimized Chinese Description Handling: The build_op_doc hook now reuses the existing Chinese description if the English description is unchanged or if the translation service returns an error, avoiding unnecessary re-translations.
  • Separation of English and Chinese Descriptions: The OPRecord class now stores the English (desc) and Chinese (desc_zh) descriptions in separate fields, so each can be compared and updated independently.
  • Refactored Translation Logic: The generate_op_table_section function now only attempts to translate English descriptions that have actually changed, and the get_op_desc_in_en_zh_batched function returns None on translation failure, allowing for a graceful fallback to previous Chinese descriptions.
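A hedged sketch of how these three changes fit together. Everything below is an assumption for illustration: this `OPRecord` is a stripped-down stand-in for the real class, `fill_zh_descs` is a hypothetical helper, previous records are assumed to be available as a dict keyed by operator name, and `translate_batch` stands in for `get_op_desc_in_en_zh_batched`. Its real signature is not shown here; it is only assumed to return a list of translations in input order, or `None` on failure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class OPRecord:
    # Hypothetical minimal version; the real class has more fields.
    name: str
    desc: str                      # English description
    desc_zh: Optional[str] = None  # Chinese description


def fill_zh_descs(
    records: List[OPRecord],
    previous: Dict[str, OPRecord],
    translate_batch: Callable[[List[str]], Optional[List[str]]],
) -> None:
    """Set desc_zh on each record, translating only changed descriptions."""
    to_translate: List[OPRecord] = []
    for rec in records:
        # .get() instead of [] so operators added since the last build
        # (absent from `previous`) do not raise KeyError.
        prev = previous.get(rec.name)
        if prev is not None and prev.desc == rec.desc and prev.desc_zh:
            rec.desc_zh = prev.desc_zh  # English unchanged: reuse old Chinese text
        else:
            to_translate.append(rec)

    if not to_translate:
        return  # nothing changed, so no translation call at all

    results = translate_batch([rec.desc for rec in to_translate])
    if results is None:
        # Translation failed: keep the previous Chinese text (None for new ops).
        for rec in to_translate:
            prev = previous.get(rec.name)
            rec.desc_zh = prev.desc_zh if prev is not None else None
        return
    for rec, zh in zip(to_translate, results):
        rec.desc_zh = zh
```

Collecting only the changed records before the single batched call is what keeps the number of translation requests proportional to the number of edited descriptions rather than to the total number of operators.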

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request optimizes the operator documentation build process by reusing existing Chinese translations when the English source is unchanged. The overall approach is sound, but I've identified a critical bug that could cause the script to crash when new operators are added, as well as a high-severity issue related to fragile parsing logic. I've provided detailed comments and code suggestions to address these points, which will improve the script's robustness and efficiency.

Collaborator

@cmgzn cmgzn left a comment


LGTM

@HYLcool HYLcool merged commit 7870825 into main Oct 11, 2025
2 of 4 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in data-juicer Oct 11, 2025
@HYLcool HYLcool deleted the opt/build_op_doc branch October 11, 2025 08:28


3 participants