Conversation
Summary of Changes
Hello @fabrahman, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces foundational changes to support pre-tokenized user queries within the data pipeline, primarily by adding a new field for raw prompts and updating data structures and processing logic to utilize it. The changes aim to streamline how prompts are handled and are part of an ongoing effort to resolve a reported timeout error during sequence packing.
Highlights
- **Pre-tokenized Query Support:** Introduced a new `RAW_PROMPT_KEY` to store and handle raw, pre-tokenized user queries throughout the data processing pipeline, enabling direct use of these queries in subsequent steps.
- **Data Processing Refinement:** Modified data loading and batching mechanisms to incorporate the new raw prompt data, specifically updating the `Batch` dataclass and its handling in `grpo_fast.py` to use `raw_queries` directly for decoding.
- **Debugging Enhancements:** Added `breakpoint()` calls in `ground_truth_utils.py` and `utils.py`, and set `load_from_cache_file=False` in `dataset_transformation.py`'s `get_dataset_v1` function, indicating active debugging efforts related to data processing and potential caching issues.
- **Error Handling Improvement:** Adjusted the `extract_user_query` function to return the full conversation string if no specific user query pattern is matched, preventing `None` returns in certain scenarios.
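For orientation, here is a minimal sketch of what the updated `Batch` might look like after this change; every field except `raw_queries` is a guess from the summary above, not the PR's exact definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Batch:
    # Field names besides raw_queries are illustrative guesses.
    queries: list[list[int]]   # token IDs for each prompt
    ground_truths: list[str]   # per-sample ground truth for the verifier
    datasets: list[str]        # verifier source per sample
    # New in this PR: the raw prompt strings, so downstream code (e.g. the
    # LLM judge path in grpo_fast.py) can use them directly instead of
    # calling tokenizer.decode() on the token IDs.
    raw_queries: Optional[list[str]] = None
```

With `raw_queries` carried through the pipeline, the decode step during training becomes unnecessary, which is the performance motivation described above.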
Code Review Feedback

🚨 Critical Issues

1. Debugging Code Left in Production (High Priority)
2. Performance & Caching Concern

🔧 Code Quality Issues

3. Code Duplication
4. Inconsistent Logic
5. Missing Documentation

🐛 Potential Bugs

6. Null Safety
7. Data Consistency

📋 Recommendations

✅ Positive Notes

The core functionality looks solid, but the debugging artifacts and caching change need immediate attention before this can be merged.
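A common way to resolve issues 1 and 2 without losing the debugging workflow is to gate both behind an opt-in flag. A minimal sketch, assuming a made-up `OPEN_INSTRUCT_DEBUG` environment variable (the PR itself simply needs the `breakpoint()` calls and `load_from_cache_file=False` removed):

```python
import os

# Hypothetical flag; any project-specific mechanism would work as well.
DEBUG = os.environ.get("OPEN_INSTRUCT_DEBUG", "0") == "1"


def maybe_breakpoint():
    """Drop into the debugger only when explicitly requested."""
    if DEBUG:
        breakpoint()


def map_with_cache(dataset, fn):
    # Bypass the HF datasets cache only while debugging; normal runs reuse
    # the cached result instead of re-tokenizing the whole dataset.
    return dataset.map(fn, load_from_cache_file=not DEBUG)
```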
Code Review
This pull request introduces a change to pre-load user queries as raw strings to avoid decoding them during training, which is a good performance optimization. My review focuses on several areas for improvement:
- **Code Cleanup:** There are several instances of leftover debugging code, such as `breakpoint()` calls and commented-out code, which should be removed.
- **Redundancy:** The constant `RAW_PROMPT_KEY` is defined multiple times across the same file. It should be defined once and imported where needed.
- **Potential Bug:** There's a potential logic error in `sft_tokenize_mask_out_prompt_v1` where an incorrect key seems to be used, which could lead to unexpected behavior.
- **Clarity and Best Practices:** Some minor improvements to code formatting and removal of unnecessary lines are suggested to enhance readability and maintainability.
Overall, the core logic of the PR is sound, but addressing these points will improve the quality and robustness of the code.
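On the redundancy point, the fix is mechanical. A sketch, assuming the constant lives in `dataset_transformation.py` (the exact import path depends on the package layout):

```python
# dataset_transformation.py — define the constant exactly once:
RAW_PROMPT_KEY = "prompt"

# any consumer module — import it instead of redefining it:
from dataset_transformation import RAW_PROMPT_KEY
```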
* Reapply "Added a decorator to profile functions." This reverts commit 504a8d3.
* Removed claude
* Removed extraneuous stuff.
* Reverted to main.
* Removed changes
* Removed changes
* add better logging
* fix
* lint
* lint2
* adjust based on finbarrs feedback
* fix
* remove lambda

---------

Co-authored-by: root <root@neptune-cs-aus-263.reviz.ai2.in>
**Update**: This PR contains:

Here is a successful 1-gpu debug job.
**hamishivi** left a comment
Looks good and runs with my testing but needs a little cleanup!
```python
logging.getLogger("litellm.cost_calculator").setLevel(logging.CRITICAL)
logging.getLogger("litellm._client").setLevel(logging.CRITICAL)
logging.getLogger("cost_calculator").setLevel(logging.WARNING)
logging.getLogger("httpx").setLevel(logging.WARNING)
```
Does this work to remove the logging from litellm?
Yes, it worked both locally and in the debug Beaker job.
```python
DEFAULT_SFT_MESSAGES_KEY = "messages"
GROUND_TRUTHS_KEY = "ground_truth"
VERIFIER_SOURCE_KEY = "dataset"
RAW_PROMPT_KEY = "prompt"
```
If there aren't multiple prompt key variables, let's just use `PROMPT_KEY`?
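For reference, a minimal sketch of how a transform might populate the key, assuming the `messages` chat format shown above (`attach_raw_prompt` is a hypothetical helper, not code from the PR):

```python
RAW_PROMPT_KEY = "prompt"


def attach_raw_prompt(row: dict, tokenizer) -> dict:
    # Render the prompt-side messages to a string once, at load time, so
    # later stages never need to decode token IDs back into text.
    row[RAW_PROMPT_KEY] = tokenizer.apply_chat_template(
        row["messages"][:-1], tokenize=False, add_generation_prompt=True
    )
    return row
```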
```python
    prompt = row[sft_messages_key]
else:
    prompt = row[sft_messages_key][:-1]
```
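For context on the slice: in chat-format SFT rows the final message is usually the assistant's answer, so `[:-1]` keeps only the prompt-side turns. A guess at the surrounding conditional, since the `if` branch's condition is outside this hunk:

```python
if keep_full_conversation:  # hypothetical condition, not visible in the diff
    prompt = row[sft_messages_key]        # keep all messages
else:
    prompt = row[sft_messages_key][:-1]   # drop the final assistant turn
```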
```python
    frac_or_num_samples = float(frac_or_num_samples)
else:
    frac_or_num_samples = int(frac_or_num_samples)
```
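The float/int split suggests `frac_or_num_samples` is interpreted as a fraction of the dataset when at most 1.0 and as an absolute count otherwise. A hedged sketch of that convention (the threshold logic is an assumption, not visible in this hunk):

```python
def resolve_sample_count(frac_or_num_samples: float, dataset_len: int) -> int:
    # Values <= 1.0 are read as a fraction of the dataset; larger values as
    # an absolute number of samples, mirroring the float()/int() casts above.
    if frac_or_num_samples <= 1.0:
        return int(frac_or_num_samples * dataset_len)
    return int(frac_or_num_samples)
```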
```python
logger = logger_utils.setup_logger(__name__)

logging.getLogger("LiteLLM").setLevel(logging.WARNING)
```
Can you add a comment as to why you're doing this?
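A possible version of the requested comment (the wording here is a suggestion, based on the litellm discussion above):

```python
# LiteLLM logs each judge request/response at INFO, which floods the training
# logs when the judge is called once per sample; keep warnings and errors only.
logging.getLogger("LiteLLM").setLevel(logging.WARNING)
```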
This reverts commit fa7e608.
This reverts commit 341a77b.
* uses add_request
* ran linter
* Clean up
* Fixed bug
* Added duplication
* Added prompt_tokens to metadata.
* Added missing key to metadata
* Fixed bug where we weren't returning properly.
* Fix script
* Added logging
* fix bug
* use clone for SamplingParams
* Fixes to duplication
* Removed logging.
* Cleaned up PR.
* Clean PR
* Removed whitespace
* Cleaned up PR
* Added comment for cleaner PR.
* Cleaning up PR
* Revert "load pretokenized user query (v0) (#965)". This reverts commit fa7e608.
* Bug fix.
* Fixed issue where we weren't setting params right in tools.
* Updated descriptions.
* Fix ordering.
* Updated tool script with description.
* Fixed use of wrong vllm.SamplingParams.
* Now, tool use should run.
* Reapply "load pretokenized user query (v0) (#965)". This reverts commit 341a77b.
* minor clean up.
* load pretokenized user queries
* remove breakpoints
* Remove claude (allenai#966)
* Reapply "Added a decorator to profile functions." This reverts commit 87cb1a4.
* Removed claude
* Removed extraneuous stuff.
* Reverted to main.
* Removed changes
* Removed changes
* Better logging of errors in threads (allenai#967)
* add better logging
* fix
* lint
* lint2
* adjust based on finbarrs feedback
* fix
* remove lambda
* use a smaller model (allenai#968)
* Less logging. (allenai#969)
* fix PendingQueriesMap to include raw_query
* fix reading the pretokenized query, modify judge prompt, remove litellm logs
* remove load_from_cache_file=False
* fix linting
* pr cleanup
* remove extract user query import
* update to include system prompt
* lint
* priotize system prompt
* update tests for new raw queries
* lint
* modify general-quality (no reference) jduge prompt
* assume user queries are not None
* remove spaces and add comments

---------

Co-authored-by: Finbarr Timbers <finbarrtimbers@gmail.com>
Co-authored-by: Hamish Ivison <hamishivi@gmail.com>
Co-authored-by: root <root@neptune-cs-aus-263.reviz.ai2.in>
This is the early PR for the pre-tokenized user query. It leads to the following error:

```
2025-08-29 17:03:30 - WARNING - grpo_fast.py:2076 - [Main Thread] Timeout waiting for packed sequences. Retrying...
2025-08-29 17:04:00 - WARNING - grpo_fast.py:2076 - [Main Thread] Timeout waiting for packed sequences. Retrying...
```

@hamishivi can you take a look?
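For readers hitting the same message: the warning comes from a polling loop in `grpo_fast.py`. A minimal sketch of the pattern, with the queue name and 30-second timeout inferred from the log timestamps rather than taken from the code:

```python
import logging
import queue

logger = logging.getLogger(__name__)


def get_packed_sequences(packed_q: queue.Queue, timeout_s: float = 30.0):
    """Block until the packing thread yields a batch, warning on each timeout."""
    while True:
        try:
            return packed_q.get(timeout=timeout_s)
        except queue.Empty:
            # If the packing thread stalls, this warning repeats every
            # timeout_s seconds, exactly as in the log above.
            logger.warning("[Main Thread] Timeout waiting for packed sequences. Retrying...")
```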