Code Review
The code changes include adding a TODO comment in deepeyes_plugin.py, adding pre_norm_last_hidden_state and pre_projection_last_hidden_state to the output of _output_embedding_hook in patcher.py, and modifying Seq2SeqTrainer.compute_loss to handle route_label_kwargs and initialize the model if loss_type is route_hybrid_infonce. Additionally, several new files were added, including shell scripts for GRPO training and vLLM server setup, a JSON example for dataset formatting, and Python scripts for tool call parsing and testing. The review comments highlight security vulnerabilities due to hardcoded API keys and tokens, portability issues from hardcoded absolute paths, redundancy in assigning the same value to different keys, and the need to address or remove a placeholder TODO comment.
Note: Security Review did not run due to the size of the PR.
# Basic configuration
model_name="/mnt/cfs/ssw/ljc/LLaMA-Factory/models/Qwen3-4B"
output_dir="./debug_output"
wandb_api_key="8b7eb3957d2cf7157ab46fcf3e5b602cf2e7b24e"

--offload_optimizer true \
--deepspeed zero2 \
--report_to swanlab \
--swanlab_token GFPjNmyR2K5Cog3C6N7uA \

wandb_api_key="8b7eb3957d2cf7157ab46fcf3e5b602cf2e7b24e"
swanlab_api_key="GFPjNmyR2K5Cog3C6N7uA"
Hardcoding API keys (wandb_api_key, swanlab_api_key) in a script is a critical security vulnerability. These keys should be loaded from environment variables or a secure configuration file, not stored in version control.
Suggested change:
- wandb_api_key="8b7eb3957d2cf7157ab46fcf3e5b602cf2e7b24e"
- swanlab_api_key="GFPjNmyR2K5Cog3C6N7uA"
+ wandb_api_key="${WANDB_API_KEY}"
+ swanlab_api_key="${SWANLAB_API_KEY}"
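A minimal sketch of the suggested fix, assuming the secrets are exported as `WANDB_API_KEY` and `SWANLAB_API_KEY` in the caller's environment (these variable names are an assumption, not taken from the script):

```shell
# Read secrets from the environment instead of committing them to the repo.
# WANDB_API_KEY / SWANLAB_API_KEY are assumed variable names.
wandb_api_key="${WANDB_API_KEY:-}"
swanlab_api_key="${SWANLAB_API_KEY:-}"

# Fail loudly if either secret is missing.
if [ -z "${wandb_api_key}" ] || [ -z "${swanlab_api_key}" ]; then
    echo "WANDB_API_KEY or SWANLAB_API_KEY is not set" >&2
    # exit 1  # enable in the real script
fi
```

Note that since the literal keys above are already in version control, they should also be revoked and rotated, not just removed from the script.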
--deepspeed zero2 \
--report_to swanlab \
--swanlab_token GFPjNmyR2K5Cog3C6N7uA \
--swanlab_mode cloud \

--offload_optimizer true \
--deepspeed zero3 \
--report_to swanlab \
--swanlab_token GFPjNmyR2K5Cog3C6N7uA \
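The same concern applies where the token is passed as a command-line flag. A hedged sketch, assuming the token is exported as `SWANLAB_API_KEY` (both the variable name and the commented launch command are illustrative assumptions):

```shell
# Build the reporting flags from the environment so the token never
# appears literally in the script. SWANLAB_API_KEY is an assumed name.
swanlab_token="${SWANLAB_API_KEY:-}"

report_args=(
    --report_to swanlab
    --swanlab_token "${swanlab_token}"
    --swanlab_mode cloud
)

# Later in the script, splice the flags into the launch command, e.g.:
# swift rlhf "${report_args[@]}" ...
```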
--reward_model /zhoupc/safe_alignment/models/safe_rlhf_v/rm_qwen2_5_vl
--cost_model /zhoupc/safe_alignment/models/safe_rlhf_v/cm_qwen2_5_vl #
--train_type full
--dataset /zhoupc/safe_alignment/datasets/converted_sample.jsonl
--torch_dtype bfloat16
--num_train_epochs 2
--per_device_train_batch_size 1
--per_device_eval_batch_size 1
--attn_impl flash_attn
--learning_rate 5e-7
--remove_unused_columns false #?
--warmup_ratio 0.03
--dataloader_num_workers 0
--deepspeed zero3_offload
--dataset_num_proc 8
--freeze_vit true
--gradient_accumulation_steps 4
--eval_steps 3000
--save_steps 10000
--save_total_limit 1
--logging_steps 5
--max_length 21000
## Saving settings
--save_only_model true
--output_dir /zhoupc/safe_alignment/checkpoints/safe_rlhf_v_ppo_qwen-7b
# hard settings
nproc_per_node=8  # number of GPUs to use; adjust to your hardware
# model_name="/mnt/cfs/ssw/ljc/LLaMA-Factory/saves/qwen3-4b/full/long1.0+plannner+format1.0"  # model name
import sys
import os
sys.path.insert(0, '/mnt/cfs/ssw/ljc/ms-swift')
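The review summary flags hardcoded absolute paths like this one as a portability issue. A minimal sketch of one alternative, assuming an environment variable named `MS_SWIFT_ROOT` (a hypothetical name) with a working-directory fallback:

```python
import os
import sys

# Locate the ms-swift checkout via an environment variable instead of a
# hardcoded absolute path like /mnt/cfs/ssw/ljc/ms-swift.
# MS_SWIFT_ROOT is an assumed variable name; the fallback assumes the
# checkout lives next to the current working directory.
ms_swift_root = os.environ.get(
    "MS_SWIFT_ROOT",
    os.path.join(os.getcwd(), "ms-swift"),
)
if ms_swift_root not in sys.path:
    sys.path.insert(0, ms_swift_root)
```

This keeps the script runnable on any machine where the user exports `MS_SWIFT_ROOT`, rather than only on the original author's cluster.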
if count_tool_1 != count_tool_2:
    is_format_error = True

# TODO: ?
'pre_norm_last_hidden_state': pre_norm_embeddings.contiguous(),
'pre_projection_last_hidden_state': pre_norm_embeddings.contiguous(),
The keys 'pre_norm_last_hidden_state' and 'pre_projection_last_hidden_state' are both assigned the same value (pre_norm_embeddings.contiguous()). This seems redundant. If they are intended for different purposes, adding a comment to clarify would be helpful. Otherwise, consider removing one of the keys to avoid confusion.
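A hedged sketch of what the distinction could look like if the two keys are indeed meant to differ, with made-up stand-in modules rather than the actual patcher.py code: capture the hidden state at two different points in the forward pass instead of reusing `pre_norm_embeddings` twice.

```python
import torch

# Toy stand-ins for the model's final normalization; the shapes and module
# names are illustrative assumptions, not the real hook implementation.
hidden = torch.randn(2, 4, 8)          # stand-in for the last hidden state
final_norm = torch.nn.LayerNorm(8)     # assumed final normalization layer

pre_norm_embeddings = hidden                    # before the final norm
pre_projection_embeddings = final_norm(hidden)  # normed, before projection

# Each key now carries a genuinely different tensor.
outputs = {
    'pre_norm_last_hidden_state': pre_norm_embeddings.contiguous(),
    'pre_projection_last_hidden_state': pre_projection_embeddings.contiguous(),
}
```

If the duplication is intentional (e.g. the model has no separate projection), a one-line comment at the assignment site would make that clear to future readers.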
PR type
PR information
Write the detailed information that belongs to this PR.
Experiment results
Paste your experiment results here (if needed).