-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: sandbox fusion for multi-turn #1525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
''' | ||
|
||
expect_turn_0 = ''' | ||
Okay, so I need to find out how many students at Dala High School are not taking any of the three classes: Math, Science, or English. The total number of students is 152. Let me see... I remember this is a problem about sets and maybe using the principle of inclusion-exclusion. Let me recall how that works.\n\nFirst, the inclusion-exclusion principle for three sets says that the total number of students taking at least one of the classes is equal to the sum of the numbers in each individual class, minus the sum of the numbers in each pair of classes, plus the number in all three classes. Then, subtract that total from the overall number of students to get those not taking any of the three. \n\nLet me write that down step by step. Let M be the set of students taking Math, S for Science, and E for English. Then:\n\nTotal in at least one class = |M ∪ S ∪ E| = |M| + |S| + |E| - |M ∩ S| - |M ∩ E| - |S ∩ E| + |M ∩ S ∩ E|\n\nGiven the numbers:\n\n|M| = 100\n\n|S| = 94\n\n|E| = 57\n\n|M ∩ S| = 73\n\n|M ∩ E| = 24\n\n|S ∩ E| = 27\n\n|M ∩ S ∩ E| = 22\n\nSo plugging these into the formula:\n\nTotal = 100 + 94 + 57 - 73 - 24 - 27 + 22\n\nLet me compute that step by step using code to ensure accuracy.\n\n<tool_call>[{"arguments": {"code": "M = 100\\nS = 94\\nE = 57\\nM_S = 73\\nM_E = 24\\nS_E = 27\\nM_S_E = 22\\n\\ntotal_in_any = M + S + E - M_S - M_E - S_E + M_S_E\\nstudents_neither = 152 - total_in_any\\nprint(students_neither)", "language": "python"}, "name": "calc_code_result"}]</tool_call>\n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里对的,应该是会 format toll_call 到句尾,也有一种可行的方式是保留 message 形式,用 待测模型的 tokenizer 来 format,避免测试模型的 chat template 不同导致的 gap
''' | ||
|
||
tool_return_0 = ''' | ||
\n<interpreter>\n3\n</interpreter>\n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里正常情况下不应该再有 retool 引入的 tag,应该直接返回 3,另外 tool turn 本身在 qwen3 下会添加 <tool_response> tag ,可以用 tokenizer apply chat template 看下
verl/tools/prime_tools.py
Outdated
|
||
return result, result, {} | ||
|
||
async def execute_code(self,instance_id,code): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里 execute_code 是否可以传一些 metrics 给 execute 返回?让用户可以监控服务质量?
verl/tools/prime_tools.py
Outdated
logger.setLevel(os.getenv("VERL_LOGGING_LEVEL", "WARN")) | ||
|
||
|
||
class PrimeTool(BaseTool): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里 Prime 调和 rollout 调不能合并的本质原因是?
verl/tools/prime_tools.py
Outdated
"ground_truth": ground_truth, | ||
"reward": [], | ||
} | ||
print(f"self._instance_dict: {self._instance_dict}, prime_tools create are called") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
改成 logger
verl/tools/prime_tools.py
Outdated
# we should always expect this since we don't have correct answer | ||
if metadata["run_status"] == "Finished": | ||
actual_output = metadata["stdout"] if metadata["stdout"] is not None else "" | ||
print("actual_output from sandbox fusion: ",actual_output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
改成 logger
verl/tools/prime_tools.py
Outdated
async def calc_reward(self, instance_id: str, **kwargs) -> str: | ||
# this code only called as a cumulation reward, so we return the sandbox result | ||
# only for unit test to do any kind of verification | ||
print(f"self._instance_dict: {self._instance_dict}, prime_tools calc_reward are called") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
改成 logger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我稍后在整体逻辑搞定后统一lint一下
876bcd3
to
caa35a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这些代码已经合并到main了,你的分支merge下main,避免文件冲突
@Irvingwangjr commit有12个,而且很多标题相同,最好把commit合并成一个,force push过来。这样合并到main之后,其他人比较容易找到对应的改动 |
c6e7fd0
to
acd48e1
Compare
这个没事,最后都是 squash merge ,在 main 里面就是一个 commit。 |
0114625
to
a5245db
Compare
sandbox_url = "" | ||
|
||
|
||
def get_sandbox_fusion_data(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use message list format
{ | ||
"input": " | ||
|
||
system\nYou are a math expert. You are given a question and you need to solve it step by step. Reasoning step by step before any tool call. You should use the `calc_gsm8k_reward` tool after step by step solving the question, before generate final answer at least once and refine your answer if necessary. Put your final answer in the format of `#### <answer>`.\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{\"type\": \"function\", \"function\": {\"name\": \"code_interpreter\", \"description\": \"A tool for executing code.\", \"parameters\": {\"type\": \"object\", \"properties\": {\"code\": {\"type\": \"string\", \"description\": \"The code to execute.\", \"enum\": null}}, \"required\": [\"code\"]}, \"strict\": false}}\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>\n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we transform this into typical message list used in openai compatible api ?
@@ -0,0 +1,17 @@ | |||
tools: | |||
- class_name: "verl.tools.sandbox_fusion_tools.SandboxFusionTool" | |||
config: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better use yaml style
sandbox-fusion
as the code execution system, providing the community with a reimplementation ofretools
.