Skip to content

feat: sandbox fusion for multi-turn #1525

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Irvingwangjr
Copy link
Collaborator

@Irvingwangjr Irvingwangjr commented May 14, 2025

  • As users of veRL, we want to allow the model to call certain tools during Actor rollout, incorporating the results into the training process.
  • We aim to support tool-calling capabilities of inference engines using sandbox-fusion as the code execution system, providing the community with a reimplementation of retools.

@CLAassistant
Copy link

CLAassistant commented May 14, 2025

CLA assistant check
All committers have signed the CLA.

@Irvingwangjr Irvingwangjr changed the title feat: sandbox fusion for multi-turn [DONOTMERGE]feat: sandbox fusion for multi-turn May 14, 2025
'''

expect_turn_0 = '''
Okay, so I need to find out how many students at Dala High School are not taking any of the three classes: Math, Science, or English. The total number of students is 152. Let me see... I remember this is a problem about sets and maybe using the principle of inclusion-exclusion. Let me recall how that works.\n\nFirst, the inclusion-exclusion principle for three sets says that the total number of students taking at least one of the classes is equal to the sum of the numbers in each individual class, minus the sum of the numbers in each pair of classes, plus the number in all three classes. Then, subtract that total from the overall number of students to get those not taking any of the three. \n\nLet me write that down step by step. Let M be the set of students taking Math, S for Science, and E for English. Then:\n\nTotal in at least one class = |M ∪ S ∪ E| = |M| + |S| + |E| - |M ∩ S| - |M ∩ E| - |S ∩ E| + |M ∩ S ∩ E|\n\nGiven the numbers:\n\n|M| = 100\n\n|S| = 94\n\n|E| = 57\n\n|M ∩ S| = 73\n\n|M ∩ E| = 24\n\n|S ∩ E| = 27\n\n|M ∩ S ∩ E| = 22\n\nSo plugging these into the formula:\n\nTotal = 100 + 94 + 57 - 73 - 24 - 27 + 22\n\nLet me compute that step by step using code to ensure accuracy.\n\n<tool_call>[{"arguments": {"code": "M = 100\\nS = 94\\nE = 57\\nM_S = 73\\nM_E = 24\\nS_E = 27\\nM_S_E = 22\\n\\ntotal_in_any = M + S + E - M_S - M_E - S_E + M_S_E\\nstudents_neither = 152 - total_in_any\\nprint(students_neither)", "language": "python"}, "name": "calc_code_result"}]</tool_call>\n
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里对的,应该是会 format toll_call 到句尾,也有一种可行的方式是保留 message 形式,用 待测模型的 tokenizer 来 format,避免测试模型的 chat template 不同导致的 gap

'''

tool_return_0 = '''
\n<interpreter>\n3\n</interpreter>\n
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里正常情况下不应该再有 retool 引入的 tag,应该直接返回 3,另外 tool turn 本身在 qwen3 下会添加 <tool_response> tag ,可以用 tokenizer apply chat template 看下


return result, result, {}

async def execute_code(self,instance_id,code):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里 execute_code 是否可以传一些 metrics 给 execute 返回?让用户可以监控服务质量?

logger.setLevel(os.getenv("VERL_LOGGING_LEVEL", "WARN"))


class PrimeTool(BaseTool):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里 Prime 调和 rollout 调不能合并的本质原因是?

"ground_truth": ground_truth,
"reward": [],
}
print(f"self._instance_dict: {self._instance_dict}, prime_tools create are called")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

改成 logger

# we should always expect this since we don't have correct answer
if metadata["run_status"] == "Finished":
actual_output = metadata["stdout"] if metadata["stdout"] is not None else ""
print("actual_output from sandbox fusion: ",actual_output)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

改成 logger

async def calc_reward(self, instance_id: str, **kwargs) -> str:
# this code only called as a cumulation reward, so we return the sandbox result
# only for unit test to do any kind of verification
print(f"self._instance_dict: {self._instance_dict}, prime_tools calc_reward are called")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

改成 logger

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我稍后在整体逻辑搞定后统一lint一下

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这些代码已经合并到main了,你的分支merge下main,避免文件冲突

@Irvingwangjr Irvingwangjr changed the title [DONOTMERGE]feat: sandbox fusion for multi-turn feat: sandbox fusion for multi-turn May 23, 2025
@chenhaiq
Copy link
Collaborator

@Irvingwangjr commit有12个,而且很多标题相同,最好把commit合并成一个,force push过来。这样合并到main之后,其他人比较容易找到对应的改动

@Irvingwangjr Irvingwangjr force-pushed the feat/fc-sandbox branch 2 times, most recently from c6e7fd0 to acd48e1 Compare May 26, 2025 02:36
@feifeibear
Copy link
Collaborator

@Irvingwangjr commit有12个,而且很多标题相同,最好把commit合并成一个,force push过来。这样合并到main之后,其他人比较容易找到对应的改动

这个没事,最后都是 squash merge ,在 main 里面就是一个 commit。

sandbox_url = ""


def get_sandbox_fusion_data():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use message list format

{
"input": "

system\nYou are a math expert. You are given a question and you need to solve it step by step. Reasoning step by step before any tool call. You should use the `calc_gsm8k_reward` tool after step by step solving the question, before generate final answer at least once and refine your answer if necessary. Put your final answer in the format of `#### <answer>`.\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{\"type\": \"function\", \"function\": {\"name\": \"code_interpreter\", \"description\": \"A tool for executing code.\", \"parameters\": {\"type\": \"object\", \"properties\": {\"code\": {\"type\": \"string\", \"description\": \"The code to execute.\", \"enum\": null}}, \"required\": [\"code\"]}, \"strict\": false}}\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>\n
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we transform this into typical message list used in openai compatible api ?

@@ -0,0 +1,17 @@
tools:
- class_name: "verl.tools.sandbox_fusion_tools.SandboxFusionTool"
config: {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better use yaml style

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants