feat: sandbox fusion for multi-turn #1525

Irvingwangjr · 2025-05-14T15:45:16Z

As users of veRL, we want to allow the model to call certain tools during Actor rollout, incorporating the results into the training process.
We aim to support tool-calling capabilities of inference engines using sandbox-fusion as the code execution system, providing the community with a reimplementation of retools.

CLAassistant · 2025-05-14T15:45:24Z

All committers have signed the CLA.

SwordFaith · 2025-05-14T15:58:18Z

tests/workers/rollout/test_sglang_async_rollout_prime_tools.py

+        '''
+
+        expect_turn_0 = '''
+        Okay, so I need to find out how many students at Dala High School are not taking any of the three classes: Math, Science, or English. The total number of students is 152. Let me see... I remember this is a problem about sets and maybe using the principle of inclusion-exclusion. Let me recall how that works.\n\nFirst, the inclusion-exclusion principle for three sets says that the total number of students taking at least one of the classes is equal to the sum of the numbers in each individual class, minus the sum of the numbers in each pair of classes, plus the number in all three classes. Then, subtract that total from the overall number of students to get those not taking any of the three. \n\nLet me write that down step by step. Let M be the set of students taking Math, S for Science, and E for English. Then:\n\nTotal in at least one class = |M ∪ S ∪ E| = |M| + |S| + |E| - |M ∩ S| - |M ∩ E| - |S ∩ E| + |M ∩ S ∩ E|\n\nGiven the numbers:\n\n|M| = 100\n\n|S| = 94\n\n|E| = 57\n\n|M ∩ S| = 73\n\n|M ∩ E| = 24\n\n|S ∩ E| = 27\n\n|M ∩ S ∩ E| = 22\n\nSo plugging these into the formula:\n\nTotal = 100 + 94 + 57 - 73 - 24 - 27 + 22\n\nLet me compute that step by step using code to ensure accuracy.\n\n<tool_call>[{"arguments": {"code": "M = 100\\nS = 94\\nE = 57\\nM_S = 73\\nM_E = 24\\nS_E = 27\\nM_S_E = 22\\n\\ntotal_in_any = M + S + E - M_S - M_E - S_E + M_S_E\\nstudents_neither = 152 - total_in_any\\nprint(students_neither)", "language": "python"}, "name": "calc_code_result"}]</tool_call>\n


这里对的，应该是会 format toll_call 到句尾，也有一种可行的方式是保留 message 形式，用待测模型的 tokenizer 来 format，避免测试模型的 chat template 不同导致的 gap

SwordFaith · 2025-05-14T15:59:28Z

tests/workers/rollout/test_sglang_async_rollout_prime_tools.py

+        '''
+
+        tool_return_0 = '''
+        \n<interpreter>\n3\n</interpreter>\n


这里正常情况下不应该再有 retool 引入的 tag，应该直接返回 3，另外 tool turn 本身在 qwen3 下会添加 <tool_response> tag ，可以用 tokenizer apply chat template 看下

SwordFaith · 2025-05-14T16:01:10Z

verl/tools/prime_tools.py

+
+        return result, result, {}
+
+    async def execute_code(self,instance_id,code):


这里 execute_code 是否可以传一些 metrics 给 execute 返回？让用户可以监控服务质量？

SwordFaith · 2025-05-14T16:02:11Z

verl/tools/prime_tools.py

+logger.setLevel(os.getenv("VERL_LOGGING_LEVEL", "WARN"))
+
+
+class PrimeTool(BaseTool):


这里 Prime 调和 rollout 调不能合并的本质原因是？

verl/workers/rollout/schemas.py

feifeibear · 2025-05-15T03:51:27Z

verl/tools/prime_tools.py

+            "ground_truth": ground_truth,
+            "reward": [],
+        }
+        print(f"self._instance_dict: {self._instance_dict}, prime_tools create are called")


改成 logger

feifeibear · 2025-05-15T03:51:30Z

verl/tools/prime_tools.py

+        # we should always expect this since we don't have correct answer
+        if metadata["run_status"] == "Finished":
+            actual_output = metadata["stdout"] if metadata["stdout"] is not None else ""
+            print("actual_output from sandbox fusion: ",actual_output)


改成 logger

feifeibear · 2025-05-15T03:51:34Z

verl/tools/prime_tools.py

+    async def calc_reward(self, instance_id: str, **kwargs) -> str:
+        # this code only called as a cumulation reward, so we return the sandbox result
+        # only for unit test to do any kind of verification
+        print(f"self._instance_dict: {self._instance_dict}, prime_tools calc_reward are called")


改成 logger

我稍后在整体逻辑搞定后统一lint一下

verl/tools/prime_tools.py

verl/tools/sandbox_fusion_tools.py

chenhaiq · 2025-05-16T10:56:28Z

verl/utils/reward_score/sandbox_fusion/utils.py

这些代码已经合并到main了，你的分支merge下main，避免文件冲突

verl/tools/sandbox_fusion_tools.py

.github/workflows/sgl.yml

chenhaiq · 2025-05-26T02:07:35Z

@Irvingwangjr commit有12个，而且很多标题相同，最好把commit合并成一个，force push过来。这样合并到main之后，其他人比较容易找到对应的改动

feifeibear · 2025-05-26T02:43:46Z

@Irvingwangjr commit有12个，而且很多标题相同，最好把commit合并成一个，force push过来。这样合并到main之后，其他人比较容易找到对应的改动

这个没事，最后都是 squash merge ，在 main 里面就是一个 commit。

verl/tools/sandbox_fusion_tools.py

SwordFaith · 2025-05-27T05:42:45Z

tests/workers/rollout/test_sglang_async_rollout_sf_tools.py

+sandbox_url = ""
+
+
+def get_sandbox_fusion_data():


Better use message list format

SwordFaith · 2025-05-27T09:20:08Z

docs/sglang_multiturn/sandbox_fusion.rst

+{
+  "input": "
+
+  system\nYou are a math expert. You are given a question and you need to solve it step by step. Reasoning step by step before any tool call. You should use the `calc_gsm8k_reward` tool after step by step solving the question, before generate final answer at least once and refine your answer if necessary. Put your final answer in the format of `#### <answer>`.\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{\"type\": \"function\", \"function\": {\"name\": \"code_interpreter\", \"description\": \"A tool for executing code.\", \"parameters\": {\"type\": \"object\", \"properties\": {\"code\": {\"type\": \"string\", \"description\": \"The code to execute.\", \"enum\": null}}, \"required\": [\"code\"]}, \"strict\": false}}\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>\n


Can we transform this into typical message list used in openai compatible api ?

SwordFaith · 2025-05-27T09:22:14Z

examples/sglang_multiturn/config/tool_config/sandbox_fusion_tool_config.yaml

@@ -0,0 +1,17 @@
+tools:
+  - class_name: "verl.tools.sandbox_fusion_tools.SandboxFusionTool"
+    config: {


better use yaml style

tests/e2e/run_gsm8k_fsdp_sgl_multiturn_sf_tool.sh

tests/workers/rollout/test_sglang_async_rollout_sf_tools.py

Irvingwangjr changed the title ~~feat: sandbox fusion for multi-turn~~ [DONOTMERGE]feat: sandbox fusion for multi-turn May 14, 2025

SwordFaith reviewed May 14, 2025

View reviewed changes

feifeibear reviewed May 15, 2025

View reviewed changes

feifeibear reviewed May 16, 2025

View reviewed changes

verl/tools/prime_tools.py Outdated Show resolved Hide resolved

SwordFaith mentioned this pull request May 17, 2025

Multi-turn rollout Update #2 zhaochenyang20/Awesome-ML-SYS-Tutorial#132

Open

44 tasks

chenhaiq mentioned this pull request May 19, 2025

Multi-turn rollout Status & Roadmap #1579

Open

Irvingwangjr force-pushed the feat/fc-sandbox branch from 876bcd3 to caa35a7 Compare May 20, 2025 12:01

chenhaiq reviewed May 20, 2025

View reviewed changes

Irvingwangjr changed the title ~~[DONOTMERGE]feat: sandbox fusion for multi-turn~~ feat: sandbox fusion for multi-turn May 23, 2025

feifeibear reviewed May 26, 2025

View reviewed changes

.github/workflows/sgl.yml Outdated Show resolved Hide resolved

Irvingwangjr force-pushed the feat/fc-sandbox branch 2 times, most recently from c6e7fd0 to acd48e1 Compare May 26, 2025 02:36

feat: add sandbox fusion for multi-turns

a5245db

Irvingwangjr force-pushed the feat/fc-sandbox branch from 0114625 to a5245db Compare May 26, 2025 07:00

feat: fix test

8b59101

SwordFaith reviewed May 27, 2025

View reviewed changes


		return result, result, {}

		async def execute_code(self,instance_id,code):

		logger.setLevel(os.getenv("VERL_LOGGING_LEVEL", "WARN"))


		class PrimeTool(BaseTool):

feat: sandbox fusion for multi-turn #1525

Are you sure you want to change the base?

feat: sandbox fusion for multi-turn #1525

Conversation

Irvingwangjr commented May 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CLAassistant commented May 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chenhaiq commented May 26, 2025

Uh oh!

feifeibear commented May 26, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Irvingwangjr commented May 14, 2025 •

edited

Loading

CLAassistant commented May 14, 2025 •

edited

Loading