Commit 876bcd3

feat: update log

Parent: 82681fc

File tree: 3 files changed (+40, -41 lines)

tests/workers/rollout/resource/tool_configs/sandbox_fusion_tool_config
Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@ tools:
     tool_schema:
       type: "function"
       function:
-        name: "calc_code_result"
+        name: "code_interpreter"
         description: "A tool for executing code."
         parameters:
           type: "object"
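
For reference, the renamed schema above follows the OpenAI function-tool shape that verl validates with `OpenAIFunctionToolSchema.model_validate` (see the `sandbox_fusion_tools.py` hunk in this commit). A minimal, dependency-free sketch of that shape; the 1-64 character `[A-Za-z0-9_-]` name rule is the usual OpenAI convention, not something this config documents:

```python
import re

# The renamed tool schema from the config hunk above, as a plain dict.
tool_schema = {
    "type": "function",
    "function": {
        "name": "code_interpreter",  # renamed from "calc_code_result"
        "description": "A tool for executing code.",
        "parameters": {"type": "object"},
    },
}

# Structural checks only; the name pattern is the common OpenAI
# convention for function-tool names, hedged as an assumption here.
assert tool_schema["type"] == "function"
fn = tool_schema["function"]
assert re.fullmatch(r"[A-Za-z0-9_-]{1,64}", fn["name"])
print(fn["name"])  # code_interpreter
```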

tests/workers/rollout/test_sglang_async_rollout_prime_tools.py
Lines changed: 38 additions & 38 deletions
@@ -32,21 +32,21 @@

 sandbox_url = ""

-def get_prime_data():
+def get_sandbox_fusion_data():
     prompt = '''
 Solve the following problem step by step. You now have the ability to selectively write executable Python code to enhance your reasoning process. \n\n**user question:**\nThere are 152 students at Dala High School. Assume the following: \n- 100 students take a Math class \n- 94 students take a Science class \n- 57 students take an English class \n- 73 students take a Math class and a Science class \n- 24 students take a Math class and an English class \n- 27 students take a Science class and an English class \n- 22 students take a Math class and a Science class and an English class\n \nHow many students take neither a Math class nor a Science class nor an Eglish class?\n\nRemember to place the final answer in the last part using the format: \n<answer>\n\boxed{'The final answer goes here.'}\n</answer>
 '''

     expect_turn_0 = '''
-Okay, so I need to find out how many students at Dala High School are not taking any of the three classes: Math, Science, or English. The total number of students is 152. Let me see... I remember this is a problem about sets and maybe using the principle of inclusion-exclusion. Let me recall how that works.\n\nFirst, the inclusion-exclusion principle for three sets says that the total number of students taking at least one of the classes is equal to the sum of the numbers in each individual class, minus the sum of the numbers in each pair of classes, plus the number in all three classes. Then, subtract that total from the overall number of students to get those not taking any of the three. \n\nLet me write that down step by step. Let M be the set of students taking Math, S for Science, and E for English. Then:\n\nTotal in at least one class = |M ∪ S ∪ E| = |M| + |S| + |E| - |M ∩ S| - |M ∩ E| - |S ∩ E| + |M ∩ S ∩ E|\n\nGiven the numbers:\n\n|M| = 100\n\n|S| = 94\n\n|E| = 57\n\n|M ∩ S| = 73\n\n|M ∩ E| = 24\n\n|S ∩ E| = 27\n\n|M ∩ S ∩ E| = 22\n\nSo plugging these into the formula:\n\nTotal = 100 + 94 + 57 - 73 - 24 - 27 + 22\n\nLet me compute that step by step using code to ensure accuracy.\n\n<tool_call>[{"arguments": {"code": "M = 100\\nS = 94\\nE = 57\\nM_S = 73\\nM_E = 24\\nS_E = 27\\nM_S_E = 22\\n\\ntotal_in_any = M + S + E - M_S - M_E - S_E + M_S_E\\nstudents_neither = 152 - total_in_any\\nprint(students_neither)", "language": "python"}, "name": "calc_code_result"}]</tool_call>\n
+Okay, so I need to find out how many students at Dala High School are not taking any of the three classes: Math, Science, or English. The total number of students is 152. Let me see... I remember this is a problem about sets and maybe using the principle of inclusion-exclusion. Let me recall how that works.\n\nFirst, the inclusion-exclusion principle for three sets says that the total number of students taking at least one of the classes is equal to the sum of the numbers in each individual class, minus the sum of the numbers in each pair of classes, plus the number in all three classes. Then, subtract that total from the overall number of students to get those not taking any of the three. \n\nLet me write that down step by step. Let M be the set of students taking Math, S for Science, and E for English. Then:\n\nTotal in at least one class = |M ∪ S ∪ E| = |M| + |S| + |E| - |M ∩ S| - |M ∩ E| - |S ∩ E| + |M ∩ S ∩ E|\n\nGiven the numbers:\n\n|M| = 100\n\n|S| = 94\n\n|E| = 57\n\n|M ∩ S| = 73\n\n|M ∩ E| = 24\n\n|S ∩ E| = 27\n\n|M ∩ S ∩ E| = 22\n\nSo plugging these into the formula:\n\nTotal = 100 + 94 + 57 - 73 - 24 - 27 + 22\n\nLet me compute that step by step using code to ensure accuracy.\n\n<tool_call>[{"arguments": {"code": "M = 100\\nS = 94\\nE = 57\\nM_S = 73\\nM_E = 24\\nS_E = 27\\nM_S_E = 22\\n\\ntotal_in_any = M + S + E - M_S - M_E - S_E + M_S_E\\nstudents_neither = 152 - total_in_any\\nprint(students_neither)", "language": "python"}, "name": "code_interpreter"}]</tool_call>\n
 '''

     tool_return_0 = '''
 \n<interpreter>\n3\n</interpreter>\n
 '''

     expect_turn_1 = '''
-\nThe code returns 3, indicating that 3 students take none of the classes. To verify, I'll break down each category to ensure the total matches:\n\n- Only Math and Science: 73 - 22 = 51\n- Only Math and English: 24 - 22 = 2\n- Only Science and English: 27 - 22 = 5\n- Only Math: 100 - 51 - 2 - 22 = 25\n- Only Science: 94 - 51 - 5 - 22 = 16\n- Only English: 57 - 2 - 5 - 22 = 28\n\nSumming all categories:\n\n<tool_call>[{"arguments": {"code": "only_M_S = 73 - 22\\nonly_M_E = 24 - 22\\nonly_S_E = 27 - 22\\n\\nonly_M = 100 - only_M_S - only_M_E - 22\\nonly_S = 94 - only_M_S - only_S_E - 22\\nonly_E = 57 - only_M_E - only_S_E - 22\\n\\ntotal_verify = only_M + only_S + only_E + only_M_S + only_M_E + only_S_E + 22\\nprint(total_verify)", "language": "python"}, "name": "calc_code_result"}]</tool_call>\n
+\nThe code returns 3, indicating that 3 students take none of the classes. To verify, I'll break down each category to ensure the total matches:\n\n- Only Math and Science: 73 - 22 = 51\n- Only Math and English: 24 - 22 = 2\n- Only Science and English: 27 - 22 = 5\n- Only Math: 100 - 51 - 2 - 22 = 25\n- Only Science: 94 - 51 - 5 - 22 = 16\n- Only English: 57 - 2 - 5 - 22 = 28\n\nSumming all categories:\n\n<tool_call>[{"arguments": {"code": "only_M_S = 73 - 22\\nonly_M_E = 24 - 22\\nonly_S_E = 27 - 22\\n\\nonly_M = 100 - only_M_S - only_M_E - 22\\nonly_S = 94 - only_M_S - only_S_E - 22\\nonly_E = 57 - only_M_E - only_S_E - 22\\n\\ntotal_verify = only_M + only_S + only_E + only_M_S + only_M_E + only_S_E + 22\\nprint(total_verify)", "language": "python"}, "name": "code_interpreter"}]</tool_call>\n
 '''

     tool_return_1 = '''
@@ -215,23 +215,23 @@ def multiple_code_data(self):
         ])

     @pytest.fixture
-    def prime_data(self):
-        return get_prime_data()
+    def sandbox_fusion_data(self):
+        return get_sandbox_fusion_data()

     @pytest.fixture
-    def prime_rollout_config(self):
+    def sandbox_fusion_rollout_config(self):
         max_prompt_length = 1024
         max_prompt_length = 1024
         max_response_length = 1024
         dtype = "bfloat16"
         tensor_parallel_size = 1
-        tool_path = "./resource/tool_configs/prime_tool_config"
+        tool_path = "./resource/tool_configs/sandbox_fusion_tool_config"
         rollout_config = get_rollout_config(max_response_length, max_prompt_length, dtype, tensor_parallel_size,tool_path)
         return rollout_config

     @pytest.fixture
-    def prime_data_proto(self,prime_data,qwen_tokenizer):
-        preencode_prompts,_,_ = prime_data
+    def sandbox_data_proto(self,sandbox_fusion_data,qwen_tokenizer):
+        preencode_prompts,_,_ = sandbox_fusion_data
         prompts = [qwen_tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True) for message in preencode_prompts]
         input_ids, attention_mask, position_ids = prepare_inputs(qwen_tokenizer, prompts, 1000)
         prompt_dict = TensorDict(
@@ -244,7 +244,7 @@ def prime_data_proto(self,prime_data,qwen_tokenizer):
         )
         messages = np.asarray(preencode_prompts)
         tools_kwargs = np.array([{
-            "calc_code_result": {
+            "code_interpreter": {
                 "create_kwargs": {"ground_truth": "test-solution-str"},
             },
         }],dtype=object)
@@ -256,28 +256,28 @@ def prime_data_proto(self,prime_data,qwen_tokenizer):
     @patch.object(AsyncSGLangRollout, '_init_distributed_env', return_value=None)
     @patch.object(AsyncSGLangRollout, '_init_inference_engine', return_value=None)
     @patch.object(AsyncSGLangRollout, '_init_sampling_params', return_value=None)
-    def test_tools_registration(self, mock_env,mock_engine,mock_sampling,prime_rollout_config,qwen_tokenizer,qwen_model_config):
-        rollout = AsyncSGLangRollout(actor_module="", config=prime_rollout_config, tokenizer=qwen_tokenizer, model_hf_config=qwen_model_config)
+    def test_tools_registration(self, mock_env,mock_engine,mock_sampling,sandbox_fusion_rollout_config,qwen_tokenizer,qwen_model_config):
+        rollout = AsyncSGLangRollout(actor_module="", config=sandbox_fusion_rollout_config, tokenizer=qwen_tokenizer, model_hf_config=qwen_model_config)
         assert len(rollout._tool_schemas) == 1
-        assert "calc_code_result" in rollout._tool_map.keys()
+        assert "code_interpreter" in rollout._tool_map.keys()
         from verl.tools.sandbox_fusion_tools import SandboxFusionTool
-        assert isinstance(rollout._tool_map["calc_code_result"],SandboxFusionTool)
+        assert isinstance(rollout._tool_map["code_interpreter"],SandboxFusionTool)
         assert rollout._tool_call_parser_type == "qwen25"


     @patch.object(AsyncSGLangRollout, '_init_distributed_env', return_value=None)
     @patch.object(AsyncSGLangRollout, '_init_inference_engine', return_value=None)
     @patch.object(AsyncSGLangRollout, '_init_sampling_params', return_value=None)
-    def test_rollout_req_creation(self,mock_env,mock_engine,mock_sampling,prime_rollout_config,qwen_tokenizer,qwen_model_config,prime_data_proto):
-        rollout = AsyncSGLangRollout(actor_module="", config=prime_rollout_config, tokenizer=qwen_tokenizer, model_hf_config=qwen_model_config)
-        req_list = rollout._preprocess_prompt_to_async_rollout_requests(prime_data_proto,n=1)
+    def test_rollout_req_creation(self,mock_env,mock_engine,mock_sampling,sandbox_fusion_rollout_config,qwen_tokenizer,qwen_model_config,sandbox_data_proto):
+        rollout = AsyncSGLangRollout(actor_module="", config=sandbox_fusion_rollout_config, tokenizer=qwen_tokenizer, model_hf_config=qwen_model_config)
+        req_list = rollout._preprocess_prompt_to_async_rollout_requests(sandbox_data_proto,n=1)
         assert len(req_list) == 1
         assert req_list[0].state == AsyncRolloutRequestStateEnum.PENDING
         assert req_list[0].tools == [
             OpenAIFunctionToolSchema(
                 type="function",
                 function=OpenAIFunctionSchema(
-                    name="calc_code_result",
+                    name="code_interpreter",
                     description="A tool for executing code.",
                     parameters=OpenAIFunctionParametersSchema(
                         type="object",
@@ -299,14 +299,14 @@ def test_rollout_req_creation(self,mock_env,mock_engine,mock_sampling,prime_roll
     @patch.object(AsyncSGLangRollout, '_init_distributed_env', return_value=None)
     @patch.object(AsyncSGLangRollout, '_init_inference_engine', return_value=None)
     @patch.object(AsyncSGLangRollout, '_init_sampling_params', return_value=None)
-    def test_over_size_case(self,mock_env,mock_engine,mock_sampling,prime_rollout_config,qwen_tokenizer,qwen_model_config,prime_data_proto,prime_data):
-        prime_rollout_config.multi_turn.max_turns = 1
-        rollout = AsyncSGLangRollout(actor_module="", config=prime_rollout_config, tokenizer=qwen_tokenizer, model_hf_config=qwen_model_config)
-        req = rollout._preprocess_prompt_to_async_rollout_requests(prime_data_proto,n=1)[0]
+    def test_over_size_case(self,mock_env,mock_engine,mock_sampling,sandbox_fusion_rollout_config,qwen_tokenizer,qwen_model_config,sandbox_data_proto,sandbox_fusion_data):
+        sandbox_fusion_rollout_config.multi_turn.max_turns = 1
+        rollout = AsyncSGLangRollout(actor_module="", config=sandbox_fusion_rollout_config, tokenizer=qwen_tokenizer, model_hf_config=qwen_model_config)
+        req = rollout._preprocess_prompt_to_async_rollout_requests(sandbox_data_proto,n=1)[0]
         req = MagicMock(wraps=req, spec=AsyncRolloutRequest)
         req.finalize = MagicMock()
         req_list = [req]
-        _,expect_turn_array,tool_return_array = prime_data
+        _,expect_turn_array,tool_return_array = sandbox_fusion_data
         # here we mock a meta info with 'length'. indicate the response is truncate
         rollout._handle_engine_call=MagicMock()
         future = asyncio.Future()
@@ -323,7 +323,7 @@ def test_over_size_case(self,mock_env,mock_engine,mock_sampling,prime_rollout_co
         assert len(output_req_list) == 1
         output_req = output_req_list[0]
         assert output_req.state == AsyncRolloutRequestStateEnum.COMPLETED
-        assert output_req.reward_scores == {'calc_code_result': []}
+        assert output_req.reward_scores == {'code_interpreter': []}
         # we should only have two message, one for prompt, second for response.
         assert len(output_req.messages) == 2
         assert output_req.messages[1] == Message(
@@ -337,15 +337,15 @@ def test_over_size_case(self,mock_env,mock_engine,mock_sampling,prime_rollout_co
     @patch.object(AsyncSGLangRollout, '_init_distributed_env', return_value=None)
     @patch.object(AsyncSGLangRollout, '_init_inference_engine', return_value=None)
     @patch.object(AsyncSGLangRollout, '_init_sampling_params', return_value=None)
-    def test_tool_call_basic_case(self,mock_env,mock_engine,mock_sampling,prime_rollout_config,qwen_tokenizer,qwen_model_config,prime_data_proto,prime_data):
-        prime_rollout_config.multi_turn.max_turns = 10
-        rollout = AsyncSGLangRollout(actor_module="", config=prime_rollout_config, tokenizer=qwen_tokenizer, model_hf_config=qwen_model_config)
-        self._tool_map["calc_code_result"].sandbox_fusion_url = sandbox_url
-        req = rollout._preprocess_prompt_to_async_rollout_requests(prime_data_proto,n=1)[0]
+    def test_tool_call_basic_case(self,mock_env,mock_engine,mock_sampling,sandbox_fusion_rollout_config,qwen_tokenizer,qwen_model_config,sandbox_data_proto,sandbox_fusion_data):
+        sandbox_fusion_rollout_config.multi_turn.max_turns = 10
+        rollout = AsyncSGLangRollout(actor_module="", config=sandbox_fusion_rollout_config, tokenizer=qwen_tokenizer, model_hf_config=qwen_model_config)
+        self._tool_map["code_interpreter"].sandbox_fusion_url = sandbox_url
+        req = rollout._preprocess_prompt_to_async_rollout_requests(sandbox_data_proto,n=1)[0]
         req = MagicMock(wraps=req, spec=AsyncRolloutRequest)
         req.finalize = MagicMock()
         req_list = [req]
-        _,expect_turn_array,tool_return_array = prime_data
+        _,expect_turn_array,tool_return_array = sandbox_fusion_data
         # here we mock a meta info with 'length'. indicate the response is truncate
         rollout._handle_engine_call=MagicMock()
         futures = [asyncio.Future() for i in expect_turn_array]
@@ -368,7 +368,7 @@ def test_tool_call_basic_case(self,mock_env,mock_engine,mock_sampling,prime_roll
         output_req = output_req_list[0]
         assert output_req.state == AsyncRolloutRequestStateEnum.COMPLETED
         # here we verify whether the code sandbox is executed correctly
-        assert output_req.reward_scores == {'calc_code_result': ['3', '149']}
+        assert output_req.reward_scores == {'code_interpreter': ['3', '149']}
         assert rollout._handle_engine_call.call_count == 3
         assert len(output_req.messages) == 6 # user + 3*assistant + 2*tool_call
         code_counter =0
@@ -383,17 +383,17 @@ def test_tool_call_basic_case(self,mock_env,mock_engine,mock_sampling,prime_roll
     @patch.object(AsyncSGLangRollout, '_init_distributed_env', return_value=None)
     @patch.object(AsyncSGLangRollout, '_init_inference_engine', return_value=None)
     @patch.object(AsyncSGLangRollout, '_init_sampling_params', return_value=None)
-    def test_tool_call_batch_case(self,mock_env,mock_engine,mock_sampling,prime_rollout_config,qwen_tokenizer,qwen_model_config,prime_data_proto,prime_data):
-        prime_rollout_config.multi_turn.max_turns = 10
-        rollout = AsyncSGLangRollout(actor_module="", config=prime_rollout_config, tokenizer=qwen_tokenizer, model_hf_config=qwen_model_config)
-        self._tool_map["calc_code_result"].sandbox_fusion_url = sandbox_url
-        req = rollout._preprocess_prompt_to_async_rollout_requests(prime_data_proto,n=1)[0]
+    def test_tool_call_batch_case(self,mock_env,mock_engine,mock_sampling,sandbox_fusion_rollout_config,qwen_tokenizer,qwen_model_config,sandbox_data_proto,sandbox_fusion_data):
+        sandbox_fusion_rollout_config.multi_turn.max_turns = 10
+        rollout = AsyncSGLangRollout(actor_module="", config=sandbox_fusion_rollout_config, tokenizer=qwen_tokenizer, model_hf_config=qwen_model_config)
+        self._tool_map["code_interpreter"].sandbox_fusion_url = sandbox_url
+        req = rollout._preprocess_prompt_to_async_rollout_requests(sandbox_data_proto,n=1)[0]
         req_nums = 100
         req_list = []
         req_turns_counter = {}
         # this map should a Map[id:List[Futures]]
         req_turns_map ={}
-        _,expect_turn_array,tool_return_array = prime_data
+        _,expect_turn_array,tool_return_array = sandbox_fusion_data
         for i in range(req_nums):
             _temp_req = deepcopy(req)
             _temp_req.batch_data_id = i
@@ -429,7 +429,7 @@ async def hacked_handle_engine_call(self, _req: AsyncRolloutRequest,do_sample: b
         for output_req in output_req_list:
             assert output_req.state == AsyncRolloutRequestStateEnum.COMPLETED
             # here we verify whether the code sandbox is executed correctly
-            assert output_req.reward_scores == {'calc_code_result': ['3', '149']}
+            assert output_req.reward_scores == {'code_interpreter': ['3', '149']}
             assert len(output_req.messages) == 6 # user + 3*assistant + 2*tool_call
             code_counter =0
             for msg in output_req.messages:
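
The expected `reward_scores` of `['3', '149']` asserted in these tests come from the two code snippets embedded in the expected assistant turns. Re-running that arithmetic standalone (variable names copied from the test's code strings) confirms both values:

```python
# Figures from the prompt in get_sandbox_fusion_data().
M, S, E = 100, 94, 57
M_S, M_E, S_E, M_S_E = 73, 24, 27, 22

# Turn 0: inclusion-exclusion gives students taking at least one class.
total_in_any = M + S + E - M_S - M_E - S_E + M_S_E
students_neither = 152 - total_in_any

# Turn 1: the per-category breakdown should sum back to total_in_any.
only_M_S, only_M_E, only_S_E = M_S - M_S_E, M_E - M_S_E, S_E - M_S_E
only_M = M - only_M_S - only_M_E - M_S_E
only_S = S - only_M_S - only_S_E - M_S_E
only_E = E - only_M_E - only_S_E - M_S_E
total_verify = only_M + only_S + only_E + only_M_S + only_M_E + only_S_E + M_S_E

print(students_neither, total_verify)  # 3 149
```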

verl/tools/sandbox_fusion_tools.py
Lines changed: 1 addition & 2 deletions
@@ -109,7 +109,7 @@ def __init__(self, config: dict, tool_schema: OpenAIFunctionToolSchema):
         _tool_schema = OpenAIFunctionToolSchema.model_validate({
             "type": "function",
             "function": {
-                "name": "calc_code_result",
+                "name": "code_interpreter",
                 "description": "A tool for execute code",
                 "parameters": {
                     "type": "object",
@@ -173,7 +173,6 @@ def execute_code(self,instance_id,code):
             language: str
         )
         '''
-        # TODO make this into asyncio format:
         result_status, metadata = _process_single_case(0, None, None,self.sandbox_fusion_url, code, 30, "python")
         # we should always expect this since we don't have correct answer
         if metadata["run_status"] == "Finished":
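
The through-line of this commit is a tool rename that must stay consistent in three places: the YAML config, the tool class's default schema, and every test lookup. A toy sketch of why a stale name fails loudly; `FakeSandboxFusionTool` and the plain-dict `tool_map` are stand-ins mirroring `rollout._tool_map` in the tests, not verl's actual classes:

```python
# Stand-in for verl's tool map: schema function name -> tool instance.
class FakeSandboxFusionTool:
    name = "code_interpreter"  # was "calc_code_result" before this commit
    description = "A tool for executing code."

tool_map = {FakeSandboxFusionTool.name: FakeSandboxFusionTool()}

# Lookups by the new name succeed; lookups by the old name raise KeyError,
# which is why the config, the tool class, and the tests change together.
assert "code_interpreter" in tool_map
assert "calc_code_result" not in tool_map
print(sorted(tool_map))  # ['code_interpreter']
```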
