
Commit 0bef662

Update cpu affinity test; add TLLM_NUMA_AWARE_WORKER_AFFINITY as an optional env variable that can be set by the user

1 parent d5df4ce commit 0bef662

File tree

2 files changed: +11 −5 lines changed

.github/workflows/e2e_ppo_grpo_trainer_trtllm_cpu_affinity_test.yml

Lines changed: 1 addition & 0 deletions

@@ -87,6 +87,7 @@ jobs:
           fetch-depth: 0
       - name: Install the current repository
         run: |
+          pip3 install ray==2.41.0
           pip3 install -r requirements-test.txt
           pip3 install --no-deps -e .
       - name: Prepare GSM8K dataset

verl/workers/rollout/trtllm_rollout/trtllm_async_server.py

Lines changed: 10 additions & 5 deletions

@@ -328,11 +328,16 @@ async def launch_servers(self):
             if not self.is_reward_model
             else f"trtllm_server_reward_{self.replica_rank}"
         )
-
-        runtime_env_vars = {
-            "RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES": "1",
-            # "TLLM_NUMA_AWARE_WORKER_AFFINITY": "0"
-        }
+        tllm_numa_aware_worker_affinity = os.getenv("TLLM_NUMA_AWARE_WORKER_AFFINITY")
+        if tllm_numa_aware_worker_affinity == "0":
+            runtime_env_vars = {
+                "RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES": "1",
+                "TLLM_NUMA_AWARE_WORKER_AFFINITY": "0",
+            }
+        else:
+            runtime_env_vars = {
+                "RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES": "1",
+            }
         server = TRTLLMHttpServer.options(
             scheduling_strategy=ray.util.scheduling_strategies.NodeAffinitySchedulingStrategy(
                 node_id=node_id,
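For reference, the conditional above can be sketched as a small standalone helper; the function name `build_runtime_env_vars` is hypothetical and not part of the diff, but the logic mirrors what the change does:

```python
import os


def build_runtime_env_vars() -> dict:
    # Always keep Ray from rewriting CUDA_VISIBLE_DEVICES for TRT-LLM workers.
    env_vars = {"RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES": "1"}
    # Forward TLLM_NUMA_AWARE_WORKER_AFFINITY only when the user explicitly
    # disables it (value "0"); otherwise the TRT-LLM default applies.
    if os.getenv("TLLM_NUMA_AWARE_WORKER_AFFINITY") == "0":
        env_vars["TLLM_NUMA_AWARE_WORKER_AFFINITY"] = "0"
    return env_vars
```

With this change, a user can opt out of NUMA-aware worker affinity by exporting `TLLM_NUMA_AWARE_WORKER_AFFINITY=0` before launching the server; leaving it unset (or setting any other value) falls through to the default runtime env vars.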

0 commit comments