Commit 9c360e0

GavinZhu-GMI authored and achartier committed
[None][fix] write per-rank torch profile traces
PyExecutor reads TLLM_TORCH_PROFILE_TRACE directly, and every rank calls torch_profiler.export_chrome_trace() on the same path. When TP/PP/DP > 1, the concurrent writes interleave and the resulting file fails to parse in Chrome tracing / Perfetto (bad control character / unterminated string at the byte where one rank's output overran another's).

Append the rank to the env-provided path before its first use so each rank writes to its own file. This matches SGLang's scheduler_profiler_mixin filename convention: the user supplies a base path, and the runtime adds the per-rank suffix automatically.

Example: TLLM_TORCH_PROFILE_TRACE=/tmp/trace.json now produces /tmp/trace-rank-0.json, /tmp/trace-rank-1.json, etc.

Signed-off-by: Gavin.Zhu <gavin.z@gmicloud.ai>
1 parent 17ac84c commit 9c360e0

1 file changed

Lines changed: 8 additions & 0 deletions

tensorrt_llm/_torch/pyexecutor/py_executor.py

@@ -936,6 +936,14 @@ def _profiler(self):
         prev_device_step_time = None

         torch_trace_path = os.environ.get(PROFILE_TRACE_ENV_VAR_NAME, None)
+        if torch_trace_path is not None:
+            # Append the rank so each rank writes to its own file. Without
+            # this, TP/PP/DP > 1 runs have every rank calling
+            # torch_profiler.export_chrome_trace() on the same path
+            # concurrently, producing interleaved output that fails to
+            # parse in Chrome tracing / Perfetto.
+            trace_base, trace_ext = os.path.splitext(torch_trace_path)
+            torch_trace_path = f"{trace_base}-rank-{self.global_rank}{trace_ext}"
         profile_start_stop = os.environ.get(PROFILE_START_STOP_ENV_VAR_NAME,
                                             None)
         enable_torch_trace = bool(torch_trace_path and profile_start_stop)
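The rename logic in the patch can be sketched as a standalone helper (the function name `per_rank_trace_path` is illustrative, not part of the patch): os.path.splitext splits the user-supplied base path into stem and extension, and the rank is spliced in between so concurrent ranks never share a file.

```python
import os


def per_rank_trace_path(base_path: str, rank: int) -> str:
    # Split e.g. "/tmp/trace.json" into ("/tmp/trace", ".json") and
    # rebuild with a per-rank suffix, mirroring the commit's approach.
    trace_base, trace_ext = os.path.splitext(base_path)
    return f"{trace_base}-rank-{rank}{trace_ext}"


# /tmp/trace.json -> /tmp/trace-rank-0.json, /tmp/trace-rank-1.json, ...
print(per_rank_trace_path("/tmp/trace.json", 0))  # /tmp/trace-rank-0.json
```

Note that a path with no extension (e.g. /tmp/trace) still works: splitext returns an empty extension and the suffix is simply appended.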

0 commit comments