
Commit c8e3f2f

shengfukevin authored and facebook-github-bot committed
Fixed comm parser issue
Summary: This diff fixes the following two comm parser issues:
1. process_group:init entries may identify a process group by either uid or backend_id; the parser now supports both.
2. record_param_comms nodes can have a different number of inputs (10 as well as 8), which the input-index shift must account for.
Reviewed By: shengbao-zheng
Differential Revision: D56091619
fbshipit-source-id: 58e12a515b17150ee68557fc6b4ad729e1614d49
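The first fix is a key fallback when reading process-group entries. Below is a minimal sketch, assuming each process_group:init entry is a JSON object that carries either "uid" or "backend_id", plus "pg_name". The helpers resolve_backend_id and map_backend_to_pg are hypothetical and not part of commsTraceParser.py. The sample payload is also made up for illustration.

```python
import json
from typing import Any, Dict, List


def resolve_backend_id(pg: Dict[str, Any]) -> Any:
    """Return the process-group identifier, preferring "uid" over "backend_id".

    Hypothetical helper mirroring the expression introduced by the fix.
    """
    return pg["uid"] if "uid" in pg else pg["backend_id"]


def map_backend_to_pg(pg_json: str) -> Dict[Any, int]:
    """Build a backend-id -> pg_id map from a process_group:init JSON payload.

    Assumed schema: a JSON list of objects with "uid" or "backend_id" and "pg_name".
    """
    mapping: Dict[Any, int] = {}
    pg_list: List[Dict[str, Any]] = json.loads(pg_json)
    for pg in pg_list:
        mapping[resolve_backend_id(pg)] = int(pg["pg_name"])
    return mapping


# Hypothetical entries: one keyed by "uid", one keyed by "backend_id".
payload = json.dumps([
    {"uid": 140, "pg_name": "0", "ranks": [0, 1]},
    {"backend_id": 141, "pg_name": "1", "ranks": [0]},
])
print(map_backend_to_pg(payload))  # {140: 0, 141: 1}
```

The sketch uses a conditional expression rather than pg.get("uid", pg["backend_id"]). Python evaluates the default argument of dict.get eagerly, so that form would raise KeyError for an entry that has only "uid".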
1 parent 0a07342 commit c8e3f2f


1 file changed (+2 -2 lines)


train/comms/pt/commsTraceParser.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -233,7 +233,7 @@ def _parseExecutionTrace(
                 break
 
             for pg in pgObj:
-                backendId = pg["backend_id"]
+                backendId = pg["uid"] if "uid" in pg else pg["backend_id"]
                 ranks = pg["ranks"]
                 if isinstance(ranks, list):
                     pgId = int(pg["pg_name"])
@@ -256,7 +256,7 @@ def _parseExecutionTrace(
     for node in in_trace.nodes.values():
         if node.name == "record_param_comms":
             shift = (
-                0 if len(node.inputs) == 8 else 1
+                0 if len(node.inputs) == 8 or len(node.inputs) == 10 else 1
             ) # wait/barrier ops do not have an input tensor (len=7), shift index one over
             newComm = commsArgs()
             newComm.id = node.id
```
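The rule in the second hunk can be isolated into a small function. Below is a minimal sketch that assumes only what the diff shows. A record_param_comms node with 8 or 10 inputs gets no shift. Any other count shifts the index by one, including 7 for wait/barrier ops according to the inline comment. The helper name input_shift is hypothetical.

```python
from typing import Sequence


def input_shift(inputs: Sequence[object]) -> int:
    """Return the index shift for a record_param_comms node's inputs.

    Hypothetical helper mirroring the expression in the diff:
    8 or 10 inputs -> 0; any other count (e.g. 7 for wait/barrier ops,
    which lack an input tensor) -> 1.
    """
    return 0 if len(inputs) in (8, 10) else 1


# Placeholder input lists; only their lengths matter here.
for n in (7, 8, 10):
    print(n, input_shift([None] * n))
# 7 1
# 8 0
# 10 0
```

The membership test len(inputs) in (8, 10) is equivalent to the chained or in the diff. As in the original expression, every count outside {8, 10} falls through to a shift of 1, including unexpected ones.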
