Clone https://github.com/STAR-Laboratory/Accelerating-RecSys-Training and place files accordingly.
Code base:
commit 396409aa1fe3eb606c726bc3f6245b44201f30c8 (origin/main, origin/HEAD, main)
Author: madnan92 <[email protected]>
Date: Sun Sep 17 17:10:02 2023 -0700
Updated
Note: These modifications are specific to Python 3.8.12 + PyTorch 1.10. A different software environment may or may not need them to run the stock FAE code.
- Replace

      with torch.autograd.profiler.profile(args.enable_profiling, use_gpu) as prof:

  with

      with torch.autograd.profiler.profile(enabled=args.enable_profiling, use_cuda=use_gpu) as prof:

- In dlrm_fae.py, line 1390 and line 1726, replace

      hot_row = emb_dict[(emb_no, emb_row)]

  with

      hot_row = int(emb_dict[(emb_no, emb_row)])

- Add

      ...... \
      --arch-embedding-size="987994-4162024-9439" \

  to the end of TBSM/run_fae_profiler.sh.

- In TBSM/tbsm_fae.py, line 714, replace

      hot_row = emb_dict[(emb_no, emb_row)]

  with

      hot_row = int(emb_dict[(emb_no, emb_row)])

Neither qr_flag nor md_flag for the embedding layer is supported.
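The int() casts above suggest that, in this environment, emb_dict values are tensor-like scalars rather than plain Python ints, and downstream code needs a built-in int row id. A minimal stand-in sketch (the FakeScalar class and values are hypothetical, used only to mimic a 0-d tensor without requiring torch):

```python
class FakeScalar:
    """Hypothetical stand-in for a 0-d tensor / numpy integer scalar."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        # torch tensors and numpy scalars also expose __int__,
        # which is what the int(...) cast in the patch relies on.
        return self.value


# emb_dict maps (embedding table number, row) -> hot-row id.
emb_dict = {(0, 5): FakeScalar(42)}
emb_no, emb_row = 0, 5

# Without the cast, hot_row would be a FakeScalar; with it, a plain int.
hot_row = int(emb_dict[(emb_no, emb_row)])
assert type(hot_row) is int and hot_row == 42
```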
Setting num_workers in the DataLoader is not supported.
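To sanity-check the corrected profiler call, here is a small guarded sketch (not project code; it skips gracefully if torch is absent) that runs a toy op under the keyword-argument form of the profiler:

```python
def profile_toy_step():
    """Run a toy op under torch's autograd profiler using keyword args.

    Returns True when profiling succeeded, False when torch is missing.
    """
    try:
        import torch
    except ImportError:
        return False

    # Keyword arguments are required here: in PyTorch 1.10 the parameters
    # after `enabled` are keyword-only, which is why the positional call
    # profile(args.enable_profiling, use_gpu) fails.
    with torch.autograd.profiler.profile(enabled=True, use_cuda=False) as prof:
        _ = torch.ones(8).sum()  # toy work to record
    return prof is not None


print(profile_toy_step())
```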