# RecSys-Training-Planner

## Usage

Clone https://github.com/STAR-Laboratory/Accelerating-RecSys-Training and place files accordingly.

Code base:

```
commit 396409aa1fe3eb606c726bc3f6245b44201f30c8 (origin/main, origin/HEAD, main)
Author: madnan92 <[email protected]>
Date:   Sun Sep 17 17:10:02 2023 -0700

    Updated
```

## Necessary code modifications for my PyTorch runtime environment

Note: These modifications target Python 3.8.12 + PyTorch 1.10. Other software environments may or may not need them to run the stock FAE code.

1. Replace

   ```python
   with torch.autograd.profiler.profile(args.enable_profiling, use_gpu) as prof:
   ```

   with

   ```python
   with torch.autograd.profiler.profile(enabled=args.enable_profiling, use_cuda=use_gpu) as prof:
   ```
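   The keyword form matters because in PyTorch 1.10 the profiler arguments after `enabled` appear to be keyword-only, which is presumably why the positional call fails. A minimal sketch of the fixed pattern, assuming PyTorch 1.10 (the workload is illustrative):

   ```python
   import torch

   use_gpu = torch.cuda.is_available()
   x = torch.randn(256, 256, device="cuda" if use_gpu else "cpu")

   # Keyword arguments are required here; passing use_gpu positionally fails.
   with torch.autograd.profiler.profile(enabled=True, use_cuda=use_gpu) as prof:
       y = x.matmul(x)  # any workload to be profiled

   print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
   ```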
2. In `dlrm_fae.py`, lines 1390 and 1726, replace

   ```python
   hot_row = emb_dict[(emb_no, emb_row)]
   ```

   with

   ```python
   hot_row = int(emb_dict[(emb_no, emb_row)])
   ```
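   The cast presumably works around the `emb_dict` values being NumPy or tensor integer scalars rather than plain Python ints, which some APIs reject where an index is expected. A hypothetical illustration (the dictionary contents are an assumption):

   ```python
   import numpy as np

   # Assumed shape of the FAE hot-row map: keys are (table, row) pairs;
   # values can be NumPy integer scalars when built from a NumPy array.
   emb_dict = {(0, 7): np.int64(3)}
   emb_no, emb_row = 0, 7

   raw = emb_dict[(emb_no, emb_row)]
   hot_row = int(raw)  # normalize to a plain Python int

   print(type(raw))      # <class 'numpy.int64'>
   print(type(hot_row))  # <class 'int'>
   ```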
3. Add

   ```sh
   ...... \
   --arch-embedding-size="987994-4162024-9439"
   ```

   to the end of `TBSM/run_fae_profiler.sh`.
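   For context, `--arch-embedding-size` in DLRM/TBSM-style command lines takes a dash-separated list of per-table embedding row counts, so this pins the three tables' sizes explicitly. A sketch of how such a value is parsed (illustrative, not the repository's exact code):

   ```python
   # Illustrative parsing of the dash-separated size list into row counts.
   sizes = "987994-4162024-9439"
   ln_emb = [int(n) for n in sizes.split("-")]
   print(ln_emb)  # [987994, 4162024, 9439]
   ```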

4. In `TBSM/tbsm_fae.py`, line 714, replace

   ```python
   hot_row = emb_dict[(emb_no, emb_row)]
   ```

   with

   ```python
   hot_row = int(emb_dict[(emb_no, emb_row)])
   ```

## Notes

- `qr_flag` and `md_flag` for the embedding layer are not supported.
- `num_workers` in the dataloader is not supported.
