Clone https://github.com/STAR-Laboratory/Accelerating-RecSys-Training and place files accordingly.
Code base:
commit 396409aa1fe3eb606c726bc3f6245b44201f30c8 (origin/main, origin/HEAD, main)
Author: madnan92 <[email protected]>
Date: Sun Sep 17 17:10:02 2023 -0700
Updated
Note: These modifications are specific to Python 3.8.12 + PyTorch 1.10. A different software environment may or may not need them to run the stock FAE code.
- Replace

      with torch.autograd.profiler.profile(args.enable_profiling, use_gpu) as prof:

  with

      with torch.autograd.profiler.profile(enabled=args.enable_profiling, use_cuda=use_gpu) as prof:

- In dlrm_fae.py, line 1390 and line 1726, replace

      hot_row = emb_dict[(emb_no, emb_row)]

  with

      hot_row = int(emb_dict[(emb_no, emb_row)])

- Add

      ...... \
      --arch-embedding-size="987994-4162024-9439" \

  to the end of TBSM/run_fae_profiler.sh.

- In TBSM/tbsm_fae.py, line 714, replace

      hot_row = emb_dict[(emb_no, emb_row)]

  with

      hot_row = int(emb_dict[(emb_no, emb_row)])

Neither qr_flag nor md_flag for the embedding layer is supported.
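The int() casts above suggest that, in this environment, emb_dict values are tensor-like scalars rather than plain Python ints, and downstream code needs a built-in int row id. A minimal stand-in sketch (the FakeScalar class and values are hypothetical, used only to mimic a 0-d tensor without requiring torch):

```python
class FakeScalar:
    """Hypothetical stand-in for a 0-d tensor / numpy integer scalar."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        # torch tensors and numpy scalars also expose __int__,
        # which is what the int(...) cast in the patch relies on.
        return self.value


# emb_dict maps (embedding table number, row) -> hot-row id.
emb_dict = {(0, 5): FakeScalar(42)}
emb_no, emb_row = 0, 5

# Without the cast, hot_row would be a FakeScalar; with it, a plain int.
hot_row = int(emb_dict[(emb_no, emb_row)])
assert type(hot_row) is int and hot_row == 42
```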
Setting num_workers in the DataLoader is not supported.
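To sanity-check the corrected profiler call, here is a small guarded sketch (not project code; it skips gracefully if torch is absent) that runs a toy op under the keyword-argument form of the profiler:

```python
def profile_toy_step():
    """Run a toy op under torch's autograd profiler using keyword args.

    Returns True when profiling succeeded, False when torch is missing.
    """
    try:
        import torch
    except ImportError:
        return False

    # Keyword arguments are required here: in PyTorch 1.10 the parameters
    # after `enabled` are keyword-only, which is why the positional call
    # profile(args.enable_profiling, use_gpu) fails.
    with torch.autograd.profiler.profile(enabled=True, use_cuda=False) as prof:
        _ = torch.ones(8).sum()  # toy work to record
    return prof is not None


print(profile_toy_step())
```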