Major Features and Improvements
Train/Eval/Export
- Support null values for int/float dtype features when using the negative sampler #202
- Support freeze embedding parameters #206
- Add mixed_precision bf16/fp16 and gradient accumulation support #220
- Add fp16 embedding dtype support #221
- Add TrainPipelineBase to support model w/o sparse parameters #222
- Add EmbeddingCollection quant support #265
- Optimize sequence emb inference speed #266
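
For readers unfamiliar with the gradient accumulation mentioned in #220, the general idea can be sketched in plain Python. This is not TorchEasyRec code: the function names, the toy 1-D model, and the data are all hypothetical and only illustrate that averaging gradients over equally sized micro-batches reproduces the full-batch gradient.

```python
def grad_mse(w, batch):
    """Gradient w.r.t. w of the mean squared error for the toy model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, micro_batches):
    """Accumulate micro-batch gradients, scaling each by 1/N (N = number of micro-batches)."""
    total = 0.0
    for micro_batch in micro_batches:
        # Dividing by the number of micro-batches turns the running sum into an average.
        total += grad_mse(w, micro_batch) / len(micro_batches)
    return total


# Hypothetical (x, y) samples, split into two micro-batches of equal size.
data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
full = grad_mse(0.5, data)
accum = accumulated_grad(0.5, [data[:2], data[2:]])
```

With equally sized micro-batches, `full` and `accum` agree up to floating-point error, which is why accumulation can emulate a larger batch when memory is limited.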
Model
- Add DlrmHSTU model #224 #227 #231 #232 #237 #250 #257
- Add DCN_V1 model #235
- Add DCN_V2 and xDeepFM model #242
- Add WideAndDeep model and wide init_fn #212
- Add sequence self_attention encoder #251
- Add binary focal loss #208
- Add xauc and grouped xauc #252
- Add feature selection for DSSM_V2 #219
- Add use_ln option for MLP module #223
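
As background for the binary focal loss added in #208, here is a minimal pure-Python sketch of the standard per-sample formula, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not the TorchEasyRec implementation; the function name, the parameter names `gamma` and `alpha`, and their default values are assumptions chosen for illustration.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one sample (hypothetical helper).

    p: predicted probability of the positive class, strictly between 0 and 1.
    y: ground-truth label, 0 or 1.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights positives by alpha and negatives by (1 - alpha).
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)  # confident, correct prediction
hard = binary_focal_loss(0.1, 1)  # confident, wrong prediction
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted binary cross-entropy, so `gamma` controls how strongly easy examples are suppressed.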
Feature
- Add const input for feature #210
- Expr feature supports value_dim #216
- Support features used only as fg dag intermediate results (stub_type=true) #218
Dataset
- Sampler supports odps schema #267
Upgrade
- Upgrade PyTorch to v2.8 and TorchRec to v1.3.0 #241
Python
Note
For TorchEasyRec 0.9.x, you should use Docker image version 0.9.
- For the GPU version (CUDA 12.6):
mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.9-cu126
- PyTorch: v2.8, CUDA: v12.6, FBGEMM: v1.3.0, TorchRec: v1.3.0, Python: v3.11
- Support for GPU driver version 470 has been dropped. If you still need to use a 470 driver, set LD_LIBRARY_PATH=/usr/local/cuda-12.6/compat
- For the CPU version:
mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.9-cpu
- PyTorch: v2.8, FBGEMM: v1.3.0, TorchRec: v1.3.0, Python: v3.11
Bug Fixes and Other Changes
- [feat] prune mem of one shard > mem of one device for DynamicProgrammingProposer by @tiankongdeguiji in #194
- [feat] use oss accelerate endpoint by @tiankongdeguiji in #203
- [bugfix] fix force_base_data_group when export model by @tiankongdeguiji in #200
- [bugfix] fix int32 and double column type of negative sampler table by @tiankongdeguiji in #204
- [bugfix] remove redundant print in sampler by @tiankongdeguiji in #207
- [bugfix] fix hitrate hang on OdpsWriter & refactor broadcast_object and gather_object pg by @tiankongdeguiji in #205
- [feat] upgrade pyfg to 0.6.9 and refine expr/overlap feature doc by @tiankongdeguiji in #199
- create fg.json if exist will error by @chengaofei in #211
- [bugfix] add missing wide_and_deep doc index by @tiankongdeguiji in #214
- Feature/fix dense embedding export in dssmv2 by @eric-gecheng in #213
- [bugfix] support remove bucketizer for sequence feature and add tests by @tiankongdeguiji in #215
- [feat] add error and warning for restore_model when model path not exists by @tiankongdeguiji in #217
- [bugfix] prevent unittest nightly timeout by @tiankongdeguiji in #225
- [bugfix] fix tzrec optimizer not update params by @tiankongdeguiji in #226
- [bugfix] fix pyfg oss accelerate url by @tiankongdeguiji in #228
- rocket launching train failed by @chengaofei in #229
- [bugfix] add ops init py & add build wheel ci test by @tiankongdeguiji in #234
- increase dlrm and rocket_launching benchmark by @chengaofei in #233
- [bugfix] fix value cannot be converted to type int32 without overflow in trt test by @tiankongdeguiji in #243
- [feat] increase benchmark timeout by @tiankongdeguiji in #244
- [bugfix] fix string id support for tdm sampler by @tiankongdeguiji in #245
- [bugfix] fix tdm user defined attr delim and optimize attrs of TDMSampler by @tiankongdeguiji in #248
- [bugfix] fix HardNegativeSampler with string id by @tiankongdeguiji in #249
- [bugfix] fix pyre check by @tiankongdeguiji in #246
- [bugfix] fix save checkpoint at epoch 0 when save_checkpoint_epochs > 1 by @tiankongdeguiji in #253
- [bugfix] refine distinguish sparse module in create_train_pipeline to fix continue training failure by @tiankongdeguiji in #254
- [bugfix] fix tdm retrieval nccl hang when use odps writer by @tiankongdeguiji in #255
- [feat] refactor ec_list to ec_dict for TDMEmbedding export by @tiankongdeguiji in #258
- [bugfix] fix hard negative sampler with zero hard negative indices by @tiankongdeguiji in #259
- [bugfix] fix sampler with string id memory leak by @tiankongdeguiji in #260
- [feat] bump up pyfg to 0.7.3 by @tiankongdeguiji in #263
- support odps three schema by @chengaofei in #264
- [bugfix] nccl timeout by @eric-gecheng in #262
- [bugfix] fix quant ec doc by @tiankongdeguiji in #269
- [bugfix] fix odps dataset test by @tiankongdeguiji in #270
Full Changelog: v0.8.0...v0.9.0