Releases: alibaba/TorchEasyRec
v0.9.0
Major Features and Improvements
Train/Eval/Export
- Support returning null values for int/float dtype features when using the negative sampler #202
- Support freeze embedding parameters #206
- Add mixed_precision bf16/fp16 and gradient accumulation support #220
- Add fp16 embedding dtype support #221
- Add TrainPipelineBase to support model w/o sparse parameters #222
- Add EmbeddingCollection quant support #265
- Optimize sequence emb inference speed #266
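Gradient accumulation (#220) spreads one optimizer step over several micro-batches. The following is a minimal sketch of the idea only, not TorchEasyRec code: it uses a hypothetical one-parameter least-squares model in plain Python, and the function names are made up for illustration. It shows that averaging the gradients of equally sized micro-batches reproduces the full-batch gradient.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the toy model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, accum_steps):
    """Average the gradients of `accum_steps` equally sized micro-batches."""
    assert len(xs) % accum_steps == 0, "sketch assumes equal micro-batch sizes"
    size = len(xs) // accum_steps
    total = 0.0
    for i in range(accum_steps):
        lo, hi = i * size, (i + 1) * size
        # Scale each micro-batch gradient by 1 / accum_steps before summing,
        # which has the same effect as dividing the loss before backward().
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
full = grad_mse(0.5, xs, ys)                          # one large batch
accum = accumulated_grad(0.5, xs, ys, accum_steps=2)  # two micro-batches
```

Here `full` and `accum` agree up to floating-point rounding, which is why accumulation can emulate a larger batch size when memory is limited.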
Model
- Add DlrmHSTU model #224 #227 #231 #232 #237 #250 #257
- Add DCN_V1 model #235
- Add DCN_V2 and xDeepFM model #242
- Add WideAndDeep model and wide init_fn #212
- Add sequence self_attention encoder #251
- Add binary focal loss #208
- Add xauc and grouped xauc #252
- Add feature selection for DSSM_V2 #219
- Add use_ln option for MLP module #223
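For readers unfamiliar with the binary focal loss mentioned above (#208), here is a minimal per-example sketch of the commonly used formulation, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults `gamma=2.0` and `alpha=0.25` are assumptions taken from the usual presentation of the loss; the parameter names and defaults that TorchEasyRec actually exposes are not stated in these notes.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Cross entropy for one example with predicted probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
    return -math.log(p if y == 1 else 1.0 - p)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one example; gamma and alpha defaults are illustrative assumptions."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p               # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half of the ordinary binary cross entropy, which is a convenient sanity check.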
Feature
- Add const input for feature #210
- Expr feature support value_dim #216
- Support features used only as intermediate results of the fg DAG (stub_type=true) #218
Dataset
- Sampler support odps schema #267
Upgrade
- Upgrade pytorch to v2.8 and torchrec to v1.3.0 #241
Python
Note
For TorchEasyRec 0.9.x, you should use Docker image version 0.9.
- For the GPU version (CUDA 12.6):
mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.9-cu126
- PyTorch: v2.8, CUDA: v12.6, FBGEMM: v1.3.0, TorchRec: v1.3.0, Python: v3.11
- Support for GPU driver version 470 has been dropped. If you still need to run on driver 470, set LD_LIBRARY_PATH=/usr/local/cuda-12.6/compat
- For the CPU version:
mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.9-cpu
- PyTorch: v2.8, FBGEMM: v1.3.0, TorchRec: v1.3.0, Python: v3.11
Bug Fixes and Other Changes
- [feat] prune mem of one shard > mem of one device for DynamicProgrammingProposer by @tiankongdeguiji in #194
- [feat] use oss accelerate endpoint by @tiankongdeguiji in #203
- [bugfix] fix force_base_data_group when export model by @tiankongdeguiji in #200
- [bugfix] fix int32 and double column type of negative sampler table by @tiankongdeguiji in #204
- [bugfix] remove redundant print in sampler by @tiankongdeguiji in #207
- [bugfix] fix hitrate hang on OdpsWriter & refactor broadcast_object and gather_object pg by @tiankongdeguiji in #205
- [feat] upgrade pyfg to 0.6.9 and refine expr/overlap feature doc by @tiankongdeguiji in #199
- create fg.json if exist will error by @chengaofei in #211
- [bugfix] add missing wide_and_deep doc index by @tiankongdeguiji in #214
- Feature/fix dense embedding export in dssmv2 by @eric-gecheng in #213
- [bugfix] support remove bucketizer for sequence feature and add tests by @tiankongdeguiji in #215
- [feat] add error and warning for restore_model when model path not exists by @tiankongdeguiji in #217
- [bugfix] prevent unittest nightly timeout by @tiankongdeguiji in #225
- [bugfix] fix tzrec optimizer not update params by @tiankongdeguiji in #226
- [bugfix] fix pyfg oss accelerate url by @tiankongdeguiji in #228
- rocket launching train failed by @chengaofei in #229
- [bugfix] add ops init py & add build wheel ci test by @tiankongdeguiji in #234
- increase dlrm and rocket_launching benchmark by @chengaofei in #233
- [bugfix] fix value cannot be converted to type int32 without overflow in trt test by @tiankongdeguiji in #243
- [feat] increase benchmark timeout by @tiankongdeguiji in #244
- [bugfix] fix string id support for tdm sampler by @tiankongdeguiji in #245
- [bugfix] fix tdm user defined attr delim and optimize attrs of TDMSampler by @tiankongdeguiji in #248
- [bugfix] fix HardNegativeSampler with string id by @tiankongdeguiji in #249
- [bugfix] fix pyre check by @tiankongdeguiji in #246
- [bugfix] fix save checkpoint at epoch 0 when save_checkpoint_epochs > 1 by @tiankongdeguiji in #253
- [bugfix] refine distinguish sparse module in create_train_pipeline to fix continue training failure by @tiankongdeguiji in #254
- [bugfix] fix tdm retrieval nccl hang when use odps writer by @tiankongdeguiji in #255
- [feat] refactor ec_list to ec_dict for TDMEmbedding export by @tiankongdeguiji in #258
- [bugfix] fix hard negative sampler with zero hard negative indices by @tiankongdeguiji in #259
- [bugfix] fix sampler with string id memory leak by @tiankongdeguiji in #260
- [feat] bump up pyfg to 0.7.3 by @tiankongdeguiji in #263
- support odps three schema by @chengaofei in #264
- [bugfix] nccl timeout by @eric-gecheng in #262
- [bugfix] fix quant ec doc by @tiankongdeguiji in #269
- [bugfix] fix odps dataset test by @tiankongdeguiji in #270
Full Changelog: v0.8.0...v0.9.0
v0.8.0
Major Features and Improvements
Train/Eval/Export
- Support eval and save checkpoint by epoch #116
- Support export fp32/fp16/int8/int4/int2 ebc embedding quant model #137
- Enhance export efficiency by restoring state dict directly instead of copying and gathering #177
- Add faiss gpu support for evaluation #170
- Enhance optimizer state loading for changed plans with plan checkpoint #185
- Support tensorboard log for model parameters #181
- Add restore ckpt check for continue train #180
- Add allow_tf32 flag and global embedding param constraint #188
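To illustrate what exporting an embedding table in a low-bit format (#137) involves, the sketch below quantizes one embedding row to 8-bit codes with a per-row scale and bias, then restores it. This is a generic row-wise scheme written in plain Python under that assumption; the storage layout TorchEasyRec and FBGEMM actually use is not described in these notes, and the function names are hypothetical.

```python
def quantize_row_8bit(row):
    """Map a row of floats to integer codes in [0, 255] plus a (scale, bias) pair."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for constant rows
    codes = [round((v - lo) / scale) for v in row]
    return codes, scale, lo


def dequantize_row_8bit(codes, scale, bias):
    """Approximate reconstruction of the original floats."""
    return [c * scale + bias for c in codes]


row = [-1.0, 0.0, 0.25, 3.0]
codes, scale, bias = quantize_row_8bit(row)
restored = dequantize_row_8bit(codes, scale, bias)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), so the accuracy cost depends on the value range of each row. Lower bit widths such as int4 or int2 follow the same idea with fewer code levels.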
Model
- Add MIND model #119 #123 #157 #172
- Add RocketLaunching model #129
- Add DLRM model #148
- Add MaskNet #179 #187
- Add dice activation and support bn for sequence mlp #107
- Add regression and multi-classification metric #149
- Optimize distributed GAUC memory use #127
- Add SequenceEmbeddingGroup and support jagged forward #152
- Support max sequence length setting for sequence encoder #184
- Support hard negative sampler #195
- Optimize HSTU training and sampling process and add triton ops (WIP) #93 #154
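As background for the GAUC item above (#127), here is a minimal single-process sketch of a grouped AUC: AUC is computed per group (for example per user) and averaged with weights. Weighting by the number of samples in each group and skipping single-class groups are assumptions of this sketch; the notes do not say which weighting TorchEasyRec uses, and the function names are hypothetical.

```python
from collections import defaultdict


def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when only one class is present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def grouped_auc(group_ids, labels, scores):
    """Sample-count-weighted mean of per-group AUC (weighting is an assumption)."""
    groups = defaultdict(lambda: ([], []))
    for g, y, s in zip(group_ids, labels, scores):
        groups[g][0].append(y)
        groups[g][1].append(s)
    weighted, total = 0.0, 0
    for ys, ss in groups.values():
        value = auc(ys, ss)
        if value is None:
            continue  # skip groups with only positives or only negatives
        weighted += value * len(ys)
        total += len(ys)
    return weighted / total if total else None


gauc = grouped_auc(
    ["u1", "u1", "u2", "u2", "u2", "u3", "u3"],
    [1, 0, 1, 0, 0, 1, 1],
    [0.9, 0.1, 0.2, 0.5, 0.1, 0.3, 0.4],
)
```

In this example `u1` contributes an AUC of 1.0 with weight 2, `u2` contributes 0.5 with weight 3, and `u3` is skipped because it has no negatives. Keeping every group's samples in memory, as this sketch does, is the kind of cost that a distributed implementation has to manage.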
Feature
- Support custom feature and custom sequence feature #144
- Weighted id feature support map dtype #190
- Dump parsed inputs support weighted id and multi-value sequence feature #191
Dataset
- Support dataset shuffle #114
- Optimize performance of ParquetDataset and rebalance parquet files dynamically #125 #126
- Add odps read session refresh to extend the odps session expiration time #132
- Add more alibaba cloud credentials for odps dataset #115
- Add odps_data_compression (ZSTD) config for OdpsDataset #146
- Always lazy init odps writer #178
Upgrade
- Upgrade pytorch to v2.7 and torchrec to v1.2.0 #197
Note
For TorchEasyRec 0.8.x, you should use Docker image version 0.8.
- For the GPU version (CUDA 12.6):
mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.8-cu126
- PyTorch: v2.7, CUDA: v12.6, FBGEMM: v1.2.0, TorchRec: v1.2.0, Python: v3.11
- Support for GPU driver version 470 has been dropped. If you still need to run on driver 470, set LD_LIBRARY_PATH=/usr/local/cuda-12.6/compat
- For the CPU version:
mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.8-cpu
- PyTorch: v2.7, FBGEMM: v1.2.0, TorchRec: v1.2.0, Python: v3.11
Bug Fixes and Other Changes
- [bugfix] fix cpu docker image build without trt by @tiankongdeguiji in #100
- add_dssm_recall_benchmark by @chengaofei in #101
- [feat] support ignore unused features in negative sampler by @tiankongdeguiji in #102
- [bugfix] fix multi-val sequence embedding nan when pooling_type = mean by @tiankongdeguiji in #104
- [feat] upgrade ruff to 2025 code style by @tiankongdeguiji in #105
- [bugfix] fix correctness of kjt.lengths when ShardedEmbeddingBag’s pooling_type is mean and shard_type is row_wise by @tiankongdeguiji in #106
- [feat] upload feature assets to odps and fix remove_bucketizer in create_fg_json by @tiankongdeguiji in #103
- [bugfix] fix multi-value sequence raw feature by @tiankongdeguiji in #109
- [bugfix] fix dice init params by @tiankongdeguiji in #108
- [feat] docker support rtx gpu by @tiankongdeguiji in #111
- [bugfix] fix create_fg_json invalid option force_update_resource by @chengaofei in #110
- [feat] clean fg_encoded config by @tiankongdeguiji in #112
- [bugfix] prevent redundant file uploading to odps when use create_fg_json by @tiankongdeguiji in #113
- [bugfix] fix modify feature group config in training by @chengaofei in #118
- [feat] refactor weighted id feature with pyfg 0.4.5 encoded format by @tiankongdeguiji in #117
- [bugfix] fix feature.keys() none error when all features in embedding group are zch by @tiankongdeguiji in #120
- [feat] refine pyarrow type to odps table type convert by @tiankongdeguiji in #121
- [bugfix] bump up pyfg version 0.4.8 to fix sequence_length in config < true sequence length in data by @tiankongdeguiji in #130
- Support non null string list by @yanzhen1233 in #128
- [bugfix] fix mc-abc divisor none error when use mean pooling by @tiankongdeguiji in #133
- [bugfix] fix feature permute when use mc-ebc and mean pooling by @tiankongdeguiji in #134
- support feature_groups select features by @chengaofei in #135
- [bugfix] fix export model with zch by @tiankongdeguiji in #136
- bugfix_export_input_tile_is_2 by @chengaofei in #138
- fix the feature bug: has_dag by @yjjinjie in #140
- [bugfix] add missing dataset utils test by @tiankongdeguiji in #139
- [bugfix] add ArrowInvalid retry for refresh odps session by @tiankongdeguiji in #141
- [feat] remove redundant side_inputs warn when fg_mode=FG_NONE by @tiankongdeguiji in #142
- [bugfix] fix autodis parameter init in dist mode by @eric-gecheng in #143
- [bugfix] fix mlp embedding param init bug by @eric-gecheng in #145
- [bugfix] revert emb_impl call by @tiankongdeguiji in #147
- [bugfix] fix ple typo by @tiankongdeguiji in #150
- [feat] add kernel config and BaseModule by @tiankongdeguiji in #151
- [feat] refactor label_name to label tensor in loss and metric impl by @tiankongdeguiji in #153
- [feat] update maxcompute vpc endpoint and quota doc by @tiankongdeguiji in #155
- [feat] add auto rebalance doc for ParquetDataset by @tiankongdeguiji in #156
- [feat] add odps dataset ci test by @tiankongdeguiji in #159
- [feat] add nightly build wheel and doc by @tiankongdeguiji in #160
- [feat] add benchmark and nightly test by @tiankongdeguiji in #161
- [bugfix] fix build nightly wheel by @tiankongdeguiji in #162
- [bugfix] fix regression metric by @tiankongdeguiji in #163
- [bugfix] fix fork repo cpu ci by @tiankongdeguiji in #166
- fix loop logic in hitrate.py by @eric-gecheng in #165
- [bugfix] fix combo feature value and length mismatch when input data with only one separator by @tiankongdeguiji in #168
- [bugfix] fix mtl weight always equal to 1 after div by mean by @tiankongdeguiji in #171
- [feat] optimize dssm and mtl with weight benchmark by @tiankongdeguiji in #173
- [bugfix] fix clear_variational_dropout and visualize flag of feature selection by @tiankongdeguiji in #174
- [feat] add fg value_type config and make num_buckets default value_dtype as string by @tiankongdeguiji in https://github.com/alibaba/...
v0.7.0
Major Features and Improvements
Train/Eval/Export
- Support train/eval/export on cpu #27
- Support TRT export (Beta) #30 #32 #41 #43 #58 #59 #89
- Support AOT export (WIP) #79
Model
- Optimize TDM gen tree speed #33
- TDM supports string id #72
- Rank and Match models support sample weight #50 #57 #63 #65
- Add zero collision hash embedding #60
- Add intervention methods for multi-target learning #49
- Add Autodis and MLP embedding for raw features #73 #75
- Add task space for multi-target learning loss #82
- Add dual augmented two-tower match model #83
- Add HSTU (WIP) #55
Feature
- pyfg support CPU without avx512 #20
- ExprFeature support l2_norm|dot|euclid_dist #35
- Add fg bucketize only mode & refactor fg_encoded to fg_mode #62
- Make default bucketize value configurable #94
- Support multi-value sequence #96
- Support vocab file #97
Dataset
- Enhance stability for credential of OdpsDataset #45
- Add complex type and credential support for sampler when use odps dataset #52
- Support CsvDataset with null columns #56
- Negative sampler support string id #70
Config
Upgrade
Note
For TorchEasyRec 0.7.x, you should use Docker image version 0.7.
- For the GPU version (CUDA 12.4):
mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.7-cu124
- PyTorch: v2.6, CUDA: v12.4, FBGEMM: v1.1.0, TorchRec: v1.1.0, Python: v3.11
- For the CPU version:
mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.7-cpu
- PyTorch: v2.6, FBGEMM: v1.1.0, TorchRec: v1.1.0, Python: v3.11
Bug Fixes and Other Changes
- [bugfix] remove redundant sequence key in feature input names when fg_mode is DAG by @tiankongdeguiji in #21
- fix quota_name for add feature info by @chengaofei in #22
- update config delete drop feature config by @chengaofei in #23
- [feat] make docker compat with gpu driver 470 by @tiankongdeguiji in #24
- [bugfix] fix dlc tutorial doc by @tiankongdeguiji in #25
- [bugfix] fix dbmtl model doc by @tiankongdeguiji in #28
- [feat] add pai dlc and dsw dependency in docker by @tiankongdeguiji in #29
- [feat] update easyrec dinggroup qrcode by @tiankongdeguiji in #31
- [feat] update pyfg doc to 0.3.5 by @tiankongdeguiji in #34
- [bugfix] fix fg arrow handler with sample mask by @tiankongdeguiji in #38
- [feat] add unique test work dir by @tiankongdeguiji in #40
- [bugfix] add id field of negative sampler to selected columns by @tiankongdeguiji in #42
- [bugfix] prevent predict hang when subthread or subproc exception by @tiankongdeguiji in #44
- [bugfix] input_tile=3: make dataparser to get user feats before creat… by @yjjinjie in #46
- [bugfix] fix sequence feature doc by @tiankongdeguiji in #48
- [feat] optimize is_user_feat of Feature when use dag by @tiankongdeguiji in #53
- [bugfix] refine sample weight compatibility & refine label dtype check & relax predict pipeline check & fix num_rows < num_workers when use OdpsDataset by @tiankongdeguiji in #54
- [feat] add doc for training with maxcompute tables on DLC by @yanzhen1233 in #47
- create fg will use resource name by @chengaofei in #64
- [bugfix] fix is_sparse of LookupFeature and MatchFeature when use vocab_dict by @tiankongdeguiji in #66
- [bugfix] fix odps quota in hitrate.py & refine error info of CsvReader and ParquetReader by @tiankongdeguiji in #67
- [bugfix] fix mtl model label in ut by @tiankongdeguiji in #68
- [bugfix] fix calculate_shard_storages to handle optimizer correctly by @tiankongdeguiji in #69
- [feat] add LOG_LEVEL environ variable by @tiankongdeguiji in #71
- [bugfix] fix predict when num_workers = 0 by @tiankongdeguiji in #74
- [bugfix] fix duplicate server launch error in odps sampler test by @tiankongdeguiji in #76
- [feat] refactor batch_size to tile_size in Batch dataclass by @tiankongdeguiji in #77
- [feat] add total_loss to the plogger and summary_writer by @eric-gecheng in #78
- [bugfix] fix weighted feature when INPUT_TILE=2 by @tiankongdeguiji in #80
- [bugfix] fix negative sample table with multiple partitions by @tiankongdeguiji in #81
- [bugfix] readme typo by @eric-gecheng in #85
- [doc] fix task space doc error by @chengaofei in #86
- [bugfix] add div_no_nan and prevent divide by zero loss weight by @tiankongdeguiji in #88
- [feat] remove sample weight and labels when export by @tiankongdeguiji in #91
- [feat] configure the shell to be bash by default in docker environments by @tiankongdeguiji in #92
- [doc] creat fg json doc add upload fg json to mc method by @chengaofei in #95
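One of the fixes above (#88) adds div_no_nan to guard against dividing by a zero loss weight. The sketch below shows the general idea in plain Python, assuming the usual convention that the result is 0 wherever the denominator is 0. It is not the TorchEasyRec implementation, which presumably operates on tensors, and `weighted_mean_loss` is a hypothetical helper added only to show a typical use.

```python
def div_no_nan(numerators, denominators):
    """Element-wise division that yields 0.0 wherever the denominator is 0."""
    return [n / d if d != 0 else 0.0 for n, d in zip(numerators, denominators)]


def weighted_mean_loss(losses, weights):
    """Weighted mean of per-sample losses; returns 0.0 when all weights are 0."""
    total_loss = sum(l * w for l, w in zip(losses, weights))
    total_weight = sum(weights)
    return div_no_nan([total_loss], [total_weight])[0]
```

Without the guard, a batch in which every sample weight is zero would raise an error in plain Python, or produce NaN in tensor code and then contaminate the gradients.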
New Contributors
- @yjjinjie made their first contribution in #30
- @eric-gecheng made their first contribution in #50
- @yanzhen1233 made their first contribution in #47
- @Dave-AdamsWANG made their first contribution in #49
- @chengmengli06 made their first contribution in #79
- @iWelkin-coder made their first contribution in #55
Full Changelog: v0.6.0...v0.7.0
v0.6.0
We are excited to announce the release of TorchEasyRec 0.6.0, the first public release for TorchEasyRec.
Major Features and Improvements
- High-performance training, evaluation, and prediction with GPUs.
- Supported a variety of input data types, including MaxCompute tables, OSS files, CSV files, and Parquet files (doc here).
- Supported a variety of feature types, including IdFeature, RawFeature, ComboFeature, LookupFeature, MatchFeature, ExprFeature, OverlapFeature, TokenizeFeature, SequenceIdFeature, SequenceRawFeature, and SequenceFeature. The feature generation operations are also efficient and robust (doc here).
- Supported a variety of models, including DSSM, TDM, DeepFM, MultiTower, DIN, MMoE, DBMTL, and PLE. It is also easy to implement customized models.
- Supported a variety of losses, including binary_cross_entropy, softmax_cross_entropy, l2_loss, and jrc_loss (doc here).
- Supported VariationalDropout feature selection.
- Easy to deploy a TorchEasyRec model as a high-performance inference service using the TorchEasyRec Processor.
Bug Fixes and Other Changes
- [bugfix] fix train_eval may hang when use OdpsDataset and set is_orderby_partition=true by @tiankongdeguiji in
- [bugfix] fix offline predict input tile model with sequence by @tiankongdeguiji in #14
Note
For TorchEasyRec 0.6.x, you should use Docker image version 0.6.
- For the GPU version (CUDA 12.1):
mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.6-cu121
- For the CPU version:
mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.6-cpu
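The Note sections of the four releases on this page pair each release series with a Docker image tag. As a convenience, that mapping can be collected into a small lookup. The helper below is a hypothetical sketch, not part of TorchEasyRec; the repository path and tag suffixes are copied from the notes, and `devel_image` is a made-up name.

```python
IMAGE_REPO = "mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel"

# GPU tag suffix per release series, copied from the Note section of each release.
GPU_SUFFIX = {"0.9": "cu126", "0.8": "cu126", "0.7": "cu124", "0.6": "cu121"}


def devel_image(release, gpu=True):
    """Return the Docker image reference for a release such as 'v0.9.0' or '0.7.1'."""
    series = ".".join(release.lstrip("v").split(".")[:2])  # 'v0.9.0' -> '0.9'
    if series not in GPU_SUFFIX:
        raise ValueError(f"no image listed for release {release!r}")
    suffix = GPU_SUFFIX[series] if gpu else "cpu"
    return f"{IMAGE_REPO}:{series}-{suffix}"


gpu_image = devel_image("v0.9.0")
cpu_image = devel_image("0.6.0", gpu=False)
```

Releases outside the four series listed here raise a `ValueError` rather than guessing a tag.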
New Contributors
- @tiankongdeguiji made their first contribution in #1
- @jjbbong made their first contribution in #3
- @chengaofei made their first contribution in #4
Full Changelog: https://github.com/alibaba/TorchEasyRec/commits/v0.6.0