v0.8.0
Major Features and Improvements
Train/Eval/Export
- Support eval and save checkpoint by epoch #116
- Support export fp32/fp16/int8/int4/int2 ebc embedding quant model #137
- Enhance export efficiency by restoring state dict directly instead of copying and gathering #177
- Add faiss gpu support for evaluation #170
- Enhance optimizer state loading for changed plans with plan checkpoint #185
- Support tensorboard log for model parameters #181
- Add restore ckpt check for continue train #180
- Add allow_tf32 flag and global embedding param constraint #188
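For readers unfamiliar with the quantized embedding export listed above (#137), the following is a minimal pure-Python sketch of row-wise 8-bit affine quantization, one common way to store an embedding row as integer codes plus a per-row scale and bias. The function names are hypothetical and this is not taken from the TorchEasyRec or FBGEMM code; the actual export path may differ in layout and rounding.

```python
def quantize_row_int8(row):
    """Quantize one embedding row to codes in [0, 255] plus (scale, bias)."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        # A constant row has zero range; any non-zero scale works.
        scale = 1.0
    codes = [round((x - lo) / scale) for x in row]
    return codes, scale, lo


def dequantize_row_int8(codes, scale, bias):
    """Recover approximate float values from the stored codes."""
    return [c * scale + bias for c in codes]


row = [-0.5, 0.0, 0.25, 0.5]
codes, scale, bias = quantize_row_int8(row)
restored = dequantize_row_int8(codes, scale, bias)
# The reconstruction error per element is bounded by about half a step.
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

Lower bit widths (int4, int2) follow the same idea with fewer levels per row, trading accuracy for a smaller exported model.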
Model
- Add MIND model #119 #123 #157 #172
- Add RocketLaunching model #129
- Add DLRM model #148
- Add MaskNet #179 #187
- Add dice activation and support bn for sequence mlp #107
- Add regression and multi-classification metric #149
- Optimize distributed GAUC memory use #127
- Add SequenceEmbeddingGroup and support jagged forward #152
- Support max sequence length setting for sequence encoder #184
- Support hard negative sampler #195
- Optimize HSTU training and sampling process and add triton ops (WIP) #93 #154
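As background for the GAUC item above (#127), here is a small single-process sketch of group AUC: AUC is computed per group (typically per user) and averaged with weights. Weighting by the number of samples in each group and skipping single-class groups are assumptions made for illustration; the function names are hypothetical and the distributed implementation in TorchEasyRec is not shown here.

```python
from collections import defaultdict


def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gauc(group_ids, labels, scores):
    """Sample-count-weighted mean of per-group AUC."""
    groups = defaultdict(lambda: ([], []))
    for g, y, s in zip(group_ids, labels, scores):
        groups[g][0].append(y)
        groups[g][1].append(s)
    total, weight = 0.0, 0
    for ys, ss in groups.values():
        value = auc(ys, ss)
        if value is not None:
            total += value * len(ys)
            weight += len(ys)
    return total / weight if weight else None


group_ids = ["u1", "u1", "u1", "u2", "u2", "u3", "u3"]
labels = [1, 0, 0, 1, 0, 1, 1]
scores = [0.9, 0.2, 0.4, 0.3, 0.6, 0.5, 0.7]
result = gauc(group_ids, labels, scores)  # u1 -> 1.0, u2 -> 0.0, u3 skipped
```

Because every group must be gathered before its AUC can be computed, memory use grows with the number of buffered samples, which is why this metric is a natural target for memory optimization.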
Feature
- Support custom feature and custom sequence feature #144
- Weighted id feature support map dtype #190
- Dump parsed inputs support weighted id and multi-value sequence feature #191
Dataset
- Support dataset shuffle #114
- Optimize performance of ParquetDataset and rebalance parquet files dynamically #125 #126
- Add odps read session refresh to extend odps session expiration time #132
- Add more alibaba cloud credentials for odps dataset #115
- Add odps_data_compression (ZSTD) config for OdpsDataset #146
- Always lazy init odps writer #178
Upgrade
- Upgrade pytorch to v2.7 and torchrec to v1.2.0 #197
Note
For TorchEasyRec 0.8.x, you should use Docker image version 0.8.
- For the GPU version (CUDA 12.6): mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.8-cu126
  - PyTorch: v2.7, CUDA: v12.6, FBGEMM: v1.2.0, TorchRec: v1.2.0, Python: v3.11
  - We drop support for the 470 GPU driver version. If you still want to use the 470 GPU driver version, you can set LD_LIBRARY_PATH=/usr/local/cuda-12.6/compat
- For the CPU version: mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.8-cpu
  - PyTorch: v2.7, FBGEMM: v1.2.0, TorchRec: v1.2.0, Python: v3.11
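If you launch training from a Python wrapper on a host with the 470 driver, the LD_LIBRARY_PATH setting from the note above can be applied to the child process environment. The sketch below is one way to do that; the helper name `with_cuda_compat` is hypothetical, and only the compat path comes from the note. The dynamic loader generally reads LD_LIBRARY_PATH when a process starts, so the variable is set for a new process rather than changed inside one that is already running.

```python
import os

# Compat directory quoted in the release note for the 0.8-cu126 image.
COMPAT_DIR = "/usr/local/cuda-12.6/compat"


def with_cuda_compat(env, compat_dir=COMPAT_DIR):
    """Return a copy of env with compat_dir placed first on LD_LIBRARY_PATH."""
    new_env = dict(env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    # Drop empty entries and any existing copy of compat_dir to avoid duplicates.
    parts = [p for p in current.split(os.pathsep) if p and p != compat_dir]
    new_env["LD_LIBRARY_PATH"] = os.pathsep.join([compat_dir] + parts)
    return new_env


env = with_cuda_compat(os.environ)
# Pass `env` to subprocess.run(<your training command>, env=env).
```

Exporting the variable in the shell or passing it to the container at start-up achieves the same result without any Python code.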
Bug Fixes and Other Changes
- [bugfix] fix cpu docker image build without trt by @tiankongdeguiji in #100
- add_dssm_recall_benchmark by @chengaofei in #101
- [feat] support ignore unused features in negative sampler by @tiankongdeguiji in #102
- [bugfix] fix multi-val sequence embedding nan when pooling_type = mean by @tiankongdeguiji in #104
- [feat] upgrade ruff to 2025 code style by @tiankongdeguiji in #105
- [bugfix] fix correctness of kjt.lengths when ShardedEmbeddingBag’s pooling_type is mean and shard_type is row_wise by @tiankongdeguiji in #106
- [feat] upload feature assets to odps and fix remove_bucketizer in create_fg_json by @tiankongdeguiji in #103
- [bugfix] fix multi-value sequence raw feature by @tiankongdeguiji in #109
- [bugfix] fix dice init params by @tiankongdeguiji in #108
- [feat] docker support rtx gpu by @tiankongdeguiji in #111
- [bugfix] fix create_fg_json invalid option force_update_resource by @chengaofei in #110
- [feat] clean fg_encoded config by @tiankongdeguiji in #112
- [bugfix] prevent redundant file uploading to odps when use create_fg_json by @tiankongdeguiji in #113
- [bugfix] fix modify feature group config in training by @chengaofei in #118
- [feat] refactor weighted id feature with pyfg 0.4.5 encoded format by @tiankongdeguiji in #117
- [bugfix] fix feature.keys() none error when all features in embedding group are zch by @tiankongdeguiji in #120
- [feat] refine pyarrow type to odps table type convert by @tiankongdeguiji in #121
- [bugfix] bump up pyfg version 0.4.8 to fix sequence_length in config < true sequence length in data by @tiankongdeguiji in #130
- Support non null string list by @yanzhen1233 in #128
- [bugfix] fix mc-ebc divisor none error when use mean pooling by @tiankongdeguiji in #133
- [bugfix] fix feature permute when use mc-ebc and mean pooling by @tiankongdeguiji in #134
- support feature_groups select features by @chengaofei in #135
- [bugfix] fix export model with zch by @tiankongdeguiji in #136
- bugfix_export_input_tile_is_2 by @chengaofei in #138
- fix the feature bug: has_dag by @yjjinjie in #140
- [bugfix] add missing dataset utils test by @tiankongdeguiji in #139
- [bugfix] add ArrowInvalid retry for refresh odps session by @tiankongdeguiji in #141
- [feat] remove redundant side_inputs warn when fg_mode=FG_NONE by @tiankongdeguiji in #142
- [bugfix] fix autodis parameter init in dist mode by @eric-gecheng in #143
- [bugfix] fix mlp embedding param init bug by @eric-gecheng in #145
- [bugfix] revert emb_impl call by @tiankongdeguiji in #147
- [bugfix] fix ple typo by @tiankongdeguiji in #150
- [feat] add kernel config and BaseModule by @tiankongdeguiji in #151
- [feat] refactor label_name to label tensor in loss and metric impl by @tiankongdeguiji in #153
- [feat] update maxcompute vpc endpoint and quota doc by @tiankongdeguiji in #155
- [feat] add auto rebalance doc for ParquetDataset by @tiankongdeguiji in #156
- [feat] add odps dataset ci test by @tiankongdeguiji in #159
- [feat] add nightly build wheel and doc by @tiankongdeguiji in #160
- [feat] add benchmark and nightly test by @tiankongdeguiji in #161
- [bugfix] fix build nightly wheel by @tiankongdeguiji in #162
- [bugfix] fix regression metric by @tiankongdeguiji in #163
- [bugfix] fix fork repo cpu ci by @tiankongdeguiji in #166
- fix loop logic in hitrate.py by @eric-gecheng in #165
- [bugfix] fix combo feature value and length mismatch when input data with only one separator by @tiankongdeguiji in #168
- [bugfix] fix mtl weight always equal to 1 after div by mean by @tiankongdeguiji in #171
- [feat] optimize dssm and mtl with weight benchmark by @tiankongdeguiji in #173
- [bugfix] fix clear_variational_dropout and visualize flag of feature selection by @tiankongdeguiji in #174
- [feat] add fg value_type config and make num_buckets default value_dtype as string by @tiankongdeguiji in #175
- fix convert_easyrec_config_to_tzrec_config.py bug by @yanzhen1233 in #169
- [bugfix] fix remove_bucketizer of create_fg_json tool by @tiankongdeguiji in #182
- [bugfix] fix feature inputs of id feature & combo feature when fg_mode=FG_BUCKETIZE by @tiankongdeguiji in #183
- [bugfix] revert test tearDown by @tiankongdeguiji in #186
- [bugfix] fix wide_embedding_dim in deepfm by @eric-gecheng in #189
Full Changelog: v0.7.0...v0.8.0