Release v0.8.0 · alibaba/TorchEasyRec

Major Features and Improvements

Train/Eval/Export

Support evaluating and saving checkpoints by epoch #116
Support export fp32/fp16/int8/int4/int2 ebc embedding quant model #137
Enhance export efficiency by restoring state dict directly instead of copying and gathering #177
Add faiss gpu support for evaluation #170
Enhance optimizer state loading for changed plans with plan checkpoint #185
Support tensorboard log for model parameters #181
Add checkpoint restore check when continuing training #180
Add allow_tf32 flag and global embedding param constraint #188
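
As a rough illustration of what the lower-precision embedding export options (int8/int4/int2) trade off, the sketch below quantizes a single embedding row with a simple per-row min/max scheme. This is an assumption made for illustration only, not necessarily the scheme TorchEasyRec or FBGEMM uses; `quantize_row` and `dequantize_row` are hypothetical helper names.

```python
def quantize_row(row, bits=8):
    """Quantize one embedding row to unsigned integer codes.

    Illustrative per-row min/max scheme (an assumption, not the library's code):
    each value maps to an integer in [0, 2**bits - 1] with a per-row scale and bias.
    """
    levels = (1 << bits) - 1
    lo, hi = min(row), max(row)
    # Guard against a constant row, where hi == lo would give a zero scale.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in row]
    return codes, scale, lo


def dequantize_row(codes, scale, bias):
    """Map integer codes back to approximate float values."""
    return [c * scale + bias for c in codes]


row = [-0.5, 0.0, 0.25, 1.0]
for bits in (8, 4, 2):
    codes, scale, bias = quantize_row(row, bits)
    restored = dequantize_row(codes, scale, bias)
    max_err = max(abs(a - b) for a, b in zip(row, restored))
    # Fewer bits means fewer levels, so the reconstruction error grows.
    print(bits, codes, round(max_err, 4))
```

Under this scheme the reconstruction error per value is bounded by half the per-row scale, which is why narrower formats save memory at the cost of precision.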

Model

Add MIND model #119 #123 #157 #172
Add RocketLaunching model #129
Add DLRM model #148
Add MaskNet #179 #187
Add Dice activation and support batch norm (bn) for sequence MLP #107
Add regression and multi-classification metric #149
Optimize distributed GAUC memory use #127
Add SequenceEmbeddingGroup and support jagged forward #152
Support max sequence length setting for sequence encoder #184
Support hard negative sampler #195
Optimize HSTU training and sampling process and add triton ops (WIP) #93 #154
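
For readers unfamiliar with the Dice activation listed above, here is a minimal pure-Python sketch of the formula from the Deep Interest Network paper: the output is `p * s + (1 - p) * alpha * s`, where `p` is a sigmoid of the standardized input. The function name, the fixed `alpha` default, and the use of statistics computed over the given list are assumptions for illustration; in practice `alpha` is typically learned and the statistics come from a batch-norm layer, and this is not TorchEasyRec's implementation.

```python
import math


def dice(values, alpha=0.25, eps=1e-8):
    """Apply a Dice-style activation to a list of floats.

    Sketch only: mean and variance are taken over `values` itself,
    and `alpha` is a fixed constant rather than a learned parameter.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    out = []
    for v in values:
        # Gate in (0, 1): close to 1 for inputs well above the mean.
        p = 1.0 / (1.0 + math.exp(-(v - mean) / math.sqrt(var + eps)))
        # Blend the identity branch with the alpha-scaled branch.
        out.append(p * v + (1.0 - p) * alpha * v)
    return out


print(dice([-2.0, 0.0, 2.0]))
```

With `alpha=1.0` the two branches coincide and the function reduces to the identity, which is a quick sanity check on the blend.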

Feature

Support custom feature and custom sequence feature #144
Weighted id feature supports map dtype #190
Dumping parsed inputs supports weighted id and multi-value sequence features #191

Dataset

Support dataset shuffle #114
Optimize performance of ParquetDataset and rebalance parquet files dynamically #125 #126
Add odps read session refresh to extend the odps session expiration time #132
Add more Alibaba Cloud credentials for odps dataset #115
Add odps_data_compression (ZSTD) config for OdpsDataset #146
Always lazy init odps writer #178

Upgrade

Upgrade pytorch to v2.7 and torchrec to v1.2.0 #197

Note

For TorchEasyRec 0.8.x, you should use Docker image version 0.8.

For the GPU version (CUDA 12.6):
- mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.8-cu126
- PyTorch: v2.7, CUDA: v12.6, FBGEMM: v1.2.0, TorchRec: v1.2.0, Python: v3.11
- Support for the 470 GPU driver version is dropped. If you still need to run on a 470 driver, set LD_LIBRARY_PATH=/usr/local/cuda-12.6/compat.

For the CPU version:
- mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-devel:0.8-cpu
- PyTorch: v2.7, FBGEMM: v1.2.0, TorchRec: v1.2.0, Python: v3.11
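
For the 470-driver workaround in the GPU note above, the dynamic loader reads LD_LIBRARY_PATH when a process starts, so the variable has to be set before the training process launches. The sketch below builds an environment with the compat directory prepended; `env_with_cuda_compat` is a hypothetical helper, not part of TorchEasyRec, and only the directory path is taken from the note.

```python
import os

# Path taken from the release note; everything else here is illustrative.
CUDA_COMPAT_DIR = "/usr/local/cuda-12.6/compat"


def env_with_cuda_compat(base_env=None, compat_dir=CUDA_COMPAT_DIR):
    """Return a copy of the environment with compat_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Drop empty entries and any earlier copy of compat_dir to avoid duplicates.
    parts = [p for p in existing.split(os.pathsep) if p and p != compat_dir]
    env["LD_LIBRARY_PATH"] = os.pathsep.join([compat_dir] + parts)
    return env


print(env_with_cuda_compat({"LD_LIBRARY_PATH": "/opt/lib"})["LD_LIBRARY_PATH"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting your usual launch command from Python.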

Bug Fixes and Other Changes

[bugfix] fix cpu docker image build without trt by @tiankongdeguiji in #100
add_dssm_recall_benchmark by @chengaofei in #101
[feat] support ignore unused features in negative sampler by @tiankongdeguiji in #102
[bugfix] fix multi-val sequence embedding nan when pooling_type = mean by @tiankongdeguiji in #104
[feat] upgrade ruff to 2025 code style by @tiankongdeguiji in #105
[bugfix] fix correctness of kjt.lengths when ShardedEmbeddingBag’s pooling_type is mean and shard_type is row_wise by @tiankongdeguiji in #106
[feat] upload feature assets to odps and fix remove_bucketizer in create_fg_json by @tiankongdeguiji in #103
[bugfix] fix multi-value sequence raw feature by @tiankongdeguiji in #109
[bugfix] fix dice init params by @tiankongdeguiji in #108
[feat] docker support rtx gpu by @tiankongdeguiji in #111
[bugfix] fix create_fg_json invalid option force_update_resource by @chengaofei in #110
[feat] clean fg_encoded config by @tiankongdeguiji in #112
[bugfix] prevent redundant file uploading to odps when use create_fg_json by @tiankongdeguiji in #113
[bugfix] fix modify feature group config in training by @chengaofei in #118
[feat] refactor weighted id feature with pyfg 0.4.5 encoded format by @tiankongdeguiji in #117
[bugfix] fix feature.keys() none error when all features in embedding group are zch by @tiankongdeguiji in #120
[feat] refine pyarrow type to odps table type convert by @tiankongdeguiji in #121
[bugfix] bump up pyfg version 0.4.8 to fix sequence_length in config < true sequence length in data by @tiankongdeguiji in #130
Support non null string list by @yanzhen1233 in #128
[bugfix] fix mc-ebc divisor none error when use mean pooling by @tiankongdeguiji in #133
[bugfix] fix feature permute when use mc-ebc and mean pooling by @tiankongdeguiji in #134
support feature_groups select features by @chengaofei in #135
[bugfix] fix export model with zch by @tiankongdeguiji in #136
bugfix_export_input_tile_is_2 by @chengaofei in #138
fix the feature bug: has_dag by @yjjinjie in #140
[bugfix] add missing dataset utils test by @tiankongdeguiji in #139
[bugfix] add ArrowInvalid retry for refresh odps session by @tiankongdeguiji in #141
[feat] remove redundant side_inputs warn when fg_mode=FG_NONE by @tiankongdeguiji in #142
[bugfix] fix autodis parameter init in dist mode by @eric-gecheng in #143
[bugfix] fix mlp embedding param init bug by @eric-gecheng in #145
[bugfix] revert emb_impl call by @tiankongdeguiji in #147
[bugfix] fix ple typo by @tiankongdeguiji in #150
[feat] add kernel config and BaseModule by @tiankongdeguiji in #151
[feat] refactor label_name to label tensor in loss and metric impl by @tiankongdeguiji in #153
[feat] update maxcompute vpc endpoint and quota doc by @tiankongdeguiji in #155
[feat] add auto rebalance doc for ParquetDataset by @tiankongdeguiji in #156
[feat] add odps dataset ci test by @tiankongdeguiji in #159
[feat] add nightly build wheel and doc by @tiankongdeguiji in #160
[feat] add benchmark and nightly test by @tiankongdeguiji in #161
[bugfix] fix build nightly wheel by @tiankongdeguiji in #162
[bugfix] fix regression metric by @tiankongdeguiji in #163
[bugfix] fix fork repo cpu ci by @tiankongdeguiji in #166
fix loop logic in hitrate.py by @eric-gecheng in #165
[bugfix] fix combo feature value and length mismatch when input data with only one separator by @tiankongdeguiji in #168
[bugfix] fix mtl weight always equal to 1 after div by mean by @tiankongdeguiji in #171
[feat] optimize dssm and mtl with weight benchmark by @tiankongdeguiji in #173
[bugfix] fix clear_variational_dropout and visualize flag of feature selection by @tiankongdeguiji in #174
[feat] add fg value_type config and make num_buckets default value_dtype as string by @tiankongdeguiji in #175
fix convert_easyrec_config_to_tzrec_config.py bug by @yanzhen1233 in #169
[bugfix] fix remove_bucketizer of create_fg_json tool by @tiankongdeguiji in #182
[bugfix] fix feature inputs of id feature & combo feature when fg_mode=FG_BUCKETIZE by @tiankongdeguiji in #183
[bugfix] revert test tearDown by @tiankongdeguiji in #186
[bugfix] fix wide_embedding_dim in deepfm by @eric-gecheng in #189

Full Changelog: v0.7.0...v0.8.0
