v0.5.0
Changelog
v0.5.0 (8/10/2021)
Highlights
- First class support for eager execution. The deprecated APIs are moved to oneflow.compatible.single_client
 - Drop-in replacement of import torch for existing PyTorch projects. You could test it by interchanging import oneflow as torch and import torch as flow.
 - nn.Module for eager execution
 - nn.Graph for lazy execution
 - DDP for data parallel
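
One way to try the drop-in claim without editing every import is sketched below. It assumes a hypothetical helper, load_backend, which is not part of OneFlow or PyTorch; it imports the first installed package from a preference list and binds it to the name the existing code already uses.

```python
import importlib
import importlib.util


def load_backend(preferred=("oneflow", "torch")):
    """Return the first importable module from `preferred`, or None.

    Hypothetical helper for illustration; not part of OneFlow or PyTorch.
    """
    for name in preferred:
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    return None


# Bind whichever backend is installed to the name existing code expects.
torch = load_backend()
if torch is None:
    print("neither oneflow nor torch is installed")
else:
    print("using backend:", torch.__name__)
```

Reordering the tuple switches the preferred framework. Whether a given project then runs unchanged still depends on which torch APIs it calls.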
 
A sneak peek of the new API
Here is a minimal example showing how to incorporate an nn.Module in an nn.Graph and run it in lazy mode.
class NeuralGraph(flow.nn.Graph):
    def __init__(self, ...):
        super().__init__()
        self.model = model # model is a nn.Module instance
    def build(self, x):
        y_pred = self.model(x)
        return y_pred
graph = NeuralGraph() # to create a nn.Graph instance
y_pred = graph(x) # to run the created nn.Graph

New in Python API
- [feature][eager][op][test][python][interface] Add test for convtranspose2d #5239
 - [enhancement][python][interface] Add GroupNorm #5175
 - [enhancement][eager][python][interface] [Add] avgpool1d avgpool3d #5165
 - [feature][eager][op][python][interface] Add deconv cpu impl #5224
 - [bug][eager][api][python][interface] Fix acosh bug #5221
 - [feature][eager][op][python][interface] Dev modules ctc loss #5168
 - [bottleneck][bug][documentation][python][interface] Fix meshgrid test bug #5208
 - [eager][documentation][python][interface] Rename CosineScheduler to CosineAnnealingLR #5112
 - [feature][eager][python][interface] Add meshgrid module #5205
 - [enhancement][feature][bug][op][python] support bias in conv2d's parameter list #5322
 - [eager][documentation][api][python][interface] add not_equal, greater_equal and less_equal module #5350
 - [enhancement][eager][python] refine pow module and its test #5319
 - [enhancement][eager][op][python] Add triu op #5329
 - [enhancement][bug][python] Fix optimizer for not supporting all kinds of iterables #5355
 - [bug][python][interface] raise IndexError in get_canonical_index to support for loop #5345
 - [bug][python][interface] tensor slice assign supports broadcasting #5344
 - [enhancement][op][python] add cpu group conv logic #5314
 - [enhancement][python] Add 'nn.Mish' module and corresponding functions #5310
 - [enhancement][build][python] Remove ONNX from setup py #5297
 - [enhancement][python][interface] [add] zeropad2d #5278
 - [feature][system][python][interface] Lazy nn.Graph FeedInputOpExpr #5458
 - [feature][python][interface] integrate nn.image.flip #5411
 - [bug][python] Fix issues in point of MultiClientSession #5469
 - [enhancement][bug][python] update HasAllMultiClientEnvVars() #5459
 - [enhancement][python] Add in_top_k function #5428
 - [enhancement][python] Dev add docstring #5449
 - [feature][api][python] MultiClientSession #5407
 - [documentation][python] remove --user #5431
 - [feature][python][interface] nn.Graph python #5309
 - [feature][python][interface] Fea/nn graph/graph name #5413
 - [bug][python][interface] rm nn.Graph.train #5424
 - [op][documentation][api][python][interface] add bernoulli module #5353
 - [enhancement][python] flow.S/B/P #5306
 - [enhancement][documentation][python] Add instruction on upgrade pip #5400
 - [enhancement][python] Rm oneflow export and experimental #5589
 - [bug][python] Fix nn.graph.utils module conflict #5598
 - [feature][ci][python] Update autotest framework #5520
 - [enhancement][python] copy of_proto_python_dir to compatible_single_client_python #5539
 - [enhancement][api][python] del default env init #5537
 - [enhancement][python] Fix single client using same glog file #5535
 - [bug][api][python] Fix Session TryClose #5531
 - [enhancement][feature][python] split vector-matrix norm #5478
 - [feature][eager][op][python][interface] Add more upsample kernel #5382
 - [enhancement][feature][test][python] add torchstyle unittest #5489
 - [feature][system][python] nn.Graph with training #5662
 - [enhancement][feature][python] Fea/nn graph/block proxy func #5727
 - [enhancement][api][python] consistent_tensor_to_api #5703
 - [feature][eager][op][python] Dev Align torch avgpool #5610
 - [enhancement][python] fix circular deps of sbp python module #5706
 - [documentation][python] [part5]Remove singleclient outdated api #5674
 - [enhancement][python] [part4]Remove singleclient outdated api #5672
 - [bug][op][python] remove outdated code in conv3d #5696
 - [enhancement][test][python] enlarge tolerance of dataloader test #5689
 - [enhancement][test][python] add autotest for some math ops #5646
 - [feature][python] nn.Graph optimizer part 2: add L2, pass job complete, refactor #5604
 - [enhancement][python] Add clip_grad_norm #5299
 - [purge][python] Remove Single-Client API in oneflow default python #5827
 - [bug][python] Fix ddp grad size #5834
 - [enhancement][feature][python] Dev RMSprop graph conf #5768
 - [enhancement][purge][eager][python] remove scale arg in optimizer #5821
 - [enhancement][feature][python] graph/block io check #5803
 - [enhancement][feature][python] Dev adam graph conf #5709
 - [purge][python] [part10]Remove singleclient outdated api #5756
 - [feature][api][python] better repr of nn.Graph for debug #5762
 - [bug][python] fix weight decay in RMSprop #5755
 - [purge][python] [part9]Remove singleclient outdated api #5752
 - [purge][python] [part8]Remove singleclient outdated api #5750
 - [documentation][python] add first batch of methods in oneflow.nn.functional namespace #5693
 - [purge][python] [part6]Remove singleclient outdated api #5704
 - [bug][python] use default_generator.seed() as random_seed in init #5721
 - [bug][system][python] ddp broadcast params and buffers #5913
 - [enhancement][test][python] Add consistent tensor requires grad test #5925
 - [bug][python] wrap flow.nn.init.* with flow.no_grad() #5932
 - [feature][api][python][interface] add clip_grad to optimizer #5817
 - [enhancement][ci][op][test][python] add randperm with test and docs #5680
 - [feature][api][python] Fea/nn graph/ lr_schedule(and cosine lr_sch) and opt_group #5846
 - [bug][python] fix bug of SyncOnMasterFn atexit #5909
 - [purge][python] Delete single client nn modules #6061
 - [enhancement][python] Move framework.distribute to env #6022
 - [bug][python] skip sync when abnormally exiting #6025
 - [feature][python] Fea/nn graph/warmup amp config #5969
 - [documentation][python] add optimizer api docs #6131
 - [documentation][python] add_tensor_api_doc #6127
 - [bug][python] Fix test_grid_sample.py and test_affine_grid.py threshold #6125
 - [documentation][api][python] add doc of graph #6093
 - [bug][python] Fix make of_format fail in ubuntu #6120
 - [feature][api][python][interface] Fea/graph helpers #6088
 - [enhancement][eager][python][interface] Use flow.randint in dataloader #6086
 - [feature][eager][api][python][interface] Import oneflow as torch #6076
 - [enhancement][test][api][python][refactor] rename OfrecordReader to OFRecordReader #6090
 - [purge][python][need-single-client-tests] Delete single client nn modules #6082
 - [enhancement][python] flow.load tolerates FileNotFound fault #6083
 - [feature][python] Fea/pipeline in graph #6105
 - [enhancement][test][python] graph activation checkpointing #6192
 - [enhancement][feature][op][python] rnn test #6165
 
New in Ops:
- [enhancement][op][api][refactor] [Functional] Part2: Add partial unary and math functional apis #5218
 - [enhancement][bug][op][interface] Refine deconv kernel #5229
 - [enhancement][op][api][interface] add ReflectionPad2d #5172
 - [feature][eager][op][api][interface] crossentropyloss and nllloss support ignore_index #5195
 - [feature][eager][op][api][interface] Yejiaojiao/dev bcewithlogitsloss #5173
 - [bug][ci][op] Dev user op set default is_dynamic #5223
 - [enhancement][op] add magic method for pow #5199
 - [enhancement][op][interface] add cpu version of upsampling #5194
 - [enhancement][bug][op][api][interface] add ReplicationPad2d #5148
 - [feature][eager][op][api][interface] add kldivloss module #5155
 - [feature][eager][op][documentation][build][api][interface] Add floor module and the corresponding testcases #4964
 - [enhancement][feature][op] Dev conv1d module #5280
 - [enhancement][op] Add ctc_greedy_decoder op #5294
 - [enhancement][op][system] Dev remove default grad func #5320
 - [enhancement][op][system] Add pad grad func. #5354
 - [enhancement][op][system] Add gradient funcs. #5348
 - [feature][purge][bug][eager][op][interface] fix upsample nearest bug #5347
 - [enhancement][op][system] [Functional] Part7: Migrate pooling ops #5253
 - [enhancement][op] nvjpeg hardware acc #5240
 - [enhancement][feature][ci][eager][op][api][interface] Add bmm module #5334
 - [enhancement][eager][op] Dev image decode eager #5333
 - [enhancement][op] Optimize softmax warp impl #4977
 - [enhancement][eager][op] Dev tensor buffer eager #5317
 - [enhancement][op][api][refactor] [Functional] Part6: Migrate conv op #5252
 - [enhancement][eager][op] Dev sort eager #5284
 - [enhancement][bug][op][api] fix bceloss bug in default weight and reduction #5303
 - [bug][eager][op] remove redundant assert and check #5264
 - [enhancement][bug][ci][op] fix bceloss bug about weight #5269
 - [enhancement][op][api][refactor] [Functional] Part5: Migrate nn ops #5249
 - [enhancement][eager][op] Dev argsort eager #5273
 - [enhancement][op][api][refactor] [Functional] Part4: Migrate array ops #5247
 - [enhancement][op][api][refactor] [Functional] Part3: Migrate binary and activation ops #5246
 - [bug][ci][op][test] Dev fix rmsprop ci fail #5481
 - [enhancement][op] add inplace method: Tensor.sin_ #5471
 - [bug][op] hotfix image_batch_align #5461
 - [enhancement][eager][op][interface] Dev maxpool series op 123d #5244
 - [bug][op] fix pool gpu kernel #5446
 - [feature][eager][op][api][interface] add pixelshufflev2 module #5383
 - [enhancement][feature][ci][eager][op][documentation][api][interface] Add flow xxx and tensor xxx autotest #5386
 - [enhancement][feature][eager][op][api][interface] Modules chunk #5324
 - [enhancement][eager][op] add image normalize for eager #5402
 - [enhancement][eager][op] Dev batch align module #5401
 - [enhancement][eager][op] add coco reader module #5391
 - [enhancement][wip][op] Restruct Elementwise kernel #4130
 - [bug][op] Fix DecodeRandom reuse mem #5606
 - [enhancement][op] Align pytorch maxpool #5525
 - [enhancement][bottleneck][eager][op][api] implementation of constantpad-3d op #5529
 - [enhancement][eager][op] Add scale size for resize #5509
 - [enhancement][op][api][refactor] Dev optimize tensor setitem #5501
 - [enhancement][op] register uint8 dtype to support dataloader #5499
 - [enhancement][op] Add unique.cuh #5487
 - [enhancement][op][api][interface] Dev ofrecord auto truncating #5412
 - [feature][op][system][interface] Feat: LazyInterpret::ApplyImpl support SourceUserOpExpr and Copy #5711
 - [enhancement][op][interface] Dev logical_and/or modules #5636
 - [enhancement][op] support any number positional arguments for ones and zeros op #5698
 - [enhancement][feature][eager][op] Add conv3d Module #5327
 - [feature][eager][op][api][interface] add batchnorm3d module #5631
 - [bug][eager][op] fix reduce min max backward bug #5651
 - [enhancement][op] Debug dim scatter #5371
 - [enhancement][op][interface] Dev eye #5583
 - [enhancement][eager][op] Dev minimum maximum #5576
 - [enhancement][op] Restruct activation grad op #5669
 - [enhancement][feature][eager][op] Rewrite activation function #5465
 - [bug][op][documentation] add oneflow.cat for documentation #5621
 - [enhancement][op] Lcy logsoftmax #5746
 - [feature][op][need-simple-ci] Feat empty op #5659
 - [enhancement][eager][op] Dev split #5714
 - [enhancement][op][interface] add index_select op #5661
 - [bug][op] fix nvjpeg hw acc #5851
 - [enhancement][op] Remove move in conv_cudnn #5828
 - [enhancement][op][interface] Dev logical_xor module #5694
 - [bug][eager][op] fix squeeze #5808
 - [enhancement][op] Get parallel_id and parallel_num through rank and world size in DDP #5717
 - [bug][eager][op] delete interpolate int type #5805
 - [bug][op] Fix bug in scatter #5743
 - [enhancement][op] Refactor: remove module not required, call function directly #5754
 - [enhancement][op] Remove modules not required(tan, erfc, log1p, scatter_nd) #5791
 - [enhancement][op] Refactor scatter, clamp and pow in cpp instead of in python #5715
 - [enhancement][op] Rm useless code in gather files #5687
 - [enhancement][eager][op] change flip_code to scalar #5786
 - [enhancement][bug][op][interface] fix upsample bug #5753
 - [bug][op][interface] Quick fix Lazy nn.Graph input/output OpConf.BlobConf.is_dynamic #5767
 - [enhancement][bug][eager][op] fix argwhere 0-dim bug #5760
 - [enhancement][eager][op] delete unused code #5744
 - [feature][op] Export fused_scale_tril op #5933
 - [bug][op] Fix backward bug in 3d #5908
 - [bug][op] Fix one_hot api limit #5927
 - [enhancement][eager][op] Dev where scalar #5797
 - [bug][op] fix grad error #5914
 - [feature][bug][op] Fix inplace op circle reference bug #5910
 - [enhancement][op] Move the judgment content to c++, And add scalar fmod #5854
 - [enhancement][op] Support combined_margin_loss op in flow.nn.modules #5830
 - [enhancement][op][api][interface] functional_one_hot #5315
 - [enhancement][op] Dev scalar op #5778
 - [bug][eager][op] fix gather kernel 0 shape #5888
 - [enhancement][op] add l2_normalize for multi-client interfaces #5859
 - [feature][op] Export function softmax_cross_entropy #6056
 - [enhancement][op] Add int attr for functional adaptive average pool #6059
 - [enhancement][op][interface] dev full op #5955
 - [bug][eager][op] fix 0dim inplace add #6029
 - [feature][op][system][interface] Feat: nn.Graph image gpu decoder #6014
 - [enhancement][op][interface] dev optim_optim_lr_scheduler_multisteplr #5975
 - [enhancement][op] NopKernel #6035
 - [enhancement][eager][op][api] Dev tril op #6005
 - [enhancement][op] dev unfold and fold #5675
 - [enhancement][op] ResNet CUDA Graphs #6018
 - [enhancement][feature][op] add broadcast pow #6013
 - [enhancement][op][interface] init of op diag #5298
 - [op][documentation][api] Fix api document bug #6009
 - [enhancement][op] Dev fused functional #5954
 - [bug][op][build] Add nvcc flag -Werror cross-execution-space-call #6002
 - [bug][op] Fix Normalization grad function #5993
 - [enhancement][feature][eager][op][test][interface] Add fused self attention #5966
 - [enhancement][bug][ci][eager][op][api][interface] Try to fix var bug #5973
 - [enhancement][feature][eager][op][interface] add prod op #5867
 - [enhancement][eager][op][api] add glu op #6065
 - [enhancement][op] Align Torch.nn.functional poolXd #6184
 - [bug][eager][op] fix backward index for gamma beta #6149
 - [bug][op][system] Fix BroadcastMatmulGrad bug #6168
 - [enhancement][op][api] Add Int support for functional.avg/maxpool #6174
 - [bug][eager][op][api][interface] align dropout api name with pytorch #6170
 - [enhancement][op] support inplace operation for hardsigmoid #6137
 - [enhancement][bug][op] Fix do bias correction in Adam/AdamW #5960
 - [bug][eager][op][api][interface] fix repeat 0-dim tensor bug #6150
 - [enhancement][bug][op] Fix select_first_grad bug #6142
 - [bug][ci][eager][op][documentation][interface] Add clipgrad doc and contiguous #6130
 - [bug][op] Fix eager optim dynamic attr bug #6111
 - [enhancement][op] Support grid_sample and affine_grid operator #6038
 - [op][documentation] Export apis for documentation #6068
 - [enhancement][feature][bug][ci][eager][op][documentation][interface] transfer python function to c++ method #6114
 - [op][documentation] Dev functional batch_gather #6233
 - [enhancement][op][test] fix cross_entropy_loss and its test #5799
 - [bug][op] Use attr nd_sbp to check consistent #6222
 - [enhancement][op] Dev fused bn functional #6077
 - [enhancement][op] support default value in intlist #6201
 - [bug][op] fix sparse_softmax get_nd_sbp #6203
 - [bug][op] Fix bug in model fused update #6197
 - [enhancement][op][system][refactor] Optimize tensor getitem. #5433
 
New in Eager:
- [enhancement][eager][interface] Reconstruct module files #5251
 - [bug][eager][documentation][interface] Fix conv module bug #5245
 - [bug][ci][eager][interface] Fix bce withlogitloss ci error #5237
 - [feature][eager][api][interface] module BCELoss #5144
 - [enhancement][feature][eager][api][interface] Dev norm op #5178
 - [enhancement][bug][eager] Fix stack module #5222
 - [enhancement][feature][eager] Support different dtype of equal module #5214
 - [enhancement][bug][eager][documentation][api][interface] Add nllloss backward #5210
 - [enhancement][eager][api][upload-core] Decouple FileSystem and IOConf #5162
 - [enhancement][ci][eager] Set lower precision avoid ci failing #5200
 - [eager][documentation] Add hint when apply FunctionNode second time #5369
 - [enhancement][feature][bug][ci][eager][documentation][api] Fix upsample bilinear bug #5366
 - [bug][eager] Fix not contiguous ndarray to tensor bug #5351
 - [enhancement][eager][system] Infer consistent tensor meta #5118
 - [feature][eager] Feat graph autograd engine #5296
 - [enhancement][eager][interface] Dev type as module #5349
 - [feature][eager][documentation][api][interface] Add new ones module #5342
 - [enhancement][bug][eager] Fix logical slice assign dtype #5339
 - [bug][ci][eager][documentation][api][interface] Fix where module bug #5300
 - [bug][ci][eager][documentation][api] Fix l1loss ci error #5307
 - [enhancement][bug][eager][documentation][api][interface] Qi's First Edit of deleting "print" and ".numpy" #5129
 - [feature][eager][refactor] Separate autograd meta to tensor #5267
 - [feature][eager][api][interface] add tile module #5234
 - [enhancement][eager] Release lambda function to reuse tensor memory #5266
 - [feature][bug][eager][documentation] Fix default value not set bug #5483
 - [enhancement][eager][interface] [Add] gather_nd scatter_nd #5422
 - [enhancement][bug][eager] fix param #5473
 - [bug][eager] Fix Tensor.grad setter bug #5462
 - [enhancement][eager] Rename now_grad_arg to current_grad #5466
 - [eager][test][documentation][interface] Add autotest part1 #5436
 - [enhancement][eager] Use functional copy instead of op_builder #5460
 - [bottleneck][bug][eager][interface] fix -1 index not support bug #5448
 - [bug][ci][eager][documentation][api] Fix concat backward bug #5443
 - [enhancement][bug][ci][eager] Add autograd engine warning #5444
 - [feature][eager][api][interface] Smoothl1loss #5256
 - [enhancement][bottleneck][eager] remove device dtype params #5434
 - [bug][ci][eager][documentation][interface] Delete maxpool failed test #5409
 - [enhancement][eager][api] Add tensor grad assignment #5379
 - [enhancement][bug][eager] fix-abs #5398
 - [enhancement][bug][eager][interface] Fix bn track running stats #5393
 - [enhancement][bug][eager] Support uint dtype of constant op #5396
 - [enhancement][bug][eager][documentation][interface] Delete useless code upsample #5392
 - [enhancement][ci][eager][interface] add flow.view #5301
 - [enhancement][bug][ci][eager][api][interface] Add masked select module #5356
 - [bug][eager][interface] Fix batchnorm backward bug #5602
 - [enhancement][eager] Support weight_decay (l2 actually) #5587
 - [feature][eager][documentation][api] Add new autotest #5588
 - [enhancement][eager][documentation][api] Dev fmod #5404
 - [feature][eager] Support inplace add #5432
 - [feature][eager][interface] Feat tensor stride property #5543
 - [enhancement][feature][eager][documentation][api] Add flip module #5541
 - [feature][eager] Feat module repr #5486
 - [enhancement][bottleneck][bug][eager][interface] Fix maxpool1d params #5493
 - [enhancement][feature][eager][interface] Dev flow.utils.data part1 #5406
 - [bug][eager][api] Fix tensor getitem bug #5474
 - [enhancement][eager][need-simple-ci] export datasets interface #5691
 - [enhancement][eager][system] rebase #5601
 - [enhancement][eager][test] added nn.RecordBytesDecoder with its test #5475
 - [enhancement][feature][eager][need-simple-ci] 0-dim tensor support #5552
 - [enhancement][bug][eager] rewrite slice_update backward #5677
 - [enhancement][bug][eager][interface] align view input style with torch #5676
 - [enhancement][eager][interface][need-simple-ci] add autotests for modules #5666
 - [enhancement][bottleneck][eager][interface] Dev constantpad1d op #5579
 - [enhancement][eager][api][interface] Restruct MathOps AutoTest #5654
 - [enhancement][bug][ci][eager] Fix flip bug #5657
 - [bug][eager][api][interface] Fix expand module bug #5650
 - [enhancement][bug][eager][documentation][api] Fix repeat bug #5633
 - [enhancement][eager][test][api][interface] Add new autotest #5617
 - [enhancement][eager][api][interface] Dev flow.utils.data part2 #5500
 - [enhancement][bug][eager] make setitem device match #5835
 - [bug][eager][api][interface] align reshape input param with pytorch #5804
 - [feature][bug][eager][api] Align where op with torch #5850
 - [enhancement][bug][eager][api] Restruct prelu op #5829
 - [bug][eager][need-simple-ci] fix pooling ceil_mode bug #5818
 - [enhancement][eager] stateful local kernel supports consistent #5789
 - [bug][eager][api][interface] Fix argwhere bug #5816
 - [enhancement][eager][documentation][api] dev-nonzero #5809
 - [enhancement][feature][eager][api] Add fake quantize op #5690
 - [enhancement][bug][eager][documentation][api] Add api #5663
 - [enhancement][eager] Refactor consistent infer result #5790
 - [bug][eager][need-simple-ci] skip dataloader test #5780
 - [bug][eager][need-simple-ci] fix 0-dim tensor.fill_ #5771
 - [enhancement][eager] Cpu mpi broadcast #5726
 - [feature][eager] Feat grad mode classes #5956
 - [enhancement][bug][eager] fix wrong names #5951
 - [enhancement][eager][system] Local dep object pool #5953
 - [enhancement][eager][interface] rename OpExprInterpState to AutoGradCaptureState #5918
 - [bug][eager] Fix linear bug #5945
 - [bug][eager] Fix tensor_meta update bug #5924
 - [enhancement][eager] use flow.randperm #5928
 - [enhancement][eager] consistent init/save/load #5896
 - [enhancement][bug][eager][documentation][interface] Restruct sort and argsort op #5911
 - [enhancement][bug][eager][interface] Try to fix the problem that the insightface cannot converge. #5906
 - [enhancement][bug][eager][interface] Add autotest #5899
 - [enhancement][eager] The scheduler thread joins worker threads #5893
 - [enhancement][eager] Bugfix async callback #5881
 - [feature][eager] Feat tensor to bool #5836
 - [bug][eager] Remove inplace broadcast_add #5551
 - [enhancement][eager] Broadcast consistent shape and dtype #5784
 - [enhancement][eager] Fix optimizer list parameters input bug #5848
 - [enhancement][eager][interface] Dev flow.utils.data part3 #5644
 - [enhancement][eager][api] Normalize naming of modules #6066
 - [enhancement][feature][eager][api][interface] add truncnormal #6051
 - [enhancement][bug][eager] AutoMatedTest support test module.parameter.grad #6043
 - [enhancement][feature][bug][eager] add module call kwargs #6069
 - [enhancement][eager][api][interface] add tensor.item tensor.tolist #6021
 - [enhancement][eager][api][interface] Export pool ops api #6047
 - [enhancement][bug][eager][test][documentation][interface] Add more autotest sample #6039
 - [enhancement][bug][eager][system] disable cuda_h2d stream #6020
 - [feature][eager][test][api][interface] Add autotest codegen #6019
 - [feature][eager][documentation] Refactor cosine lr scheduler #6000
 - [enhancement][eager][interface] tensor.cpu/tensor.cuda #5894
 - [enhancement][eager][api] Support consistent_tensor.to(dtype) #5991
 - [bug][eager][interface] remove redundant codes in ModuleDict #5961
 - [bug][eager] Fix LayerNorm check bug #6196
 - [enhancement][eager][api] Change dropout api #6182
 - [enhancement][good for pr][eager][api][interface] add: test convert dependency #6023
 - [enhancement][bug][eager][interface] Fix autotest codegen bug #6171
 - [bug][eager] restore instr_local_dep_object_pool_size for nccl #6160
 - [enhancement][eager][api][interface] Align pooling op functional api names with torch #6163
 - [feature][bug][eager][api][interface] delete file #6162
 - [bug][eager] Fix optim load_state_dict bug #6152
 - [enhancement][eager][api] add is_training to dropout functor #6148
 - [enhancement][eager] Decompose nd sbp boxing #5800
 - [enhancement][eager] support consistent_tensor.to(copy=True) #6122
 - [feature][eager] Static grad scaler #6135
 - [bug][eager] Fix LayerNorm expr bug #6121
 - [bug][eager][api] move numpy c api init in numpy.cpp, make np array contiguous before copying #6117
 - [enhancement][eager][refactor] Remove params from ParamGroup getitem #6096
 - [enhancement][feature][eager] Support tensor and optimizer serialization #6087
 - [enhancement][bug][eager] fix bug about tensor str in nonsymmetric cast and getitem in consist… #6239
 - [enhancement][eager] Cpu all reduce #5849
 - [feature][eager] Support assign copy interface #6228
 - [enhancement][eager][api][interface] Dev reconstruct pad ops #6223
 - [enhancement][eager][api][interface] support flow.cuda.is_available #6124
 - [bug][eager] make flow._C.local_all_reduce sync launched #6175
 - [enhancement][eager] Rename flow to oneflow in user hint #6190
 - [bug][eager][tooling][test][api][interface] Autotest generate input tensor #6206
 - [enhancement][eager] consistent tensor zeros_() #6202
 - [enhancement][eager] Cpu mpi #5865
 
Build enhancements:
- [bug][build] Fix GRPC compilation failure on CMake 3.20 #5255
 - [bug][build] Refine header file copy #5254
 - [bug][build] Fix older version CMake doesn't support multiple targets in CLI #5248
 - [bug][build] Turn off NCCL_STATIC/CUDNN_STATIC when CUDA_STATIC is OFF #5243
 - [feature][build] Fix support for Ninja and add Ninja build in Simple CI #5236
 - [enhancement][build] Add cmake option CUDA_STATIC #5164
 - [bug][build] Fix protobuf debug postfix #5233
 - [enhancement][ci][build] Move default third party dir into build dir #5230
 - [enhancement][build] Refine protobuf cmake #5216
 - [enhancement][ci][build] Remove transport test main #5215
 - [enhancement][ci][build] Speedup opencv build #5213
 - [enhancement][build] Support clang #5015
 - [enhancement][documentation][build] Add prefix when creating git archive #5201
 - [enhancement][build] Add cmake option NCCL_STATIC #5160
 - [enhancement][build] Refine CMake CUDA version handling #5192
 - [enhancement][build] Use clang plugin to check Maybe variables are used #5358
 - [enhancement][build] Add BUILD_BYPRODUCTS for ExternalProject_Add #5316
 - [enhancement][build] Add cmake init cache to simplify user onboarding #5311
 - [feature][bug][build] Fix macOS support and run macOS build in Simple CI #4947
 - [enhancement][build] flatbuffers use mirror #5295
 - [enhancement][build] Don't build test by default #5302
 - [enhancement][build] Prevent building from scratch when toggle flag BUILD_GIT_VERSION #5259
 - [enhancement][build] Refine gRPC, glog, gflags cmake for conda #5276
 - [feature][build] Support XLA with CPU-only #5260
 - [enhancement][ci][onnx][build] Remove ONNX from CI #5257
 - [enhancement][build] Refactor build_wheel to support oneflowinc images #5427
 - [enhancement][build] Add arg skip_audit in build wheel #5423
 - [bug][build] hwloc disable shared #5388
 - [documentation][build] Update readme for autoconf and libtool #5376
 - [enhancement][build] remove dir python and compatible_single_client_python #5609
 - [bug][build][system] Fix pyyaml version #5594
 - [enhancement][ci][build] force release flags #5574
 - [bug][build] prevent endless loop #5534
 - [enhancement][build] Support sccache #5528
 - [enhancement][build] Add definition for CMAKE_BUILD_TYPE and print cmake_build_type in oneflow doctor #5505
 - [enhancement][ci][build][need-simple-ci] Fix macOS for recent changes #5705
 - [bug][build] fix return type error on gcc 4.8.5 #5660
 - [enhancement][build] Check CMAKE_BUILD_TYPE #5656
 - [enhancement][build] add -Werror=return-type #5655
 - [enhancement][build] Clean and fix for new py dir #5618
 - [enhancement][build] cmake: disable array-bounds check & treat warnings as errors for pyextobj and oneflow_internal & fix warnings #5838
 - [bug][build] set CMAKE_BUILD_TYPE to Release if undefined #5842
 - [enhancement][build][need-simple-ci] Fix all warnings & Add option TREAT_WARING_AS_ERROR to cmake #5751
 - [enhancement][build] add CMAKE_INTERPROCEDURAL_OPTIMIZATION in fast cmake cache #5970
 - [enhancement][build] add clang tidy target #5957
 - [bug][build] cmake: fix cmake cache args in opencv #5959
 - [enhancement][build] Add cmake option USE_SYSTEM_NCCL #5897
 - [enhancement][build] cmake: include third party headers as system headers to avoid warnings #5879
 - [enhancement][build] Ignore opencv-python on machine aarch64 #5884
 - [enhancement][build] enable CMake first class cuda support #5858
 - [bug][build] Fix compile warning (strict-aliasing) #5872
 - [enhancement][bug][build][need-simple-ci] Upgrade gtest and fix some errors raised by clang #6079
 - [bug][ci][build] cmake: fix ninja build in CI #6072
 - [bug][build] fix files not actually removed when building for multiple python versions #6060
 - [bug][build][api] functional_api: fix build error in mac os #6010
 - [bug][build][need-simple-ci][need-single-client-tests] Fix recompile from scratch #6036
 - [bug][build] Turn on NVCC's warnings #6011
 - [bug][build][need-single-client-tests] fix bundle .so of other python version #6034
 - [bug][ci][build][need-single-client-tests] use copy_all_files_in_dir to replace copy_files #6033
 - [enhancement][build] check compiler version in cmake #6026
 - [enhancement][build] Add CUDA_NVCC_THREADS_NUMBER #6017
 - [enhancement][build][need-simple-ci] optimize of_include_copy #5978
 - [enhancement][ci][build][need-single-client-tests] CI: remove -DTREAT_WARNINGS_AS_ERRORS=OFF #6008
 - [enhancement][build][xla] xrt: fix all warnings #5915
 - [enhancement][build] Prevent opencv compile failure with std 17 #5997
 - [enhancement][build] Use bundled cub #5998
 - [enhancement][ci][build] update clang tidy diff warnings-as-errors option #5989
 - [enhancement][build] Update run_clang_tidy.py to set return code and add warning-as-errors #5977
 - [enhancement][build] check: fix clang-tidy-diff commands #5972
 - [bug][build] Suppress NVCC warning #177-D #6094
 
XLA enhancements:
- [bug][xla] Make the blob header memory aligned. #5286
 
System:
- [enhancement][system] Refactor Memory Zone #5072
 - [enhancement][system] Add interface InferContext::OutputTensorDesc #5219
 - [bug][system] Lazy construct functor to make sure that the operators has already been registered. #5225
 - [enhancement][system] Refactor infer ctx output isdynamic #5220
 - [enhancement][system] Refactor infer ctx input isdynamic #5211
 - [enhancement][system] Wake up the heartbeat thread immediately #5081
 - [enhancement][system] Fix xla test case fail #5203
 - [enhancement][system] Add interface InferContext::InputDType #5153
 - [purge][system] delete const_cast in Output #5196
 - [feature][system] Add hwloc for topology detection #5291
 - [enhancement][system] fix registry may segment #5336
 - [enhancement][system] Use functional api instead of op_expr_helper::XXXOp. #5364
 - [enhancement][system] move btob to op #5274
 - [documentation][system] Add Latest News section in README #5361
 - [enhancement][bug][system] fix dropout module: return directly if not training #5346
 - [bug][system] add missing JUST #5357
 - [documentation][system] Add more communication outlets on README #5359
 - [enhancement][feature][system] CommNet dynamic register memory #5281
 - [enhancement][system] Use symbol device #5341
 - [enhancement][system] fix multithread bug in env #5283
 - [bug][system][api] fix bug in cfg_replacement #5335
 - [bug][system] Fix create log directory thread-unsafe #5326
 - [bug][system] fix_bug_in_make_parallel #5328
 - [enhancement][system][cfg] replace train_conf, job_conf using cfg::xx #5263
 - [enhancement][system][quantization] support tensorrt in qat #5287
 - [enhancement][system][api] Export functional apis for oneflow.experimental. #5313
 - [enhancement][system] fix bug check between cfg enum and proto enum #5285
 - [enhancement][system] replace CHECK_EQ using CHECK_EQ_OR_RETURN #5279
 - [enhancement][system] Refactor SbpXXX to cfg::SbpXXX #5120
 - [enhancement][system][api] add detach for LazyMirroredtensorImpl #5270
 - [enhancement][system] shorten XXIsDynamic4ArgNameAndIndex to be xxIsDynamic #5265
 - [enhancement][system][cfg] job_config to cfg #5235
 - [feature][system] Multi-Client LogicalRun degenerate to PhysicalRun #5479
 - [enhancement][system] fix ConstructOp without JUST #5480
 - [enhancement][system] Output arg modifier return maybe part 1 #5451
 - [feature][system][interface] Fea/nn graph/graph build ctx #5420
 - [enhancement][system] Throw exception if check failed #5457
 - [feature][system] multi client launch #5372
 - [enhancement][system][api] Optimize reduce mean #5452
 - [enhancement][system] export Tensor only to python #5440
 - [enhancement][system] Output arg modifier return maybe part_0 #5447
 - [enhancement][system] ThreadMgr support AddPlan #5450
 - [enhancement][system] Refactor infer ctx input tensordesc #5226
 - [enhancement][system][api] instruction builder return maybe #5442
 - [feature][system][interface] MultiClientSessionContext #5421
 - [enhancement][feature][system] add launcher, update multi client launch and exit #5414
 - [purge][system][refactor] Remove IOConf #5419
 - [enhancement][system] Dev refine generator #5426
 - [enhancement][system] Support inplace operations #5204
 - [enhancement][system][refactor] Dev refactor generator #5397
 - [enhancement][system] Add new placement init func #5408
 - [enhancement][system] NNGraphIf #5387
 - [enhancement][system][refactor] Cast explicitly in unpack call to avoid conflict with Optional. #5380
 - [enhancement][system][interface] [Random Generator] Part2: Migrate functional dropout #5378
 - [enhancement][system] replace ForeignJobInstance using JobInstance #5374
 - [enhancement][system][refactor] Speedup reshape module by 5x. #5381
 - [feature][system][interface] [Random Generator] Part1: Dev random generator #5360
 - [enhancement][system] Add ONEFLOW_STREAM_CUDA_EVENT_FLAG_BLOCKING_SYNC #5612
 - [enhancement][system] [part2]Remove singleclient outdated api #5568
 - [feature][system][interface] nn.Graph call and launch impl #5580
 - [enhancement][system] remove outdated doctest api and "@experimental_api" #5564
 - [feature][system][interface] Register ForeignCallback and Watcher in Multi-Client #5591
 - [enhancement][system] [Part-1]remove outdated api and files of multi-client on master branch #5556
 - [feature][system][interface] LazyInterpret build LocalTensor if input is local #5582
 - [enhancement][system] add job_pass MultiClientAutoSourceAndSinkTick #5507
 - [feature][system] Fea/nn graph/optimizer #5533
 - [feature][system][interface] New/CloseRuntimeBuffers and RunLazyJob impl #5571
 - [feature][system][refactor][interface] NNGraph interface and implement for CompileAndRuntime #5558
 - [feature][system] Fea/nn graph/forward graph #5516
 - [enhancement][system] Lazy job stream type #5389
 - [enhancement][system] Refactor single client autotick #5506
 - [enhancement][system] replace underline using dot in single client #5547
 - [bug][system] fix return type #5548
 - [feature][system][interface] LazyInterpret for UserOpExpr #5544
 - [enhancement][system] Add ProfilerStart/ProfilerStop API #5542
 - [feature][system][interface] LazyInterpreter for FetchOutputOpExpr and set op parallel_distribution #5527
 - [enhancement][system] Multi client push pull #5492
 - [enhancement][system] registry_callback_fn return maybe #5456
 - [enhancement][system] bw_gen_fn return maybe #5455
 - [enhancement][system] gen_bw_fn return maybe #5454
 - [enhancement][system] Compatible single client #5417
 - [feature][system][interface] GlobalMultiClientEnv and refine EagerExecution #5523
 - [enhancement][system] Job pass maybe system #5503
 - [enhancement][system] Remove Plan::net_topo #5502
 - [feature][system][interface] LazyInterpret for FeedVariableOpExpr #5490
 - [enhancement][system] Input arg modifier return maybe #5453
 - [feature][system][interface] Fea/nn graph/block scope #5498
 - [feature][system] jit_fuse_cast_scale #5332
 - [enhancement][system] Remove obsolete Profiler #5747
 - [enhancement][system][api] Dev fix batch norm not stats #5733
 - [enhancement][system] rename rpc_token to TransportToken #5735
 - [enhancement][system][api] Refactor maximum minimum py2cpp #5724
 - [enhancement][system] Replace piece_id with comm_net_sequence_number #5731
 - [enhancement][system] beautify stack frame #5686
 - [enhancement][system] Add env ONEFLOW_KERNEL_DISABLE_BLOB_ACCESS_CHECKER #5728
 - [enhancement][system] Add env ONEFLOW_THREAD_ENABLE_LOCAL_MESSAGE_QUEUE #5720
 - [enhancement][system][api][refactor] Refactor functional sub, mul and div apis #5713
 - [feature][system] ddp #5008
 - [enhancement][system][api][refactor] Refactor functional matmul and add apis. #5697
 - [bug][system] Fix ClearKV("plan") #5710
 - [enhancement][system] Rename cpu to async cpu #5712
 - [enhancement][system] Support tensor.to()/to_local() #5271
 - [feature][system][refactor][interface] Multi-Runtime for multi nn.Graph #5683
 - [bug][system][refactor] Add tag for Optional inplace constructor #5619
 - [enhancement][system] Move Global to env scope #5670
 - [enhancement][system] add JUST wrapper #5681
 - [enhancement][system] New sync consistent meta info #5634
 - [enhancement][system][refactor][interface] Refactor RuntimeCtx for multi-runtime #5664
 - [feature][system][interface] Feat: memory shared between EagerTensor with VariableRegst #5649
 - [enhancement][system] Use functional call directly instead of construct a module and then call-Add #5613
 - [enhancement][system] disable eager_op consistent mode #5647
 - [enhancement][system] add msg_penddin_list in ibverbs_qp to optimize qp_init_attr.cap.max_send_wr #5485
 - [enhancement][system] IBVerbsCommNet add knobs #5626
 - [enhancement][system] Prune python tensor #5596
 - [feature][system][interface] Feat: LazyInterpret infer op / tensor ParallelDescScope #5625
 - [enhancement][system] Replace src tick with wait and send ids #5603
 - [enhancement][system] Support symbol placement type in functional. #5627
 - [enhancement][system][api][refactor][interface] Dev advanced indexing #5559
 - [enhancement][system] Optimize maybe. #5839
 - [enhancement][system] Decorator 4 disable recursive boxing call #5796
 - [enhancement][system] add_eager_boxing_and_op_interpreter_dispatch_error_info #5819
 - [enhancement][system] Kernel CUDA Graphs Support #5725
 - [bug][system] Fix placement print bug #5853
 - [bug][system] when error msg formatting fails, return error->DebugString #5844
 - [enhancement][system][refactor] Rename variables named *parallel_distribution* to *nd_sbp* (1) #5815
 - [feature][system][interface] Support Free EagerTensor caught in nn.Graph build #5777
 - [enhancement][system] Reuse CUDA event / Refine BnInOp2Blob / Refine channel #5837
 - [enhancement][system][serving] fix bug in AddInputOutputOpsPass: check existence of key in HashMap(inferface_lbi2scope_sym_id) #5653
 - [enhancement][system][api] unpack_call: impl new unpack_call_dispatcher for better performance #5820
 - [feature][system] Feat consistent tensor python constructor #5812
 - [feature][system] Support 0shape tensor #5620
 - [documentation][system] fix launcher description #5770
 - [feature][system][interface] Multi-nn.Graph memory reuse by Chunk manager #5658
 - [bug][system] Fix naive b2p error #5806
 - [enhancement][system] set created generator with default rng seed #5801
 - [enhancement][system] enhance_local_to_consistent #5761
 - [feature][system] add flow.randn #5736
 - [enhancement][system] Refactor hierarchical parallel cast autograd #5764
 - [enhancement][system] Collective boxing executor add_plan delete_plan #5495
 - [enhancement][system] Fix throw abort #5795
 - [enhancement][system] DECORATE #5794
 - [enhancement][system] Interface eager boxing #5682
 - [enhancement][system] extract_consistent_to_consistent_op_expr #5870
 - [enhancement][system] disable backward pass consistent tensor meta check. #5871
 - [enhancement][system] Add CudaStreamIndexGenerator::GenerateNamedStreamIndex #5940
 - [bug][system] Only query PCI bus id when CUDA version >= 11 #5937
 - [enhancement][system] maybe: add JUST_MSG and CHECK_JUST_MSG #5904
 - [bug][system] Fix bug scalar #5950
 - [enhancement][system] framework: fix rvalue reference warnings #5948
 - [purge][system] Remove CudaWorkType #5942
 - [enhancement][system] refactor_symbol #5941
 - [bug][system] consistent_tensor_infer_cache: fix memory leak #5938
 - [feature][system] support to print gpu #5936
 - [enhancement][system] Bugfix static check #5935
 - [bug][system] fix nccl_version log #5934
 - [bug][system] Fix bug of multi-GPU train nn.Graph extra mem cost in rank 0 #5930
 - [enhancement][system] Only gradient acc be scheduled in parallel. #5926
 - [enhancement][bug][system] fix_ddp_bug_on_8_process #5929
 - [enhancement][system] Fix bug error msg format #5866
 - [feature][system] print consistent tensor data #5902
 - [bug][system] Move parse env to the constructor #5922
 - [enhancement][system] Remove GlobalWorkStreamId/GlobalThrdId #5917
 - [bug][system] shared_or_scalar: fix alias warnings #5916
 - [purge][system] Remove CompActor #5919
 - [enhancement][system] Use symbol dtype #5641
 - [enhancement][feature][system] Control Graph / Session / Env's python c++ object destruction #5845
 - [enhancement][bug][system] Sync access and assign indexing tensor. #5907
 - [enhancement][system][api][refactor] Dev consistent arange #5883
 - [enhancement][system] Lazy interpreter for new ConsistentToConsistentOpExpr #5903
 - [bug][system] Fix BUG of LazyInterpret FreeEagerTensor memory shared with regst #5891
 - [bug][system] fix typo in raise RuntimeError #5890
 - [enhancement][system][refactor] Rename the ParallelDistribution class to NdSbp #5814
 - [feature][system] add flow.rand #5722
 - [feature][system] Lazy Interpret support infer default device cpu #5880
 - [enhancement][system] Tensor str #5783
 - [feature][system][interface] Lazy to_consistent #5774
 - [enhancement][system] wait vm empty before exiting #5860
 - [enhancement][system] Eager boxing n to 1 #5949
 - [enhancement][system] add kernel observer #6052
 - [enhancement][ci][system] Optimize ddp broadcast and add speed/memory test in ci #6044
 - [enhancement][system] add var to control only print warning once when blocked #6045
 - [enhancement][system][refactor] Rewrite pow and logical functional apis #6032
 - [enhancement][system] Token seq id #5964
 - [enhancement][documentation][system] Remove python function wrapper. #6012
 - [feature][system] Add timeout and loc for blocking calls #6007
 - [enhancement][system] Eager boxing 1 to n #5943
 - [enhancement][system] Boxing expr #6015
 - [enhancement][system] new_X_to_B #5987
 - [enhancement][system] Add unimplemented return information #5952
 - [enhancement][system] Revert "Faster decorator" #6006
 - [enhancement][system] Throw exception if using advanced indexing for tensor setitem #6001
 - [enhancement][system] Support eager boxing sm 2 sn #5869
 - [enhancement][system] Move framework/local_dep_object.* to the eager directory #5988
 - [enhancement][system] Fix builtin op arg tuple. #5464
 - [feature][system][refactor] Dev functional multiple signatures #5982
 - [enhancement][system] Faster decorator #5996
 - [enhancement][system] Placed nd sbp #5995
 - [feature][system] Support asymmetric input/output/variable tensors in nn.Graph #5983
 - [enhancement][system] LightActor #5868
 - [bug][system] Prevent running oneflow in forked subprocess #5976
 - [bug][system] common/error: fix build error in mac os #5971
 - [bug][system] fix_bug_test_tensor_str #5958
 - [enhancement][system] Refine StreamContext #6191
 - [enhancement][system] container_util: fix VectorAt, remove useless MutMapAt #6172
 - [enhancement][system] Typesafe KernelState #6198
 - [enhancement][system] Primitive based copy task node #6195
 - [feature][system][interface] Lazy support Scalar #6181
 - [enhancement][system] Disable implicit boxing when parallel num eq one #6188
 - [enhancement][system] Primitive #6183
 - [enhancement][system] Remove IDMgr::GetGpuPhyIdFromThrdId/IDMgr::GetDeviceTypeFromThrdId #6169
 - [enhancement][system] remove op_expr_helper inside gradient_funcs #6057
 - [feature][system][api] Add tensor yaml, support export tensor functional api. #6099
 - [feature][system] Plan memory log #6151
 - [feature][system] Add dtype bfloat16 #5304
 - [enhancement][system] StreamContext #6129
 - [bug][system] Fix wrong inplace acc grad #6146
 - [enhancement][system] UserKernel remove job_desc #6144
 - [enhancement][system][api] Fea/graph/add outputs buffer to enable pipeline #6126
 - [enhancement][system] not fuse request for nccl 2.10.3 #6136
 - [bug][system] NewUniqueId thread safe #6141
 - [enhancement][system] XRT remove job_desc #6139
 - [enhancement][system] SystemOpFillJobNamePass #6138
 - [enhancement][system] mv_boxing_folder_to_core #6140
 - [enhancement][system] Refactor boxing interpreter to boxing expr #6134
 - [enhancement][system] Eager boxing one to one #6048
 - [enhancement][system] Vm cpu efficiency #6110
 - [enhancement][system] Naive generic boxing #6116
 - [feature][system] send/recv #5992
 - [enhancement][system] disable_print_stack_in_tensor_numpy #6123
 - [feature][system] add all_reduce by to_consistent #5963
 - [enhancement][system] KernelContext #6084
 - [enhancement][bug][system] Fix sync nccl and async nccl deadlock #6071
 - [bug][system][refactor] Refactor to local #6098
 - [enhancement][system] Replace xor with hash combine (part 1) #6078
 - [enhancement][system] Optimize error message #6073
 - [enhancement][system] Rename Error::xx to Error::xxError #6049
 - [enhancement][system] send formatted msg to glog #5999
 - [feature][bottleneck][bug][system][interface] [Feat.] NNGraph new eager tensor for new variable created in JobPass #6091
 - [bug][system] Fix bug of multi-GPU eager copy D2H extra mem cost in rank 0 #6092
 - [enhancement][system][api] Rename module flow.F to flow._C #6053
 - [feature][system][interface] [Feat.] Eager consistent OFRecordReader #6089
 - [enhancement][system][api] Dev fix and align interface #6075
 - [feature][bottleneck][bug][system][interface] NNGraph input/output valid by register tensors #6240
 - [bug][system][interface] Fix bug of Multi-Client src tick output order #6221
 - [enhancement][bug][system] Add cast primitive #6234
 - [feature][bottleneck][system][interface] Auto FixPipelineStageIdPass #6204
 - [enhancement][system] move scalar to oneflow namespace. #6235
 - [enhancement][system] UserKernel init CUDA Graphs with state #6230
 - [feature][system] Comm broadcast #6213
 - [enhancement][system][refactor] Rename opname to optype_name in AutogradEngine #6154
 - [enhancement][system] Add memset primitive #6218
 - [enhancement][system] Add StreamContext::device_type()/DeviceCtx::device_type() #6217
 - [feature][system] add all_gather and fix bug of multi rank doctest #6189
 - [feature][system][interface] [Feat.] Lazy interpreter skip hierarchical_parallel_cast #6208
 - [purge][system] Cleanup KernelUtil #6212
 - [enhancement][system] StreamContextAdapter #6205
 - [enhancement][system] Dev eliminate gcc warnings #6199
 - [feature][bottleneck][system][interface] [Feat.] nn.Graph support grad acc with input/output tensor #6155
 - [enhancement][system] Cpu symmetric s to s #6153
 - [enhancement][system][upload-core] Op expr infer tensor meta #5064
 - [enhancement][system] Infer consistent tensor meta #5362
 
CI enhancements:
- [bug][ci][api][interface] Refine module test #5232
 - [enhancement][ci] Add Simple CI, runs CPU-only on GitHub hosted servers #5207
 - [enhancement][ci] Run exe test in CPU-only #5202
 - [enhancement][ci] Cancel all workflow runs but the latest #5206
 - [enhancement][ci] Fix master not running Simple CI #5368
 - [enhancement][ci] Refine Simple CI and Clang analysis #5367
 - [enhancement][feature][bug][ci][documentation][interface] Fix upsample bilinear bug #5363
 - [enhancement][ci] Build nightly for py39 #5318
 - [enhancement][ci] Try distributed run for 3 times to prevent failure #5305
 - [enhancement][ci] Upload Simple CI logs to cloud #5268
 - [enhancement][ci] Remove cpu_op_eager and cuda_op_eager #5470
 - [bug][ci] fix segfault in clang plugin #5437
 - [enhancement][ci] Refine Simple CI error output #5435
 - [enhancement][ci] Add conda env to Simple CI #5385
 - [enhancement][ci] Fix clang plugin core file not found #5390
 - [bug][ci] upload core when build with clang plugin #5384
 - [bug][ci] clang plugin skip more files #5373
 - [enhancement][ci] Use gh-action-scheduler-v2 #5370
 - [enhancement][ci] relax speed threshold #5569
 - [bug][ci] Fix wrong test path under compatible #5567
 - [enhancement][ci][need-simple-ci] Prevent upload logs automatically #5560
 - [enhancement][ci][interface] Add nn.AdaptiveAvgPool1d and nn.AdaptiveAvgPool3d #5445
 - [feature][ci] add speed test in ci #5496
 - [enhancement][ci] Reduce usage of Simple CI #5546
 - [feature][bug][ci][api] Restruct upsample module #5524
 - [feature][ci] multi client launcher test #5488
 - [enhancement][ci] Remove automerge if cuda_new_interface failed #5519
 - [enhancement][ci] Prevent adding subdir in python/test #5514
 - [enhancement][ci] piprepo->pipindex #5517
 - [enhancement][ci] add dynamic_loss_scale in ci tests #5337
 - [enhancement][ci] Add timeout for wait_gpu_slot #5497
 - [enhancement][feature][ci] new static check based on clang-tidy #5476
 - [enhancement][ci] Fix url not downloadable in some browsers #5701
 - [feature][ci] multi client multi machine test #5685
 - [enhancement][ci] Add cpu new interface CI #5639
 - [enhancement][ci][need-simple-ci] Mv clangtidy to simple ci #5667
 - [enhancement][ci][need-simple-ci] use clang tidy appimage in ci #5841
 - [enhancement][ci] Use gcc 7 in release to prevent error #5840
 - [enhancement][ci] bn tol 1e-4 => 1e-3 #5811
 - [enhancement][ci] fix distributed run on built dir #5810
 - [enhancement][ci] fix third party mirror check_sum #5802
 - [ci][documentation] find more accurately which files need to be doctested #5782
 - [enhancement][ci] Print stack unconditionally #5779
 - [enhancement][ci][need-simple-ci] Enable more checkers for clang-tidy in CI #5738
 - [enhancement][ci] CI: add clang-tidy check to test.yaml #5920
 - [ci][documentation] fix docstring in oneflow.nn.functional namespace #5807
 - [enhancement][ci] disable TREAT_WARNINGS_AS_ERRORS in Release CI #5886
 - [enhancement][ci] Skip ci jobs by git diff #5863
 - [bug][ci] quick fix #5978 #6030
 - [enhancement][bug][ci] fix clang tidy diff options and file format #5990
 - [enhancement][ci] add flow.relu #5847
 - [enhancement][ci] equal => allclose #6164
 - [bug][ci][need-simple-ci] CI: fix clang tidy checks in simple ci #6161
 - [enhancement][bug][ci][documentation][api] add interpolate and layer_norm docs #6157
 - [bug][ci] update speed test #6113
 - [enhancement][bug][ci][documentation][api] speed import oneflow #6107
 - [bug][ci] Also try install dev deps for CODEGEN_PYTHON_EXECUTABLE #6115
 - [bug][ci][need-simple-ci] set gtest_CMAKE_DEBUG_POSTFIX "d" #6085
 - [enhancement][ci] add cache init file for clang and CI build with clang #6062
 - [enhancement][ci] add emoji in speed test output, make it continue-on-error #6214
 
Test enhancements:
- [bug][test][interface] Fix acos ci bug #5217
 - [feature][test] implement automated test #5321
 - [enhancement][test] move generator test into ops folder to accelerate tests #5472
 - [feature][test][api] Add autotest part2 #5467
 - [enhancement][test][api][interface] Add some tests with the new framework for auto testing #5561
 - [bug][test] fix test error when do multi case test on graph #5590
 - [enhancement][test] Refine module test using auto test by yaochi #5484
 - [enhancement][test] Add autotest for BatchNorm2d #5734
 - [enhancement][test] RTH_update_op_test #5823
 - [enhancement][test] dev adamw graph config #5745
 - [feature][test][api][interface] Add new autotest #5562
 - [bug][test] restore test of alexnet graph #5798
 - [enhancement][test][interface] add zhangshen op-test #5600
 - [feature][bug][tooling][test][interface] Record autotest wrong code #5923
 - [enhancement][feature][test][api] add randint #5718
 - [bug][test] fix multi machine test #5984
 - [enhancement][test][interface] some op test #6095