show L2 norm of parameters during training. #3925
Conversation
In addition, set affine to false for batchnorm layers and switch to SGD optimizer.
The training is still running and a screenshot of the L2-norms of the training parameters is as follows:
[image: Screen Shot 2020-02-12 at 09 05 51]
I will post the decoding results once it is done.
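Concretely, those two changes look roughly like the following in PyTorch (a minimal sketch; layer sizes and hyperparameters are made up, not the values from egs/aishell/s10):
```python
import torch.nn as nn
import torch.optim as optim

feat_dim, hidden_dim = 40, 512

# affine=False: batchnorm keeps only running mean/var, no learnable scale/shift
input_batch_norm = nn.BatchNorm1d(feat_dim, affine=False)
hidden = nn.Sequential(
    nn.Conv1d(feat_dim, hidden_dim, kernel_size=3),
    nn.BatchNorm1d(hidden_dim, affine=False),
    nn.ReLU(),
)

# switch from Adam to plain SGD
optimizer = optim.SGD(hidden.parameters(), lr=1e-3, momentum=0.0)
```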
Make sure that the batch-norm for the input has affine=False as well (I see input_batch_norm.weight etc.).
Or at least try that; I'm not 100% sure what effect it will have.
…On Wed, Feb 12, 2020 at 9:07 AM Fangjun Kuang ***@***.***> wrote:
In addition, set affine to false for batchnorm layers and switch to SGD
optimizer.
The training is still running and
a screenshot of the L2-norms of the training parameters is as follows:
[image: Screen Shot 2020-02-12 at 09 05 51]
<https://user-images.githubusercontent.com/5284924/74293834-fc253d00-4d76-11ea-9b37-a04953891ee1.png>
I will post the decoding results once it is done.
You can view, comment on, or merge this pull request online at:
#3925
Commit Summary
- show L2 norm of parameters during training.
File Changes
- *M* egs/aishell/s10/chain/inference.py (16)
- *M* egs/aishell/s10/chain/model.py (22)
- *M* egs/aishell/s10/chain/options.py (7)
- *M* egs/aishell/s10/chain/tdnnf_layer.py (6)
- *M* egs/aishell/s10/chain/train.py (40)
- *M* egs/aishell/s10/local/run_chain.sh (3)
Patch Links:
- https://github.com/kaldi-asr/kaldi/pull/3925.patch
- https://github.com/kaldi-asr/kaldi/pull/3925.diff
|
I kept I will set |
* change stride kernel(3,1) to stride kernel(2,2)
* make subsampling readable
* make model trainable
Should I merge this? |
I am running training with affine == False for the input batch norm. Please wait for a moment. |
I am comparing the two runs where affine == false/true for the input batch norm layer. I will add DistributedDataParallel to use multiple GPUs to reduce training time. |
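A minimal sketch of what the DistributedDataParallel wrapping could look like (assumes one process per GPU launched via torch.distributed; `model`, `device_id`, and `egs_dataset` are placeholders, not names from this recipe):
```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data.distributed import DistributedSampler

dist.init_process_group(backend='nccl')   # one process per GPU
torch.cuda.set_device(device_id)
model = model.to(device_id)
model = DDP(model, device_ids=[device_id])

# each rank should see a different shard of the egs
sampler = DistributedSampler(egs_dataset)
```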
this result is based on this recipe |
I have performed 4 experiments with the following settings, where ... The results for the AIShell dataset are as follows:
|
@fanlu's results show that ... The problem is that we have to build a nnet3 network explicitly to get ... |
Enabling shuffle in the egs dataloader improves the CER/WER a little bit.
|
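In PyTorch terms that change is just the shuffle flag on the DataLoader (dataset, batch size and collate function below are placeholders):
```python
from torch.utils.data import DataLoader

egs_loader = DataLoader(egs_dataset,
                        batch_size=64,
                        shuffle=True,      # the change being compared above
                        collate_fn=collate_fn,
                        num_workers=2)
```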
Guys, for now the aim is to reproduce the Kaldi system, so please use SGD
not Adam. Anyway you can't really compare properly without doing a
parameter sweep e.g. on learning rate or l2; you'd have to look at the
train/valid difference to see whether the SGD one is overfitting or
underfitting.
RE the LDA: please use deltas, which are as good, and easy to implement.
You'll have to pad the input with 2 more frames.
Look at steps/libs/nnet3/xconfig/trivial_layers.py
```
class XconfigDeltaLayer(XconfigLayerBase):
"""This class is for parsing lines like
'delta-layer name=delta input=idct'
which appends the central frame with the delta features
(i.e. -1,0,1 since scale equals 1) and delta-delta features
(i.e. 1,0,-2,0,1), and then applies batchnorm to it.
```
So the numbers above are scaling factors, e.g. -1,0,1 means -1*(frame t-1) + 1*(frame t+1).
Do batchnorm after that.
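A rough PyTorch sketch of that delta / delta-delta computation using fixed conv1d kernels (the exact scaling and padding used in the Kaldi recipe may differ; the input layout and function name here are assumptions):
```python
import torch
import torch.nn.functional as F

def add_deltas(x):
    """x: (batch, feat_dim, time). Returns (batch, 3 * feat_dim, time)."""
    feat_dim = x.size(1)
    # depthwise kernels: -1,0,1 for delta and 1,0,-2,0,1 for delta-delta
    d1 = x.new_tensor([-1., 0., 1.]).repeat(feat_dim, 1, 1)
    d2 = x.new_tensor([1., 0., -2., 0., 1.]).repeat(feat_dim, 1, 1)
    delta = F.conv1d(x, d1, padding=1, groups=feat_dim)
    delta2 = F.conv1d(x, d2, padding=2, groups=feat_dim)
    return torch.cat([x, delta, delta2], dim=1)
```
Padding keeps the time axis aligned in this sketch; in the recipe the input would instead be padded with 2 extra context frames, and batchnorm applied to the concatenated output as described above.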
…On Fri, Feb 14, 2020 at 10:03 AM Fangjun Kuang ***@***.***> wrote:
enabling shuffle in egs dataloader improves a little bit of cer/wer.
|          | adam (affine==false), without shuffle | adam (affine==false), with shuffle |
|----------|---------------------------------------|------------------------------------|
| test cer | 9.17                                  | 9.05                               |
| test wer | 17.91                                 | 17.79                              |
| dev cer  | 7.34                                  | 7.15                               |
| dev wer  | 15.52                                 | 15.29                              |
|
BTW, it seems the current PyTorch model (TDNN_1c, that is @csukuangfj's model in #3892, not @fanlu's new model) is underfitting somewhat; the result gets better (best so far) with lower l2_regularize.
My config is:
opts.l2_regularize = 1e-4 (changed from 5e-4)
opts.leaky_hmm_coefficient = 0.1
opts.xent_regularize = 0.1
opts.out_of_range_regularize = 0.01
I used Adam with affine=true. Besides l2_regularize, I guess the threshold of clip_grad_value_ matters as well; I tried clip_grad_value_(model.parameters(), 4.8) instead of 5.0 in the current script, but it led to warnings of nnet outputs outside of range. Lower l2_regularize should be tried; for comparison, the current global average objf is in the log quoted below.
|
Great!
Was that with SGD?
I don't think gradient clipping should be necessary (or even helpful) for a
non-recurrent network.
I suspect, also, that that failure was random and not connected to such a
tiny change in the gradient clipping (anything less than a factor of 2 is
no change at all, IMO).
…On Fri, Feb 14, 2020 at 3:32 PM Haowen Qiu ***@***.***> wrote:
BTW, it seems the current PyTorch model (TDNN_1c, that is @csukuangfj
<https://github.com/csukuangfj>' model in #3892
<#3892>, not @fanlu
<https://github.com/fanlu>'s new model) is some sort of underfitting,
result will get better (*best* until now) with lower l2_regularize.
|          | TDNN-F(Pytorch) | tdnn_1c_rd_rmc_rng |
|----------|-----------------|--------------------|
| dev_cer  | 6.70            | 5.99               |
| dev_wer  | 14.75           | 13.86              |
| test_cer | 8.13            | 7.08               |
| test_wer | 16.83           | 15.72              |
the config of mine is
- opts.l2_regularize = 5e-4
+ opts.l2_regularize = 1e-4
opts.leaky_hmm_coefficient = 0.1
+ opts.xent_regularize = 0.1
+ opts.out_of_range_regularize = 0.01
I used Adam with affine=true.
Besides l2_regularize, I guess the threshold of clip_grad_value_ will
matter as well, I tried to set clip_grad_value_ (model.parameters(), 4.8)
instead of 5.0 in the current script, but it will lead warning of nnet
outputs outside of range.
Lower l2_regularize should be tried, for comparing, this is current global
average objf:
2020-02-14 14:39:04,638 INFO [train.py:102] Process 9100/9444(96.357476%) global average objf: -0.039630 over 51210880.0 frames, current batch average objf: -0.036466 over 6400 frames, epoch 5
2020-02-14 14:39:24,135 INFO [train.py:102] Process 9200/9444(97.416349%) global average objf: -0.039632 over 51779328.0 frames, current batch average objf: -0.033343 over 6400 frames, epoch 5
2020-02-14 14:39:43,281 INFO [train.py:102] Process 9300/9444(98.475222%) global average objf: -0.039640 over 52335744.0 frames, current batch average objf: -0.031165 over 6400 frames, epoch 5
2020-02-14 14:40:02,384 INFO [train.py:102] Process 9400/9444(99.534096%) global average objf: -0.039642 over 52895360.0 frames, current batch average objf: -0.050952 over 3840 frames, epoch 5
2020-02-14 14:40:10,763 INFO [common.py:61] Save checkpoint to exp/chain_cleaned_pybind/tdnn1c_sp/best_model.pt: epoch=5, learning_rate=1.5625e-05, objf=-0.039643600787482226
|
Mine is about -0.06. I'll try what you have suggested. I think the default value for opts.out_of_range_regularize is 0.01:
kaldi/src/chain/chain-training.h Line 74 in 793191b
I will
- set l2_regularize to 1e-4
- set xent_regularize to 0.1
- disable gradient clipping
and see what happens. Then I will use conv1d to compute the delta-delta features. |
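For reference, "disable gradient clipping" only touches one call in the training step; a stand-in sketch with a hypothetical flag (clip_value=5.0 is the value mentioned above):
```python
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_value_

model = nn.Linear(40, 40)                        # stand-in for the TDNN-F model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
use_grad_clip = False                            # False == gradient clipping disabled

loss = model(torch.randn(8, 40)).pow(2).mean()   # dummy objective
loss.backward()
if use_grad_clip:
    clip_grad_value_(model.parameters(), clip_value=5.0)
optimizer.step()
```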
The above result was obtained with Adam instead of SGD. I tried SGD, but it seems to require more tricks with the learning rate. |
Why do you say that SGD may require more tricks-- what happened? And did
you tune the learning rate?
opts.out_of_range_regularize is a newly added, relatively obscure feature
designed to stop the outputs getting very large and overflowing the
denominator computation (which is not done in log space). You can ignore
it for now, unless it's already implemented.
…On Fri, Feb 14, 2020 at 3:48 PM Haowen Qiu ***@***.***> wrote:
The above result was gotten with Adam instead of SGD, I tried SGD, but it
seems it may require more tricks on the learning rate.
|
@csukuangfj, the higher the objf, the better... see your code and logs. |
BTW, guys, you should monitor the train and valid objective functions
separately.
Too-big difference means the model is overfitting, which will normally mean
the
(learning rate * l2) is too small, so one of those should be increased.
And vice versa if the difference is quite small.
…On Fri, Feb 14, 2020 at 3:52 PM Haowen Qiu ***@***.***> wrote:
global average objf: -0.039642 is quite high.
Mine is about -0.06. I'll try what you have suggested.
I think the default value for opts.out_of_range_regularize is 0.01.
https://github.com/kaldi-asr/kaldi/blob/793191be209357cd22dd20f4958a570098ac0cf8/src/chain/chain-training.h#L74
I will
- set l2_regularize to 1e-4
- set xent_regularize to 0.1
- disable gradient clipping
and to see what will happen. And then to use conv1d to compute delta-delta
features.
@csukuangfj <https://github.com/csukuangfj> , the higher of objf, the
better...see your code and logs.
|
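The separate train/valid monitoring suggested above only needs a small loop around the existing objective computation; a sketch where compute_chain_objf is a placeholder for whatever the training script already calls:
```python
import torch

def average_objf(model, data_loader, compute_chain_objf):
    """Average objf over a held-out loader, for comparison with the train objf."""
    model.eval()
    total_objf, total_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in data_loader:
            objf, num_frames = compute_chain_objf(model, batch)  # placeholder
            total_objf += objf * num_frames
            total_frames += num_frames
    model.train()
    return total_objf / total_frames
```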
@danpovey I will compute the objective function value of |
I tried different learning rate values and different rate decays with SGD, but they all give worse results than Adam. Do you have any idea how to figure out the reason? BTW, in my last try with Adam (above), it seems the objf went up and down at the corresponding learning rate?
|
With the SGD run, compare the norms of the parameter matrices with those of Kaldi's model and see if any are significantly different. If those printed values are for minibatches, the variation is probably normal; sometimes you'll get easier or harder examples. |
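One way to dump the per-parameter L2 (Frobenius) norms on the PyTorch side for that comparison (sketch only; the Kaldi-side numbers come from its progress logs):
```python
import torch

def log_parameter_norms(model):
    for name, param in model.named_parameters():
        if param.requires_grad:
            print('{}: {:.3f}'.format(name, torch.norm(param, p='fro').item()))
```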
Ok, thanks Dan. I'll try lower l2 first, and then try to tune it with SGD following your suggestion. And I suggest we should not (at least for now) change the model structure; it seems that we may get a close result with the same configuration as tdnn_1c_rd_rmc_rng, and updating the model config may make things complex. |
If you are referring to the delta thing,
I think you should make a version of that baseline script that uses deltas,
since that's easier to implement,
and anyway that's the recommended pattern right now. (Assuming the
features are MFCCs).
You can refer to the mini_librispeech setup.
…On Fri, Feb 14, 2020 at 4:18 PM Haowen Qiu ***@***.***> wrote:
With the SGD run, compare the norms of the parameter matrices with those
of Kaldi's model and see if any are significantly different. If those
printed values are for minibatches, the variation is probably normal;
sometimes you'll get easier or harder examples.
Ok, thanks Dan. I'll try lower l2 first, and then try to tune it with SGD
with your suggestion.
And I suggest we should not (at least for now) change the model struct, it
seems that we may get close result with the same configuration of
tdnn_1c_rd_rmc_rng. Update model config may make things complex.
|
@qindazhu For my previous pull request, I used [-1, 0, 1] for the first linear layer. But the current model structure has kernel size 2 for the first linear layer; I merged it since fanlu said it has a better cer/wer. I am wondering whether the left/right context used in generating egs is still relevant. |
@csukuangfj I just kept all other configuration in your first PR, that is almost the same with
The only difference is that I re-used data/features for |
Anyway, let's keep the same model configuration (your first pr, |
That probably indicates the learning rate needs to be reduced.
…On Tue, Feb 18, 2020 at 4:59 PM Haowen Qiu ***@***.***> wrote:
BTW, for training set, it seems that the objf is frozen after epoch 2?
|
Ok, thanks! Let me just try more epochs first, as the l2 and initial learning rate seem right according to my previous experiments (larger or smaller l2/lr did not make the objf better in epoch 0). And then try to decrease the rate in the last epochs. |
Yes, I am wondering why I get worse results than haowen's and fanlu's. I think haowen used the same network architecture as me for the PyTorch model, i.e., [-1, 0, 1] for the first linear layer. |
I assume you were already using a learning rate schedule where it decreases
in later epochs (?)
That is definitely necessary. Aim for a factor of 10 or 20, maybe.
…On Tue, Feb 18, 2020 at 5:11 PM Fangjun Kuang ***@***.***> wrote:
$ steps/info/chain_dir_info.pl exp-haowen/chain_cleaned_1c/tdnn1c_sp/
exp-haowen/chain_cleaned_1c/tdnn1c_sp/: num-iters=79 nj=3..12
num-params=9.3M dim=40->3456 combine=-0.032->-0.032 (over 2)
xent:train/valid[51,78,final]=(-0.726,-0.550,-0.542/-0.735,-0.584,-0.576)
logprob:train/valid[51,78,final]=(-0.050,-0.033,-0.033/-0.052,-0.042,-0.041)
yes, I am wondering why I get worse results than haowen's and fanlu's.
I think haowen used the same network architecture as me for the PyTorch,
i.e.,
[-1, 0, 1] for the first linear layer.
|
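A schedule in the spirit of "decrease by a factor of 10 or 20 over training" can be as simple as an exponential decay per epoch; the numbers below are illustrative (gamma=0.6 over 6 epochs gives roughly a 1/21 decay), not the recipe's actual values:
```python
import torch
import torch.nn as nn

model = nn.Linear(40, 40)                                   # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.6)

for epoch in range(6):
    # ... one epoch of training with optimizer.step() calls ...
    scheduler.step()                                        # lr *= 0.6 after each epoch
```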
Did you follow the scripts in this PR #3868? I guess maybe the alignment before nnet training matters, which differs from your PyTorch scripts? |
I used the above settings suggested by haowen and replaced the multistep learning rate scheduler with ...
My alignment was generated by ... Should I replace ...? I will double the number of epochs, i.e., from 6 to 12. |
Yes, current learning rate is
the initial lr is |
I guess so, maybe @fanlu can help to confirm this as I never ran ... The other difference is that I use files from your first PR #3892; here is my local git log:
|
@csukuangfj please follow haowen's PR #3868 and rerun egs/aishell/s10/local/run_tdnn_1c.sh before stage 16, and then run your recipe egs/aishell/s10/local/run_chain.sh starting with stage 3. |
The model in this pull request is equivalent to my first pull request. I will switch to |
For SGD, I just find the L2-norms of the parameters are too big compared with Kaldi's (the numbers are quoted in the reply below).
How about yours? Just to make sure I don't compute it incorrectly.
|
I would suggest using
```
torch.norm(param, p='fro')
```
to compute the Frobenius norm.
Refer to https://pytorch.org/docs/stable/torch.html#torch.norm
From your results, it seems that you forgot to take the square root of the
PyTorch values.
…On Tue, Feb 18, 2020 at 7:54 PM Haowen Qiu ***@***.***> wrote:
For SGD, I just find the l2-norm of the parameters are too big compared
with Kaldi's:
kaldi: [tdnn1.affine:13.066 tdnnf2.linear:9.77364 tdnnf2.affine:13.2666 tdnnf3.linear:9.99066 tdnnf3.affine:12.3348 tdnnf4.linear:9.14051 tdnnf4.affine:12.1615 tdnnf5.linear:7.58906 tdnnf5.affine:11.355 tdnnf6.linear:9.04031 tdnnf6.affine:12.423 tdnnf7.linear:8.99408 tdnnf7.affine:12.4773 tdnnf8.linear:8.75175 tdnnf8.affine:12.3263 tdnnf9.linear:8.6019 tdnnf9.affine:12.028 tdnnf10.linear:8.49032 tdnnf10.affine:11.9081 tdnnf11.linear:8.28274 tdnnf11.affine:11.9213 tdnnf12.linear:8.25842 tdnnf12.affine:11.7487 tdnnf13.linear:8.19186 tdnnf13.affine:11.6624 prefinal-l:13.9765 prefinal-chain.affine:11.6501 prefinal-chain.linear:12.3917 output.affine:22.0577 prefinal-xent.affine:10.6057 prefinal-xent.linear:9.92937 output-xent.affine:50.9147 ]
pytorch:[tdnn1.affine:112.8931,tdnnf2.linear:102.4325,tdnnf2.affine:75.7402,tdnnf3.linear:86.0951,tdnnf3.affine:62.5637,tdnnf4.linear:78.2900,tdnnf4.affine:56.5584,tdnnf5.linear:48.1624,tdnnf5.affine:46.7811,tdnnf6.linear:89.7642,tdnnf6.affine:65.7036,tdnnf7.linear:90.6984,tdnnf7.affine:69.4275,tdnnf8.linear:89.6295,tdnnf8.affine:66.2355,tdnnf9.linear:87.3545,tdnnf9.affine:63.2102,tdnnf10.linear:83.4296,tdnnf10.affine:60.1824,tdnnf11.linear:83.5425,tdnnf11.affine:58.1394,tdnnf12.linear:82.4880,tdnnf12.affine:57.8074,tdnnf13.linear:85.2806,tdnnf13.affine:62.5397,prefinal_l:95.0566,prefinal_chain.affine:78.2842,prefinal_chain.linear:95.9710,output_affine:110.7365,prefinal_xent.affine:60.7356,prefinal_xent.linear:75.7621,output_xent_affine:121.6411]
How about yours? just make sure I don't compute it incorrectly
for name, param in model.named_parameters():
    if param.requires_grad and name.endswith('.weight'):
        change_name_to_align_with_kaldi()
        norm_str = norm_str + '{}:{:.4f},'.format(name, torch.norm(param, 2))
|
I think |
oh, yes, you're right. Now I have changed the data processing pipeline to match yours |
this is my sgd model's l2-norm
|
9a07339 to 523f9a4 (Compare)
It turns out that your ... There is more data generated in ... In addition, the training time is nearly tripled. |
Yes, it is because of |
Thanks @fanlu. Seems the L2-norm is really too big. Maybe I should try larger l2_regularize values and tune the learning rate again. |
BTW, for training time, it seems that PyTorch takes about the same time as Kaldi. Maybe we should do more things to make it faster. But I suggest we focus on the WER first and put speed aside just for now. |
You could try, say, having 4 times the l2 and one quarter the learning rate.
…On Wed, Feb 19, 2020 at 10:03 AM Haowen Qiu ***@***.***> wrote:
In addition, the training time is nearly tripled.
BTW, for training time, it seems that PyTorch takes the same time with
Kaldi. Maybe we should do more things to make it faster. But I suggest we
should focus on the WER first and put SPEED aside just for now.
|
sure, thanks Dan. |
Here are the results for the current pull request.
Please merge this if necessary. Part of the training log is as follows. It seems that the objf value keeps at ...
|
Fantastic! And of course thanks to all the others who provided results for comparison: those are valuable too, even if not directly merged. |
sync kernel(2,2) latest result
|
cool... |