Releases: kingo233/FCT-Pytorch
FCT paper reproduction results
Experiment details
According to the FCT paper, the authors used filters = [16, 32, 64, 128, 384, 128, 64, 32, 16] and attn heads = [2, 4, 8, 12, 16, 12, 8, 4, 2].
But with those settings I ran into the error `embed_dim must be divisible by num_heads` (128 is not divisible by 12), so I changed attn heads to [2, 4, 8, 8, 16, 8, 8, 4, 2].
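For illustration, here is a minimal sketch of that constraint using `torch.nn.MultiheadAttention`; the repo's Convolutional Attention enforces the same divisibility rule, but the actual module lives in the repo and is not shown here:

```python
import torch.nn as nn

embed_dim = 128  # the fourth entry of the filter list

# 12 heads: 128 % 12 != 0, so PyTorch raises
# AssertionError: embed_dim must be divisible by num_heads
try:
    nn.MultiheadAttention(embed_dim, num_heads=12)
except AssertionError as err:
    print(err)

# 8 heads: 128 % 8 == 0, so this constructs without error
attn = nn.MultiheadAttention(embed_dim, num_heads=8)
```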
When I trained with the author's original PyTorch code I ran into overfitting, so I use dropout=0.5 in the Convolutional Attention module.
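A minimal sketch of where such a dropout could sit, assuming an illustrative `ConvAttention` wrapper (the name and structure are mine; the real module in the repo may differ):

```python
import torch.nn as nn

class ConvAttention(nn.Module):
    """Illustrative only -- the real Convolutional Attention is in the repo."""

    def __init__(self, channels: int, num_heads: int, dropout: float = 0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, dropout=dropout)
        self.drop = nn.Dropout(dropout)  # dropout=0.5 to curb overfitting

    def forward(self, x):
        # x: (sequence, batch, channels)
        out, _ = self.attn(x, x, x)
        return self.drop(out)
```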
What's more, I scaled the gradients by 1e4 to work around gradient vanishing, which had caused the Dice score to stay flat.
The vanishing is not completely solved, though: you can still see it in some parts of block_1 on TensorBoard.
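As a sketch, such scaling can be written as static loss scaling; the function name and arguments below are mine, and whether the repo unscales the gradients before the optimizer step is an assumption:

```python
GRAD_SCALE = 1e4

def train_step(model, criterion, optimizer, x, y):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    (loss * GRAD_SCALE).backward()   # amplify near-zero gradients
    for p in model.parameters():     # unscale before the optimizer step
        if p.grad is not None:
            p.grad.div_(GRAD_SCALE)
    optimizer.step()
    return loss.item()
```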
How to use
Run `tar -xvf dice_87.tar`; you will get a directory called output_model.
The FCT model is in `output_model/$time/model/fct.pt`; you will notice that it is smaller than the original author's pretrained model.
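A minimal loading sketch (whether `fct.pt` stores the full module or a state_dict is an assumption; `$time` stands for the actual timestamp directory):

```python
import torch

# Replace $time with the timestamp directory from the archive.
checkpoint_path = 'output_model/$time/model/fct.pt'

# Assumes the checkpoint was saved with torch.save(model, ...);
# if it is a state_dict instead, build the model first and call
# model.load_state_dict(torch.load(checkpoint_path)).
model = torch.load(checkpoint_path, map_location='cpu')
model.eval()
```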
Run `pip install tensorboard` to install TensorBoard.
Run `cd output_model && tensorboard --logdir .` to visualize my training process in your browser.