Description
Hi Jiahui,
Thanks for the great work. I'm trying to reproduce AutoSlim for CIFAR-10 (Table 2).
Could you please provide the detailed hyperparameters you used for it?
I'm able to train the baseline MobileNetV2 1.0x to 7.9% Top-1 error using the following hyperparameters (sketched in code after the list):
- 0.1 initial learning rate
- linear learning rate decay
- 128 batch size
- 300 epochs of training
- 5e-4 weight decay
- 0.9 Nesterov momentum
- no label smoothing
- no weight decay for bias and gamma
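
In PyTorch terms, my setup looks roughly like this. This is just a sketch under my own assumptions: the parameter-group split is how I implement "no weight decay for bias and gamma" (all 1-D parameters are excluded from decay), and `model` is a placeholder:

```python
import torch

def build_optimizer(model, lr=0.1, momentum=0.9, weight_decay=5e-4):
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # 1-D parameters are BN gamma/beta and conv/linear biases:
        # these get no weight decay
        if p.ndim == 1 or name.endswith(".bias"):
            no_decay.append(p)
        else:
            decay.append(p)
    return torch.optim.SGD(
        [{"params": decay, "weight_decay": weight_decay},
         {"params": no_decay, "weight_decay": 0.0}],
        lr=lr, momentum=momentum, nesterov=True,
    )

def set_linear_lr(optimizer, base_lr, epoch, num_epochs=300):
    # linear decay: lr goes from base_lr down to 0 over num_epochs
    lr = base_lr * (1.0 - epoch / num_epochs)
    for group in optimizer.param_groups:
        group["lr"] = lr
```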
To train AutoSlim, I use MobileNetV2 1.5x with the exact same hyperparameters, but trained for only 50 epochs on a training subset (80% of the full training set). Then, during greedy slimming, I use the held-out 20% of the training set as a validation set to decide channel counts. For greedy slimming, I shrink each layer in steps of 10%, which gives the 10 groups mentioned in the paper.
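
For reference, here is roughly the greedy slimming loop I'm running. It's only a sketch: `set_channel_config`, `evaluate`, and `compute_flops` are hypothetical helpers standing in for however the slimmable model exposes per-layer width selection, validation accuracy on the 20% split, and FLOPs counting:

```python
def greedy_slim(model, val_loader, num_layers, flops_target, num_groups=10):
    # start from the full 1.5x network: every layer at 100% of its channels,
    # counted in units of 10% (num_groups = 10 as in the paper)
    config = [num_groups] * num_layers
    while compute_flops(config) > flops_target:  # e.g. the 88 MFLOPs target
        best_layer, best_acc = None, -1.0
        for layer in range(num_layers):
            if config[layer] <= 1:
                continue  # keep at least one 10% group per layer
            trial = list(config)
            trial[layer] -= 1  # tentatively shrink this layer by one group
            set_channel_config(model, trial)   # hypothetical helper
            acc = evaluate(model, val_loader)  # hypothetical helper
            if acc > best_acc:
                best_layer, best_acc = layer, acc
        if best_layer is None:
            break  # every layer is already at its minimum width
        config[best_layer] -= 1  # keep the shrink that hurt accuracy least
    return config
```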
The final architecture is trained with the same hyperparameters listed above, but I failed to reach the 6.8% Top-1 error reported in the paper; I'm getting around 7.8%.
Could you please share the final architecture for AutoSlim-MobileNetV2 CIFAR-10 with 88 MFLOPs? Also, it would be great if you could let me know the hyperparameters you used for the CIFAR experiments.
Thanks,
Rudy