Description
Prerequisite
- I have searched the existing and past issues but cannot get the expected help.
- I have read the FAQ documentation but cannot get the expected help.
- The bug has not been fixed in the latest version.
🐞 Describe the bug
Hello,
I am training a YOLOv8 model from a pretrained backbone. I would like to freeze this backbone and train only the head for the first part of the training.
Using plain PyTorch, I would do something like this:
# only give the head parameters to the optimizer
optim = torch.optim.SGD(model.head.parameters(), lr=1e-4)
# train with only the head being updated
train()
# rebuild the optimizer with all parameters
optim = torch.optim.SGD(model.parameters(), lr=1e-4)
Is it possible to do something similar using mmcv?
From what I found in the code, the DefaultOptimizerConstructor is called, and we can pass specific hyperparameters for each layer of the model, just like this example given in the torch documentation:
optim.SGD([
    {'params': model.base.parameters(), 'lr': 1e-2},
    {'params': model.classifier.parameters()}
], lr=1e-3, momentum=0.9)
Example 2 in the docstring of that function also shows that we can give specific hyperparameters for the backbone, but it does not say whether we can remove it from the parameters to update.
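For reference, here is roughly what that docstring pattern would look like as an MMEngine-style optim_wrapper config (a sketch based on my reading, not tested; as far as I can tell, setting lr_mult to 0 would keep the backbone from moving, but its parameters would presumably still be handed to the optimizer):
# sketch: per-module hyperparameters via paramwise_cfg;
# 'backbone' is matched as a substring of the parameter names
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=1e-4, momentum=0.9),
    paramwise_cfg=dict(
        custom_keys={'backbone': dict(lr_mult=0.0, decay_mult=0.0)}))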
The main goal is to speed up the training: by not needing to go through the backbone, the optimizer should only have to update a tiny fraction of the whole model. Until now, I have frozen my backbone only by setting the requires_grad attribute of all its parameters to False. But the optimizer still iterates over the whole model, so an update step takes the same time as without freezing the backbone (see the sketch below).
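For comparison, in plain PyTorch I would freeze and then filter, so the frozen parameters never reach the optimizer at all (a minimal sketch; model.backbone stands in for the actual backbone module):
# freeze: gradients are no longer computed for the backbone
for p in model.backbone.parameters():
    p.requires_grad = False
# hand the optimizer only the parameters that are still trainable
optim = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)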
Thanks for the help!
Maxime
Environment
sys.platform: linux
Python: 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA RTX A4000
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 11.5, V11.5.119
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 1.11.0+cu115
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.5
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.3.2
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.5, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.12.0+cu102
OpenCV: 4.9.0
MMEngine: 0.10.3
MMCV: 2.0.1
MMDetection: 3.3.0
MMYOLO: 0.6.0+9c72cea
Additional information
No response