Prerequisite
- I have searched Issues and Discussions but cannot get the expected help.
- The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmpose).
Environment
OrderedDict([('sys.platform', 'linux'), ('Python', '3.9.23 (main, Jun 5 2025, 13:40:20) [GCC 11.2.0]'), ('CUDA available', True), ('MUSA available', False), ('numpy_random_seed', 2147483648), ('GPU 0', 'NVIDIA GeForce RTX 3090'), ('CUDA_HOME', '/usr/local/cuda'), ('NVCC', 'Cuda compilation tools, release 12.1, V12.1.105'), ('GCC', 'gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0'), ('PyTorch', '2.1.2+cu121'), ('PyTorch compiling details', 'PyTorch built with:\n - GCC 9.3\n - C++ Version: 201703\n - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 12.1\n - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n - CuDNN 8.9.2\n - Magma 2.6.1\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n'), ('TorchVision', '0.16.2+cu121'), ('OpenCV', '4.9.0'), ('MMEngine', '0.10.7'), ('MMPose', '1.3.2+')])
mmcv 2.1.0
mmdet 3.3.0
mmengine 0.10.7
mmpose 1.3.2 /root/autodl-tmp/mmpose-main
mmpretrain 1.2.0
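For reference, this is roughly how I collected the environment dump above (a minimal sketch; I'm assuming the mmpose 1.x helper `mmpose.utils.collect_env`, which wraps mmengine's `collect_env` and appends the MMPose version):

```python
# Minimal sketch of how the environment dump above was collected.
# Assumes mmpose 1.x exports collect_env from mmpose.utils (a thin wrapper
# around mmengine's collect_env that adds the MMPose version string).
from mmpose.utils import collect_env

for name, value in collect_env().items():
    print(f'{name}: {value}')
```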
Reproduces the problem - code sample
I trained ViTPose-S on a single RTX 3090 without modifying any code or the config: https://github.com/open-mmlab/mmpose/blob/main/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192.py
Reproduces the problem - command or script
python tools/train.py configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192.py
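To sanity-check how the config resolves on a single GPU, I also dumped the optimizer and LR-scaling settings it ends up with (a minimal sketch using mmengine's `Config` API; whether `auto_scale_lr` is present depends on the inherited base runtime, so it is read defensively):

```python
# Minimal sketch: print the optimizer, batch size, and LR-scaling settings the
# config resolves to, so they can be compared with the official 8xb64 schedule.
from mmengine.config import Config

cfg = Config.fromfile(
    'configs/body_2d_keypoint/topdown_heatmap/coco/'
    'td-hm_ViTPose-small_8xb64-210e_coco-256x192.py')

print('optim_wrapper   :', cfg.optim_wrapper)            # base lr and paramwise settings
print('batch size      :', cfg.train_dataloader.batch_size)
print('param_scheduler :', cfg.param_scheduler)
print('auto_scale_lr   :', cfg.get('auto_scale_lr'))     # None if the key is absent
```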
Reproduces the problem - error message
Table 1: Training stage (snapshot at the 50th iteration of each epoch)
| epoch | source | loss (×10⁻³) | acc_pose (%) | lr (×10⁻⁵) | grad_norm (×10⁻³) |
|---|---|---|---|---|---|
| 20 | official | 0.8 | 71.8 | 12.7 | 2.0 |
| 20 | mine | 0.86 | 71.8 | 2.75 | 3.0 |
| 30 | official | 0.8 | 74.1 | 12.7 | 1.9 |
| 30 | mine | 0.83 | 77.5 | 2.75 | 2.8 |
| 40 | official | 0.8 | 75.8 | 12.7 | 1.8 |
| 40 | mine | 0.81 | 73.4 | 2.75 | 3.0 |
| 50 | official | 0.7 | 76.4 | 12.7 | 1.8 |
| 50 | mine | 0.81 | 68.3 | 2.75 | 2.9 |
| 60 | official | 0.7 | 76.7 | 12.7 | 1.6 |
| 60 | mine | 0.79 | 72.9 | 2.75 | 3.0 |
| 70 | official | 0.7 | 77.3 | 12.7 | 1.3 |
| 70 | mine | 0.79 | 71.7 | 2.75 | 3.0 |
Table 2: Validation stage (end-of-epoch COCO metrics; see the extraction sketch after the table)
| epoch | source | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | official | 65.1 | 86.9 | 72.5 | 58.2 | 66.7 | 71.4 | 91.5 | 78.2 | 67.2 | 77.3 |
| 20 | mine | 65.1 | 87.0 | 73.0 | 62.1 | 71.0 | 71.3 | 91.6 | 78.3 | 67.0 | 77.5 |
| 30 | official | 66.3 | 87.6 | 73.7 | 59.1 | 68.4 | 72.5 | 92.1 | 79.2 | 68.1 | 78.8 |
| 30 | mine | 66.4 | 87.5 | 74.1 | 63.4 | 72.2 | 72.3 | 91.8 | 79.3 | 68.1 | 78.3 |
| 40 | official | 67.7 | 88.0 | 75.1 | 60.9 | 69.6 | 73.8 | 92.3 | 80.5 | 69.7 | 79.7 |
| 40 | mine | 66.4 | 87.3 | 74.3 | 63.6 | 72.4 | 72.8 | 91.8 | 80.1 | 68.7 | 78.8 |
| 50 | official | 68.4 | 88.2 | 76.1 | 61.5 | 70.4 | 74.4 | 92.5 | 81.5 | 70.4 | 80.2 |
| 50 | mine | 67.1 | 87.5 | 74.9 | 64.2 | 72.9 | 73.3 | 92.0 | 80.3 | 69.1 | 79.2 |
| 60 | official | 69.1 | 88.3 | 76.9 | 62.3 | 70.7 | 75.1 | 92.6 | 82.0 | 71.2 | 80.7 |
| 60 | mine | 67.5 | 88.0 | 75.4 | 64.9 | 73.0 | 73.5 | 92.1 | 80.7 | 69.7 | 79.0 |
| 70 | official | 69.5 | 88.7 | 77.5 | 62.6 | 71.6 | 75.4 | 92.9 | 82.4 | 71.3 | 81.2 |
| 70 | mine | 67.6 | 87.9 | 75.5 | 64.7 | 73.4 | 73.7 | 92.3 | 81.0 | 69.6 | 79.5 |
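For completeness, this is roughly how I pulled my per-epoch COCO numbers out of the scalar log (a minimal sketch; it assumes the default mmengine logging layout of `work_dirs/<config>/<timestamp>/vis_data/scalars.json`, and the work-dir name below simply mirrors the config name):

```python
# Minimal sketch of how the "mine" validation rows above were extracted.
# Assumes the default mmengine logging layout; picks the latest run's scalars.json.
import glob
import json

log_files = sorted(glob.glob(
    'work_dirs/td-hm_ViTPose-small_8xb64-210e_coco-256x192/*/vis_data/scalars.json'))
scalar_file = log_files[-1]  # latest run

with open(scalar_file) as f:
    for line in f:
        record = json.loads(line)
        coco = {k: v for k, v in record.items() if k.startswith('coco/')}
        if coco:  # validation records carry the COCO metrics
            print(record.get('step'), coco)
```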
Is this expected? The AP gap between my run and the official log keeps widening as training progresses. Any help would be appreciated!
Additional information
No response