[Bug] Unable to reproduce the results of VITPOSE-S. #3254

@909009-yqx

Prerequisite

Environment

OrderedDict([('sys.platform', 'linux'), ('Python', '3.9.23 (main, Jun 5 2025, 13:40:20) [GCC 11.2.0]'), ('CUDA available', True), ('MUSA available', False), ('numpy_random_seed', 2147483648), ('GPU 0', 'NVIDIA GeForce RTX 3090'), ('CUDA_HOME', '/usr/local/cuda'), ('NVCC', 'Cuda compilation tools, release 12.1, V12.1.105'), ('GCC', 'gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0'), ('PyTorch', '2.1.2+cu121'), ('PyTorch compiling details', 'PyTorch built with:\n - GCC 9.3\n - C++ Version: 201703\n - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 12.1\n - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n - CuDNN 8.9.2\n - Magma 2.6.1\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field 
-Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n'), ('TorchVision', '0.16.2+cu121'), ('OpenCV', '4.9.0'), ('MMEngine', '0.10.7'), ('MMPose', '1.3.2+')])

mmcv 2.1.0
mmdet 3.3.0
mmengine 0.10.7
mmpose 1.3.2 /root/autodl-tmp/mmpose-main
mmpretrain 1.2.0

Reproduces the problem - code sample

I tried to reproduce ViTPose-S on a single RTX 3090 using the unmodified config: https://github.com/open-mmlab/mmpose/blob/main/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192.py

Reproduces the problem - command or script

python tools/train.py configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_ViTPose-small_8xb64-210e_coco-256x192.py
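
One thing that may matter here: as far as I understand the OpenMMLab config naming convention, the `8xb64` tag in the config name means 8 GPUs with 64 samples per GPU, while the command above runs on a single GPU. The sketch below only does the arithmetic. It assumes that the per-GPU batch size stays at 64 on my machine and that the linear scaling rule is the relevant one; neither is confirmed by the logs, and `parse_batch_tag` is a hypothetical helper, not part of MMPose.

```python
import re


def parse_batch_tag(config_name: str) -> tuple[int, int]:
    """Extract (num_gpus, samples_per_gpu) from an 'NxbM' tag in a config name."""
    match = re.search(r"(\d+)xb(\d+)", config_name)
    if match is None:
        raise ValueError(f"no 'NxbM' tag found in {config_name!r}")
    return int(match.group(1)), int(match.group(2))


CONFIG = "td-hm_ViTPose-small_8xb64-210e_coco-256x192.py"

ref_gpus, per_gpu = parse_batch_tag(CONFIG)
reference_batch = ref_gpus * per_gpu   # total batch size implied by the config name
my_batch = 1 * per_gpu                 # assumption: one GPU, per-GPU batch unchanged
lr_scale = my_batch / reference_batch  # LR factor under the linear scaling rule

print(f"reference batch: {reference_batch}")
print(f"my batch:        {my_batch}")
print(f"linear LR scale: {lr_scale}")
```

If these assumptions hold, my run uses one eighth of the reference total batch size, so the learning rate and batch size settings are worth checking before comparing curves.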

Reproduces the problem - error message

Table 1: Training stage (snapshot at the 50th iteration of each epoch)

| epoch | source | loss (×10⁻³) | acc_pose (%) | lr (×10⁻⁵) | grad_norm (×10⁻³) |
|---|---|---|---|---|---|
| 20 | official | 0.8 | 71.8 | 12.7 | 2.0 |
| 20 | mine | 0.86 | 71.8 | 2.75 | 3.0 |
| 30 | official | 0.8 | 74.1 | 12.7 | 1.9 |
| 30 | mine | 0.83 | 77.5 | 2.75 | 2.8 |
| 40 | official | 0.8 | 75.8 | 12.7 | 1.8 |
| 40 | mine | 0.81 | 73.4 | 2.75 | 3.0 |
| 50 | official | 0.7 | 76.4 | 12.7 | 1.8 |
| 50 | mine | 0.81 | 68.3 | 2.75 | 2.9 |
| 60 | official | 0.7 | 76.7 | 12.7 | 1.6 |
| 60 | mine | 0.79 | 72.9 | 2.75 | 3.0 |
| 70 | official | 0.7 | 77.3 | 12.7 | 1.3 |
| 70 | mine | 0.79 | 71.7 | 2.75 | 3.0 |

Table 2: Validation stage (end-of-epoch COCO metrics)

| epoch | source | AP | AP.5 | AP.75 | AP(M) | AP(L) | AR | AR.5 | AR.75 | AR(M) | AR(L) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | official | 65.1 | 86.9 | 72.5 | 58.2 | 66.7 | 71.4 | 91.5 | 78.2 | 67.2 | 77.3 |
| 20 | mine | 65.1 | 87.0 | 73.0 | 62.1 | 71.0 | 71.3 | 91.6 | 78.3 | 67.0 | 77.5 |
| 30 | official | 66.3 | 87.6 | 73.7 | 59.1 | 68.4 | 72.5 | 92.1 | 79.2 | 68.1 | 78.8 |
| 30 | mine | 66.4 | 87.5 | 74.1 | 63.4 | 72.2 | 72.3 | 91.8 | 79.3 | 68.1 | 78.3 |
| 40 | official | 67.7 | 88.0 | 75.1 | 60.9 | 69.6 | 73.8 | 92.3 | 80.5 | 69.7 | 79.7 |
| 40 | mine | 66.4 | 87.3 | 74.3 | 63.6 | 72.4 | 72.8 | 91.8 | 80.1 | 68.7 | 78.8 |
| 50 | official | 68.4 | 88.2 | 76.1 | 61.5 | 70.4 | 74.4 | 92.5 | 81.5 | 70.4 | 80.2 |
| 50 | mine | 67.1 | 87.5 | 74.9 | 64.2 | 72.9 | 73.3 | 92.0 | 80.3 | 69.1 | 79.2 |
| 60 | official | 69.1 | 88.3 | 76.9 | 62.3 | 70.7 | 75.1 | 92.6 | 82.0 | 71.2 | 80.7 |
| 60 | mine | 67.5 | 88.0 | 75.4 | 64.9 | 73.0 | 73.5 | 92.1 | 80.7 | 69.7 | 79.0 |
| 70 | official | 69.5 | 88.7 | 77.5 | 62.6 | 71.6 | 75.4 | 92.9 | 82.4 | 71.3 | 81.2 |
| 70 | mine | 67.6 | 87.9 | 75.5 | 64.7 | 73.4 | 73.7 | 92.3 | 81.0 | 69.6 | 79.5 |

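Is this normal? The two runs match up to epoch 30, but after that the AP gap keeps widening, from 1.3 at epoch 40 to 1.9 at epoch 70. Any help would be appreciated.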
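
To put numbers on the gap, here is a small sketch that recomputes it from the values in the two tables. The lists are copied by hand from Table 2 (the AP column) and Table 1 (the lr column); nothing is read from the actual log files, and the variable names are my own.

```python
# AP column of Table 2, one entry per logged epoch.
epochs = [20, 30, 40, 50, 60, 70]
official_ap = [65.1, 66.3, 67.7, 68.4, 69.1, 69.5]
mine_ap = [65.1, 66.4, 66.4, 67.1, 67.5, 67.6]

# Positive values mean the official run is ahead of mine.
ap_gap = [round(o - m, 1) for o, m in zip(official_ap, mine_ap)]

for epoch, gap in zip(epochs, ap_gap):
    print(f"epoch {epoch}: official - mine = {gap:+.1f} AP")

# Logged learning rates from Table 1 (both in units of 1e-5).
official_lr = 12.7
mine_lr = 2.75
lr_ratio = mine_lr / official_lr
print(f"mine / official learning rate: {lr_ratio:.3f}")
```

The AP difference is within 0.1 up to epoch 30 and grows afterwards. The logged learning rate in my run is also only about 0.22 times the official one at every snapshot, which may or may not be related.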

Additional information

No response
