Description
Prerequisite
- I have searched Issues and Discussions but cannot get the expected help.
- I have read the FAQ documentation but cannot get the expected help.
- The bug has not been fixed in the latest version (master) or latest version (1.x).
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
master branch https://github.com/open-mmlab/mmrotate
Environment
sys.platform: linux
Python: 3.8.20 (default, Oct 3 2024, 15:24:27) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.64
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.2
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.12.0
OpenCV: 4.10.0
MMCV: 1.7.2
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.3
MMRotate: 0.3.4+
Reproduces the problem - code sample
Copyright (c) OpenMMLab. All rights reserved.
import argparse
import os
import os.path as osp
import time
import warnings
import mmcv
import torch
from mmcv import Config, DictAction
from mmcv.cnn import fuse_conv_bn
from mmcv.runner import (get_dist_info, init_dist, load_checkpoint,
wrap_fp16_model)
from mmdet.apis import multi_gpu_test, single_gpu_test
from mmdet.datasets import build_dataloader, replace_ImageToTensor
from mmrotate.datasets import build_dataset
from mmrotate.models import build_detector
from mmrotate.utils import (build_ddp, build_dp, compat_cfg, get_device,
setup_multi_processes)
def parse_args():
"""Parse parameters."""
parser = argparse.ArgumentParser(
description='MMDet test (and eval) a model')
parser.add_argument('--config', default='/media/kou/Data3/shy/PKINet/work-dir/train/oriented_rcnn_r50_fpn_1x_dota_le90/oriented_rcnn_r50_fpn_1x_dota_le90.py', help='test config file path')
parser.add_argument('--checkpoint', default='/media/kou/Data3/shy/PKINet/work-dir/train/oriented_rcnn_r50_fpn_1x_dota_le90/epoch_12.pth', help='checkpoint file')
parser.add_argument(
'--work-dir',
default='/media/kou/Data3/shy/PKINet/work-dir/test/oriented_rcnn_r50_fpn_1x_dota_le90',
help='the directory to save the file containing evaluation metrics')
parser.add_argument('--out', help='output result file in pickle format')
parser.add_argument(
'--fuse-conv-bn',
action='store_true',
help='Whether to fuse conv and bn, this will slightly increase'
'the inference speed')
parser.add_argument(
'--gpu-ids',
type=int,
nargs='+',
help='ids of gpus to use '
'(only applicable to non-distributed testing)')
parser.add_argument(
'--format-only',
action='store_true',
help='Format the output results without perform evaluation. It is'
'useful when you want to format the result to a specific format and '
'submit it to the test server')
parser.add_argument(
'--eval',
type=str,
nargs='+',
default='mAP',
help='evaluation metrics, which depends on the dataset, e.g., "bbox",'
' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC')
parser.add_argument('--show', action='store_true', help='show results')
parser.add_argument(
'--show-dir', help='directory where painted images will be saved')
parser.add_argument(
'--show-score-thr',
type=float,
default=0.3,
help='score threshold (default: 0.3)')
parser.add_argument(
'--gpu-collect',
action='store_true',
help='whether to use gpu to collect results.')
parser.add_argument(
'--tmpdir',
help='tmp directory used for collecting results from multiple '
'workers, available when gpu-collect is not specified')
parser.add_argument(
'--cfg-options',
nargs='+',
action=DictAction,
help='override some settings in the used config, the key-value pair '
'in xxx=yyy format will be merged into config file. If the value to '
'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
'Note that the quotation marks are necessary and that no white space '
'is allowed.')
parser.add_argument(
'--eval-options',
nargs='+',
action=DictAction,
help='custom options for evaluation, the key-value pair in xxx=yyy '
'format will be kwargs for dataset.evaluate() function')
parser.add_argument(
'--launcher',
choices=['none', 'pytorch', 'slurm', 'mpi'],
default='none',
help='job launcher')
parser.add_argument('--local_rank', type=int, default=0)
args = parser.parse_args()
if 'LOCAL_RANK' not in os.environ:
os.environ['LOCAL_RANK'] = str(args.local_rank)
return args
def main():
args = parse_args()
assert args.out or args.eval or args.format_only or args.show \
or args.show_dir, \
('Please specify at least one operation (save/eval/format/show the '
'results / save the results) with the argument "--out", "--eval"'
', "--format-only", "--show" or "--show-dir"')
if args.eval and args.format_only:
raise ValueError('--eval and --format_only cannot be both specified')
if args.out is not None and not args.out.endswith(('.pkl', '.pickle')):
raise ValueError('The output file must be a pkl file.')
cfg = Config.fromfile(args.config)
if args.cfg_options is not None:
cfg.merge_from_dict(args.cfg_options)
cfg = compat_cfg(cfg)
if args.format_only and cfg.mp_start_method != 'spawn':
warnings.warn(
'`mp_start_method` in `cfg` is set to `spawn` to use CUDA '
'with multiprocessing when formatting output result.')
cfg.mp_start_method = 'spawn'
# set multi-process settings
setup_multi_processes(cfg)
# set cudnn_benchmark
if cfg.get('cudnn_benchmark', False):
torch.backends.cudnn.benchmark = True
cfg.model.pretrained = None
if cfg.model.get('neck'):
if isinstance(cfg.model.neck, list):
for neck_cfg in cfg.model.neck:
if neck_cfg.get('rfp_backbone'):
if neck_cfg.rfp_backbone.get('pretrained'):
neck_cfg.rfp_backbone.pretrained = None
elif cfg.model.neck.get('rfp_backbone'):
if cfg.model.neck.rfp_backbone.get('pretrained'):
cfg.model.neck.rfp_backbone.pretrained = None
if args.gpu_ids is not None:
cfg.gpu_ids = args.gpu_ids
else:
cfg.gpu_ids = range(1)
# init distributed env first, since logger depends on the dist info.
if args.launcher == 'none':
distributed = False
if len(cfg.gpu_ids) > 1:
warnings.warn(
f'We treat {cfg.gpu_ids} as gpu-ids, and reset to '
f'{cfg.gpu_ids[0:1]} as gpu-ids to avoid potential error in '
'non-distribute testing time.')
cfg.gpu_ids = cfg.gpu_ids[0:1]
else:
distributed = True
init_dist(args.launcher, **cfg.dist_params)
test_dataloader_default_args = dict(
samples_per_gpu=1, workers_per_gpu=2, dist=distributed, shuffle=False)
# in case the test dataset is concatenated
if isinstance(cfg.data.test, dict):
cfg.data.test.test_mode = True
if 'samples_per_gpu' in cfg.data.test:
warnings.warn('`samples_per_gpu` in `test` field of '
'data will be deprecated, you should'
' move it to `test_dataloader` field')
test_dataloader_default_args['samples_per_gpu'] = \
cfg.data.test.pop('samples_per_gpu')
if test_dataloader_default_args['samples_per_gpu'] > 1:
# Replace 'ImageToTensor' to 'DefaultFormatBundle'
cfg.data.test.pipeline = replace_ImageToTensor(
cfg.data.test.pipeline)
elif isinstance(cfg.data.test, list):
for ds_cfg in cfg.data.test:
ds_cfg.test_mode = True
if 'samples_per_gpu' in ds_cfg:
warnings.warn('`samples_per_gpu` in `test` field of '
'data will be deprecated, you should'
' move it to `test_dataloader` field')
samples_per_gpu = max(
[ds_cfg.pop('samples_per_gpu', 1) for ds_cfg in cfg.data.test])
test_dataloader_default_args['samples_per_gpu'] = samples_per_gpu
if samples_per_gpu > 1:
for ds_cfg in cfg.data.test:
ds_cfg.pipeline = replace_ImageToTensor(ds_cfg.pipeline)
test_loader_cfg = {
**test_dataloader_default_args,
**cfg.data.get('test_dataloader', {})
}
rank, _ = get_dist_info()
# allows not to create
if args.work_dir is not None and rank == 0:
mmcv.mkdir_or_exist(osp.abspath(args.work_dir))
timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime())
json_file = osp.join(args.work_dir, f'eval_{timestamp}.json')
# build the dataloader
dataset = build_dataset(cfg.data.test)
data_loader = build_dataloader(dataset, **test_loader_cfg)
# build the model and load checkpoint
cfg.model.train_cfg = None
cfg.device = get_device()
model = build_detector(cfg.model, test_cfg=cfg.get('test_cfg'))
# fp16 setting
fp16_cfg = cfg.get('fp16', None)
if fp16_cfg is not None or cfg.device == 'npu':
wrap_fp16_model(model)
checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
if args.fuse_conv_bn:
model = fuse_conv_bn(model)
# old versions did not save class info in checkpoints, this walkaround is
# for backward compatibility
if 'CLASSES' in checkpoint.get('meta', {}):
model.CLASSES = checkpoint['meta']['CLASSES']
else:
model.CLASSES = dataset.CLASSES
if not distributed:
model = build_dp(model, cfg.device, device_ids=cfg.gpu_ids)
outputs = single_gpu_test(model, data_loader, args.show, args.show_dir,
args.show_score_thr)
else:
find_unused_parameters = cfg.get('find_unused_parameters', False)
# Sets the `find_unused_parameters` parameter in
# torch.nn.parallel.DistributedDataParallel
model = build_ddp(
model,
cfg.device,
device_ids=[int(os.environ['LOCAL_RANK'])],
broadcast_buffers=False,
find_unused_parameters=find_unused_parameters)
outputs = multi_gpu_test(model, data_loader, args.tmpdir,
args.gpu_collect)
rank, _ = get_dist_info()
if rank == 0:
if args.out:
print(f'\nwriting results to {args.out}')
mmcv.dump(outputs, args.out)
kwargs = {} if args.eval_options is None else args.eval_options
if args.format_only:
dataset.format_results(outputs, **kwargs)
if args.eval:
eval_kwargs = cfg.get('evaluation', {}).copy()
# hard-code way to remove EvalHook args
for key in [
'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best',
'rule', 'dynamic_intervals'
]:
eval_kwargs.pop(key, None)
eval_kwargs.update(dict(metric=args.eval, **kwargs))
metric = dataset.evaluate(outputs, **eval_kwargs)
print(metric)
metric_dict = dict(config=args.config, metric=metric)
if args.work_dir is not None and rank == 0:
mmcv.dump(metric_dict, json_file)
if name == 'main':
main()
Reproduces the problem - command or script
python ./tools/train.py --config configs/oriented_rcnn/oriented_rcnn_r50_fpn_1x_dota_le90.py --work-dir work-dir/train/oriented_rcnn_r50_fpn_1x_dota_le90 --gpu-ids 1
Reproduces the problem - error message
File "/media/kou/Data3/shy/PKINet/tools/test.py", line 275, in
main()
File "/media/kou/Data3/shy/PKINet/tools/test.py", line 235, in main
outputs = single_gpu_test(model, data_loader, args.show, args.show_dir,
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/mmdet/apis/test.py", line 29, in single_gpu_test
result = model(return_loss=False, rescale=True, **data)
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 51, in forward
return super().forward(*inputs, **kwargs)
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 119, in new_func
return old_func(*args, **kwargs)
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 174, in forward
return self.forward_test(img, img_metas, **kwargs)
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 147, in forward_test
return self.simple_test(imgs[0], img_metas[0], **kwargs)
File "/media/kou/Data3/shy/PKINet/mmrotate/models/detectors/two_stage.py", line 179, in simple_test
proposal_list = self.rpn_head.simple_test_rpn(x, img_metas)
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/mmdet/models/dense_heads/dense_test_mixins.py", line 130, in simple_test_rpn
proposal_list = self.get_bboxes(*rpn_outs, img_metas=img_metas)
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 208, in new_func
return old_func(*args, **kwargs)
File "/media/kou/Data3/shy/PKINet/mmrotate/models/dense_heads/rotated_rpn_head.py", line 423, in get_bboxes
proposals = self._get_bboxes_single(cls_score_list, bbox_pred_list,
File "/media/kou/Data3/shy/PKINet/mmrotate/models/dense_heads/oriented_rpn_head.py", line 275, in _get_bboxes_single
, keep = batched_nms(hproposals, scores, ids, cfg.nms)
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/mmcv/ops/nms.py", line 350, in batched_nms
dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg)
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/mmcv/utils/misc.py", line 340, in new_func
output = old_func(*args, **kwargs)
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/mmcv/ops/nms.py", line 175, in nms
inds = NMSop.apply(boxes, scores, iou_threshold, offset, score_threshold,
File "/home/junjie/.conda/envs/pkinet/lib/python3.8/site-packages/mmcv/ops/nms.py", line 28, in forward
inds = ext_module.nms(
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Process finished with exit code 134
Additional information
When I use the gpu0 to run test.py, the code works correctly. When I select any other gpu like gpu1 or gpu2, this error will happen. I have no idea about it.