Labels: type: bug (Something isn't working)
Bug description
I created the labels.json file and the images.
I ran the command:
python references\detection\train.py db_resnet50 --epochs 20 --train_path C:\RBEE\DO\DetectionTrain --val_path C:\RBEE\DO\DetectionValidate --pretrained --name DtectDO --output_dir C:\RBEE\DO\DetectionTrain\models
Each attempt fails immediately with the error shown in the Error traceback section below.
Code snippet to reproduce the bug
I created the labels.json file and the images.
I ran the command:
python references\detection\train.py db_resnet50 --epochs 20 --train_path C:\RBEE\DO\DetectionTrain --val_path C:\RBEE\DO\DetectionValidate --pretrained --name DtectDO --output_dir C:\RBEE\DO\DetectionTrain\models
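For reference, here is a minimal sketch of the labels.json structure the detection training script expects (assuming the format documented for references/detection: one entry per image with img_dimensions, img_hash and a list of 4-point polygons; the file name, hash and coordinates below are placeholders):

```python
import json

# Minimal sketch of a labels.json for detection training.
# Assumes the format documented for references/detection: one entry per
# image with its dimensions, an arbitrary hash/identifier and a list of
# 4-point polygons in absolute pixel coordinates (placeholder values).
labels = {
    "example_page_01.jpg": {
        "img_dimensions": [1024, 768],   # image size (see the detection README for the exact convention)
        "img_hash": "placeholder_hash",  # any identifier string
        "polygons": [
            [[120, 45], [410, 45], [410, 90], [120, 90]],
            [[120, 110], [300, 110], [300, 150], [120, 150]],
        ],
    },
}

with open("labels.json", "w", encoding="utf-8") as f:
    json.dump(labels, f, indent=2)
```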
Error traceback
(env12) C:\Work\doctr2\doctr>python references\detection\train.py db_resnet50 --epochs 20 --train_path C:\RBEE\DO\DetectionTrain --val_path C:\RBEE\DO\DetectionValidate --pretrained --name DtectDO --output_dir C:\RBEE\DO\DetectionTrain\models
Namespace(backend='nccl', device=None, arch='db_resnet50', output_dir='C:\\RBEE\\DO\\DetectionTrain\\models', train_path='C:\\RBEE\\DO\\DetectionTrain', val_path='C:\\RBEE\\DO\\DetectionValidate', name='DtectDO', epochs=20, batch_size=2, save_interval_epoch=False, input_size=1024, lr=0.001, weight_decay=0, workers=None, resume=None, test_only=False, freeze_backbone=False, show_samples=False, wb=False, clearml=False, push_to_hub=False, pretrained=True, rotation=False, eval_straight=False, optim='adam', sched='poly', amp=False, find_lr=False, early_stop=False, early_stop_epochs=5, early_stop_delta=0.01)
Validation set loaded in 0.01304s (78 samples in 39 batches)
Train set loaded in 0.002001s (8 samples in 4 batches)
0%| | 0/4 [00:00<?, ?it/s]Traceback (most recent call last):
File "C:\Work\doctr2\doctr\references\detection\train.py", line 650, in <module>
main(args)
File "C:\Work\doctr2\doctr\references\detection\train.py", line 521, in main
train_loss, actual_lr = fit_one_epoch(
^^^^^^^^^^^^^^
File "C:\Work\doctr2\doctr\references\detection\train.py", line 115, in fit_one_epoch
pbar = tqdm(train_loader, dynamic_ncols=True, disable=(rank != 0))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Work\doctr2\env12\Lib\site-packages\tqdm\asyncio.py", line 33, in __init__
self.iterable_iterator = iter(iterable)
^^^^^^^^^^^^^^
File "C:\Work\doctr2\env12\Lib\site-packages\torch\utils\data\dataloader.py", line 494, in __iter__
return self._get_iterator()
^^^^^^^^^^^^^^^^^^^^
File "C:\Work\doctr2\env12\Lib\site-packages\torch\utils\data\dataloader.py", line 427, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Work\doctr2\env12\Lib\site-packages\torch\utils\data\dataloader.py", line 1172, in __init__
w.start()
File "C:\Users\Arhat\AppData\Local\Programs\Python\Python312\Lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "C:\Users\Arhat\AppData\Local\Programs\Python\Python312\Lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Arhat\AppData\Local\Programs\Python\Python312\Lib\multiprocessing\context.py", line 337, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "C:\Users\Arhat\AppData\Local\Programs\Python\Python312\Lib\multiprocessing\popen_spawn_win32.py", line 95, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\Arhat\AppData\Local\Programs\Python\Python312\Lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'main.<locals>.<lambda>'
0%| | 0/4 [00:00<?, ?it/s]
(env12) C:\Work\doctr2\doctr>Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\Arhat\AppData\Local\Programs\Python\Python312\Lib\multiprocessing\spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Arhat\AppData\Local\Programs\Python\Python312\Lib\multiprocessing\spawn.py", line 132, in _main
self = reduction.pickle.load(from_parent)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
EOFError: Ran out of input

The labels.json used for training is attached: [labels.json](https://github.com/user-attachments/files/22547178/labels.json)
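For what it's worth, this looks like the usual Windows limitation with the spawn start method: everything handed to DataLoader worker processes (dataset, transforms, collate function) has to be picklable, and a lambda defined inside main() is not. A minimal sketch that reproduces the same AttributeError on Windows, independent of docTR (the toy dataset and the lambda are placeholders, not the actual code in train.py):

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Placeholder dataset, just so the DataLoader has something to iterate."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return torch.zeros(3, 32, 32), idx


def main():
    # A lambda defined in a local scope cannot be pickled. With num_workers > 0
    # on Windows (spawn start method), the DataLoader pickles it to send to the
    # worker processes and fails with:
    #   AttributeError: Can't pickle local object 'main.<locals>.<lambda>'
    collate = lambda batch: tuple(zip(*batch))
    loader = DataLoader(ToyDataset(), batch_size=2, num_workers=2, collate_fn=collate)
    for _ in loader:
        pass


if __name__ == "__main__":
    main()
```

If that is indeed what happens in train.py, a possible workaround on Windows would be to run with zero DataLoader workers (the Namespace output shows a workers option, so presumably something like --workers 0), or to replace the local lambda with a module-level function or functools.partial so it can be pickled.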
Environment
Under Windows 11
Collecting environment information...
DocTR version: 1.0.1a0
PyTorch version: 2.8.0+cu129 (torchvision 0.23.0+cu129)
OpenCV version: 4.12.0
OS: Microsoft Windows 11 Pro
Python version: 3.12.4
Is CUDA available (PyTorch): Yes
CUDA runtime version: 12.9.41
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect