Description
Training on my own dataset fails with the following error:
2024-04-16 18:56:22 - DEBUG - Training epoch 0 with 0 samples
  File "/home/hyq/anaconda3/envs/cvnets/bin/cvnets-train", line 8, in <module>
    sys.exit(main_worker())
  File "/home/hyq/文档/ml-cvnets/main_train.py", line 235, in main_worker
    main(opts=opts, **kwargs)
  File "/home/hyq/anaconda3/envs/cvnets/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/hyq/文档/ml-cvnets/main_train.py", line 174, in main
    training_engine.run(train_sampler=train_sampler)
  File "/home/hyq/文档/ml-cvnets/engine/training_engine.py", line 606, in run
    train_loss, train_ckpt_metric = self.train_epoch(epoch)
  File "/home/hyq/文档/ml-cvnets/engine/training_engine.py", line 357, in train_epoch
    avg_loss = train_stats.avg_statistics(
  File "/home/hyq/文档/ml-cvnets/metrics/stats.py", line 148, in avg_statistics
    logger.error(
  File "/home/hyq/文档/ml-cvnets/utils/logger.py", line 46, in error
    traceback.print_stack()
2024-04-16 18:56:22 - LOGS - Training took 00:00:02.11
2024-04-16 18:56:22 - ERROR - total_loss not present in the dictionary. Available keys are: []. Exiting!!!
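The first log line reports 0 training samples, so no batch ever runs and no loss is recorded before the epoch average is requested. The following is a minimal sketch of that failure mode, not the ml-cvnets implementation; `RunningStats`, `train_epoch`, and their signatures are hypothetical names used only for illustration.

```python
from collections import defaultdict


class RunningStats:
    """Hypothetical stand-in for a per-epoch metric accumulator."""

    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, name, value, n=1):
        # Called once per batch; never called when the data loader is empty.
        self.sums[name] += value * n
        self.counts[name] += n

    def average(self, name):
        # With zero batches, no key was ever registered.
        if name not in self.sums:
            raise KeyError(
                f"{name} not present in the dictionary. "
                f"Available keys are: {list(self.sums)}"
            )
        return self.sums[name] / self.counts[name]


def train_epoch(batch_losses):
    stats = RunningStats()
    for loss in batch_losses:  # an empty dataset yields no iterations
        stats.update("total_loss", loss)
    return stats.average("total_loss")
```

Under these assumptions, `train_epoch([])` raises with an empty key list, which matches the shape of the message above and suggests the missing loss is a symptom of the empty dataset rather than a problem in the loss configuration.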
Command used for training: cvnets-train --common.config-file /home/hyq/下载/pspnet-mobilevitv2-1.0.yaml --common.results-loc segmentation_results
pspnet-mobilevitv2-1.0.yaml:
common:
  run_label: "run_1"
  accum_freq: 1
  accum_after_epoch: -1
  log_freq: 200
  auto_resume: false
  mixed_precision: true
  grad_clip: 10.0
dataset:
  root_train: "/media/hyq/西部数据2TB/ml-cvnets_data/"
  root_val: "/media/hyq/西部数据2TB/ml-cvnets_data/"
  name: "ade20k1"
  category: "segmentation"
  train_batch_size0: 4 # effective batch size is 16 ( 4 * 4 GPUs)
  val_batch_size0: 4
  eval_batch_size0: 1
  workers: 4
  persistent_workers: false
  pin_memory: false
image_augmentation:
  random_crop:
    enable: true
    seg_class_max_ratio: 0.75
    pad_if_needed: true
    mask_fill: 0 # background idx is 0
  random_horizontal_flip:
    enable: true
  resize:
    enable: true
    size: [512, 512]
    interpolation: "bicubic"
  random_short_size_resize:
    enable: true
    interpolation: "bicubic"
    short_side_min: 256
    short_side_max: 768
    max_img_dim: 1024
  photo_metric_distort:
    enable: true
  random_rotate:
    enable: true
    angle: 10
    mask_fill: 0 # background idx is 0
  random_gaussian_noise:
    enable: true
sampler:
  name: "batch_sampler"
  bs:
    crop_size_width: 512
    crop_size_height: 512
loss:
  category: "segmentation"
  ignore_idx: -1
  segmentation:
    name: "cross_entropy"
    cross_entropy:
      aux_weight: 0.4
optim:
  name: "sgd"
  weight_decay: 1.e-4
  no_decay_bn_filter_bias: true
  sgd:
    momentum: 0.9
scheduler:
  name: "cosine"
  is_iteration_based: false
  max_epochs: 120
  cosine:
    max_lr: 0.02
    min_lr: 0.0002
model:
  segmentation:
    name: "encoder_decoder"
    lr_multiplier: 1
    seg_head: "pspnet"
    output_stride: 8
    use_aux_head: true
    activation:
      name: "relu"
    pspnet:
      psp_dropout: 0.1
      psp_out_channels: 512
      psp_pool_sizes: [ 1, 2, 3, 6 ]
  classification:
    name: "mobilevit_v2"
    mitv2:
      width_multiplier: 1.0
      attn_norm_layer: "layer_norm_2d"
    activation:
      name: "swish"
  normalization:
    name: "sync_batch_norm"
    momentum: 0.1
  activation:
    name: "swish"
    inplace: false
  layer:
    global_pool: "mean"
    conv_init: "kaiming_uniform"
    linear_init: "normal"
ema:
  enable: true
  momentum: 0.0005
stats:
  val: [ "loss", "iou" ]
  train: [ "loss", "grad_norm" ]
  checkpoint_metric: "iou"
  checkpoint_metric_max: true
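Because the log reports 0 training samples, one thing worth checking is whether `root_train` contains files where the dataset class looks for them. The sketch below assumes an ADE20K-style layout with `images/training` and `annotations/training` under the root. That layout, the `.jpg`/`.png` extensions, and the helper name `count_pairs` are assumptions made for illustration, so adjust them to whatever the `ade20k1` dataset class actually reads.

```python
from pathlib import Path


def count_pairs(root, split="training", img_ext=".jpg", mask_ext=".png"):
    """Count images that have a same-named mask under an ADE20K-style root.

    The directory names and extensions are assumptions, not values taken
    from the ml-cvnets source.
    """
    root = Path(root)
    img_dir = root / "images" / split
    mask_dir = root / "annotations" / split
    if not img_dir.is_dir() or not mask_dir.is_dir():
        return 0  # a missing directory means no samples can be found
    # Match images to masks by file name without the extension.
    masks = {p.stem for p in mask_dir.glob(f"*{mask_ext}")}
    return sum(1 for p in img_dir.glob(f"*{img_ext}") if p.stem in masks)
```

Calling `count_pairs("/media/hyq/西部数据2TB/ml-cvnets_data/")` and getting 0 would point to a directory-layout or file-extension mismatch rather than a problem with the model or loss settings.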