Custom Dataset: Low mAP and No Convergence (with images!) #313

@ghost

Description

I am training on a custom dataset. I have modified the callbacks in train.py so that they monitor the loss on the validation set rather than the loss on the training set. The original learning rate was 1e-4. Here are the modified callbacks and the training call:
```python
def create_callbacks(saved_weights_name, tensorboard_logs, model_to_save):
    makedirs(tensorboard_logs)

    early_stop = EarlyStopping(
        monitor     = 'val_loss',
        min_delta   = 0.001,
        patience    = 50,
        mode        = 'min',
        verbose     = 1
    )
    checkpoint = CustomModelCheckpoint(
        model_to_save   = model_to_save,
        filepath        = saved_weights_name,  # + '{epoch:02d}.h5',
        monitor         = 'val_loss',
        verbose         = 1,
        save_best_only  = True,
        mode            = 'min',
        period          = 1
    )
    reduce_on_plateau = ReduceLROnPlateau(
        monitor  = 'val_loss',
        factor   = 0.1,
        patience = 10,
        verbose  = 1,
        mode     = 'min',
        epsilon  = 0.01,
        cooldown = 0,
    )
    tensorboard = CustomTensorBoard(
        log_dir      = tensorboard_logs,
        write_graph  = True,
        write_images = True,
    )
    return [early_stop, checkpoint, reduce_on_plateau, tensorboard]

history = train_model.fit_generator(
    generator        = train_generator,
    validation_data  = valid_generator,
    steps_per_epoch  = len(train_generator) * config['train']['train_times'],
    epochs           = config['train']['nb_epochs'] + config['train']['warmup_epochs'],
    verbose          = 2 if config['train']['debug'] else 1,
    callbacks        = callbacks,
    workers          = 4,
    max_queue_size   = 8
)
```
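To make the interaction between these callback settings concrete, here is a rough sketch of how the learning rate would decay if the validation loss stopped improving. It is a simplified stand-in written in plain Python, not the Keras implementation: `simulate_plateau_lr` and the flat loss curve are hypothetical, and only the parameter values (initial rate 1e-4, factor 0.1, patience 10, threshold 0.01) come from the code above.

```python
import math

def simulate_plateau_lr(val_losses, initial_lr=1e-4, factor=0.1,
                        patience=10, min_delta=0.01):
    """Simplified model of ReduceLROnPlateau in 'min' mode with cooldown 0.

    Hypothetical helper, not the Keras code. Returns the learning rate
    in effect after each epoch.
    """
    lr = initial_lr
    best = math.inf
    wait = 0
    schedule = []
    for loss in val_losses:
        if loss < best - min_delta:
            # Counts as an improvement: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the learning rate.
                lr *= factor
                wait = 0
        schedule.append(lr)
    return schedule

# Hypothetical curve: one improving epoch followed by 50 flat epochs.
flat_curve = [1.0] + [1.0] * 50
schedule = simulate_plateau_lr(flat_curve)
print(schedule[10], schedule[-1])
```

Under these assumptions the rate is cut once every 10 flat epochs, so over 50 epochs without improvement (the EarlyStopping patience) it falls from 1e-4 to roughly 1e-9. In that scenario the later part of the patience window would contribute almost nothing to training.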

Here is a plot of the loss (blue is validation loss, red is training loss):

[image: loss]

And here is a sample image from my dataset:

[image: sample_img]

In the image, matching numbers mark the top-left and bottom-right corners of a bounding box. This example is fairly messy, but other people who have used this dataset report reaching 70%+ mAP with a default single-shot detector and no modifications. My best mAP was 0.2. I am using the default YOLOv3 and have not customized the model.
Here is my config:
```json
{
    "model" : {
        "min_input_size":       352,
        "max_input_size":       352,
        "anchors":              [13,58, 18,14, 30,31, 47,55, 50,128, 75,24, 95,69, 126,121, 161,208],
        "labels":               ["nlb"]
    },

    "train": {
        "train_image_folder":   "/content/drive/MyDrive/yolo/nlb_train_image/",
        "train_annot_folder":   "/content/drive/MyDrive/yolo/nlb_train_annot/",
        "cache_name":           "nlb_train.pkl",
        "pretrained_weights":   "",

        "train_times":          1,
        "batch_size":           16,
        "learning_rate":        1e-4,
        "nb_epochs":            200,
        "warmup_epochs":        3,
        "ignore_thresh":        0.4,
        "gpus":                 "0",

        "grid_scales":          [1,1,1],
        "obj_scale":            5,
        "noobj_scale":          1,
        "xywh_scale":           1,
        "class_scale":          1,

        "tensorboard_dir":      "logs",
        "saved_weights_name":   "nlb.h5",
        "debug":                true
    },

    "valid": {
        "valid_image_folder":   "/content/drive/MyDrive/yolo/nlb_valid_image/",
        "valid_annot_folder":   "/content/drive/MyDrive/yolo/nlb_valid_image/",
        "cache_name":           "nlb_valid.pkl",

        "valid_times":          1
    }
}
```
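Note that in the config above, `valid_image_folder` and `valid_annot_folder` are set to the same path. That may be fine if the annotation files sit next to the images, but it is worth double-checking. A minimal sketch of a check for this kind of overlap follows; `find_shared_folders` is a hypothetical helper, not something provided by the repository, and the dictionary only reproduces the four folder entries from the config.

```python
def find_shared_folders(config):
    """Return (section, image_key, annot_key) triples whose paths are identical.

    Hypothetical helper for illustration; not part of the repository.
    """
    shared = []
    for section in ("train", "valid"):
        settings = config.get(section, {})
        image_key = f"{section}_image_folder"
        annot_key = f"{section}_annot_folder"
        image_dir = settings.get(image_key)
        annot_dir = settings.get(annot_key)
        # Flag sections where images and annotations resolve to one folder.
        if image_dir and image_dir == annot_dir:
            shared.append((section, image_key, annot_key))
    return shared

# Folder entries copied from the config above.
config = {
    "train": {
        "train_image_folder": "/content/drive/MyDrive/yolo/nlb_train_image/",
        "train_annot_folder": "/content/drive/MyDrive/yolo/nlb_train_annot/",
    },
    "valid": {
        "valid_image_folder": "/content/drive/MyDrive/yolo/nlb_valid_image/",
        "valid_annot_folder": "/content/drive/MyDrive/yolo/nlb_valid_image/",
    },
}
print(find_shared_folders(config))
```

For this config the check reports the `valid` section only. A result like this is a prompt to look at the folder contents, not proof that anything is wrong.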

I augmented the training set from 612 to 7140 images; the validation set has 152 images and the test set has 141. I have tried increasing the size of the validation set but still get poor results. I also changed the initial learning rate from 1e-3 to 1e-5, but the results were still poor. If anyone is willing to test it themselves, here are links to the dataset:

train images
https://drive.google.com/drive/folders/1qmkCFNNuAtsOtFzQpmEgZyaN-DeSOuyr?usp=sharing
train annot
https://drive.google.com/drive/folders/1-5SZAcuOXz1Rt_eZ19RtVvgMXr8aId6j?usp=sharing
valid images
https://drive.google.com/drive/folders/1Xxz28iVaxXMIDA_UVQPKOAs0KfU4KtVT?usp=sharing
valid annot
https://drive.google.com/drive/folders/1Ml7DjEBwS50eUaMO378zYGdZTQsdfUNB?usp=sharing
test images
https://drive.google.com/drive/folders/1tEiW-Xx74kWua810sR5e0OyWvPABhJ1z?usp=sharing
test annot
https://drive.google.com/drive/folders/12jvZpb06tRopSpYLUSoiONOlvBWPV1ud?usp=sharing
train.pkl file
https://drive.google.com/file/d/1-5a1JGGYr6xP8ydCJOBOfhTPOym93u62/view?usp=sharing
valid.pkl file
https://drive.google.com/file/d/1-86kPCtYQKmYVzxyFZfQoNIf03_YdwDA/view?usp=sharing
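
For a sense of scale, the training budget implied by the numbers above can be worked out directly. This is only a sketch of the arithmetic: it assumes that `len(train_generator)` equals the number of training images divided by the batch size, rounded up, which is an assumption about the generator and not something verified here.

```python
import math

# Values taken from the config and dataset description above.
train_images = 7140
batch_size = 16
train_times = 1
nb_epochs = 200
warmup_epochs = 3

# Assumption: len(train_generator) == ceil(num_images / batch_size).
batches_per_pass = math.ceil(train_images / batch_size)

# Mirrors the steps_per_epoch and epochs arguments passed to fit_generator.
steps_per_epoch = batches_per_pass * train_times
total_epochs = nb_epochs + warmup_epochs
max_steps = steps_per_epoch * total_epochs

print(steps_per_epoch, total_epochs, max_steps)  # 447 203 90741
```

So, if early stopping never triggers, a full run is at most 203 epochs of 447 steps each.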

I have spent an embarrassing amount of time trying to solve this problem.
