C. Training Procedure



Training Package and Model Selection

The "FruitDetector" module employs Mask-RCNN to predict the classified masks overlapping a fruit in a given image. Mask-RCNN is proposed and developed by a team at Facebook AI Research (FAIR) as an extension to Faster-RCNN as an instance segmentation tool. There are several packages that help in training, prediction and evaluation of Mask-RCNN models such as torchvision, mmdetection and detectron2.

We selected detectron2 over the alternatives because it comes from the originator of Mask R-CNN and is well maintained. Several pretrained baseline models for detectron2 are available at [1]. These pretrained models are trained on a selection of datasets that are native to the detectron2 package. To train on a new dataset, it must be added as a custom dataset and trained on top of a chosen pretrained model.
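
As a hedged sketch of that starting point (not FruitDetector's own code), a model zoo baseline is typically loaded through detectron2's model_zoo API and then fine-tuned; the configuration actually used by this module is described under Configuration below.

# Minimal sketch: load a detectron2 model zoo baseline for fine-tuning.
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
# Mask R-CNN with a ResNet-101 FPN backbone, as used later on this page.
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
# Start from COCO-pretrained weights instead of training from scratch.
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml")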


FruitDetector Overview

The FruitDetector module has two modes of execution, which primarily vary according to the required output. If visualisation of the prediction results is required together with a COCO JSON file, the module is executed in debug mode. If low-latency execution with an in-memory JSON message is required, it runs in optimized mode. The module includes training, prediction, and evaluation options, controlled mainly through the configuration file.

Figure: FruitDetector components and execution overview.

Installation

The dependencies are defined in the requirements file available at [2]. They are installed with a single command:

pip install -r fd_only_requirements.txt

Hardware requirements

The minimum requirement for the host PC (x64 architecture) is 16 GB of RAM and a GPU capable of running CUDA 10. We configured the ResNet-101 FPN pre-trained model, which requires a minimum of 5.2 GB of GPU memory. Its detectron2 base configuration (mask_rcnn_R_101_FPN_3x.yaml) is:

_BASE_: "../Base-RetinaNet.yaml"
MODEL:
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-101.pkl"
  RESNETS:
    DEPTH: 101
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000
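
Assuming PyTorch is installed (it is a detectron2 dependency), the available GPU memory can be sanity-checked with a short snippet such as the following; this is a convenience sketch, not part of the module:

import torch

# Check that a CUDA-capable GPU is visible and has enough memory (>= 5.2 GB).
assert torch.cuda.is_available(), "No CUDA-capable GPU found"
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU: {torch.cuda.get_device_name(0)}, memory: {total_gb:.1f} GB")
assert total_gb >= 5.2, "Insufficient GPU memory for the ResNet-101 FPN model"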

Annotation

The annotation is performed by creating labelled masks on each fruit. We use two types of annotation: 1. fruit only, and 2. ripeness categories (ripe, unripe). Several annotation tools are available, such as Labelbox and V7 Darwin, which are paid platforms; CVAT, by contrast, is a free annotation platform developed by Intel. A screenshot of the CVAT annotation environment is shown below. The exported example further down shows the image and metadata information along with the mask coordinates under the segmentation section.

Figure: CVAT annotation environment.

The annotations are exported to COCO 1.0 format, which is readable by detectron2. Part of an exported annotation file is shown here for illustration.

"info": {
    "description": "Exported from AOC_Json_Exporter",
    "url": "https://www.lincoln.ac.uk/home/liat/",
    "version": "1.0",
    "year": 2021,
    "contributor": "Lincoln Institute of Agri-food Technology",
    "date_created": "2024-11-16 15:02:16.135036"
  },
  "licenses": [
    {
      "url": "https://www.lincoln.ac.uk/home/liat/",
      "id": "1",
      "name": "placeholder license"
    }
  ],
  "images": [
    {
      "license": 0,
      "file_name": "20231128-150802.jpg",
      "coco_url": "",
      "height": 1080,
      "width": 1920,
      "date_captured": "",
      "flickr_url": "n/a",
      "darwin_url": "",
      "darwin_workview_url": "",
      "id": 1
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "segmentation": [
        [
          1583.0,
          545.5,
          1582.0,
          545.5,
          1581.0,
          545.5

If the annotation is produced as labelled masks only, a package such as MaskToCOCOJson, available at [3], can be used to convert the masks to COCO JSON format.
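
The essential step in such a conversion is tracing each labelled mask into polygon coordinates. The OpenCV-based sketch below illustrates that idea only; it is not the MaskToCOCOJson implementation:

import cv2
import numpy as np

def mask_to_coco_segmentation(binary_mask: np.ndarray) -> list:
    """Convert one binary mask (HxW, uint8, 0/255) to COCO-style polygons."""
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    segmentation = []
    for contour in contours:
        if contour.size >= 6:  # a valid polygon needs at least 3 (x, y) points
            segmentation.append(contour.flatten().astype(float).tolist())
    return segmentation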

Configuration

After annotation, all image subsets and their respective annotation files should be placed in three folders (train, test, val). Since a user-defined dataset is not native to the detectron2 base, it must be registered as a custom dataset. The five configuration categories are described below.

Datasets

This category comprises the user-defined train and test dataset names, together with download URLs used when the train and test datasets are not available in the file directories. If "download_assets" under the settings category is set to true, these datasets are downloaded to the data directory before training starts.

Files

The files category has multiple file settings. The "pretrained_model_file" entry defines a user-supplied pre-trained model to train on top of. The "model_file" entry names the model file used for prediction; when training a model, it is recommended to leave "model_file" blank, as the output model from training will be placed there. If the pre-trained model setting is the empty string, the base ResNet-101 model is used; this ResNet model is defined in the "config_file" entry. The "train_metadata_catalog_file" and "test_metadata_catalog_file" entries define paths to detectron2 metadata catalog files, which store class names, dataset information, and colour descriptions for annotation; the files named in these entries are created during training. The train and test dataset annotations are given in the "train_annotation_file" and "test_annotation_file" entries, respectively.

Directories

The directories containing the dataset images are defined in the "train_image_dir" and "test_image_dir" entries. The "training_output_dir" is the directory that holds the iterative outputs saved every 5000 iterations, along with other statistical measures used for evaluation. The "prediction_output_dir" is the directory where annotated prediction images are saved, and the "prediction_json_dir" is the path where all predicted JSON files are saved. The latter two entries are used only in debug mode.

Training

The training-related configuration, such as the number of iterations, the number of classes, and the learning rate, is defined in the "epochs", "number_of_classes", and "learning_rate" entries. The user may select either SGD or Adam in the "optimizer" entry.

Settings

The settings category holds general execution options. For example, "download_assets" is enabled when pre-trained models and datasets should be downloaded automatically before the training process starts. The "segm_masks" and "bbox" entries control the annotation output, i.e. whether segmentation masks are drawn and whether bounding boxes are included. The module also outputs the orientation of each fruit, computed with the PCA method.

datasets:
  train_dataset_name: 'aoc_train_dataset'
  test_dataset_name: 'aoc_test_dataset'
  validation_dataset_name: 'aoc_validation_dataset'
  dataset_train_annotation_url: 'https://lncn.ac/aocanntrain' 
  dataset_train_images_url: 'https://lncn.ac/aocdatatrain'
  dataset_test_annotation_url: 'https://lncn.ac/aocanntest' 
  dataset_test_images_url: 'https://lncn.ac/aocdatatest'
files:
  # pretrained model is used as a training base model; if set as empty, the config file will use the imagenet trained model as a base.
  pretrained_model_file: ''
  # model_file: './model/aoc_tomato_ripeness_151_40k.pth'
  model_file: './model/aoc_strawberry_class_ripeness.pth' #'./model/aoc_model.pth'
  config_file: 'COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml'
  test_metadata_catalog_file: './data/dataset_catalogs/tom_test_metadata_catalog.pkl'
  train_dataset_catalog_file: './data/dataset_catalogs/tom_train_dataset_catalog.pkl'
  train_annotation_file: './data/tomato_dataset/train/annotations/ripeness_class_annotations.json'
  test_annotation_file: './data/tomato_dataset/test/annotations/ripeness_class_annotations.json'
  validation_annotation_file: './data/tomato_dataset/val/annotations/ripeness_class_annotations.json'
  model_url: 'https://lncn.ac/aocmodel'
  meta_catalog_url: 'https://lncn.ac/aocmeta'
  train_catalog_url: 'https://lncn.ac/aoccat'
directories:
  train_image_dir: './data/strawberry_dataset/train/'
  test_image_dir: './data/bag/rgbd/' #'./data/strawberry_dataset/test/'
  validation_image_dir: './data/tomato_dataset/val/'
  training_output_dir: './data/training_output/'
  prediction_output_dir: './data/prediction_output/test_images/'
  prediction_json_dir: './data/annotations/predicted/' 
training:
  epochs: 40000
  number_of_classes: 2
  optimizer: 'SGD'
  learning_rate: 0.0025
settings:
  download_assets: false # if assets such as model and datasets should be downloaded
  rename_pred_images: false #rename the predicted images in img_000001.png like format
  segm_masks: true
  bbox: false
  show_orientation: true
  fruit_type: 'strawberry' # ONLY required for fruit orientations. Currently supported for "strawberry" or "tomato"
  validation_period: 500 # a smaller period increases training time; 500 gives a validation run every 500 epochs (80 over 40000 epochs)
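
For orientation, the following sketch shows how training entries like those above typically map onto detectron2's configuration object; the module's actual mapping lives in its source and may differ:

# Hedged sketch: translate the YAML entries above into detectron2 settings.
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))  # config_file
cfg.DATASETS.TRAIN = ("aoc_train_dataset",)   # train_dataset_name
cfg.DATASETS.TEST = ("aoc_test_dataset",)     # test_dataset_name
cfg.SOLVER.BASE_LR = 0.0025                   # learning_rate
cfg.SOLVER.MAX_ITER = 40000                   # epochs
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2           # number_of_classes
cfg.OUTPUT_DIR = "./data/training_output/"    # training_output_dir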

Training and Evaluation

Each of the folders above should be registered as a separate dataset with a unique name; in the following example these are "aoc_train_dataset" and "aoc_test_dataset". A dataset is represented by its image folder and its annotation file. The image directory for the dataset is defined in the "train_image_dir" entry under the "directories" category of the configuration file, and the annotation JSON file in the "train_annotation_file" entry. Together with the dataset name, these three entries make up the custom dataset "aoc_train_dataset" in our example, as sketched below. The configuration file controls the module; its entries are explained in the Configuration section.
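
Registering a COCO-format custom dataset is typically a one-liner with detectron2's register_coco_instances helper. A minimal sketch for the example above, where the paths are illustrative stand-ins for the "train_annotation_file" and "train_image_dir" entries:

from detectron2.data.datasets import register_coco_instances

# Arguments: dataset name, extra metadata, annotation JSON, image directory.
register_coco_instances(
    "aoc_train_dataset",
    {},
    "./data/strawberry_dataset/train/annotations/annotations.json",  # train_annotation_file
    "./data/strawberry_dataset/train/",                              # train_image_dir
)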

Calls to the trainer and predictor are illustrated by the call_predictor and call_trainer functions in the predictor.py file, which serves as the starting point:

python predictor.py

The detectron_predictor.py script also evaluates the predictions.
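
Under the assumption that the module builds on detectron2's standard engine, such calls usually boil down to DefaultTrainer and DefaultPredictor; the sketch below is indicative only:

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor, DefaultTrainer

# Sketch only: assumes the datasets are registered and cfg is filled in
# from the configuration file, as in the Configuration section.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))

trainer = DefaultTrainer(cfg)        # roughly what call_trainer wraps
trainer.resume_or_load(resume=False)
trainer.train()

predictor = DefaultPredictor(cfg)    # roughly what call_predictor wraps
outputs = predictor(cv2.imread("20231128-150802.jpg"))
print(outputs["instances"].pred_masks.shape)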

Validation

Validation during training is configured by the following parameter entries:

  1. validation_dataset_name
  2. validation_image_dir
  3. validation_annotation_file
  4. validation_period

The first three parameters set the resource name and paths, while "validation_period" sets the number of training epochs after which a validation pass is run. If validation is not required, it can be effectively disabled by setting "validation_period" to a value greater than "epochs".
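
For example, with the sample configuration above (epochs: 40000), validation can be switched off in practice as follows:

settings:
  validation_period: 50000 # greater than epochs (40000), so validation never triggers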

Output

There are three types of output from prediction:

  1. Masks
  2. Prediction confidence
  3. Centroid and orientation of the fruit

The mask outputs for the strawberry and tomato models are shown in the figures below.

Figure: Predicted mask outlines for strawberries (fruit-only class).

Figure: Predicted masks for tomatoes with bounding boxes and ripeness classes.

The predicted JSON file is a non-standard COCO JSON: for each mask it additionally includes the prediction confidence as a percentage, together with the centroid and orientation of the fruit.
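
As an illustration, a single predicted annotation might look like the following; apart from "segmentation" and the "Orientation" key described in the next section, the extra key names and values shown here ("score", "centroid") are assumptions for illustration only:

{
  "id": 1,
  "image_id": 1,
  "category_id": 1,
  "segmentation": [[1583.0, 545.5, 1582.0, 545.5, 1581.0, 545.5]],
  "score": 0.97,
  "centroid": [1583.0, 545.0],
  "Orientation": 12.5
}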

Fruit Orientation

The fruit orientation angle and centroid are included in the annotation of each segmentation mask, in both the JSON message and the JSON file. The key for the orientation information in the JSON output is "Orientation". In debug mode, the vectors between which the angle is calculated are drawn on the predicted images with text annotations, as shown in the figures below.
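
As a sketch of the PCA idea (not the module's exact implementation), the principal axis of a mask's pixel coordinates can be computed and compared against the y-axis reference vector:

import numpy as np

def mask_orientation_deg(binary_mask: np.ndarray) -> float:
    """Angle (degrees) between the mask's principal axis and the image y-axis."""
    ys, xs = np.nonzero(binary_mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)                         # centre on the centroid
    eigvals, eigvecs = np.linalg.eigh(np.cov(pts, rowvar=False))
    major = eigvecs[:, np.argmax(eigvals)]          # principal axis (unit vector)
    y_axis = np.array([0.0, 1.0])                   # reference vector
    # abs() folds the axis sign; the module may report a signed angle instead.
    return float(np.degrees(np.arccos(abs(major @ y_axis))))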

Figure: Strawberry orientation vector (in yellow), with the y-axis as reference vector (in blue).

Figure: Tomato orientation vector (in yellow), with the y-axis as reference vector (in blue).

References

[1] Detectron2, Facebook Artificial Intelligence Research Team. Model zoo for detectron2 package, https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md?plain=1
[2] Package dependencies for FruitDetector, https://github.com/LCAS/aoc_fruit_detector/blob/main/scripts/fd_only_requirements.txt
[3] Mask to COCO JSON converter, https://github.com/usmanzahidi/MaskToCOCOJson