DeepBench: Overview

DeepBench is a command-line interface tool that performs systematic evaluation of image classification models by:

  1. Loading original images
  2. Applying various augmentation methods to create modified versions
  3. Classifying both original and augmented images using selected models
  4. Storing the results in a database for analysis

This approach allows researchers and developers to assess how well different models perform under various image transformations and distortions, providing insights into model robustness and reliability. DeepBench is intended to be run on a GPU cluster or comparable hardware.
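The four-step workflow above can be sketched in Python. This is a minimal illustration with placeholder classify/augment functions and an illustrative result type; DeepBench's actual class and function names may differ:

```python
from dataclasses import dataclass, field

@dataclass
class InferResult:
    image: str
    augment_method: dict = field(default_factory=dict)
    label_score: dict = field(default_factory=dict)

def classify(image_name):
    # Placeholder: a real model returns class probabilities.
    return {"0": 0.9, "1": 0.1}

def augment_brightness(image_name, value):
    # Placeholder: a real augmentation transforms pixel data.
    return image_name

def run_benchmark(images):
    results = []
    for img in images:
        # Steps 1-2: the original image plus augmented variants
        variants = [({}, img),
                    ({"Brightness": {"brightness": 50}}, augment_brightness(img, 50))]
        # Step 3: classify every variant
        for method, variant in variants:
            results.append(InferResult(image=img, augment_method=method,
                                       label_score=classify(variant)))
    # Step 4: results would then be written to MongoDB or TinyDB
    return results

results = run_benchmark(["image1.jpg"])
```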

Table of Contents

  1. Features
  2. Project TAHAI
  3. Requirements
  4. Getting Started
  5. Setup
  6. Augmentation Methods
  7. Running Benchmarks
  8. CLI Arguments
  9. Supported Models
  10. Results and Output
  11. Adding New Models
  12. Contributing
  13. Troubleshooting
  14. License
  15. More Information on Project TAHAI
  16. Authors and Acknowledgment

Features

  • Comprehensive Model Support: Test a wide range of image classification models from Hugging Face or Ollama API including:

    • Vision Language Models (multimodal models), traditional CNN models, and Vision Transformer models
  • Extensive Augmentation Library: Apply various image transformations to test model robustness:

    • Basic transformations (brightness, contrast, rotation, flips)
    • Noise additions (Gaussian noise, salt & pepper)
    • Blur effects (Gaussian blur, motion blur)
    • Weather simulations (rain, clouds, shadows)
    • Geometric transformations (perspective, grid distortion)
  • Flexible Configuration: Customize benchmarks through TOML configuration files:

    • Select models and image scaling parameters
    • Configure augmentation methods and parameters
    • Define experiment names and output settings
  • Systematic Testing: Evaluate models with:

    • Individual augmentations
    • Ramp testing (incrementally increasing augmentation intensity)
    • Use case-specific augmentation combinations
  • Results Storage: Store results in:

    • MongoDB database for shared access and analysis
    • Local TinyDB for standalone operation
  • Debug Mode: Save augmented images for visual inspection

Project TAHAI

DeepBench was developed as part of the TAHAI (Trustworthy AI for Human-Augmented Intelligence) project, which focuses on assessing the robustness of various image classification models. Robustness, in this context, refers to the models' ability to consistently produce stable and reliable results, even when faced with disturbances or variations in input data.

Requirements

DeepBench requires Python 3.10-3.12 and the following key dependencies:

  • PyTorch
  • Transformers (Hugging Face)
  • OpenCV
  • NumPy
  • Pandas
  • MongoDB (for database storage)
  • TinyDB (for local storage)

For a complete list of dependencies, refer to the pyproject.toml file.

Getting Started

Local Installation

  1. Clone the repository:

    git clone <repository-url>
    cd deepbench
  2. Create a Python virtual environment:

    python -m venv deepbench
  3. Activate the virtual environment:

    • Windows: deepbench\Scripts\activate
    • Linux/Mac: source deepbench/bin/activate
  4. Install DeepBench and its dependencies:

    pip install -e .
  5. For development tools:

    pip install -e .[dev]
  6. Set up the database (see Database Setup)

  7. Run DeepBench with a configuration file:

    python src/deepbench.py -c "configs/default_config.toml"

GPU Cluster Installation

  1. Install the VS Code Extension Remote - SSH and connect to the GPU cluster:

    username@<cluster-ip-address>
    
  2. Clone the repository on the cluster

  3. Create a conda environment:

    conda create --name deepbench python=3.11
    conda activate deepbench
  4. Install PyTorch (use a CUDA-enabled build if your cluster provides GPUs):

    conda install pytorch
  5. Install DeepBench:

    cd deepbench
    pip install -e .
  6. Create an SBATCH file for running on the cluster (example in deepbench.sbatch)

  7. Run DeepBench using SLURM:

    sbatch deepbench.sbatch

Setup

Database Setup

  1. Database Connection: DeepBench uses MongoDB to store experiment results. By default, results are saved to a remote MongoDB service hosted on cloud.mongodb.com, or to a local instance linked to the TAHAI project. To use your own MongoDB instance, update the MongoDB credentials in your .env file.

  2. Database Authentication: Create a new file named .env in the root directory with the following content:

    DBUSER = db_write
    DBPASSWD = your_password
    MONGODB_URI = your_host_address
    

    The project database has two user roles:

    • db_write: For running DeepBench (can write to the database)
    • db_read: For viewing results (read-only access)
  3. Storage Options: If you prefer to store results locally instead of in MongoDB, use the -l or --local flag:

    python src/deepbench.py -c "configs/default_config.toml" -l

    Alternatively, check that the main function in src/deepbench.py uses the following line:

       ResultLocal(infer_result_list)

    This will store results in a TinyDB database (JSON format) in the deepbench/output directory.

    If you only want to store the results in MongoDB, use the following line in the main function of src/deepbench.py:

       ResultDatabase(infer_result_list)

Input Images Setup

DeepBench requires a CSV file that contains image paths and their corresponding ground truth labels. There are several ways to create this file:

1. Manual CSV Creation

Create a CSV file with the following format:

image_path,ground_truth
/path/to/image1.jpg,0
/path/to/image2.jpg,1

The ground truth should be the class index (integer) for the image.
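The CSV can also be generated programmatically with the standard library. A minimal sketch (the paths are placeholders; it builds the file in memory, but in practice you would write to disk):

```python
import csv
import io

# Two-column rows: image path plus integer class index
rows = [("/path/to/image1.jpg", 0), ("/path/to/image2.jpg", 1)]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["image_path", "ground_truth"])  # header expected by DeepBench
writer.writerows(rows)
csv_text = buffer.getvalue()
```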

2. Using create_image_subset.py Tool

DeepBench provides a powerful tool called create_image_subset.py in the /tools directory that can automatically generate CSV files and mapping files from your image datasets.

Basic Usage:

# Create a CSV file with paths from a directory (all images)
python tools/create_image_subset.py -nfp /path/to/dataset/folder -n 0

# Create a CSV file with a subset of images (e.g., 100 images)
python tools/create_image_subset.py -nfp /path/to/dataset/folder -n 100

Parameters:

  • -nfp, --new_paths_file: Path to the dataset directory. The tool will recursively find all images and create a CSV file.
  • -f, --filelist: Text file with a list of absolute file paths (alternative to -nfp).
  • -n: Number of examples to include in the subset (use 0 for all images).
  • -m, --mapping: Path to the mapping file for class names (default is ImageNet1k mapping).
  • -o, --output: Output file path for the CSV file (default: filepaths_TIMESTAMP.csv).

How It Works:

  1. When using -nfp, the tool:

    • Recursively finds all image files in the specified directory
    • Creates a text file with all file paths (new_file_paths.txt)
    • Automatically generates a mapping file (new_mapping.txt) based on subfolder names
    • Creates a CSV file with image paths and class indices
  2. The mapping file format is:

    class_folder_name class_description
    
  3. The tool assumes that images are organized in a folder structure where each subfolder represents a class:

    dataset/
    ├── class1/
    │   ├── image1.jpg
    │   └── image2.jpg
    ├── class2/
    │   ├── image3.jpg
    │   └── image4.jpg
    

Example:

# Create a CSV with all images from a medical dataset
python tools/create_image_subset.py -nfp /datasets/medical_images -n 0

# Create a CSV with 50 random images per class from a dataset
python tools/create_image_subset.py -nfp /datasets/food_images -n 50 -o food_dataset.csv

The tool will generate:

  • A CSV file with image paths and class indices
  • A mapping file that maps folder names to class indices
  • A text file with all file paths

These files can then be used in your DeepBench configuration.
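The folder-to-CSV logic can be approximated with the standard library. This is a simplified sketch, not the tool's actual implementation: it maps each subfolder to a class index and takes the first n images per class (the real tool can sample randomly):

```python
import csv
import io
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def build_subset_csv(dataset_dir, n_per_class=0):
    """Map subfolder names to class indices and list up to n images per class."""
    dataset = Path(dataset_dir)
    classes = sorted(d.name for d in dataset.iterdir() if d.is_dir())
    mapping = {name: idx for idx, name in enumerate(classes)}
    rows = []
    for name, idx in mapping.items():
        images = sorted(p for p in (dataset / name).rglob("*")
                        if p.suffix.lower() in IMAGE_EXTS)
        if n_per_class:            # n = 0 means "all images"
            images = images[:n_per_class]
        rows.extend((str(p), idx) for p in images)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["image_path", "ground_truth"])
    writer.writerows(rows)
    return mapping, out.getvalue()
```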

Configuration Files

All configurations for DeepBench are located in the /configs directory. The main configuration files are:

Model Configuration

The model configuration file (e.g., default_config.toml) contains settings for:

[cli]
experiment_name = "model-name"            # Name for this experiment run
debug = false                             # Save augmented images if true
input = './path/to/images.csv'            # Path to input images
primer_img_name = 'image.png'             # set this to save the raw image data as a NumPy array for a specific image
output = './output'                       # Output directory
local = false                             # Use local storage if true

[database]
mongodb = "mongodb+srv://DBUSER:DBPASSWD@MONGODB_URI/"

[models]
hugging_face = "google/vit-base-patch16-224"  # Model identifier
ollama_model = "gemma3:27b"                   # if you want to use the Ollama API; leave empty if not needed
img_scaling = [224, 224]                      # Image size for model input
top_k = 5                                     # Number of top predictions to save
multimodal_classes = [
    "./tools/dataset_mapping.txt", # relative to Python cwd or absolute path
    # 1. Option: add classes as a list[str] -> ["car","cat","dog"]
    # 2. Option: add path of mapping file, will be mapped according to "gt" in input.csv file
    # 3. Option: leave empty to use imagenet1k labels/classes
]

[augmentation]
augment_config = [
    "./configs/augmentation/augm_defaults.toml",  # Default augmentation settings
    "./configs/augmentation/augm_use_case.toml",  # Use case augmentations
]

Augmentation Configuration

Augmentation configurations define how images are transformed:

  1. Default Augmentations (augm_defaults.toml):

    [augmentation.imgMethod.GaussianBlur]
    kernel_size = 9
    sigma_limit = 0
    
    [augmentation.imgMethod.ImageRotation]
    angle_degrees = 45
  2. Ramp Augmentations (gradually increasing intensity):

    [augmentation.Ramp.Brightness]
    active = true
    ramp_var = "brightness"
    range = [-100, 100]
    step_size = 25
  3. Use Case Augmentations (domain-specific combinations):

    [augmentation.UseCase.MedicalDiagnosis]
    active = true
    
    [augmentation.UseCase.MedicalDiagnosis.HistEqualization]
    active = true

Augmentation Methods

DeepBench includes a comprehensive set of image augmentation methods to test model robustness:

Basic Augmentations

  • Brightness: Adjust image brightness
  • Contrast: Modify image contrast
  • Rotation: Rotate image by specified degrees
  • Flips: Horizontal and vertical image flipping
  • Gaussian Blur: Apply blur with configurable kernel size
  • Motion Blur: Simulate motion blur effects
  • Gaussian Noise: Add random noise to the image
  • Salt & Pepper Noise: Add random white and black pixels
  • Histogram Equalization: Enhance image contrast
  • Global Color Shift: Modify color channels
  • Grid Distortion: Apply grid-based distortion
  • Grid Elastic Deformation: Apply elastic deformations
  • Perspective Transformation: Change image perspective
  • Rain: Simulate rain effects
  • Cloud Generator: Add cloud overlays
  • Shadow: Add shadow effects

Ramp Augmentations

Ramp augmentations apply a method with gradually increasing intensity to test how model performance degrades:

[augmentation.Ramp.Brightness]
active = true
ramp_var = "brightness"
range = [-100, 100]
step_size = 25

This will test the model with brightness values of -100, -75, -50, -25, 0, 25, 50, 75, and 100.
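The ramp values follow directly from range and step_size; for an inclusive integer range they can be enumerated like this:

```python
# Ramp values for range = [-100, 100], step_size = 25
low, high, step = -100, 100, 25
values = list(range(low, high + 1, step))
print(values)  # → [-100, -75, -50, -25, 0, 25, 50, 75, 100]
```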

Use Case Augmentations

Use case augmentations combine multiple methods to simulate real-world scenarios:

  • Medical Diagnosis: Adjustments relevant for medical imaging
  • Autonomous Driving: Weather and lighting conditions for driving
  • Manufacturing Quality: Transformations for industrial inspection
  • Handheld Devices: Camera shake and lighting variations
  • People Recognition: Variations in facial recognition scenarios
  • Satellite Imaging: Atmospheric and perspective effects

LLM-based Augmentation Selector

The Augmentation Selector is a separate tool that uses OpenAI's API to automatically recommend and configure augmentation methods tailored to a specific domain. It generates a TOML configuration file with inline comments explaining each augmentation and logs evaluation results in a CSV file. The generated configurations can be run by DeepBench without any further steps.

Features

  • Automatically selects appropriate augmentations for domain-specific datasets.
  • Queries OpenAI's API with a high-level description of the application domain.
  • Dynamically parses available augmentations from a template file.
  • Generates a configuration file in TOML format with inline comments explaining each augmentation.
  • Logs evaluation results across runs in a CSV file, including selected augmentations, use case name, and model used.

OpenAI API Key

How to set up OpenAI API Key

To use the Augmentation Selector, set up your OpenAI API key:

  1. Create a .env file in the project root.
  2. Add the following entry: OPENAI_API_KEY=your-openai-api-key

Usage

  • Run the script by providing a use case name, a high-level description of your application domain, and the path to the augmentation template file:

    > python src/augmentation_selector/main.py "MedicalImaging" "high-resolution medical imaging Dataset for Knee Arthritis" "augmentation_template.toml"

  • Specify the OpenAI model (optional, default is gpt-4o):

    > python src/augmentation_selector/main.py "Landscape" "Aerial Landscape Dataset" "augmentation_template.toml" --model gpt-3.5-turbo

  • The script generates:

    • A configuration file in the configs/ directory: > configs/generated_config.toml

    • Evaluation results logged in a CSV file > evaluation_results.csv.

Output

The generated configuration file is in TOML format. Example:

[augmentation.UseCase.MedicalImaging]
active = true

# Explanation: Flipping images horizontally allows the model to learn from the symmetrical nature of knee anatomy.
[augmentation.UseCase.MedicalImaging.ImageFlipHorizontal]
# Description: Flips the image along the vertical axis, mirroring it horizontally.
active = true

# Explanation: Slightly rotating images introduces variability to account for different imaging angles.
[augmentation.UseCase.MedicalImaging.Ramp.ImageRotation]
# Description: Rotates the image by a specified angle, keeping its contents intact.
active = true
ramp_var = "angle_degrees"
range = [-150, 150]
step_size = 30

Evaluation Results Logging

Evaluation results for each run are logged in evaluation_results.csv in the following format:
Run UseCase Model Brightness CloudGenerator Contrast GaussianBlur GaussianNoise GlobalColourShift GridDistortion GridElasticDeformation HistEqualization ImageFlipHorizontal ImageFlipVertical ImageRotation MotionBlur PerspectiveTransformation Rain SaltPepperNoise Shadow UseCaseName
1 medical_diagnosis gpt-4o-mini X X X X X X X X X X
2 medical_diagnosis gpt-4o X X X X X X X X X X
3 auto_driving gpt-4o-mini X X X X X X X X X
4 auto_driving gpt-4o X X X X X X X X X X
5 manufacturing_quality gpt-4o-mini X X X X X X X X X X X X
6 manufacturing_quality gpt-4o X X X X X X X X X X X X X
7 people_recognition gpt-4o-mini X X X X X X X X X X
8 people_recognition gpt-4o X X X X X X X X X X
9 satellite_imaging gpt-4o-mini X X X X X X X X X X X X
10 satellite_imaging gpt-4o X X X X X X X X X X
11 handheld gpt-4o-mini X X X X X X X X X X
12 handheld gpt-4o X X X X X X X X X X
  • Run: Incremental ID for each run.
  • UseCase: The name of the application domain.
  • Model: The OpenAI model used.
  • Augmentation Columns: An X indicates inclusion of the augmentation in the run.

Augmentation Configuration

How to configure available augmentations and parameters

All augmentation parameters for the Augmentation Selector are defined in the default_augmentation_config.toml file. This file lists all available augmentation methods and their adjustable parameters, including:

  • active: Determines whether an augmentation method is enabled (true) or disabled (false).
  • range: Defines the acceptable range of values for parameters.
  • step_size: Specifies the increment for parameter adjustment.

By adjusting these parameters, you can plug in your own augmentation methods and tailor them to fit the needs of your specific domain.
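A hypothetical entry for such a custom method might look like the following (the method name MyCustomNoise and its parameter values are purely illustrative; the real file lists DeepBench's built-in methods):

```toml
[augmentation.imgMethod.MyCustomNoise]
active = true            # enable (true) or disable (false) this method
range = [0, 50]          # acceptable range of parameter values
step_size = 10           # increment used when ramping the parameter
```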

Running Benchmarks

Basic Usage with Hugging Face Models

For Hugging Face models, you can use the default batch size (128) for efficient processing. For gated models that require authentication, add your HuggingFace token to the .env file:

HUGGING_FACE_HUB_TOKEN = yourtoken

And then start with:

python src/deepbench.py -c "configs/model_resnet-50.toml"

Basic Usage with Ollama Models

When using Ollama models, you need to:

  1. Configure Environment Variables:

    • Create or update your .env file to include the Ollama server address:
      OLLAMA_ADDRESS=http://localhost:11434
      
    • For remote Ollama servers, replace localhost with the appropriate IP address
  2. Configure Model in TOML Configuration:

    • Create a configuration file with ollama_model instead of hugging_face:
      [models]
      hugging_face = ""  # Leave empty when using Ollama
      ollama_model = "llava:34b"  # Specify the Ollama model name
      img_scaling = [224, 224]
      top_k = 5
  3. Modify the batch size in src/deepbench.py to 1 (most Ollama models process one image at a time):

    # Change this line
    batch_size = 128  # for Ollama set to 1, for HuggingFace set to 128 or higher
    # To
    batch_size = 1  # for Ollama set to 1, for HuggingFace set to 128 or higher
  4. Ensure Ollama is running on your local machine or remote server

  5. Run DeepBench with your Ollama configuration:

    python src/deepbench.py -c "configs/ollama_test_benchm.toml"

Advanced Usage

For large-scale benchmarking across multiple models and use cases, DeepBench provides shell scripts and SBATCH configurations that automate the process. This is particularly useful when running on GPU clusters with job scheduling systems like SLURM.

Setting Up Batch Submission

  1. Navigate to the sbatch folder in the DeepBench directory.

  2. Configure the batch_submit_main.sh script:

    • Define the dataset size (number of images to process):

      NUM_DATA="0100"  # 100 images per class
    • Specify the models to benchmark (comma-separated list):

      HUGGING_FACE_MODELS="\
      google/gemma-3-4b-it,\
      llava-hf/llava-v1.6-mistral-7b-hf,\
      "
    • Set the input image sizes for each model:

      INPUT_SIZE_LIST="\
      512,\
      224,\
      "
    • Uncomment or add the SBATCH job submissions for the use cases you want to test:

      sbatch sbatch/handheld.sbatch "$NUM_DATA" "google/gemma-3-4b-it" "512"
      sbatch sbatch/handheld.sbatch "$NUM_DATA" "llava-hf/llava-v1.6-mistral-7b-hf" "224"
      # Add more as needed
  3. Customize SBATCH files for specific use cases:

    • Each use case has its own SBATCH file (e.g., auto_driving.sbatch, handheld.sbatch, medical.sbatch)
    • These files contain SLURM configuration parameters like:
      #SBATCH --job-name=TAHAI_handheld
      #SBATCH --time=6-18:00:00
      #SBATCH --gres=gpu:1
      #SBATCH --nodes=1
      #SBATCH --cpus-per-gpu=64
      #SBATCH --mem-per-gpu=32G
    • Adjust these parameters based on your cluster's resources and job requirements

Running Batch Jobs

  1. Make the script executable:

    chmod +x sbatch/batch_submit_main.sh
  2. Run the batch submission script:

    ./sbatch/batch_submit_main.sh
  3. Monitor job status:

    squeue  # View job queue
    sacct   # View job history
  4. Cancel jobs if needed:

    scancel JOB_ID

Specialized Batch Scripts

DeepBench includes several specialized batch scripts for different benchmarking scenarios:

  • batch_submit_clip_variations.sh: Tests multiple CLIP model variants
  • batch_submit_medical_specialists.sh: Focuses on medical imaging models
  • batch_submit_satellite_specialists.sh: Tests satellite/aerial imaging models

Creating Custom Batch Scripts

You can create custom batch scripts for your specific benchmarking needs:

  1. Copy an existing script as a template:

    cp sbatch/batch_submit_main.sh sbatch/batch_submit_custom.sh
  2. Modify the model list and other parameters to suit your requirements

  3. Create or modify SBATCH files for specific use cases or datasets

This approach allows you to efficiently run multiple benchmarks in parallel, maximizing GPU utilization and automating the testing process across different models and augmentation methods.

CLI Arguments

DeepBench supports the following command-line arguments:

  • -c, --config [CONFIG_FILE]: Path to the configuration file (TOML format)
  • -d, --debug: Execute in debug mode, augmented pictures will be saved
  • -i, --input [INPUT_PATH]: Path to folder containing images to use
  • -o, --output [OUTPUT_PATH]: Path to folder to store results
  • -l, --local: Use local TinyDB storage for results instead of MongoDB

Supported Models

DeepBench supports a wide range of image classification models through both Hugging Face and Ollama API integrations.

Hugging Face Models

DeepBench primarily uses models from Hugging Face. Supported Hugging Face model types include:

  1. Traditional CNN Models:

    • ResNet (microsoft/resnet-50, microsoft/resnet-101)
    • VGG (timm/vgg16.tv_in1k)
    • EfficientNet (google/efficientnet-b2)
  2. Vision Transformer Models:

    • ViT (google/vit-base-patch16-224)
  3. Vision Language Models:

    • CLIP (openai/clip-vit-base-patch32)
    • SigLIP (google/siglip-base-patch16-224)
    • LLaVA (llava-hf/llava-v1.6-mistral-7b-hf)
    • Phi (microsoft/Phi-3.5-vision-instruct)
    • Qwen (Qwen/Qwen2-VL-2B-Instruct)
    • BLIP, CogVLM, PaLI-Gemma, and more

Ollama API Models

DeepBench also supports using models through the Ollama API. This is particularly useful for:

  1. Testing locally hosted models
  2. Using models not available on Hugging Face
  3. Benchmarking open-source LLMs with vision capabilities

To use the Ollama API integration, you need an Ollama server running locally or on a remote machine. For supported models, see ollama.ai.

Note: When using Ollama models, inference will be slower compared to Hugging Face models running on GPU, especially when processing many images. Consider using smaller datasets for testing. New models can be added by creating appropriate configuration files and model handler classes.

Results and Output

DeepBench stores results in either MongoDB or a local TinyDB database. Each experiment creates a collection with the format:

model-name-shortened_timestamp

For example: model-X-09-30-13_39_26
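A collection name in this format can be produced like this. Note that the exact shortening logic and timestamp format are assumptions inferred from the example above, not DeepBench's verified implementation:

```python
from datetime import datetime

def collection_name(model_name):
    """Shorten a model id and append a timestamp, e.g. model-X-09-30-13_39_26."""
    short = model_name.split("/")[-1]  # drop the publisher prefix (assumption)
    return f"{short}-{datetime.now().strftime('%m-%d-%H_%M_%S')}"

name = collection_name("google/vit-base-patch16-224")
```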

Each document in the collection represents an image (original or augmented) with the following structure:

{
   "_id": 1721239530471350311,
   "experiment_name": "model-X-07-17-20_05_25",
   "git": "aecd9ce32f64790766306fdbec5820531c0755bb",
   "image": "ILSVRC2012_val_00036423.jpg",
   "gt": "99",
   "resolution": {
      "original": [500, 376],
      "scaled": [224, 224]
   },
   "augment_method": {},
   "model": "model-X",
   "label_score": {
      "80": 0.0074752201326191425,
      "86": 0.002994032809510827,
      "99": 0.8502982258796692,
      "703": 0.011575303040444851,
      "912": 0.05776028335094452
   }
}

For augmented images, the augment_method field contains details about the applied augmentation. For original (unaugmented/uncorrupted) images, the augment_method field contains NoAugmentCategory.

If debug mode is enabled (-d flag), augmented images are saved to the output directory for visual inspection.
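A document in this structure can be evaluated directly: the top-1 prediction is the label with the highest score in label_score, and comparing it against gt gives correctness. A minimal sketch using the example document above:

```python
doc = {
    "gt": "99",
    "label_score": {
        "80": 0.0074752201326191425,
        "86": 0.002994032809510827,
        "99": 0.8502982258796692,
        "703": 0.011575303040444851,
        "912": 0.05776028335094452,
    },
}

# Top-1 prediction is the label with the highest score
top1 = max(doc["label_score"], key=doc["label_score"].get)
correct = top1 == doc["gt"]
print(top1, correct)  # → 99 True
```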

Adding New Models

To add a new model to DeepBench:

  1. Create a New Branch: Start by creating a new branch from the main branch for testing.

  2. Create a Configuration File: Create a new TOML file in the configs directory:

    [cli]
    experiment_name = "your-model-name"
    
    [models]
    hugging_face = "publisher/your-model-name"
    img_scaling = [224, 224]
    top_k = 5
  3. Create a Model Handler Class: If the model requires special handling, create a new class in src/deepbench/ml/image/classification/.

  4. Update the Model Registry: Add your model class to the registry in src/deepbench/ml/image/imgclassifier.py.

  5. Test Your Implementation: Run DeepBench with your new configuration and verify the results.

  6. Create a Pull Request: Once tested, create a pull request to merge your changes into the main branch.

Contributing

Contributions to DeepBench are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests to ensure functionality
  5. Submit a pull request

Troubleshooting

The flash-attention version may be incompatible with your system. If that is the case, comment out the following line in pyproject.toml:

#"flash-attn @  https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp312-cp312-linux_x86_64.whl",

and the following line in src/deepbench/ml/image/classification/hugging_llava.py:

   # self.model.config.attn_implementation = "flash_attention_2"

License

DeepBench is licensed under the MIT License.

More Information on Project TAHAI

Authors and Acknowledgment