DeepBench is a command-line interface tool that performs systematic evaluation of image classification models by:
- Loading original images
- Applying various augmentation methods to create modified versions
- Classifying both original and augmented images using selected models
- Storing the results in a database for analysis
This approach allows researchers and developers to assess how well different models perform under various image transformations and distortions, providing insights into model robustness and reliability. DeepBench is intended to run on a GPU cluster or a comparably equipped machine.
- Features
- Project TAHAI
- Requirements
- Getting Started
- Setup
- Augmentation Methods
- Running Benchmarks
- CLI Arguments
- Supported Models
- Results and Output
- Adding New Models
- Contributing
- Troubleshooting
- License
- More Information on Project TAHAI
- Authors and Acknowledgment
- Comprehensive Model Support: Test a wide range of image classification models from the Hugging Face or Ollama APIs, including vision language models (multimodal models), traditional CNN models, and vision transformer models
- Extensive Augmentation Library: Apply various image transformations to test model robustness:
  - Basic transformations (brightness, contrast, rotation, flips)
  - Noise additions (Gaussian noise, salt & pepper)
  - Blur effects (Gaussian blur, motion blur)
  - Weather simulations (rain, clouds, shadows)
  - Geometric transformations (perspective, grid distortion)
- Flexible Configuration: Customize benchmarks through TOML configuration files:
  - Select models and image scaling parameters
  - Configure augmentation methods and parameters
  - Define experiment names and output settings
- Systematic Testing: Evaluate models with:
  - Individual augmentations
  - Ramp testing (incrementally increasing augmentation intensity)
  - Use case-specific augmentation combinations
- Results Storage: Store results in:
  - MongoDB database for shared access and analysis
  - Local TinyDB for standalone operation
- Debug Mode: Save augmented images for visual inspection
DeepBench was developed as part of the TAHAI (Trustworthy AI for Human-Augmented Intelligence) project, which focuses on assessing the robustness of various image classification models. Robustness, in this context, refers to the models' ability to consistently produce stable and reliable results, even when faced with disturbances or variations in input data.
DeepBench requires Python 3.10-3.12 and the following key dependencies:
- PyTorch
- Transformers (Hugging Face)
- OpenCV
- NumPy
- Pandas
- MongoDB (for database storage)
- TinyDB (for local storage)
For a complete list of dependencies, refer to the pyproject.toml file.
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd deepbench
  ```
- Create a Python virtual environment:

  ```bash
  python -m venv deepbench
  ```
- Activate the virtual environment:
  - Windows:

    ```bash
    deepbench\Scripts\activate
    ```

  - Linux/Mac:

    ```bash
    source deepbench/bin/activate
    ```
- Install DeepBench and its dependencies:

  ```bash
  pip install -e .
  ```

- For development tools:

  ```bash
  pip install -e .[dev]
  ```
- Set up the database (see Database Setup)
- Run DeepBench with a configuration file:

  ```bash
  python src/deepbench.py -c "configs/default_config.toml"
  ```
- Install the VS Code extension "Remote - SSH" and connect to the GPU cluster:

  ```bash
  ssh username@<cluster-ip-address>
  ```

- Clone the repository on the cluster
- Create a conda environment:

  ```bash
  conda create --name deepbench python=3.11
  conda activate deepbench
  ```
- Install CUDA and PyTorch:

  ```bash
  conda install pytorch
  ```
- Install DeepBench:

  ```bash
  cd deepbench
  pip install -e .
  ```
- Create an SBATCH file for running on the cluster (an example is provided in `deepbench.sbatch`)
- Run DeepBench using SLURM:

  ```bash
  sbatch deepbench.sbatch
  ```
- Database Connection: DeepBench uses MongoDB to store experiment results. By default, results are saved either in a remote MongoDB service hosted on cloud.mongodb.com or in a local instance linked to the TAHAI project. To use your own MongoDB instance, update the MongoDB credentials in your .env file.
- Database Authentication: Create a new file named `.env` in the root directory with the following content:

  ```
  DBUSER = db_write
  DBPASSWD = your_password
  MONGODB_URI = your_host_address
  ```

  The project database provides two user roles:
  - db_write: For running DeepBench (can write to the database)
  - db_read: For viewing results (read-only access)
- Storage Options: If you prefer to store results locally instead of in MongoDB, use the `-l` or `--local` flag:

  ```bash
  python src/deepbench.py -c "configs/default_config.toml" -l
  ```

  Alternatively, check that the following line is used in the main function of `src/deepbench.py`:

  ```python
  ResultLocal(infer_result_list)
  ```

  This will store results in a TinyDB database (JSON format) in the `deepbench/output` directory. If you only want to store the results in MongoDB, use the following line in the main function of `src/deepbench.py` instead:

  ```python
  ResultDatabase(infer_result_list)
  ```
DeepBench requires a CSV file that contains image paths and their corresponding ground truth labels. There are several ways to create this file:
Create a CSV file with the following format:

```csv
image_path,ground_truth
/path/to/image1.jpg,0
/path/to/image2.jpg,1
```
The ground truth should be the class index (integer) for the image.
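If you assemble such a file programmatically, the standard-library csv module is enough. A minimal sketch (the helper name is illustrative, not part of DeepBench):

```python
import csv
import tempfile
from pathlib import Path

def write_label_csv(rows, csv_path):
    """Write (image_path, class_index) pairs in the CSV format shown above."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image_path", "ground_truth"])  # header row
        for image_path, class_index in rows:
            writer.writerow([image_path, int(class_index)])

# Example: two images with class indices 0 and 1
csv_path = Path(tempfile.gettempdir()) / "deepbench_labels.csv"
write_label_csv([("/path/to/image1.jpg", 0), ("/path/to/image2.jpg", 1)], csv_path)
print(csv_path.read_text())
```

The resulting file can be passed to DeepBench via the `input` setting or the `-i` flag.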
DeepBench provides a powerful tool called create_image_subset.py in the /tools directory that can automatically generate CSV files and mapping files from your image datasets.
Basic Usage:

```bash
# Create a CSV file with paths from a directory (all images)
python tools/create_image_subset.py -nfp /path/to/dataset/folder -n 0

# Create a CSV file with a subset of images (e.g., 100 images)
python tools/create_image_subset.py -nfp /path/to/dataset/folder -n 100
```
Parameters:

- `-nfp, --new_paths_file`: Path to the dataset directory. The tool will recursively find all images and create a CSV file.
- `-f, --filelist`: Text file with a list of absolute file paths (alternative to `-nfp`).
- `-n`: Number of examples to include in the subset (use 0 for all images).
- `-m, --mapping`: Path to the mapping file for class names (default is the ImageNet-1k mapping).
- `-o, --output`: Output file path for the CSV file (default: `filepaths_TIMESTAMP.csv`).
How It Works:

- When using `-nfp`, the tool:
  - Recursively finds all image files in the specified directory
  - Creates a text file with all file paths (`new_file_paths.txt`)
  - Automatically generates a mapping file (`new_mapping.txt`) based on subfolder names
  - Creates a CSV file with image paths and class indices

- The mapping file format is:

  ```
  class_folder_name class_description
  ```

- The tool assumes that images are organized in a folder structure where each subfolder represents a class:

  ```
  dataset/
  ├── class1/
  │   ├── image1.jpg
  │   └── image2.jpg
  ├── class2/
  │   ├── image3.jpg
  │   └── image4.jpg
  ```
Example:

```bash
# Create a CSV with all images from a medical dataset
python tools/create_image_subset.py -nfp /datasets/medical_images -n 0

# Create a CSV with 50 random images per class from a dataset
python tools/create_image_subset.py -nfp /datasets/food_images -n 50 -o food_dataset.csv
```

The tool will generate:

- A CSV file with image paths and class indices
- A mapping file that maps folder names to class indices
- A text file with all file paths
These files can then be used in your DeepBench configuration.
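The folder-per-class convention described above can be reproduced in a few lines. The sketch below shows how a class mapping and CSV rows could be derived from subfolder names; it is illustrative only, and the actual create_image_subset.py implementation may differ:

```python
import os
from pathlib import Path

def build_class_mapping(dataset_dir):
    """Map each class subfolder to an integer index (sorted for stability),
    mirroring the folder-per-class layout the tool expects."""
    subfolders = sorted(d.name for d in Path(dataset_dir).iterdir() if d.is_dir())
    return {name: idx for idx, name in enumerate(subfolders)}

def list_image_rows(dataset_dir, mapping, exts=(".jpg", ".jpeg", ".png")):
    """Yield (image_path, class_index) rows for the CSV file."""
    for class_name, class_idx in mapping.items():
        class_dir = Path(dataset_dir) / class_name
        for root, _dirs, files in os.walk(class_dir):
            for fname in sorted(files):
                if fname.lower().endswith(exts):
                    yield str(Path(root) / fname), class_idx
```

A mapping built this way assigns indices by alphabetical folder order, which is a common but not guaranteed convention; verify it against the generated `new_mapping.txt`.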
All configurations for DeepBench are located in the /configs directory. The main configuration files are:
The model configuration file (e.g., default_config.toml) contains settings for:
```toml
[cli]
experiment_name = "model-name"   # Name for this experiment run
debug = false                    # Save augmented images if true
input = './path/to/images.csv'   # Path to input images
primer_img_name = 'image.png'    # Set if you want to save the raw image data as a NumPy array for a specific image
output = './output'              # Output directory
local = false                    # Use local storage if true

[database]
mongodb = "mongodb+srv://DBUSER:DBPASSWD@MONGODB_URI/"

[models]
hugging_face = "google/vit-base-patch16-224"  # Model identifier
ollama_model = "gemma3:27b"      # Set if you want to use the Ollama API; leave empty if not needed
img_scaling = [224, 224]         # Image size for model input
top_k = 5                        # Number of top predictions to save
multimodal_classes = [
    "./tools/dataset_mapping.txt", # relative to the Python cwd, or an absolute path
    # Option 1: add classes as a list[str] -> ["car", "cat", "dog"]
    # Option 2: add the path of a mapping file; classes will be mapped according to "gt" in the input CSV file
    # Option 3: leave empty to use the ImageNet-1k labels/classes
]

[augmentation]
augment_config = [
    "./configs/augmentation/augm_defaults.toml", # Default augmentation settings
    "./configs/augmentation/augm_use_case.toml", # Use case augmentations
]
```

Augmentation configurations define how images are transformed:
- Default Augmentations (`augm_defaults.toml`):

  ```toml
  [augmentation.imgMethod.GaussianBlur]
  kernel_size = 9
  sigma_limit = 0

  [augmentation.imgMethod.ImageRotation]
  angle_degrees = 45
  ```
- Ramp Augmentations (gradually increasing intensity):

  ```toml
  [augmentation.Ramp.Brightness]
  active = true
  ramp_var = "brightness"
  range = [-100, 100]
  step_size = 25
  ```
- Use Case Augmentations (domain-specific combinations):

  ```toml
  [augmentation.UseCase.MedicalDiagnosis]
  active = true

  [augmentation.UseCase.MedicalDiagnosis.HistEqualization]
  active = true
  ```
DeepBench includes a comprehensive set of image augmentation methods to test model robustness:
- Brightness: Adjust image brightness
- Contrast: Modify image contrast
- Rotation: Rotate image by specified degrees
- Flips: Horizontal and vertical image flipping
- Gaussian Blur: Apply blur with configurable kernel size
- Motion Blur: Simulate motion blur effects
- Gaussian Noise: Add random noise to the image
- Salt & Pepper Noise: Add random white and black pixels
- Histogram Equalization: Enhance image contrast
- Global Color Shift: Modify color channels
- Grid Distortion: Apply grid-based distortion
- Grid Elastic Deformation: Apply elastic deformations
- Perspective Transformation: Change image perspective
- Rain: Simulate rain effects
- Cloud Generator: Add cloud overlays
- Shadow: Add shadow effects
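Many of these methods are simple pixel-level operations. As an illustration, a brightness shift just adds an offset to every pixel and clamps to the valid range; this is a sketch of the idea, not DeepBench's implementation:

```python
def adjust_brightness(pixels, offset):
    """Shift every pixel value by `offset`, clamping to the valid 0-255 range."""
    return [max(0, min(255, p + offset)) for p in pixels]

print(adjust_brightness([0, 100, 200, 255], 60))  # → [60, 160, 255, 255]
```

In practice DeepBench applies such transformations to full image arrays (e.g. via OpenCV) rather than flat pixel lists.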
Ramp augmentations apply a method with gradually increasing intensity to test how model performance degrades:
```toml
[augmentation.Ramp.Brightness]
active = true
ramp_var = "brightness"
range = [-100, 100]
step_size = 25
```

This will test the model with brightness values of -100, -75, -50, -25, 0, 25, 50, 75, and 100.
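The ramp expansion is a plain arithmetic sequence over the configured range, including both endpoints. A sketch (the function name is ours, not DeepBench's):

```python
def ramp_values(lo, hi, step):
    """Enumerate the augmentation intensities a ramp sweeps through,
    including both endpoints of the configured range."""
    return list(range(lo, hi + 1, step))

print(ramp_values(-100, 100, 25))
# → [-100, -75, -50, -25, 0, 25, 50, 75, 100]
```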
Use case augmentations combine multiple methods to simulate real-world scenarios:
- Medical Diagnosis: Adjustments relevant for medical imaging
- Autonomous Driving: Weather and lighting conditions for driving
- Manufacturing Quality: Transformations for industrial inspection
- Handheld Devices: Camera shake and lighting variations
- People Recognition: Variations in facial recognition scenarios
- Satellite Imaging: Atmospheric and perspective effects
The Augmentation Selector is a separate tool that uses OpenAI's API to automatically recommend and configure augmentation methods tailored to a specific domain. It generates a TOML configuration file with inline comments explaining each augmentation and logs evaluation results in a CSV file. The generated configurations can be run by DeepBench without any further steps.
- Automatically selects appropriate augmentations for domain-specific datasets.
- Queries OpenAI's API with a high-level description of the application domain.
- Dynamically parses available augmentations from a template file.
- Generates a configuration file in TOML format with inline comments explaining each augmentation.
- Logs evaluation results across runs in a CSV file, including selected augmentations, use case name, and model used.
How to set up OpenAI API Key
To use the Augmentation Selector, set up your OpenAI API key:
- Create a .env file in the project root.
- Add the following entry:

  ```
  OPENAI_API_KEY=your-openai-api-key
  ```
- Run the script by providing a use case name, a high-level description of your application domain, and the path to the augmentation template file:

  ```bash
  python src/augmentation_selector/main.py "MedicalImaging" "high-resolution medical imaging Dataset for Knee Arthritis" "augmentation_template.toml"
  ```

- Specify the OpenAI model (optional, default is gpt-4o):

  ```bash
  python src/augmentation_selector/main.py "Landscape" "Aerial Landscape Dataset" "augmentation_template.toml" --model gpt-3.5-turbo
  ```

- The script generates:
  - A configuration file in the configs/ directory: `configs/generated_config.toml`
  - Evaluation results logged in a CSV file: `evaluation_results.csv`
- The generated configuration file is in TOML format. Example:

  ```toml
  [augmentation.UseCase.MedicalImaging]
  active = true

  # Explanation: Flipping images horizontally allows the model to learn from the symmetrical nature of knee anatomy.
  [augmentation.UseCase.MedicalImaging.ImageFlipHorizontal]
  # Description: Flips the image along the vertical axis, mirroring it horizontally.
  active = true

  # Explanation: Slightly rotating images introduces variability to account for different imaging angles.
  [augmentation.UseCase.MedicalImaging.Ramp.ImageRotation]
  # Description: Rotates the image by a specified angle, keeping its contents intact.
  active = true
  ramp_var = "angle_degrees"
  range = [-150, 150]
  step_size = 30
  ```

Evaluation results for each run are logged in evaluation_results.csv in the following format:
| Run | UseCase | Model | Brightness | CloudGenerator | Contrast | GaussianBlur | GaussianNoise | GlobalColourShift | GridDistortion | GridElasticDeformation | HistEqualization | ImageFlipHorizontal | ImageFlipVertical | ImageRotation | MotionBlur | PerspectiveTransformation | Rain | SaltPepperNoise | Shadow | UseCaseName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | medical_diagnosis | gpt-4o-mini | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |
| 2 | medical_diagnosis | gpt-4o | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |
| 3 | auto_driving | gpt-4o-mini | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |  |
| 4 | auto_driving | gpt-4o | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |
| 5 | manufacturing_quality | gpt-4o-mini | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |
| 6 | manufacturing_quality | gpt-4o | X | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |
| 7 | people_recognition | gpt-4o-mini | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |
| 8 | people_recognition | gpt-4o | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |
| 9 | satellite_imaging | gpt-4o-mini | X | X | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |
| 10 | satellite_imaging | gpt-4o | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |
| 11 | handheld | gpt-4o-mini | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |
| 12 | handheld | gpt-4o | X | X | X | X | X | X | X | X | X | X |  |  |  |  |  |  |  |  |
- Run: Incremental ID for each run.
- UseCase: The name of the application domain.
- Model: The OpenAI model used.
- Augmentation Columns: An `X` indicates inclusion of the augmentation in the run.
How to configure available augmentations and parameters
All augmentation parameters for the Augmentation Selector are defined in the default_augmentation_config.toml file.
This file lists all available augmentation methods and their adjustable parameters, including:

- `active`: Determines whether an augmentation method is enabled (true) or disabled (false).
- `range`: Defines the acceptable range of values for parameters.
- `step_size`: Specifies the increment for parameter adjustment.
By adjusting these parameters, you can plug in your own augmentation methods and tailor them to fit the needs of your specific domain.
For Hugging Face models, you can use the default batch size (128) for efficient processing. For gated models that require authentication, add your HuggingFace token to the .env file:
```
HUGGING_FACE_HUB_TOKEN = yourtoken
```

And then start with:

```bash
python src/deepbench.py -c "configs/model_resnet-50.toml"
```

When using Ollama models, you need to:
- Configure Environment Variables:
  - Create or update your `.env` file to include the Ollama server address:

    ```
    OLLAMA_ADDRESS=http://localhost:11434
    ```

  - For remote Ollama servers, replace `localhost` with the appropriate IP address

- Configure the Model in the TOML Configuration:
  - Create a configuration file with `ollama_model` set instead of `hugging_face`:

    ```toml
    [models]
    hugging_face = ""          # Leave empty when using Ollama
    ollama_model = "llava:34b" # Specify the Ollama model name
    img_scaling = [224, 224]
    top_k = 5
    ```
- Modify the batch size in `src/deepbench.py` to 1 (most Ollama models process one image at a time):

  ```python
  # Change this line
  batch_size = 128  # for Ollama set to 1, for HuggingFace set to 128 or higher
  # To
  batch_size = 1    # for Ollama set to 1, for HuggingFace set to 128 or higher
  ```
- Ensure Ollama is running on your local machine or remote server

- Run DeepBench with your Ollama configuration:

  ```bash
  python src/deepbench.py -c "configs/ollama_test_benchm.toml"
  ```
For large-scale benchmarking across multiple models and use cases, DeepBench provides shell scripts and SBATCH configurations that automate the process. This is particularly useful when running on GPU clusters with job scheduling systems like SLURM.
- Navigate to the `sbatch` folder in the DeepBench directory.

- Configure the `batch_submit_main.sh` script:
  - Define the dataset size (number of images to process):

    ```bash
    NUM_DATA="0100" # 100 images per class
    ```

  - Specify the models to benchmark (comma-separated list):

    ```bash
    HUGGING_FACE_MODELS="\
    google/gemma-3-4b-it,\
    llava-hf/llava-v1.6-mistral-7b-hf,\
    "
    ```

  - Set the input image sizes for each model:

    ```bash
    INPUT_SIZE_LIST="\
    512,\
    224,\
    "
    ```

  - Uncomment or add the SBATCH job submissions for the use cases you want to test:

    ```bash
    sbatch sbatch/handheld.sbatch "$NUM_DATA" "google/gemma-3-4b-it" "512"
    sbatch sbatch/handheld.sbatch "$NUM_DATA" "llava-hf/llava-v1.6-mistral-7b-hf" "224"
    # Add more as needed
    ```

- Customize SBATCH files for specific use cases:
  - Each use case has its own SBATCH file (e.g., `auto_driving.sbatch`, `handheld.sbatch`, `medical.sbatch`)
  - These files contain SLURM configuration parameters like:

    ```bash
    #SBATCH --job-name=TAHAI_handheld
    #SBATCH --time=6-18:00:00
    #SBATCH --gres=gpu:1
    #SBATCH --nodes=1
    #SBATCH --cpus-per-gpu=64
    #SBATCH --mem-per-gpu=32G
    ```

  - Adjust these parameters based on your cluster's resources and job requirements
- Make the script executable:

  ```bash
  chmod +x sbatch/batch_submit_main.sh
  ```

- Run the batch submission script:

  ```bash
  ./sbatch/batch_submit_main.sh
  ```

- Monitor job status:

  ```bash
  squeue  # View job queue
  sacct   # View job history
  ```

- Cancel jobs if needed:

  ```bash
  scancel JOB_ID
  ```
DeepBench includes several specialized batch scripts for different benchmarking scenarios:
- `batch_submit_clip_variations.sh`: Tests multiple CLIP model variants
- `batch_submit_medical_specialists.sh`: Focuses on medical imaging models
- `batch_submit_satellite_specialists.sh`: Tests satellite/aerial imaging models
You can create custom batch scripts for your specific benchmarking needs:
- Copy an existing script as a template:

  ```bash
  cp sbatch/batch_submit_main.sh sbatch/batch_submit_custom.sh
  ```

- Modify the model list and other parameters to suit your requirements

- Create or modify SBATCH files for specific use cases or datasets
This approach allows you to efficiently run multiple benchmarks in parallel, maximizing GPU utilization and automating the testing process across different models and augmentation methods.
DeepBench supports the following command-line arguments:
- -c, --config [CONFIG_FILE]: Path to the configuration file (TOML format)
- -d, --debug: Execute in debug mode, augmented pictures will be saved
- -i, --input [INPUT_PATH]: Path to folder containing images to use
- -o, --output [OUTPUT_PATH]: Path to folder to store results
- -l, --local: Use local TinyDB storage for results instead of MongoDB
DeepBench supports a wide range of image classification models through both Hugging Face and Ollama API integrations.
DeepBench primarily uses models from Hugging Face. Supported Hugging Face model types include:
- Traditional CNN Models:
  - ResNet (microsoft/resnet-50, microsoft/resnet-101)
  - VGG (timm/vgg16.tv_in1k)
  - EfficientNet (google/efficientnet-b2)
- Vision Transformer Models:
  - ViT (google/vit-base-patch16-224)
- Vision Language Models:
  - CLIP (openai/clip-vit-base-patch32)
  - SigLIP (google/siglip-base-patch16-224)
  - LLaVA (llava-hf/llava-v1.6-mistral-7b-hf)
  - Phi (microsoft/Phi-3.5-vision-instruct)
  - Qwen (Qwen/Qwen2-VL-2B-Instruct)
  - BLIP, CogVLM, PaLI-Gemma, and more
DeepBench also supports using models through the Ollama API. This is particularly useful for:
- Testing locally hosted models
- Using models not available on Hugging Face
- Benchmarking open-source LLMs with vision capabilities
To use the Ollama API integration, you need an Ollama server running locally or on a remote machine. For supported models, see ollama.ai.
Note: When using Ollama models, inference will be slower compared to Hugging Face models running on GPU, especially when processing many images. Consider using smaller datasets for testing. New models can be added by creating appropriate configuration files and model handler classes.
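Under the hood, an Ollama request is an HTTP POST to the server's /api/generate endpoint with base64-encoded image data. A sketch of building such a request body (the helper function is our illustration, not DeepBench's code):

```python
import base64
import json

def build_ollama_request(model, prompt, image_bytes):
    """Build the JSON body for a POST to OLLAMA_ADDRESS/api/generate.
    Ollama expects images as a list of base64-encoded strings."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return a single JSON response instead of a stream
    }

body = build_ollama_request("llava:34b", "Classify this image.", b"\x89PNG...")
print(json.dumps(body)[:40])
```

The actual HTTP call (e.g. via `requests.post`) and response parsing are omitted here since they depend on the server being available.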
DeepBench stores results in either MongoDB or a local TinyDB database. Each experiment creates a collection with the format:
```
model-name-shortened_timestamp
```

For example: `model-X-09-30-13_39_26`
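The timestamp suffix appears to follow a month-day-hour_minute_second pattern. A sketch of generating a matching name; note that the exact format string is inferred from the example above, not taken from DeepBench's source:

```python
import re
from datetime import datetime

def collection_name(model_short_name, now=None):
    """Build a collection name like 'model-X-09-30-13_39_26'
    (format string inferred from the documented example)."""
    now = now or datetime.now()
    return f"{model_short_name}-{now.strftime('%m-%d-%H_%M_%S')}"

name = collection_name("model-X", datetime(2024, 9, 30, 13, 39, 26))
print(name)  # → model-X-09-30-13_39_26
```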
Each document in the collection represents an image (original or augmented) with the following structure:
```json
{
  "_id": 1721239530471350311,
  "experiment_name": "model-X-07-17-20_05_25",
  "git": "aecd9ce32f64790766306fdbec5820531c0755bb",
  "image": "ILSVRC2012_val_00036423.jpg",
  "gt": "99",
  "resolution": {
    "original": [500, 376],
    "scaled": [224, 224]
  },
  "augment_method": {},
  "model": "model-X",
  "label_score": {
    "80": 0.0074752201326191425,
    "86": 0.002994032809510827,
    "99": 0.8502982258796692,
    "703": 0.011575303040444851,
    "912": 0.05776028335094452
  }
}
```

For augmented images, the augment_method field contains details about the applied augmentation. For original (unaugmented/uncorrupted) images, the augment_method field contains NoAugmentCategory.
If debug mode is enabled (-d flag), augmented images are saved to the output directory for visual inspection.
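Documents in this schema can be scored offline; top-1 accuracy, for instance, is just a comparison of gt against the argmax of label_score. A minimal sketch (helper names are ours, not part of DeepBench):

```python
def top1_correct(doc):
    """Check whether the highest-scoring label matches the ground truth."""
    predicted = max(doc["label_score"], key=doc["label_score"].get)
    return predicted == doc["gt"]

def top1_accuracy(docs):
    """Fraction of result documents whose top-1 prediction equals 'gt'."""
    return sum(top1_correct(d) for d in docs) / len(docs)

sample = {
    "gt": "99",
    "label_score": {"80": 0.007, "86": 0.003, "99": 0.850, "703": 0.012, "912": 0.058},
}
print(top1_correct(sample))  # → True
```

The same pattern extends to top-k accuracy by sorting label_score and checking membership of gt in the first k keys.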
To add a new model to DeepBench:
- Create a New Branch: Start by creating a new branch from the `main` branch for testing.

- Create a Configuration File: Create a new TOML file in the `configs` directory:

  ```toml
  [cli]
  experiment_name = "your-model-name"

  [models]
  hugging_face = "publisher/your-model-name"
  img_scaling = [224, 224]
  top_k = 5
  ```

- Create a Model Handler Class: If the model requires special handling, create a new class in `src/deepbench/ml/image/classification/`.

- Update the Model Registry: Add your model class to the registry in `src/deepbench/ml/image/imgclassifier.py`.

- Test Your Implementation: Run DeepBench with your new configuration and verify the results.

- Create a Pull Request: Once tested, create a pull request to merge your changes into the main branch.
Contributions to DeepBench are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests to ensure functionality
- Submit a pull request
It might happen that the flash-attention version is incompatible with your system.
If that is the case, just comment out the following line in pyproject.toml:

```toml
#"flash-attn @ https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp312-cp312-linux_x86_64.whl",
```

and the following line in src/deepbench/ml/image/classification/hugging_llava.py:

```python
# self.model.config.attn_implementation = "flash_attention_2"
```

DeepBench is licensed under the MIT License.
- Erik Rodner, Erik.Rodner@HTW-Berlin.de
- David Brodmann, David.Brodmann@htw-berlin.de
- Rudolf Hoffmann, Rudolf.Hoffmann@student.htw-berlin.de
- Mario Koddenbrock, Mario.Koddenbrock@HTW-Berlin.de