This folder serves as the replication package for the paper: “SETS: A Simple yet Effective Approach for DNN Test Selection”. It includes the source code of the proposed approach and baselines, all necessary scripts to reproduce the experiments, the data required for running the experiments, and the original experimental results.
.
├── Input_data
│ ├── Fault_clusters
│ ├── Pretrained_model
│ └── Retrain
├── Experiment_results
│ ├── RQ1
│ ├── RQ2&3
│ └── RQ4
├── Readme.md
├── requirements.txt
└── Source_code
The requirements.txt lists the dependencies required to run the Python code.
Run the following command to install dependency packages:
pip install -r requirements.txt
The Source_code\SETS.pyfile contains the implementation of the SETS approach.
Run the following function to perform test selection for a given test selection problem:
def sets(size, index, features, output_probability, uncertainty, diversity, a)The input parameters include:
size: test budget, i.e., number of test inputs that will be selected (int)index: indexes of all the test inputs (list)features: features of all the test inputs (numpy.ndarray)output_probability: output probabilities of all the test inputs (numpy.ndarray)uncertainty: metric for uncertainty evaluation, "maxp" or "gini" (str)diversity: metric for diversity evaluation, "gd" or "std" (str)a: the reduction coefficient, 3 by default (int)
The output will be a list of indexes of the selected test inputs and the execution time.
The experiment uses a total of six datasets, among which MNIST, Fashion-MNIST, and CIFAR-10 can be directly loaded from keras.datasets. Fruit-360, SVHN and TinyImageNet datasets can be downloaded from the following links:
- Fruit-360 :
https://github.com/fruits-360/fruits-360-100x100/tree/2f981c83e352a9d4c15fb8c886034c817052c80b - SVHN :
http://ufldl.stanford.edu/housenumbers/ - TinyImageNet:
https://www.kaggle.com/c/tiny-imagenet/data?select=test.zip
We provide all the pretrained DNN models in the Pretrained_model folder. You can also download them from the following links:
| Dataset | Model | Pretrained_download_link |
|---|---|---|
| MNIST | LeNet1 | https://github.com/ZOE-CA/DeepGD/tree/main/Retraining/Org_model |
| MNIST | LeNet5 | https://github.com/ZOE-CA/DeepGD/tree/main/Retraining/Org_model |
| Fashion | LeNet4 | https://github.com/ZOE-CA/DeepGD/tree/main/Retraining/Org_model |
| CIFAR-10 | 12Conv | https://github.com/ZOE-CA/DeepGD/tree/main/Retraining/Org_model |
| CIFAR-10 | ResNet20 | https://github.com/ZOE-CA/DeepGD/tree/main/Retraining/Org_model |
| SVHN | LeNet5 | https://github.com/ZOE-CA/DeepGD/tree/main/Retraining/Org_model |
| Fruit-360 | ResNet50 | in the pretrained_model folder |
| TinyImageNet | ResNet101 | https://drive.google.com/drive/folders/1RLyQIcJ8qNqds9US-Oo2a0uQEL0t6kSZ |
The official replication package of is available at DeepGD.
Since the DeepGD approach in its original replication package is implemented in Jupyter Notebook (.ipynb), we have encapsulated it into the Source_code/DeepGD.py file. Note that the DeepGD approach involves randomness and may produce slightly different results in different runs.
Run the following function to perform test selection for a given test selection problem:
def deepgd(size, index, gini_scores, features)The input parameters include:
size: test budget, i.e., number of test inputs that will be selected (int)index: indexes of all the test inputs (list)gini_scores: DeepGini uncertainty values of all the test inputs (numpy.ndarray)features: features of all the test inputs (numpy.ndarray)
The output will be a list of indexes of the selected test inputs and the execution time.
The Source_code/RS.py file provides the implementation of the RS approach.
Run the following function to perform test selection for a given test selection problem:
def rs(size, index)The input parameters include:
size: test budget, i.e., number of test inputs that will be selected (int)index: indexes of all the test inputs (list)
The output will be a list of indexes of the selected test inputs and the execution time.
The Input_data directory contains pre-processed files for running test selection approaches, and reproducing the experiments. It includes:
/Pretrained_model: this folder contains all the pretrained DNN models./Fault_clusters: this folder contains contains pre-processed data for each subject, which will be used as the input for running the test selection approach and evaluate the results:features_test.npy: features of all the test inputs,output_probability.npy: output probabilities of all the test inputs,- cluster_results.npy`: clustering label of each test input (for computing the Fault Detection Rate),
mis_index_test.npy: indexes of misclassified test inputs (for computing the Fault Detection Rate).
/Retrain: this folder contains a test set (T) and a validation set (V) for each dataset used for evaluating DNN retraining performance (RQ4).
In order to generate the files in /Fault_clusters, use the following data processing scripts:
- Run
python feature.pyto generatefeatures_test.npyandoutput_probability.npyfiles. This involves loading the dataset and model, performing predictions to obtain the output probabilities (the prediction time is recorded at the same time), and extracting features using a VGG16 model. - Run
python cluster.pyto generatecluster_results.npyandmis_index_test.npyfiles. Note that the DBSCAN clustering algorithm involves randomness and may produce slightly different results in different runs.
After obtaining all the required input data in Input_data (either by using the pre-generated data directly, or running data processing scripts to generate such data), run the following command to perform the experiments of each research question:
-
RQ1 (Configuration):
Runpython exp_1.py [n] [data_path] [output_path]to run SETS with different combinations of uncertainty and diversity metrics, and with different values of the reduction coefficient on all subjects.[n]: The number of times to repeat the experiment (e.g., 30).[data_path]: The path to theFault_clustersfolder.[output_path]: The path to save the result files.
-
RQ2&3 (Efficiency and Effectiveness):
Runpython exp_2_3.py [approach] [n] [data_path] [output_path]to apply a specific approach to perform test selection on all subjects.[approach]: The test selection method to use. Available options aresets,deepgd, andrs.[n]: The number of times to repeat the experiment (e.g., 30).[data_path]: The path to theFault_clustersfolder.[output_path]: The path to save the result files.
-
RQ4 (DNN Enhancement):
Runpython exp_4.py [n] [data_path] [output_path]to apply the SETS and DeepGD approaches to perform test selection on the test set T on all subjects.[n]: The number of times to repeat the experiment (e.g., 30).[data_path]: The path to theInput_datafolder.[output_path]: The path to save the selected subsets. Then, run the corresponding retraining scripts (retrain_four.py,retrain_fruit.py, andretrain_tiny.py), which will read the selected subsets you saved in theoutput_pathas input for model retraining. The accuracy of the retrained model will then be evaluated.
You will get results that are consistent with those in the Experiment_results folder after running the commands.
The Experiment_results folder provides all raw experimental results reported in the paper:
- RQ1 (Configuration): We provide the Fault Detection Rate (FDR) of SETS with different metric combinations in the
Metric_combinationfolder and the FDR and time cost of SETS with different reduction coefficients in thereduction_coefficientfolder (Results of 10 repeated runs). - RQ2&3 (Efficiency and Effectiveness): We provide the FDR and execution time of SETS and baseline approaches in their respective folders (Results of 30 repeated runs).
- RQ4 (DNN Enhancement): We provide the retraining results of SETS and DeepGD in the
Retrain_resultsfolder, as well as the selected subsets in their respective folders (Results of 5 repeated runs).