docker run -it --name rep -v /WORKING_DIRECTORY:/my_home ubuntu
apt-get update
apt-get install libboost-all-dev
We use the same version released by the SABD authors: fast-dbrd. The code is under the REP folder. Note: please run the SABD data preprocessing first, because the data used by this approach is generated by SABD.
An example run:
./build/bin/fast-dbrd -n rep_mozilla_no_version -r ./ranknet-configs/full-textual-no-version.cfg --ts /YOUR_HOME_DIRECTORY/SABD/dataset/eclipse/timestamp_file.txt --time-constraint 365 --training-duplicates 922 --recommend /YOUR_HOME_DIRECTORY/SABD/dataset/eclipse/dbrd_test.txt
Notes:
Please check the issues under the fast-dbrd-modified repo to understand what each argument means. Simply put:
-n xxxx: the name of the output file
-r: the configuration file to use
--ts: the timestamp file, which is generated by SABD; point this argument at that file
--time-constraint 365: we use a one-year time window
--training-duplicates: the number of duplicate bug reports in the training data (including training and validation)
--recommend xxx.dbrd_test.txt: this file is also generated by SABD
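The arguments above can be collected in a small helper so that the SABD paths are written only once. This is a minimal sketch, not part of fast-dbrd: `build_rep_cmd` is a hypothetical function, and deriving the output name from the project name is an assumption made for illustration. It only prints the command, so the result can be checked before running it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with fast-dbrd): prints the fast-dbrd
# command line for one project instead of executing it.
# Arguments:
#   $1  directory that contains the SABD checkout
#   $2  project folder name under SABD/dataset (e.g. eclipse)
#   $3  number of duplicate reports in training + validation data
#   $4  time window in days (optional, defaults to 365 as in the example above)
build_rep_cmd() {
  rep_home=$1
  rep_project=$2
  rep_dups=$3
  rep_window=${4:-365}
  rep_data="${rep_home}/SABD/dataset/${rep_project}"

  # Assumption: the output name follows the project name.
  printf '%s' ./build/bin/fast-dbrd
  printf ' %s' \
    -n "rep_${rep_project}_no_version" \
    -r ./ranknet-configs/full-textual-no-version.cfg \
    --ts "${rep_data}/timestamp_file.txt" \
    --time-constraint "${rep_window}" \
    --training-duplicates "${rep_dups}" \
    --recommend "${rep_data}/dbrd_test.txt"
  printf '\n'
}

# Print the command for the eclipse dataset with 922 training duplicates.
build_rep_cmd /YOUR_HOME_DIRECTORY eclipse 922
```

Once the printed command looks right, it can be pasted into the shell from the REP folder. The value passed as the third argument still has to come from your own SABD preprocessing output.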
Create the environment from the SABD/environment.yml file:
conda env create -f environment.yml
git clone https://github.com/stanfordnlp/GloVe glove
Install Make (you can use the same Docker container as REP):
apt-get update && apt-get install cmake
Make demo.sh executable, then run it with the project name as its argument. Please check our sample SABD/demo.sh.
chmod 777 demo.sh
./demo.sh eclipse
Download glove.42B.300d and unzip it:
wget http://nlp.stanford.edu/data/glove.42B.300d.zip
unzip glove.42B.300d.zip
Create the environment from the HINDBR/hindbr.yml file:
conda env create -f hindbr.yml
For the HINDBR py2 environment:
conda env create -f py27-env.yml
docker pull mysql
docker run --name dbrd-mysql -e MYSQL_ROOT_PASSWORD=12345678 -d mysql
Please download the data from here.
You can also download the processed word embeddings from here.
Please check each folder for the commands to run the approaches.
Please check the result folder. result-log
Please refer to the notebook.
Thanks to everyone who kindly shared their implementations and patiently answered our questions.
Please consider citing our work:
@article{zhang2022duplicate,
author = {Zhang, Ting and Han, DongGyun and Vinayakarao, Venkatesh and Irsan, Ivana Clairine and Xu, Bowen and Thung, Ferdian and Lo, David and Jiang, Lingxiao},
title = {Duplicate Bug Report Detection: How Far Are We?},
year = {2022},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {1049-331X},
url = {https://doi.org/10.1145/3576042},
doi = {10.1145/3576042},
abstract = {Many Duplicate Bug Report Detection (DBRD) techniques have been proposed in the research literature. The industry uses some other techniques. Unfortunately, there is insufficient comparison among them, and it is unclear how far we have been. This work fills this gap by comparing the aforementioned techniques. To compare them, we first need a benchmark that can estimate how a tool would perform if applied in a realistic setting today. Thus, we first investigated potential biases that affect the fair comparison of the accuracy of DBRD techniques. Our experiments suggest that data age and issue tracking system choice cause a significant difference. Based on these findings, we prepared a new benchmark. We then used it to evaluate DBRD techniques to estimate better how far we have been. Surprisingly, a simpler technique outperforms recently proposed sophisticated techniques on most projects in our benchmark. In addition, we compared the DBRD techniques proposed in research with those used in Mozilla and VSCode. Surprisingly, we observe that a simple technique already adopted in practice can achieve comparable results as a recently proposed research tool. Our study gives reflections on the current state of DBRD, and we share our insights to benefit future DBRD research.},
note = {Just Accepted},
journal = {ACM Trans. Softw. Eng. Methodol.},
month = {dec},
keywords = {Bug Reports, Empirical Study, Duplicate Bug Report Detection, Deep Learning}
}
If you have any questions, feel free to contact Ting Zhang (email: happygirlzt@gmail.com or tingzhang.2019@phdcs.smu.edu.sg).