# Mutation-Generation-Study

## 1. Environment

- Python 3.7
- PyTorch 1.3
- Defects4J v2.0
- Java 8
- ConDefects

## 2. Run

Defects4J v2.0 should be installed in this directory; for installation instructions, see https://github.com/rjust/defects4j. After installation, run the download.py script to download the fixed and buggy versions of the code for each Defects4J project.
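The checkout step can also be done by hand with the Defects4J CLI; the sketch below shows the kind of commands download.py plausibly issues (the project ID, bug number, and target paths are illustrative assumptions, not the script's actual values).

```python
"""Sketch of the download step: build `defects4j checkout` commands
for the buggy ('b') and fixed ('f') revision of each bug. The working
directory layout here is an assumption."""

def checkout_commands(project, bug_id, workdir="defects4j_projects"):
    """Return the two `defects4j checkout` commands for one bug."""
    cmds = []
    for suffix, kind in (("b", "buggy"), ("f", "fixed")):
        dest = f"{workdir}/{kind}/{project}_{bug_id}"
        cmds.append(f"defects4j checkout -p {project} -v {bug_id}{suffix} -w {dest}")
    return cmds

# Example: the two commands for Lang bug 1.
for cmd in checkout_commands("Lang", 1):
    print(cmd)
    # download.py would execute these, e.g. via subprocess.run(cmd, shell=True)
```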

### (1) Generate the mutations

**LLMs:** For GPT-3.5, GPT-4o, GPT-4o-mini, DeepSeek, CodeLlama, and StarChat, run generate.py and choose the corresponding model to generate mutations. You can also choose whether to generate mutations for Defects4J or ConDefects, and define your own storage path. Note that GPT-3.5, GPT-4o, GPT-4o-mini, and DeepSeek require you to apply for an API key yourself, while CodeLlama and StarChat must be downloaded in advance to a specified location.
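As a rough illustration of the LLM step, the sketch below builds a mutation prompt for a single Java method. The actual prompt wording and model call in generate.py may differ; the helper name and instructions are hypothetical.

```python
"""Illustrative sketch of the LLM mutation request (assumed, not the
exact prompt used by generate.py)."""

def build_mutation_prompt(java_method, n_mutants=5):
    """Ask the model for small, compilable mutants of one method."""
    return (
        f"Generate {n_mutants} mutants of the following Java method. "
        "Each mutant must change exactly one statement or operator "
        "and must still compile.\n\n" + java_method
    )

method = "public int add(int a, int b) { return a + b; }"
prompt = build_mutation_prompt(method)
# The prompt would then be sent to the chosen model, e.g. (hypothetical):
#   client.chat.completions.create(model="gpt-4o",
#                                  messages=[{"role": "user", "content": prompt}])
print(prompt)
```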

**LEAM:** To use LEAM to generate mutations, first run `git clone https://github.com/tianzhaotju/LEAM.git` in the Leam directory. Then use Leam/testDefects4jV1Fixed.py and Leam/run.py to replace the corresponding files in the downloaded Leam-main folder, and copy Leam/get_location.py into the Leam-main folder. Finally, run Leam/Leam-main/get_location.py, Leam/Leam-main/testDefects4jV1Fixed.py, and Leam/leamdiff.py in that order to generate the mutations, which are saved in the Leam/mutant directory.

**mBERT:** To use mBERT to generate mutations, first run download-codebert.py to download and store the CodeBERT language model. Then run mbert_generate_mutant.py to generate the mutations, which are stored in the mBERTm/mbert_mutant folder.
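mBERT-style generation masks one token at a time and asks CodeBERT to predict replacements. The sketch below shows only the masking side (pure string handling); the model call is indicated in a comment, and its exact wiring inside mbert_generate_mutant.py is an assumption.

```python
"""Sketch of mBERT-style masking: one masked copy of a statement per
token position. The fill-mask model call is only indicated in comments."""

def masked_variants(tokens, mask="<mask>"):
    """Yield one copy of the token list per maskable position."""
    for i in range(len(tokens)):
        yield tokens[:i] + [mask] + tokens[i + 1:]

stmt = "return a + b ;".split()
variants = list(masked_variants(stmt))
# Each variant would be fed to a fill-mask model (assumed wiring), e.g.:
#   fill = pipeline("fill-mask", model="microsoft/codebert-base-mlm")
#   candidates = fill(" ".join(variant))
# and the top non-identical predictions become candidate mutants.
print(len(variants), "masked variants")
```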

**Major:** First, set up the Major environment and switch to the required Java compiler. Then run major/major_generate.py and major/get_json_mutant.py to obtain the mutations generated by Major. The generated mutations are saved in the major/mutant directory.

**Pitest:** Run pitest/pitest_generate.py directly to generate mutations; the generated mutations are stored in the pitest/Pitest_mutant directory.
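PIT reports its mutants in an XML file (commonly mutations.xml). Below is a stdlib sketch of summarising such a report over a minimal inline sample; the sample mirrors PIT's usual report fields, but how pitest_generate.py actually consumes the report is an assumption.

```python
"""Parse a (minimal, hand-written) PIT mutations.xml fragment and count
total vs. killed mutants using only the standard library."""
import xml.etree.ElementTree as ET

SAMPLE = """
<mutations>
  <mutation detected="true" status="KILLED">
    <mutatedClass>org.example.Calc</mutatedClass>
    <mutator>org.pitest.mutationtest.engine.gregor.mutators.MathMutator</mutator>
    <lineNumber>12</lineNumber>
  </mutation>
  <mutation detected="false" status="SURVIVED">
    <mutatedClass>org.example.Calc</mutatedClass>
    <mutator>org.pitest.mutationtest.engine.gregor.mutators.ConditionalsBoundaryMutator</mutator>
    <lineNumber>20</lineNumber>
  </mutation>
</mutations>
"""

def summarize(xml_text):
    """Count mutants and how many carry status KILLED."""
    root = ET.fromstring(xml_text)
    muts = root.findall("mutation")
    killed = sum(1 for m in muts if m.get("status") == "KILLED")
    return {"total": len(muts), "killed": killed}

print(summarize(SAMPLE))  # {'total': 2, 'killed': 1}
```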

### (2) Test the mutations

For each method (LLMs, LEAM, mBERT, Major, Pitest), you can run test.py directly to test the generated mutations. The test results are stored in the respective tested directories (you need to choose the storage path yourself). The script also outputs the total number of generated mutants and the number of mutants that compiled successfully.
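The two counts that test.py reports can be sketched as simple bookkeeping over (mutant, compiled?) pairs; the compile check itself is only indicated in a comment, and all names here are illustrative.

```python
"""Sketch of the total/compiled bookkeeping that test.py reports.
The javac invocation is an assumption and shown only as a comment."""

def compile_stats(results):
    """results: list of (mutant_id, compiled_ok) pairs."""
    total = len(results)
    compiled = sum(1 for _, ok in results if ok)
    return total, compiled

# Each mutant would be checked roughly like (assumed):
#   ok = subprocess.run(["javac", mutant_path]).returncode == 0
fake = [("m1", True), ("m2", False), ("m3", True)]
print(compile_stats(fake))  # (3, 2)
```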

For the ConDefects data, first download the test cases as described at https://github.com/appmlk/ConDefects. After extracting Test.zip, run the contest.py script.

## 3. Evaluate

All the data generated during the experiment is stored at https://drive.google.com/drive/folders/170jWUSRuwqjOBNwIGLtk9HKCg1vS_8pj. To evaluate the generated mutants, download the data from this link and place it into the corresponding folders.

### Structure of the Directories

```
|--- README.md                   :  User guidance
|--- Codellama                   :  The mutants and test results generated by CodeLlama.
|--- ConDefects-main             :  All mutants and test results generated on the ConDefects dataset.
     |--- java20240304_mutant    :  Mutants generated on the ConDefects dataset between March and April 2024.
     |--- java20240406-mutant    :  Mutants generated on the ConDefects dataset between April and June 2024.
|--- Deepseek                    :  The mutants and test results generated by DeepSeek.
|--- Gpt
     |--- gpt3.5                 :  The mutants and test results generated by gpt-3.5-turbo.
     |--- gpt4o                  :  The mutants and test results generated by gpt-4o.
     |--- gpt4omini              :  The mutants and test results generated by gpt-4o-mini.
|--- Leam                        :  The mutants and test results generated by LEAM.
     |--- Leam-main              :  The storage path of LEAM.
|--- gumtree-spoon-ast-diff      :  A tool that computes the AST difference between two Spoon abstract syntax trees using the GumTree algorithm. See https://github.com/SpoonLabs/gumtree-spoon-ast-diff.
|--- lib                         :  Pitest jar files.
|--- Major                       :  The mutants and test results generated by Major.
|--- mBert                       :  The mutants and test results generated by mBERT.
     |--- mBERT                  :  The storage path of mBERT.
|--- Pitest                      :  The mutants and test results generated by Pitest.
|--- pitest-1.14.4               :  The storage path of Pitest.
|--- Starchat                    :  The mutants and test results generated by StarChat.
|--- type                        :  The uncompiled mutants, along with their types and proportions.
|--- Sample.xlsx                 :  Equivalent mutation sampling results.
```

To evaluate the generated mutations, we defined several metrics: Compilability Rate, Useless Mutation Ratio, Real Bug Detectability, Coupling Rate, and Ochiai Coefficient, among others. For all methods except Pitest, you can:

(1) Run behavior.py to filter the generated and compiled mutants, obtain the number of real faults detected for each project, and calculate the Real Bug Detectability and Ochiai Coefficient for each project.
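The Ochiai Coefficient is commonly computed as a set similarity between the tests that kill a mutant and the tests that detect the real fault; behavior.py may use a variant, so treat this as a sketch of the standard formula.

```python
"""Standard Ochiai similarity between two test sets (a sketch; the
exact definition used by behavior.py is an assumption)."""
import math

def ochiai(kill_set, fault_set):
    """|K ∩ F| / sqrt(|K| * |F|); 0.0 when either set is empty."""
    if not kill_set or not fault_set:
        return 0.0
    shared = len(kill_set & fault_set)
    return shared / math.sqrt(len(kill_set) * len(fault_set))

killed_by     = {"t1", "t2", "t3"}  # tests that kill the mutant
detects_fault = {"t2", "t3", "t4"}  # tests that fail on the real bug
print(round(ochiai(killed_by, detects_fault), 3))  # 2/sqrt(3*3) -> 0.667
```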

(2) Run coup.py to calculate the coupling rate for each project.
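One common operationalisation of the coupling rate (coup.py may differ): a real fault counts as coupled if some mutant is killed only by tests that also detect that fault, and the rate is the fraction of coupled faults.

```python
"""Sketch of a coupling-rate computation; the coupling criterion here
(mutant kill set is a non-empty subset of the fault's detecting tests)
is one common choice, not necessarily the one coup.py implements."""

def is_coupled(fault_tests, mutant_kill_sets):
    """True if some mutant is killed only by fault-detecting tests."""
    return any(ks and ks <= fault_tests for ks in mutant_kill_sets)

def coupling_rate(faults):
    """faults: list of (fault_test_set, [mutant_kill_sets])."""
    coupled = sum(1 for ft, ms in faults if is_coupled(ft, ms))
    return coupled / len(faults)

faults = [
    ({"t1", "t2"}, [{"t1"}, {"t3"}]),  # coupled via the first mutant
    ({"t4"},       [{"t5"}]),          # not coupled
]
print(coupling_rate(faults))  # 0.5
```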

(3) Run count_num.py to sample the generated mutants 10 times, calculate the metrics mentioned above, and output the average values.
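The repeated-sampling step can be sketched as below: draw a fixed-size sample of mutants several times, compute a metric on each sample, and average. Sample size, metric, and seed are illustrative assumptions.

```python
"""Sketch of count_num.py's sample-and-average loop with a stand-in
metric (fraction of compiled mutants in each sample)."""
import random

def averaged_metric(mutants, metric, k=100, rounds=10, seed=0):
    """Average `metric` over `rounds` random samples of size `k`."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    scores = []
    for _ in range(rounds):
        sample = rng.sample(mutants, min(k, len(mutants)))
        scores.append(metric(sample))
    return sum(scores) / rounds

# Illustrative pool: 75% of mutants marked as compiled.
pool = [{"compiled": i % 4 != 0} for i in range(1000)]
rate = averaged_metric(pool, lambda s: sum(m["compiled"] for m in s) / len(s))
print(round(rate, 2))
```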

(4) Run gumtreeAST.py to calculate the AST difference between two Spoon abstract syntax trees.

(5) Note that for Pitest, you need to run piGPT.py to obtain the number of generated mutants and the number of mutants that passed compilation. Then run pitest_coup.py, pitest_fault.py, and pitest_ochiai.py to calculate the coupling rate, Real Bug Detectability, and Ochiai Coefficient for each project. Run pitest_count_num.py to sample the generated mutants 10 times, calculate the metrics mentioned above, and output the average values.

(6) For the ConDefects dataset, to calculate the coupling rate, Real Bug Detectability, and Ochiai Coefficient, run the conCoup.py and conBehavior.py scripts located in the ConDefects-main folder.

Sample.xlsx contains the sampled mutations and results that we used to identify equivalent mutations.