This repository contains the dataset employed for the experimental evaluation of the nine DL models, including:

- 3 fine-tuned encoder-decoder models:
  - CodeBERT
  - CodeT5+
  - PLBart
- 3 fine-tuned decoder-only models:
  - CodeGen
  - CodeGPT
  - CodeParrot
- 3 k-shot prompted, instruction-tuned LLMs (k=0,4):
  - DeepSeek-Coder-6.7b
  - Qwen2.5-Coder-7b
  - StableCode-3b
The repository also contains the script designed to run inference using the 3 instruction-tuned LLMs. We adopted the 4-bit quantized version of these models, prompting them via the `llama-cli` interface.
To run inference:

- You need `llama-cli` installed and properly configured.
- Ensure you have the required model files, e.g., `deepseek-coder-6.7b-instruct.Q4_K_M.gguf`.
- Adjust the script configuration:
  - Uncomment the desired model configuration block in the script.
  - Update the `MODEL_PATH`, `NGL`, and `MAX_TOKENS` variables as needed.
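For illustration, a minimal sketch of how a script like this might wrap `llama-cli` is shown below. The variable names (`MODEL_PATH`, `NGL`, `MAX_TOKENS`) mirror those mentioned above; the specific flags and the `build_command`/`run_inference` helpers are assumptions for illustration, not necessarily the actual implementation in `run_inference.py`, though `-m`, `-ngl`, `-n`, and `-p` are standard llama.cpp options.

```python
# Hypothetical sketch of an inference wrapper around llama-cli.
# The exact structure of run_inference.py may differ.
import subprocess

# Configuration variables, as described in the instructions above
MODEL_PATH = "models/deepseek-coder-6.7b-instruct.Q4_K_M.gguf"
NGL = 32          # number of layers to offload to the GPU (-ngl)
MAX_TOKENS = 512  # maximum number of tokens to generate (-n)


def build_command(prompt: str) -> list[str]:
    """Assemble the llama-cli invocation for a single prompt."""
    return [
        "llama-cli",
        "-m", MODEL_PATH,
        "-ngl", str(NGL),
        "-n", str(MAX_TOKENS),
        "-p", prompt,
    ]


def run_inference(prompt: str) -> str:
    """Run llama-cli on one prompt and return its stdout.

    Requires llama-cli to be installed and on the PATH.
    """
    result = subprocess.run(build_command(prompt),
                            capture_output=True, text=True)
    return result.stdout
```

Switching between the three LLMs then amounts to pointing `MODEL_PATH` at a different `.gguf` file, which matches the "uncomment the desired model configuration block" step.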
Run the script from the command line:

```
python run_inference.py
```

If you find this work useful for your research, please consider citing:
```
@article{improta2026reading,
  title={Reading between the Lines: Context-Aware AI-based generation of software exploits},
  author={Improta, Cristina and Liguori, Pietro and Natella, Roberto and Cukic, Bojan and Cotroneo, Domenico},
  journal={Empirical Software Engineering},
  volume={31},
  number={3},
  pages={60},
  year={2026},
  publisher={Springer}
}
```
For further information, contact us via email: cristina.improta@unina.it (Cristina) and pietro.liguori@unina.it (Pietro).
