May the Feedback Be with You! Unlocking the Power of Feedback-Driven Deep Learning Framework Fuzzing via LLMs
FUEL (Feedback-driven fUzzing for dEep Learning frameworks via LLMs) is an advanced deep learning (DL) framework fuzzing tool designed to detect bugs in mainstream DL frameworks such as PyTorch and TensorFlow. FUEL combines a powerful generation LLM with an analysis LLM to fully leverage feedback information during the fuzzing loop, generating high-quality test cases that uncover potential bugs in DL frameworks. Additionally, FUEL features a feedback-aware simulated annealing algorithm and a program self-repair strategy, which improve model diversity and validity, respectively.
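At a high level, the generate–execute–analyze loop described above can be sketched as follows. This is a minimal illustration with the two LLM calls stubbed out; the function names (`generate_model`, `run_test`, `analyze_feedback`) are hypothetical stand-ins, not FUEL's actual API.

```python
def generate_model(analysis_summary):
    """Generation LLM (stubbed): produce a DL model test case guided by feedback."""
    return f"model guided by: {analysis_summary}"

def run_test(model_code):
    """Execute the test case and collect feedback (stubbed coverage/exception)."""
    return {"coverage": len(model_code), "exception": None}

def analyze_feedback(feedback):
    """Analysis LLM (stubbed): turn raw feedback into guidance for the next round."""
    if feedback["exception"] is not None:
        return f"avoid the pattern that raised {feedback['exception']}"
    return f"explore beyond coverage {feedback['coverage']}"

def fuzz(max_round):
    """One feedback-driven fuzzing loop: generate -> execute -> analyze."""
    summary = "initial seed"
    history = []
    for _ in range(max_round):
        code = generate_model(summary)
        feedback = run_test(code)
        summary = analyze_feedback(feedback)
        history.append((code, feedback))
    return history
```

Each round feeds the analysis of the previous round's execution back into generation, which is what distinguishes FUEL from one-shot LLM-based test generation.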
- **Intelligent Code Generation**: Leverages large language models to generate complex and effective deep learning model test cases
- **Feedback-Driven**: Smart feedback mechanism based on code coverage, bug reports, and exception logs to continuously optimize test generation strategies via LLMs
- **Program Self-Repair**: Automatically distinguishes between framework bugs and invalid test cases, then intelligently repairs invalid models using LLM-guided analysis
- **Heuristic Search**: Integrates heuristic algorithms like Feedback-Aware Simulated Annealing (FASA) for intelligent API operator selection
- **Differential Testing**: Supports multiple differential testing modes (hardware differences, compiler differences, etc.)
- **Efficient Detection**: Successfully discovered 104 new bugs, with 93 confirmed and 49 fixed
- Support for PyTorch and TensorFlow framework testing
- Multiple differential testing modes (CPU/CUDA hardware differences, compiler differences)
- Intelligent operator selection and combination
- Real-time code coverage feedback
- Exception detection and bug report generation
- Configurable LLM backends (local models/API services)
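Among the features above, Feedback-Aware Simulated Annealing (FASA) drives operator selection. The core idea can be illustrated with a plain simulated-annealing selector over a feedback score; the scoring function and cooling schedule below are illustrative assumptions, not FUEL's actual implementation.

```python
import math
import random

def select_operator(ops, score, rounds=200, t0=1.0, cooling=0.97, rng=None):
    """Simulated-annealing operator selection driven by a feedback score.

    `score(op)` should grow with how much interesting feedback (new
    coverage, suspicious logs) an operator's past test cases produced.
    """
    rng = rng or random.Random(0)
    current = rng.choice(ops)
    temperature = t0
    for _ in range(rounds):
        candidate = rng.choice(ops)
        delta = score(candidate) - score(current)
        # Always accept a better candidate; accept a worse one with
        # probability exp(delta / T), which shrinks as T cools down.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = candidate
        temperature *= cooling
    return current
```

The high early temperature leaves room for exploring rarely used operators, while cooling gradually concentrates selection on operators whose feedback scores are consistently high.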
FUEL/
├── config/                  # Configuration files
│   ├── als_prompt/          # Analysis prompt configurations
│   ├── gen_prompt/          # Generation prompt configurations
│   ├── heuristic.yaml       # Heuristic algorithm configuration
│   └── model/               # LLM model configuration
├── data/                    # Data files
│   ├── pytorch_apis.txt     # PyTorch API list
│   └── tensorflow_apis.txt  # TensorFlow API list
├── fuel/                    # Core source code
│   ├── difftesting/         # Differential testing module
│   ├── exec/                # Code execution module
│   ├── feedback/            # Feedback mechanism module
│   ├── guidance/            # Heuristic search module
│   └── utils/               # Utility classes
├── experiments/             # Experiment and evaluation scripts
└── results/                 # Test result outputs
Important
General test-bed requirements
- OS: Ubuntu >= 20.04
- CPU: x86-64
- GPU: CUDA-capable (V100, A6000, A100, etc.)
- Memory: 128 GB of GPU memory available (if you use a 72B local model with vLLM)
- Storage: at least 100 GB available
- Network: reliable access to GitHub and the LLM API service
You need a DeepSeek API key to invoke the DeepSeek API service (alternatively, you can modify the configuration in ./config/model.yaml to use a different LLM backend).
git clone https://github.com/NJU-iSE/FUEL.git
cd FUEL
First, install the necessary Python dependencies. We strongly recommend using uv to manage the Python environment. Please follow the commands below.
# install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# sync the dependencies at the root directory
uv sync
# activate the environment
source .venv/bin/activate
When fuzzing the systems under test (SUTs), we use the nightly builds in order to detect new bugs.
Here we use CUDA 12.6 as an example. Please install the nightly build matching your CUDA version; you can get the corresponding commands from https://pytorch.org/
UV_HTTP_TIMEOUT=180 uv pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126
In our experiments, we use the DeepSeek API to invoke the LLM service. The DeepSeek API is compatible with the OpenAI interface.
In the commands below, replace [YOUR_API_KEY] with your own DeepSeek API key.
key="[YOUR_API_KEY]"
echo "$key" > ./config/deepseek-key.txt
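Because the DeepSeek API follows the OpenAI chat-completions format, a request payload looks like the sketch below. The model name `deepseek-chat` and the endpoint URL in the comment are taken from DeepSeek's public documentation, but verify them against the current docs; the prompt contents here are purely illustrative.

```python
import json
from pathlib import Path

# Read the key stored by the command above; fall back to the placeholder
# if the file has not been created yet.
key_file = Path("./config/deepseek-key.txt")
key = key_file.read_text().strip() if key_file.exists() else "[YOUR_API_KEY]"

# A standard OpenAI-style chat-completion payload; "deepseek-chat" is
# DeepSeek's general-purpose model name (check their docs for current names).
payload = {
    "model": "deepseek-chat",
    "messages": [
        {"role": "system", "content": "You generate PyTorch test models."},
        {"role": "user", "content": "Generate a model mixing conv and pooling ops."},
    ],
}
headers = {
    "Authorization": f"Bearer {key}",
    "Content-Type": "application/json",
}
body = json.dumps(payload)  # POST this to https://api.deepseek.com/chat/completions
```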
Warning
The fuzzing process is time-consuming and may run for many hours to discover meaningful bugs.
python -m fuel.fuzz --lib pytorch run_fuzz \
--max_round 1000 \
--heuristic FASA \
--diff_type cpu_compiler
Parameter description:
- `--lib`: Target deep learning library (`pytorch` or `tensorflow`)
- `--max_round`: Maximum number of testing rounds
- `--heuristic`: Heuristic algorithm (`FASA`, `Random`, or `None`)
- `--diff_type`: Differential testing type (`hardware`, `cpu_compiler`, or `cuda_compiler`)
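The differential-testing oracle behind `--diff_type` boils down to running the same model under two backends and flagging numeric disagreement. Below is a backend-agnostic sketch; the tolerances, toy backends, and injected error are illustrative, not FUEL's actual implementation.

```python
import math

def diff_test(model, inputs, backend_a, backend_b, rtol=1e-4, atol=1e-5):
    """Run `model` on the same inputs under two backends and collect
    any outputs that disagree beyond the given tolerances."""
    out_a = backend_a(model, inputs)
    out_b = backend_b(model, inputs)
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(out_a, out_b))
        if not math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
    ]

# Toy backends: plain evaluation vs. one with a small injected error,
# standing in for eager execution vs. a (buggy) compiler backend.
eager = lambda m, xs: [m(x) for x in xs]
buggy_compiler = lambda m, xs: [m(x) + (0.1 if x == 2 else 0.0) for x in xs]

model = lambda x: x * x
mismatches = diff_test(model, [1, 2, 3], eager, buggy_compiler)
# `mismatches` flags input index 1, where the "compiled" result diverges
```

An empty result means the two backends agree within tolerance; any entry is a candidate bug to triage (it may still be a legitimate numerical difference rather than a framework defect).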
Note that the fuzzing experiment is really time-consuming; expect to check the results after roughly 20 hours.
Please check the generated models in `results/fuel/pytorch`.
If you want to get the detected bugs, please check `outputs/bug_reports.txt`.
Warning
These advanced features are not fully tested and may be unstable. We will continue improving our artifact.
python -m fuel.fuzz --lib pytorch run_fuzz \
--use_local_gen \
--max_round 1000 \
--heuristic FASA
python -m fuel.fuzz --lib pytorch run_fuzz \
--op_set data/custom_operators.txt \
--op_nums 8 \
--max_round 1000
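The `--op_set` file is assumed here to list one operator/API name per line, mirroring the plain-list format suggested by `data/pytorch_apis.txt` (adjust if FUEL expects a different format). Loading such a file and sampling `--op_nums` operators for one test model can be sketched as:

```python
import random

def load_op_set(path):
    """Read an operator list file: one API name per line, blanks ignored
    (assumed format, mirroring data/pytorch_apis.txt)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def sample_ops(ops, op_nums, seed=None):
    """Pick up to `op_nums` distinct operators to combine into one test model."""
    rng = random.Random(seed)
    return rng.sample(ops, min(op_nums, len(ops)))
```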
bash coverage.sh
So far, FUEL has detected 104 previously unknown bugs, with 93 already confirmed and 49 already fixed. 14 detected bugs were labeled as high-priority, and one was labeled as utmost priority. 14 detected bugs have been assigned CVE IDs. The evidence can be found in the Google Sheet.
- Shaoyu Yang: core developer
- Haifeng Lin: core developer
- Chunrong Fang: supervisor
We thank NNSmith, TitanFuzz, and WhiteFox for their admirable open-source spirit, which has largely inspired this work.