🚀 SmartGuard: A Multi-Stage Vulnerability Detection Framework for Smart Contracts

Overview

Smart contracts, powered by blockchain technology, have revolutionized decentralized applications by enabling trustless and tamper-proof execution of agreements. However, their immutable nature makes them particularly susceptible to vulnerabilities, which can lead to significant financial losses if exploited. Identifying and mitigating these vulnerabilities before deployment is thus a critical challenge in the blockchain ecosystem.

SmartGuard is a multi-stage vulnerability detection framework designed to address this challenge. By leveraging advanced machine learning techniques and publicly available datasets, this project aims to detect vulnerabilities in smart contract code with high accuracy and robustness.

Features

Multi-Stage Detection: A hierarchical architecture of three stages, a detector (VulnScreener), a reasoner (VulnAnalyzer), and a verifier (VulnValidator), for accurate and interpretable vulnerability detection.
Custom Data Splitting: Ensures balanced distribution of vulnerabilities across training and testing datasets.
Feature Extraction: Combines CodeBERT, Longformer, and CodeT5 for semantic and syntactic feature extraction.
Streamlit Web Application: An interactive interface for exploring datasets, preprocessing steps, and model results.

Methodology

Data Preprocessing

The preprocessing phase involves:

Removing missing and duplicate values.
Consolidating multiple entries of raw code with different vulnerabilities into single entries.
Transforming the label-encoded column into an array to accommodate multiple vulnerabilities.
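
The consolidation steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual preprocessing code: the `(code, label_id)` row layout, the `consolidate` helper, and the assumption that labels are integers in the range 0 to 8 are all hypothetical.

```python
from collections import defaultdict

# Assumption: nine vulnerability types, encoded as integer ids 0..8
NUM_CLASSES = 9

def consolidate(rows, num_classes=NUM_CLASSES):
    """Merge (code, label_id) rows into one multi-hot entry per unique code."""
    labels_by_code = defaultdict(set)
    for code, label in rows:
        # Drop rows with missing values
        if code is None or label is None:
            continue
        # A set absorbs exact duplicate (code, label) pairs
        labels_by_code[code].add(label)
    merged = []
    for code, labels in labels_by_code.items():
        # Turn the set of label ids into a fixed-length 0/1 array
        vector = [1 if i in labels else 0 for i in range(num_classes)]
        merged.append((code, vector))
    return merged

rows = [
    ("contract A {}", 0),
    ("contract A {}", 3),
    ("contract A {}", 3),  # duplicate row
    ("contract B {}", 5),
    (None, 2),             # missing code
]
dataset = consolidate(rows)
```

Each unique contract ends up with a single entry whose label array marks every vulnerability reported for it.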

To explore effective code representations, we initially attempted to convert Solidity code into both OPCODE and bytecode formats. However, due to resource and time constraints, we instead kept the source code and preprocessed it by removing comments and newline characters. This simplified representation was used as the input features for model training.
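
Comment and newline removal could look roughly like the sketch below. It is a naive regex-based approach written for illustration; the `strip_solidity` name is hypothetical, collapsing whitespace runs into single spaces is an assumption, and the regexes would misfire on string literals that contain `//` or `/*`.

```python
import re

def strip_solidity(source: str) -> str:
    """Remove comments and newlines from Solidity source (naive sketch)."""
    # Remove /* ... */ block comments, including multi-line ones
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    # Remove // line comments up to the end of the line
    source = re.sub(r"//[^\n]*", "", source)
    # Assumption: newlines and repeated whitespace collapse into single spaces
    return " ".join(source.split())

src = (
    "// SPDX-License-Identifier: MIT\n"
    "pragma solidity ^0.8.0;\n"
    "/* multi\n"
    "   line */\n"
    "contract A {\n"
    "    uint x; // counter\n"
    "}\n"
)
flat = strip_solidity(src)
```

A tokenizer-aware pass would be safer for production use, since it can tell comments apart from string contents.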

Feature Extraction

We explored different strategies for feature extraction:

Code: Used CodeBERT for semantic and syntactic feature extraction.
OPCODE: Extracted key parts of the code and applied LSTM or Transformer for feature extraction.
Bytecode: Combined CodeBERT embeddings of the source code with LSTM- or Transformer-encoded bytecode, concatenating their output vectors for model training.

Ultimately, we focused on CodeBERT for feature extraction. However, its input limit of max_length=512 leaves only 510 positions for code once the two special tokens are added, so longer contracts are truncated. To address this, we incorporated Longformer (supporting up to 4,096 tokens) and experimented with CodeT5, an encoder-decoder model. All models output 768-dimensional vectors, providing a rich foundation for downstream vulnerability detection.
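
The effect of the window size can be illustrated with a small sketch that works on plain lists of token ids. The `fit_to_window` helper and the special-token ids are placeholders chosen for this example, and no real tokenizer or model is loaded.

```python
# Placeholder ids for the two special tokens that wrap the sequence (assumed values)
CLS_ID, SEP_ID = 0, 2

def fit_to_window(token_ids, max_length=512):
    """Return (model_input, dropped_count) for an encoder with a fixed input window."""
    budget = max_length - 2  # two positions are reserved for the special tokens
    kept = token_ids[:budget]
    dropped = len(token_ids) - len(kept)
    return [CLS_ID] + kept + [SEP_ID], dropped

# A hypothetical contract that tokenizes to 1,200 tokens
contract_tokens = list(range(100, 1300))

short_input, short_dropped = fit_to_window(contract_tokens, max_length=512)
long_input, long_dropped = fit_to_window(contract_tokens, max_length=4096)
```

With a 512-token window more than half of this contract is discarded, while a 4,096-token window keeps all of it.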

Custom Data Splitting

Standard stratified sampling assumes one class per sample, so it cannot be applied directly to this multi-label data. To keep the distribution of vulnerabilities balanced, we developed a custom data-splitting solution that divides the dataset into 80% training and 20% testing sets while closely matching the target proportions of the nine vulnerability types in both subsets.
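
One possible way to build such a split is a greedy pass that tracks per-label counts. The sketch below is an assumption about how the idea could be implemented, not the project's algorithm (which lives in feature/split.py); the `multilabel_split` name and the pseudo-label for contracts without vulnerabilities are hypothetical.

```python
import random

def multilabel_split(label_vectors, test_ratio=0.2, seed=0):
    """Greedy split that keeps each label's share in the test set near test_ratio."""
    # Add a pseudo-label for samples with no vulnerability so they are balanced too
    rows = [list(v) + [0 if any(v) else 1] for v in label_vectors]
    num_labels = len(rows[0])
    totals = [sum(r[j] for r in rows) for j in range(num_labels)]
    targets = [t * test_ratio for t in totals]  # desired test positives per label
    counts = [0] * num_labels                   # current test positives per label

    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    # Place samples carrying the rarest labels first (stable sort keeps the shuffle within ties)
    order.sort(key=lambda i: min(totals[j] for j in range(num_labels) if rows[i][j]))

    train_idx, test_idx = [], []
    for i in order:
        active = [j for j in range(num_labels) if rows[i][j]]
        # Change in total absolute deviation from the targets if this sample joins the test set
        delta = sum(abs(counts[j] + 1 - targets[j]) - abs(counts[j] - targets[j]) for j in active)
        if delta < 0:
            test_idx.append(i)
            for j in active:
                counts[j] += 1
        else:
            train_idx.append(i)
    return train_idx, test_idx

# Toy data with three labels; rows with all zeros stand for clean contracts
labels = [[1, 0, 0]] * 40 + [[0, 1, 0]] * 30 + [[1, 1, 0]] * 20 + [[0, 0, 1]] * 10 + [[0, 0, 0]] * 50
train_idx, test_idx = multilabel_split(labels, test_ratio=0.2, seed=42)
```

This greedy rule prioritizes per-label proportions, so the overall test-set size is only approximately 20% when many samples carry several labels at once.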

Model Training

The framework consists of three stages:

VulnScreener: A binary classifier (MLP) that determines the presence of vulnerabilities.
VulnAnalyzer: A CNN that identifies specific vulnerability types.
VulnValidator: A Random Forest model that refines VulnAnalyzer's outputs for improved accuracy.
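
How the three stages might be chained at inference time is sketched below. The control flow is an assumption (in particular, that contracts rejected by the screener skip the later stages), and the stub functions are hypothetical stand-ins for the trained MLP, CNN, and Random Forest.

```python
def run_pipeline(features, screener, analyzer, validator, num_classes=9, threshold=0.5):
    """Return a multi-hot list of predicted vulnerability types for one contract."""
    # Stage 1 (VulnScreener): binary decision on whether any vulnerability is present
    if screener(features) < threshold:
        return [0] * num_classes
    # Stage 2 (VulnAnalyzer): one score per vulnerability type
    type_scores = analyzer(features)
    # Stage 3 (VulnValidator): refine the analyzer's scores into final labels
    return validator(features, type_scores)

# Hypothetical stand-ins for the trained models, using three classes for brevity
def stub_screener(features):
    return max(features)

def stub_analyzer(features):
    return [0.9, 0.2, 0.6]

def stub_validator(features, type_scores):
    return [1 if score >= 0.5 else 0 for score in type_scores]

clean = run_pipeline([0.1, 0.2], stub_screener, stub_analyzer, stub_validator, num_classes=3)
flagged = run_pipeline([0.1, 0.9], stub_screener, stub_analyzer, stub_validator, num_classes=3)
```

Passing the stages in as callables keeps the sketch independent of any particular model library.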

Experiments and Results

Model performance is evaluated using a suite of metrics (Confusion Matrix, Accuracy, Recall, Precision, and F1-score), enabling a comprehensive comparison of prediction outcomes across the different code representations and stages. Through this project, we aim not only to achieve high detection accuracy but also to provide insights into the efficacy of various code formats and model architectures for smart contract security. Potential avenues for future improvement include incorporating additional datasets, refining feature extraction techniques, and exploring ensemble methods to further enhance detection capabilities.
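
For reference, the listed metrics can be computed per vulnerability type from multi-hot predictions as in the sketch below. The `label_metrics` helper and the toy data are illustrative only and are not project results; returning 0.0 when a denominator is zero is an assumed convention.

```python
def label_metrics(y_true, y_pred, label):
    """Confusion counts and derived metrics for one label of multi-hot predictions."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        t, p = truth[label], pred[label]
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    # Assumption: a metric with a zero denominator is reported as 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"confusion": (tp, fp, fn, tn), "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}

# Toy data: four contracts, two vulnerability types
y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
y_pred = [[1, 0], [0, 1], [0, 1], [1, 0]]
first = label_metrics(y_true, y_pred, label=0)
second = label_metrics(y_true, y_pred, label=1)
```

Reporting the metrics per label makes it visible when a rare vulnerability type is detected poorly even though the overall accuracy looks high.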

Installation

To install the required dependencies, run:

pip install -r requirements.txt

For setting up the environment using Anaconda, please refer to the Streamlit installation guide.

Download the Project

To download the project, clone the repository using:

git clone https://github.com/Rita94105/Smart_Contract_Vulnerability_Detector

Running the Project

To run the Streamlit application, execute:

streamlit run app.py

Directory Structure

app.py
home.py
README.md
requirements.txt
.streamlit/
    pages.toml
conclusion/
    future.py
    metrics.py
data/
    EDA.py
    explore.py
feature/
    code.py
    split.py
model/
    VulnScreener.py
    overall.py
    VulnAnalyzer.py
    VulnValidator.py

License

This project is licensed under the MIT License.

Acknowledgements

Datasets from Kaggle and Hugging Face
Streamlit for the web application framework
Contributors and developers of the libraries used in this project
