Bastet is a comprehensive dataset of common smart contract vulnerabilities in DeFi along with an AI-driven automated detection process to enhance vulnerability detection accuracy and optimize security lifecycle management.
Bastet covers common vulnerabilities in DeFi, including medium- to high-risk vulnerabilities found on-chain and in audit competitions, along with corresponding secure implementations. It aims to help developers and researchers gain deeper insights into vulnerability patterns and best security practices.
In addition, Bastet integrates an AI-driven automated vulnerability detection process. By designing tailored detection workflows, Bastet enhances the accuracy of AI in identifying vulnerabilities, with the goal of optimizing security lifecycle management, from development and auditing to ongoing monitoring.
We strive to improve overall security coverage and warmly welcome contributions of additional vulnerability types, datasets, or improved AI detection methodologies. Please refer here to join and contribute to the Bastet dataset. Together, we can drive the industry's security development forward.
You can download the dataset here.
```
Bastet/
├── cli/                      # Python CLI package
│   ├── __init__.py
│   ├── main.py               # CLI entry point
│   ├── commands/             # CLI commands
│   │   └── <module>/
│   │       ├── __init__.py   # CLI routing only; the logic is defined below
│   │       └── <function>.py
│   └── models/               # Interfaces for Python type checking
│       ├── <SAAS>/
│       │   ├── __init__.py   # Exports all models in SAAS
│       │   └── <function>.py
│       └── audit_report.py   # Main interface of Bastet's output
├── dataset/                  # dataset location
│   ├── reports/              # unzipped from the dataset.zip provided on Google Drive -> audit reports of the projects
│   │   └── <reports>/
│   ├── repos/                # unzipped from the dataset.zip provided on Google Drive -> codebases of the projects
│   │   └── <repos>/
│   ├── dataset.csv           # dataset sheet providing ground truth (should be cloned from Google Drive)
│   └── README.MD             # basic information about the dataset
├── n8n_workflows/            # n8n workflow files
│   └── <file>.json           # workflows for analyzing the smart contracts
├── docker-compose.yaml
├── README.md
├── poetry.lock
├── pyproject.toml
└── .gitignore
```
- Recursive scanning of `.sol` files in specified directories
- Automatic database creation and schema setup
- Integration with n8n workflows via webhooks
- Detailed processing summaries and error reporting
- Results stored in PostgreSQL for further analysis
- A dataset for evaluating the prompt
- A CLI interface to trigger the evaluation workflow
- Python file formatter: Black
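The recursive `.sol` scan described above can be sketched in a few lines of Python. This is a minimal illustration, not Bastet's actual implementation; the `dataset/scan_queue` directory name is taken from the scan section later in this README:

```python
from pathlib import Path

def find_sol_files(root: str) -> list[Path]:
    """Recursively collect all Solidity source files under `root`."""
    return sorted(Path(root).rglob("*.sol"))

# Hypothetical usage: list the contracts queued for scanning.
for sol_file in find_sol_files("dataset/scan_queue"):
    print(sol_file)
```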
Prerequisites
- Python 3.10 or higher
- Docker installed on your machine
- Docker Compose installed on your machine
- Poetry for package management; if you want to follow our instructions, the version should be > 2.0.1
Installation Steps
Video tutorial
- Setup the Python environment:

```shell
# Initialize the virtual environment and install dependencies
poetry install
eval $(poetry env activate)  # or `source .venv/bin/activate`
```

- Configure environment variables in `.env`:

```shell
cp .env.example .env
```

Update the environment variables in the `.env` file if needed.
- Start n8n and the database:

```shell
docker-compose -f ./docker-compose.yml up -d
```

- Access the n8n dashboard: open your browser and navigate to http://localhost:5678
- (First time only) Set up the owner account and activate the free n8n pro features.
- Click the user icon at the bottom left → Settings → click n8n API in the sidebar → Create an API key → fill in "Bastet" as the label → select "No Expiration" for Expiration (or pick an expiration time if you prefer) → copy the API key and paste it into `N8N_API_KEY` in the `.env` file; the API key will not be visible after creation, so if you lose it you can only create a new one → click Done.
- Go back to the homepage (http://localhost:5678/home/workflows).
- Click Create Credential via the arrow button next to the Create Workflow button → type "OpenAi" in the input → select "OpenAi" when it appears and click Continue → fill in your OpenAI API key in the API Key field to create the OpenAI credentials → copy the value of the ID field and paste it into `N8N_OPENAI_CREDENTIAL_ID` in the `.env` file.
- Import the workflows by executing the following command.

Before the setup, make sure you have filled in `N8N_API_KEY` and `N8N_OPENAI_CREDENTIAL_ID` in the `.env` file.

```shell
poetry run python cli/main.py init
```

You will see all the workflows we currently provide. They are activated by default; if you want to skip some workflows, please deactivate them in n8n (http://localhost:5678/home/workflows).
If you appreciate our work and would like to support what we're building, even a small contribution means a lot. Your support helps us keep moving forward! Let's make Web3 better together.
Donation Address: 0xb2BecD73347EDE268bb1A9Ff785015f3cdC83F2d
We accept donations on the following chains:
- Ethereum
- Base
- BNB Chain
- Arbitrum
To fetch verified contracts from Etherscan by address, including all imported dependencies, first obtain an Etherscan API key from the API Dashboard and add it to `ETHERSCAN_API_KEY` in the `.env` file.
Then run the following command to download the verified contract source code (currently only the Ethereum mainnet is supported):

```shell
poetry run python cli/main.py fetch --address <CONTRACT_ADDRESS>
```

The downloaded source code will be stored in `dataset/onchain-sources/<CONTRACT_ADDRESS>`. Users can select the files they need for further processing or analysis.

⚠️ Important: The use of data obtained through the Etherscan API is subject to Etherscan's API Terms of Service. Users should ensure compliance when handling downloaded contract sources.
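For context, fetching verified sources relies on Etherscan's `getsourcecode` endpoint, which has one well-known quirk: multi-file verified contracts return a Solidity standard-JSON input wrapped in an extra pair of braces, while single-file contracts return plain Solidity text. A minimal sketch of handling both response shapes (an illustration of the public Etherscan API behavior, not Bastet's actual fetch implementation):

```python
import json

def extract_sources(source_code_field: str, contract_name: str) -> dict[str, str]:
    """Map file path -> Solidity source from Etherscan's `SourceCode` field.

    Multi-file verified contracts come back as standard-JSON input wrapped
    in an extra pair of braces ("{{ ... }}"); single-file contracts are
    returned as plain Solidity text.
    """
    if source_code_field.startswith("{{") and source_code_field.endswith("}}"):
        standard_json = json.loads(source_code_field[1:-1])  # strip the extra braces
        return {path: entry["content"]
                for path, entry in standard_json["sources"].items()}
    # Single-file contract: use the contract name as the file name.
    return {f"{contract_name}.sol": source_code_field}
```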
The main `scan` command will recursively scan all `.sol` files in the specified directory:

```shell
poetry run python cli/main.py scan
# or
poetry run python cli/main.py scan --output-format csv
```

By default, the scan will process all contracts in the `dataset/scan_queue` directory using all workflows that you have activated by turning on their respective switch buttons, and generate a `.csv` file containing a spreadsheet-friendly summary of all detected vulnerabilities. The report will be saved in the `scan_report/` directory.

You can customize the output using the `--output-format` option, which supports multiple formats separated by commas.

```shell
# Example: generate json and md
poetry run python cli/main.py scan --output-format json,md
# Example: generate all formats
poetry run python cli/main.py scan --output-format all
```

- csv: generates a CSV file for quick analysis in spreadsheet tools.
- json: outputs structured data suitable for automation or further processing.
- md: creates a human-readable Markdown summary report.
- pdf: exports a printable PDF report.
- all: generates all of the above formats: csv, json, md, and pdf.

You can use the `--help` flag for detailed information about the available flags.
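As a quick post-processing example, the CSV summary can be tallied by severity with the standard library. Note that the `severity` column name here is an assumption for illustration; check the header of your generated report:

```python
import csv
from collections import Counter

def count_by_severity(report_path: str) -> Counter:
    """Tally findings in a scan report CSV by a (hypothetical) severity column."""
    with open(report_path, newline="", encoding="utf-8") as f:
        return Counter(row["severity"].lower() for row in csv.DictReader(f))
```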
- Go into the workflow you want to scan.
- Click the Chat button at the bottom and input the contract content.
- Import the workflow you want to evaluate.
The output of the workflow needs to follow the JSON schema below:

```json
{
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "summary": {
        "type": "string",
        "description": "Brief summary of the vulnerability"
      },
      "severity": {
        "type": "string",
        "enum": ["high", "medium", "low"],
        "description": "Severity level of the vulnerability"
      },
      "vulnerability_details": {
        "type": "object",
        "properties": {
          "function_name": {
            "type": "string",
            "description": "Function name where the vulnerability is found"
          },
          "description": {
            "type": "string",
            "description": "Detailed description of the vulnerability"
          }
        },
        "required": ["function_name", "description"]
      },
      "code_snippet": {
        "type": "array",
        "items": {
          "type": "string"
        },
        "description": "Code snippet showing the vulnerability",
        "default": []
      },
      "recommendation": {
        "type": "string",
        "description": "Recommendation to fix the vulnerability"
      }
    },
    "required": [
      "summary",
      "severity",
      "vulnerability_details",
      "code_snippet",
      "recommendation"
    ]
  },
  "additionalProperties": false
}
```

The trigger point should be a webhook, and this workflow should be activated (by clicking the switch on the n8n home page).
You may refer to n8n_workflow/slippage_min_amount.json as an example.
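When building a custom workflow, it can be handy to sanity-check its output against the schema before wiring it into the evaluation. Below is a simplified structural check using only the standard library; the full schema above is authoritative, and a real validator such as the third-party `jsonschema` package would be more thorough:

```python
import json

REQUIRED_KEYS = {"summary", "severity", "vulnerability_details",
                 "code_snippet", "recommendation"}
VALID_SEVERITIES = {"high", "medium", "low"}

def check_workflow_output(raw: str) -> list[str]:
    """Return a list of problems found in a workflow's JSON output (empty = OK)."""
    findings = json.loads(raw)
    if not isinstance(findings, list):
        return ["top-level value must be an array of findings"]
    problems = []
    for i, finding in enumerate(findings):
        missing = REQUIRED_KEYS - finding.keys()
        if missing:
            problems.append(f"finding {i}: missing keys {sorted(missing)}")
        if finding.get("severity") not in VALID_SEVERITIES:
            problems.append(f"finding {i}: severity must be one of {sorted(VALID_SEVERITIES)}")
    return problems
```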
- Download the latest dataset.zip and the dataset.csv from here.
- Unzip the dataset.zip in ./dataset; the folder structure should look like this:

```
dataset/            # dataset location
├── reports/        # unzipped from the dataset.zip provided on Google Drive -> audit reports of the projects
│   └── <reports>/
├── repos/          # unzipped from the dataset.zip provided on Google Drive -> codebases of the projects
│   └── <repos>/
├── dataset.csv     # dataset sheet providing ground truth (should be cloned from Google Drive and renamed to `dataset.csv`)
└── README.MD       # basic information about the dataset
```

- Run the command:

```shell
poetry run python cli/main.py eval
```

You can use the `--help` flag for detailed information about the available flags.
- Import slippage_min_amount.json into your n8n service.
- Provide the OpenAI credential you just created for the slippage_min_amount workflow.
- Make the workflow active.
- Download the latest dataset.zip and the dataset.csv from here.
- Unzip the dataset.zip in ./dataset; the folder structure should look like this:

```
dataset/            # dataset location
├── reports/        # unzipped from the dataset.zip provided on Google Drive -> audit reports of the projects
│   └── <reports>/
├── repos/          # unzipped from the dataset.zip provided on Google Drive -> codebases of the projects
│   └── <repos>/
├── dataset.csv     # dataset sheet providing ground truth (should be cloned from Google Drive and renamed to `dataset.csv`)
└── README.MD       # basic information about the dataset
```

- Run:

```shell
poetry run python cli/main.py eval
```

You should get a confusion matrix like this:

```
+----------------+---------+
| Metric         | Value   |
+================+=========+
| True Positive  | 16      |
+----------------+---------+
| True Negative  | 27      |
+----------------+---------+
| False Positive | 2       |
+----------------+---------+
| False Negative | 13      |
+----------------+---------+
```

Note: your numbers may differ, since LLM answers are not stable; the example above was produced by gpt-4o-mini.
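From these four counts you can derive the usual summary metrics. For example, for the table above (TP=16, TN=27, FP=2, FN=13):

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

metrics = classification_metrics(tp=16, tn=27, fp=2, fn=13)
print({k: round(v, 3) for k, v in metrics.items()})
# → {'accuracy': 0.741, 'precision': 0.889, 'recall': 0.552, 'f1': 0.681}
```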
Bastet supports automated CI/CD workflows for both GitHub and GitLab, enabling seamless integration into your development pipeline.
You can find example CI/CD configurations in the .example.github/action and .example.github/workflows directories of this repository. Use these as references to build your own custom CI/CD pipeline for Bastet in GitLab. Adjust stages, environment variables, and workflow steps as needed for your project requirements.

You may customize which vulnerabilities you want to detect in .example.github/action/action.yml.

```shell
docker-compose -f docker-compose.cicd.yml exec -T bastet \
  bash -c "echo 'all' | poetry run python /app/cli/main.py init --n8n-url http://n8n:5678"
```

Add a stage to your .gitlab-ci.yml file, following .example.gitlab-ci.yml.

These templates will automatically run Bastet scans on your smart contracts whenever you push changes or open merge requests. Customize the workflow as needed for your project.

You may customize which vulnerabilities you want to detect in .example.gitlab-ci.yml.

```shell
docker-compose -f docker-compose.cicd.yml exec -T bastet \
  bash -c "echo 'all' | poetry run python /app/cli/main.py init --n8n-url http://n8n:5678"
```

| Date | Conference Name | Topic | Slide |
|---|---|---|---|
| 2025-04-02 | ETH TAIPEI 2025 | Exploring AIβs Role in Smart Contract Security | ETH-TAIPEI-2025 |
| 2025-04-17 | CyberSec 2025 | AI-Driven Smart Contract Vulnerability Detection | CyberSec-2025 |
| 2025-08-09 | COSCUP 2025 | AI x Smart Contract: What Static Analysis Tools Can't Do, Leave It to Prompt Engineering! | COSCUP-2025 |
Bastet is for research and educational purposes only. Anyone who discovers a vulnerability should adhere to the principles of Responsible Disclosure and ensure compliance with applicable laws and regulations. We do not encourage or support any unauthorized testing, attacks, or abusive behavior, and users assume all associated risks.
Apache License 2.0