AI-Powered Token Auditor • ML Deployer Reputation Engine • Etherscan V2 Integration • Solidity Registries
The On-Chain Security Suite is a complete, end-to-end Web3 security pipeline.
It combines:
- Static analysis of token contracts (rugpull pattern detection)
- Machine learning–based reputation scoring for deployer addresses
- Etherscan V2 integration for fetching real on-chain data
- Solidity registries to store token audits and deployer scores on-chain
This project is designed as:
- A portfolio-quality security research project
- A practical toolkit for analysts and developers
- A template for building more advanced Web3 security systems
Contents:

- Motivation & Problem Statement
- High-Level Overview
- Core Components
- Architecture
- Folder Structure
- Installation & Setup
- Token Risk Auditor (Static Analyzer)
- Deployer Reputation Engine (ML Model)
- Machine Learning Training Pipeline
- Etherscan V2 Integration
- Solidity Registries
- CLI Tools & Workflows
- End-to-End Example Flow
- Assumptions & Limitations
- Ideas for Extension
- Roadmap
- License
The token ecosystem on Ethereum and EVM-compatible chains is:
- Fast-moving
- Permissionless
- Filled with both innovation and scams
Common problems:
- Rugpulls: the owner drains liquidity or mints massive supply
- Honeypots: you can buy but not sell
- Blacklist-based traps: certain addresses are silently blocked
- Tax manipulation: a “fair” token suddenly applies massive fees
- Bad actors: deployers who keep launching scams
Humans cannot manually review every contract. We need tools that:
- Read Solidity source code
- Detect patterns associated with malicious behavior
- Aggregate deployer history
- Use ML to estimate how risky a deployer is
- Integrate with real on-chain data (Etherscan)
- Optionally store results on-chain for transparency
That’s exactly what this suite does.
The project has three main layers:

1. Token-Level Analysis
   - Reads Solidity token contracts
   - Extracts risk features
   - Produces a risk score and label
2. Deployer-Level Reputation
   - Aggregates all tokens deployed by an address
   - Uses a machine learning model to estimate “maliciousness probability”
   - Outputs a trust score and human-readable label
3. Integration & Transparency Layer
   - Uses Etherscan V2 to discover deployed contracts and fetch source code
   - Provides Solidity registries to store audit and reputation results on-chain
This makes it possible to:
- Analyze local contracts
- Analyze real-world deployers
- Train and use real ML models
- (If desired) publish security results on-chain.
- `src/token_auditor`: static analysis for smart contracts (rugpull detection)
- `src/reputation`: feature extraction and ML-based scoring for deployers
- `src/etherscan_integration`: Etherscan V2 API client for fetching real on-chain history
- `src/ml`: machine learning training + model utilities
- `contracts/`: two Solidity contracts for on-chain storage of audits & reputation
- `data/`: example token contracts and a synthetic deployer dataset
- `artifacts/`: trained ML models (`deployer_model.joblib`)
The On-Chain Security Suite consists of three major layers:
- Token-Level Analysis
- Deployer-Level ML Reputation Scoring
- Blockchain Integration (Etherscan V2 + Solidity Registries)
Below is the full architecture diagram:
```
┌───────────────────────────┐
│  Solidity Token Contract  │
└─────────────┬─────────────┘
              │
              ▼
┌─────────────────────────────┐
│    Static Token Auditor     │
│ - Regex feature extraction  │
│ - Risk scoring ruleset      │
└─────────────┬───────────────┘
              │ token risk label
              ▼
┌─────────────────────────────────────────────┐
│    Deployer History (Local or Etherscan)    │
│ - List of deployed contract addresses       │
│ - Token risk scores/labels per contract     │
└─────────────┬───────────────────────────────┘
              │
              ▼
┌────────────────────────────────┐
│  Deployer Feature Aggregator   │
│ - n_safe                       │
│ - n_suspicious                 │
│ - n_rugpull                    │
│ - frac_safe / frac_rugpull     │
└─────────────┬──────────────────┘
              │
              ▼
┌──────────────────────────┐
│  ML Deployer Reputation  │
│  (RandomForest Model)    │
│ - P(bad deployer)        │
│ - Trust Score (0–100)    │
│ - Risk Class             │
└─────────────┬────────────┘
              │
              ▼
┌─────────────────────────────────────────────┐
│         Final Reputation Results            │
│  { score, risk_class, label, features }     │
└─────────────┬───────────────────────────────┘
              │
              ▼
┌──────────────────────────────────────────────────────────────────────┐
│                 Optional On-Chain Registries (Solidity)              │
│                                                                      │
│  TokenAuditRegistry.sol            DeployerReputationRegistry.sol    │
│  - Report token risk               - Store deployer trust score      │
│  - Store details JSON              - Expose transparent on-chain data│
└──────────────────────────────────────────────────────────────────────┘
              ▲
              │
┌─────────────┴───────────────────┐
│    Etherscan V2 Integration     │
│ - Fetch deployer tx list        │
│ - Fetch contract source code    │
│ - Auto-classify tokens          │
└─────────────────────────────────┘
```
Etherscan V2 is used to auto-generate the “Deployer History” layer by discovering and classifying real contracts.
```
onchain-security-suite/
│
├── contracts/
│   ├── TokenAuditRegistry.sol
│   └── DeployerReputationRegistry.sol
│
├── data/
│   ├── tokens/
│   │   ├── safe_token_1.sol
│   │   ├── rugpull_token_1.sol
│   │   └── suspicious_token_1.sol
│   └── deployers_example.json
│
├── src/
│   ├── token_auditor/
│   │   ├── features.py
│   │   ├── model.py
│   │   └── classify.py
│   │
│   ├── reputation/
│   │   ├── features.py
│   │   ├── model.py
│   │   └── classify.py
│   │
│   ├── etherscan_integration/
│   │   ├── fetcher.py
│   │   └── build_history.py
│   │
│   ├── ml/
│   │   ├── train_deployer_model.py
│   │   └── model_utils.py
│   │
│   ├── cli_token.py
│   ├── cli_deployer.py
│   └── cli_fetch_and_score.py
│
├── artifacts/
│   └── deployer_model.joblib   # created after ML training
│
├── requirements.txt
└── README.md
```
```bash
# 1. Clone repo
git clone https://github.com/AmirhosseinHonardoust/onchain-security-suite.git
cd onchain-security-suite

# 2. Create virtual environment (recommended)
python -m venv .venv
.\.venv\Scripts\activate    # on Windows
source .venv/bin/activate   # on macOS/Linux

# 3. Install dependencies
pip install -r requirements.txt
```

Dependencies include:

- `requests`, for the Etherscan V2 API
- `scikit-learn`, for the ML model
- `joblib`, for saving/loading models
Analyze a single token contract (ERC-20–style) and estimate:
- How dangerous its logic is
- Whether it contains classic rugpull mechanics
- A numeric risk score + qualitative label
The token auditor focuses on structural and semantic red flags, such as:
- Owner Minting

  ```solidity
  function mint(uint256 amount) public onlyOwner { ... }
  ```

  Red flag: the owner can unilaterally increase supply → dumping risk.

- General Mint Functions

  ```solidity
  function mint(address to, uint256 amount) external { ... }
  ```

  Without clear access control, this is dangerous.

- Fee Manipulation

  ```solidity
  function setFee(uint256 _newFee) external onlyOwner { ... }
  ```

  Allows future tax changes (e.g. from 5% to 90% after listing).

- Blacklisting / Whitelisting

  ```solidity
  mapping(address => bool) public isBlacklisted;
  ```

  Can be used to trap specific users.

- Trading Locks

  ```solidity
  bool public tradingOpen;
  ```

  If the owner controls this flag, they can freeze trading.

- Max Transaction Limits (maxTx)

  ```solidity
  uint256 public maxTxAmount;
  ```

  Can be used to prevent selling or force tiny sells only.
These patterns are implemented as regex rules in `token_auditor/features.py`.
Example (simplified):
```python
features = {
    "n_lines": 143.0,
    "n_public": 6.0,
    "n_external": 1.0,
    "has_mint": 1.0,
    "has_owner_mint": 1.0,
    "has_set_fee": 1.0,
    "has_blacklist": 0.0,
    "has_trading_lock": 1.0,
    "has_max_tx": 0.0,
}
```

We capture:
- Structural features:
  - `n_lines`: lines of code
  - `n_public`: number of occurrences of `public`
  - `n_external`: number of occurrences of `external`
- Pattern features:
  - `has_owner_mint`
  - `has_set_fee`
  - `has_blacklist`
  - `has_trading_lock`
  - `has_max_tx`
The core idea is a weighted feature sum:

    risk_score = Σᵢ wᵢ · fᵢ

where the `fᵢ` are the extracted feature values and the `wᵢ` their weights.

Example weighting:

- `has_owner_mint` → +40
- `has_mint` (non-owner) → +20
- `has_set_fee` → +25
- `has_blacklist` → +20
- `has_trading_lock` → +25
- `has_max_tx` → +15
- `n_lines > 800` → +15
- `n_lines > 300` → +8
The final score is clamped to [0, 100].
- 0–20 → Low, label `safe`
- 21–60 → Medium, label `suspicious`
- 61–100 → High, label `rugpull_candidate`
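Under the example weights above, the scoring rule can be sketched as follows (a hypothetical helper, not necessarily the project's exact implementation):

```python
# Example weights from the ruleset above; mint-without-owner and
# line-count bumps are handled separately below.
WEIGHTS = {
    "has_owner_mint": 40,
    "has_set_fee": 25,
    "has_blacklist": 20,
    "has_trading_lock": 25,
    "has_max_tx": 15,
}

def risk_score(features: dict) -> int:
    """Weighted feature sum, clamped to [0, 100]."""
    score = sum(w for name, w in WEIGHTS.items() if features.get(name))
    # A mint function without owner restriction is weighted lower (+20)
    if features.get("has_mint") and not features.get("has_owner_mint"):
        score += 20
    # Size-based bumps
    n_lines = features.get("n_lines", 0)
    if n_lines > 800:
        score += 15
    elif n_lines > 300:
        score += 8
    return max(0, min(100, score))

def risk_label(score: int) -> tuple:
    """Map a 0–100 score onto (risk_level, label)."""
    if score <= 20:
        return ("Low", "safe")
    if score <= 60:
        return ("Medium", "suspicious")
    return ("High", "rugpull_candidate")
```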
So the auditor output looks like:
```json
{
  "file": "rugpull_token_1.sol",
  "features": { ... },
  "risk_score": 100,
  "risk_level": "High",
  "label": "rugpull_candidate"
}
```

Run it with:

```bash
python -m src.cli_token --file data/tokens/rugpull_token_1.sol
```

A single token is not the whole story. The deployer might have:
- a history of safe tokens
- a history of multiple rugpulls
- mixed behavior
We aggregate all tokens they deployed and compute higher-level features.
For each deployer address, we track:
- `n_contracts`: number of known deployed contracts
- `n_safe`: how many were labeled `safe`
- `n_suspicious`: how many were labeled `suspicious`
- `n_rugpull`: how many were labeled `rugpull_candidate`
- `frac_safe` = `n_safe / n_contracts`
- `frac_rugpull` = `n_rugpull / n_contracts`
This is produced by:
src/reputation/features.py
For training the initial model, we use a simple, interpretable rule:
- If `n_rugpull >= 2` → label `1` (bad deployer)
- Else → label `0` (good/neutral deployer)
This is not “ultimate truth” but an intuitive starting point.
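Taken together, the aggregation and the bootstrap labeling rule could be sketched like this (hypothetical helper names; the project's actual logic lives in `src/reputation/features.py`):

```python
def aggregate_deployer_features(contracts: list) -> dict:
    """Aggregate per-contract audit labels into deployer-level features."""
    n = len(contracts)
    counts = {"safe": 0, "suspicious": 0, "rugpull_candidate": 0}
    for c in contracts:
        if c["label"] in counts:
            counts[c["label"]] += 1
    return {
        "n_contracts": n,
        "n_safe": counts["safe"],
        "n_suspicious": counts["suspicious"],
        "n_rugpull": counts["rugpull_candidate"],
        "frac_safe": counts["safe"] / n if n else 0.0,
        "frac_rugpull": counts["rugpull_candidate"] / n if n else 0.0,
    }

def training_label(features: dict) -> int:
    """Rule-based bootstrap label: 1 = bad deployer, 0 = good/neutral."""
    return 1 if features["n_rugpull"] >= 2 else 0
```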
We use:
- `RandomForestClassifier`
- `n_estimators = 200`
- `class_weight = "balanced"`
Why RandomForest?
- Handles mixed numeric features well
- Gives feature importances
- Robust to outliers
- Easy to interpret and explain
We convert that to a risk_score and label:
```python
prob_bad = ml_score(features)
risk_score = int(round(prob_bad * 100))
```

Mapping:

- `risk_score <= 25` → Low, `trusted`
- 26–60 → Medium, `watchlist`
- 61–100 → High, `high_risk`
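This mapping can be sketched as a small helper (hypothetical function name; `prob_bad` stands in for the model's predicted probability of a bad deployer):

```python
def reputation(prob_bad: float) -> dict:
    """Convert P(bad deployer) into a score, risk class, and label."""
    risk_score = int(round(prob_bad * 100))
    if risk_score <= 25:
        risk_class, label = "Low", "trusted"
    elif risk_score <= 60:
        risk_class, label = "Medium", "watchlist"
    else:
        risk_class, label = "High", "high_risk"
    return {"score": risk_score, "risk_class": risk_class, "label": label}
```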
So the final result per deployer is:
```json
{
  "features": { ... },
  "score": 87,
  "risk_class": "High",
  "label": "high_risk"
}
```

The training script is:
src/ml/train_deployer_model.py
It does:
1. Load `data/deployers_example.json`
2. Aggregate features per deployer
3. Build `X` (features) and `y` (labels)
4. Train the RandomForest
5. Save the model to `artifacts/deployer_model.joblib`
6. Print feature importances
```bash
python -m src.ml.train_deployer_model
```

You must do this once before using the ML-based deployer reputation, so that `artifacts/deployer_model.joblib` exists.
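A minimal sketch of such a training step, assuming scikit-learn and the deployer features listed above (hypothetical code, not the project's actual `train_deployer_model.py`):

```python
import joblib
from sklearn.ensemble import RandomForestClassifier

# Feature order assumed for illustration, matching the aggregated features above
FEATURE_NAMES = ["n_contracts", "n_safe", "n_suspicious", "n_rugpull",
                 "frac_safe", "frac_rugpull"]

def train(deployer_features: list, labels: list, out_path: str) -> RandomForestClassifier:
    """Train the RandomForest described above and persist it with joblib."""
    X = [[f[name] for name in FEATURE_NAMES] for f in deployer_features]
    clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                 random_state=0)
    clf.fit(X, labels)
    joblib.dump(clf, out_path)
    # Print feature importances, as the real script does
    for name, imp in zip(FEATURE_NAMES, clf.feature_importances_):
        print(f"{name}: {imp:.3f}")
    return clf
```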
The Etherscan integration lives in:
- `src/etherscan_integration/fetcher.py`
- `src/etherscan_integration/build_history.py`
It uses the new Etherscan V2 endpoint:
https://api.etherscan.io/v2/api
with parameters:
- `chainid` (1 for Ethereum mainnet)
- `module`
- `action`
- `address`
- `apikey`
We use:
- `module = account`
- `action = txlist`
This returns all transactions related to an address.
We then filter for entries with a `contractAddress` field; these correspond to contracts that were created (i.e. deployed).
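A sketch of the `txlist` call with `requests`, using the endpoint and parameters documented above (`fetch_deployed_contracts` is a hypothetical helper name; error handling is simplified):

```python
import requests

ETHERSCAN_V2 = "https://api.etherscan.io/v2/api"

def fetch_deployed_contracts(deployer: str, api_key: str, chainid: int = 1) -> list:
    """Return addresses of contracts created by `deployer`, via the V2 txlist action."""
    params = {
        "chainid": chainid,
        "module": "account",
        "action": "txlist",
        "address": deployer,
        "apikey": api_key,
    }
    resp = requests.get(ETHERSCAN_V2, params=params, timeout=30)
    resp.raise_for_status()
    txs = resp.json().get("result", [])
    # Contract-creation txs carry a non-empty contractAddress field
    return [tx["contractAddress"] for tx in txs if tx.get("contractAddress")]
```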
For each contractAddress, we use:
- `module = contract`
- `action = getsourcecode`
If the contract is verified on Etherscan:
- We get a `SourceCode` field with the Solidity code
- We pass this into `token_auditor.audit_source()`
If not verified:
- We skip it (no source to analyze)
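A corresponding sketch of the `getsourcecode` lookup (hypothetical helper name; the project's `fetcher.py` may differ), returning `None` for unverified contracts so they can be skipped:

```python
from typing import Optional

import requests

ETHERSCAN_V2 = "https://api.etherscan.io/v2/api"

def fetch_source(contract: str, api_key: str, chainid: int = 1) -> Optional[str]:
    """Return verified Solidity source for `contract`, or None if unverified."""
    params = {
        "chainid": chainid,
        "module": "contract",
        "action": "getsourcecode",
        "address": contract,
        "apikey": api_key,
    }
    resp = requests.get(ETHERSCAN_V2, params=params, timeout=30)
    resp.raise_for_status()
    result = resp.json().get("result", [])
    source = result[0].get("SourceCode", "") if result else ""
    # Etherscan returns an empty SourceCode string for unverified contracts
    return source or None
```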
The function `build_history_for_deployer(deployer, api_key)` performs:

1. Fetch transactions by the deployer
2. Filter contract-creation transactions
3. For each contract:
   - fetch the source code
   - run the token auditor
   - store `address`, `label`, `risk_score`, `risk_level`
Output structure:
```json
{
  "0xDEPL...": {
    "contracts": [
      {
        "address": "0xCONTRACT1",
        "label": "rugpull_candidate",
        "risk_score": 90,
        "risk_level": "High"
      },
      ...
    ]
  }
}
```

`save_history()` writes this to a JSON file so it can be used by the ML reputation engine.
This contract stores audits per token, keyed by a `bytes32` `tokenId` (which could be a hash of the token address).
Fields stored:
- `score` (0–100)
- `level` (`Low`, `Medium`, `High`)
- `label` (e.g. `"rugpull_candidate"`)
- `detailsJson` (optional, full feature set)
- `auditor` (who submitted the result)
- `timestamp`
It exposes:
- `submitAudit(tokenId, score, level, label, detailsJson)`
- `getAudit(tokenId)`
This enables explorers or DApps to query the latest audit info for a token.
This contract stores deployer reputation:
- `score` (0–100)
- `riskClass` (`Low`, `Medium`, `High`)
- `label` (`trusted`, `watchlist`, etc.)
- `numContracts`
- `lastUpdated`
- `updater` (who wrote the entry)
It exposes:
- `updateReputation(deployer, score, riskClass, label, numContracts)`
- `getReputation(deployer)`
Your Python tooling could be extended to push ML-derived scores on-chain.
```bash
python -m src.cli_token --file data/tokens/rugpull_token_1.sol
```

Use this when:
- You have a local token contract file
- You want a quick static risk assessment
```bash
python -m src.cli_deployer --data data/deployers_example.json
```

Or, for a single deployer:
```bash
python -m src.cli_deployer --data data/deployers_example.json --deployer 0xDEADDEAD...
```

Use this when:

- You already have a JSON mapping `deployer → contracts + labels`
- You want to test the ML scoring independently of Etherscan
```bash
python -m src.cli_fetch_and_score --deployer <0xDEPL...> --api_key <YOUR_ETHERSCAN_API_KEY>
```

This does:

1. `[1/2]` Fetch contracts by the deployer from Etherscan V2
2. For each verified contract:
   - fetch the source
   - run the token auditor
   - store the result in `data/deployer_<address>.json`
3. `[2/2]` Run ML-based reputation scoring using that JSON
4. Print the final score + label for that deployer
Full pipeline:
1. Train the ML model (once):

   ```bash
   python -m src.ml.train_deployer_model
   ```

2. Audit a local token:

   ```bash
   python -m src.cli_token --file data/tokens/suspicious_token_1.sol
   ```

3. Score example deployers (offline data):

   ```bash
   python -m src.cli_deployer --data data/deployers_example.json
   ```

4. Fetch + analyze a real deployer from Etherscan:

   ```bash
   python -m src.cli_fetch_and_score --deployer 0xYOURDEPLOYER --api_key YOUR_API_KEY
   ```

5. (Optional) Upload scores to an on-chain registry via Remix / Hardhat scripts.
This suite is powerful, but not magic. Key limitations:
- Static analysis only
  - No runtime simulation, no mempool analysis
  - Cannot detect dynamic behavior such as liquidity removal, price manipulation, or MEV attacks
- Heuristic token labeling
  - The token risk model is rule-based
  - Some legitimate contracts may have “dangerous-looking” features (false positives)
- Synthetic training data
  - The initial ML model uses synthetic / example data
  - For production use, you should collect real deployer histories and use ground-truth scam labels
- Etherscan constraints
  - Only works for verified contracts
  - Subject to Etherscan rate limits and API key tier
These limitations are explicitly documented so the project is realistic and honest.
Some natural extensions you can build next:
- Use AST-based parsing instead of regex
- Attach SHAP to the RandomForest model for explainable reputation
- Add token similarity clustering (token “families”)
- Integrate CodeBERT/LLM-based code embeddings
- Support multiple chains (BSC, Polygon, Arbitrum, Base) with other explorers
- Build a small web dashboard using FastAPI + React
- Build an on-chain oracle that serves reputation scores to DApps
You can treat this suite as the core engine for a future Web3 security product.
- Multichain explorer integrations
- More complex ML labeling logic
- Deployable Docker image
- Web dashboard visualization
- Integration with wallets (warn user on high-risk tokens)
- Batch scanning of new token deployments
- Automatic push to DeployerReputationRegistry on each score update
This project is released under the MIT License. You are free to use it, modify it, and build on top of it.
Contributions are welcome. Ideas:
- New risk patterns for the token auditor
- Better ML models or ensembles
- Real-world datasets (scrubbed & anonymized)
- Documentation improvements
- Hardhat deployment scripts
- Visualization tools
Open an issue or PR to discuss changes.
The On-Chain Security Suite demonstrates how to combine:
- Smart-contract understanding
- Pattern-based security
- Machine learning
- Live blockchain data
- On-chain registries
into a cohesive, explainable, and practical Web3 security toolkit.