ML-Powered Token Launch Auditor is a security-focused toolkit that analyzes ERC-20 style token smart contracts and produces:
- A numeric risk score (0–100)
- A risk level: `Low`, `Medium`, or `High`
- A semantic label: `safe`, `suspicious`, or `rugpull_candidate`
- A feature breakdown explaining why the score was assigned
Under the hood, the project performs:
- Static feature extraction from Solidity source code
- A heuristic, ML-inspired scoring model over those features
- A clean JSON output suitable for:
- dashboards
- further ML training
- logs / SIEM
- CI pipelines
Optionally, it also includes a Solidity registry contract that can store audit results on-chain.
This project is designed to be:
- Educational: easy to read and extend
- ML-ready: feature-based, not just one-off rules
- Security-focused: centered on real token scam patterns
- Practical: CLI interface, sample tokens, ready to run
Token launches are one of the most common attack surfaces in Web3:
- Hidden owner mint functions → infinite supply → rugpulls
- Blacklists & trading locks → honeypot behavior (you can buy but not sell)
- Dynamic fee setters → stealth tax updates
- MaxTx / MaxWallet → anti-sell or anti-whale mechanics
Most retail users cannot read Solidity and are unable to evaluate:
“Can the owner mint extra tokens?”
“Can they silently turn on a 99% tax?”
“Can they freeze trading whenever they want?”
This project provides a first line of static defense:
- It reads the Solidity source
- Extracts security-relevant patterns
- Computes a risk score
- Explains the features used
It is not a formal auditor, but it captures many common scam patterns and provides a strong foundation to build more advanced ML-based security tools.
ml-token-launch-auditor/
│
├── contracts/
│ └── TokenAuditRegistry.sol # Optional: on-chain storage of audit results
│
├── src/
│ ├── analyzer/
│ │ ├── features.py # Extracts token features from Solidity source
│ │ ├── model.py # Heuristic scoring model over features
│ │ └── classify.py # High-level `audit_token()` function
│ └── cli.py # Command-line interface for auditing
│
├── data/
│ └── tokens/
│ ├── safe_token_1.sol # Example of a safe token
│ ├── rugpull_token_1.sol # Example of a rugpull-style token
│ └── suspicious_token_1.sol # Example of a suspicious but not obvious token
│
├── requirements.txt # Placeholder for Python dependencies
└── README.md # This file
```bash
cd /path/where/you/want
git clone https://github.com/AmirhosseinHonardoust/ml-token-launch-auditor.git
cd ml-token-launch-auditor
```

(or just copy the folder you already have into your repo)
Windows:

```bash
python -m venv .venv
.\.venv\Scripts\activate
```

Linux/macOS:

```bash
python3 -m venv .venv
source .venv/bin/activate
```

If successful, you should see `(.venv)` at the start of your terminal prompt.
For the current heuristic version, there are no heavy dependencies:
```bash
pip install -r requirements.txt
```

You can keep using only the Python standard library. If you later add ML models, you’ll add packages such as:

- `scikit-learn`
- `joblib`
- `pandas`
- `web3`
1. User provides a Solidity file, for example:

   `data/tokens/rugpull_token_1.sol`

2. The tool:
   - Reads the source code as text
   - Applies regex-based pattern detection
   - Extracts a fixed set of features

3. The scoring model combines the features into a numeric risk score, using weightings inspired by real-world scam mechanics.

4. A final label and risk level are derived from the score.

5. A JSON object is printed to stdout for easy consumption or logging.
All feature engineering is defined in src/analyzer/features.py.
These basic metrics describe the “shape” of the contract:
- `n_lines`: total number of lines
- `n_public`: count of `public` occurrences
- `n_external`: count of `external` occurrences
They are used as rough proxies for:
- Contract complexity
- Exposure surface (public functions)
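These counts can be computed with a few lines of standard-library Python. A minimal sketch; `structural_metrics` is a hypothetical helper for illustration, not the exact code in `features.py`:

```python
import re

def structural_metrics(source: str) -> dict:
    """Rough structural metrics over raw Solidity source (a sketch;
    the real features.py may count these slightly differently)."""
    return {
        "n_lines": float(len(source.splitlines())),
        "n_public": float(len(re.findall(r"\bpublic\b", source))),
        "n_external": float(len(re.findall(r"\bexternal\b", source))),
    }

sample = """pragma solidity ^0.8.0;
contract Demo {
    function a() public {}
    function b() external {}
}"""
print(structural_metrics(sample))
```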
These are binary features (0 or 1) derived from regex patterns:

- `has_mint`: `mint(...)` exists somewhere in the contract
- `has_owner_mint`: `onlyOwner` and `function mint` appear together, suggesting the owner can mint new tokens unilaterally
- `has_set_fee`: functions like `setFee`, `setTax`, `setBuyFee`, `setSellFee`; owner-controlled tax logic can turn a token into a honeypot overnight
- `has_blacklist`: usage of `blacklist` or `isBlacklisted`; the owner can selectively prevent addresses from interacting
- `has_trading_lock`: `tradingOpen`, `enableTrading`, `disableTrading`, `lockTrading`; the owner can control whether trading is open or closed
- `has_max_tx`: patterns like `maxTxAmount`, `maxTransactionAmount`, `maxTx`; used to restrict transaction sizes (sometimes anti-dump, sometimes honeypot)
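Detecting these features amounts to running each regex over the raw source and recording a 1.0 or 0.0. A minimal sketch using a subset of the patterns (the `binary_features` helper is illustrative, not the shipped code):

```python
import re

# Subset of the patterns described above; the full set lives in
# src/analyzer/features.py.
PATTERNS = {
    "has_mint": r"\bmint\s*\(",
    "has_blacklist": r"blacklist|isBlacklisted",
}

def binary_features(source: str) -> dict:
    """Return 1.0/0.0 per pattern, matching the feature shape above."""
    return {
        name: 1.0 if re.search(pattern, source) else 0.0
        for name, pattern in PATTERNS.items()
    }

src = "function mint(address to, uint256 amt) public onlyOwner {}"
print(binary_features(src))  # has_mint fires, has_blacklist does not
```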
These features are all defined in this dictionary:
```python
DANGEROUS_PATTERNS: Dict[str, str] = {
    "has_mint": r"\bmint\s*\(",
    "has_owner_mint": r"onlyOwner[\s\S]*function\s+mint",
    "has_set_fee": r"setFee|setTax|setBuyFee|setSellFee",
    "has_blacklist": r"blacklist|isBlacklisted",
    "has_trading_lock": r"tradingOpen|enableTrading|disableTrading|lockTrading",
    "has_max_tx": r"maxTxAmount|maxTransactionAmount|maxTx",
}
```

All scoring logic lives in `src/analyzer/model.py`.
The goal is to:
- Keep it interpretable
- Use weights that reflect real risk impact
- Make it easy to upgrade to ML later
```python
score = 0
```

High-impact features:
```python
if features.get("has_owner_mint", 0) >= 1:
    score += 40
elif features.get("has_mint", 0) >= 1:
    score += 20

if features.get("has_set_fee", 0) >= 1:
    score += 25

if features.get("has_blacklist", 0) >= 1:
    score += 20

if features.get("has_trading_lock", 0) >= 1:
    score += 25

if features.get("has_max_tx", 0) >= 1:
    score += 15
```

Structural complexity:
```python
n_lines = features.get("n_lines", 0)
if n_lines > 800:
    score += 15
elif n_lines > 300:
    score += 8
```

The score is then clamped and mapped to a level and label:

```python
score = max(0, min(100, score))

if score <= 20:
    level = "Low"
    label = "safe"
elif score <= 60:
    level = "Medium"
    label = "suspicious"
else:
    level = "High"
    label = "rugpull_candidate"
```

For example, the rugpull sample triggers `has_owner_mint` (+40), `has_set_fee` (+25), `has_blacklist` (+20), and `has_trading_lock` (+25), for a raw total of 110, clamped to 100: `High` / `rugpull_candidate`.

The entry point is `src/cli.py`.
```bash
python src/cli.py --file data/tokens/safe_token_1.sol
```

- `--file` points to a Solidity `.sol` file
- You can give it any path, relative or absolute
Example with your own token:
```bash
python src/cli.py --file C:\Users\Amir\Desktop\MyToken.sol
```

The CLI prints a JSON object like this:
```json
{
  "file": "data/tokens/rugpull_token_1.sol",
  "features": {
    "n_lines": 98.0,
    "n_public": 8.0,
    "n_external": 0.0,
    "has_mint": 1.0,
    "has_owner_mint": 1.0,
    "has_set_fee": 1.0,
    "has_blacklist": 1.0,
    "has_trading_lock": 1.0,
    "has_max_tx": 0.0
  },
  "risk_score": 100,
  "risk_level": "High",
  "label": "rugpull_candidate"
}
```
- `file`: path to the analyzed Solidity file
- `features`: the extracted feature set used to compute the score; you can log this for dataset creation, training ML models, etc.
- `risk_score`: integer in `[0, 100]` representing the risk severity
- `risk_level`: `"Low"`, `"Medium"`, or `"High"`
- `label`: simplified categorical label:
  - `"safe"`: no major red flags detected
  - `"suspicious"`: potentially risky mechanics (e.g. maxTx, trading locks)
  - `"rugpull_candidate"`: strong signals of owner power / abusive controls
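Because the output is plain JSON, downstream tools only need `json.loads` to consume it. A minimal consumer sketch; the threshold here is an assumption for illustration, not part of the tool:

```python
import json

# Hypothetical audit output, in the shape printed by src/cli.py.
raw = """{
  "file": "data/tokens/rugpull_token_1.sol",
  "risk_score": 100,
  "risk_level": "High",
  "label": "rugpull_candidate"
}"""

audit = json.loads(raw)

# A simple consumer: flag anything above a chosen threshold.
RISK_THRESHOLD = 60  # assumed policy; tune to your own risk appetite
flagged = audit["risk_score"] > RISK_THRESHOLD
print(f"{audit['file']}: {audit['label']} (flagged={flagged})")
```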
This contract:
- Has a fixed supply set in the constructor
- No mint function
- No blacklisting
- No trading lock mechanism
- No dynamic fee setters
Expected features (simplified):

```
{
  "n_lines": ~40–60,
  "has_mint": 0,
  "has_owner_mint": 0,
  "has_set_fee": 0,
  "has_blacklist": 0,
  "has_trading_lock": 0,
  "has_max_tx": 0
}
```

Expected result:

- `risk_score`: 0–10
- `risk_level`: `Low`
- `label`: `safe`
This contract simulates common scam patterns:

- Owner-controlled `mint()`
- Owner-controlled `setFee()` → dynamic tax control
- Blacklist mapping → can block selling
- Trading gate (`tradingOpen`) → token can be deployed with trading closed
- Supply fully owned by the deployer at start

Expected result:

- `has_owner_mint: 1`
- `has_set_fee: 1`
- `has_blacklist: 1`
- `has_trading_lock: 1`

Combined:

- Very high score (often 100)
- `risk_level`: `High`
- `label`: `rugpull_candidate`
This demonstrates how the feature set captures owner power concentration.
This contract:

- Has `maxTxAmount` → can restrict selling
- Has a `tradingOpen` flag
- Has no blacklist or mint, but can still be used in tricky ways

Expected result:

- `has_max_tx: 1`
- `has_trading_lock: 1`
- No `mint` or `owner_mint`

This lands in:

- `risk_score`: mid-range
- `risk_level`: `Medium`
- `label`: `suspicious`
This simulates tokens whose mechanics can be abused without showing outright, obvious rugpull patterns.
Main responsibilities:

- Read the Solidity file as text
- Count lines, `public`, and `external` occurrences
- Run regex patterns to detect risky constructs
- Return a `Dict[str, float]` of features

You can add new patterns by:

- Extending `DANGEROUS_PATTERNS`
- Adjusting `model.py` to give them a weight
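Extending the auditor therefore touches two places. A hedged sketch of what that might look like; the `has_pause` pattern, the `extra_risk` helper, and its weight are all hypothetical, chosen only to illustrate the two-step process:

```python
# Hypothetical extension: flag Pausable-style controls.
# 1) In features.py, add a regex to DANGEROUS_PATTERNS, e.g.:
#      "has_pause": r"\bpause\s*\(|whenNotPaused"
# 2) In model.py, give the new feature a weight:
def extra_risk(features: dict) -> int:
    """Risk contribution for the new pattern (assumed weight)."""
    score = 0
    if features.get("has_pause", 0) >= 1:
        score += 15  # assumed weight; tune against your samples
    return score

print(extra_risk({"has_pause": 1.0}))  # -> 15
```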
Implements the heuristic scoring model:

- Accepts the features dict
- Adds risk contributions based on features
- Clamps the score
- Maps the score to a level and label

To change behavior, you adjust:

- The weights assigned to each feature
- The thresholds for `Low` / `Medium` / `High`
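One convenient way to make the thresholds tunable is to pull them out into parameters. A sketch (`score_to_level` is a hypothetical refactor; its defaults mirror the cut-offs described above):

```python
def score_to_level(score: int, low_max: int = 20, med_max: int = 60) -> tuple:
    """Map a clamped score to (risk_level, label) with tunable cut-offs."""
    if score <= low_max:
        return "Low", "safe"
    if score <= med_max:
        return "Medium", "suspicious"
    return "High", "rugpull_candidate"

print(score_to_level(45))  # -> ("Medium", "suspicious")
```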
High-level API surface:

```python
def audit_token(path: str) -> Dict[str, Any]:
    ...
```

This is useful if you want to import the library in other Python code:

```python
from analyzer.classify import audit_token

result = audit_token("data/tokens/rugpull_token_1.sol")
print(result["risk_score"], result["risk_level"])
```

Console entry point for humans and scripts:
- Wraps `audit_token()`
- Parses the `--file` argument
- Prints JSON to stdout
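Since the CLI prints machine-readable JSON, a CI job can gate merges on the score. A sketch; the `gate` helper and its threshold are assumptions, not shipped code:

```python
import json
import sys

def gate(audit_json: str, max_score: int = 60) -> int:
    """Return a CI exit code: 0 if the audit passes, 1 if it fails."""
    audit = json.loads(audit_json)
    if audit["risk_score"] > max_score:
        print(f"FAIL: {audit['file']} scored {audit['risk_score']}",
              file=sys.stderr)
        return 1
    return 0

# In CI you would read the file produced by cli.py, e.g.:
#   sys.exit(gate(open("audit_result.json").read()))
print(gate('{"file": "x.sol", "risk_score": 10}'))  # passes -> 0
```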
You can integrate it in CI like:

```bash
python src/cli.py --file contracts/YourToken.sol > audit_result.json
```

The Solidity contract in `contracts/TokenAuditRegistry.sol` allows you to store audit results on-chain:
```solidity
function submitAudit(
    bytes32 tokenId,
    uint256 score,
    RiskLevel level,
    string calldata label,
    string calldata detailsJson
) external;
```

You could use:
- `tokenId = keccak256(abi.encodePacked(token_source_hash))`
- or `tokenId = keccak256(abi.encodePacked(token_address))`
This enables:
- On-chain, verifiable audit records
- DApps querying `getAudit(tokenId)`
- Indexers (e.g. The Graph) to build dashboards
This is currently optional and not wired into the Python CLI, but it defines a clean interface for future integration.
Right now, the model is heuristic but ML-inspired. To make it truly ML-powered:
1. Generate a dataset:
   - Collect many token contracts, plus labels such as `scam`, `legit`, etc.
   - Use `extract_token_features()` to create feature vectors
   - Store them in CSV / Parquet
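Step 1 can be sketched with the standard library alone. The rows below are made up for illustration; in practice you would call `extract_token_features()` on each contract and use its real label:

```python
import csv
import io

FEATURE_NAMES = ["n_lines", "has_mint", "has_owner_mint"]  # subset for brevity

# Hypothetical rows: feature dicts (as extract_token_features would
# return) paired with human-assigned labels.
rows = [
    ({"n_lines": 42.0, "has_mint": 0.0, "has_owner_mint": 0.0}, "legit"),
    ({"n_lines": 98.0, "has_mint": 1.0, "has_owner_mint": 1.0}, "scam"),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(FEATURE_NAMES + ["label"])
for features, label in rows:
    writer.writerow([features[name] for name in FEATURE_NAMES] + [label])

csv_text = buf.getvalue()
print(csv_text)
```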
2. Train a model (e.g. a RandomForest):

```python
from sklearn.ensemble import RandomForestClassifier

# X: feature matrix, y: labels
clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X, y)
```

3. Save the model:
```python
import joblib

joblib.dump(clf, "artifacts/token_risk_model.joblib")
```

4. Modify `model.py` to:
   - Load the trained model
   - Use `features` as input to `clf.predict()` / `clf.predict_proba()`
   - Derive `risk_score`, `risk_level`, and `label` from the probabilities
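The probability-to-output mapping in step 4 might look like the sketch below. It assumes a binary classifier whose second class means "risky", and it reuses the heuristic model's existing thresholds; the classifier call shown in the comment assumes the joblib artifact from step 3:

```python
def ml_risk_from_proba(p_risky: float) -> dict:
    """Map a classifier's 'risky' probability to the existing output shape
    (a sketch; thresholds mirror the heuristic model: <=20 Low, <=60 Medium)."""
    score = max(0, min(100, round(p_risky * 100)))
    if score <= 20:
        level, label = "Low", "safe"
    elif score <= 60:
        level, label = "Medium", "suspicious"
    else:
        level, label = "High", "rugpull_candidate"
    return {"risk_score": score, "risk_level": level, "label": label}

# In model.py you might feed this from a loaded classifier, e.g.:
#   clf = joblib.load("artifacts/token_risk_model.joblib")
#   p_risky = clf.predict_proba([feature_vector])[0][1]
print(ml_risk_from_proba(0.93))
```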
This transforms the current system into a true ML-based auditor.
- This is not a formal security audit
- It does not guarantee that a token is safe or unsafe
- It only flags patterns commonly seen in rugpulls and scam tokens
- It does not simulate blockchain state or transactions
- It does not parse ASTs or bytecode (current version = regex/lexical)
Use this as:
- A first-pass filter
- A research tool
- A component in a larger analysis pipeline
Not as a sole decision-maker for high-value financial actions.
- Add support for AST-based feature extraction
- Integrate real ML model (RandomForest / XGBoost)
- Build dataset loader for real-world token contracts
- Add explanations and SHAP-style feature importances
- Add web dashboard for visualizing audit results
- Add integration with `web3.py` to push results to `TokenAuditRegistry.sol`
- Add CI examples (GitHub Actions) to auto-audit tokens in PRs
Contributions are welcome:
- Add new features or risk patterns
- Improve the scoring weights
- Add real datasets for ML training
- Extend the Solidity registry
- Improve documentation
Feel free to open issues or pull requests if you experiment with new ideas.