
PhishGuard AI

An explainable phishing detection engine that analyzes URLs and emails in real time using feature engineering and heuristic risk scoring. It works offline and requires no API key.

Created and maintained by Omobolaji Adeyan, a cybersecurity engineer focused on practical Python security tooling, threat detection, and security automation.

PhishGuard was built because many phishing detection tools are either black-box cloud services or depend on expensive ML training pipelines. It runs entirely offline and explains exactly why it flagged something.

How It Works

Rather than relying only on blocklists, PhishGuard extracts behavioral and structural features from URLs and email content, then applies an explainable, hand-tuned heuristic model. The current weights are informed by common phishing indicators and protected by regression tests; they have not yet been validated as a statistically trained model.
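
The idea of a hand-tuned, explainable heuristic model can be made concrete with a weighted feature sum passed through a sigmoid, which matches how the Architecture section describes model.py. The sketch below is illustrative only. The weights, the bias, and the `no_https` feature name are hypothetical placeholders, not PhishGuard's tuned values.

```python
import math

# Hypothetical weights for illustration only; these are NOT PhishGuard's tuned values.
EXAMPLE_WEIGHTS = {
    "has_ip_address": 2.5,
    "suspicious_tld": 1.5,
    "phishing_keywords": 0.8,  # applied once per keyword found
    "no_https": 0.7,
}
# Hypothetical offset so that an input with no indicators scores well below 0.5.
EXAMPLE_BIAS = -2.0


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping any real z into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def risk_score(features: dict[str, float],
               weights: dict[str, float] = EXAMPLE_WEIGHTS,
               bias: float = EXAMPLE_BIAS) -> float:
    """Squash the weighted feature sum into a 0..1 risk score."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return _sigmoid(z)


def contributions(features: dict[str, float],
                  weights: dict[str, float] = EXAMPLE_WEIGHTS) -> list[tuple[str, float]]:
    """Each feature's share of the weighted sum, largest first: the explanation."""
    parts = {name: weights.get(name, 0.0) * value for name, value in features.items()}
    return sorted(parts.items(), key=lambda item: item[1], reverse=True)


example = {"has_ip_address": 0, "suspicious_tld": 1, "phishing_keywords": 2, "no_https": 1}
print(f"{risk_score(example):.3f}")  # 0.858
print(contributions(example)[0])
```

Because the score is a monotone transform of a linear sum, weight times value is a direct measure of how much each feature pushed the result, which is what makes a per-feature breakdown possible.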

URL features analyzed:

  • Domain entropy (randomly generated domains score high)
  • Raw IP address in place of a hostname (a strong phishing indicator)
  • Suspicious TLDs (.xyz, .tk, .ml, .ga, .click)
  • Phishing keyword density (verify, suspended, account, secure, etc.)
  • Subdomain depth, path depth, digit ratio, special character density
  • Punycode and Unicode hostname indicators, weighted conservatively as context
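
The URL features listed above can be approximated with the Python standard library alone. This is a sketch under stated assumptions: features are computed from the raw URL string and its parsed hostname, `extract_url_features` and its key names are hypothetical, the keyword tuple is an abridged stand-in for the real list, and domain entropy is taken to mean Shannon entropy in bits per character.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit

SUSPICIOUS_TLDS = {"xyz", "tk", "ml", "ga", "click"}  # TLDs named in the list above
# Abridged, hypothetical keyword subset; the real list is longer.
PHISHING_KEYWORDS = ("verify", "suspended", "account", "secure")


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character; random-looking strings score higher."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def extract_url_features(url: str) -> dict[str, float]:
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    try:
        ipaddress.ip_address(host)
        has_ip = 1
    except ValueError:
        has_ip = 0
    # Domain-label features are meaningless for a raw IP, so skip them there.
    labels = [] if has_ip else [label for label in host.split(".") if label]
    lowered = url.lower()
    length = max(len(url), 1)
    return {
        "url_length": len(url),
        "has_ip_address": has_ip,
        "suspicious_tld": int(bool(labels) and labels[-1] in SUSPICIOUS_TLDS),
        "phishing_keywords": sum(lowered.count(k) for k in PHISHING_KEYWORDS),
        "has_https": int(parts.scheme == "https"),
        "subdomain_depth": max(len(labels) - 2, 0),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "digit_ratio": sum(ch.isdigit() for ch in url) / length,
        "special_char_ratio": sum(not ch.isalnum() for ch in url) / length,
        "punycode_label": int(any(label.startswith("xn--") for label in labels)),
        "non_ascii_host": int(any(ord(ch) > 127 for ch in host)),
        "domain_entropy": shannon_entropy(host),
    }


print(extract_url_features("http://paypa1-secure-login.xyz/verify"))
```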

Email features analyzed:

  • Urgency language (action required, account suspended, verify now)
  • Link and URL density
  • ALL CAPS word usage
  • Attachment mentions
  • Exclamation mark frequency
  • Optional SPF, DKIM, and DMARC results from a trusted receiver
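
A comparable sketch for the content-based email indicators, assuming simple substring and regex counts over the subject and body. The phrase list, the regexes, the three-letter minimum for ALL CAPS words, and the `extract_email_features` name are illustrative choices, not PhishGuard's actual rules. Authentication results are left out here because they come from the receiving server, not from the message text.

```python
import re

# Phrases taken from the urgency examples above; a real list would be longer.
URGENCY_PHRASES = ("action required", "account suspended", "verify now")
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)
ATTACHMENT_PATTERN = re.compile(r"\battach(?:ed|ment|ments)\b", re.IGNORECASE)
WORD_PATTERN = re.compile(r"[A-Za-z]+")


def extract_email_features(subject: str, body: str) -> dict[str, float]:
    text = f"{subject}\n{body}"
    lowered = text.lower()
    words = WORD_PATTERN.findall(text)
    # Count words of 3+ letters written entirely in capitals (skips "I", "OK").
    caps_words = [w for w in words if len(w) >= 3 and w.isupper()]
    return {
        "urgency_phrases": sum(lowered.count(p) for p in URGENCY_PHRASES),
        "url_count": len(URL_PATTERN.findall(text)),
        "caps_ratio": len(caps_words) / max(len(words), 1),
        "attachment_mentions": len(ATTACHMENT_PATTERN.findall(text)),
        "exclamation_count": text.count("!"),
    }


sample = extract_email_features(
    "ACTION REQUIRED: verify now!!",
    "Your account suspended. Open the attached file at http://example.test/login",
)
print(sample)
```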

Features

  • Real-time URL and email scoring with a 0 to 100% risk score (sigmoid-normalized, not a calibrated probability)
  • Batch scan a list of URLs from a file
  • Explainable results — see which features triggered the alert
  • Three verdict levels: SAFE, SUSPICIOUS, PHISHING
  • JSON export for integration into SOC workflows
  • SARIF 2.1.0 export for GitHub Code Scanning and CI security pipelines
  • Zero dependencies — pure Python standard library
  • Offline — no data sent anywhere
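
The three verdict levels and the JSON export imply two steps: mapping a score to a label, then serializing a record that carries the explanation. In the sketch below, the cut-offs (0.5 and 0.8), the `verdict` and `to_json_record` names, and the record's field names are all hypothetical. Check PhishGuard's own JSON output for its real thresholds and schema.

```python
import json

# Hypothetical cut-offs; PhishGuard's real thresholds are not given here.
SUSPICIOUS_AT = 0.5
PHISHING_AT = 0.8


def verdict(score: float) -> str:
    """Map a 0..1 risk score onto the three verdict levels."""
    if score >= PHISHING_AT:
        return "PHISHING"
    if score >= SUSPICIOUS_AT:
        return "SUSPICIOUS"
    return "SAFE"


def to_json_record(target: str, score: float, triggered: list[str]) -> str:
    """Serialize one result; field names are illustrative, not PhishGuard's schema."""
    record = {
        "target": target,
        "verdict": verdict(score),
        "risk_score": round(score, 3),
        "triggered_features": triggered,  # the features that explain the verdict
    }
    return json.dumps(record, sort_keys=True)


print(to_json_record("http://paypa1-secure-login.xyz/verify", 0.942,
                     ["suspicious_tld", "phishing_keywords", "has_https"]))
```

Keeping the triggered features inside the record lets a SOC tool show the reason for a verdict without re-running the scan.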

Try It in One Minute

The one-minute demo compares legitimate and suspicious inputs, displays the explainable feature breakdown, and exports a finding without using live phishing infrastructure.

PhishGuard safe-input and phishing-input terminal comparison

See Project Evidence for dated benchmark results, release and contribution evidence, a reproducible demonstration, and explicit limits on what the current metrics establish. Watch the 18-second safe demo video.

Installation

Install the verified v0.5.1 wheel directly from GitHub Releases:

python -m pip install \
  https://github.com/omobolajiadeyan/phishguard-ai/releases/download/v0.5.1/phishguard_ai-0.5.1-py3-none-any.whl
phishguard --help

The release also includes a source archive, SHA256SUMS, and signed build provenance. See the v0.5.1 release for downloads and verification details.
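
The SHA256SUMS file lets you check the downloaded wheel's integrity before installing it. This sketch assumes the file uses the common `sha256sum` layout (hex digest, whitespace, filename on each line). The helper names `sha256_of` and `verify_against_sums` are hypothetical. It checks integrity only. The signed build provenance has to be verified separately, as described on the release page.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_sums(artifact: Path, sums_file: Path) -> bool:
    """Compare an artifact's SHA-256 with its entry in a sha256sum-style manifest."""
    expected = None
    for line in sums_file.read_text().splitlines():
        parts = line.split()
        # A leading "*" marks binary mode in sha256sum output; strip it before comparing.
        if len(parts) == 2 and parts[1].lstrip("*") == artifact.name:
            expected = parts[0].lower()
            break
    if expected is None:
        raise ValueError(f"{artifact.name} is not listed in {sums_file}")
    return sha256_of(artifact) == expected
```

For example, after downloading both files into one directory: `verify_against_sums(Path("phishguard_ai-0.5.1-py3-none-any.whl"), Path("SHA256SUMS"))`.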

GitHub Action

Use the stable Marketplace release to scan a URL in CI:

- name: Scan URL with PhishGuard AI
  uses: omobolajiadeyan/phishguard-ai@v0.5.1
  with:
    url: https://example.com
    sarif-output: phishguard-results.sarif

See the GitHub Marketplace listing for available inputs and version selection.
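
If a workflow should fail when the scan produces findings, one option is a small gate script that reads the file written through `sarif-output`. This assumes PhishGuard marks blocking findings with the SARIF level `"error"`. The level it actually assigns to each verdict is not stated here, so adjust `levels` to match the real output. The `count_results` name is hypothetical.

```python
import json
from pathlib import Path


def count_results(sarif_path: Path, levels: tuple[str, ...] = ("error",)) -> int:
    """Count SARIF 2.1.0 results whose level is one of `levels`."""
    log = json.loads(sarif_path.read_text(encoding="utf-8"))
    total = 0
    for run in log.get("runs", []):
        # "results" may be absent or null when a run produced nothing.
        for result in run.get("results") or []:
            # In the simple case, SARIF treats a result with no "level" as "warning".
            if result.get("level", "warning") in levels:
                total += 1
    return total
```

In a CI step this could be used as `sys.exit(1 if count_results(Path("phishguard-results.sarif")) else 0)`.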

Development Installation

For development, install from a clone:

git clone https://github.com/omobolajiadeyan/phishguard-ai.git
cd phishguard-ai
python --version  # Python 3.10+ required
python -m pip install .
python -m unittest discover -s tests -v

Installation provides a phishguard command. Running the source file directly (python phishguard.py) remains supported for development.

Usage

# Analyze a single URL
phishguard url "http://paypa1-secure-login.xyz/verify"

# Analyze with feature breakdown
phishguard url "https://google.com" --verbose

# Analyze an email
phishguard email \
  --subject "URGENT: Your account has been suspended" \
  --body "Click here immediately to verify your account or it will be deleted." \
  --authentication-results "mx.example; spf=fail; dkim=fail; dmarc=fail"

# Batch scan a list of URLs
phishguard batch data/urls.txt

# Use ASCII-only output in legacy terminals or CI logs
python phishguard.py url "https://google.com" --plain
python phishguard.py batch data/urls.txt --no-unicode

# Export results to JSON
phishguard batch data/urls.txt --output results.json

# Export actionable findings to SARIF 2.1.0
phishguard batch data/urls.txt \
  --format sarif \
  --output phishguard.sarif

See the GitHub Code Scanning guide for a copy-ready workflow using GitHub's official SARIF upload action. See the email JSON and SARIF examples for generated SPF, DKIM, and DMARC output and its authentication trust boundary. See the detection model documentation for feature semantics, limitations, and the evidence required for scoring changes.

Reproducible Benchmark

Run the public-safe URL regression fixture with:

python tools/evaluate_url_benchmark.py
python tools/evaluate_url_benchmark.py data/public_benchmark_urls.jsonl

The command reports ordered predictions, confusion-matrix counts, precision, recall, and false-positive rate. These are fixture metrics for detecting regressions, not population-level accuracy or calibration estimates. See the benchmark documentation for the synthetic fixture, the licensed URL-Phish-derived slice, sanitization, and reporting rules.
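
The reported metrics follow the standard confusion-matrix definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and false-positive rate = FP / (FP + TN). The sketch below assumes that a phishing prediction counts as positive. Whether the benchmark tool treats SUSPICIOUS as positive is not stated here. The function names are hypothetical, and returning 0.0 for an empty denominator is a choice made for this sketch.

```python
def confusion_counts(labels: list[bool], predictions: list[bool]) -> dict[str, int]:
    """Tally TP/FP/TN/FN from ordered ground truth and predictions (True = phishing)."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must be the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(labels, predictions):
        if predicted and actual:
            counts["tp"] += 1
        elif predicted:
            counts["fp"] += 1
        elif actual:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def fixture_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Precision, recall, and false-positive rate from confusion-matrix counts."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # sketch choice: 0.0 when undefined

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
    }


labels = [True, True, False, False, True]
predictions = [True, False, False, True, True]
print(fixture_metrics(**confusion_counts(labels, predictions)))
```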

Example Output

  PHISHGUARD AI
  AI-powered phishing detection

────────────────────────────────────────────────────────────
  URL     : http://paypa1-secure-login.xyz/verify
  Verdict : PHISHING
  Risk    : ████████████████████  94.2%

  Feature breakdown:
    url_length           : 37
    has_ip_address       : 0
    suspicious_tld       : 1   *
    phishing_keywords    : 2   *
    has_https            : 0   *
    url_entropy          : 3.84 *

Architecture

phishguard-ai/
├── phishguard.py    # CLI entrypoint: commands url, email, batch
├── email_auth.py    # SPF, DKIM, and DMARC result parsing
├── features.py      # Feature extraction (URL + email)
├── model.py         # Weighted scoring model + sigmoid normalisation
├── reporting.py     # Native JSON and SARIF 2.1.0 serialization
├── data/
│   ├── urls.txt                     # Sample URLs for batch testing
│   └── public_benchmark_urls.jsonl  # URL benchmark fixture (JSONL)
├── tools/
│   └── evaluate_url_benchmark.py    # Reproducible benchmark runner
├── tests/           # unittest suite
└── README.md
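
reporting.py is described as emitting native SARIF 2.1.0. The sketch below shows what a minimal log for batch findings can look like. It assumes each finding carries the URL, its line in the input file, the verdict, and the score. The rule IDs, the PHISHING to `error` and SUSPICIOUS to `warning` mapping, the finding dict shape, and the `build_sarif` name are all hypothetical. Anchoring each result to a line in the scanned file gives Code Scanning a file and line to point at.

```python
import json


def build_sarif(findings: list[dict], source_file: str) -> dict:
    """Build a minimal SARIF 2.1.0 log; SAFE verdicts are skipped as non-actionable."""
    level_for = {"PHISHING": "error", "SUSPICIOUS": "warning"}  # hypothetical mapping
    results = []
    for finding in findings:
        level = level_for.get(finding["verdict"])
        if level is None:
            continue
        results.append({
            "ruleId": f"phishguard/{finding['verdict'].lower()}",
            "level": level,
            "message": {
                "text": f"{finding['url']} scored {finding['score']:.1%} ({finding['verdict']})"
            },
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": source_file},
                    "region": {"startLine": finding["line"]},
                },
            }],
        })
    rules = [{"id": f"phishguard/{name.lower()}"} for name in level_for]
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "PhishGuard AI", "rules": rules}},
            "results": results,
        }],
    }


log = build_sarif(
    [{"url": "http://paypa1-secure-login.xyz/verify", "line": 3,
      "verdict": "PHISHING", "score": 0.942}],
    "data/urls.txt",
)
print(json.dumps(log, indent=2))
```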

Contributing

Contributions are welcome from security analysts, Python developers, students, researchers, and first-time open-source contributors.

Project Leadership

Author

Omobolaji Adeyan - Cybersecurity Engineer (GitHub)

License and Citation

PhishGuard AI is available under the MIT License. The project may be cited using the metadata in CITATION.cff.
