A phishing URL detection application that uses machine learning, built with the Starlette framework.
- Features
- Prerequisites
- Installation
- Configuration
- Running the Application
- Project Structure
- API Endpoints
- Development
- License
- Support
- URL Analysis: Phishing detection for submitted URLs using a trained machine learning model
- Feature Extraction: Comprehensive URL feature analysis, including:
  - Address bar-based features
  - Domain-based features
  - Content-based features
- Modern API Framework: Built with Starlette for high performance and async support
- API Documentation: Automatic OpenAPI/Swagger documentation
- Internationalization: Multi-language support (English and Spanish)
- Web Interface: Clean and intuitive UI for URL analysis
- Real-time Analysis: Immediate feedback on URL legitimacy
- Detailed Reports: Comprehensive feature analysis for each URL check
- Python 3.10+
- pip (Python package manager)
- Clone the repository:
  ```bash
  git clone <repository-url>
  cd phishing-url-detector
  ```
- Create and activate a virtual environment:
  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Copy the environment template:
  ```bash
  cp .env.example .env
  ```
Configure your `.env` file with appropriate values:
```env
# Server
HTTP_SCHEMA=http
HOST=localhost
PROD=False
PORT=8000

# API Spec
OPENAPI_TITLE=Phishing URL Detection API
OPENAPI_DESCRIPTION=A phishing URL detection API using machine learning.
OPENAPI_VERSION=0.0.1
```
- Start the development server:
  ```bash
  python main.py
  ```
Assuming the default configuration, the application will be available at:
- Web Interface: http://localhost:8000
- API Documentation: http://localhost:8000/docs
Interactive web interface for URL analysis with real-time results.
Language selection.
API documentation with Swagger UI.
```text
├── core/          # Core functionality
├── data/          # Data files
├── dtos/          # Data Transfer Objects
├── extractors/    # URL feature extractors
├── lib/           # Libraries and utilities
├── locales/       # Translation files
├── middlewares/   # Middleware components
├── models/        # ML models and data structures
├── notebooks/     # Jupyter notebooks for ML training
├── routers/       # API routes
├── services/      # Business logic
├── static/        # Static files
├── templates/     # HTML templates
├── tests/         # Test suite
└── utils/         # Utility functions
```
`POST /predict`
- Analyzes a URL for phishing characteristics
- Request body:
  ```json
  {"url": "https://example.com"}
  ```
- Response: Prediction results with detailed feature analysis
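A minimal client for this endpoint needs only the standard library. In the sketch below, `build_predict_request` and `predict` are hypothetical helper names. The response schema is not documented here, so `predict` simply returns the decoded JSON without assuming any field names.

```python
import json
import urllib.request


def build_predict_request(base_url: str, url_to_check: str) -> urllib.request.Request:
    """Build (but do not send) a POST /predict request with a JSON body."""
    body = json.dumps({"url": url_to_check}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, url_to_check: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (the server must be running)."""
    request = build_predict_request(base_url, url_to_check)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the development server running on the default port, `predict("http://localhost:8000", "https://example.com")` would return whatever JSON the endpoint produces.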
The model is trained using various URL features, such as:
- URL length
- Domain characteristics
- Content analysis
Training notebooks are available in the `notebooks/` directory.
Features are extracted by the `URLFeaturesExtractor` class, which analyzes:
- Address bar features
- Domain-based features
- Content-based features
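Several address bar features can be computed from the URL string alone. The function below is a hypothetical stand-in, not the project's `URLFeaturesExtractor`, and the specific features it returns are assumptions. Domain-based and content-based features typically need network lookups (WHOIS, DNS, or fetching the page), so they are left out of this sketch.

```python
import ipaddress
from urllib.parse import urlparse


def extract_address_bar_features(url: str) -> dict:
    """Compute a few address-bar features for one URL (hypothetical helper)."""
    parsed = urlparse(url)
    hostname = parsed.hostname or ""  # urlparse lowercases the hostname

    # A raw IP address in place of a domain name is a common phishing signal.
    try:
        ipaddress.ip_address(hostname)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "host_is_ip": host_is_ip,
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "hyphen_in_domain": "-" in hostname,
        # Dots in the hostname, a rough proxy for subdomain depth.
        "dot_count": hostname.count("."),
    }
```

Returning a flat dictionary of numbers and booleans makes it straightforward to turn the result into a feature vector for a model.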
The application supports multiple languages through JSON locale files:
- English (`en.json`)
- Spanish (`es.json`)
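Locale files like these are usually loaded once and then queried by key, with a fallback language for missing entries. In this minimal sketch, `load_locales` and `translate` are hypothetical helpers. The nested, dot-separated key format and the English fallback are assumptions about how the JSON files are organized.

```python
import json
from pathlib import Path


def load_locales(directory) -> dict:
    """Map each language code (file stem, e.g. "en") to its parsed JSON dict."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.json"))
    }


def translate(locales: dict, lang: str, key: str, default_lang: str = "en") -> str:
    """Look up a dotted key in lang, then in default_lang, else return the key."""
    for code in (lang, default_lang):
        node = locales.get(code, {})
        # Walk nested objects, e.g. "form.submit" -> locales[code]["form"]["submit"].
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                node = None
                break
            node = node[part]
        if isinstance(node, str):
            return node
    return key
```

Returning the key itself when no translation exists keeps the UI usable and makes missing strings easy to spot.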
Run the test suite:
```bash
pytest
```
Coverage reports are automatically generated through GitHub Actions.
This project is licensed under the MIT License. See the LICENSE file for details.
If you find this project useful, give it a ⭐ on GitHub!