An analytical project utilizing unsupervised machine learning to cluster stocks based on their volatility and returns, identifying latent market patterns and optimizing diversified trading strategies.
Source Code · Technical Specification · Video Demo · Live Demo
Authors · Overview · Features · Structure · Quick Start · Usage Guidelines · License · About · Acknowledgments
Terna Engineering College | Computer Engineering | Batch of 2022
| Amey Thakur | Hasan Rizvi | Mega Satish |
|---|---|---|
Important
Special thanks to Hasan Rizvi and Mega Satish for their meaningful contributions, guidance, and support that helped shape this work.
This project investigates the application of K-Means Clustering on financial market data. By categorizing stocks into distinct clusters based on their historical price movements, the system provides a data-driven approach to understanding market dynamics and constructing balanced investment portfolios.
Developed as a mini-project for the Big Data Analytics & Computational Lab - I curriculum, this implementation showcases the full data science pipeline: from data acquisition via Yahoo Finance to feature engineering (volatility/returns) and unsupervised model validation.
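The feature-engineering step of this pipeline can be illustrated with a minimal, dependency-free sketch. It assumes simple daily returns and a 252-trading-day year, and the tickers and prices are hypothetical placeholders rather than real market data. The project's own notebook may compute these features differently, for example with Pandas.

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: trading days per year used for annualization


def daily_returns(prices):
    """Simple day-over-day returns from a list of closing prices."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def annualized_features(prices, periods=TRADING_DAYS):
    """Return (annualized mean return, annualized volatility) for one series."""
    rets = daily_returns(prices)
    if len(rets) < 2:
        raise ValueError("need at least three prices to estimate volatility")
    ann_return = statistics.mean(rets) * periods
    ann_volatility = statistics.stdev(rets) * math.sqrt(periods)
    return ann_return, ann_volatility


# Hypothetical closing prices, for illustration only.
prices = {
    "AAA": [100.0, 101.0, 100.5, 102.0, 103.0],
    "BBB": [50.0, 52.0, 49.0, 53.0, 51.0],
}
features = {ticker: annualized_features(series) for ticker, series in prices.items()}
```

Each ticker ends up as a single (return, volatility) point, which is the two-dimensional input that a clustering step would consume.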
| # | Resource | Description |
|---|---|---|
| 1 | Project Model | Complete Jupyter Notebook implementation |
| 2 | Technical Specification | Technical Architecture & Specification |
| 3 | Technical Report | Comprehensive project documentation |
| 4 | Technical Presentation | Visual overview of methodology and results |
| 5 | Project Demo | Real-time demonstration of the analysis |
Tip
Cluster Validation Best Practices
Use the Elbow Method to identify the optimal number of clusters by plotting Within-Cluster Sum of Squares (WCSS). Complement this with the Silhouette Score to validate cluster cohesion and separation for robust market segmentation.
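As a rough sketch of this validation step, the snippet below fits K-Means for several values of k with Scikit-Learn and records both WCSS (exposed as `inertia_`) and the Silhouette Score. The data is synthetic, with three planted groups of (return, volatility) points, and the helper name `evaluate_k`, the seed, and the range of k are illustrative assumptions rather than the project's actual settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic (annual return, annual volatility) points standing in for real
# stock features: three planted groups with small Gaussian noise.
rng = np.random.default_rng(0)
centers = np.array([[0.05, 0.15], [0.30, 0.20], [0.15, 0.50]])
X = np.vstack([c + rng.normal(scale=0.02, size=(30, 2)) for c in centers])

# Put both features on a comparable scale before computing distances.
X_scaled = StandardScaler().fit_transform(X)


def evaluate_k(X, k_values, seed=0):
    """Map each k to (WCSS, silhouette score) for a fitted K-Means model."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    return results


scores = evaluate_k(X_scaled, range(2, 7))
for k, (wcss, sil) in scores.items():
    print(f"k={k}  WCSS={wcss:.2f}  silhouette={sil:.3f}")

# One simple rule: pick the k with the highest silhouette score.
best_k = max(scores, key=lambda k: scores[k][1])
```

In practice the WCSS values would be plotted against k to look for the elbow, with the silhouette score serving as a cross-check. On real stock data the two criteria may disagree, so the final choice of k remains a judgment call.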
| Feature | Description |
|---|---|
| K-Means Clustering | Unsupervised segmentation of stocks based on volatility and returns metrics. |
| Data Acquisition | Automated historical data retrieval via Yahoo Finance API (yfinance). |
| Feature Engineering | Calculation of annualized volatility and returns for each stock. |
| Cluster Validation | Elbow Method and Silhouette Score for optimal cluster determination. |
| Visualization | Interactive scatter plots and cluster centroid analysis. |
| Portfolio Optimization | Data-driven insights for diversified investment strategies. |
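One way cluster assignments could feed a diversified selection is sketched below. It assumes cluster labels are already available (for example from K-Means) and picks one stock per cluster by its return-to-volatility ratio. That selection rule, the function name `pick_one_per_cluster`, and all tickers and numbers are hypothetical illustrations, not the project's documented method.

```python
from collections import defaultdict


def pick_one_per_cluster(features, labels):
    """Pick the ticker with the best return/volatility ratio in each cluster.

    features: {ticker: (annual_return, annual_volatility)}
    labels:   {ticker: cluster_id}
    """
    by_cluster = defaultdict(list)
    for ticker, cluster_id in labels.items():
        by_cluster[cluster_id].append(ticker)

    def ratio(ticker):
        ret, vol = features[ticker]
        # Guard against zero volatility so the ratio is always defined.
        return ret / vol if vol > 0 else float("-inf")

    return {cid: max(tickers, key=ratio) for cid, tickers in sorted(by_cluster.items())}


# Hypothetical features and cluster labels, for illustration only.
features = {
    "AAA": (0.08, 0.12),
    "BBB": (0.10, 0.20),
    "CCC": (0.30, 0.45),
    "DDD": (0.25, 0.50),
}
labels = {"AAA": 0, "BBB": 0, "CCC": 1, "DDD": 1}
picks = pick_one_per_cluster(features, labels)
```

Drawing one name from each cluster spreads the selection across different risk/return profiles, which is the intuition behind using clusters for diversification. It is not investment advice and ignores correlations, costs, and position sizing.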
- Language: Python 3.8+
- ML Framework: Scikit-Learn (K-Means, Silhouette Analysis)
- Data Processing: Pandas, NumPy
- Visualization: Matplotlib, Seaborn
- Data Source: yfinance (Yahoo Finance API)
OPTIMIZING-STOCK-TRADING-STRATEGY-WITH-K-MEANS-CLUSTERING/
│
├── docs/ # Formal Documentation
│ └── SPECIFICATION.md # Technical Architecture & Specification
│
├── Mega/ # Archival Attribution Assets
│ ├── Filly.jpg # Companion (Filly)
│ └── Mega.png # Author Profile Image (Mega Satish)
│
├── Mini-Project/ # Research & Academic Assets
│ ├── BDA_MINI-PROJECT_PPT...pdf # Project Presentation (PDF)
│ ├── BDA_MINI-PROJECT_PPT...pptx # Project Presentation (PPTX)
│ ├── BDA_MINI-PROJECT_REPORT...docx # Technical Project Report (DOCX)
│ └── BDA_MINI-PROJECT_REPORT...pdf # Technical Project Report (PDF)
│
├── Source Code/ # Model Implementation
│ ├── OPTIMIZING STOCK TRADING STRATEGY...ipynb # Core K-Means Analysis Notebook
│ └── Stock_Market_Clustering.py # Production-ready Python Script
│
├── .gitattributes # Global Git LFS & Config
├── .gitignore # Asset Exclusion Manifest
├── requirements.txt # Dependency Manifest
├── CITATION.cff # Scholarly Citation Metadata
├── codemeta.json # Software Metadata Manifest
├── LICENSE # MIT License Terms
├── README.md # Comprehensive Archival Entrance
└── SECURITY.md # Vulnerability Exposure Policy

Ensure your environment meets the minimum specifications:
- Python: Version 3.8 or higher.
- Hardware: Minimum 4 GB RAM (8 GB recommended for large datasets).
- Environment: Virtual environment (venv) is highly recommended.
Warning
Technical Dependencies & Data Variability
This system is built using Python 3.8+ and Scikit-Learn. Stock market data is inherently volatile; results may vary based on the date range and ticker symbols selected. For stable execution and reproducible analysis, it is recommended to run this in an isolated virtual environment.
- Clone the Repository:
  git clone https://github.com/Amey-Thakur/OPTIMIZING-STOCK-TRADING-STRATEGY-WITH-K-MEANS-CLUSTERING.git
  cd OPTIMIZING-STOCK-TRADING-STRATEGY-WITH-K-MEANS-CLUSTERING
- Install Dependencies:
  pip install -r requirements.txt
- Run the Python Script:
  cd "Source Code"
  python Stock_Market_Clustering.py
- Explore the Notebook:
  - Open OPTIMIZING STOCK TRADING STRATEGY WITH K-MEANS CLUSTERING.ipynb in Jupyter Notebook for interactive analysis.
Tip
Optimizing Stock Trading Strategy with K-Means Clustering
An interactive simulation groups major S&P 500 companies by their volatility and return patterns, using unsupervised machine learning to segment the market and surface candidate trading opportunities.
Recent enhancements also include a Reinforcement Learning (RL) gateway for advanced strategy optimization.
This repository is openly shared to support learning and knowledge exchange across the academic community.
For Students
Use this project as a reference for understanding clustering algorithms, financial data preprocessing, and the application of Big Data Analytics in stock market optimization.
For Educators
This project may serve as a practical example or supplementary teaching resource for Big Data Analytics (CSDLO7032) and Computational Laboratory–I (CSL704) modules. Attribution is appreciated when utilizing content.
For Researchers
The implementation provides a foundation for exploring more advanced clustering techniques (e.g., DBSCAN, Hierarchical Clustering) and sentiment-integrated market analysis.
This repository and all linked academic content are made available under the MIT License. See the LICENSE file for complete terms.
Note
Summary: You are free to use, modify, and distribute this content for any purpose, including commercially, provided the copyright notice and license text are retained in copies, which credits the original authors.
Copyright © 2022 Amey Thakur, Hasan Rizvi, Mega Satish
Created & Maintained by: Amey Thakur, Hasan Rizvi & Mega Satish
Academic Journey: Bachelor of Engineering in Computer Engineering (2018-2022)
Institution: Terna Engineering College, Navi Mumbai
University: University of Mumbai
This project, Optimizing Stock Trading Strategy with K-Means Clustering, is an analytical utility developed as a 7th Semester Mini-Project. It explores the application of unsupervised machine learning to financial market analysis and portfolio optimization.
Connect: GitHub · LinkedIn · ORCID
Grateful acknowledgment to Hasan Rizvi and Mega Satish for their exceptional collaboration and scholarly partnership during the development of this project. Their technical expertise, constant support, and dedication to software quality were instrumental in achieving the project's analytical objectives. Learning alongside them was a transformative experience; their thoughtful approach to problem-solving and encouragement turned complex challenges into meaningful learning moments. This work reflects the growth and insights gained from our side-by-side academic journey. Thank you, Hasan and Mega, for everything you shared and taught along the way.
Grateful acknowledgment to the faculty members of the Department of Computer Engineering at Terna Engineering College for their guidance and instruction in Big Data Analytics. Their expertise in data science and machine learning helped shape the technical foundation of this project.
Special thanks to the mentors and peers whose encouragement, discussions, and support contributed meaningfully to this learning experience.



