Skip to content

madhugoutham/Stock_Price_Prediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Stock Price Prediction with Machine Learning

A comprehensive end-to-end project that predicts next-day stock prices using historical data from Yahoo Finance. This project demonstrates a practical approach to time-series regression with tree-based models including Decision Tree Regressor, Random Forest Regressor, Gradient Boosting Regressor, and XGBoost.

Disclaimer: This project is for educational purposes only. Stock price prediction is inherently uncertain, and real trading or financial decisions require much more rigorous analysis, risk management, and domain expertise.


Table of Contents

  1. Overview
  2. Project Features
  3. Data Source
  4. Installation & Requirements
  5. Project Structure
  6. Usage
  7. Methodology
  8. Results
  9. Next Steps & Improvements
  10. Contributing
  11. License
  12. Contact

Overview

The Stock Price Prediction project shows how to:

  • Collect stock data from Yahoo Finance using the yfinance library.
  • Clean and explore time-series data.
  • Create features (e.g., rolling averages, shift columns) to predict the next day’s closing price.
  • Train and compare multiple tree-based regression models:
    • Decision Tree Regressor
    • Random Forest Regressor
    • Gradient Boosting Regressor
    • XGBoost Regressor
  • Evaluate using standard regression metrics: MAE, MSE, RMSE, R².
  • Interpret results via feature importance analysis.

Project Features

  • Time-Series Specific: Demonstrates a straightforward time-based train-test split.
  • Feature Engineering: Includes rolling averages and a target shifted by one day (Close_Target).
  • Multiple Ensemble Methods: Illustrates how bagging (Random Forest) and boosting (Gradient Boosting, XGBoost) can improve predictive performance.
  • Hyperparameter Tuning: Shows how to tune models with GridSearchCV.
  • Evaluation & Visualization: Checks errors (MAE, RMSE, etc.) and plots feature importances to understand model behavior.

Data Source

This project uses historical stock data from Yahoo Finance, accessed via the yfinance Python library.

  • Default Ticker: Tesla (TSLA)
  • Example Date Range: 2016-01-01 to 2021-12-31

Feel free to change the ticker or date range in the code to explore different stocks or time spans.


Installation & Requirements

Python Version

  • Python 3.7+ recommended

Dependencies

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn
  • xgboost
  • yfinance

If you have a requirements.txt file included, simply run:

pip install -r requirements.txt

Otherwise, install manually via pip:

pip install pandas numpy matplotlib seaborn scikit-learn xgboost yfinance

Project Structure

├── README.md                      <- This README
├── stock_price_prediction.ipynb   <- Jupyter Notebook with full walkthrough
├── requirements.txt               <- (Optional) Python dependencies
└── ...

Key File: stock_price_prediction.ipynb
	•	Contains step-by-step code for data fetching, feature engineering, model training, and evaluation.

Usage
	1.	Clone or Download this repository:

git clone https://github.com/YourUsername/Stock-Price-Prediction.git
cd Stock-Price-Prediction


	2.	Install Dependencies (if using requirements.txt):

pip install -r requirements.txt


	3.	Run Jupyter Notebook:

jupyter notebook stock_price_prediction.ipynb

	•	Or run it as a .py script if you prefer.

	4.	Customize:
	•	In the notebook, change the ticker and date range (in the yfinance download cell) to experiment with different stocks or time periods.
	•	Adjust the feature engineering (rolling windows, shift steps, etc.) as you see fit.
	•	Update hyperparameters in the GridSearchCV sections to explore further tuning.

Methodology
	1.	Data Collection: Fetch daily stock data (Open, High, Low, Close, Adj Close, Volume) from Yahoo Finance using yfinance.download().
	2.	Data Cleaning & Exploration: Check for missing values, visual anomalies, and trends.
	3.	Feature Engineering:
	•	Target: Shift the Close price by one day to create Close_Target.
	•	Moving Averages: Rolling features like MA5 or MA10.
	4.	Time-Based Train/Test Split:
	•	Use historical data (e.g., 2016–2020) for training, and a more recent period (e.g., 2021) for testing.
	5.	Model Training:
	•	Decision Tree Regressor (baseline)
	•	Random Forest Regressor
	•	Gradient Boosting Regressor
	•	XGBoost Regressor
	6.	Hyperparameter Tuning:
	•	Use GridSearchCV to optimize n_estimators, max_depth, learning_rate, etc.
	7.	Evaluation:
	•	Calculate MAE, MSE, RMSE, R² on the test set.
	•	Compare models to see which performs best.
	8.	Interpretation:
	•	Examine feature importances to see which features influence predictions the most.
	9.	Results & Analysis:
	•	Summarize errors, plot predictions vs. actual values, discuss limitations.

Results

Here is a sample table of metrics (illustrative, not actual):

Model	MAE	MSE	RMSE	R²
Decision Tree (Tuned)	25.41	975.0	31.24	0.74
Random Forest (Tuned)	18.62	582.1	24.13	0.82
Gradient Boosting (Tuned)	17.93	537.5	23.19	0.84
XGBoost (Tuned)	16.10	490.0	22.13	0.86

	•	Typically, boosting methods (Gradient Boosting, XGBoost) can outperform a single Decision Tree or even a Random Forest, but exact performance depends on the data and hyperparameter tuning.
	•	Note: Absolute error values (MAE, RMSE) depend on the scale of the stock price.

A higher R² and a lower MAE/RMSE indicate better predictive performance.

Next Steps & Improvements
	•	Time-Series Cross-Validation: Instead of a single train/test split, implement a rolling or expanding window approach (e.g., TimeSeriesSplit in scikit-learn).
	•	Additional Features: Try adding more technical indicators (RSI, MACD, Bollinger Bands) or external data (market indices, sentiment, macroeconomic indicators).
	•	Hyperparameter Search: Use advanced search techniques (e.g., RandomizedSearchCV, Bayesian Optimization) for better and faster tuning.
	•	Model Stacking: Combine multiple ensemble models (e.g., stack XGBoost & Random Forest) for potentially improved results.
	•	Production Deployment: Wrap your final model in a Flask or FastAPI service for real-time predictions, along with monitoring and logging.

Contributing

Contributions, suggestions, and improvements are welcome! Please open an issue or submit a pull request if you have ideas for enhancements, bug fixes, or new features.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

Author: [Your Name or GitHub Handle]
Email: [your.email@domain.com]
GitHub: Your GitHub Profile

Feel free to reach out or open an issue if you have any questions. Enjoy exploring stock price predictions with these ensemble methods!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors