A comprehensive end-to-end project that predicts next-day stock prices using historical data from Yahoo Finance. This project demonstrates a practical approach to time-series regression with tree-based models including Decision Tree Regressor, Random Forest Regressor, Gradient Boosting Regressor, and XGBoost.
Disclaimer: This project is for educational purposes only. Stock price prediction is inherently uncertain, and real trading or financial decisions require much more rigorous analysis, risk management, and domain expertise.
- Overview
- Project Features
- Data Source
- Installation & Requirements
- Project Structure
- Usage
- Methodology
- Results
- Next Steps & Improvements
- Contributing
- License
- Contact
The Stock Price Prediction project shows how to:
- Collect stock data from Yahoo Finance using the
yfinancelibrary. - Clean and explore time-series data.
- Create features (e.g., rolling averages, shift columns) to predict the next day’s closing price.
- Train and compare multiple tree-based regression models:
- Decision Tree Regressor
- Random Forest Regressor
- Gradient Boosting Regressor
- XGBoost Regressor
- Evaluate using standard regression metrics: MAE, MSE, RMSE, R².
- Interpret results via feature importance analysis.
- Time-Series Specific: Demonstrates a straightforward time-based train-test split.
- Feature Engineering: Includes rolling averages and a target shifted by one day (
Close_Target). - Multiple Ensemble Methods: Illustrates how bagging (Random Forest) and boosting (Gradient Boosting, XGBoost) can improve predictive performance.
- Hyperparameter Tuning: Shows how to tune models with GridSearchCV.
- Evaluation & Visualization: Checks errors (MAE, RMSE, etc.) and plots feature importances to understand model behavior.
This project uses historical stock data from Yahoo Finance, accessed via the yfinance Python library.
- Default Ticker: Tesla (TSLA)
- Example Date Range: 2016-01-01 to 2021-12-31
Feel free to change the ticker or date range in the code to explore different stocks or time spans.
- Python 3.7+ recommended
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- xgboost
- yfinance
If you have a requirements.txt file included, simply run:
pip install -r requirements.txt
Otherwise, install manually via pip:
pip install pandas numpy matplotlib seaborn scikit-learn xgboost yfinance
Project Structure
├── README.md <- This README
├── stock_price_prediction.ipynb <- Jupyter Notebook with full walkthrough
├── requirements.txt <- (Optional) Python dependencies
└── ...
Key File: stock_price_prediction.ipynb
• Contains step-by-step code for data fetching, feature engineering, model training, and evaluation.
Usage
1. Clone or Download this repository:
git clone https://github.com/YourUsername/Stock-Price-Prediction.git
cd Stock-Price-Prediction
2. Install Dependencies (if using requirements.txt):
pip install -r requirements.txt
3. Run Jupyter Notebook:
jupyter notebook stock_price_prediction.ipynb
• Or run it as a .py script if you prefer.
4. Customize:
• In the notebook, change the ticker and date range (in the yfinance download cell) to experiment with different stocks or time periods.
• Adjust the feature engineering (rolling windows, shift steps, etc.) as you see fit.
• Update hyperparameters in the GridSearchCV sections to explore further tuning.
Methodology
1. Data Collection: Fetch daily stock data (Open, High, Low, Close, Adj Close, Volume) from Yahoo Finance using yfinance.download().
2. Data Cleaning & Exploration: Check for missing values, visual anomalies, and trends.
3. Feature Engineering:
• Target: Shift the Close price by one day to create Close_Target.
• Moving Averages: Rolling features like MA5 or MA10.
4. Time-Based Train/Test Split:
• Use historical data (e.g., 2016–2020) for training, and a more recent period (e.g., 2021) for testing.
5. Model Training:
• Decision Tree Regressor (baseline)
• Random Forest Regressor
• Gradient Boosting Regressor
• XGBoost Regressor
6. Hyperparameter Tuning:
• Use GridSearchCV to optimize n_estimators, max_depth, learning_rate, etc.
7. Evaluation:
• Calculate MAE, MSE, RMSE, R² on the test set.
• Compare models to see which performs best.
8. Interpretation:
• Examine feature importances to see which features influence predictions the most.
9. Results & Analysis:
• Summarize errors, plot predictions vs. actual values, discuss limitations.
Results
Here is a sample table of metrics (illustrative, not actual):
Model MAE MSE RMSE R²
Decision Tree (Tuned) 25.41 975.0 31.24 0.74
Random Forest (Tuned) 18.62 582.1 24.13 0.82
Gradient Boosting (Tuned) 17.93 537.5 23.19 0.84
XGBoost (Tuned) 16.10 490.0 22.13 0.86
• Typically, boosting methods (Gradient Boosting, XGBoost) can outperform a single Decision Tree or even a Random Forest, but exact performance depends on the data and hyperparameter tuning.
• Note: Absolute error values (MAE, RMSE) depend on the scale of the stock price.
A higher R² and a lower MAE/RMSE indicate better predictive performance.
Next Steps & Improvements
• Time-Series Cross-Validation: Instead of a single train/test split, implement a rolling or expanding window approach (e.g., TimeSeriesSplit in scikit-learn).
• Additional Features: Try adding more technical indicators (RSI, MACD, Bollinger Bands) or external data (market indices, sentiment, macroeconomic indicators).
• Hyperparameter Search: Use advanced search techniques (e.g., RandomizedSearchCV, Bayesian Optimization) for better and faster tuning.
• Model Stacking: Combine multiple ensemble models (e.g., stack XGBoost & Random Forest) for potentially improved results.
• Production Deployment: Wrap your final model in a Flask or FastAPI service for real-time predictions, along with monitoring and logging.
Contributing
Contributions, suggestions, and improvements are welcome! Please open an issue or submit a pull request if you have ideas for enhancements, bug fixes, or new features.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contact
Author: [Your Name or GitHub Handle]
Email: [your.email@domain.com]
GitHub: Your GitHub Profile
Feel free to reach out or open an issue if you have any questions. Enjoy exploring stock price predictions with these ensemble methods!