
📊 BullBearAI: A cutting-edge stock market prediction system combining classical ML 📈 with deep learning 🧠. From data cleaning and EDA to hybrid LSTM-CNN modeling, built for insight, performance, and clarity 🚀.


asRot0/BullBearAI


Project Progress Overview: BullBearAI

Python Jupyter License Status

This project predicts stock market trends using traditional ML, deep learning, and a hybrid LSTM-CNN architecture. The sections below describe the progress step by step.

Project Structure

BullBearAI/
│
├── data/                        # Raw and processed stock data
│   ├── raw/                     # Untouched downloaded data
│   ├── interim/                 # Intermediate transformation outputs
│   └── processed/               # Cleaned and final datasets
│
├── notebooks/                  # Jupyter notebooks for EDA, modeling, evaluation
│   ├── 01_eda.ipynb                        # Exploratory Data Analysis
│   ├── 02_feature_engineering.ipynb        # Feature engineering techniques
│   ├── 03_ml_baselines.ipynb               # Traditional ML models: SVM, RF, LR, Gradient Boosting
│   ├── 04_time_series_models.ipynb         # Time series statistical models: ARIMA, SARIMA, GARCH
│   ├── 05_cnn_model.ipynb                  # CNN-based deep learning model
│   ├── 06_lstm_model.ipynb                 # LSTM (RNN) based sequence model
│   ├── 07_hybrid_cnn_lstm_model.ipynb      # Hybrid CNN-LSTM deep model
│   └── 08_model_comparison.ipynb           # Evaluation & performance comparison
│
├── src/                        # All source code
│   ├── config/                 # Configuration files and parameters
│   │   └── config.yaml
│   ├── data_loader/            # Data loading and preprocessing scripts
│   │   └── load_data.py
│   ├── features/               # Feature engineering functions
│   │   └── technical_indicators.py
│   ├── models/                 # ML & DL model definitions
│   │   ├── arima_model.py
│   │   ├── svm_model.py
│   │   ├── cnn_model.py
│   │   ├── lstm_model.py
│   │   └── hybrid_model.py
│   ├── training/               # Training and validation loops
│   │   └── train_model.py
│   ├── evaluation/             # Metrics and model comparisons
│   │   └── evaluate.py
│   └── visualization/          # Custom plotting functions
│       └── plot_utils.py
│
├── saved_models/               # Checkpoints and final models (.h5 or .pth)
│
├── reports/                    # Analysis reports, result plots, performance graphs
│   ├── figures/
│   └── model_comparison.md
│
├── cli/                        # Command-line tools for automation
│   └── run_train.py
│
├── tests/                      # Unit tests for various components
│   ├── test_models.py
│   ├── test_utils.py
│   └── test_data_loader.py
│
├── requirements.txt            # Python dependencies
├── README.md                   # Project overview, setup, and usage
├── LICENSE                     # License info
└── .gitignore                  # Files to ignore in version control

Data Loading & Initial Inspection

  • Loaded the raw stock market data (Netflix stock) from the data/raw/ directory.
  • Verified file integrity, parsed dates correctly, and ensured data types were appropriate.
  • Saved a clean version in data/processed/netflix_cleaned.csv.

Data Cleaning

  • Removed duplicates and handled any missing/null values.
  • Renamed columns for consistency and usability (Close/Last instead of Close*).
  • Converted all date fields to datetime format.
  • Ensured data is sorted chronologically.
  • Exported cleaned dataset to data/processed/.
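The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `load_data.py`: the inline CSV, the `clean_prices` helper, and the forward-fill policy for missing values are all assumptions made for the example.

```python
import io
import pandas as pd

# Hypothetical raw export; the real file lives under data/raw/.
RAW_CSV = """Date,Open,High,Low,Close*,Volume
2024-01-03,101.0,103.0,100.0,102.5,1200
2024-01-02,100.0,102.0,99.0,101.0,1000
2024-01-02,100.0,102.0,99.0,101.0,1000
2024-01-04,102.0,104.0,,103.0,1100
"""


def clean_prices(raw: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate, rename, parse dates, sort, and fill gaps."""
    df = raw.drop_duplicates()
    df = df.rename(columns={"Close*": "Close/Last"})
    df["Date"] = pd.to_datetime(df["Date"])
    df = df.sort_values("Date").reset_index(drop=True)
    # Assumption: forward-fill missing values. The README does not say
    # which policy was used; dropping rows is an equally valid choice.
    return df.ffill()


cleaned = clean_prices(pd.read_csv(io.StringIO(RAW_CSV)))
print(cleaned)
```

Sorting before forward-filling matters here: a gap should be filled from the previous trading day, not from whichever row happened to precede it in the raw file.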

Exploratory Data Analysis (EDA)

  • Visualized time-series trends of Close, Volume, and Open.
  • Used Seaborn and Matplotlib for:
    • Moving averages
    • Seasonal decomposition
    • Daily/Monthly return distributions
  • Checked for trends, volatility, and patterns.
  • Identified data gaps, outliers, or anomalies.
  • All EDA work is saved in notebooks/01_eda.ipynb.

Feature Engineering

Performed a comprehensive set of transformations to prepare predictive features:

Date-Based Features

  • Extracted: Year, Month, Day, DayOfWeek, and IsWeekend.
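A minimal sketch of how these calendar features could be derived with pandas. The `add_date_features` helper and the sample dates are hypothetical; only the output column names come from the list above.

```python
import pandas as pd


def add_date_features(df: pd.DataFrame, date_col: str = "Date") -> pd.DataFrame:
    """Return a copy of df with calendar features derived from date_col."""
    out = df.copy()
    out["Year"] = out[date_col].dt.year
    out["Month"] = out[date_col].dt.month
    out["Day"] = out[date_col].dt.day
    out["DayOfWeek"] = out[date_col].dt.dayofweek  # Monday = 0, Sunday = 6
    out["IsWeekend"] = (out["DayOfWeek"] >= 5).astype(int)
    return out


# Friday, Saturday, Monday (illustrative dates only)
sample = pd.DataFrame(
    {"Date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-01-08"])}
)
dated = add_date_features(sample)
print(dated)
```

Exchange data usually has no weekend rows, so `IsWeekend` may turn out constant on real price history and carry little signal.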

Lag Features

  • Created lagged versions of Close/Last and Volume (lags: 1, 2, 3 days).
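Lagging can be sketched with `Series.shift`. The helper name and the `_lag{n}` column naming below are assumptions for illustration; the notebook may name its columns differently.

```python
import pandas as pd


def add_lags(df, columns=("Close/Last", "Volume"), lags=(1, 2, 3)):
    """Add one shifted copy of each column per lag (in rows, i.e. trading days)."""
    out = df.copy()
    for col in columns:
        for lag in lags:
            # shift(lag) moves values down, so row t holds the value from t - lag
            out[f"{col}_lag{lag}"] = out[col].shift(lag)
    return out


prices = pd.DataFrame(
    {
        "Close/Last": [10.0, 11.0, 12.0, 13.0, 14.0],
        "Volume": [100, 110, 120, 130, 140],
    }
)
lagged = add_lags(prices)
print(lagged)
```

The first three rows contain NaN in at least one lag column, so they typically need to be dropped before training.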

Rolling Statistics

  • Computed rolling means, medians, stds, max, min for 7, 14, and 30-day windows.
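One way to compute these window statistics with pandas is shown below. It is a sketch under assumed names (`add_rolling_stats`, the `_roll{w}_{stat}` column pattern) and uses a synthetic price series.

```python
import pandas as pd


def add_rolling_stats(df, column="Close/Last", windows=(7, 14, 30)):
    """Add rolling mean, median, std, max, and min for each window length."""
    out = df.copy()
    for w in windows:
        roll = out[column].rolling(window=w)
        out[f"{column}_roll{w}_mean"] = roll.mean()
        out[f"{column}_roll{w}_median"] = roll.median()
        out[f"{column}_roll{w}_std"] = roll.std()
        out[f"{column}_roll{w}_max"] = roll.max()
        out[f"{column}_roll{w}_min"] = roll.min()
    return out


# Synthetic series 1.0 .. 40.0, long enough to fill the 30-day window
series = pd.DataFrame({"Close/Last": [float(i) for i in range(1, 41)]})
rolled = add_rolling_stats(series)
print(rolled.tail(3))
```

Each window includes the current row. If these features feed a next-day target that is fine, but for a same-day target the window would need a `shift(1)` to avoid leaking the value being predicted.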

Volatility Measures

  • Daily percentage change, return, and rolling return metrics.
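As a rough illustration, daily returns and a rolling volatility estimate could be computed like this. The window length of 2 is chosen only to keep the toy example short; the README does not state which windows were used.

```python
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0, 108.9])

# Daily percentage change: (close_t - close_{t-1}) / close_{t-1}
daily_return = close.pct_change()

# Rolling mean and standard deviation of returns (hypothetical window of 2)
rolling_mean_return = daily_return.rolling(window=2).mean()
rolling_volatility = daily_return.rolling(window=2).std()

print(pd.DataFrame({
    "close": close,
    "daily_return": daily_return,
    "rolling_mean_return": rolling_mean_return,
    "rolling_volatility": rolling_volatility,
}))
```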

Technical Indicators

  • Simple & Exponential Moving Averages (SMA, EMA)
  • RSI (Relative Strength Index)
  • MACD (Moving Average Convergence Divergence)
  • Bollinger Bands
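The indicators listed above can be approximated in plain pandas as sketched below. Several choices here are assumptions rather than the project's settings: the window lengths, the conventional 12/26/9 MACD spans, the 20-period, 2-sigma Bollinger Bands, and an RSI based on simple rolling averages instead of Wilder's smoothing.

```python
import pandas as pd


def add_indicators(close: pd.Series, window: int = 14,
                   bb_window: int = 20, bb_k: float = 2.0) -> pd.DataFrame:
    """Compute SMA, EMA, RSI, MACD, and Bollinger Bands for a price series."""
    out = pd.DataFrame({"Close": close})
    out["SMA"] = close.rolling(window).mean()
    out["EMA"] = close.ewm(span=window, adjust=False).mean()

    # RSI from simple rolling averages of gains and losses
    # (a simplification; Wilder's original uses exponential smoothing).
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(window).mean()
    avg_loss = (-delta).clip(lower=0).rolling(window).mean()
    rs = avg_gain / avg_loss
    out["RSI"] = 100 - 100 / (1 + rs)

    # MACD: fast EMA minus slow EMA, plus an EMA signal line
    macd = (close.ewm(span=12, adjust=False).mean()
            - close.ewm(span=26, adjust=False).mean())
    out["MACD"] = macd
    out["MACD_signal"] = macd.ewm(span=9, adjust=False).mean()

    # Bollinger Bands: rolling mean +/- k rolling standard deviations
    mid = close.rolling(bb_window).mean()
    sd = close.rolling(bb_window).std()
    out["BB_upper"] = mid + bb_k * sd
    out["BB_lower"] = mid - bb_k * sd
    return out


rising = pd.Series([float(i) for i in range(1, 41)])
ind = add_indicators(rising)
print(ind.tail(3))
```

On a strictly rising series the average loss is zero, so the RSI saturates at 100 and the MACD is positive, which makes the toy input a convenient sanity check.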

Target Variable

  • Target_Close_Next_Day: Next day's close price
  • Target_UpDown: Binary classification target (1 = price goes up, 0 = down)
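Both targets can be built from a single negative shift, as in the sketch below. Treating an unchanged close as 0 ("down") is an assumption; the README only defines 1 as "price goes up".

```python
import pandas as pd

df = pd.DataFrame({"Close/Last": [10.0, 12.0, 11.0, 11.0, 13.0]})

# shift(-1) pulls tomorrow's close onto today's row
df["Target_Close_Next_Day"] = df["Close/Last"].shift(-1)

# The last row has no next day, so it cannot be labeled
labeled = df.dropna(subset=["Target_Close_Next_Day"]).copy()

# 1 if the next close is strictly higher, else 0 (ties count as 0 here)
labeled["Target_UpDown"] = (
    labeled["Target_Close_Next_Day"] > labeled["Close/Last"]
).astype(int)

print(labeled)
```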

Engineered dataset saved to: data/interim/engineered_features.csv.


Machine Learning Baseline Models (Regression)

This notebook builds baseline regression models to predict:

  • Target_Close_Next_Day: the actual next-day closing price of the stock.

Implemented Models:

  • Linear Regression
  • Support Vector Regression (SVR)
  • Random Forest Regressor
  • Gradient Boosting Regressor

Highlights:

  • Models trained on engineered features including lag features, rolling window stats, and technical indicators (e.g., RSI, MACD, Bollinger Bands).

  • Evaluation metrics include:

    • MAE (Mean Absolute Error)
    • RMSE (Root Mean Squared Error)
    • R² Score
  • Model Performance Metrics

    | Model | MAE   | RMSE  | R² Score |
    |-------|-------|-------|----------|
    | LR    | 19.25 | 22.43 | -0.30    |
    | SVR   | 27.82 | 34.50 | -2.08    |
    | RF    | 9.04  | 11.88 | 0.63     |
    | GB    | 8.72  | 11.40 | 0.66     |
  • Visualizations:

    • Actual vs Predicted Prices (line plot)
    • Residual Plot (errors)
    • MAE & RMSE comparison bar charts
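For reference, the three metrics can be computed from first principles as in this sketch. The sample values are made up and unrelated to the reported results. It also shows why R² can be negative, as it is for LR and SVR: the score drops below zero whenever a model's squared error exceeds that of always predicting the mean.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


actual = [100.0, 102.0, 104.0, 106.0]
close_fit = [101.0, 101.0, 105.0, 105.0]   # small errors
poor_fit = [110.0, 90.0, 115.0, 95.0]      # worse than predicting the mean

good = regression_metrics(actual, close_fit)
bad = regression_metrics(actual, poor_fit)
print(good)
print(bad)
```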

Time Series Modeling (ARIMA, SARIMA, GARCH)

This section covers three time series models:

  • ARIMA: Captures trend using autoregressive and moving average components.
  • SARIMA: Extends ARIMA by modeling seasonality.
  • GARCH: Models time-varying volatility (useful for financial series).

Model Performance Metrics

| Model  | MAE       | RMSE      |
|--------|-----------|-----------|
| ARIMA  | 6.134887  | 15.929801 |
| SARIMA | 19.205966 | 21.711764 |

  • MAE (Mean Absolute Error): Measures average absolute errors.
  • RMSE (Root Mean Squared Error): Penalizes large errors more.

Key Takeaways

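  • ARIMA captures trend through its autoregressive and moving average terms but does not model seasonality.
  • SARIMA adds seasonal terms, which can help when the series is seasonal. In this run it scored worse than ARIMA on both MAE and RMSE.
  • GARCH models and forecasts volatility rather than the price level, which is especially useful for financial data such as stock prices. No price-error metrics are reported for it.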
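To make the volatility point concrete, the sketch below runs the standard GARCH(1,1) variance recursion, sigma²_t = omega + alpha · r²_{t-1} + beta · sigma²_{t-1}, in plain Python. The parameter values are hypothetical and not fitted to the Netflix data; a real fit estimates them by maximum likelihood. The function name is also an assumption.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion.

    returns: zero-mean return residuals r_0 .. r_{n-1}
    Output has n + 1 entries; the last one is the one-step-ahead forecast.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite long-run variance")
    # Start from the unconditional (long-run) variance
    var = omega / (1 - alpha - beta)
    variances = [var]
    for r in returns:
        var = omega + alpha * r * r + beta * var
        variances.append(var)
    return variances


# Hypothetical parameters, chosen only for illustration
calm = garch11_variance([0.0, 0.0], omega=0.1, alpha=0.1, beta=0.8)
shock = garch11_variance([0.0, 3.0], omega=0.1, alpha=0.1, beta=0.8)
print(calm)
print(shock)
```

The second call shows volatility clustering: a single large return raises the next period's variance forecast well above the calm path.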

CNN-Based Model

This notebook uses a 1D CNN to model patterns in stock price sequences and predict future values. The aim is to capture local patterns in the series through convolutional feature extraction.

| Step                | Description                                                   |
|---------------------|---------------------------------------------------------------|
| Scaling             | Applies MinMaxScaler to normalize prices between 0 and 1.     |
| Sequence Generation | Converts time series into sequences using sliding windows.    |
| CNN Architecture    | 1D Convolution + MaxPooling + Dense layers.                   |
| Training            | Compiled with the adam optimizer and mse loss.                |
| Evaluation          | MAE, RMSE, and future price predictions plotted.              |
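The scaling and sequence-generation steps can be sketched in NumPy as follows. This is an illustration only: it reimplements min-max scaling by hand rather than calling scikit-learn's MinMaxScaler, and the helper names, sample prices, and window length of 3 are assumptions.

```python
import numpy as np


def min_max_scale(values):
    """Scale values linearly into [0, 1]; also return the bounds for inversion."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi


def make_windows(series, window):
    """Turn a 1D series into (samples, window, 1) inputs and next-step targets."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    # Conv1D and LSTM layers expect a trailing feature axis
    return X[..., np.newaxis], y


prices = np.array([10.0, 12.0, 11.0, 15.0, 14.0, 20.0])
scaled, lo, hi = min_max_scale(prices)
X, y = make_windows(scaled, window=3)

# Predictions are mapped back to prices with the inverse transform
restored = y * (hi - lo) + lo
print(X.shape, y.shape, restored)
```

In practice the scaling bounds are usually computed on the training split only, so that information from the test period does not leak into the inputs.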

Performance Summary

| Metric | Value |
|--------|-------|
| MAE    | 9.66  |
| RMSE   | 11.93 |

LSTM-Based Model

This notebook uses an LSTM (a type of RNN) to forecast stock prices from historical closing data. LSTMs suit sequential data because their gated memory cells retain long-term information and mitigate the vanishing gradient problem of vanilla RNNs.

  • Close/Last: Normalized closing price.

  • Target_Close_Next_Day: Target value to predict (next day's closing price).

  • LSTM Architecture:

    • Contains memory cells with gates (input, forget, and output).
    • Capable of learning both short-term and long-term temporal patterns.
  • Sliding Window: We use 60-day historical windows to predict the next day's price.

  • EarlyStopping: To avoid overfitting (patience = 10)
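The early-stopping rule with patience = 10 can be illustrated without any deep learning framework. The sketch below shows the general logic (stop once validation loss has not improved for `patience` consecutive epochs); it is not the framework callback itself, and the loss values are invented.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch where the loss has failed to improve
    for `patience` consecutive epochs, or at the last epoch otherwise.
    """
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Loss improves for three epochs, then plateaus at a worse value
losses = [1.0, 0.8, 0.7] + [0.75] * 15
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=10)
print(stop_epoch, best_epoch)
```

Because the best epoch can be well before the stopping epoch, early stopping is commonly paired with restoring the weights from the best epoch.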

Evaluation Metrics (on Inverse Scaled Real Prices)

| Metric | Value   |
|--------|---------|
| MAE    | 8.3355  |
| RMSE   | 10.2783 |
