iRubeel/CO2-Emissions-predictive-model-

CO₂ Prediction for PEI & NB Potato Farm Fields

Summary

This project builds a predictive model for CO₂ concentrations recorded across Prince Edward Island (PEI) and New Brunswick (NB) using environmental sensor data. The notebook ingests the combined dataset, cleans it, selects province-specific predictors, and benchmarks two modeling pipelines (Linear Regression and XGBoost with five-fold cross-validation) to quantify how well each province’s emissions can be reconstructed.

Dataset

  • Source: dataset.xlsx, sheet Combined, which aggregates 528 time-stamped observations of soil (moisture, temperature, EC) and ambient (air temperature, humidity, wind speed, dew point, precipitation) variables plus greenhouse-gas concentrations.
  • Columns include Date, Sr. #, the sensor readings listed above, CO2, N2O, CH4, H2O, and two empty placeholder columns. The notebook appends a Province label (first 408 entries → PEI, next 120 → NB) before dropping the irrelevant metadata columns.
  • After dropping the unused columns and removing detected CO₂ outliers, the modeling dataset contains 516 rows and 13 columns (the eight sensor features, CO2, the three other gases, and Province).
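The position-based province labelling described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the helper name `add_province_label` and the stand-in frame are hypothetical, and it assumes the rows keep the order of the Combined sheet (PEI first, then NB).

```python
import numpy as np
import pandas as pd


def add_province_label(df: pd.DataFrame, n_pei: int = 408) -> pd.DataFrame:
    """Label the first n_pei rows as PEI and all later rows as NB.

    Hypothetical helper: relies on the sheet's row order, as the README describes.
    """
    out = df.copy()
    out["Province"] = np.where(np.arange(len(out)) < n_pei, "PEI", "NB")
    return out


# Stand-in frame with the same row count as the Combined sheet (528 rows).
# With the real file one would load it first, e.g.
#   df = pd.read_excel("dataset.xlsx", sheet_name="Combined")
demo = pd.DataFrame({"CO2": np.zeros(528)})
labeled = add_province_label(demo)
counts = labeled["Province"].value_counts()
print(counts)
```

Labelling by position is fragile if the sheet is ever re-sorted, so a check on the resulting counts (408 and 120) is a cheap safeguard.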

Cleaning & Outlier Detection

  • A Z-score routine counts outliers per numeric feature for each province; these counts guided the choice of cleanup strategy.
  • CO₂ outliers were trimmed differently per province: an IQR-based filter removed 11 PEI points, while a Z-score threshold (|z| > 2.5) removed 1 NB point. The cleaned partitions were concatenated for downstream modeling.
  • Remaining rows are split again by province so that the modeling loop trains on PEI and NB independently while reusing the same pipeline helpers.
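The two trimming rules above can be sketched roughly as below. The function names are hypothetical, the IQR multiplier of 1.5 is an assumption (the README does not state which multiplier the notebook uses), and the Z-score uses the population standard deviation (`ddof=0`), matching the default of `scipy.stats.zscore`. The demo values are synthetic, not taken from the dataset.

```python
import pandas as pd


def iqr_filter(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the conventional choice and an assumption here.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    keep = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


def zscore_filter(df: pd.DataFrame, column: str, threshold: float = 2.5) -> pd.DataFrame:
    """Keep rows with |z| <= threshold, using the population standard deviation."""
    values = df[column]
    z = (values - values.mean()) / values.std(ddof=0)
    return df[z.abs() <= threshold]


# Synthetic CO2 readings with one obvious outlier (50).
demo = pd.DataFrame({"CO2": [10, 11, 12] * 6 + [11, 50]})

pei_style = iqr_filter(demo, "CO2")      # IQR rule, as applied to PEI
nb_style = zscore_filter(demo, "CO2")    # |z| > 2.5 rule, as applied to NB

# Cleaned partitions would then be concatenated for modeling.
combined = pd.concat([pei_style, nb_style], ignore_index=True)
print(len(pei_style), len(nb_style), len(combined))
```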

Features

  • Candidate features: Soil Mositure, Soil Temperature, Soil EC, Air Temperature [°C], Precipitation [mm], Relative Humidity [%], Wind Speed [m/s], Dew Point [°C].
  • Selected predictors (per the correlation and visual inspection steps):
    • PEI: Dew Point [°C], Air Temperature [°C], Soil Temperature
    • NB: Air Temperature [°C], Dew Point [°C], Soil Temperature
  • The notebook also maintains CO2 as the target and keeps the remaining gases (N2O, CH4, H2O) for contextual analysis and potential multi-target extensions.
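One plausible way to carry out the correlation step is to rank candidate features by their absolute Pearson correlation with CO2. The sketch below is an assumption about that step, not the notebook's code: `rank_by_correlation` is a hypothetical helper, the data are synthetic, and the visual inspection the README mentions is not reproduced.

```python
import numpy as np
import pandas as pd


def rank_by_correlation(df: pd.DataFrame, target: str, candidates: list) -> pd.Series:
    """Return |Pearson r| between each candidate and the target, strongest first."""
    corr = df[candidates].corrwith(df[target]).abs()
    return corr.sort_values(ascending=False)


# Synthetic stand-in data: CO2 is driven by dew point, air temperature is a
# noisy copy of dew point, and wind speed is unrelated.
rng = np.random.default_rng(0)
dew = rng.normal(size=200)
demo = pd.DataFrame({
    "Dew Point [°C]": dew,
    "Air Temperature [°C]": dew + rng.normal(size=200),
    "Wind Speed [m/s]": rng.normal(size=200),
    "CO2": 2.0 * dew + 0.1 * rng.normal(size=200),
})

candidates = ["Dew Point [°C]", "Air Temperature [°C]", "Wind Speed [m/s]"]
ranking = rank_by_correlation(demo, "CO2", candidates)
print(ranking)
```

Applied per province, the top-ranked names would be the natural starting point for the predictor lists above.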

Modeling

  • Both Linear Regression and XGBoost regressors are evaluated with five-fold cross-validation (KFold(n_splits=5, shuffle=True, random_state=42)); MAE, MSE, RMSE, and R² are computed on every fold.
  • Custom helpers collect predictions, record the averaged metric ± standard deviation, and plot measured vs. predicted / residual diagnostics to assess calibration per province and model.
  • Column names are sanitized (square brackets removed) before XGBoost training, because XGBoost rejects feature names containing [, ], or <.
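The evaluation loop described above might look roughly like this. It is a sketch under stated assumptions: `sanitize_columns` and `cross_validate_metrics` are hypothetical stand-ins for the notebook's custom helpers, the data are synthetic, and the spread is the sample standard deviation across folds (the README does not say which variant the notebook reports). Only Linear Regression is run here; any scikit-learn-compatible regressor, such as xgboost's `XGBRegressor`, could be passed in the same way.

```python
import re

import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold


def sanitize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Strip '[', ']' and '<' from column names so XGBoost accepts them."""
    return df.rename(columns=lambda c: re.sub(r"[\[\]<]", "", str(c)).strip())


def cross_validate_metrics(model, X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    """Run 5-fold CV and return the mean and std of each metric across folds."""
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    rows = []
    for train_idx, test_idx in kf.split(X):
        fitted = clone(model).fit(X.iloc[train_idx], y.iloc[train_idx])
        pred = fitted.predict(X.iloc[test_idx])
        mse = mean_squared_error(y.iloc[test_idx], pred)
        rows.append({
            "MAE": mean_absolute_error(y.iloc[test_idx], pred),
            "MSE": mse,
            "RMSE": np.sqrt(mse),
            "R2": r2_score(y.iloc[test_idx], pred),
        })
    # Rows: metrics; columns: mean and (sample) std over the five folds.
    return pd.DataFrame(rows).agg(["mean", "std"]).T


# Synthetic stand-in data with an almost exactly linear target.
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "Dew Point [°C]": rng.normal(size=120),
    "Air Temperature [°C]": rng.normal(size=120),
})
y = pd.Series(3.0 * X["Dew Point [°C]"] - 2.0 * X["Air Temperature [°C]"]
              + 0.1 * rng.normal(size=120), name="CO2")

X_clean = sanitize_columns(X)
summary = cross_validate_metrics(LinearRegression(), X_clean, y)
print(summary)
```

Cloning the model inside the loop keeps each fold independent, so no fitted state leaks from one split into the next.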

Results

| Model | Province | MAE | MSE | RMSE | R² |
| --- | --- | --- | --- | --- | --- |
| Linear Regression | PEI | 0.93 ± 0.10 | 1.38 ± 0.29 | 1.17 ± 0.13 | 0.40 ± 0.09 |
| Linear Regression | NB | 0.73 ± 0.08 | 0.89 ± 0.20 | 0.94 ± 0.10 | 0.64 ± 0.10 |
| XGBoost | PEI | 0.73 ± 0.06 | 1.05 ± 0.22 | 1.02 ± 0.10 | 0.54 ± 0.11 |
| XGBoost | NB | 0.90 ± 0.13 | 1.21 ± 0.23 | 1.09 ± 0.11 | 0.51 ± 0.11 |
  • Interpretation: Linear Regression captures NB's behavior reasonably well (R² 0.64) but generalizes less well for PEI (R² 0.40). XGBoost lifts PEI's R² to 0.54 but lowers NB's from 0.64 to 0.51, suggesting that nonlinear interactions help where the data volume is richer (PEI) but may overfit the smaller NB partition.

Running the Notebook

  1. Install the core dependencies (the notebook already imports these packages):

    python -m pip install --user pandas numpy scipy matplotlib seaborn scikit-learn xgboost notebook openpyxl
  2. Launch the notebook server and open the analysis:

    jupyter notebook FinalProject.ipynb
  3. Re-run the cells to regenerate figures or tune the model (the dataset file must stay next to the notebook).

Recommended Structure & Next Steps

  • Structure:
.
├── data/                # Raw and processed spreadsheets (e.g., dataset.xlsx)
├── notebooks/           # The analyzed notebook(s) such as FinalProject.ipynb
├── README.md
└── scripts/             # Standalone helpers / conversions if needed
  • Consider future work:
    • Merge the NB & PEI partitions into a single multi-output model with province-aware features.
    • Tune XGBoost hyperparameters (grid/random search) and explore regularized linear models.
    • Add a deployment-ready pipeline (data validation, serialization, scoring) or a dashboard to monitor predictions.
