CO₂ Prediction for PEI & NB Potato Farm Fields

Summary

This project builds a predictive model for CO₂ concentrations recorded across potato farm fields in Prince Edward Island (PEI) and New Brunswick (NB) using environmental sensor data. The notebook ingests the combined dataset, cleans it, selects province-specific predictors, and benchmarks two models, Linear Regression and XGBoost, each under five-fold cross-validation, to quantify how well each province’s CO₂ concentrations can be predicted.

Dataset

Source: dataset.xlsx, sheet Combined, which aggregates 528 time-stamped observations of soil (moisture, temperature, EC) and ambient (air temperature, humidity, wind speed, dew point, precipitation) variables plus greenhouse-gas concentrations.
Columns include Date, Sr. #, the sensor readings listed above, CO2, N2O, CH4, H2O, and two empty placeholder columns. The notebook appends a Province label (first 408 entries → PEI, next 120 → NB) before dropping the irrelevant metadata columns.
After dropping the unused columns and removing detected CO₂ outliers, the modeling dataset contains 516 rows and 13 columns (the eight sensor features, CO2, the three other gases, and Province).
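
The province labeling step could be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the helper name `label_provinces` and the synthetic frame are assumptions, while the row counts (408 PEI, then 120 NB) come from the description above.

```python
import pandas as pd


def label_provinces(df: pd.DataFrame, n_pei: int = 408, n_nb: int = 120) -> pd.DataFrame:
    """Return a copy of df with a Province column assigned by row position."""
    if len(df) != n_pei + n_nb:
        raise ValueError(f"expected {n_pei + n_nb} rows, got {len(df)}")
    out = df.copy()
    # First n_pei rows are PEI, the remaining n_nb rows are NB.
    out["Province"] = ["PEI"] * n_pei + ["NB"] * n_nb
    return out


# Synthetic stand-in for the Combined sheet; the real frame would come from
# pd.read_excel("dataset.xlsx", sheet_name="Combined").
demo = pd.DataFrame({"CO2": range(528)})
labeled = label_provinces(demo)
print(labeled["Province"].value_counts().to_dict())
```

Labeling by position only works while the spreadsheet keeps its row order, which is why the sketch checks the total row count before assigning labels.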

Cleaning & Outlier Detection

Numeric inspection uses a Z-score routine to count outliers per feature in each province; these counts guided the choice of cleanup strategy.
CO₂ outliers were trimmed differently per province: an IQR-based filter removed 11 PEI points, while a Z-score threshold (|z| > 2.5) removed 1 NB point. The cleaned partitions were concatenated for downstream modeling.
Remaining rows are split again by province so that the modeling loop trains on PEI and NB independently while reusing the same pipeline helpers.
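
The two outlier filters could look something like the sketch below. The function names, the IQR multiplier of 1.5, and the population standard deviation (ddof=0) are assumptions, since the README does not state them; only the |z| > 2.5 threshold is taken from the text. The data is synthetic.

```python
import pandas as pd


def iqr_filter(df, column, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


def zscore_filter(df, column, threshold=2.5):
    """Keep rows whose absolute Z-score does not exceed the threshold."""
    values = df[column]
    z = (values - values.mean()) / values.std(ddof=0)  # ddof=0 is an assumption
    return df[z.abs() <= threshold]


# Synthetic partitions: twenty ordinary readings plus one extreme value each.
values = [float(v) for v in range(1, 21)] + [100.0]
pei = pd.DataFrame({"CO2": values, "Province": "PEI"})
nb = pd.DataFrame({"CO2": values, "Province": "NB"})

pei_clean = iqr_filter(pei, "CO2")
nb_clean = zscore_filter(nb, "CO2")

# Recombine the cleaned partitions for downstream modeling.
cleaned = pd.concat([pei_clean, nb_clean], ignore_index=True)
print(len(pei), "->", len(pei_clean), "|", len(nb), "->", len(nb_clean))
```

Filtering each province separately keeps one partition's spread from deciding which points count as extreme in the other.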

Features

Candidate features: Soil Moisture, Soil Temperature, Soil EC, Air Temperature [°C], Precipitation [mm], Relative Humidity [%], Wind Speed [m/s], Dew Point [°C].
Selected predictors (per the correlation and visual inspection steps):
- PEI: Dew Point [°C], Air Temperature [°C], Soil Temperature
- NB: Air Temperature [°C], Dew Point [°C], Soil Temperature
The notebook also maintains CO2 as the target and keeps the remaining gases (N2O, CH4, H2O) for contextual analysis and potential multi-target extensions.
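
One way the correlation step might be reproduced is to rank candidate columns by their absolute Pearson correlation with CO2. The helper `rank_by_correlation` and the toy values are hypothetical; the notebook combined correlation with visual inspection, which this sketch does not capture.

```python
import pandas as pd


def rank_by_correlation(df, target, candidates):
    """Sort candidate columns by absolute Pearson correlation with the target."""
    corr = df[candidates].corrwith(df[target])
    return corr.abs().sort_values(ascending=False)


# Toy data for illustration only (not taken from dataset.xlsx).
demo = pd.DataFrame({
    "Air Temperature [°C]": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "Wind Speed [m/s]": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
    "CO2": [400.0, 402.1, 403.9, 406.2, 408.0, 410.1],
})
ranking = rank_by_correlation(
    demo, "CO2", ["Air Temperature [°C]", "Wind Speed [m/s]"]
)
print(ranking)
```

Running the same ranking on each province's partition would show whether the top predictors differ between PEI and NB.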

Modeling

Both Linear Regression and XGBoost regressors are evaluated with five-fold cross-validation (KFold(n_splits=5, shuffle=True, random_state=42)); MAE, MSE, RMSE, and R² are computed on every fold.
Custom helpers collect predictions, record the averaged metric ± standard deviation, and plot measured vs. predicted / residual diagnostics to assess calibration per province and model.
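
A helper of this kind might be sketched as below, assuming NumPy arrays and scikit-learn. The function name `cross_validate_metrics`, the use of `clone`, and the population standard deviation are assumptions; the KFold settings match those quoted above. The data is synthetic, and the plotting side of the helpers is not shown.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold


def cross_validate_metrics(model, X, y, n_splits=5, seed=42):
    """Return {metric: (mean, std)} across shuffled K-fold splits."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {"MAE": [], "MSE": [], "RMSE": [], "R2": []}
    for train_idx, test_idx in kf.split(X):
        fold_model = clone(model)  # fresh, unfitted copy for every fold
        fold_model.fit(X[train_idx], y[train_idx])
        pred = fold_model.predict(X[test_idx])
        mse = mean_squared_error(y[test_idx], pred)
        scores["MAE"].append(mean_absolute_error(y[test_idx], pred))
        scores["MSE"].append(mse)
        scores["RMSE"].append(np.sqrt(mse))
        scores["R2"].append(r2_score(y[test_idx], pred))
    return {name: (np.mean(vals), np.std(vals)) for name, vals in scores.items()}


# Synthetic regression problem standing in for one province's partition.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

results = cross_validate_metrics(LinearRegression(), X, y)
for name, (mean, std) in results.items():
    print(f"{name}: {mean:.2f} ± {std:.2f}")
```

Because RMSE is averaged per fold, the reported mean RMSE need not equal the square root of the mean MSE exactly.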
Before XGBoost training, column names are sanitized (square brackets removed), since XGBoost can reject feature names that contain [, ], or <.
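
The renaming could be done along these lines. The helper name `sanitize_columns` is hypothetical, and stripping `<` in addition to the square brackets is an assumption that goes slightly beyond what the README describes.

```python
import re


def sanitize_columns(columns):
    """Remove [, ], and < from each column name and trim whitespace."""
    return [re.sub(r"[\[\]<]", "", str(name)).strip() for name in columns]


names = ["Dew Point [°C]", "Air Temperature [°C]", "Soil Temperature"]
clean = sanitize_columns(names)
print(clean)  # ['Dew Point °C', 'Air Temperature °C', 'Soil Temperature']
```

With a DataFrame, the result can be assigned back with `df.columns = sanitize_columns(df.columns)` before fitting.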

Results

| Model | Province | MAE | MSE | RMSE | R² |
| --- | --- | --- | --- | --- | --- |
| Linear Regression | PEI | 0.93 ± 0.10 | 1.38 ± 0.29 | 1.17 ± 0.13 | 0.40 ± 0.09 |
| Linear Regression | NB | 0.73 ± 0.08 | 0.89 ± 0.20 | 0.94 ± 0.10 | 0.64 ± 0.10 |
| XGBoost | PEI | 0.73 ± 0.06 | 1.05 ± 0.22 | 1.02 ± 0.10 | 0.54 ± 0.11 |
| XGBoost | NB | 0.90 ± 0.13 | 1.21 ± 0.23 | 1.09 ± 0.11 | 0.51 ± 0.11 |

Interpretation: Linear Regression captures NB’s behavior reasonably well (R² 0.64) but generalizes less well for PEI (R² 0.40). XGBoost lifts PEI’s R² to 0.54 but lowers NB’s from 0.64 to 0.51, a drop about as large as the PEI gain. This suggests that nonlinear interactions help where the data volume is richer (PEI) but may overfit the smaller NB partition.
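
The R² comparison can be checked with a few lines of arithmetic. The values are copied from the results table; the dictionary layout and the helper `r2_change` are only illustrative.

```python
# Mean R² per (model, province), copied from the results table.
r2 = {
    ("Linear Regression", "PEI"): 0.40,
    ("Linear Regression", "NB"): 0.64,
    ("XGBoost", "PEI"): 0.54,
    ("XGBoost", "NB"): 0.51,
}


def r2_change(province):
    """R² of XGBoost minus R² of Linear Regression for one province."""
    return round(r2[("XGBoost", province)] - r2[("Linear Regression", province)], 2)


changes = {province: r2_change(province) for province in ("PEI", "NB")}
print(changes)  # {'PEI': 0.14, 'NB': -0.13}
```

Both shifts are close in size to the fold-to-fold standard deviations of roughly 0.1 reported in the table, so the ranking of the two models should be read with some caution.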

Running the Notebook

Install the core dependencies (the notebook already imports these packages):

```
python -m pip install --user pandas numpy scipy matplotlib seaborn scikit-learn xgboost notebook openpyxl
```

Launch the notebook server and open the analysis:
```
jupyter notebook FinalProject.ipynb
```
Re-run the cells to regenerate figures or tune the model (the dataset file must stay next to the notebook).
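
Before re-running, a quick environment check along these lines may save a failed first cell. It is a sketch using only the standard library; the helper names are hypothetical, and the import names are assumed to correspond to the packages in the pip command above (scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec
from pathlib import Path

# Import names for the packages listed in the install command.
REQUIRED = ["pandas", "numpy", "scipy", "matplotlib", "seaborn",
            "sklearn", "xgboost", "notebook", "openpyxl"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


def dataset_present(folder=".", filename="dataset.xlsx"):
    """Check that the spreadsheet sits in the given folder."""
    return (Path(folder) / filename).is_file()


print("Missing packages:", missing_packages() or "none")
print("Dataset found:", dataset_present())
```

Run it from the folder that contains FinalProject.ipynb so the relative path to dataset.xlsx matches what the notebook expects.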

Recommended Structure & Next Steps

Structure:

.
├── data/                # Raw and processed spreadsheets (e.g., dataset.xlsx)
├── notebooks/           # Analysis notebook(s), e.g., FinalProject.ipynb
├── README.md
└── scripts/             # Standalone helpers / conversions if needed

Consider future work:
- Merge the NB & PEI partitions into a single multi-output model with province-aware features.
- Tune XGBoost hyperparameters (grid/random search) and explore regularized linear models.
- Add a deployment-ready pipeline (data validation, serialization, scoring) or a dashboard to monitor predictions.
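
As a starting point for the tuning item, a grid search over a regularized linear model might look like the sketch below. It uses scikit-learn's Ridge and GridSearchCV on synthetic data; the alpha grid is an arbitrary assumption, and nothing here has been run against the project dataset.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic data standing in for one province's three predictors and CO2.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.2, size=120)

# Reuse the same fold settings as the benchmark so scores stay comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},  # illustrative grid
    scoring="r2",
    cv=cv,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same pattern should carry over to XGBoost by swapping in xgboost.XGBRegressor and a grid over parameters such as max_depth, learning_rate, and n_estimators, or by using RandomizedSearchCV when the grid grows large.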
