The code is developed for use in a machine learning intro course at the Norwegian Geotechnical Institute (NGI): https://www.ngi.no/. For more information about geotechnical data and applied machine learning, check out this NGI course: Introduction to Machine Learning in Geotechnics
Several Jupyter notebooks exemplify key machine learning concepts using geotechnical data. The notebooks are numbered sequentially in the repository root.
The dataset used in this project is the Oberhollenzer CPT dataset, which is a publicly available dataset containing Cone Penetration Test (CPT) data along with corresponding soil type classifications. The dataset is stored in the data/raw/ directory.
Oberhollenzer dataset: https://doi.org/10.1016/j.dib.2020.106618. The dataset can be downloaded from here.
The following soil types are supported:
- 1: gravel
- 2: fine grained organic soils
- 3: coarse grained organic soils
- 4: sand to gravel
- 5: sand
- 6: silt to fine sand
- 7: clay to silt
Some of the key features of the project are:
- Use of the uv package manager for dependency management and Python versions.
- Use of ydata-profiling for data Exploratory Data Analysis (EDA).
- Use of the XGBoost library (optionally with gpu support) for training the machine learning model.
- Use of the scikit-learn library for data preprocessing ml-algorithms (other than xgboost) and evaluation of the model.
- Use pyOD for outlier detection.
- Use imblearn for handling imbalanced datasets.
-
Clone the repository:
git clone https://github.com/norwegian-geotechnical-institute/course-machine-learning-for-geotechnics-intro.git cd course-machine-learning-for-geotechnics-intro -
Install uv: Follow the instructions at https://docs.astral.sh/uv/ to install uv.
-
Install dependencies:
uv sync
- Run the
00_setup.ipynbnotebook to download and extract the Oberhollenzer dataset. - Run the
01_eda_cpt.ipynbnotebook to preprocess the data and generate exploratory data analysis reports. The preprocessed data will be saved in thedata/model_ready/directory. - Explore machine learning fundamentals with the
02_metrics_and_ml_intro.ipynbnotebook. - Train and evaluate models by running the
03_training_evaluation.ipynbnotebook.
The project is developed by Tom F. Hansen: Mail