In real-world industrial settings, direct measurement of force may not always be feasible due to size, cost, or complexity constraints. Instead, stress sensors, which are smaller and more accessible, can provide proxy information. This project aims to predict force vectors (x, y, z) from the readings of six stress sensors, each providing three color-channel values (red, blue, yellow).
By learning a mapping from stress to force in a controlled lab setup, the model can then infer forces in new, unseen situations.
A robust machine learning pipeline is designed to:
- Ingest and preprocess multiple Excel files with sensor data.
- Merge stress and force values using time alignment (see the sketch after this list).
- Build and tune models like Random Forest, LightGBM, and CatBoost using Optuna.
- Track experiments with MLflow.
- Predict force values for new test datasets.
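
As an illustration of the time-alignment step, here is a minimal sketch using pandas. The file name, sheet names, `timestamp` column, and tolerance are assumptions for illustration, not the project's actual schema:

```python
import pandas as pd

# Illustrative only: sheet/column names and the tolerance are assumptions.
stress = pd.read_excel("data/raw/lab_result_1.xlsx", sheet_name="stress")
force = pd.read_excel("data/raw/lab_result_1.xlsx", sheet_name="force")

# merge_asof pairs each stress row with the nearest force reading in time
stress = stress.sort_values("timestamp")
force = force.sort_values("timestamp")
merged = pd.merge_asof(
    stress,
    force,
    on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("100ms"),  # assumed alignment tolerance
)
merged.to_csv("data/processed/merged_data_example.csv", index=False)
```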
```text
.                                # Root of project
├── config/
│   ├── config.yaml              # Main config file: hyperparameter tuning, directories, etc.
│   └── logging.yaml
│
├── data/                        # Data folder
│   ├── raw/                     # Raw Excel input files (e.g., lab_test_1.xlsx)
│   ├── processed/               # Processed CSV files (e.g., merged_data_timestamp.csv)
│   ├── model/                   # Saved trained models
│   ├── results/                 # Saved inference result CSVs
│   └── test/                    # Test set files for prediction
│
├── notebooks/                   # Exploratory data analysis
│   ├── eda_non_temporal.ipynb   # Main EDA notebook (assumes a non-temporal problem)
│   └── eda_temporal.ipynb       # EDA checking whether this might be a temporal problem
│
├── src/                         # Source code
│   ├── main/                    # Entry points for training and inference
│   │   ├── trainer_main.py
│   │   └── inference_main.py
│   ├── data_preprocessor.py     # Data loading, merging, cleaning
│   ├── model_trainer.py         # Model training and evaluation logic
│   └── utils/
│       └── logging.py           # Logging setup
│
├── requirements.in              # Clean, minimal list of top-level dependencies
├── requirements.txt             # Full frozen list of dependencies
├── README.md                    # Project overview and usage instructions
└── .gitignore
```
```bash
conda create -n stress_force python=3.12.7
conda activate stress_force
pip install -r requirements.txt
```
- Place your Excel files in `data/raw/`.
- Files should be named like `lab_result_1.xlsx`, `lab_result_2.xlsx`, etc.
- Each file must contain two sheets: `stress` and `force` (a quick sanity check is sketched below).
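
Assuming the workbook layout above, something like the following can confirm that both required sheets are present (the file name is just an example):

```python
import pandas as pd

# Hypothetical pre-flight check: ensure a raw workbook contains the
# two sheets the pipeline expects.
workbook = pd.ExcelFile("data/raw/lab_result_1.xlsx")
missing = {"stress", "force"} - set(workbook.sheet_names)
if missing:
    raise ValueError(f"Missing required sheet(s): {missing}")
```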
- Before running model training, ensure you have correctly set up the `config.yaml` file. This file controls the key settings for data loading, model training, and optimization.
- Directory section:

```yaml
data_dir: ./data                      # Path to the main data directory
use_existing_processed_data: true     # Set to true to skip preprocessing and use an existing processed, merged CSV
saved_model_filename: random_forest_regressor.joblib
```
- Model trainer settings:

```yaml
trainer:
  test_size: 0.2
  seed: 42
  optimize: true          # Enable Optuna hyperparameter tuning
  n_trials: 20
  models:
    randomforest:
      search_space:
        n_estimators: 100
        min_samples_split: 2
        min_samples_leaf: 1
        max_features: 'sqrt'
        bootstrap: True
```
- This example enables the Random Forest model with default parameters.
- If you want to use LightGBM or CatBoost, simply uncomment the corresponding blocks in the models section and comment out or remove the unused ones.
- The current parameters have been fine-tuned to the datasets provided; you may want to adjust them accordingly (an illustrative tuning sketch follows below).
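
For reference, a minimal sketch of what Optuna-driven tuning looks like for the Random Forest case, assuming synthetic data and example search ranges (the real ranges live in `config.yaml`, and the real training data comes from the preprocessing step):

```python
import optuna
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the merged stress/force data (18 features, 3 force targets).
X, y = make_regression(n_samples=300, n_features=18, n_targets=3, random_state=42)

def objective(trial):
    # Example search ranges, not the project's actual search_space
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 10),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 5),
        "max_features": trial.suggest_categorical("max_features", ["sqrt", "log2"]),
    }
    model = RandomForestRegressor(random_state=42, **params)
    # Minimize cross-validated MAE
    return -cross_val_score(model, X, y, scoring="neg_mean_absolute_error", cv=3).mean()

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print("Best params:", study.best_params)
```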
```bash
python -m src.main.trainer_main
```
- This will:
- Preprocess and merge data
- Tune models using Optuna (if enabled)
- Track results in MLflow
- Save trained models in `data/model/` (roughly as sketched below)
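
The tracking and saving steps boil down to something like the following self-contained sketch; the synthetic data, run name, metric name, and output file name are illustrative assumptions:

```python
import joblib
import mlflow
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the merged dataset
X, y = make_regression(n_samples=300, n_features=18, n_targets=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run(run_name="random_forest_example"):
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("mae_overall", mae)
    # The real pipeline writes into data/model/ using saved_model_filename
    joblib.dump(model, "random_forest_regressor.joblib")
```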
- Make sure your trained model is saved, and set the desired model file name in `config.yaml`. Example:

```yaml
saved_model_filename: random_forest_regressor.joblib
```
- Make sure the test Excel files are placed in the `data/test/` folder.
- Then run:

```bash
python -m src.main.inference_main
```

- This will output the predicted force values into the `data/results/` folder (roughly as sketched below).
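
A rough sketch of the inference flow, assuming each test workbook has a `stress` sheet with numeric feature columns (the output column names are also assumptions):

```python
from pathlib import Path

import joblib
import pandas as pd

model = joblib.load("data/model/random_forest_regressor.joblib")

for path in Path("data/test").glob("*.xlsx"):
    features = pd.read_excel(path, sheet_name="stress")      # assumed sheet name
    preds = model.predict(features.select_dtypes("number"))  # numeric sensor columns only
    out = pd.DataFrame(preds, columns=["force_x", "force_y", "force_z"])  # assumed names
    out.to_csv(Path("data/results") / f"{path.stem}_predictions.csv", index=False)
```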
- Once training is complete, you can inspect experiment runs, metrics, parameters, etc. using the MLflow UI:

```bash
mlflow ui
```

- This starts the UI at http://localhost:5000. Open this URL in your browser to:
- Compare runs across different models (LightGBM, CatBoost, RandomForest)
- Analyze logged metrics such as overall MAE, per-axis MAE (x, y, z), and MSE (see the sketch below)
- View hyperparameters (either default or tuned via Optuna)
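
For clarity, the per-axis MAE mentioned above can be computed like this (the arrays are made-up values, not project results):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([[1.0, 0.5, -0.2], [0.8, 0.4, -0.1]])
y_pred = np.array([[1.1, 0.4, -0.3], [0.7, 0.5,  0.0]])

# One MAE value per force axis, plus the overall (averaged) MAE
mae_x, mae_y, mae_z = mean_absolute_error(y_true, y_pred, multioutput="raw_values")
print(f"MAE x={mae_x:.3f}, y={mae_y:.3f}, z={mae_z:.3f}")
print(f"Overall MAE: {mean_absolute_error(y_true, y_pred):.3f}")
```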
| Category      | Tools Used                       |
|---------------|----------------------------------|
| Modeling      | scikit-learn, LightGBM, CatBoost |
| Optimization  | Optuna                           |
| MLOps         | MLflow, Hydra                    |
| Visualization | matplotlib, seaborn, SHAP        |