Sleggi/calorie-predictor

Calorie Prediction (Multimodal Model)

This project implements a multimodal neural network for predicting the total calorie content of dishes from both images and ingredient lists.

The model combines:

  • DistilBERT as a text encoder for ingredient descriptions.
  • ConvNeXt-Tiny as an image encoder (pretrained on ImageNet).
  • A fusion MLP regressor that integrates text, image, and dish mass into a single prediction.
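The fusion step can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's code: the class name `FusionRegressor`, the hidden size, the dropout rate, and the choice of simple concatenation are all assumptions. The 768-dimensional inputs match the usual pooled output sizes of DistilBERT-base and ConvNeXt-Tiny, and random tensors stand in for the encoder outputs so the sketch runs without downloading pretrained weights.

```python
import torch
from torch import nn


class FusionRegressor(nn.Module):
    """Hypothetical fusion head: concatenates text, image, and mass features."""

    def __init__(self, text_dim=768, image_dim=768, hidden_dim=256, dropout=0.1):
        super().__init__()
        # The "+ 1" accounts for the scalar dish mass appended to the embeddings.
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + image_dim + 1, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, text_emb, image_emb, mass):
        # text_emb: (B, text_dim), image_emb: (B, image_dim), mass: (B,)
        fused = torch.cat([text_emb, image_emb, mass.unsqueeze(-1)], dim=-1)
        return self.mlp(fused).squeeze(-1)  # (B,) predicted calories


if __name__ == "__main__":
    model = FusionRegressor()
    batch = 4
    # Random tensors stand in for the DistilBERT and ConvNeXt-Tiny outputs.
    pred = model(torch.randn(batch, 768), torch.randn(batch, 768), torch.rand(batch) * 500)
    print(pred.shape)  # torch.Size([4])
```

In practice the dish mass would likely need scaling (for example, standardization) before concatenation so that it is on a scale comparable to the embeddings.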

Dataset

The dataset consists of:

  • ingredients.csv: ingredient IDs and their names.
  • dish.csv: dish IDs, ingredient lists, dish mass, total calories, and train/test split labels.
  • images/: photo of each dish (dish_id/rgb.png).
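As a rough illustration of how these files could be joined into training records, the sketch below uses only the standard library. The column names (`id`, `ingr`, `dish_id`, `ingredients`, `total_mass`, `total_calories`, `split`), the `;`-separated ingredient ID format, and the sample rows are all assumptions made for the example; only the `dish_id/rgb.png` image layout comes from the description above.

```python
import csv
import io

# Tiny in-memory stand-ins for ingredients.csv and dish.csv.
# Column names and the ";"-separated ID format are assumptions.
INGREDIENTS_CSV = """id,ingr
1,rice
2,chicken
3,olive oil
"""

DISH_CSV = """dish_id,ingredients,total_mass,total_calories,split
dish_001,1;2,350.0,520.0,train
dish_002,2;3,200.0,410.0,test
"""


def load_dishes(ingredients_file, dish_file, image_root="images"):
    """Join dishes with ingredient names and derive each image path."""
    id_to_name = {row["id"]: row["ingr"] for row in csv.DictReader(ingredients_file)}
    dishes = []
    for row in csv.DictReader(dish_file):
        names = [id_to_name[i] for i in row["ingredients"].split(";")]
        dishes.append({
            "dish_id": row["dish_id"],
            "text": ", ".join(names),  # input for the text encoder
            "mass": float(row["total_mass"]),
            "calories": float(row["total_calories"]),  # regression target
            "split": row["split"],
            "image_path": f"{image_root}/{row['dish_id']}/rgb.png",
        })
    return dishes


dishes = load_dishes(io.StringIO(INGREDIENTS_CSV), io.StringIO(DISH_CSV))
train = [d for d in dishes if d["split"] == "train"]
print(train[0]["text"], train[0]["image_path"])
```

With real data, the two `io.StringIO` objects would be replaced by open file handles for the actual CSV files.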

Dish Example

The target variable is the total calories per dish.


Project structure

data/                # test, train, val, dish, ingredients
models/              # saved best weights
src/                 # Python scripts
solution.ipynb       # Jupyter notebook with training and evaluation

Training results

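| Epoch | Train Loss | Train MAE | Val Loss | Val MAE | Val R² |
|------:|-----------:|----------:|---------:|--------:|-------:|
| 1 | 52487.58 | 166.85 | 22876.51 | 107.38 | 0.512 |
| 2 | 17341.38 | 92.02 | 14321.95 | 79.56 | 0.695 |
| 3 | 11470.07 | 74.06 | 10737.25 | 70.32 | 0.771 |
| 4 | 9188.95 | 65.79 | 9822.49 | 64.71 | 0.791 |
| 5 | 7020.94 | 57.87 | 8103.12 | 60.30 | 0.827 |
| 7 | 5832.44 | 53.39 | 7245.34 | 55.27 | 0.845 |
| 9 | 3836.40 | 43.75 | 6097.49 | 50.52 | 0.870 |
| 12 | 3205.79 | 40.24 | 5385.90 | 47.52 | 0.885 |
| 14 | 2893.54 | 37.94 | 5271.95 | 46.76 | 0.888 |
| 15 | 2615.69 | 35.78 | 5568.53 | 48.29 | 0.881 |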
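Validation MAE bottoms out before the last epoch, which is why keeping the best weights rather than the final ones matters. The snippet below is a small sketch of that selection using the validation MAE values from the training results; the helper name `best_epoch` is hypothetical, and whether the notebook selects checkpoints by MAE or by loss is not stated in the README.

```python
# (epoch, val_mae) pairs copied from the training results.
VAL_MAE = [
    (1, 107.38), (2, 79.56), (3, 70.32), (4, 64.71), (5, 60.30),
    (7, 55.27), (9, 50.52), (12, 47.52), (14, 46.76), (15, 48.29),
]


def best_epoch(history):
    """Return the (epoch, val_mae) pair with the lowest validation MAE."""
    return min(history, key=lambda item: item[1])


epoch, mae = best_epoch(VAL_MAE)
print(epoch, mae)  # 14 46.76
```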

Final evaluation

| Metric | Score |
|:-------|------:|
| Test MAE | 56.90 |
| Test RMSE | 82.93 |
| Test R² | 0.847 |

Evaluation performed on 507 test samples after 15 training epochs.
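For reference, the three reported metrics follow the standard definitions, sketched here in plain Python. The function name `regression_metrics` and the four sample values are made up for illustration and are not taken from the project's test set.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R² for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Illustrative calorie values only.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 320.0, 380.0]
print(regression_metrics(y_true, y_pred))
```

RMSE penalizes large errors more heavily than MAE, so a test RMSE (82.93) well above the test MAE (56.90) is consistent with a few dishes having large prediction errors.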


Summary

The multimodal model successfully learned to integrate visual information (images), semantic information (ingredients), and numerical features (dish mass) to predict dish calories.

Key takeaways:

  • Validation MAE fell to 47.5 kcal by epoch 12 and reached its lowest value, 46.8 kcal, at epoch 14.
  • On the test set the model achieved an MAE of 56.9 kcal (below 60 kcal) and an R² of 0.847, indicating that it generalizes reasonably well to unseen dishes.
  • Adding dish mass as a feature significantly improved performance, confirming its strong correlation with calorie content.
  • The experiment demonstrates the effectiveness of combining pretrained transformers and CNNs for real-world multimodal regression tasks.
