DENSE SENSE: A Novel Approach Utilizing an Electron Density Augmented Machine Learning Paradigm to Understand a Complex Odour Landscape
Creating environment:
conda create --name my_env --file requirements.txt
The data files that are used for training are as follows:
DMPNNpruned_without hydrogen_curated_GS_LF_merged_4812_QM_cleaned.csv
curated_GS_LF_merged_4983.csv
DMPNNpruned_graph_data_cleaned.npz
graph_data_cleaned.npz
The PyTorch data listed above is created using the notebook Pyg_data_creator_for_cleaned.ipynb.
The results of the QNN model are impressive on this challenging odour label classification task, given that only electron localization and delocalization data were provided to it. The DMPNN+LDM model gives the best results among all the GNNs, achieving a validation score of 0.871, and is competitive with openPOM.
We then employed an ensemble approach that combines graph neural networks for improved performance. We first explored combining DMPNN models, as they demonstrated the best results. We tested ensembles of 10 and 30 DMPNNs and aggregated their results by averaging their predictions. The AUROC metric was used to evaluate model performance. We tested two cases in which we varied the random seeds and one without random seed variation.
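The averaging step described above can be illustrated with a minimal sketch. It is not the code from the notebooks: the function name `average_ensemble`, the nested-list layout, and the toy probabilities are all assumptions made for illustration, and only the Python standard library is used.

```python
from statistics import fmean


def average_ensemble(member_predictions):
    """Average per-label probabilities across ensemble members.

    member_predictions[m][i][k] is the probability that ensemble member m
    assigns to odour label k for molecule i (hypothetical layout).
    """
    if not member_predictions:
        raise ValueError("need at least one ensemble member")
    n_molecules = len(member_predictions[0])
    n_labels = len(member_predictions[0][0])
    return [
        [
            # Mean over members for one (molecule, label) pair.
            fmean(member[i][k] for member in member_predictions)
            for k in range(n_labels)
        ]
        for i in range(n_molecules)
    ]


# Toy data: 3 members, 2 molecules, 2 labels (values are made up).
preds = [
    [[0.9, 0.2], [0.1, 0.8]],
    [[0.7, 0.4], [0.3, 0.6]],
    [[0.8, 0.3], [0.2, 0.7]],
]
ensemble = average_ensemble(preds)
```

In this sketch each member would be a separately trained model (for example, one DMPNN per random seed), and the averaged matrix is what would be scored with AUROC.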
The notebooks for the 5-fold cross-validation of the above models are as follows:
The notebooks for the ensemble of models are as follows:
- Homogenous Bagging of GCN+MPNN+DMPNN with Random Seed = 42
- Homogenous Bagging of GCN+MPNN+DMPNN with Random Seed = 1
- 10 DMPNNs
- 30 DMPNNs
DMPNNFeaturizer() and LDM data, for a given SMILES string and odour label.
The best way to validate the explainability analysis was to apply it to compounds with functional group-based odour labels, e.g., ketonic and phenolic. Explainability for such odour labels is straightforward: the functional group is the part of the molecule responsible for its odorous property, so the analysis must highlight the functional group as the odourgenic region of the molecule.
Our explainability analysis indeed found this to be true. We took the functional group odour labels that our model was able to predict successfully (per-label AUROC > 0.8). The code is available in the Dense_Sense_Explainability_g1.ipynb notebook.
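The label selection mentioned above (per-label AUROC > 0.8) can be sketched as follows. This is an illustrative, standard-library version and not the notebook's code: the helper names `auroc` and `well_predicted_labels` and the toy targets and scores are assumptions, and a library routine such as scikit-learn's `roc_auc_score` would normally replace the hand-written AUROC.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Returns None when only one class
    is present, since AUROC is undefined in that case.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def well_predicted_labels(label_names, y_true, y_score, threshold=0.8):
    """Return {label: AUROC} for labels whose AUROC exceeds the threshold."""
    selected = {}
    for k, name in enumerate(label_names):
        score = auroc([row[k] for row in y_true], [row[k] for row in y_score])
        if score is not None and score > threshold:
            selected[name] = score
    return selected


# Toy data: 4 molecules, 2 functional-group labels (values are made up).
labels = ["ketonic", "phenolic"]
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.6], [0.2, 0.4], [0.8, 0.7], [0.1, 0.5]]
selected = well_predicted_labels(labels, y_true, y_score)
```

With these toy values only the first label passes the threshold; the second is ranked no better than chance and is dropped.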
Pinaki Saha, University of Hertfordshire, UH Biocomputation Group, United Kingdom
Mrityunjay Sharma, CSIR-CSIO, Chandigarh, India
Sarabeshwar Balaji, Indian Institute of Science Education and Research Bhopal (IISERB), India
Aryan Amit Barsainyan, National Institute of Technology Karnataka Surathkal, Karnataka, India
Ritesh Kumar, CSIR-CSIO, Chandigarh, India
Volker Steuber, University of Hertfordshire, UH Biocomputation Group, United Kingdom
Michael Schmuker, Helmholtz-Gemeinschaft, Berlin, Germany
To cite this work, please use this BibTeX entry:
@article{saha2025dense,
title={DENSE SENSE: A novel approach utilizing an electron density augmented machine learning paradigm to understand a complex odour landscape},
author={Saha, Pinaki and Sharma, Mrityunjay and Balaji, Sarabeshwar and Barsainyan, Aryan Amit and Kumar, Ritesh and Steuber, Volker and Schmuker, Michael},
year={2025}
}