TLRL

This repository contains the code for performing transfer learning + representation learning to predict materials properties using elemental fractions (EF), physical attributes (PA) or extracted features as the model input. The code provides the following functions:

Train a Neural Network model on a given dataset.
Use a pre-trained Neural Network model to perform transfer learning + representation learning on a given target dataset.
Predict material properties of new compounds with a new pre-trained Neural Network model.

It is recommended to train on large datasets (e.g. OQMD, MP) from scratch (SC) and on small datasets (DFT-computed or experimental datasets) using transfer learning + representation learning methods.

Installation Requirements

The basic requirement for using the files is a Python 3.6.3 Jupyter environment with the packages listed in requirements.txt. It is advisable to create a virtual environment with the correct dependencies.

The experiments for this work were performed on Linux Fedora 7.9 Maipo. The code should work on other operating systems as well, but it has not been tested elsewhere.

Source Files

Here is a brief description of the folder contents:

neuralnetwork: code for training a Neural Network model from scratch or using a pre-trained Neural Network model to perform transfer learning + representation learning.
representation: Jupyter Notebook to perform feature extraction from a specific layer of a pre-trained ElemNet model. We have also provided the code to convert the chemical formula of a compound into elemental fractions and physical attributes.
prediction: Jupyter Notebook to perform prediction using the pre-trained Neural Network model.

Example Use

Create a customized dataset

To use different representations (such as elemental fractions, physical attributes or extracted features) as input to ElemNet, you will need to create a customized dataset using a .csv file. You can prepare a customized dataset in the following manner:

Prepare a .csv file that contains two columns. The first column contains the compound, and the second column contains the value of the target property, as shown below:

pretty_comp	target property
KH2N	-0.40
NiTeO4	-0.82

Use the Jupyter Notebook in the representation folder to pre-process the first column of the .csv file, converting it into elemental fractions or physical attributes, or extracting features with a pre-trained Neural Network model. The example above, converted into elemental fractions, becomes:

pretty_comp	H	...	Pu	target property
KH2N	0.5	...	0	-0.40
NiTeO4	0	...	0	-0.82
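The conversion itself is handled by the notebook in the representation folder. As a rough illustration of what an elemental fraction is, the following minimal sketch parses a flat formula and normalizes the element counts. The function name `elemental_fractions` and the regular expression are assumptions made for this example, not the repository's code, and formulas with parentheses are not handled.

```python
import re
from collections import Counter

# One element symbol (capital letter plus optional lowercase letter),
# followed by an optional integer or decimal count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def elemental_fractions(formula):
    """Return {element: fraction} for a flat formula such as 'KH2N'.

    Illustrative only: parentheses, hydrates, and charges are not parsed.
    """
    counts = Counter()
    for symbol, amount in TOKEN.findall(formula):
        # A missing count means the element appears once.
        counts[symbol] += float(amount) if amount else 1.0
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}


print(elemental_fractions("KH2N"))  # {'K': 0.25, 'H': 0.5, 'N': 0.25}
```

The H entry of 0.5 for KH2N matches the H column in the table above. Elements absent from the formula would be written as 0 in the full row.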

Split the customized dataset into train, validation and test sets. We have used the train_test_split function of the sklearn library with a random seed of 1234567 to perform the train/validation/test split in our work.
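A minimal sketch of such a split is shown below. Only the use of train_test_split and the seed 1234567 come from the text above. The 80/10/10 proportions, the two-step procedure, and the helper name `split_rows` are assumptions for illustration, since the split ratios are not stated here.

```python
from sklearn.model_selection import train_test_split

SEED = 1234567  # random seed quoted in the text


def split_rows(rows, val_fraction=0.1, test_fraction=0.1, seed=SEED):
    """Split rows into train/validation/test lists.

    The 80/10/10 default is an assumption for illustration only.
    """
    n_val = int(round(len(rows) * val_fraction))
    n_test = int(round(len(rows) * test_fraction))
    # Hold out the test set first, then carve the validation set out of
    # the remainder. Integer sizes avoid floating-point rounding surprises.
    remaining, test = train_test_split(rows, test_size=n_test, random_state=seed)
    train, val = train_test_split(remaining, test_size=n_val, random_state=seed)
    return train, val, test


rows = list(range(100))  # stand-in for the rows of the customized dataset
train, val, test = split_rows(rows)
print(len(train), len(val), len(test))  # 80 10 10
```

Fixing `random_state` makes the split reproducible, so the same rows land in the same subset on every run.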

We have provided an example of customized datasets in the repository: data/sample. Here we have converted the first column of the .csv file into elemental fractions. Note that this is required for both training and predicting.

Run Neural Network model

The code to run the Neural Network model is provided in the neuralnetwork folder. To run the model, pass a sample config file to dl_regressors_tf2.py (or dl_regressors_tf2_tlnewinput.py for TL+RL) from inside the neuralnetwork directory:

python dl_regressors_tf2.py --config_file sample/sample-run_example_tf2.config (without TL+RL)
python dl_regressors_tf2_tlnewinput.py --config_file sample/sample-run_example_tf2.config (with TL+RL)

The config file defines all the related hyperparameters associated with the model training and model testing such as loss_type, training_data_path, val_data_path, test_data_path, label, input_type, etc.

To use pre-trained weights, set 'model_path' in the config file [e.g. model/sample_model].

To perform TL+RL using dl_regressors_tf2_tlnewinput.py, set the 'pretrainedmodel_input' hyperparameter of the config file to the inputs used to train the pre-trained model.

To add customized input_type, please make changes to the data_utils.py as follows:

Add a new array with the required columns (preferably near the place where other arrays are defined). For example, suppose you have the following .csv file, where columns a, b, c, d are used to predict pred:

a	b	c	d	pred
0.1	0.3	0.5	0.7	0.9
0.2	0.4	0.6	0.8	1.0

In this case, you can add new_input = ['a','b','c','d'] to the file.

Add the array variable to the input_atts dictionary so that it can be used with input_type of the config (and pred used with label). For example:

input_atts = {'new_input':new_input, 'elements':elements, ... , 'input32':input32}
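The two steps above can be sketched as follows. This is a standalone illustration of the pattern, not the contents of data_utils.py: the helper `load_xy` is hypothetical, and the real input_atts dictionary contains many more entries than the single one shown here.

```python
import csv
import io

# Step 1: the new array of input column names.
new_input = ['a', 'b', 'c', 'd']

# Step 2: register it so that input_type='new_input' can find it.
# (The real dictionary also holds 'elements', 'input32', and others.)
input_atts = {'new_input': new_input}


def load_xy(csv_text, input_type, label):
    """Return (X, y) using the columns registered under input_type.

    Hypothetical helper showing how input_type and label select columns.
    """
    columns = input_atts[input_type]
    X, y = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        X.append([float(row[c]) for c in columns])
        y.append(float(row[label]))
    return X, y


sample = "a,b,c,d,pred\n0.1,0.3,0.5,0.7,0.9\n0.2,0.4,0.6,0.8,1.0\n"
X, y = load_xy(sample, input_type='new_input', label='pred')
print(X)  # [[0.1, 0.3, 0.5, 0.7], [0.2, 0.4, 0.6, 0.8]]
print(y)  # [0.9, 1.0]
```

With this in place, the config would use new_input as its input_type and pred as its label.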

After training, you will get the following files:

The output log from the training will be saved in the log folder as log/sample-run_example_tf2.log file.
The trained model will be saved in the model folder as model/sample-run_example_tf2.h5 and model/sample-run_example_tf2.json. The .h5 file contains the weight values and the .json file contains the model's architecture.

We also save the model in the newer TensorFlow SavedModel format in the sample folder, as the sample-run_example_tf2 folder. The model architecture and training configuration (including the optimizer, losses, and metrics) are stored in saved_model.pb. The weights are saved in the variables/ directory. For more information, see the TensorFlow documentation on the SavedModel format.

The above command runs a default task with an early stopping of 200 epochs on a small dataset of the target property (formation energy). This sample task can be used without any changes, so that the user can get an idea of how the Neural Network model works. The sample task should take about 1-2 minutes on average when a GPU is available and give a test set MAE of 0.29-0.32 eV after the model training is finished.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, the size of the dataset, the target property being modelled, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.
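For reference, the MAE reported above is the mean absolute difference between predicted and true target values. The sketch below computes it in plain Python. The function name and the numbers are placeholders chosen for illustration, not outputs of the sample task.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of |true - predicted| over all samples."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Placeholder values (in eV), not results from the repository.
y_true = [-0.40, -0.82]
y_pred = [-0.30, -1.02]
print(round(mean_absolute_error(y_true, y_pred), 2))  # 0.15
```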

GuptaVishu2002/TLRL
