A Multiobjective Closed-loop Approach Towards Autonomous Discovery of Electrocatalysts for Nitrogen Reduction
Data and scripts in support of the publication "A Multiobjective Closed-loop Approach Towards Autonomous Discovery of Electrocatalysts for Nitrogen Reduction", Kavalsky et al. (2023). DOI: 10.26434/chemrxiv-2023-vmbt3-v2.
The repository is organized as follows:
- acsl.json: autocat.learning.sequential.SequentialLearner object containing all historical data from the sequential learning search. This may be read using the SequentialLearner.from_json method.
- acds.json: autocat.learning.sequential.DesignSpace object containing all structures within the design space (with calculated labels where available). This may be read using the DesignSpace.from_json method.
- dft_data.db: ase.db database containing all of the generated DFT data from the search, with entries in the Physical Information File (PIF) format. This may be read using ase.db.connect with type="json".
- ELEMENTS.json: JSON file containing all chemical species considered in this study.
- raw_volc_m_b.csv: Slopes and intercepts to reproduce the activity volcano used in the paper, taken from "The challenge of electrochemical ammonia synthesis: a new perspective on the role of nitrogen scaling relations", Montoya et al., ChemSusChem 8 (13), 2180-2186 (2015). DOI: 10.1002/cssc.201500322
- Text files with the BEE energy ensembles for each system that was autonomously identified during the search.
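The loading calls named above can be combined into one helper. The following is a minimal sketch, not code from the repository: the function name `load_search_data` and the `data_dir` argument are hypothetical, and it assumes the three files sit together in one directory. Only `SequentialLearner.from_json`, `DesignSpace.from_json`, and `ase.db.connect` with `type="json"` come from the descriptions above.

```python
from pathlib import Path

# Files described in the list above; their directory is NOT assumed here,
# the caller passes it in explicitly.
DATA_FILES = ("acsl.json", "acds.json", "dft_data.db")


def load_search_data(data_dir):
    """Return (learner, design_space, database) loaded from data_dir."""
    data_dir = Path(data_dir)

    # Fail early with a clear message if any file is missing.
    missing = [name for name in DATA_FILES if not (data_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {data_dir}: {', '.join(missing)}")

    # Imported lazily so the path check above runs even without these packages.
    from ase.db import connect
    from autocat.learning.sequential import DesignSpace, SequentialLearner

    learner = SequentialLearner.from_json(str(data_dir / "acsl.json"))
    design_space = DesignSpace.from_json(str(data_dir / "acds.json"))
    # The database is stored in ASE's JSON backend, hence type="json".
    database = connect(str(data_dir / "dft_data.db"), type="json")
    return learner, design_space, database
```

With the packages from requirements.txt installed, `learner, design_space, database = load_search_data("path/to/data")` would return the three objects, and `database.count()` gives the number of stored DFT entries.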
- get_aq_hist.py: Script for extracting the acquisition scores and prediction uncertainties as a function of sequential learning (SL) iteration into a text file.
- make_aq_hist_plot.py: Script to generate a plot of candidate acquisition scores and uncertainties against SL iteration.
If these scripts are run as-is, they will reproduce Figure 3b from the paper.
- manage_dft_calculations.py: Script for managing high-throughput adsorption energy calculations on a computing cluster using fireworks. It ensures that the clean slabs are relaxed first, before the adsorbate is placed.
- reference_energies.json: Tabulated reference energies used to calculate $\Delta G_{\mathrm{N}}$ from the DFT total energies of the relaxed systems.
- sl_driver.py: Script for driving the guided candidate selection with SL. It automatically re-trains the machine learning surrogate, re-calculates the acquisition scores, and suggests the next candidate system for evaluation.
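To illustrate how tabulated reference energies typically enter $\Delta G_{\mathrm{N}}$, here is a minimal sketch of the generic adsorption free-energy bookkeeping. It is an assumption, not the expression used in the paper or the layout of reference_energies.json: the function name, its arguments, and the lumped `free_energy_correction` term are all hypothetical, and the numbers in the example are made up.

```python
def delta_g_n(e_slab_with_n, e_clean_slab, e_n_reference, free_energy_correction=0.0):
    """Generic adsorption free energy of N in eV (illustrative form only).

    e_slab_with_n: DFT total energy of the relaxed slab with adsorbed N
    e_clean_slab: DFT total energy of the relaxed clean slab
    e_n_reference: reference energy per N atom (for example, half the
        total energy of gas-phase N2); which reference the paper uses
        is not stated here
    free_energy_correction: lumped zero-point and entropic terms
    """
    return e_slab_with_n - e_clean_slab - e_n_reference + free_energy_correction


# Made-up energies, chosen only to show the arithmetic.
example = delta_g_n(-105.0, -100.0, -4.0, free_energy_correction=0.25)
print(example)  # -0.75
```

A more negative value corresponds to stronger nitrogen binding under this sign convention.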
- extract_obj_space_hist.py: Extracts the HHI, segregation energies, and $\Delta G_{\mathrm{N}}$ of both the systems in the initial training set and the candidates as a function of SL iteration into text files.
- make_obj_space_hist_plot.py: Script for generating two subplots. The first shows the activity volcano with the candidates. The second shows normalized HHI against segregation energy. In both subplots, candidates are colored by SL iteration.
If these scripts are run as-is, they will reproduce Figure 4 from the paper.
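An activity volcano defined by slopes and intercepts, as stored in raw_volc_m_b.csv, can be evaluated as an envelope of straight lines. The sketch below assumes the volcano is the pointwise minimum of the lines $m x + b$ and that the CSV has header columns named `m` and `b`; both are assumptions, as are the function names, and the numbers are invented rather than taken from the file.

```python
import csv
import io


def read_lines(csv_text, slope_col="m", intercept_col="b"):
    """Parse (slope, intercept) pairs; the column names are assumptions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row[slope_col]), float(row[intercept_col])) for row in reader]


def volcano(delta_g_n, lines):
    """Lower envelope of the linear legs: min over m * x + b."""
    return min(m * delta_g_n + b for m, b in lines)


# Invented two-leg volcano, NOT the values in raw_volc_m_b.csv.
example_csv = "m,b\n1.0,-0.5\n-1.0,-1.5\n"
legs = read_lines(example_csv)

for x in (-1.0, -0.5, 0.0):
    print(x, volcano(x, legs))
```

For these invented legs the two lines cross at $x = -0.5$, which is where the envelope peaks; a candidate's distance from the peak can then be measured along the descriptor axis.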
- extract_obj_space_hist.py: Extracts the HHI, segregation energies, and $\Delta G_{\mathrm{N}}$ of both the systems in the initial training set and the candidates as a function of SL iteration into text files.
- make_obj_filter_hist_plot.py: Script to generate a plot of normalized HHI against segregation energy, color-coded by distance from the volcano peak.
If these scripts are run as-is, they will reproduce Figure S1 from the paper.
- get_ranking.py: Calculates the partial scores ($c_j^{\mathrm{active}}$, $A_j$, $C_j$) and total ranking scores ($RS_j$) for all candidates and extracts the data into a text file.
- make_ranking_plot.py: Script for generating the ranking plot of the top 5 identified candidates.
If these scripts are run as-is, they will reproduce Figure 5 from the paper.
- L1_EMBEDDING.txt: Contains the UMAP embeddings of all systems in the considered SAA design space that were used in the paper.
- make_umap_plot_initial_only: Script for generating a plot of the UMAP projection with only the initial training points highlighted (Figure 1d in the paper).
- make_umap_plot.py: Script for generating a plot of the UMAP projection with the initial training points highlighted alongside the identified candidates as a function of iteration (Figure 3a in the paper).
- umap_calc.py: Calculates UMAP embeddings for the SAA design space using magpie featurization. N.B. because UMAP is stochastic, running this script as-is does not guarantee embeddings identical to those provided in L1_EMBEDDING.txt, but the overall trends should remain.
The required packages for executing the scripts are specified in requirements.txt and can be installed in a new environment (e.g. using conda) as follows:
$ conda create -n multi_obj_search python=3.10
$ conda activate multi_obj_search
$ pip install -r requirements.txt
The scripts are all written in Python and can be run from the command line. For example:
$ cd scripts/aq_hist_plot
$ python get_aq_hist.py