Bayesian optimisation workflow for fitting Hubbard-U parameters to ternary alloy band-gap targets (and tracking auxiliary metrics such as effective mass and short-band agreement).
The main driver, process_ternary.py, does the following:
- Selects one ternary system (InGaAs, InGaSb, InAsSb, or GaAsSb).
- Uses data_int.hdf5 as a discrete U-parameter grid for the two binaries in that ternary.
- Proposes a new grid point with Bayesian optimisation.
- Builds orbital-resolved U values and writes them into QE input files under intermediate/<ternary>/....
- Reads resulting QE XML outputs to compute:
  - band-gap mismatch loss (loss1),
  - binary short-band deviation (loss0, via delta_*),
  - optional effective-mass metrics (meff_*).
- Appends results to data_<ternary>.hdf5.
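The step from a Bayesian proposal to a grid point can be pictured as a nearest-neighbour lookup. The sketch below is only an illustration of that idea, assuming one sorted one-dimensional grid per binary. The function names and the grid values are hypothetical and are not taken from process_ternary.py or data_int.hdf5.

```python
from bisect import bisect_left


def snap_to_grid(value, grid):
    """Return the index of the entry in the sorted list `grid` closest to `value`."""
    if not grid:
        raise ValueError("grid must not be empty")
    pos = bisect_left(grid, value)
    if pos == 0:
        return 0
    if pos == len(grid):
        return len(grid) - 1
    # Pick the nearer of the two neighbours; ties go to the lower index.
    before, after = grid[pos - 1], grid[pos]
    return pos - 1 if value - before <= after - value else pos


def snap_point(point, grids):
    """Snap one continuous proposal per binary onto that binary's grid."""
    return {name: snap_to_grid(point[name], grids[name]) for name in point}


# Hypothetical grids (assumption); the real values live in data_int.hdf5.
grids = {"InAs": [0.0, 0.5, 1.0, 1.5], "GaAs": [0.0, 1.0, 2.0]}
proposal = {"InAs": 0.8, "GaAs": 1.4}
print(snap_point(proposal, grids))
```

Returning an index rather than the grid value keeps the result usable as a lookup key, in the spirit of the per-binary index that the workflow stores in its output file.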
- process_ternary.py: Main orchestration script.
- run_test.sh: Slurm launcher example.
- data_int.hdf5: Precomputed U-grid data used for Bayesian search bounds and lookup.
- binary.hse.data.hdf5: Binary HSE reference eigenvalues used in find_delta.
- cube_donati_<ternary>.csv: Experimental/reference Eg points for each ternary.
- intermediate/<ternary>/<composition>/...: QE inputs/outputs for each composition point.
- local/: Helper modules called by process_ternary.py.
process_ternary.py directly imports and uses:
- local/U_standard_main.py
  - Convert between nested U dictionaries and flattened forms.
- local/make_U.py
  - Interpolate/assemble U values and inject them into .in files.
- local/bandgaps_alloy_bands.py
  - Parse QE XML files and extract Eg values.
- local/do_loss.py
  - Compute sorted pointwise Eg differences and weighted L2 loss.
- local/bayes_class.py
  - Wrapper around bayes_opt for parameter suggestion/registration.
- local/meff.py
  - Parse XML eigenvalues and estimate effective mass.
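As an illustration of the loss described for local/do_loss.py, here is a minimal sketch of a weighted L2 loss over pointwise Eg differences. It assumes the points are paired and sorted by composition, and that the loss is normalised by the sum of the weights. Both choices, the function names, and the numbers are assumptions for illustration; the actual module may differ.

```python
import math


def sorted_differences(eg_calc, eg_ref):
    """Pair calculated and reference gaps by composition key, in sorted key order."""
    keys = sorted(set(eg_calc) & set(eg_ref))
    return [(k, eg_calc[k] - eg_ref[k]) for k in keys]


def weighted_l2(diffs, weights=None):
    """Return sqrt(sum_i w_i d_i^2 / sum_i w_i); uniform weights if none are given."""
    if not diffs:
        raise ValueError("no overlapping composition points")
    if weights is None:
        weights = [1.0] * len(diffs)
    num = sum(w * d * d for w, (_, d) in zip(weights, diffs))
    return math.sqrt(num / sum(weights))


# Placeholder numbers for illustration only, not real band gaps.
eg_calc = {0.0: 1.0, 0.5: 2.0, 1.0: 3.0}
eg_ref = {0.0: 1.0, 0.5: 1.0, 1.0: 1.0}
diffs = sorted_differences(eg_calc, eg_ref)
print(diffs)
print(weighted_l2(diffs))
```

Passing a weights list makes it possible to emphasise selected compositions, for example the binary end points.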
Typical Python dependencies:
- numpy
- h5py
- bayesian-optimization (imported as bayes_opt)
Runtime/external dependencies:
- Quantum ESPRESSO (pw.x) available in PATH.
- MPI launcher (mpirun).
- Cluster environment scripts used in this project:
  - ~/tools/setup_data_qe7.3.1.sh
  - setup_conda.sh
Example (adapt to your cluster setup):
source ~/tools/setup_data_qe7.3.1.sh
source setup_conda.sh
conda activate alloy_bayes

Run directly:
./process_ternary.py InGaAs

Submit through Slurm:
sbatch run_test.sh

General usage:
python process_ternary.py [TERNARY]
- TERNARY options: InGaAs, InGaSb, InAsSb, GaAsSb
- Default if omitted: InGaAs
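The command-line contract above (one optional positional argument, four allowed values, InGaAs as the default) could be expressed with argparse as sketched below. This is only one way to implement it and is not necessarily how process_ternary.py parses its arguments. The names TERNARIES and parse_args are hypothetical.

```python
import argparse

# Allowed ternary systems, as listed in the usage text.
TERNARIES = ("InGaAs", "InGaSb", "InAsSb", "GaAsSb")


def parse_args(argv=None):
    """Parse an optional positional TERNARY argument, defaulting to InGaAs."""
    parser = argparse.ArgumentParser(
        description="Select the ternary system to optimise."
    )
    parser.add_argument(
        "ternary",
        nargs="?",            # the argument may be omitted
        default="InGaAs",
        choices=TERNARIES,    # anything else exits with a usage error
        help="ternary system (default: %(default)s)",
    )
    return parser.parse_args(argv)


print(parse_args([]).ternary)
print(parse_args(["GaAsSb"]).ternary)
```

Restricting the argument with choices rejects a misspelt system name before any QE input is touched.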
At the moment, ternary_proc() calls iteration_test(...), which does not launch QE runs (the call to local/run_all.sh is commented out there). It assumes output files already exist under intermediate/<ternary>/... and only parses and evaluates them.
If you want full run-and-evaluate behavior, switch to iteration(...) in ternary_proc().
Required inputs:
- cube_donati_<ternary>.csv
- data_int.hdf5
- binary.hse.data.hdf5
- Prepared intermediate/<ternary>/... folder structure and QE files
Outputs:
- data_<ternary>.hdf5
  - Group lambda/<binary>/index
  - Group lambda/<binary>/delta
  - Group Eg/<material> with attrs (Eg_exp, x, y)
  - Group ell/ell1 (loss1) and ell/ell0 (loss0)
- intermediate/<ternary>/data_Eg.csv
- intermediate/<ternary>/intermediate.csv
- Missing XML parsing data:
  - Ensure QE XML exists in tmp/ or fallback .save/data-file-schema.xml paths.
- MPI/Slurm launch failures:
  - Verify partition/QoS/time settings in run_test.sh.
  - Confirm SLURM_NPROCS is set and consistent with allocation.
- Empty or unexpected HDF5 output:
  - Check that composition directories and input templates exist under intermediate/<ternary>/.
  - Confirm placeholder replacement by local/make_U.py is occurring.
- Pseudopotential directory is incorrect for the user's system:
  - Change the directory in the QE input files and re-tar them.
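For the missing-XML case, a quick check of which file would be picked up can save time. The sketch below assumes the layout implied above: an XML file directly under tmp/, otherwise a data-file-schema.xml inside a *.save directory. The function name find_qe_xml and the exact search order are assumptions, not the logic used by the parsing modules in local/.

```python
from pathlib import Path


def find_qe_xml(calc_dir):
    """Return the first QE XML file found for one calculation directory, or None."""
    calc_dir = Path(calc_dir)
    if not calc_dir.is_dir():
        return None
    # First choice: any XML file directly under tmp/.
    primary = sorted((calc_dir / "tmp").glob("*.xml"))
    if primary:
        return primary[0]
    # Fallback: data-file-schema.xml inside any *.save directory below calc_dir.
    fallback = sorted(calc_dir.glob("**/*.save/data-file-schema.xml"))
    if fallback:
        return fallback[0]
    return None


# Prints None when nothing is found, which points at a missing or failed QE run.
print(find_qe_xml("."))
```

Running this for each composition directory shows which points still lack QE output.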
- .gitignore excludes many generated files (*.csv, *.txt, intermediate directories, cache directories, etc.).
- Several helper and analysis scripts are present (loss/, U_valley/, conversion utilities), but the core optimisation entry point is process_ternary.py.