Code and data for the publication "Can simple exchange heuristics guide us in predicting magnetic properties of solids?"

kaueltzen/paper-exchange-heuristics-in-magnetic-materials

General

This repository contains code and data of the publication Can simple exchange heuristics guide us in predicting magnetic properties of solids?
We review a popular structure-magnetism heuristic on datasets of magnetic structures (statistical_analysis) and investigate whether we can utilize the heuristic to predict magnetic structures (featurization, models).

Preprint available at https://doi.org/10.26434/chemrxiv-2025-xj84d .

The code has been written by K. Ueltzen with contributions from P. Benner, J. George and A. Naik.

Installation

To run all scripts except two (see below), create a new Python 3.10 environment and pip install the packages from requirements.txt. Then install the project-specific utilities by running pip install ./utils_kga from this location.

Two scripts (2_add_automatminer_features.py for MAGNDATA and 2_add_automatminer_features.py for MP) use the Automatminer package for featurization. For those, create a separate environment from requirements_automatminer.txt.

Large parts of the data and figures produced by the scripts of this repository are included in the repository in gzipped form. To unzip all files, run find . -type f -name '*.gz' -execdir gunzip '{}' \; . The total repository size with all files unzipped is about 1.6 GB. Please note that not all data produced by the scripts is uploaded (e.g., redundant figures in different formats), but the missing files can easily be regenerated with the scripts.
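If find or gunzip is not available (e.g., on Windows), the same unpacking step can be approximated in Python. The following is a minimal sketch using only the standard library; the function name gunzip_all is hypothetical and not part of this repository. Unlike plain gunzip, it overwrites an existing target file without asking.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(root: Path) -> list[Path]:
    """Decompress every *.gz file below root in place and delete the archive."""
    unpacked = []
    # Collect the archives first so the directory tree is not modified while walking it
    for archive in sorted(root.rglob("*.gz")):
        if not archive.is_file():
            continue
        target = archive.with_suffix("")  # "data.json.gz" -> "data.json"
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        archive.unlink()  # mimic gunzip, which removes the archive after unpacking
        unpacked.append(target)
    return unpacked
```

Calling gunzip_all(Path(".")) from the repository root would then unpack all archives below it.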

License

All code (including all *.py and *.ipynb files) is licensed under the BSD-3 License (LICENSE_CODE), while the data of this repository is released under the CC-BY license (LICENSE_DATA).
The CC-BY license does not cover the raw data of the MAGNDATA database [1,2] (mcifs), which is released without a license; we were permitted to use and process it.

Project structure

Generally, scripts are numbered to indicate the order of execution.

data_retrieval_and_preprocessing_MAGNDATA

This folder contains all scripts to

  • retrieve all commensurate magnetic structures from the MAGNDATA database (1_commensurate_MAGNDATA_crawler.py)
  • filter and analyze database entries (2_filter_convert_get_provenance_and_coordination_features_commensurate_MAGNDATA.py)
    • represent entries as pymatgen Structure and as CoordinationFeatures objects containing information on their connectivity
    • collect metadata like the magnetic transition temperature
    • exclude erroneous entries (e.g., those with unphysically short distances; for a detailed list, see the script)
    • convert entries (e.g., convert D to H; for a detailed list, see the script)
  • find a crystallographically unique subset (3_multiples_elimination.py) -> this is especially relevant to avoid data leakage in the machine learning part
    • identify entries with the same crystallographic structure (without magnetic information)
    • out of groups of crystallographic multiples, select one entry as per highest magnetic transition / experiment temperature > newest publication > pseudo-random choice
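The selection priority for groups of crystallographic multiples can be illustrated with a short sketch. It is not the code of 3_multiples_elimination.py: the dictionary keys "id", "temperature" and "year" are hypothetical, and entries with missing temperatures are not handled.

```python
import random


def choose_representative(group, seed=0):
    """Pick one entry from a group of crystallographic multiples.

    Priority: highest temperature > newest publication year > seeded random choice.
    Each entry is a dict with the hypothetical keys "id", "temperature" and "year".
    """
    best_temperature = max(e["temperature"] for e in group)
    candidates = [e for e in group if e["temperature"] == best_temperature]
    newest_year = max(e["year"] for e in candidates)
    candidates = [e for e in candidates if e["year"] == newest_year]
    # Sort first so the seeded choice does not depend on the input order
    candidates.sort(key=lambda e: e["id"])
    return random.Random(seed).choice(candidates)


group = [
    {"id": "a", "temperature": 120.0, "year": 2001},
    {"id": "b", "temperature": 300.0, "year": 1998},
    {"id": "c", "temperature": 300.0, "year": 2015},
]
print(choose_representative(group)["id"])  # c
```

Entries "b" and "c" tie on temperature, so the newer publication decides; the pseudo-random choice is only reached when both criteria tie.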

The subfolder data contains the main output of these scripts (df_grouped_and_chosen_commensurate_MAGNDATA.json), which is required for the statistical analysis and the ML, as well as the log of MAGNDATA structures that are excluded in 2_filter_convert_get_provenance_and_coordination_features_commensurate_MAGNDATA.py.

Please note that preprocessing of the MP database is done in 1_get_coordinationnet_features_of_MP_database.py in statistical_analysis/MP.

statistical_analysis

This folder contains all scripts to replicate the analysis results for both the MAGNDATA and MP datasets presented in the Sections Statistical Analysis, Outlook and the SI of the paper.

MAGNDATA

MP

MP_via_api

In addition to the MagneticOrderingsWorkflow dataset by Frey et al. (see above), we repeated the statistical analysis for all structures in the Materials Project database that have magnetic structures determined by the MagneticOrderings workflow (not those simply initialized as FM). The statistical test results concerning bond angle trends of FM and AFM spin orderings are similar to those for the dataset by Frey et al.; further work (analysis of mutual information and machine learning) therefore considers only the dataset by Frey et al.
All MP task ids belonging to the MagneticOrderings workflow (find_query.json) were provided by Jason Munro (thanks!).

featurization

This folder contains all scripts for transforming the non-magnetic parent structures of the crystallographically unique MAGNDATA dataset, its RE-free subset and the MP dataset into structural and compositional descriptors for ML. Both general automatminer features and custom, magnetism-specific features are computed. Some custom features include only sites guessed to be magnetic, e.g., the mean distance between nearest-neighbor "magnetic" sites. Two methods are employed to guess whether a site is magnetically ordered:

  1. sites are assumed to be magnetic if they are classified as cationic and belong to the d or f block
  2. sites are assumed to be magnetic if they are classified as cationic and are contained in pymatgen's default_magmoms.yaml from the pymatgen.analysis.magnetism.analyzer module (including the originally uncommented entries, as the magnitude of the guessed magnetic moment is not decisive)
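The first guessing method can be sketched as follows. This is an illustration only, not the repository's implementation: "cationic" is assumed here to mean a positive oxidation state, and the BLOCK lookup is a deliberately tiny, hypothetical stand-in for a periodic-table library.

```python
# Hypothetical mini lookup of periodic-table blocks, for illustration only;
# a real workflow would take the block from a periodic-table library.
BLOCK = {"Na": "s", "O": "p", "Cl": "p", "Fe": "d", "Mn": "d", "Gd": "f"}


def is_guessed_magnetic(symbol: str, oxidation_state: float) -> bool:
    """Method 1: a cationic site (assumed: positive oxidation state) from the d or f block."""
    return oxidation_state > 0 and BLOCK[symbol] in ("d", "f")


sites = [("Fe", 3), ("O", -2), ("Na", 1), ("Gd", 3)]
magnetic = [symbol for symbol, ox in sites if is_guessed_magnetic(symbol, ox)]
print(magnetic)  # ['Fe', 'Gd']
```

The second method would differ only in the membership test, replacing the block check with a lookup in the keys of default_magmoms.yaml.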

Further, this folder contains the computation of feature-target normalized mutual information (NMI) that is presented in the Section Machine-learning model for magnetism and the SI of the paper.

MAGNDATA

  • 1_get_sitewise_and_structurewise_coordinationnet_features.py computes the custom, magnetism-specific features.
  • 2_add_automatminer_features.py computes general compositional and structural automatminer features.
  • 3_add_labels.py computes the binned structurewise p and ap scores with a 10° tolerance to count magnetic vectors of neighboring sites as (anti-)parallel.
  • 4_determine_maximal_common_feature_subset_all_targets_all_datasets.py For all three datasets and all targets (binned p score, binned ap score, AFM / FM classification target), the intersection of non-duplicate, non-constant features is determined and stored for further NMI computations / models. Each feature-target dataframe is split into 20 train test splits (with no overlap between the 5% test splits). For p/ap models, in the two larger dataframes (MP and all-structs of MAGNDATA), an additional, stratified undersampling of the train sets is performed to yield train sets of the same size as in the smallest dataset (RE-free structures of MAGNDATA) so ML results are not a function of dataset size and can be compared directly.
  • 5_compute_normalized_mutual_infomation_as_f_random_seed.py computes the mutual information (MI) and its normalized variant (NMI) between all features and the target for the p and ap target in the 20 train splits of the 2 MAGNDATA datasets. MI and NMI are computed with different random seeds until convergence is reached (defined here as the sample standard deviation of more than 99 % of all feature-target NMIs dropping below 0.1 times the respective feature sample mean). Convergence is checked every 500 random seeds.
  • 6_plot_normalized_mutual_infomation_features_w_p_and_ap_as_f_random_seed_all_datasets.py plots NMI and MI distributions of the p and ap target in all three datasets.
  • 7_determine_NMI_closest_to_sample_mean.py determines the random seed for NMI computation that minimizes the sum of absolute distances of feature-target NMI(random seed) to their respective feature-target mean NMI (i.e., it determines the random seed that gives the most average feature-target NMI results). This is done per target per dataset per train split (for both MAGNDATA and MP datasets).
  • 8_evaluate_mag_site_guessing_methods.py evaluates the two methods applied during featurization to guess magnetically ordered sites for both MAGNDATA and MP datasets.
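The convergence criterion of script 5 and the seed selection of script 7 can be written down compactly. The sketch below is not taken from the scripts; the function names and the layout of nmi_by_seed (one list of per-feature NMI values per random seed) are assumptions made for illustration.

```python
from statistics import mean, stdev


def is_converged(nmi_by_seed, rel_tol=0.1, min_fraction=0.99):
    """Check the convergence criterion on NMI values from at least two seeds.

    Converged when more than min_fraction of the features have a sample
    standard deviation (over seeds) below rel_tol times their sample mean.
    """
    per_feature = list(zip(*nmi_by_seed))  # transpose to features x seeds
    n_ok = sum(stdev(vals) < rel_tol * mean(vals) for vals in per_feature)
    return n_ok / len(per_feature) > min_fraction


def most_average_seed(nmi_by_seed):
    """Index of the seed whose NMIs have the smallest summed absolute distance to the feature means."""
    feature_means = [mean(vals) for vals in zip(*nmi_by_seed)]
    distances = [
        sum(abs(v - m) for v, m in zip(row, feature_means)) for row in nmi_by_seed
    ]
    return distances.index(min(distances))


# Toy data: 3 seeds x 2 features
nmi_by_seed = [[0.50, 0.20], [0.52, 0.21], [0.48, 0.19]]
print(is_converged(nmi_by_seed))       # True
print(most_average_seed(nmi_by_seed))  # 0
```

In the toy data, both features have a standard deviation of one twenty-fifth and one twentieth of their mean, respectively, so the criterion is met; the first seed coincides with the feature means and is therefore the most average one.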

MP

Note regarding the data: The data folders contain the fully featurized dataframes as yielded by 3_add_labels.py and 2_add_automatminer_features.py. They do not contain the redundant data representation of the 20 train-test splits created in later scripts. The raw MI and NMI data is not included either; only the analysis results on the most important features are provided.
In case you require the raw NMI data or the 20 train test split data (e.g., for running the ML models below), execute the scripts starting from 4_determine_maximal_common_feature_subset_all_targets_all_datasets.py and 3_compute_normalized_mutual_information_as_f_random_seed.py.
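The structure of the 20 train-test splits with non-overlapping 5 % test sets can be pictured with the following sketch. It is an assumption-laden illustration rather than the logic of 4_determine_maximal_common_feature_subset_all_targets_all_datasets.py: the function name is hypothetical, the split is not stratified, and the additional undersampling of the train sets is omitted.

```python
import random


def non_overlapping_splits(n_samples, n_splits=20, seed=0):
    """Return n_splits (train, test) index pairs whose test sets are disjoint.

    Shuffled indices are dealt into n_splits chunks (5 % of the data each for
    n_splits=20); every chunk serves once as test set, the rest as train set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    chunks = [indices[i::n_splits] for i in range(n_splits)]
    splits = []
    for test in chunks:
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        splits.append((train, test))
    return splits


splits = non_overlapping_splits(200)
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 20 190 10
```

Because every sample lands in exactly one test set, the 20 test sets together cover the whole dataset once.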

models

This folder contains the training and evaluation of ML models for magnetic structure prediction. The results are presented in the Section Machine-learning model for magnetism and the SI of the paper.

MAGNDATA

  • 1_RF_MAGNDATA_TM-structs_all-structs_p_ap.py trains RF models on the binned p and ap scores in a nested ten-fold CV approach on the MAGNDATA dataset and its RE-free subset. This gives 20 models per target per dataset (20 different train-test splits with non-overlapping test sets). The script requires two command line arguments (the train-test split and the target).
    If you have access to an HPC cluster with Slurm, you can modify the following batch script and the n_jobs_* parameters in the script for running all MAGNDATA models.
#!/bin/bash
#SBATCH --job-name=RF-md
#SBATCH --nodes=1
#SBATCH --ntasks=40
#SBATCH --cpus-per-task=2
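# Note: Slurm reads a bare number here as megabytes; add a unit suffix (e.g., G) if more memory per CPU is needed.
#SBATCH --mem-per-cpu=2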
#SBATCH --time=99:00:00
#SBATCH --output=out.%j.txt
#SBATCH --error=error.%j.txt
#SBATCH --mail-user=your-e-mail
#SBATCH --mail-type=ALL

micromamba activate path-to-your-environment

list1=("0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19")
list2=("p" "ap")

for param1 in "${list1[@]}"; do
  for param2 in "${list2[@]}"; do
    srun --exclusive -n 1 python 1_RF_MAGNDATA_TM-structs_all-structs_p_ap.py "$param1" "$param2" &
    if [[ $(jobs -r -p | wc -l) -ge 40 ]]; then
      wait -n
    fi
  done
done

# Wait for all background jobs to finish
wait

You can also modify the number of jobs for feature selection and RF model training - however, please note that the Shapley value computation is not parallelized.
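Without a Slurm cluster, the same 40 combinations of train-test split and target can be started from a small Python driver. This is a minimal sketch, assuming the script takes the split index and the target as its two positional arguments in that order, as in the batch script above; build_commands and run_all are hypothetical helper names, and the runs are executed one after another rather than in parallel.

```python
import itertools
import subprocess
import sys

SCRIPT = "1_RF_MAGNDATA_TM-structs_all-structs_p_ap.py"


def build_commands(splits=range(20), targets=("p", "ap")):
    """One command per (train-test split, target) pair, mirroring the batch script's loops."""
    return [
        [sys.executable, SCRIPT, str(split), target]
        for split, target in itertools.product(splits, targets)
    ]


def run_all(commands):
    """Run the commands sequentially; raises at the first failing run."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = build_commands()
print(len(commands))  # 40
```

Calling run_all(commands) from the folder containing the script would then train all models in sequence; build_commands(splits=[0], targets=["p"]) restricts the run to a single model for a quick check.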

MP

utils_kga

This folder hosts the package that contains all project-specific utility functions.

tests

This folder contains all tests for the utils_kga functions.


[1] S.V. Gallego, J.M. Perez-Mato, L. Elcoro, E.S. Tasci, R.M. Hanson, K. Momma, M.I. Aroyo and G. Madariaga "MAGNDATA: towards a database of magnetic structures. I. The commensurate case." Journal of Applied Crystallography 49 1750-1776 (2016). doi:10.1107/S1600576716012863

[2] S.V. Gallego, J.M. Perez-Mato, L. Elcoro, E.S. Tasci, R.M. Hanson, M.I. Aroyo and G. Madariaga "MAGNDATA: towards a database of magnetic structures. II. The incommensurate case." Journal of Applied Crystallography 49 1941-1956 (2016). doi:10.1107/S1600576716015491
