-
Notifications
You must be signed in to change notification settings - Fork 38
Split Vacancies and Foundation Defect Complexes Code #127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis update introduces major new functionality for handling complex defect structures in crystalline materials. It adds modules for generating, classifying, and analyzing defect complexes, including split vacancies, and provides robust stenciling utilities for mapping relaxed defect structures into arbitrary supercells. Comprehensive documentation and extensive tests for these features are included. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant DefectsGenerator
participant ComplexesModule
participant StencilingUtils
participant StructureMatcher
User->>DefectsGenerator: get_split_vacancies()
DefectsGenerator->>ComplexesModule: generate_complex_from_defect_sites(...)
ComplexesModule->>ComplexesModule: classify_vacancy_geometry(...)
ComplexesModule->>ComplexesModule: get_equivalent_complex_defect_sites_in_primitive(...)
ComplexesModule->>ComplexesModule: get_complex_defect_multiplicity(...)
DefectsGenerator->>User: Return split vacancy candidates
User->>StencilingUtils: get_defect_in_supercell(defect_entry, target_supercell)
StencilingUtils->>StencilingUtils: Place defect site via symmetry
StencilingUtils->>StencilingUtils: Generate super-supercell
StencilingUtils->>StencilingUtils: Stencil sites to target supercell
StencilingUtils->>StructureMatcher: Match and validate structure
StencilingUtils->>User: Return stenciled defect and bulk supercells
Poem
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 16
🧹 Nitpick comments (10)
doped/core.py (1)
2616-2632: New helper is unreferenced—DRY opportunity
The logic for extracting single‐valence oxidation states is still duplicated in_from_pmg_defect. Replace those inline dict comprehensions with calls to_get_single_valence_oxi_statesto avoid code duplication and improve maintainability.tests/test_stenciling.py (3)
20-29: Consider using pathlib for more robust file operations.The current implementation works but could be simplified and made more robust using pathlib.
-def if_present_rm(path): - """ - Remove the file/folder if it exists. - """ - if os.path.exists(path): - if os.path.isfile(path): - os.remove(path) - elif os.path.isdir(path): - shutil.rmtree(path) +from pathlib import Path + +def if_present_rm(path): + """ + Remove the file/folder if it exists. + """ + p = Path(path) + if p.exists(): + if p.is_file(): + p.unlink() + elif p.is_dir(): + shutil.rmtree(p)
61-66: Track the TODO items for comprehensive test coverage.The TODO comments indicate planned additions for extrinsic defects and split vacancies testing, which would enhance test coverage of the new complex defect functionality.
Would you like me to create GitHub issues to track these TODO items for future implementation?
119-127: Consider extracting file I/O testing to a separate utility.The pattern of writing to file, reading back, and comparing could be extracted to a reusable test utility method to reduce duplication across tests.
def assert_structure_roundtrip(self, structure, filepath, reference): """Test that structure survives POSCAR write/read cycle.""" structure.to(fmt="vasp", filename=filepath) reloaded = Structure.from_file(filepath) assert reloaded == reference if_present_rm(filepath)tests/test_complexes.py (2)
58-60: Address the TODO comment for incomplete test setup.This commented-out code appears to be setting up a specific split vacancy configuration but is incomplete. Consider either completing the implementation or removing it if no longer needed.
Would you like me to help complete this test setup or open an issue to track this task?
153-160: Complete or remove the commented test method.The
test_Ga2O3_generate_complex_from_defect_sitesmethod is commented out with a TODO marker. This suggests incomplete test coverage for thegenerate_complex_from_defect_sitesfunction.Would you like me to help implement this test method or create an issue to track its completion?
doped/complexes.py (2)
164-164: Avoid accessing private attributes directly.Using
structure._chargedirectly accesses a private attribute, which violates encapsulation principles and could break if the internal implementation changes.Consider using a public API method or property if available, or working with the pymatgen team to expose this functionality properly.
688-692: Track TODO items in issue tracker.There are several TODO comments that should be properly tracked:
- Future complex generation functionality (line 688-689)
- DOI updates when published (line 690)
- Missing tests (line 691)
Would you like me to create GitHub issues to track these TODO items?
doped/utils/stenciling.py (2)
154-166: Address TODO comments for production readiness.Multiple TODO items indicate this code may not be fully production-ready:
- Missing tests (line 155-156)
- Unimplemented symmetry analysis feature (lines 157-160)
- Missing documentation (lines 161-165)
Would you like me to help create unit tests or open issues to track these items?
853-878: Consider extracting nested function for clarity.The nested
_check_other_sitesfunction could be moved to module level as a private function to improve readability and testability.This would make the main function logic clearer and allow the helper to be tested independently if needed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
tests/data/Split_Vacancies/Ba3Al2(SiO4)3-mp-14049.json.gzis excluded by!**/*.gztests/data/Split_Vacancies/CsCd(PO3)3-mp-13738.json.gzis excluded by!**/*.gztests/data/Split_Vacancies/Ga2O3-mp-1243.json.gzis excluded by!**/*.gztests/data/Split_Vacancies/Ga2O3-mp-886.json.gzis excluded by!**/*.gztests/data/Split_Vacancies/GeI2-mp-1540935.json.gzis excluded by!**/*.gztests/data/Split_Vacancies/Li3Ti4O8-mp-752879.json.gzis excluded by!**/*.gztests/data/Split_Vacancies/Na2Mg(NiO2)2-mp-2217469.json.gzis excluded by!**/*.gz
📒 Files selected for processing (11)
docs/doped.complexes.rst(1 hunks)docs/doped.rst(1 hunks)doped/complexes.py(1 hunks)doped/core.py(1 hunks)doped/generation.py(3 hunks)doped/utils/configurations.py(0 hunks)doped/utils/stenciling.py(1 hunks)pyproject.toml(1 hunks)tests/data/Ga2O3_R3c_POSCAR(1 hunks)tests/test_complexes.py(1 hunks)tests/test_stenciling.py(1 hunks)
💤 Files with no reviewable changes (1)
- doped/utils/configurations.py
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/test_complexes.py
[refactor] 281-281: Too many local variables (24/15)
(R0914)
[refactor] 403-403: Too many local variables (28/15)
(R0914)
[refactor] 542-542: Too many local variables (31/15)
(R0914)
tests/test_stenciling.py
[refactor] 44-44: Too many instance attributes (10/7)
(R0902)
doped/generation.py
[refactor] 2465-2465: Too many local variables (16/15)
(R0914)
[refactor] 2539-2585: Too many nested blocks (7/5)
(R1702)
doped/complexes.py
[refactor] 238-238: Too many arguments (9/5)
(R0913)
[refactor] 238-238: Too many positional arguments (9/5)
(R0917)
[refactor] 238-238: Too many local variables (36/15)
(R0914)
[error] 366-373: Unexpected keyword argument 'symm_ops' in function call
(E1123)
[error] 366-373: Unexpected keyword argument 'dist_tol' in function call
(E1123)
[refactor] 567-567: Too many arguments (8/5)
(R0913)
[refactor] 567-567: Too many positional arguments (8/5)
(R0917)
doped/utils/stenciling.py
[refactor] 44-44: Too many arguments (6/5)
(R0913)
[refactor] 44-44: Too many positional arguments (6/5)
(R0917)
[refactor] 44-44: Too many local variables (36/15)
(R0914)
[error] 48-48: Value 'np.ndarray' is unsubscriptable
(E1136)
[refactor] 44-44: Too many statements (56/50)
(R0915)
[error] 348-348: Value 'np.ndarray' is unsubscriptable
(E1136)
[refactor] 385-385: Too many arguments (6/5)
(R0913)
[refactor] 385-385: Too many positional arguments (6/5)
(R0917)
[refactor] 385-385: Too many local variables (32/15)
(R0914)
[refactor] 385-385: Too many branches (14/12)
(R0912)
[refactor] 385-385: Too many statements (54/50)
(R0915)
[error] 576-576: Value 'np.ndarray' is unsubscriptable
(E1136)
[refactor] 913-913: Too many local variables (23/15)
(R0914)
[error] 1025-1025: Value 'np.ndarray' is unsubscriptable
(E1136)
[error] 1097-1097: Value 'np.ndarray' is unsubscriptable
(E1136)
🪛 Ruff (0.11.9)
tests/test_stenciling.py
67-67: Function name test_Se_20_Å_supercell contains a non-ASCII character
(PLC2401)
⏰ Context from checks skipped due to timeout of 90000ms (20)
- GitHub Check: test (ubuntu-latest, 3.13, 10)
- GitHub Check: test (ubuntu-latest, 3.13, 7)
- GitHub Check: test (ubuntu-latest, 3.13, 4)
- GitHub Check: test (ubuntu-latest, 3.13, 6)
- GitHub Check: test (ubuntu-latest, 3.13, 9)
- GitHub Check: test (ubuntu-latest, 3.13, 3)
- GitHub Check: test (ubuntu-latest, 3.10, 9)
- GitHub Check: test (ubuntu-latest, 3.13, 8)
- GitHub Check: test (ubuntu-latest, 3.13, 1)
- GitHub Check: test (ubuntu-latest, 3.13, 5)
- GitHub Check: test (ubuntu-latest, 3.10, 6)
- GitHub Check: test (ubuntu-latest, 3.10, 7)
- GitHub Check: test (ubuntu-latest, 3.10, 2)
- GitHub Check: test (ubuntu-latest, 3.13, 2)
- GitHub Check: test (ubuntu-latest, 3.10, 4)
- GitHub Check: test (ubuntu-latest, 3.10, 8)
- GitHub Check: test (ubuntu-latest, 3.10, 10)
- GitHub Check: test (ubuntu-latest, 3.10, 3)
- GitHub Check: test (ubuntu-latest, 3.10, 1)
- GitHub Check: test (ubuntu-latest, 3.10, 5)
🔇 Additional comments (11)
docs/doped.complexes.rst (1)
1-7: Documentation addition looks good
The Sphinxautomoduledirective correctly exposesdoped.complexeswith members, undocumented items, and inheritance. No issues found.pyproject.toml (1)
26-26: LGTM!The shakenbreak version update to >=3.4.2 is appropriate for the StructureMatcher efficiency improvements needed by the new complex defect functionality.
docs/doped.rst (1)
20-20: Documentation correctly updated.The new
doped.complexessubmodule is properly added to the documentation structure in alphabetical order.tests/data/Ga2O3_R3c_POSCAR (1)
1-39: Valid POSCAR test data file.The Ga2O3 R3c structure file is properly formatted with correct atom counts, fractional coordinates, and helpful oxidation state annotations for defect complex testing.
tests/test_stenciling.py (1)
93-110: Well-structured test with proper assertions and cleanup.The test implementation includes:
- Proper warning handling and verification
- Structure matching with tight tolerances
- Minimum distance validation
- File cleanup after assertions
Good test practices demonstrated here.
tests/test_complexes.py (5)
1-10: LGTM!The module docstring and imports are well-organized and appropriate for a test module.
40-49: LGTM!The helper function correctly handles file and directory removal with appropriate existence checks.
62-116: Well-designed test for periodicity-breaking edge case.This test effectively validates that the defect multiplicities are calculated correctly even with periodicity-breaking supercells, and demonstrates where the standard pymatgen approach fails. Good test coverage!
162-252: Comprehensive molecule matching tests.Excellent test coverage for molecule matching functionality, including translation, rotation, and permutation invariance. The tests properly validate both the custom
are_equivalent_moleculesfunction and compare it with pymatgen'sKabschMatcherbehavior.
254-269: LGTM!The helper functions are well-implemented. The rotation matrix calculation is mathematically correct.
doped/utils/stenciling.py (1)
1016-1018:Details
❓ Verification inconclusive
Resolve TODO about removed check.
The TODO comment indicates uncertainty about why a check was removed in commit
40f129f. This should be investigated and resolved.
🏁 Script executed:
#!/bin/bash # Check git history for context on why the check was removed git log -p --grep="40f129f" || git show 40f129fLength of output: 0
I’ll dig deeper into the Git history to find the context for when and why
_check_min_distwas removed.
🏁 Script executed:
#!/bin/bash # Verify if commit 40f129f exists and show its details; otherwise, show recent commits touching the file if git rev-parse --verify 40f129f >/dev/null 2>&1; then git show --stat 40f129f else echo "⚠️ Commit 40f129f not found in history" echo echo "Listing the most recent commits affecting doped/utils/stenciling.py:" git log -n 20 --pretty=format:"%h %ci %s" -- doped/utils/stenciling.py fiLength of output: 274
I’ll locate the exact file path and inspect its recent Git history for any related removals of
_check_min_dist.
🏁 Script executed:
#!/bin/bash # Find the file path for stenciling.py and show its recent commit history FILE_PATH=$(fd stenciling.py | head -n1) if [ -n "$FILE_PATH" ]; then echo "Found file at: $FILE_PATH" echo "Showing last 20 commits touching $FILE_PATH:" git log -n 20 --pretty=format:"%h %ci %s" -- "$FILE_PATH" else echo "❌ stenciling.py not found in repo" fiLength of output: 328
Investigate rationale behind
_check_min_distcall
We couldn’t locate commit40f129fin the current history, so it’s unclear why this check was removed. Please confirm whether_check_min_distshould be kept or can safely be dropped, then update or remove this TODO accordingly.• File: doped/utils/stenciling.py
Lines: 1016–1018# TODO: This check was removed before (in `40f129f`), not sure if for a proper reason -- check and # remove this note _check_min_dist(Structure.from_sites(bulk_outer_cell_sites), min_dist_tol, warning=False)
| def _get_single_valence_oxi_states(bulk_oxi_states: Structure | Composition | dict = False) -> dict: | ||
| if isinstance(bulk_oxi_states, Structure): | ||
| single_valence_oxi_states = { | ||
| el.symbol: el.oxi_state for el in bulk_oxi_states.composition.elements | ||
| } | ||
| elif isinstance(bulk_oxi_states, Composition): | ||
| single_valence_oxi_states = {el.symbol: el.oxi_state for el in bulk_oxi_states.elements} | ||
| elif isinstance(bulk_oxi_states, dict): | ||
| single_valence_oxi_states = bulk_oxi_states | ||
| else: | ||
| raise TypeError( | ||
| f"Input bulk_oxi_states must be a pymatgen Structure, Composition or dict, not " | ||
| f"{type(bulk_oxi_states)}." | ||
| ) | ||
|
|
||
| return single_valence_oxi_states | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Type hint should match default False
The signature declares bulk_oxi_states: Structure | Composition | dict but defaults it to False (a bool). This mismatch will always trigger the TypeError arm for the default. Either include bool in the annotation or switch the default to None (and handle None explicitly) to accurately reflect allowed input types.
🤖 Prompt for AI Agents
In doped/core.py around lines 2616 to 2632, the function
_get_single_valence_oxi_states has a type hint indicating bulk_oxi_states can be
Structure, Composition, or dict, but the default value is False (a bool),
causing a type mismatch and always triggering the TypeError. To fix this, change
the default value to None and update the type hint to include Optional for these
types. Also, add explicit handling for None in the function to avoid the
TypeError for the default case.
| # consistency with other parts of the code | ||
| # TODO: Useful test case could be trying to stencil split vacancies in new supercells... | ||
|
|
||
| def test_Se_20_Å_supercell(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove non-ASCII character from function name.
The function name contains the Ångström symbol (Å) which can cause issues in some environments. Consider using ASCII-only names.
-def test_Se_20_Å_supercell(self):
+def test_Se_20_A_supercell(self):📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_Se_20_Å_supercell(self): | |
| - def test_Se_20_Å_supercell(self): | |
| + def test_Se_20_A_supercell(self): |
🧰 Tools
🪛 Ruff (0.11.9)
67-67: Function name test_Se_20_Å_supercell contains a non-ASCII character
(PLC2401)
🤖 Prompt for AI Agents
In tests/test_stenciling.py at line 67, the test function name contains a
non-ASCII character (Å). Rename the function to use only ASCII characters, for
example by replacing the Å symbol with 'A' or 'Angstrom' to ensure compatibility
across all environments.
| def test_symmetry_multiplicity_C2m_Ga2O3(self): | ||
| """ | ||
| Test symmetry and multiplicity analysis for split vacancies in C2/m | ||
| Ga2O3. | ||
| Reference values have been manually checked and stored in the info | ||
| dicts. | ||
| For C2/m Ga2O3, we have no periodicity breaking. Ga site symmetries are | ||
| all Cs, while we have Cs, C2h, C2 and C1 interstitial site symmetries. | ||
| """ | ||
| info_dict = self.C2m_Ga2O3_split_vac_info_dict | ||
| bulk_sga = get_sga(info_dict["bulk_supercell"]) | ||
| bulk_supercell_symm_ops = bulk_sga.get_symmetry_operations() | ||
| bulk_prim = get_primitive_structure(info_dict["bulk_supercell"]) | ||
| supercell_over_prim_factor = len(info_dict["bulk_supercell"]) / len(bulk_prim) | ||
| molecule_dict = {} | ||
|
|
||
| ml_calculated_split_vac_dicts = [ | ||
| (energy, subdict) | ||
| for cation_dict in info_dict.values() | ||
| if isinstance(cation_dict, dict) and "split_vacancies_energy_dict" in cation_dict | ||
| for energy, subdict in cation_dict["split_vacancies_energy_dict"].items() | ||
| if self.energy_key in subdict | ||
| ] | ||
|
|
||
| for energy, subdict in ml_calculated_split_vac_dicts: | ||
| orig_split_vacancy = generate_complex_from_defect_sites( | ||
| info_dict["bulk_supercell"], | ||
| vacancy_sites=[subdict["vac_defect_1_site"], subdict["vac_defect_2_site"]], | ||
| interstitial_sites=[subdict["interstitial_site"]], | ||
| ) | ||
| orig_split_vacancy_symm = point_symmetry_from_structure( | ||
| orig_split_vacancy, info_dict["bulk_supercell"], relaxed=True, verbose=False | ||
| ) | ||
| assert orig_split_vacancy_symm == subdict["unrelaxed_split_vac_symm"] | ||
|
|
||
| for v_idx in [1, 2]: | ||
| vac_symm = point_symmetry_from_site( | ||
| subdict[f"vac_defect_{v_idx}_site"], info_dict["bulk_supercell"] | ||
| ) | ||
| assert vac_symm == "Cs" # only Cs Ga sites in C2/m Ga2O3 | ||
| assert vac_symm == subdict[f"vac_{v_idx}_symm"] | ||
|
|
||
| int_symm = point_symmetry_from_site(subdict["interstitial_site"], info_dict["bulk_supercell"]) | ||
| assert int_symm in [ | ||
| "Cs", | ||
| "C2h", | ||
| "C2", | ||
| "C1", | ||
| ] # unrelaxed interstitial sites in C2/m Ga2O3 | ||
| assert int_symm == subdict["int_symm"] | ||
| relaxed_symm = point_symmetry_from_structure( | ||
| Structure.from_ase_atoms(subdict[self.energy_key]["struct_atoms"]), | ||
| info_dict["bulk_supercell"], | ||
| verbose=False, | ||
| ) | ||
| assert relaxed_symm == subdict["relaxed_split_vac_symm"] | ||
|
|
||
| comp_mult = get_complex_defect_multiplicity( | ||
| bulk_supercell=info_dict["bulk_supercell"], | ||
| vacancy_sites=[subdict["vac_defect_1_site"], subdict["vac_defect_2_site"]], | ||
| interstitial_sites=[subdict["interstitial_site"]], | ||
| ) | ||
| assert comp_mult == subdict["multiplicity"] | ||
| comp_mult = get_complex_defect_multiplicity( | ||
| bulk_supercell=info_dict["bulk_supercell"], | ||
| vacancy_sites=[subdict["vac_defect_1_site"], subdict["vac_defect_2_site"]], | ||
| interstitial_sites=[subdict["interstitial_site"]], | ||
| primitive_structure=bulk_prim, | ||
| supercell_symm_ops=bulk_supercell_symm_ops, | ||
| ) # same answer with efficiency options | ||
| assert comp_mult == subdict["multiplicity"] | ||
| supercell_comp_mult = get_complex_defect_multiplicity( | ||
| bulk_supercell=info_dict["bulk_supercell"], | ||
| vacancy_sites=[subdict["vac_defect_1_site"], subdict["vac_defect_2_site"]], | ||
| interstitial_sites=[subdict["interstitial_site"]], | ||
| primitive_structure=bulk_prim, | ||
| supercell_symm_ops=bulk_supercell_symm_ops, | ||
| primitive_cell_multiplicity=False, | ||
| ) | ||
| assert supercell_comp_mult == subdict["multiplicity"] * supercell_over_prim_factor | ||
|
|
||
| int_obj = Interstitial(info_dict["bulk_supercell"], subdict["interstitial_site"]) | ||
| v1_obj = Vacancy(info_dict["bulk_supercell"], subdict["vac_defect_1_site"]) | ||
| v2_obj = Vacancy(info_dict["bulk_supercell"], subdict["vac_defect_2_site"]) | ||
|
|
||
| assert int_obj.multiplicity == subdict["int_multiplicity"] * supercell_over_prim_factor | ||
| assert v1_obj.multiplicity == subdict["vac_1_multiplicity"] * supercell_over_prim_factor | ||
| assert v2_obj.multiplicity == subdict["vac_2_multiplicity"] * supercell_over_prim_factor | ||
|
|
||
| point_symm_mult_dict = { | ||
| "Cs": 2, | ||
| "C2h": 1, | ||
| "C2": 2, | ||
| "C1": 4, | ||
| } | ||
| for v_idx, v_obj in enumerate([v1_obj, v2_obj]): | ||
| assert ( | ||
| v_obj.multiplicity | ||
| == point_symm_mult_dict[subdict[f"vac_{v_idx+1}_symm"]] * supercell_over_prim_factor | ||
| ) | ||
| assert ( | ||
| int_obj.multiplicity | ||
| == point_symm_mult_dict[subdict["int_symm"]] * supercell_over_prim_factor | ||
| ) | ||
|
|
||
| # this isn't universally true for complex defects, but is for the split vacancies: | ||
| assert ( | ||
| comp_mult | ||
| <= (int_obj.multiplicity * v1_obj.multiplicity * v2_obj.multiplicity) | ||
| / supercell_over_prim_factor**3 | ||
| ) | ||
|
|
||
| # in C2/m Ga2O3, each split vacancy has the same multiplicity as the interstitial site | ||
| assert comp_mult == int_obj.multiplicity / supercell_over_prim_factor | ||
|
|
||
| raw_complex_defect_sites = [ | ||
| subdict["vac_defect_1_site"], | ||
| subdict["vac_defect_2_site"], | ||
| subdict["interstitial_site"], | ||
| ] | ||
| molecule_dict[round(float(energy), 2)] = Molecule( | ||
| [site.species for site in raw_complex_defect_sites], | ||
| info_dict["bulk_supercell"].lattice.get_cartesian_coords( | ||
| [ | ||
| site.frac_coords + raw_complex_defect_sites[0].distance_and_image(site)[1] | ||
| for site in raw_complex_defect_sites | ||
| ] | ||
| ), | ||
| ) | ||
|
|
||
| # Here the `relaxed` -3538.32 structure goes to the -3540.98 structure, so can use this in future | ||
| # tests of relaxed complex defect site determinations! (TODO) | ||
| for key, mol in molecule_dict.items(): | ||
| for other_key, other_mol in molecule_dict.items(): | ||
| if key != other_key: | ||
| assert not are_equivalent_molecules(mol, other_mol) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider breaking down this complex test method.
With 28 local variables, this method is difficult to follow. Similar refactoring as suggested above would improve maintainability.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 403-403: Too many local variables (28/15)
(R0914)
🤖 Prompt for AI Agents
In tests/test_complexes.py from lines 403 to 541, the
test_symmetry_multiplicity_C2m_Ga2O3 method is overly complex with 28 local
variables, making it hard to follow. Refactor by splitting this large test into
smaller focused helper methods or sub-tests, each handling a specific part of
the logic such as symmetry checks, multiplicity assertions, and molecule
equivalence tests. This modular approach will improve readability and
maintainability without changing the test logic.
| def test_symmetry_multiplicity_R3c_Ga2O3(self): | ||
| """ | ||
| Test symmetry and multiplicity analysis for split vacancies in R-3c | ||
| Ga2O3. | ||
| Reference values have been manually checked and stored in the info | ||
| dicts. | ||
| For R-3c Ga2O3, we have periodicity breaking in the generated supercell | ||
| which breaks the rotational (but not inversion) symmetries, so C3i -> | ||
| Ci, C2 -> C1. Useful test case for future stenciling to avoid | ||
| periodicity breaking. | ||
| """ | ||
| info_dict = self.R3c_Ga2O3_split_vac_info_dict | ||
| bulk_sga = get_sga(info_dict["bulk_supercell"]) | ||
| bulk_supercell_symm_ops = bulk_sga.get_symmetry_operations() | ||
| bulk_prim = get_primitive_structure(info_dict["bulk_supercell"]) | ||
| supercell_over_prim_factor = len(info_dict["bulk_supercell"]) / len(bulk_prim) | ||
| molecule_dict = {} | ||
|
|
||
| ml_calculated_split_vac_dicts = [ | ||
| (energy, subdict) | ||
| for cation_dict in info_dict.values() | ||
| if isinstance(cation_dict, dict) and "split_vacancies_energy_dict" in cation_dict | ||
| for energy, subdict in cation_dict["split_vacancies_energy_dict"].items() | ||
| if self.energy_key in subdict | ||
| ] | ||
|
|
||
| for energy, subdict in ml_calculated_split_vac_dicts: | ||
| orig_split_vacancy = generate_complex_from_defect_sites( | ||
| info_dict["bulk_supercell"], | ||
| vacancy_sites=[subdict["vac_defect_1_site"], subdict["vac_defect_2_site"]], | ||
| interstitial_sites=[subdict["interstitial_site"]], | ||
| ) | ||
| orig_split_vacancy_symm = point_symmetry_from_structure( | ||
| orig_split_vacancy, info_dict["bulk_supercell"], relaxed=True, verbose=False | ||
| ) | ||
| assert orig_split_vacancy_symm == subdict["unrelaxed_split_vac_symm"] | ||
|
|
||
| for v_idx in [1, 2]: | ||
| vac_symm = point_symmetry_from_site( | ||
| subdict[f"vac_defect_{v_idx}_site"], info_dict["bulk_supercell"] | ||
| ) | ||
| assert vac_symm == "C3" # only C3 Ga sites in R-3c Ga2O3 | ||
| assert vac_symm == subdict[f"vac_{v_idx}_symm"] | ||
|
|
||
| int_symm = point_symmetry_from_site(subdict["interstitial_site"], info_dict["bulk_supercell"]) | ||
| assert int_symm in [ | ||
| "C2", | ||
| "C3i", | ||
| "D3", | ||
| ] # unrelaxed interstitial sites in R-3c Ga2O3 | ||
| assert int_symm == subdict["int_symm"] | ||
| relaxed_symm = point_symmetry_from_structure( | ||
| Structure.from_ase_atoms(subdict[self.energy_key]["struct_atoms"]), | ||
| info_dict["bulk_supercell"], | ||
| verbose=False, | ||
| ) | ||
| assert relaxed_symm == subdict["relaxed_split_vac_symm"] | ||
|
|
||
| comp_mult = get_complex_defect_multiplicity( | ||
| bulk_supercell=info_dict["bulk_supercell"], | ||
| vacancy_sites=[subdict["vac_defect_1_site"], subdict["vac_defect_2_site"]], | ||
| interstitial_sites=[subdict["interstitial_site"]], | ||
| ) | ||
| assert comp_mult == subdict["multiplicity"] | ||
| comp_mult = get_complex_defect_multiplicity( | ||
| bulk_supercell=info_dict["bulk_supercell"], | ||
| vacancy_sites=[subdict["vac_defect_1_site"], subdict["vac_defect_2_site"]], | ||
| interstitial_sites=[subdict["interstitial_site"]], | ||
| primitive_structure=bulk_prim, | ||
| supercell_symm_ops=bulk_supercell_symm_ops, | ||
| ) # same answer with efficiency options | ||
| assert comp_mult == subdict["multiplicity"] | ||
| supercell_comp_mult = get_complex_defect_multiplicity( | ||
| bulk_supercell=info_dict["bulk_supercell"], | ||
| vacancy_sites=[subdict["vac_defect_1_site"], subdict["vac_defect_2_site"]], | ||
| interstitial_sites=[subdict["interstitial_site"]], | ||
| primitive_structure=bulk_prim, | ||
| supercell_symm_ops=bulk_supercell_symm_ops, | ||
| primitive_cell_multiplicity=False, | ||
| ) | ||
| assert supercell_comp_mult == subdict["multiplicity"] * supercell_over_prim_factor | ||
|
|
||
| int_obj = Interstitial(info_dict["bulk_supercell"], subdict["interstitial_site"]) | ||
| v1_obj = Vacancy(info_dict["bulk_supercell"], subdict["vac_defect_1_site"]) | ||
| v2_obj = Vacancy(info_dict["bulk_supercell"], subdict["vac_defect_2_site"]) | ||
|
|
||
| assert int_obj.multiplicity == subdict["int_multiplicity"] * supercell_over_prim_factor | ||
| assert v1_obj.multiplicity == subdict["vac_1_multiplicity"] * supercell_over_prim_factor | ||
| assert v2_obj.multiplicity == subdict["vac_2_multiplicity"] * supercell_over_prim_factor | ||
|
|
||
| # this isn't universally true for complex defects, but is for the split vacancies: | ||
| assert ( | ||
| comp_mult | ||
| <= (int_obj.multiplicity * v1_obj.multiplicity * v2_obj.multiplicity) | ||
| / supercell_over_prim_factor**3 | ||
| ) | ||
|
|
||
| raw_complex_defect_sites = [ | ||
| subdict["vac_defect_1_site"], | ||
| subdict["vac_defect_2_site"], | ||
| subdict["interstitial_site"], | ||
| ] | ||
| molecule_dict[round(float(energy), 2)] = Molecule( | ||
| [site.species for site in raw_complex_defect_sites], | ||
| info_dict["bulk_supercell"].lattice.get_cartesian_coords( | ||
| [ | ||
| site.frac_coords + raw_complex_defect_sites[0].distance_and_image(site)[1] | ||
| for site in raw_complex_defect_sites | ||
| ] | ||
| ), | ||
| ) | ||
|
|
||
| # in this case, two of the molecules are actually equivalent, with slightly different ES energies | ||
| # because of the periodicity breaking in the generated supercell: | ||
| for key, mol in molecule_dict.items(): | ||
| if key in [-4245.99, -4246.28]: | ||
| assert are_equivalent_molecules(molecule_dict[-4245.99], mol) | ||
| else: | ||
| assert not are_equivalent_molecules(molecule_dict[-4245.99], mol) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor to reduce method complexity.
This test method has 24 local variables (pylint warns about >15). Consider extracting helper methods to improve readability and maintainability:
- Extract molecule creation logic into a helper method
- Extract symmetry validation logic into a separate method
- Consider using a dataclass or named tuple for the subdict data structure
Example refactoring approach:
def _validate_site_symmetries(self, subdict, bulk_supercell):
"""Validate vacancy and interstitial site symmetries."""
for v_idx in [1, 2]:
vac_symm = point_symmetry_from_site(
subdict[f"vac_defect_{v_idx}_site"], bulk_supercell
)
assert vac_symm == "C3" # only C3 Ga sites in R-3c Ga2O3
assert vac_symm == subdict[f"vac_{v_idx}_symm"]
int_symm = point_symmetry_from_site(subdict["interstitial_site"], bulk_supercell)
assert int_symm in ["C2", "C3i", "D3"]
assert int_symm == subdict["int_symm"]
return int_symm
def _create_molecule_from_sites(self, raw_sites, lattice):
"""Create molecule from defect sites."""
return Molecule(
[site.species for site in raw_sites],
lattice.get_cartesian_coords([
site.frac_coords + raw_sites[0].distance_and_image(site)[1]
for site in raw_sites
])
)🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 281-281: Too many local variables (24/15)
(R0914)
🤖 Prompt for AI Agents
In tests/test_complexes.py from lines 281 to 402, the
test_symmetry_multiplicity_R3c_Ga2O3 method is overly complex with 24 local
variables, making it hard to read and maintain. Refactor by extracting the
molecule creation logic into a helper method that takes defect sites and lattice
to return a Molecule instance, and extract the symmetry validation logic into a
separate helper method that asserts vacancy and interstitial site symmetries.
Additionally, consider using a dataclass or named tuple to represent the subdict
data structure for clarity and easier access to its fields.
| def test_symmetry_multiplicity_tricky_structures(self): | ||
| """ | ||
| Test symmetry and multiplicity analysis for split vacancies in some | ||
| complex structures. | ||
| Some of these had issues with initial symmetry/multiplicity analysis | ||
| prior to robustness updates. Here we test that the analysis proceeds | ||
| fine and matches expected values. | ||
| """ | ||
| database_folder = f"{data_dir}/Split_Vacancies" | ||
| formula_mpid_files = [i for i in os.listdir(database_folder) if "Ga2O3" not in i] | ||
| for formula_mpid_file in formula_mpid_files: | ||
| info_dict = loadfn(f"{database_folder}/{formula_mpid_file}") | ||
| bulk_sga = get_sga(info_dict["bulk_supercell"]) | ||
| bulk_supercell_symm_ops = bulk_sga.get_symmetry_operations() | ||
| bulk_prim = get_primitive_structure(info_dict["bulk_supercell"]) | ||
| supercell_over_prim_factor = len(info_dict["bulk_supercell"]) / len(bulk_prim) | ||
| molecule_dict = {} | ||
|
|
||
| ml_calculated_split_vac_dicts = [ | ||
| (energy, subdict) | ||
| for cation_dict in info_dict.values() | ||
| if isinstance(cation_dict, dict) and "split_vacancies_energy_dict" in cation_dict | ||
| for energy, subdict in cation_dict["split_vacancies_energy_dict"].items() | ||
| if self.energy_key in subdict | ||
| ] | ||
| if "GeI2" not in formula_mpid_file and "Li3Ti4O8" not in formula_mpid_file: | ||
| # take a random subset of max 10: | ||
| ml_calculated_split_vac_dicts = random.sample( | ||
| ml_calculated_split_vac_dicts, min(10, len(ml_calculated_split_vac_dicts)) | ||
| ) | ||
| for energy, subdict in ml_calculated_split_vac_dicts: | ||
| orig_split_vacancy = generate_complex_from_defect_sites( | ||
| info_dict["bulk_supercell"], | ||
| vacancy_sites=[subdict["vac_defect_1_site"], subdict["vac_defect_2_site"]], | ||
| interstitial_sites=[subdict["interstitial_site"]], | ||
| ) | ||
| orig_split_vacancy_symm = point_symmetry_from_structure( | ||
| orig_split_vacancy, info_dict["bulk_supercell"], relaxed=True, verbose=False | ||
| ) | ||
| assert orig_split_vacancy_symm == subdict["unrelaxed_split_vac_symm"] | ||
|
|
||
| for v_idx in [1, 2]: | ||
| vac_symm = point_symmetry_from_site( | ||
| subdict[f"vac_defect_{v_idx}_site"], info_dict["bulk_supercell"] | ||
| ) | ||
| assert vac_symm == subdict[f"vac_{v_idx}_symm"] | ||
|
|
||
| int_symm = point_symmetry_from_site( | ||
| subdict["interstitial_site"], info_dict["bulk_supercell"] | ||
| ) | ||
| assert int_symm == subdict["int_symm"] | ||
| relaxed_symm = point_symmetry_from_structure( | ||
| Structure.from_ase_atoms(subdict[self.energy_key]["struct_atoms"]), | ||
| info_dict["bulk_supercell"], | ||
| verbose=False, | ||
| ) | ||
| assert relaxed_symm == subdict["relaxed_split_vac_symm"] | ||
|
|
||
| comp_mult = get_complex_defect_multiplicity( | ||
| bulk_supercell=info_dict["bulk_supercell"], | ||
| vacancy_sites=[subdict["vac_defect_1_site"], subdict["vac_defect_2_site"]], | ||
| interstitial_sites=[subdict["interstitial_site"]], | ||
| ) # this breaks if we get a multiplicity less than expected minimum | ||
| assert comp_mult == subdict["multiplicity"] | ||
| comp_mult = get_complex_defect_multiplicity( | ||
| bulk_supercell=info_dict["bulk_supercell"], | ||
| vacancy_sites=[subdict["vac_defect_1_site"], subdict["vac_defect_2_site"]], | ||
| interstitial_sites=[subdict["interstitial_site"]], | ||
| primitive_structure=bulk_prim, | ||
| supercell_symm_ops=bulk_supercell_symm_ops, | ||
| ) # same answer with efficiency options | ||
| assert comp_mult == subdict["multiplicity"] | ||
| supercell_comp_mult = get_complex_defect_multiplicity( | ||
| bulk_supercell=info_dict["bulk_supercell"], | ||
| vacancy_sites=[subdict["vac_defect_1_site"], subdict["vac_defect_2_site"]], | ||
| interstitial_sites=[subdict["interstitial_site"]], | ||
| primitive_structure=bulk_prim, | ||
| supercell_symm_ops=bulk_supercell_symm_ops, | ||
| primitive_cell_multiplicity=False, | ||
| ) | ||
| assert supercell_comp_mult == subdict["multiplicity"] * supercell_over_prim_factor | ||
|
|
||
| int_obj = Interstitial(info_dict["bulk_supercell"], subdict["interstitial_site"]) | ||
| v1_obj = Vacancy(info_dict["bulk_supercell"], subdict["vac_defect_1_site"]) | ||
| v2_obj = Vacancy(info_dict["bulk_supercell"], subdict["vac_defect_2_site"]) | ||
|
|
||
| assert int_obj.multiplicity == subdict["int_multiplicity"] * supercell_over_prim_factor | ||
| assert v1_obj.multiplicity == subdict["vac_1_multiplicity"] * supercell_over_prim_factor | ||
| assert v2_obj.multiplicity == subdict["vac_2_multiplicity"] * supercell_over_prim_factor | ||
|
|
||
| for v_idx, v_obj in enumerate([v1_obj, v2_obj]): | ||
| assert ( | ||
| v_obj.multiplicity | ||
| == subdict[f"vac_{v_idx+1}_multiplicity"] * supercell_over_prim_factor | ||
| ) | ||
| assert int_obj.multiplicity == subdict["int_multiplicity"] * supercell_over_prim_factor | ||
|
|
||
| # this isn't universally true for complex defects, but is for the split vacancies: | ||
| assert ( | ||
| comp_mult | ||
| <= (int_obj.multiplicity * v1_obj.multiplicity * v2_obj.multiplicity) | ||
| / supercell_over_prim_factor**3 | ||
| ) | ||
| assert comp_mult >= int_obj.multiplicity / supercell_over_prim_factor | ||
|
|
||
| raw_complex_defect_sites = [ | ||
| subdict["vac_defect_1_site"], | ||
| subdict["vac_defect_2_site"], | ||
| subdict["interstitial_site"], | ||
| ] | ||
| molecule_dict[round(float(energy), 2)] = Molecule( | ||
| [site.species for site in raw_complex_defect_sites], | ||
| info_dict["bulk_supercell"].lattice.get_cartesian_coords( | ||
| [ | ||
| site.frac_coords + raw_complex_defect_sites[0].distance_and_image(site)[1] | ||
| for site in raw_complex_defect_sites | ||
| ] | ||
| ), | ||
| ) | ||
|
|
||
| eq_mol_count = 0 | ||
| for key, mol in molecule_dict.items(): | ||
| for other_key, other_mol in molecule_dict.items(): | ||
| if ( # same cations, different molecules | ||
| sorted(mol.atomic_numbers) == sorted(other_mol.atomic_numbers) and key != other_key | ||
| ) and are_equivalent_molecules(mol, other_mol): | ||
| eq_mol_count += 1 | ||
|
|
||
| if "GeI2" not in formula_mpid_file and "Li3Ti4O8" not in formula_mpid_file: | ||
| assert eq_mol_count == 0 | ||
| elif "GeI2" in formula_mpid_file: | ||
| assert eq_mol_count == 8 | ||
| elif "Li3Ti4O8" in formula_mpid_file: | ||
| assert eq_mol_count == 80 | ||
| # Note that a lot of these have matching equivalent relaxed complexes, so would make good test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Extract common test logic to reduce complexity.
This method has 31 local variables and significant duplication with the previous test methods. Consider creating a base test method that handles the common validation logic.
Consider creating a shared validation method:
def _validate_split_vacancy_data(self, info_dict, energy_key, expected_eq_mol_count=0):
"""Common validation logic for split vacancy data."""
bulk_sga = get_sga(info_dict["bulk_supercell"])
bulk_supercell_symm_ops = bulk_sga.get_symmetry_operations()
bulk_prim = get_primitive_structure(info_dict["bulk_supercell"])
# Extract and validate split vacancy data
ml_calculated_split_vac_dicts = self._extract_ml_split_vac_dicts(info_dict, energy_key)
molecule_dict = {}
for energy, subdict in ml_calculated_split_vac_dicts:
self._validate_single_split_vacancy(subdict, info_dict, bulk_prim, bulk_supercell_symm_ops)
molecule_dict[round(float(energy), 2)] = self._create_molecule_from_sites(
[subdict["vac_defect_1_site"], subdict["vac_defect_2_site"], subdict["interstitial_site"]],
info_dict["bulk_supercell"].lattice
)
# Validate molecule equivalences
eq_mol_count = self._count_equivalent_molecules(molecule_dict)
assert eq_mol_count == expected_eq_mol_count🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 542-542: Too many local variables (31/15)
(R0914)
🤖 Prompt for AI Agents
In tests/test_complexes.py lines 542 to 677, the
test_symmetry_multiplicity_tricky_structures method is overly complex with many
local variables and duplicated logic from other tests. Refactor by extracting
the common validation logic into a separate helper method, for example
_validate_split_vacancy_data, which takes info_dict, energy_key, and
expected_eq_mol_count as parameters. Move the symmetry, multiplicity, and
molecule equivalence checks into this helper, and call it from the test method
to reduce duplication and improve readability.
| for frac_coords in get_equiv_frac_coords_in_primitive( | ||
| site.frac_coords, | ||
| primitive_structure, | ||
| bulk_supercell, | ||
| symm_ops=supercell_symm_ops, # only small speed up (due to caching) | ||
| symprec=symprec, | ||
| dist_tol=dist_tol, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify that get_equiv_frac_coords_in_primitive accepts the keyword arguments.
Static analysis indicates that symm_ops and dist_tol may be unexpected keyword arguments for this function.
🏁 Script executed:
#!/bin/bash
# Search for the function definition to verify accepted parameters
ast-grep --pattern 'def get_equiv_frac_coords_in_primitive($$$)'Length of output: 15931
Invalid keyword arguments passed to get_equiv_frac_coords_in_primitive
The signature of get_equiv_frac_coords_in_primitive (see doped/utils/symmetry.py:1223) is:
def get_equiv_frac_coords_in_primitive(
frac_coords,
primitive,
supercell,
symprec=0.01,
dist_tol_factor=1.0,
equiv_coords=True,
return_symprec_and_dist_tol_factor=False,
fixed_symprec_and_dist_tol_factor=False,
verbose=False,
) -> ...It does not accept symm_ops or dist_tol. Please update the call in doped/complexes.py accordingly:
- Remove the
symm_ops=supercell_symm_opsargument. - Replace
dist_tol=…withdist_tol_factor=…(or drop it if you intend to use the default).
Location:
- File:
doped/complexes.py - Lines: ~366–373
Suggested diff:
- for frac_coords in get_equiv_frac_coords_in_primitive(
- site.frac_coords,
- primitive_structure,
- bulk_supercell,
- symm_ops=supercell_symm_ops,
- symprec=symprec,
- dist_tol=dist_tol,
- )
+ for frac_coords in get_equiv_frac_coords_in_primitive(
+ site.frac_coords,
+ primitive_structure,
+ bulk_supercell,
+ symprec=symprec,
+ dist_tol_factor=dist_tol_factor, # if you want to override the default
+ )🧰 Tools
🪛 Pylint (3.3.7)
[error] 366-373: Unexpected keyword argument 'symm_ops' in function call
(E1123)
[error] 366-373: Unexpected keyword argument 'dist_tol' in function call
(E1123)
🤖 Prompt for AI Agents
In doped/complexes.py around lines 366 to 373, the call to
get_equiv_frac_coords_in_primitive incorrectly passes symm_ops and dist_tol as
keyword arguments, which are not accepted by the function. Remove the
symm_ops=supercell_symm_ops argument entirely and replace dist_tol=dist_tol with
dist_tol_factor=dist_tol if you want to specify it, or omit it to use the
default value. Adjust the call accordingly to match the function signature in
doped/utils/symmetry.py.
|
|
||
| def _convert_defect_neighbours_to_X( | ||
| defect_supercell: Structure, | ||
| defect_position: np.ndarray[float], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix all remaining numpy array type hints.
Multiple functions have the same invalid numpy type hint syntax.
Apply this pattern throughout the file:
# Line 576
- defect_position: np.ndarray[float],
+ defect_position: np.ndarray,
# Line 1025
- defect_frac_coords: np.ndarray[float] | list[float] | None = None,
+ defect_frac_coords: np.ndarray | list[float] | None = None,
# Line 1097
- lattice: Lattice, cart_coords: np.ndarray[float] | list[float], tol: float = 1e-5
+ lattice: Lattice, cart_coords: np.ndarray | list[float], tol: float = 1e-5Also applies to: 1025-1025, 1097-1097
🧰 Tools
🪛 Pylint (3.3.7)
[error] 576-576: Value 'np.ndarray' is unsubscriptable
(E1136)
🤖 Prompt for AI Agents
In doped/utils/stenciling.py at lines 576, 1025, and 1097, the numpy array type
hints use an invalid syntax like np.ndarray[float]. Replace these with the
correct typing syntax by importing numpy.typing as npt and using npt.NDArray
with the appropriate dtype, for example npt.NDArray[np.float64]. Update all such
type hints in these lines accordingly.
| def get_defect_in_supercell( | ||
| defect_entry: DefectEntry, | ||
| target_supercell: Structure, | ||
| check_bulk: bool = True, | ||
| target_frac_coords: np.ndarray[float] | list[float] | bool = True, | ||
| edge_tol: float = 1, | ||
| min_dist_tol_factor: float = 0.99, | ||
| ) -> tuple[Structure, Structure]: | ||
| """ | ||
| Re-generate a relaxed defect structure in a different supercell. | ||
| This function takes the relaxed defect structure of the input | ||
| ``DefectEntry`` (from ``DefectEntry.defect_supercell``) and re-generates it | ||
| in the ``target_supercell`` structure, and the closest possible position to | ||
| ``target_frac_coords`` (default is the supercell centre = [0.5, 0.5, 0.5]), | ||
| also providing the corresponding bulk supercell (which should be the same | ||
| for each generated defect supercell given the same ``target_supercell`` and | ||
| base supercell for ``defect_entry``, see note below). | ||
| ``target_supercell`` should be the same host crystal structure, just with | ||
| different supercell dimensions, having the same lattice parameters and bond | ||
| lengths. | ||
| Note: This function does `not` guarantee that the generated defect | ||
| supercell atomic position basis exactly matches that of | ||
| ``target_supercell``, which may have come from a different primitive | ||
| structure definition (e.g. CdTe with | ||
| ``{"Cd": [0,0,0], "Te": [0.25,0.25,0.25]}`` vs | ||
| ``{"Cd": [0,0,0], "Te": [0.75,0.75,0.75]}``). The generated supercell | ||
| `will` have the exact same lattice/cell definition with fully | ||
| symmetry-equivalent atom positions, but if the actual position basis | ||
| differs then this can cause issues with parsing finite-size corrections | ||
| (which rely on site-matched potentials). This is perfectly fine if it | ||
| occurs, just will require the use of a matching bulk/reference supercell | ||
| when parsing (rather than the input ``target_supercell``) -- ``doped`` will | ||
| also throw a warning about this when parsing if a non-matching bulk | ||
| supercell is used anyway. This function will automatically check if the | ||
| position basis in the generated supercell differs from that of | ||
| ``target_supercell``, printing a warning if so (unless ``check_bulk`` is | ||
| ``False``) and returning the corresponding bulk supercell which should be | ||
| used for parsing defect calculations with the generated supercell. Of | ||
| course, if generating multiple defects in the same ``target_supercell``, | ||
| only one such bulk supercell calculation should be required (should | ||
| correspond to the same bulk supercell in each case). | ||
| Briefly, this function works by: | ||
| - Translating the defect site to the centre of the original supercell. | ||
| - Identifying a super-supercell which fully encompasses the target | ||
| supercell (regardless of orientation). | ||
| - Generate this super-supercell, using one copy of the original defect | ||
| supercell (``DefectEntry.defect_supercell``), and the rest of the sites | ||
| (outside of the original defect supercell box, with the defect translated | ||
| to the centre) are populated using the bulk supercell | ||
| (``DefectEntry.bulk_supercell``). | ||
| - Translate the defect site in this super-supercell to the Cartesian | ||
| coordinates of the centre of ``target_supercell``, then stencil out all | ||
| sites in the ``target_supercell`` portion of the super-supercell, | ||
| accounting for possible site displacements in the relaxed defect | ||
| supercell (e.g. if ``target_supercell`` has a different shape and does | ||
| not fully encompass the original defect supercell). | ||
| This is done by scanning over possible combinations of sites near the | ||
| boundary regions of the ``target_supercell`` portion, and identifying the | ||
| combination which maximises the minimum inter-atomic distance in the new | ||
| supercell (i.e. the most bulk-like arrangement). | ||
| - Re-orient this new stenciled supercell to match the orientation and site | ||
| positions of ``target_supercell``. | ||
| - If ``target_frac_coords`` is not ``False``, scan over all symmetry | ||
| operations of ``target_supercell`` and apply that which places the defect | ||
| site closest to ``target_frac_coords``. | ||
| Args: | ||
| defect_entry (DefectEntry): | ||
| A ``DefectEntry`` object for which to re-generate the relaxed | ||
| structure (taken from ``DefectEntry.defect_supercell``) in the | ||
| ``target_supercell`` lattice. | ||
| target_supercell (Structure): | ||
| The supercell structure to re-generate the relaxed defect structure | ||
| in. | ||
| check_bulk (bool): | ||
| Whether to check if the generated defect/bulk supercells have | ||
| different atomic position bases to ``target_supercell`` (as | ||
| described above) -- if so, a warning will be printed (unless | ||
| ``check_bulk`` is ``False``). Default is ``True``. | ||
| target_frac_coords (Union[np.ndarray[float], list[float], bool]): | ||
| The fractional coordinates to target for defect placement in the | ||
| new supercell. If just set to ``True`` (default), will try to place | ||
| the defect nearest to the centre of the superset cell (i.e. | ||
| ``target_frac_coords = [0.5, 0.5, 0.5]``), as is default in | ||
| ``doped`` defect generation. Note that defect placement is harder | ||
| in this case than in generation with ``DefectsGenerator``, as we | ||
| are not starting from primitive cells and we are working with | ||
| relaxed geometries. | ||
| edge_tol (float): | ||
| A tolerance (in Angstrom) for site displacements at the edge of the | ||
| stenciled supercell, when determining the best match of sites to | ||
| stencil out in the new supercell (of ``target_supercell`` | ||
| dimension). Default is 1 Angstrom, and then this is sequentially | ||
| increased up to 4.5 Angstrom if the initial scan fails. | ||
| min_dist_tol_factor (float): | ||
| Tolerance factor when checking the minimum interatomic distance in | ||
| the stenciled defect supercell, as a factor of the minimum distance | ||
| in the original ``DefectEntry.defect_supercell``. Default is 0.99. | ||
| Returns: | ||
| tuple[Structure, Structure]: | ||
| The re-generated defect supercell in the ``target_supercell`` | ||
| lattice, and the corresponding bulk/reference supercell for the | ||
| generated defect supercell (see explanations above). | ||
| """ | ||
| # Note to self; using Pycharm breakpoints throughout is likely easiest for debugging (TODO) | ||
| # TODO: Tests!! (At least one of each defect, Se good test case, then at least one or two with | ||
| # >unary compositions and extrinsic substitution/interstitial) | ||
| # TODO: We should now be able to use these functions (without the final re-orientation step, | ||
| # for speed) to determine the point symmetries of relaxed defects in non-symmetry-conserving | ||
| # supercells, by stenciling into a small symmetry-conserving cell and getting the point symmetry | ||
| # for that -- will do! | ||
| # TODO: When releasing, note that this code is useful for supercell size convergence tests, and | ||
| # accelerating `ShakeNBreak` etc. If/when adding, make sure to link in `SnB` docs as well. Should have | ||
| # some notes in docs/tutorials about this functionality, maybe with a quick example (e.g. Se results). | ||
| # Likewise, the displacements functions are useful as a validation / check of supercell size | ||
| # convergence, and for quantifying the strain / distortion introduced by a certain defect | ||
|
|
||
| pbar = tqdm( | ||
| total=100, bar_format="{desc}{percentage:.1f}%|{bar}| [{elapsed}, {rate_fmt}{postfix}]" | ||
| ) # tqdm progress bar. 100% is completion | ||
| pbar.set_description("Getting super-supercell (relaxed defect + bulk sites)") | ||
|
|
||
| bulk_mismatch_warning = False | ||
|
|
||
| try: | ||
| orig_supercell = _get_defect_supercell(defect_entry) | ||
| orig_min_dist = min_dist(orig_supercell) | ||
| min_dist_tol = orig_min_dist * min_dist_tol_factor | ||
| orig_bulk_supercell = _get_bulk_supercell(defect_entry) | ||
| orig_defect_frac_coords = defect_entry.sc_defect_frac_coords | ||
| target_supercell = target_supercell.copy() | ||
| bulk_min_bond_length = min_dist(orig_bulk_supercell) | ||
| bulk_min_dist_tol = bulk_min_bond_length * min_dist_tol_factor | ||
| target_frac_coords = [0.5, 0.5, 0.5] if target_frac_coords is True else target_frac_coords | ||
|
|
||
| # ensure no oxidation states (for easy composition matching later) | ||
| for struct in [orig_supercell, orig_bulk_supercell, target_supercell]: | ||
| struct.remove_oxidation_states() | ||
|
|
||
| # first translate both orig supercells to put defect in the middle, to aid initial stenciling: | ||
| orig_def_to_centre = np.array([0.5, 0.5, 0.5]) - orig_defect_frac_coords | ||
| orig_supercell = translate_structure(orig_supercell, orig_def_to_centre, frac_coords=True) | ||
| trans_orig_bulk_supercell = translate_structure( | ||
| orig_bulk_supercell, orig_def_to_centre, frac_coords=True | ||
| ) | ||
|
|
||
| # get big_supercell, which is expanded version of orig_supercell so that _each_ lattice vector is | ||
| # now bigger than the _largest_ lattice vector in target_supercell (so that target_supercell is | ||
| # fully encompassed by big_supercell): | ||
| superset_matrix, big_supercell, big_supercell_with_X, orig_supercell_with_X = ( # supa-set!! | ||
| _get_superset_matrix_and_supercells(orig_supercell, target_supercell, [0.5, 0.5, 0.5]) | ||
| ) | ||
| big_bulk_supercell = orig_bulk_supercell * superset_matrix # get big bulk supercell | ||
|
|
||
| # this big_supercell is with the defect now repeated in it, but we want it with just one defect, | ||
| # so only take the first repeated defect supercell, and then the rest of the sites from the | ||
| # expanded bulk supercell: | ||
| # only keep atoms in big supercell which are within the original supercell bounds: | ||
| big_defect_supercell = _get_matching_sites_from_s1_then_s2( | ||
| orig_supercell_with_X, | ||
| big_supercell_with_X, | ||
| trans_orig_bulk_supercell * superset_matrix, | ||
| min_dist_tol, | ||
| ) | ||
|
|
||
| # translate structure to put defect at the centre of the big supercell (w/frac_coords) | ||
| big_supercell_defect_site = next(s for s in big_defect_supercell.sites if s.specie.symbol == "X") | ||
| def_to_centre = np.array([0.5, 0.5, 0.5]) - big_supercell_defect_site.frac_coords | ||
| big_defect_supercell = translate_structure(big_defect_supercell, def_to_centre, frac_coords=True) | ||
|
|
||
| pbar.update(20) # 20% of progress bar | ||
| pbar.set_description("Getting sites in border region") | ||
|
|
||
| # get all atoms in big supercell within the cartesian bounds of the target_supercell supercell: | ||
| new_defect_supercell = _stencil_target_cell_from_big_cell( | ||
| big_defect_supercell, | ||
| target_supercell, | ||
| bulk_min_bond_length=bulk_min_bond_length, | ||
| min_dist_tol=min_dist_tol, | ||
| edge_tol=edge_tol, | ||
| pbar=pbar, | ||
| ) | ||
| new_bulk_supercell = _stencil_target_cell_from_big_cell( | ||
| big_bulk_supercell, | ||
| target_supercell, | ||
| bulk_min_bond_length=bulk_min_bond_length, | ||
| min_dist_tol=bulk_min_dist_tol, | ||
| edge_tol=1e-3, | ||
| pbar=None, | ||
| ) # shouldn't need `edge_tol`, should be much faster than defect supercell stencil | ||
|
|
||
| pbar.update(15) # 55% of progress bar | ||
| pbar.set_description("Ensuring matching orientation w/target_supercell") | ||
|
|
||
| # now we just do get s2 like s1 to get the orientation right: | ||
| defect_site = next(site for site in new_defect_supercell if site.specie.symbol == "X") | ||
|
|
||
| # Note: this function is typically the main bottleneck in this workflow. We have already | ||
| # optimised the underlying ``StructureMatcher`` workflow in many ways (caching, | ||
| # fast structure/site/composition comparisons, skipping comparison of defect neighbourhood to | ||
| # reduce requisite ``stol`` etc; being many orders of magnitude faster than the base | ||
| # ``pymatgen`` ``StructureMatcher``), however the ``_cart_dists()`` function call is | ||
| # still quite expensive, especially with large structures with significant noise in the atomic | ||
| # positions... | ||
| new_defect_supercell_w_defect_neighbours_as_X = _convert_defect_neighbours_to_X( | ||
| new_defect_supercell, defect_site.frac_coords, coords_are_cartesian=False | ||
| ) | ||
| with warnings.catch_warnings(): | ||
| warnings.filterwarnings("ignore", message="Not all sites have property orig_species") | ||
| # first we orient the generated _bulk_ supercell to match the ``target_supercell``, | ||
| # to try ensure consistency in the generated supercells | ||
| oriented_new_bulk_supercell = orient_s2_like_s1( # speed should be >= defect orienting | ||
| target_supercell, | ||
| new_bulk_supercell, | ||
| verbose=False, | ||
| allow_subset=True, | ||
| ) | ||
| # return new_defect_supercell, _round_struct_coords(oriented_new_bulk_supercell, | ||
| # dist_precision=0.01, to_unit_cell=True) | ||
| oriented_new_defect_supercell_w_defect_neighbours_as_X = orient_s2_like_s1( | ||
| oriented_new_bulk_supercell, | ||
| new_defect_supercell_w_defect_neighbours_as_X, | ||
| verbose=False, | ||
| ignored_species=["X"], # ignore X site | ||
| allow_subset=True, # allow defect supercell composition to differ | ||
| ) | ||
| oriented_new_defect_supercell = _convert_X_back_to_orig_species( | ||
| oriented_new_defect_supercell_w_defect_neighbours_as_X | ||
| ) | ||
| oriented_new_defect_site = next( # get defect site and remove X from sites | ||
| oriented_new_defect_supercell.pop(i) | ||
| for i, site in enumerate(oriented_new_defect_supercell.sites) | ||
| if site.specie.symbol == "X" | ||
| ) | ||
|
|
||
| pbar.update(35) # 90% of progress bar | ||
|
|
||
| if target_frac_coords is not False: | ||
| pbar.set_description("Placing defect closest to target_frac_coords") | ||
|
|
||
| target_symm_op = _scan_symm_ops_to_place_site_closest_to_frac_coords( | ||
| target_supercell, oriented_new_defect_site, target_frac_coords | ||
| ) | ||
| oriented_new_defect_supercell = get_clean_structure( | ||
| apply_symm_op_to_struct( # apply symm_op to structure | ||
| target_symm_op, | ||
| oriented_new_defect_supercell, | ||
| fractional=True, | ||
| rotate_lattice=False, | ||
| ), | ||
| ) # reordered inputs in updated doped | ||
| if check_bulk: | ||
| bulk_mismatch_warning = not check_atom_mapping_far_from_defect( | ||
| target_supercell, | ||
| oriented_new_bulk_supercell, | ||
| oriented_new_defect_site.frac_coords, | ||
| warning=False, | ||
| ) | ||
|
|
||
| oriented_new_bulk_supercell = get_clean_structure( | ||
| apply_symm_op_to_struct( | ||
| target_symm_op, | ||
| oriented_new_bulk_supercell, | ||
| fractional=True, | ||
| rotate_lattice=False, | ||
| ), | ||
| ) | ||
|
|
||
| pbar.update(pbar.total - pbar.n) # set to 100% of progress bar | ||
|
|
||
| except Exception as e: | ||
| pbar.close() | ||
| raise e | ||
|
|
||
| finally: | ||
| pbar.close() | ||
|
|
||
| if bulk_mismatch_warning: # print warning after closing pbar; cleaner | ||
| warnings.warn( | ||
| "Note that the atomic position basis of the generated defect/bulk supercell " | ||
| "differs from that of the ``target_supercell``. This is likely fine, " | ||
| "and just due to differences in (symmetry-equivalent) primitive cell definitions " | ||
| "(e.g. {'Cd': [0,0,0], 'Te': [0.25,0.25,0.25]} vs " | ||
| "{'Cd(': [0,0,0], 'Te': [0.75,0.75,0.75]}``) -- see ``get_defect_in_supercell`` " | ||
| "docstring for more info. For accurate finite-size charge corrections when " | ||
| "parsing, a matching bulk and defect supercell should be used, and so the " | ||
| "matching bulk supercell for the generated defect supercell (also returned " | ||
| "by this function) should be used for its reference host cell calculation.", | ||
| ) | ||
|
|
||
| _check_min_dist(oriented_new_defect_supercell, min_dist_tol) # check distances are reasonable | ||
| _check_min_dist(oriented_new_bulk_supercell, bulk_min_dist_tol) | ||
| return oriented_new_defect_supercell, oriented_new_bulk_supercell | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider breaking down this complex function.
With 56 statements and 36 local variables, this function is difficult to understand and maintain. The extensive inline comments and progress bar updates suggest natural boundaries for extraction.
Consider extracting these logical sections into separate functions:
- Super-supercell generation (lines 175-220)
- Stenciling logic (lines 224-240)
- Orientation matching (lines 245-284)
- Defect placement (lines 287-317)
This would improve testability and make the overall flow clearer.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 44-44: Too many arguments (6/5)
(R0913)
[refactor] 44-44: Too many positional arguments (6/5)
(R0917)
[refactor] 44-44: Too many local variables (36/15)
(R0914)
[error] 48-48: Value 'np.ndarray' is unsubscriptable
(E1136)
[refactor] 44-44: Too many statements (56/50)
(R0915)
🤖 Prompt for AI Agents
In doped/utils/stenciling.py around lines 44 to 343, the get_defect_in_supercell
function is overly long and complex with 56 statements and 36 local variables,
making it hard to understand and maintain. Refactor by extracting the following
logical sections into separate helper functions: super-supercell generation
(lines 175-220), stenciling logic (lines 224-240), orientation matching (lines
245-284), and defect placement (lines 287-317). Replace the corresponding code
blocks in the main function with calls to these new functions to improve
readability, testability, and maintainability.
| def _scan_symm_ops_to_place_site_closest_to_frac_coords( | ||
| symm_ops: Structure | Sequence[SymmOp], | ||
| site: PeriodicSite, | ||
| target_frac_coords: np.ndarray[float] | list[float] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix invalid numpy array type hint syntax.
Same issue as above with invalid numpy type hint.
- target_frac_coords: np.ndarray[float] | list[float] | None = None,
+ target_frac_coords: np.ndarray | list[float] | None = None,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| target_frac_coords: np.ndarray[float] | list[float] | None = None, | |
| target_frac_coords: np.ndarray | list[float] | None = None, |
🧰 Tools
🪛 Pylint (3.3.7)
[error] 348-348: Value 'np.ndarray' is unsubscriptable
(E1136)
🤖 Prompt for AI Agents
In doped/utils/stenciling.py at line 348, the type hint for a numpy array is
incorrectly specified as np.ndarray[float]. Replace this with the correct syntax
using numpy.typing.NDArray, such as numpy.typing.NDArray[np.float64], or use a
more general type hint compatible with numpy arrays. Adjust the import
statements if necessary to include numpy.typing for proper type hinting.
| defect_entry: DefectEntry, | ||
| target_supercell: Structure, | ||
| check_bulk: bool = True, | ||
| target_frac_coords: np.ndarray[float] | list[float] | bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix invalid numpy array type hint syntax.
The type hint np.ndarray[float] is not valid Python syntax. NumPy arrays don't support direct subscript notation in type hints.
- target_frac_coords: np.ndarray[float] | list[float] | bool = True,
+ target_frac_coords: np.ndarray | list[float] | bool = True,For more specific typing, consider using numpy.typing.NDArray[np.float64] with appropriate imports.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| target_frac_coords: np.ndarray[float] | list[float] | bool = True, | |
| target_frac_coords: np.ndarray | list[float] | bool = True, |
🧰 Tools
🪛 Pylint (3.3.7)
[error] 48-48: Value 'np.ndarray' is unsubscriptable
(E1136)
🤖 Prompt for AI Agents
In doped/utils/stenciling.py at line 48, the type hint np.ndarray[float] is
invalid syntax. Replace it with numpy.typing.NDArray[np.float64] after importing
NDArray from numpy.typing and np.float64 from numpy to correctly specify the
array type in the type hint.
b304253 to
f0d7be7
Compare
6ef1f97 to
3631bc8
Compare
3631bc8 to
03042cd
Compare
…ching, and use `dist_tol_factor`)
…ions to `doped.complexes`
… in defect site parsing functions
…defect_type_and_site_indices` and move unrelaxed defect structure code out to `defect_from_structures`
… complex defects
… make `site_tol` apply element-wise
03042cd to
1a6a8d7
Compare
See original commit names here: #119
Squashed and restarted after repo clean.
Mostly for tracking for now. Main TODO is to get automated split vacancy generation from ML-calculated database fully implemented. (General) complex defect symmetry and multiplicity analysis (& thus concentrations/thermodynamics) is now implemented, will also add automated complex defect structure recognition in parsing.
Key TODOs: