Skip to content

[WIP] Better input set generator and sets for io.lammps #4368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 35 commits into
base: master
Choose a base branch
from

Conversation

vir-k01
Copy link

@vir-k01 vir-k01 commented Apr 19, 2025

Summary

This PR is part of an effort to include workflows for running LAMMPS via atomate2. A concurrent PR will be opened in atomate2 that has the flows, while this PR aims to update the input set generators to match the inputs required there.
These changes are based on initial work here: https://github.com/Matgenix/atomate2-lammps done by @ml-evs and @gbrunin. Also tagging in @esoteric-ephemera who helped structure the code in this PR, and @davidwaroquiers.
Major changes:

  • Updated templates for running NVT/NPT/Minimization jobs with the upcoming flows in atomate2.
  • Added in a data class to validate commonly utilized settings (based on the updated templates)
  • The new input set generator is written (to the best of my ability) to provide input settings consistent with classical MD run through other codes on solids (matched defaults to ASE MD)

Marked this as a WIP as there might be further updates here dependent on the updates made in the concurrent atomate2 PR.

TODO:

  • Add tests
  • I've made most of the design decisions here assuming these sets will be used with solids through the Structure object, but one could also argue including support for Molecules. I'm looking for suggestions on how that can be incorporated without varying too much of the logic (ex : for the default inputs, atom styles, force field parameters)
  • I'm also looking for suggestions on how to best check/validate user specified force field parameters, which is difficult to do considering the numerous force field formats that exist out there. The existing ForceField object in io.lammps.data is not very intuitive to set-up for a user and does not support lammps integrated MLIPs yet.

vir-k01 and others added 20 commits October 30, 2024 17:17
…LammpsInputFile objects as input for the template to make use of all the other io.lammps functions in pmg
…rd lammps) files with the input set that are prepared independent of the input file
Added a new (and probs better?) implementation for generating lammps input sets, with a more expressive MD template. Everything is now based on the template to ensure the atomate2 workflows being written work without the user having to specify too much. Inputs and states have to be enumed.
…LammpsInputFile objects as input for the template to make use of all the other io.lammps functions in pmg
…rd lammps) files with the input set that are prepared independent of the input file
Added a new (and probs better?) implementation for generating lammps input sets, with a more expressive MD template. Everything is now based on the template to ensure the atomate2 workflows being written work without the user having to specify too much. Inputs and states have to be enumed.
…o lammps_sets"

This reverts commit c1f2ed6, reversing
changes made to 410d4b4.
…class for storing and validating user defined settings
 Changes to be committed:

Reworked the way the input set generator generates the input file to be less cumbersome and use fewer if-else clauses, this new approach goes back to the original generator in order to be a lot more flexible (esp for custom lammps jobs). A LammpsSettings object is now present in order to validate the inputs specified by the user prior to actually passing them to lammps. This approach works out for the makers implemented currently (NVT/NPT/NVE/Minimization).
Redid how input sets are validated, in particular how "general" settings can also be stored as attrs of LammpsSettings, and only validate keys from _BASE_LAMMPS_SETTINGS. This way, allows for very flexible inputs from user to also be present in the final taskdoc.inputs. Also used PathLike instead of Path objects.
@vir-k01 vir-k01 requested review from shyuep and mkhorton as code owners April 19, 2025 21:06
@vir-k01 vir-k01 changed the title [WIPBetter input set generator and sets for io.lammps [WIP] Better input set generator and sets for io.lammps Apr 19, 2025
@@ -776,6 +777,19 @@ def is_rare_earth(self) -> bool:
"""
return self.is_lanthanoid or self.is_actinoid or self.symbol in {"Sc", "Y"}

@property
@deprecated(
Copy link
Contributor

@DanielYang59 DanielYang59 Apr 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess reverting changes to core.periodic_table and io.exciting is not intended?

If so you could revert those changes with (warning, this would drop all changes to that file, make sure this is what you need) git checkout upstream/master -- file_to_revert

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just behind the master branch, synced up now

Copy link
Contributor

@gbrunin gbrunin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Vir,
Thanks for opening this PR, it looks very good and much better than what we had already done. I have left a few comments - those are only suggestions, I don't have a strong opinion on any of them.

I think indeed including support for molecules would be beneficial. I have added a few comments/suggestions going in that direction, let me know what you think.

The force field description is definitely a challenge with so many different ways to specify it. I'm not sure checking/validating what the user specifies is absolutely necessary at this stage - after all the code is also not checking the validity of other user-specified settings. I would leave it to the user to know how to specify the force field they want to use. But if others have ideas on how to implement that efficiently, I'm definitely not against it either :-)

Looking forward to having this integrated in pymatgen!

Best,
Guillaume


MODULE_DIR = os.path.dirname(os.path.abspath(__file__))
TEMPLATE_DIR = f"{MODULE_DIR}/templates"
_BASE_LAMMPS_SETTINGS = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To address the different default settings for solids or molecules, maybe this could be a nested dictionary like

_COMMON_SETTINGS = {}  # Add any default that would be common to any type of data. 
_BASE_LAMMPS_SETTINGS = {
    "structure": _COMMON_SETTINGS | {"atom_style": "atomic", "boundary": ("p", "p", "p")},
    "molecule": _COMMON_SETTINGS | {"atom_style": "full", "boundary": ("f", "f", "f")},
    "general": _COMMON_SETTINGS,
}

that would then be used according to the type of the object passed to get_input_set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a small comment, I wanted to point out that in principle the Lattice has a pbc attribute:

pbc: tuple[bool, bool, bool] = (True, True, True),

so in principle even mixed use cases could be handled directly with a Structure object.
Maybe it would be easier to have the boundary and atom_style settings be part of the data as produced by LammpsData.from_structure(from_molecule) instead? Would it make sense?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the comments! I've made some of the changes, and let me try what @gpetretto mentioned for the molecules

input_file = LammpsInputFile.from_str(filtered_input_str, keep_stages=self.keep_stages)

return LammpsInputSet(
inputfile=input_file, data=data, calc_type=self.calc_type, additional_data=write_data, **kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to pass the argument template_file=... so that a trace of the initial template is kept somehow. This could probably be achieved in different ways though, not sure what would be best.


def get_input_set(
self,
data: Structure | LammpsData | CombinedData,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would have to accept Molecule as well.

species = ""
if isinstance(data, Structure):
species = " ".join({s.symbol for s in data.species})
data = LammpsData.from_structure(data, atom_style=atom_style)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Molecules, I think the LammpsData.from_molecule classmethod should be implemented.

vir-k01 added 4 commits April 28, 2025 19:07
Moved the FF warning to inside the get_input_set function, better position cause if the FF hasn't been specified by then it's most likely an error. Also used .get() methods for the conditionals in the get_input_set function for when the user is using the custom maker
…ized json where atoms and velocities are not resolved into dataframes before checking condition. Unsure why that is the case, commenting out for now
@davidwaroquiers
Copy link
Contributor

@vir-k01 any news about this PR ?
Can we provide some help or would you like us to take this on if needed on your side and possible on our side ?

@vir-k01
Copy link
Author

vir-k01 commented Jun 26, 2025

@davidwaroquiers apologies for slacking off on this. After a bit of experimentation, I've added a LammpsData.from_molecule method and added different defaults as per what @gbrunin suggested. Tbh although this supports using molecules, a better way would be for the user to first make the Topology and ForceField objects as they've been currently implemented and use the LammpsData.from_ff_and_topologies method instead. This would preserve information about bonding. I suppose this shouldn't be an issue though for force fields that do not rely on such info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants