
Feature suggestion: lifecycle hooks for user-defined callbacks during simulations #397

@janosh


context

about 1 in 1e5 geometry optimizations goes off the rails (usually due to the FIRE step size or MLIP stress predictions), causing the cell to distort in strange ways: the volume collapses and the edge count explodes → OOM. this can bring down nodes running long structure searches.

one possible workaround right now is to modify the convergence check so it monitors how many times the cell gets tiled during neighbor list construction and aborts if that count exceeds some threshold (e.g. 1k).
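for reference, here is one way to compute the quantity that workaround monitors. this is a minimal pure-Python sketch, assuming a naive periodic neighbor list that visits every cell image within the cutoff; torch-sim's own neighbor list code may count tiles differently, and `num_periodic_images` / `check_tiling` are hypothetical helpers, not torch-sim functions. along each lattice direction the spacing between opposite cell faces is V / |a_j × a_k|, so a cell that collapses or shears flat needs many more images per side.

```python
import math
from typing import Sequence

Vec3 = Sequence[float]


def _cross(u: Vec3, v: Vec3) -> list[float]:
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


def _dot(u: Vec3, v: Vec3) -> float:
    return sum(x * y for x, y in zip(u, v))


def num_periodic_images(cell: Sequence[Vec3], cutoff: float) -> float:
    """Count the cell images a naive periodic neighbor search must visit.

    `cell` holds lattice vectors a, b, c as rows (hypothetical helper, not
    part of torch-sim). Returns math.inf for a zero-volume cell.
    """
    a, b, c = cell
    volume = abs(_dot(a, _cross(b, c)))
    if volume == 0.0:
        return math.inf
    total = 1
    # Face spacing along each lattice direction is V / |u x v| for the other two vectors.
    for u, v in ((b, c), (c, a), (a, b)):
        spacing = volume / math.sqrt(_dot(_cross(u, v), _cross(u, v)))
        n_side = math.ceil(cutoff / spacing)  # images needed on each side
        total *= 2 * n_side + 1
    return float(total)


def check_tiling(cell: Sequence[Vec3], cutoff: float, max_images: float = 1000) -> None:
    """Raise before building the neighbor list if the cell needs too many images."""
    n_images = num_periodic_images(cell, cutoff)
    if n_images > max_images:
        raise RuntimeError(
            f"cell tiling needs {n_images:g} images (> {max_images:g}), aborting"
        )
```

for a cubic cell of side 10 and a cutoff of 5 (same length units) this gives 3³ = 27 images; collapsing the same cube to side 0.5 gives 21³ = 9261, which trips a 1k threshold.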

proposal

it would be useful to have a general concept of lifecycle hooks in torch-sim that lets users pass arbitrary callback functions. these could:

  • perform custom sanity checks every N steps (N user-defined), using as many sim quantities as torch-sim can provide: cell volume, edge count, neighbor list tiling, etc.
  • modify sim params on the fly (e.g. reduce timestep if things look unstable)
  • abort early with a user-intelligible reason if sim looks unrecoverable

something like:

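def my_check(state: SimState) -> bool | str:
    # neighbor_tiles: proposed field for how many times the cell was tiled for the neighbor list
    if state.neighbor_tiles > 1000:
        return "cell tiling exceeded 1k, aborting"  # a string aborts with this reason
    return True  # continue

# proposed: step functions accept a list of user callbacks
result = step_func(state, model, ..., callbacks=[my_check])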
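to make the proposed contract concrete, here is a minimal, framework-agnostic sketch of how a run loop could dispatch such callbacks. everything in it is hypothetical: `run_with_callbacks`, `SimAborted`, `ToyState` and the `every` parameter are illustrative names, not torch-sim API, and `step_fn` stands in for whatever integrator or optimizer step torch-sim would call. it follows the return convention of `my_check` above (`True` continues, a string aborts with that reason); treating any other return value as an abort is an assumption. a callback may also mutate the state in place (e.g. shrink the timestep) before returning `True`.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union


@dataclass
class ToyState:
    """Stand-in for SimState: only the fields the example callbacks read."""
    step: int = 0
    dt: float = 1.0
    volume: float = 100.0


# True -> keep going, str -> abort with that reason (anything else also aborts here).
Callback = Callable[[ToyState], Union[bool, str]]


class SimAborted(RuntimeError):
    """Raised when a callback asks the (hypothetical) run loop to stop."""

    def __init__(self, step: int, reason: str) -> None:
        super().__init__(f"aborted at step {step}: {reason}")
        self.step = step
        self.reason = reason


def run_with_callbacks(
    state: ToyState,
    step_fn: Callable[[ToyState], ToyState],
    n_steps: int,
    callbacks: Sequence[Callback] = (),
    every: int = 1,
) -> ToyState:
    """Advance the state n_steps times, running all callbacks on every N-th step."""
    for _ in range(n_steps):
        state = step_fn(state)
        state.step += 1
        if state.step % every:
            continue  # hooks only fire on every N-th step
        for callback in callbacks:
            verdict = callback(state)  # may also tweak state in place (e.g. dt)
            if verdict is True:
                continue
            if isinstance(verdict, str):
                reason = verdict
            else:
                reason = f"{getattr(callback, '__name__', callback)!r} returned {verdict!r}"
            raise SimAborted(state.step, reason)
    return state


def shrink(state: ToyState) -> ToyState:
    state.volume -= 10.0 * state.dt  # pretend the optimizer is collapsing the cell
    return state


def damp_timestep(state: ToyState) -> bool:
    if state.volume < 50.0:
        state.dt *= 0.5  # modify a sim param on the fly, then continue
    return True


def volume_guard(state: ToyState) -> Union[bool, str]:
    if state.volume < 10.0:
        return f"volume collapsed to {state.volume:.1f}, aborting"
    return True
```

with only `volume_guard`, the toy run aborts at step 10 with "volume collapsed to 0.0, aborting". passing `callbacks=[damp_timestep, volume_guard]` halves the timestep whenever the volume is below 50, so the volume levels off near 30 and the guard never fires. raising a dedicated exception keeps the abort reason attached to the step it happened at, which is what a user (or a supervising tool) would want to log.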

a general callback mechanism would let:

  • users define domain-specific guardrails
  • future tools (think custodian for MD) tap into the sim lifecycle
  • users add debugging/logging without modifying torch-sim internals
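as an illustration of the custodian-style use case: a callback does not have to be a plain function. any callable with the same `state -> bool | str` signature as `my_check` would work, so a tool can keep state between calls. the sketch below is hypothetical and does not use custodian's actual classes; it only borrows the check-then-correct idea behind custodian's error handlers: detect a problem, try a correction a limited number of times, log what happened, and give up with a readable reason once corrections are exhausted. `TimestepHandler`, `ToyState`, `dt` and `volume` are illustrative names.

```python
import logging
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

logger = logging.getLogger("sim_hooks")  # hypothetical logger name


@dataclass
class ToyState:
    """Stand-in for SimState: only the fields this handler touches."""
    step: int
    dt: float
    volume: float


@dataclass
class TimestepHandler:
    """Hypothetical check/correct handler usable as a `state -> bool | str` callback.

    check: has the volume fallen below min_volume_ratio * the first-seen volume?
    correct: halve dt, at most max_corrections times, then abort with a reason.
    """
    min_volume_ratio: float = 0.5
    max_corrections: int = 3
    reference_volume: Optional[float] = None
    n_corrections: int = 0
    history: List[Tuple[int, float]] = field(default_factory=list)

    def __call__(self, state: ToyState) -> Union[bool, str]:
        if self.reference_volume is None:
            self.reference_volume = state.volume  # first call sets the baseline
        self.history.append((state.step, state.volume))  # lightweight trajectory log
        if state.volume >= self.min_volume_ratio * self.reference_volume:
            return True  # check passed, nothing to correct
        if self.n_corrections >= self.max_corrections:
            ratio = state.volume / self.reference_volume
            return (
                f"volume fell to {state.volume:.3g} ({ratio:.0%} of start) "
                f"after {self.n_corrections} timestep reductions"
            )
        state.dt *= 0.5  # correct: shrink the timestep and let the run continue
        self.n_corrections += 1
        logger.info("step %d: volume %.3g, halving dt to %.3g",
                    state.step, state.volume, state.dt)
        return True
```

because it is just a callable, an instance like `TimestepHandler(max_corrections=2)` would slot into the proposed `callbacks=[...]` list next to plain functions such as `my_check`, and its `history` doubles as a debugging log.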

tagging @kyonofx, who brought this up; I briefly discussed this with @abhijeetgangan

Metadata

Labels: api (API design discussions), ecosystem (comp-chem ecosystem related), enhancement (new feature or request), feature (entirely new features, not improvements to existing ones)
