MolCrysKit extends the Atomic Simulation Environment (ASE) with graph-based molecular representations. The central class, CrystalMolecule, inherits from ASE Atoms and adds a NetworkX graph that stores connectivity. This dual representation supports both standard ASE operations and graph-based molecular analysis.
The architecture enables:
- Full compatibility with ASE tools and workflows
- NetworkX graph algorithms for molecular analysis
- Efficient identification of molecular components within crystal structures
- Flexible representation of chemical connectivity with customizable bonding thresholds
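As a rough illustration of the graph half of this dual representation, the sketch below builds a bond graph from Cartesian positions using an adjustable distance threshold, then splits it into connected components (one per molecule). It is a plain-Python stand-in rather than MolCrysKit's code: the real class relies on ASE Atoms and a NetworkX graph, and the names bond_graph and molecules, the threshold value, and the coordinates are all hypothetical.

```python
from itertools import combinations
from math import dist


def bond_graph(positions, threshold=1.6):
    """Return an adjacency dict linking atoms closer than `threshold` (angstrom)."""
    graph = {i: set() for i in range(len(positions))}
    for i, j in combinations(range(len(positions)), 2):
        if dist(positions[i], positions[j]) <= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def molecules(graph):
    """Split the bond graph into connected components, one per molecule."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Two well-separated diatomic fragments (hypothetical coordinates, in angstrom).
positions = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (5.0, 0.0, 0.0), (6.1, 0.0, 0.0)]
print(molecules(bond_graph(positions)))  # [[0, 1], [2, 3]]
```

Raising the threshold merges the fragments into a single component, which is the effect a customizable bonding threshold has on molecule identification.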
The disorder handling pipeline in MolCrysKit is a three-phase process that transforms raw disorder information into physically realistic ordered structures: scanning the CIF file (scan_cif_disorder), building a conflict graph (DisorderGraphBuilder), and selecting compatible atom sets (DisorderSolver).
The process begins with scan_cif_disorder, which parses CIF files to extract disorder information. This phase identifies the atoms belonging to each disorder group (PART number), together with their occupancies and assembly information. The extracted data is stored in a DisorderInfo object (defined in io/cif.py, re-exported from disorder/info.py) containing:
- Atomic symbols and labels
- Fractional coordinates
- Occupancy values
- Disorder group assignments
- Assembly identifiers
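To make the shape of this data concrete, here is a minimal sketch of a container holding the fields listed above. It is not the real DisorderInfo class: the class name DisorderRecord, every attribute name, the add_atom helper, and the convention that PART 0 marks an ordered atom are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DisorderRecord:
    """Hypothetical stand-in for the listed fields; not the real DisorderInfo."""

    symbols: List[str] = field(default_factory=list)
    labels: List[str] = field(default_factory=list)
    frac_coords: List[Tuple[float, float, float]] = field(default_factory=list)
    occupancies: List[float] = field(default_factory=list)
    disorder_groups: List[int] = field(default_factory=list)  # PART numbers
    assemblies: List[str] = field(default_factory=list)

    def add_atom(self, symbol, label, frac, occupancy=1.0, part=0, assembly=""):
        """Append one atom, keeping all per-atom lists the same length."""
        self.symbols.append(symbol)
        self.labels.append(label)
        self.frac_coords.append(tuple(frac))
        self.occupancies.append(occupancy)
        self.disorder_groups.append(part)
        self.assemblies.append(assembly)

    def disordered_indices(self):
        """Indices of atoms in a non-zero PART (assumes PART 0 means ordered)."""
        return [i for i, part in enumerate(self.disorder_groups) if part != 0]


record = DisorderRecord()
record.add_atom("C", "C1", (0.10, 0.20, 0.30))
record.add_atom("O", "O1A", (0.40, 0.50, 0.60), occupancy=0.6, part=1, assembly="A")
record.add_atom("O", "O1B", (0.42, 0.52, 0.58), occupancy=0.4, part=2, assembly="A")
print(record.disordered_indices())  # [1, 2]
```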
The DisorderGraphBuilder constructs a conflict graph in which atoms that cannot coexist in the same physical structure are connected by edges. This phase detects four kinds of conflict:
- Conformer conflicts: Detects logical alternatives that cannot occupy the same space
- Explicit conflicts: Identifies atoms with identical assembly IDs or close proximity
- Geometric conflicts: Flags atoms that are too close to physically coexist
- Valence conflicts: Resolves chemically unrealistic coordination environments
The graph construction process uses precomputed distance matrices with Periodic Boundary Conditions (PBC) to efficiently evaluate all interatomic relationships.
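The geometric part of this step can be sketched as follows. The example computes minimum-image distances under PBC and flags pairs from different disorder groups that sit too close together. It is a simplified illustration, not DisorderGraphBuilder itself: it assumes an orthorhombic cell (so each axis can be wrapped independently), and the function names, the cutoff value, and the sample atoms are hypothetical. The other conflict types are not modelled.

```python
from itertools import combinations
from math import sqrt


def min_image_distance(frac_a, frac_b, cell_lengths):
    """Shortest distance between two fractional positions in an orthorhombic cell."""
    total = 0.0
    for fa, fb, length in zip(frac_a, frac_b, cell_lengths):
        delta = fa - fb
        delta -= round(delta)  # wrap into [-0.5, 0.5] so the nearest image is used
        total += (delta * length) ** 2
    return sqrt(total)


def geometric_conflicts(frac_coords, parts, cell_lengths, cutoff=0.8):
    """Return index pairs from different disorder groups closer than `cutoff`."""
    # Precompute every pairwise distance once, then filter.
    distances = {
        (i, j): min_image_distance(frac_coords[i], frac_coords[j], cell_lengths)
        for i, j in combinations(range(len(frac_coords)), 2)
    }
    return [
        pair
        for pair, d in distances.items()
        if d < cutoff and parts[pair[0]] != parts[pair[1]]
    ]


cell = (10.0, 10.0, 10.0)  # hypothetical cell lengths in angstrom
coords = [(0.02, 0.5, 0.5), (0.98, 0.5, 0.5), (0.5, 0.5, 0.5)]
parts = [1, 2, 1]
print(geometric_conflicts(coords, parts, cell))  # [(0, 1)]
```

Atoms 0 and 1 look 9.6 angstrom apart inside the cell, but they are only 0.4 angstrom apart across the cell boundary, which is why the periodic wrap matters for conflict detection.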
The DisorderSolver implements the final phase by solving the Maximum Weight Independent Set (MWIS) problem on the conflict (exclusion) graph. The solver:
- Groups atoms into rigid bodies based on disorder group and assembly information
- Implements a greedy algorithm to select groups with high occupancy weights and low conflict degrees
- Samples PART/SP alternatives by occupancy with method="random" to generate reproducible ensembles when a seed is provided
- Enumerates Cartesian products of independent PART/SP alternatives with method="enumerate"
- Reconstructs complete molecular crystals from the selected atom sets
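The greedy selection step can be illustrated with a generic MWIS heuristic. The sketch below is not DisorderSolver's implementation: the function name greedy_mwis, the scoring rule (weight divided by one plus the number of remaining conflicts), and the group labels are assumptions chosen to mirror the "high occupancy, low conflict degree" preference described above. A greedy pass like this returns a valid independent set but does not guarantee the optimal one.

```python
def greedy_mwis(weights, conflicts):
    """Greedy approximation to a maximum weight independent set.

    weights: dict mapping group id -> occupancy weight
    conflicts: iterable of (group_a, group_b) pairs that cannot coexist
    """
    neighbours = {g: set() for g in weights}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    remaining = set(weights)
    selected = []
    while remaining:
        # Favour high weight and few still-active conflicts (assumed scoring rule).
        # Sorting first makes tie-breaking deterministic.
        best = max(
            sorted(remaining),
            key=lambda g: weights[g] / (len(neighbours[g] & remaining) + 1),
        )
        selected.append(best)
        # Drop the chosen group and everything that conflicts with it.
        remaining -= neighbours[best] | {best}
    return selected


# Two disordered sites, each with two alternative rigid groups (hypothetical).
weights = {"A1": 0.7, "A2": 0.3, "B1": 0.55, "B2": 0.45}
conflicts = [("A1", "A2"), ("B1", "B2")]
print(greedy_mwis(weights, conflicts))  # ['A1', 'B1']
```

Adding a cross-site conflict such as ("A1", "B1") changes the result to A1 and B2, showing how a steric clash can override the higher-occupancy alternative.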
generate_ordered_replicas_from_disordered_sites(..., coupled=False) is the
default enumeration contract. Explicit PART/assembly conflicts are scoped to
the same symmetry operation when symmetry provenance is available, so expanded
copies of one asymmetric-unit disorder model make independent decisions. For
example, a two-copy PART 1/2 site enumerates AA, AB, BA, and BB rather
than only AA and BB.
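The difference between the two modes can be shown with a toy enumeration. This sketch does not call generate_ordered_replicas_from_disordered_sites; the helper enumerate_replicas and its arguments are hypothetical and only reproduce the counting argument above for one disordered site with several symmetry copies.

```python
from itertools import product


def enumerate_replicas(alternatives, n_copies, coupled=False):
    """List PART assignments for `n_copies` symmetry copies of one disordered site."""
    if coupled:
        # Legacy-style behaviour: every copy takes the same alternative.
        return [alt * n_copies for alt in alternatives]
    # Decoupled: each copy chooses independently (Cartesian product).
    return ["".join(choice) for choice in product(alternatives, repeat=n_copies)]


print(enumerate_replicas("AB", 2))                # ['AA', 'AB', 'BA', 'BB']
print(enumerate_replicas("AB", 2, coupled=True))  # ['AA', 'BB']
```

In decoupled mode the number of combinations grows as the number of alternatives raised to the number of copies, so larger cells can produce many replicas.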
The same flag applies to implicit special-position motifs. In decoupled mode,
isolated X(H)n centres such as NH4+ expose multiple locally valid H
orientations as competing rigid groups, bounded by
_MAX_MOTIF_ORIENTATIONS_PER_CENTER. Passing coupled=True restores the
legacy behaviour where symmetry copies are locked together and motif merge keeps
only the greedy best orientation.
The MWIS solution represents a compromise between maximizing total occupancy (thermodynamic stability) and minimizing steric clashes (geometric feasibility), resulting in physically realistic ordered structures from disordered crystal data.