What would you like to report?
Dear fairchem team,
I am running MD simulations in ASE with the "uma-s-1.pt" model for a system of about 70,000 atoms. The simulation is slow, but it runs successfully. However, when I try to accelerate it using LAMMPS + fairchem on a single GPU (H100, 98 GB) via the command:

```
lmp_fc lmp_in=in.lammps task_name=omol local_predict_unit.path.model_name=uma-s-1
```
I encounter the following error:
```
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 19.20 GiB.
GPU 0 has a total capacity of 93.10 GiB of which 18.03 GiB is free.
Including non-PyTorch memory, this process has 75.06 GiB memory in use.
Of the allocated memory 56.39 GiB is allocated by PyTorch,
and 18.01 GiB is reserved by PyTorch but unallocated.
```
Despite this error, the LAMMPS integrator keeps advancing the MD simulation, and I still obtain trajectory files. I suspect the resulting trajectory is non-physical, because the UMA predictor clearly fails with the CUDA OOM and therefore cannot be returning valid energies and forces.
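For reference, the ASE run that does complete follows roughly the pattern below. This is only a minimal sketch based on the fairchem UMA quickstart; the structure file name, MD parameters, and the `predictor`/`calc` names are placeholders, and my actual script differs in the system setup.

```python
from ase import units
from ase.io import read
from ase.md.langevin import Langevin

from fairchem.core import pretrained_mlip, FAIRChemCalculator

# Load the pretrained UMA model and wrap it as an ASE calculator,
# using the same task as in the LAMMPS run above.
predictor = pretrained_mlip.get_predict_unit("uma-s-1", device="cuda")
calc = FAIRChemCalculator(predictor, task_name="omol")

# ~70,000-atom system read from file (placeholder filename).
atoms = read("system.xyz")
atoms.calc = calc

# Simple Langevin MD; slow, but it runs without OOM on the same GPU.
dyn = Langevin(atoms, timestep=1.0 * units.fs,
               temperature_K=300.0, friction=0.01 / units.fs)
dyn.run(1000)
```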
My questions are therefore:
- Is this behavior normal?
- Is it expected that LAMMPS + fairchem can consume more GPU memory than ASE for large systems?
- Are there any recommended strategies to reduce GPU memory usage or otherwise improve performance in this setup?
I would be extremely grateful for any insights or suggestions you could provide.
Below is my LAMMPS input file:
```
# Units and dimensions
units           metal
dimension       3
boundary        p p p
atom_style      atomic
atom_modify     map yes
newton          on

read_data       npt_1bar.data
velocity        all create 300.0 42 mom yes rot yes dist gaussian

# Neighbor list settings
neighbor        2.0 bin
neigh_modify    delay 0 every 10 check yes

# Time step
timestep        0.001

thermo_style    custom step temp pe ke etotal press vol lx ly lz
thermo_modify   flush yes
thermo          1
dump            1 all custom 1 traj_npt.lammpstrj id type x y z vx vy vz
dump_modify     1 sort id

fix             1 all npt temp 300.0 300.0 0.1 iso 1.0 1.0 1.0 tchain 3 pchain 3

restart         1000 restart.*.eq
reset_timestep  0
run             100000
```