
Update _get_start_data to always grab the beginning of timestep time#3414

Merged
paulromano merged 6 commits into openmc-dev:develop from lewisgross1296:fix_continue_h5_bug
Jun 4, 2025

Conversation

@lewisgross1296
Contributor

@lewisgross1296 lewisgross1296 commented May 19, 2025

Description

Right now, any interrupted depletion simulation (think max wall time or an unintended power-off) will have an incorrect time for the point at which the simulation restarts. This happens whether continue_timesteps is True or False (i.e., independent of #3272) and needs to be fixed. This function is the problem:

def _get_start_data(self):
    if self.operator.prev_res is None:
        return 0.0, 0
    return (self.operator.prev_res[-1].time[-1],
            len(self.operator.prev_res) - 1)

The fix is actually a one-liner: .time[-1] -> .time[-2]. I will explain.
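To illustrate, the fixed logic can be sketched as below. Note that FakeStepResult and the standalone get_start_data are illustrative stand-ins for the OpenMC objects, not the actual repo code:

```python
class FakeStepResult:
    """Stand-in for openmc.deplete.StepResult; only the ``time`` member matters here."""
    def __init__(self, time):
        self.time = time  # [t_start, t_end] in seconds

def get_start_data(prev_res):
    """Return (restart time, completed step count) from previous results."""
    if prev_res is None:
        return 0.0, 0
    # time[-2] (equivalently time[0] for a length-2 list) is the *start*
    # of the last recorded step, which is correct whether or not the
    # previous run finished that step.
    return prev_res[-1].time[-2], len(prev_res) - 1

# A killed run whose final recorded step is [86400, 172800]:
killed = [FakeStepResult([0.0, 86400.0]), FakeStepResult([86400.0, 172800.0])]
print(get_start_data(killed))  # (86400.0, 1) -- restart at 86400, not 172800
```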

Every StepResult in a depletion simulation has a member time, which is a list of float:

class StepResult:
    """Result of a single depletion timestep

    .. versionchanged:: 0.13.1
        Name changed from ``Results`` to ``StepResult``

    Attributes
    ----------
    k : list of (float, float)
        Eigenvalue and uncertainty for each substep.
    time : list of float
        Time at beginning, end of step, in seconds.

For every step, the time member has values [t, t+dt] for the given step. When reading in previous results, the last depletion step appears to have dt=0 and thus is stored as [t_final, t_final]. I discovered this by printing out time_dset[step, :] in the from_hdf5 method each time it's called in the Results constructor:

def __init__(self, filename='depletion_results.h5'):
    data = []
    if filename is not None:
        with h5py.File(str(filename), "r") as fh:
            cv.check_filetype_version(fh, 'depletion results', VERSION_RESULTS[0])
            # Get number of results stored
            n = fh["number"][...].shape[0]
            for i in range(n):
                data.append(StepResult.from_hdf5(fh, i))
    super().__init__(data)

This is a fine thing to do if the simulation never gets killed, but it introduces a problem when a depletion simulation is not guaranteed to complete its last step with [t, t]. If a simulation gets killed, the final time member will look something like this:

[86400, 172800]

Since it never finishes this step, in the next run (continue or not), self.operator.prev_res[-1].time[-1] is actually 172800. However, a simulation that restarts from this previous run should restart at t=86400 and finish the step. Currently, the restart begins at t=172800 and applies the first dt of the new simulation from there. When this gets written, the depletion_results.h5 file appears to contain a wrong dt for the interrupted step: the sum of the last step's dt and the first new dt.
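The restart arithmetic described above can be spelled out with the numbers from this example (times in seconds; this is an illustration, not repo code):

```python
# A killed run's last StepResult records [t, t + dt] for the unfinished step.
killed_last_step = [86400.0, 172800.0]

# Buggy behavior: restart from time[-1], the *end* of the unfinished step.
wrong_start = killed_last_step[-1]   # 172800.0
# Correct behavior: restart from the *start* and redo the unfinished step.
right_start = killed_last_step[0]    # 86400.0

# With a first new dt of one day, the buggy restart produces a step that
# appears to span the old dt plus the new dt:
new_dt = 86400.0
apparent_dt = (wrong_start + new_dt) - right_start
print(apparent_dt)  # 172800.0 -- twice the intended step length
```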

Since the first value in the time member will always be the time you want in a restart (whether the previous run completed or was killed), we should grab this value instead. Very subtle.
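A quick check that the first value is correct in both cases (the numbers here are illustrative):

```python
# Completed run: the final StepResult is stored as [t_final, t_final].
completed_last = [1123200.0, 1123200.0]
# Killed run: the final StepResult is [t, t + dt] for the unfinished step.
killed_last = [86400.0, 172800.0]

# In both cases, index 0 gives the right restart time:
print(completed_last[0])  # 1123200.0 -- resume after the final step
print(killed_last[0])     # 86400.0   -- redo the interrupted step
```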

Fixes #3387.

Checklist

  • I have performed a self-review of my own code
  • I have run clang-format (version 15) on any C++ source files (if applicable)
  • I have followed the style guidelines for Python source files (if applicable)
  • I have made corresponding changes to the documentation (if applicable)
  • I have added tests that prove my fix is effective or that my feature works (if applicable)

@lewisgross1296
Contributor Author

lewisgross1296 commented May 19, 2025

I've held off pushing the fix for now so we can see the new test that exposes the issue fail. I will push the fix after CI gets started.

Also realizing

        return (self.operator.prev_res[-1].time[0],
                len(self.operator.prev_res) - 1)

might be more clear than

        return (self.operator.prev_res[-1].time[-2],
                len(self.operator.prev_res) - 1)

It seems this member variable will always have length 2, so perhaps a 0 is more clear.
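A quick sanity check of that equivalence (assuming, as noted, that time always has length 2):

```python
# For a two-element list, index 0 and index -2 refer to the same element.
time = [86400.0, 172800.0]
print(time[0] == time[-2])  # True
```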

@lewisgross1296 lewisgross1296 changed the title add assertions to tests and add new test to ensure a killed continue … Update _get_start_data to always grab the beginning of timestep time May 19, 2025
@lewisgross1296
Contributor Author

lewisgross1296 commented May 20, 2025

Looks like I made a silly mistake regarding ==; I just fixed that by replacing it with numpy.array_equal. Though, I'm realizing the third failure was for a reason I didn't expect:

E   OSError: Error reading file 'continue_model.xml': failed to load "continue_model.xml": No such file or directory

Perhaps I'm misunderstanding what files GitHub Actions has access to, but I thought that pushing XML/h5 files to /tests/unit_tests would mean the test could access them.

Either way @paulromano, I'm curious about your thoughts on this test in general, since it feels a little non-standard. I think it's important to test for this case though. Still going to hold off on pushing the fix until this test fails in the way we expect (the same as in #3387)
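For reference, the == mistake mentioned above is a common NumPy pitfall; a minimal sketch (not the repo's test code):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

# `==` on an array is elementwise and returns another array, so using it
# directly in an `assert` raises "truth value of an array is ambiguous".
elementwise = a == [1.0, 2.0, 3.0]
print(type(elementwise))  # <class 'numpy.ndarray'>

# np.array_equal collapses the comparison to a single boolean:
print(np.array_equal(a, [1.0, 2.0, 3.0]))  # True
```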

@gonuke
Contributor

gonuke commented May 20, 2025

> Though, I'm realizing the third failure was for a reason I didn't expect
>
> E   OSError: Error reading file 'continue_model.xml': failed to load "continue_model.xml": No such file or directory
>
> Perhaps I'm misunderstanding what files GitHub Actions has access to, but I thought that pushing XML/h5 files to /tests/unit_tests would mean the test could access them.

It looks like the standard/best practice is to use Path(__file__).parents[] to build the path relative to the test and find the file you want.
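A sketch of that pattern (continue_model.xml is just the fixture name from the error above; the variable names are illustrative):

```python
from pathlib import Path

# Resolve fixture files relative to the test module itself, so the test
# works no matter which directory pytest is launched from.
here = Path(__file__).parent                # directory containing this file
model_xml = here / "continue_model.xml"     # hypothetical fixture path

# parents[n] climbs further up the tree when needed, e.g.:
# repo_root = Path(__file__).parents[1]     # two levels up
print(model_xml.name)  # continue_model.xml
```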

@lewisgross1296
Contributor Author

lewisgross1296 commented May 20, 2025

Tested locally and now this commit should show only a failure for test_deplete_continue.py::test_killed_and_continue which exposes the current issue with the _get_start_data(self) method in abc.py. It should show that the last simulation time before the job was killed has the wrong timestep (a dt=5 where it should be dt=2) when output from the final results.

Once that is the only failure, I will push the correction and show that the commit then passes tests and can be merged.

@lewisgross1296
Contributor Author

Was hoping there would be more printout from GitHub Actions (and this time it stopped after failing this test), but it looks like this is the only failing test in test_deplete_continue.py (which has had updates):

tests/unit_tests/test_deplete_continue.py::test_continue PASSED          [ 46%]
tests/unit_tests/test_deplete_continue.py::test_continue_continue PASSED [ 46%]
tests/unit_tests/test_deplete_continue.py::test_killed_and_continue 
Error: Process completed with exit code 255.

Locally, I get this as printout

>       assert np.array_equal(np.diff(final_res.get_times(time_units="d")),[1.0, 2.0, 3.0, 4.0])
E       AssertionError: assert False
E        +  where False = <function array_equal at 0x7c744c974870>(array([1., 5., 3., 4.]), [1.0, 2.0, 3.0, 4.0])
E        +    where <function array_equal at 0x7c744c974870> = np.array_equal
E        +    and   array([1., 5., 3., 4.]) = <function diff at 0x7c744b3f4bb0>(array([ 0.,  1.,  6.,  9., 13.]))
E        +      where <function diff at 0x7c744b3f4bb0> = np.diff
E        +      and   array([ 0.,  1.,  6.,  9., 13.]) = get_times(time_units='d')
E        +        where get_times = [<StepResult: t=0.0, dt=86400.0, source=35000.0>, <StepResult: t=86400.0, dt=172800.0, source=35000.0>, <StepResult: t...rce=35000.0>, <StepResult: t=777600.0, dt=345600.0, source=35000.0>, <StepResult: t=1123200.0, dt=0.0, source=35000.0>].get_times

/home/lgross/openmc/tests/unit_tests/test_deplete_continue.py:93: AssertionError

I will now push the fix

@lewisgross1296
Contributor Author

lewisgross1296 commented May 20, 2025

Hmm, so I was not expecting this failure... I was testing with Python 3.11.11, so I switched to 3.12.8 (pyenv randomly doesn't have 3.12.10).

Locally, I'm getting this

lgross@ulam:~/openmc/tests/unit_tests (fix_continue_h5_bug) $ pytest test_deplete_continue.py 
===================================== test session starts =====================================
platform linux -- Python 3.12.8, pytest-8.3.5, pluggy-1.6.0
rootdir: /home/lgross/openmc
configfile: pytest.ini
collected 5 items

test_deplete_continue.py .....                                          [100%]

====================================== warnings summary ======================================
tests/unit_tests/test_deplete_continue.py: 768 warnings
  /home/lgross/.pyenv/versions/3.12.8/lib/python3.12/multiprocessing/popen_fork.py:66: 
  DeprecationWarning: This process (pid=3925931) is multi-threaded, use of fork() may 
  lead to deadlocks in the child. self.pid = os.fork()

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================== 5 passed, 768 warnings in 55.76s ===============================

I'm only getting this message from GitHub Actions,

Error: Process completed with exit code 255.

so I'll try updating the commit message to spawn an action-tmate session

EDIT: it seems like prepending [gha-debug] didn't allow the tmate session to spawn 🤔
[screenshot: tmate_failure]

@lewisgross1296 lewisgross1296 force-pushed the fix_continue_h5_bug branch 3 times, most recently from a244474 to 624559d Compare May 21, 2025 16:56
…estep time so it will work whether a previous simulation completes or is interrupted
@lewisgross1296
Contributor Author

I realized that there already exists a (much) simpler chain in the repo, so I switched to using that. I also realized that the tests use the NNDC cross sections while the XML/h5 files I generated used ENDF/B-VIII.0. There are some nuclide differences (e.g. the handling of carbon) which might be causing the error. To eliminate possible sources of error, I regenerated the continue_depletion_results.h5 file and XML to be more consistent with the testing framework.

Contributor

@gonuke gonuke left a comment


A few edits of the comments you added

Comment thread openmc/deplete/abc.py Outdated (5 threads)
Co-authored-by: Paul Wilson <paul.wilson@wisc.edu>
Comment thread openmc/deplete/abc.py
Contributor

@gonuke gonuke left a comment


This all looks good to me - thanks @lewisgross1296

@gonuke
Contributor

gonuke commented May 22, 2025

In case @paulromano may be wondering, this is basically a 1-line PR (2 characters, in fact) other than the tests...

Contributor

@paulromano paulromano left a comment


@lewisgross1296 Thanks for the fix here (and thanks for the nudge @gonuke). The one-line change looks good to me. The test, however, is a little more problematic because if and when we have other changes that require updating the continue_depletion_results.h5 file, there is no easy way to do that. Other tests recognize when you call pytest --update and will update their reference result files accordingly, but I don't see any way of doing that for this test since it requires killing a process. Are you OK with just removing the test for the time being? Obviously not ideal but honestly I would prefer to have no test rather than have a test that is difficult to update in the future.

@gonuke
Contributor

gonuke commented Jun 4, 2025

Probably fine to get rid of the test from our perspective. We briefly discussed whether we could generate a test that would result in an interrupted job, thus creating this file on the fly, but I'm not sure if that's possible.

@paulromano
Contributor

Ok, sounds good. I'll go ahead and remove the test and merge this. If you guys are able to figure out a good way to test it, feel free to follow up with another PR but no sweat otherwise.

@paulromano paulromano enabled auto-merge (squash) June 4, 2025 21:34
@lewisgross1296
Contributor Author

Yeah, the test seems difficult to maintain since I had to locally kill the simulation to create that h5 file. It probably makes the most sense to remove it, but knowing the test passed here is hopefully sufficient.

If something in the future causes the continue runs to break, we can cross that bridge when we get there. Hopefully this change is resistant to being broken. The updated doc string should help anyone who might need to use or change the _get_start_data() function for future development.

Thanks for finishing this up @paulromano!

@paulromano paulromano merged commit e14bb88 into openmc-dev:develop Jun 4, 2025
14 checks passed
@lewisgross1296 lewisgross1296 deleted the fix_continue_h5_bug branch February 5, 2026 01:45


Development

Successfully merging this pull request may close these issues.

restarting killed depletion simulations improperly saving h5 data

3 participants