NXLauetof Panel dependent dataset #102

Closed · wants to merge 14 commits
Conversation

@YooSunYoung (Member) commented Jan 27, 2025

Related to #60 and #90

@aaronfinke can you please try this and see if it crashes or not....?

Comment on lines 187 to 189

_create_dataset_from_var(
    name="data",
    root_entry=nx_detector,
-   var=sc.scalar(0, unit=''),  # TODO: Add real data
+   var=sc.fold(dg["counts"].data, dim='id', sizes={'x': num_x, 'y': num_y}),
)
@YooSunYoung (Member, Author)

@aaronfinke And this file has all the exporting methods.

Can you please review this part and check whether the fields are correct?

For example, this block folds the pixel counts into [n_x_pixels, n_y_pixels, n_tof_bins]
and then saves them as data under instrument/{detector_name}.
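For readers unfamiliar with sc.fold, here is a minimal, self-contained sketch (with made-up sizes, not the PR's actual values) of the reshaping this block performs:

import scipp as sc

num_x, num_y, num_tof = 4, 3, 5  # made-up panel sizes
# Counts with a flat pixel dimension 'id' and a tof dimension 't' are reshaped,
# without copying, into the [n_x_pixels, n_y_pixels, n_tof_bins] layout.
counts = sc.zeros(dims=['id', 't'], shape=[num_x * num_y, num_tof], unit='counts')
folded = sc.fold(counts, dim='id', sizes={'x': num_x, 'y': num_y})
print(folded.sizes)  # {'x': 4, 'y': 3, 't': 5}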

@aaronfinke (Collaborator)

Make sure the counts are unsigned int and not float, since I think that is what is expected from the EFU (they were floats before, but they shouldn't be).

Also, we need to think about how to set the detector geometries in the file. I think the way it was done before (fast_axis, slow_axis, origin as 3D vectors) is fine. If we do that, we can remove polar_angle and azimuthal_angle. I need to think about this, though.
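For illustration only (values made up, not from the PR): the fast_axis/slow_axis/origin style mentioned above could be expressed as 3-D vectors in scipp like this:

import scipp as sc

# Illustrative values only: panel geometry as 3-D vectors instead of
# polar_angle/azimuthal_angle.
fast_axis = sc.vector(value=[1.0, 0.0, 0.0])             # direction of increasing x pixel index
slow_axis = sc.vector(value=[0.0, 1.0, 0.0])             # direction of increasing y pixel index
origin = sc.vector(value=[0.25, -0.25, 0.29], unit='m')  # position of the panel's first pixel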

@YooSunYoung (Member, Author)

@aaronfinke I pushed a change to export them as uint.
Just note that the original McStas data is floating-point numbers, so if you don't give the workflow a big enough MaximumProbability, it might just wipe out all the information.
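To make this concern concrete, here is a minimal sketch (not the PR's actual provider; values are made up) of why a too-small MaximumProbability wipes out information when casting to unsigned int:

import scipp as sc

maximum_probability = 10_000  # user-set value
weights = sc.array(dims=['id'], values=[0.002, 0.75, 0.0301])  # float McStas weights
# Rescale so the largest weight maps to MaximumProbability, then cast to uint.
scaled = weights * (maximum_probability / weights.max().value)
counts = scaled.to(dtype='uint64')
print(counts.values)  # roughly [26, 10000, 401]; with maximum_probability=1 almost everything becomes 0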

@aaronfinke (Collaborator)

Usually MaximumProbability is set to 10000, which seems to be big enough. In any case, DIALS reads the data in as ints, so we might as well control that input ourselves.

@aaronfinke (Collaborator)

As to whether counts should be uint or signed int: some detectors output -1 as a count to mark defective pixels; sometimes it's MAX_VALUE-1. I don't know what our detector will do, but once I know, I will let you know.

@aaronfinke (Collaborator)

Doesn't crash for me :) (also does not give the correct processed data, but I guess it's not supposed to yet?)

@YooSunYoung (Member, Author)

> Doesn't crash for me :) (also does not give the correct processed data, but I guess it's not supposed to yet?)

What do you mean by correct processed data? The counts should be correct.
There are some fields that still need to be filled, though.

@YooSunYoung (Member, Author)

@scipp/ess-maintainers @aaronfinke

Sorry I couldn't really make it pretty yet and ended up making a PR with many fixes.

workflow_chunk.ipynb has an example of processing data chunk by chunk.

I had to reorder some providers:

  1. We have to scale the floating-point probability with the maximum existing probability and a maximum count set manually by the user. Therefore the maximum existing probability also had to be accumulated (that's the accumulator I wrote in the notebook) and then fed into the provider.
  2. The events had to be histogrammed into both pixel id and time so that they can be accumulated efficiently.

I'll be away until 11th of February so if anyone's interested, please pick this up and continue.
(Feel free to discard what I have done if needed...!)

TODO:

  • Update the docs dependencies to include essreduce.
  • Move MaxAccumulator somewhere in a package (a rough sketch of the idea follows below).
  • Add tests.
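Since MaxAccumulator itself is not shown in this thread, here is a hypothetical sketch of the idea (not the notebook's implementation), mimicking the push()/value interface of the ess.reduce.streaming accumulators that appears in the traceback further down:

import scipp as sc

class MaxAccumulator:
    """Hypothetical sketch: keep the running maximum of the values pushed chunk by
    chunk, mimicking the push()/value interface of ess.reduce.streaming accumulators."""

    def __init__(self) -> None:
        self._value: sc.Variable | None = None

    def push(self, value: sc.Variable) -> None:
        chunk_max = value.max()  # reduce the chunk to its maximum
        if self._value is None or (chunk_max > self._value).value:
            self._value = chunk_max

    @property
    def value(self) -> sc.Variable:
        return self._value


# Usage sketch: accumulate the maximum existing probability over chunks.
acc = MaxAccumulator()
acc.push(sc.array(dims=['event'], values=[0.1, 0.4]))
acc.push(sc.array(dims=['event'], values=[0.2, 0.9]))
print(acc.value)  # 0.9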

@aaronfinke (Collaborator)

from ess.reduce.streaming import EternalAccumulator, Accumulator
from ess.reduce.streaming import StreamProcessor

These are not part of essnmx; where are they from?

@jokasimr (Contributor) commented Feb 3, 2025

> These are not part of essnmx; where are they from?

They are from https://github.com/scipp/essreduce.

@aaronfinke (Collaborator)

Tried it on a McStas dataset and got this error:

---------------------------------------------------------------------------
DatasetError                              Traceback (most recent call last)
Cell In[23], line 38
     36     if any(da.sizes.values()) == 0:
     37         continue
---> 38     results = processor.add_chunk({RawEventData: da})
     40 proton_charge, histogram = results[ProtonCharge], results[NMXHistogram]
     41 reduced = format_nmx_reduced_data(
     42     counts=histogram,
     43     proton_charge=proton_charge,
   (...)
     46     crystal_rotation=crystal_rotation,
     47 )

File ~/.conda/envs/scipp/lib/python3.13/site-packages/ess/reduce/streaming.py:216, in StreamProcessor.add_chunk(self, chunks)
    214 to_accumulate = self._process_chunk_workflow.compute(self._accumulators)
    215 for key, processed in to_accumulate.items():
--> 216     self._accumulators[key].push(processed)
    217     self._finalize_workflow[key] = self._accumulators[key].value
    218 return self._finalize_workflow.compute(self._target_keys)

File ~/.conda/envs/scipp/lib/python3.13/site-packages/ess/reduce/streaming.py:62, in Accumulator.push(self, value)
     60 if self._preprocess is not None:
     61     value = self._preprocess(value)
---> 62 self._do_push(value)

File ~/.conda/envs/scipp/lib/python3.13/site-packages/ess/reduce/streaming.py:99, in EternalAccumulator._do_push(self, value)
     97     self._value = value.copy()
     98 else:
---> 99     self._value += value

DatasetError: Mismatch in coordinate 't' in operation 'add_equals':
(t: 51)    float64              [s]  [0.122206, 0.122209, ..., 0.122345, 0.122348]
vs
(t: 51)    float64              [s]  [0.078339, 0.0792191, ..., 0.121464, 0.122344]

@aaronfinke (Collaborator)

I would suggest taking advantage of the fact that these HDF5 data are stored in compressed chunks, and the chunk size is already optimal for I/O.

e.g. in one McStas file:

with h5py.File('scratch/aaronfinke/nmx_mcstas_rubredoxin_phiseries/5e11_x11000SPLIT_x10_omega_0_180/1/mccode.h5') as fp:
    dset = fp['entry1/data']['Detector_0_event_signal_dat_list_p_x_y_n_id_t']['events']
    chunk_shape = dset.chunks

chunk_shape
# output: (1332354, 6)

then you can do something like (while the file is still open):

for s in dset.iter_chunks():
    chunk = dset[s]
    # <process chunk>

That is how I am working with the McStas data directly, and it is quite fast and memory-efficient.

@SimonHeybrock (Member)

My guess is that the chunks in the file are much smaller than the optimal chunk size for the data reduction. There is a very significant per-pixel cost, so if there are too few events per chunk, performance will be dominated by the nchunk*npixel terms.

But yes, we can try to choose the chunk size as a multiple of the HDF5 chunk size of the dataset, as an added bonus.
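A hypothetical sketch of that idea (the file path and TARGET_EVENTS_PER_CHUNK are illustrative, not from the PR): read blocks whose row count is a whole multiple of the HDF5 chunk size, so each read stays aligned with the compressed chunks on disk while still being large enough for the reduction.

import h5py

TARGET_EVENTS_PER_CHUNK = 20_000_000  # illustrative tuning knob

with h5py.File('mccode.h5', 'r') as fp:  # placeholder path
    dset = fp['entry1/data']['Detector_0_event_signal_dat_list_p_x_y_n_id_t']['events']
    hdf5_chunk_rows = dset.chunks[0]
    # Round down to a whole number of HDF5 chunks, but read at least one chunk.
    read_rows = max(1, TARGET_EVENTS_PER_CHUNK // hdf5_chunk_rows) * hdf5_chunk_rows
    for start in range(0, dset.shape[0], read_rows):
        block = dset[start:start + read_rows]  # aligned, multi-chunk read
        # ... feed `block` into the streaming workflow ...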

@YooSunYoung (Member, Author)

> Tried it on a McStas dataset and got this error:
> [traceback quoted above]

@aaronfinke
This error is probably because the time binning parameter is not properly set...?
I updated the notebook to set it at the beginning. That should fix the issue.

And I also added an argument that enables automatic chunking based on iter_chunks.
You have to set CHUNK_SIZE to 0.
It might be quicker to work on this offline together...

@YooSunYoung (Member, Author)

  • We need a helper that decides the time-of-flight range for a large file.

@SimonHeybrock (Member)

> We need a helper that decides the time-of-flight range for a large file.

Why can't this be a user input, or pre-configured, as in other techniques/instruments?

@aaronfinke (Collaborator)

> Why can't this be a user input, or pre-configured, as in other techniques/instruments?

It will be pre-configured for real data, once we know precisely the tof bin limits for each chopper setting (determined experimentally). But not for McStas simulations ;)

@SimonHeybrock (Member)

> > Why can't this be a user input, or pre-configured, as in other techniques/instruments?
>
> It will be pre-configured for real data, once we know precisely the tof bin limits for each chopper setting (determined experimentally). But not for McStas simulations ;)

So we can make it a user input then?

@aaronfinke (Collaborator)

> So we can make it a user input then?

This will cause problems in DIALS if the boxes (or tof_min, tof_max) are not accurately determined; refinement of wavelength in DIALS tends to fail with larger bin sizes, for example. A helper script to determine it directly from the tof data is a better option. This is what I do now and it works. I'm not sure what the negative is, aside from taking longer to compute/having to read the file twice (when loading into memory is not an option). If processing takes twice as long, that's still significantly shorter than it takes to collect NMX data (~10 min vs hours).

I have sent Sunyoung the helper I have written for chunked data, but I will add it here for posterity:

def min_max_probability_tof(self) -> tuple[float, float, float]:
    """
    Get the minimum and maximum tof and the maximum probability.
    Ignore zero tof and zero probabilities (likely errors in McStas).
    Done in chunks to avoid memory errors for large files.
    """
    # Assumes `import h5py`, `import numpy as np`, and a module-level `logger`;
    # self.input_file and self.num_detectors come from the surrounding class.
    max_p = 0.0
    max_t, min_t = 0.0, np.inf

    with h5py.File(self.input_file, "r") as fp:
        for i in range(self.num_detectors):
            print(f"minmax_t frame {i}")
            dset = fp["entry1/data"][f"Detector_{i}_event_signal_dat_list_p_x_y_n_id_t"]["events"]

            for s in dset.iter_chunks():
                chunk = dset[s]
                chunk_t = chunk.T[5]  # tof column
                chunk_p = chunk.T[0]  # probability (weight) column
                tempmax_t = np.max(chunk_t)
                tempmin_t = np.min(chunk_t[chunk_t > 0])
                tempmax_p = np.max(chunk_p)
                if tempmax_t > max_t:
                    max_t = tempmax_t
                if tempmax_p > max_p:
                    max_p = tempmax_p
                if tempmin_t < min_t:
                    min_t = tempmin_t
    logger.info(f"max_t: {max_t}, min_t: {min_t}")
    logger.info(f"max_p: {max_p}")
    return (max_t, min_t, max_p)
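For context, a hypothetical usage sketch (`loader` and NUM_TOF_BINS are illustrative names, not from the PR): the helper's output can be turned into the fixed tof bin edges that the streaming histogramming needs, which is exactly the kind of pre-set time binning that avoids the coordinate-mismatch error quoted earlier.

import scipp as sc

max_t, min_t, max_p = loader.min_max_probability_tof()  # `loader` owns the method above
NUM_TOF_BINS = 50  # illustrative
tof_edges = sc.linspace('t', min_t, max_t, num=NUM_TOF_BINS + 1, unit='s')
# max_p would feed the MaximumProbability scaling discussed earlier in this thread.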

@YooSunYoung (Member, Author)

It's rewritten in #115.

@YooSunYoung deleted the lauetof-2 branch on February 27, 2025 at 10:06.