
NXLauetof Panel dependent dataset #102

Draft · wants to merge 8 commits into base: lauetof-1
Conversation

@YooSunYoung (Member) commented Jan 27, 2025

Related to #60 and #90

@aaronfinke can you please try this and see if it crashes or not....?

Comment on lines 187 to 189:

_create_dataset_from_var(
    name="data",
    root_entry=nx_detector,
-   var=sc.scalar(0, unit=''),  # TODO: Add real data
+   var=sc.fold(dg["counts"].data, dim='id', sizes={'x': num_x, 'y': num_y}),
)
@YooSunYoung (Member, Author):

@aaronfinke And this file has all the exporting methods.

Can you please review this part and check whether the fields are correct?

For example, this block folds the pixel counts into [n_x_pixels, n_y_pixels, n_tof_bins]
and then saves them as data under instrument/{detector_name}.
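
As a minimal, self-contained illustration of what that fold does (num_x, num_y, num_tof and the dummy counts below are made up for the example, not the PR's actual variables):

```python
import scipp as sc

num_x, num_y, num_tof = 4, 3, 5
# Flat per-pixel histogram: one 'id' per pixel, one 't' bin per time-of-flight bin.
counts = sc.ones(dims=['id', 't'], shape=[num_x * num_y, num_tof], unit='counts')

# Split the flat 'id' dimension into 'x' and 'y'; the 't' dimension is untouched,
# so the result has shape [n_x_pixels, n_y_pixels, n_tof_bins].
folded = sc.fold(counts, dim='id', sizes={'x': num_x, 'y': num_y})
print(folded.sizes)  # {'x': 4, 'y': 3, 't': 5}
```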

@aaronfinke (Collaborator):

Make sure the counts are unsigned int and not float, since I think that is what is expected from the EFU (they were floats before, but they shouldn't be).

We also need to think about how to set the detector geometries in the file. I think the way it was done before (fast_axis, slow_axis, origin as 3D vectors) is fine. If we do that, we can remove the polar_angle and azimuthal_angle. I need to think about this, though.

@YooSunYoung (Member, Author):

@aaronfinke I pushed a change to export them as uint.
Just note that the original McStas data are floating-point numbers, so if you don't give the workflow a big enough MaximumProbability, the conversion might just wipe out all the information.
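
To make the concern concrete, here is a hedged numpy sketch (not the workflow's actual code; the numbers are invented) of why a too-small MaximumProbability destroys information when casting to uint:

```python
import numpy as np

weights = np.array([1.2e-5, 3.4e-5, 9.9e-6])  # made-up McStas event probabilities
max_probability = 10_000                       # user-set MaximumProbability

# Rescale so the largest weight maps to max_probability, then cast to unsigned int.
counts = (weights / weights.max() * max_probability).astype(np.uint32)
print(counts)  # [ 3529 10000  2911]

# With a MaximumProbability of 1 the cast truncates nearly everything to 0 or 1,
# wiping out the relative information carried by the float weights.
print((weights / weights.max() * 1).astype(np.uint32))  # [0 1 0]
```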

@aaronfinke (Collaborator):

Usually the MaximumProbability is set to 10000, which seems to be big enough. In any case, DIALS reads the data in as ints, so we might as well control that input ourselves.

@aaronfinke (Collaborator):

As to whether counts should be uint or signed int: some detectors output -1 as a count to mark defective pixels; sometimes it's MAX_VALUE-1. I don't know what our detector will do, but once I know I will let you know.

@aaronfinke (Collaborator):

Doesn't crash for me :) (also does not give the correct processed data, but I guess it's not supposed to yet?)

@YooSunYoung (Member, Author):

> Doesn't crash for me :) (also does not give the correct processed data, but I guess it's not supposed to yet?)

What do you mean by correct processed data? The counts should be correct.
There are some fields that need to be filled though.

@YooSunYoung (Member, Author):

@scipp/ess-maintainers @aaronfinke

Sorry I couldn't really make it pretty yet and ended up making a PR with many fixes.

workflow_chunk.ipynb has an example of processing data chunk by chunk.

I had to reorder some of the providers:

1. We have to scale the floating-point probabilities using the maximum existing probability and a maximum count that is set manually by the user. Therefore the maximum existing probability also had to be accumulated (that's the MaxAccumulator I wrote in the notebook; a sketch follows the TODO list below) and then fed into the provider.
2. The events had to be histogrammed both into pixel id and time so that they can be accumulated efficiently.

I'll be away until the 11th of February, so if anyone's interested, please pick this up and continue.
(Feel free to discard what I have done if needed...!)

TODO:

  • Update the docs dependencies to include essreduce.
  • Move MaxAccumulator into an appropriate package.
  • Test
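
For reference, a standalone sketch of the MaxAccumulator idea (this is not the notebook's code; to plug into ess.reduce.streaming it would additionally have to subclass the essreduce Accumulator base class, whose exact interface should be checked there):

```python
import scipp as sc


class MaxAccumulator:
    """Keeps the running maximum of the values pushed from each chunk."""

    def __init__(self) -> None:
        self._value: sc.Variable | None = None

    def push(self, value: sc.Variable) -> None:
        if self._value is None:
            self._value = value.copy()
        else:
            # Element-wise maximum of the stored value and the new chunk's value.
            self._value = sc.maximum(self._value, value)

    @property
    def value(self) -> sc.Variable | None:
        return self._value


# Made-up per-chunk maxima of the McStas probabilities:
acc = MaxAccumulator()
acc.push(sc.scalar(3.1e-5))
acc.push(sc.scalar(7.4e-5))
print(acc.value)  # scalar holding 7.4e-05
```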

@aaronfinke (Collaborator):

from ess.reduce.streaming import EternalAccumulator, Accumulator
from ess.reduce.streaming import StreamProcessor

These are not part of essnmx; where are they from?

@jokasimr (Contributor) commented Feb 3, 2025

> These are not part of essnmx; where are they from?

They are from https://github.com/scipp/essreduce.
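
A tiny, hedged illustration of what these accumulators do (inferred only from the push()/value behaviour visible in the traceback further down; the pushed values here are made up):

```python
import scipp as sc
from ess.reduce.streaming import EternalAccumulator  # from scipp/essreduce

acc = EternalAccumulator()
for chunk in (sc.scalar(1.0, unit='counts'), sc.scalar(2.5, unit='counts')):
    # The first push stores a copy; later pushes add in place (streaming.py:99 below).
    acc.push(chunk)

print(acc.value)  # running total of 3.5 counts
```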

@aaronfinke (Collaborator):

Tried it on a McStas dataset and got this error:

---------------------------------------------------------------------------
DatasetError                              Traceback (most recent call last)
Cell In[23], line 38
     36     if any(da.sizes.values()) == 0:
     37         continue
---> 38     results = processor.add_chunk({RawEventData: da})
     40 proton_charge, histogram = results[ProtonCharge], results[NMXHistogram]
     41 reduced = format_nmx_reduced_data(
     42     counts=histogram,
     43     proton_charge=proton_charge,
   (...)
     46     crystal_rotation=crystal_rotation,
     47 )

File ~/.conda/envs/scipp/lib/python3.13/site-packages/ess/reduce/streaming.py:216, in StreamProcessor.add_chunk(self, chunks)
    214 to_accumulate = self._process_chunk_workflow.compute(self._accumulators)
    215 for key, processed in to_accumulate.items():
--> 216     self._accumulators[key].push(processed)
    217     self._finalize_workflow[key] = self._accumulators[key].value
    218 return self._finalize_workflow.compute(self._target_keys)

File ~/.conda/envs/scipp/lib/python3.13/site-packages/ess/reduce/streaming.py:62, in Accumulator.push(self, value)
     60 if self._preprocess is not None:
     61     value = self._preprocess(value)
---> 62 self._do_push(value)

File ~/.conda/envs/scipp/lib/python3.13/site-packages/ess/reduce/streaming.py:99, in EternalAccumulator._do_push(self, value)
     97     self._value = value.copy()
     98 else:
---> 99     self._value += value

DatasetError: Mismatch in coordinate 't' in operation 'add_equals':
(t: 51)    float64              [s]  [0.122206, 0.122209, ..., 0.122345, 0.122348]
vs
(t: 51)    float64              [s]  [0.078339, 0.0792191, ..., 0.121464, 0.122344]
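
For context, the error suggests each chunk was histogrammed with its own automatically derived 't' edges, so the per-chunk histograms have different coordinates and cannot be summed. A hedged sketch (not a verified fix) of histogramming every chunk onto one fixed set of edges so the accumulator can add them:

```python
import scipp as sc

# One fixed set of time edges shared by all chunks (values are illustrative).
t_edges = sc.linspace('t', 0.0, 0.13, num=52, unit='s')


def histogram_chunk(events: sc.DataArray) -> sc.DataArray:
    # events.hist(t=51) would derive edges from this chunk's own min/max time,
    # which differs between chunks; common edges keep the histograms addable.
    return events.hist(t=t_edges)
```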

@aaronfinke (Collaborator):

I would suggest taking advantage of the fact that these HDF5 data are stored in compressed chunks, and the chunk size is already optimal for I/O.

e.g. in one McStas file:

with h5py.File('scratch/aaronfinke/nmx_mcstas_rubredoxin_phiseries/5e11_x11000SPLIT_x10_omega_0_180/1/mccode.h5') as fp:
    dset = fp['entry1/data']['Detector_0_event_signal_dat_list_p_x_y_n_id_t']['events']
    chunk_shape = dset.chunks  # HDF5 chunk shape of the compressed event dataset
chunk_shape
(1332354, 6)

then you can do something like

for s in dset.iter_chunks():  # s is a tuple of slices selecting one stored chunk
    chunk = dset[s]           # reads exactly one compressed chunk (file must still be open)
    ...                       # process the chunk

That is how I am working with the McStas data directly, and it is quite fast and memory-efficient.

@SimonHeybrock (Member):

My guess is that the chunks in the file are much smaller than the optimal chunk size for the data reduction. There will be a very significant per-pixel cost, so if there are too few events per chunk then performance will be dominated by the nchunk*npixel terms.

But yes, as an added bonus, we can try to choose the chunk size as a multiple of the HDF5 chunk size of the dataset.
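
A hedged sketch of that combination (the path, the multiple, and the dataset layout are illustrative; only dset.chunks and plain slicing are actual h5py API):

```python
import h5py

multiple = 8  # tune so each reduction chunk has enough events per pixel

with h5py.File('mccode.h5', 'r') as fp:  # illustrative file name
    dset = fp['entry1/data/Detector_0_event_signal_dat_list_p_x_y_n_id_t/events']
    block = dset.chunks[0] * multiple    # a multiple of the HDF5 chunk length
    for start in range(0, dset.shape[0], block):
        events = dset[start:start + block]  # reads aligned, compressed chunks
        # ...feed `events` into the streaming workflow, e.g. processor.add_chunk(...)
```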
