
Conversation

@RastislavTuranyi
Collaborator

Resolves #24

@RastislavTuranyi added the enhancement label on Apr 16, 2025
@ajjackson force-pushed the add_ideal_instrument branch from 7c46a05 to c183566 on January 7, 2026 13:48
Still some details to be worked out on the I-C form in energy space (it's really defined for ToF). It's not required for the MVP, so drop it for now.
- Boxcar needs a default width or it fails validation
- Better to have a float default for a float argument
A boxcar does have a standard deviation, but it is not equal to its length. Take the length as the argument (as this is the most natural parameter), but also derive sigma and explain in the docstring how to do the conversion.
@ajjackson
Member

ajjackson commented Jan 8, 2026

The boxcar logic is currently inverted. It also seems a bit naive with respect to boxcar sizes that don't evenly divide into the grid; perhaps it would be better to provide linearly-interpolated edges so that normalisation is correct without changing the maximum value.

[Image: not_a_boxcar]

Astropy seems to handle this by quantising to a whole number of samples and then, if this is even, adding a sample and setting the edge values to 0.5 (https://docs.astropy.org/en/stable/api/astropy.convolution.Box1DKernel.html). Not a bad scheme?
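For reference, a minimal sketch of that scheme as I read it (my own illustration, not Astropy's actual code; the width and bin spacing are made-up values):

```python
import numpy as np

def boxcar_weights(width: float, bin_width: float) -> np.ndarray:
    """Quantise a boxcar to a whole number of samples; if that number is
    even, add one sample and give the two edge samples half weight."""
    n = max(round(width / bin_width), 1)
    if n % 2 == 0:
        weights = np.ones(n + 1)
        weights[[0, -1]] = 0.5        # half-weight edges keep the total at n
    else:
        weights = np.ones(n)
    return weights / weights.sum()    # normalise to unit sum

print(boxcar_weights(width=0.4, bin_width=0.1))  # [0.125 0.25 0.25 0.25 0.125]
```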

It would be nice to include some interpolated versions later, but for simple testing and sanity checks it helps if these functions are designed to always fall neatly on the sample bins. This avoids surprises such as peak shifts and triangles with flat tops.
Just assemble an output mesh and copy the kernels to it directly. This is probably more efficient than the delta-convolution implementation and reduces the risk of circular dependence between mixins.

Include unit tests (reference data roughly validated, but not fully confident yet).
The convolve()-based implementation made assumptions about the input points aligning perfectly to the output mesh. It also seemed to give the wrong results (peaks not centred correctly).

In principle it is quite inefficient to use the energy-dependent broaden() method. However, as it largely defers to a somewhat efficient fixed-energy get_peak() method, it's not as bad as it looks.

And anyway, these kernels are for testing.

The results look pretty sensible except that, conspicuously, the area of the broadened boxcar is larger than the area of the input pulse. So the normalisation of this scheme is not yet correct.
Drop the get_peak() methods on boxcar and triangle that just tweak the docstring from the mixin. Instead, make the mixin docstring more general!
@ajjackson
Member

I'm getting some nice triangle functions now that snap to grid and obviously have the correct area.

Boxcar is a little more subjective: I'm choosing to normalise area as though it is a trapezoid as this seems most likely to conserve properties correctly.

However, at this point I am suspicious of the scaling in general. I think kernels should be normalised by value while peaks are normalised by area; then the kernel will broaden without modifying overall intensity when convolved against a spectrum. Alternatively, we can adopt the convention that kernels are normalised by area (as currently) but account for this in broaden() operations using bin_width. Either change is going to break a lot of existing tests, so let's do it outside this PR, which focuses on adding new models.
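A toy illustration of the difference (made-up numbers and plain numpy, not the package API): an area-normalised kernel scales the summed intensity by 1/bin_width under convolution unless the bin width is accounted for, whereas a kernel whose values sum to one conserves it directly.

```python
import numpy as np

bin_width = 0.1
spectrum = np.zeros(101)
spectrum[50] = 3.0                    # a single peak with summed intensity 3

# Area-normalised kernel: values sum to 1 / bin_width
kernel = np.array([0.25, 0.5, 0.25]) / bin_width

blurred = np.convolve(spectrum, kernel, mode="same")
print(blurred.sum())                  # 30.0 -- inflated by 1 / bin_width
print(blurred.sum() * bin_width)      # 3.0  -- conserved once bin_width is applied
```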

The main purpose for snapping is to avoid surprising changes in shape
across a spectrum, and to facilitate comparison with convolution-based
methods.

For test purposes, we can achieve the same results as a snapped kernel
by choosing the right width parameter. Let's document that part, and
just rely on snapping for efficient peak placement.

Note that test data is unchanged as the test widths were already tidy
multiples of bin width.
Parametrise a general function instead of creating a data function per test.
The trapezoid pdf requires c, d parameters as fractions of "scale" (= the length of the base).
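For context, a quick check of how scipy.stats.trapezoid uses those parameters: c and d locate the two top corners as fractions of scale, with loc setting the left foot (the values below are arbitrary):

```python
from scipy.stats import trapezoid

base = 4.0                            # length of the base ("scale")
top = 2.0                             # length of the flat top
c = (base - top) / 2 / base           # left top corner as a fraction of base  -> 0.25
d = 1 - c                             # right top corner as a fraction of base -> 0.75

dist = trapezoid(c, d, loc=-base / 2, scale=base)
print(dist.pdf(0.0))                  # flat-top height = 2 / (base + top) ≈ 0.333
```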
The test file is now looking rather concise!
@ajjackson
Member

The models are all unit-tested now. The data is generated automatically by running the test file, which also produces images we can use for sanity-checking. They look OK to me, except for the magnitude/area of the broadened plots, which needs a wider examination (i.e. the fix will probably break other tests, so let's do it outside this PR).

[Images: _get_boxcar_broaden, _get_boxcar_kernel, _get_boxcar_peak, _get_gaussian_broaden, _get_gaussian_kernel, _get_gaussian_peak, _get_lorentzian_broaden, _get_lorentzian_kernel, _get_lorentzian_peak, _get_trapezoid_broaden, _get_trapezoid_kernel, _get_trapezoid_peak, _get_triangle_broaden, _get_triangle_kernel, _get_triangle_peak]

@ajjackson marked this pull request as ready for review, January 13, 2026 10:52
@ajjackson requested a review from oerc0122, January 13, 2026 12:07
@ajjackson
Member

@RastislavTuranyi I can't technically request you as reviewer as you started the PR. But if you are interested/available and have any comments, now is a good time to make them 😄

@RastislavTuranyi
Collaborator Author

I don't think I understand the normalisation problem you mention. The only thing I can point out is that the SimpleBroaden1DMixin.broaden function appears to use the get_peak function, not the get_kernel function, so changing the normalisation of get_kernel should not affect broaden at all in the existing cases (only the get_kernel tests would be affected).

Collaborator Author

@RastislavTuranyi left a comment


Maybe I've forgotten most of what I had learned, but I had real trouble understanding what is happening with all the bins and rounding and quantisation etc. Would it be possible to write up one robust explanation of what the problem is, how it's been solved, and how that may cause surprises for users? Ideally something that I, as I am now, could understand (that should hopefully cover all users!). Currently, it feels like each model's documentation has a piece of the puzzle, so I think gathering it all up in one place would be valuable.

"""
characteristics = {"width": np.ones(len(points)) * self.width}
characteristics["sigma"] = np.full_like(
characteristics["width"], np.sqrt(1 / 12)
Collaborator Author


Not sure why you chose to use np.full_like, but isn't the width missing?

Suggested change
characteristics["width"], np.sqrt(1 / 12)
characteristics["width"], np.sqrt(self.width / 12)

Member

@ajjackson Jan 13, 2026


Right, it should be self.width * sqrt(1/12) 🤦
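For anyone following along, the conversion in question: a uniform (boxcar) distribution of length L has variance L²/12, so σ = L·√(1/12); hence that factor. A quick standalone check (the length value is arbitrary):

```python
import numpy as np

length = 2.0                                       # boxcar length L
samples = np.random.default_rng(0).uniform(-length / 2, length / 2, 1_000_000)

sigma = length * np.sqrt(1 / 12)                   # L / sqrt(12) ≈ 0.577
print(sigma, samples.std())                        # the two values agree closely
```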

Comment on lines 162 to 164
Note that these kernels, generated with scipy.stats.uniform, are rounded to an odd-integer width, and the edges move directly from full height to zero. Better but more complicated algorithms exist.
Collaborator Author


Could you maybe elaborate a little bit on what this algorithm is? I think just one sentence at the beginning of this paragraph would make things a lot clearer!

Also, isn't being limited to odd-integer widths really restrictive? That's quite a small subset of real numbers...

Member


This is naturally what happens when evaluating the distribution with loc=-self.width/2 on the sample points: as soon as the range reaches a new point on one side it also reaches a point on the other side, so the kernel starts one sample wide and increases in width by increments of two.

The tricky part is that there isn't any interpolation to deal with the edges of the boxcar: it always goes from full height to zero over a width of one sample, so it seems to "snap".
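A small standalone illustration of that behaviour (the grid spacing and widths are made up, not taken from the PR): evaluating the uniform pdf on a symmetric grid always lights up an odd number of full-height samples, however the width is chosen.

```python
import numpy as np
from scipy.stats import uniform

bins = np.linspace(-5, 5, 101)                   # sample points spaced 0.1 apart
for width in (0.55, 0.95, 1.25):
    kernel = uniform(loc=-width / 2, scale=width).pdf(bins)
    print(width, np.count_nonzero(kernel))       # 5, 9, 13 -- always odd
```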

Member


It can probably be explained more clearly! An earlier version had some explicit logic to do it consistently, then I realised it was unnecessary.

Collaborator Author


The way you're explaining it, this sounds more like the other cases, in that the rounding is also to an integer number of bins rather than to an integer - am I getting that right?

Member


Replacing with


        Note that these kernels will always consist of an odd-integer width of full-height samples,
        moving directly to zero at the surrounding samples. The area is normalised as though this
        is a trapezoid (i.e. as though lines connect the boxcar top to the surrounding samples),
        resulting in lower height than the ideal boxcar.
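As a rough illustration of the geometry behind that normalisation (just the trapezoid arithmetic, not the package code): with n full-height samples at spacing dx and linear ramps down to zero at the neighbouring samples, the implied trapezoid has parallel sides of length (n - 1)·dx and (n + 1)·dx, so unit area gives a height of 1/(n·dx).

```python
# Trapezoid normalisation for n full-height samples at spacing dx,
# ramping linearly to zero at the neighbouring samples on either side.
n, dx = 5, 0.1
top = (n - 1) * dx                    # length of the flat top
base = (n + 1) * dx                   # length at zero height
height = 2 / (top + base)             # unit area: height * (top + base) / 2 = 1
print(height, 1 / (n * dx))           # both 2.0 -- equivalent to 1 / (n * dx)
```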

Comment on lines 201 to 204
fwhm
The width (in Full-Width Half-Maximum) of the Triangle function. This width is used for all
values of [w, Q]. When realised on a user mesh, the width is rounded to an integer number of
bins to create straight lines from the peak to zero.
Collaborator Author


Is this how scipy.stats.triang.pdf works? Because I don't see such an operation being explicitly done in get_kernel.

Member


This statement is now inaccurate, thanks for spotting. The correct statement is in the docstring for get_kernel:

Note that shape and area are only exactly correct when FWHM equals an integer number of bins.

Otherwise we get something like
[Image: bad_tri]
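A standalone check of that statement, assuming the triangle kernel is a symmetric scipy.stats.triang pdf with base 2 × FWHM sampled on the output mesh (the grid and widths below are made up):

```python
import numpy as np
from scipy.stats import triang

bins = np.linspace(-5, 5, 101)
dx = bins[1] - bins[0]                            # 0.1

for fwhm in (0.5, 0.53):                          # integer vs non-integer number of bins
    kernel = triang(c=0.5, loc=-fwhm, scale=2 * fwhm).pdf(bins)
    print(fwhm, np.sum(kernel) * dx)              # ~1.0 for the aligned width, ~1.008 otherwise
```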

Comment on lines +297 to +299
Note that shape and area are only exactly correct when base lengths correspond to an even number
of bin widths. The get_peak() and broaden() methods will snap input points to the nearest mesh
value; this results in a consistent peak shape.
Collaborator Author


I assume this is similar to the situation in the Boxcar and Triangle models? If they all face roughly the same issue, would it be possible to write one good explanation and either copy-paste it into all affected models, or put it in one place and then link to it? As you might have noticed, I am quite confused lol

Member


I could write an explanation in the docs page, including some graphs that make it a bit easier to understand. But then we should probably still have something written in the docstring? Does linking from a docstring to Sphinx doc pages work well?

DATA_PATH = Path(__file__).parent / "data" / "ideal"


TEST_CASES = {
Collaborator Author


If the mesh is an issue for some of the models, could we have tests both for when the bins align with the characteristic width and when they don't?
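Something along these lines might work (a hypothetical sketch with a scipy stand-in for get_kernel, since the real model API isn't reproduced here; widths and tolerances are placeholders):

```python
import numpy as np
import pytest
from scipy.stats import uniform

BIN_WIDTH = 0.1
MESH = np.arange(-5.0, 5.0 + BIN_WIDTH, BIN_WIDTH)

def boxcar_kernel(width: float) -> np.ndarray:
    # Stand-in for a model's get_kernel(); swap in the real models when wiring this up.
    return uniform(loc=-width / 2, scale=width).pdf(MESH)

@pytest.mark.parametrize(
    "width, tolerance",
    [(0.5, 1e-9), (0.53, 0.1)],                   # aligned width vs one that straddles bins
    ids=["aligned", "unaligned"],
)
def test_kernel_area(width, tolerance):
    area = np.sum(boxcar_kernel(width)) * BIN_WIDTH
    assert area == pytest.approx(1.0, abs=tolerance)
```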
