Consider unifying `raw.resample()` and `raw.decimate()` #11797

Closed

#11853

Milestone

1.5

cbrnr

Resampling to a lower frequency requires an anti-aliasing filter, which is what raw.resample() automatically does. However, there's also a raw.decimate() method, which does not include an anti-aliasing filter. This is very dangerous (even though it is of course documented) and very confusing to users. (In addition, scipy.signal.decimate() includes the anti-aliasing filter, so the terminology is also inconsistent.)

I suggest making raw.decimate() private (e.g. renaming it to raw._decimate()), so that users only see raw.resample(), which is the only method they should ever use (and for those rare cases where you absolutely want to skip the anti-aliasing filter, they can still use the private method).
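To make the risk concrete, here is a minimal sketch using only NumPy and SciPy (not MNE itself). The 70 Hz tone, the 200 Hz sampling rate, and the helper `peak_frequency` are illustrative assumptions, not values from this issue. It compares plain subsampling with `scipy.signal.decimate`, which low-pass filters first.

```python
import numpy as np
from scipy import signal

sfreq = 200.0   # original sampling rate in Hz (assumed for illustration)
decim = 2       # keep every 2nd sample -> 100 Hz, new Nyquist is 50 Hz
t = np.arange(0, 10, 1 / sfreq)
x = np.sin(2 * np.pi * 70.0 * t)  # 70 Hz tone, above the new Nyquist


def peak_frequency(data, fs):
    """Return the frequency (Hz) of the largest spectral peak."""
    spectrum = np.abs(np.fft.rfft(data))
    freqs = np.fft.rfftfreq(len(data), d=1 / fs)
    return freqs[np.argmax(spectrum)]


new_sfreq = sfreq / decim

# Plain subsampling without any anti-aliasing filter
naive = x[::decim]
naive_peak = peak_frequency(naive, new_sfreq)  # tone folds to 100 - 70 = 30 Hz
naive_rms = np.sqrt(np.mean(naive ** 2))

# SciPy's decimate applies a low-pass filter before subsampling
filtered = signal.decimate(x, decim, ftype="fir")
# Skip the first and last samples to avoid filter edge transients
filtered_rms = np.sqrt(np.mean(filtered[100:-100] ** 2))

print(f"naive peak: {naive_peak:.1f} Hz, RMS {naive_rms:.3f}")
print(f"filtered RMS: {filtered_rms:.5f}")
```

With plain subsampling the tone survives at full amplitude but at the wrong frequency; with the filtered variant it is strongly attenuated instead.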

larsoner

Member

> (and for those rare cases where you absolutely want to skip the anti-aliasing filter, they can still use the private method).

I don't think we should do this. End users should use methods if and only if they're public. We shouldn't say to people "if you want to accomplish X, you should use private method/attribute Y".

We could add extra layers of protection other ways, though. Not sure if we do this already but we could check info['lowpass'] and if it's not some suitable amount below the Nyquist frequency (we have chosen such a level for epochs.decimate(...) already) we could by default raise an error by adding a new param like on_bad_lowpass='error' -- I don't love the name but hopefully you get the idea. Anyone who has correctly/safely done raw.filter(..., h_freq=...).decimate(...) will have no change in behavior, but users who are doing something dangerous will now get an error by default, which they can turn into a warning or ignore like usual with on_* kwargs.
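A rough sketch of what such a guard could look like, written as a standalone function rather than actual MNE code. The function name, the `margin` safety factor of 3, and the `"warn"`/`"ignore"` option strings are assumptions for illustration; only `on_bad_lowpass='error'` comes from the suggestion above.

```python
import warnings


def check_lowpass_before_decimate(sfreq, lowpass, decim,
                                  on_bad_lowpass="error", margin=3.0):
    """Hypothetical guard run before subsampling by an integer factor.

    `margin` is an assumed safety factor: the recorded low-pass must not
    exceed new_sfreq / margin. It is not a value taken from MNE.
    """
    if on_bad_lowpass not in ("error", "warn", "ignore"):
        raise ValueError(f"Unknown on_bad_lowpass: {on_bad_lowpass!r}")
    new_sfreq = sfreq / decim
    max_lowpass = new_sfreq / margin
    if lowpass > max_lowpass:
        msg = (
            f"Low-pass of {lowpass} Hz is too high for a new sampling rate "
            f"of {new_sfreq} Hz (expected at most {max_lowpass:.1f} Hz); "
            "decimating may cause aliasing."
        )
        if on_bad_lowpass == "error":
            raise RuntimeError(msg)
        if on_bad_lowpass == "warn":
            warnings.warn(msg, RuntimeWarning)
    return new_sfreq


# Data filtered at 30 Hz can safely go from 1000 Hz to 100 Hz
print(check_lowpass_before_decimate(1000.0, 30.0, 10))
```

Users who already low-pass filtered appropriately would pass the check silently, which matches the "no change in behavior" goal described above.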

cbrnr

Contributor, Author

I'd rather remove the public method. What you suggest is definitely an improvement over the current situation, but people should really use .resample(). Even if they applied a suitable lowpass filter, it doesn't hurt to have another filter with resample.

Or is there another reason why you would like to keep decimate around?

larsoner

Member

A number of reasons:

1. Consistency with epochs.decimate
2. It can be much faster than resample if you get unlucky with the rfft radix/next_fast_len (watch out for primes 😱 !)
3. It's better not to use resample if you don't have to, because some of its behaviors are less than ideal (e.g., it assumes periodicity; see also MRG: Add polyphase resampling #5136)
4. Removing it will break some people's currently valid workflows, and we shouldn't do that without very good reasons (I don't think this burden has been met)
5. Whatever other reasons we might have had to add decimate in the first place that I haven't listed here (feel free to look back at the history)
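The speed point about FFT lengths can be illustrated with `scipy.fft.next_fast_len`. This is only a sketch: the helper `fft_padding_needed` and the two example lengths are assumptions, and how MNE pads internally before calling the FFT is not shown here.

```python
from scipy.fft import next_fast_len


def fft_padding_needed(n_samples):
    """Return how many extra samples reach an FFT-friendly length."""
    return next_fast_len(n_samples, real=True) - n_samples


prime_len = 10007   # a prime number of samples, awkward for the FFT
smooth_len = 10000  # 2**4 * 5**4, already a fast length

print(fft_padding_needed(prime_len))   # positive: padding would be needed
print(fft_padding_needed(smooth_len))  # zero: length is already efficient
```

An FFT-based resampler that cannot pad freely may end up transforming an awkward length, whereas subsampling by slicing has no such dependence on the number of samples.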

agramfort

Member

I respectfully disagree too. It’s much harder numerically to resample than low pass and decimate. It’s also faster. We should have a warning but please don’t deprecate decimate. I never use resample.

cbrnr

Contributor, Author

Sure, although I don't really understand the reason. Resampling = low pass filter + decimation.

agramfort

Member

Filter design for resampling is far from trivial. See how many iterations we did to converge on our filter design. We ended up roughly matching EEGLAB behavior, which is quite different from mne-c filters, for example.

drammock

Member

@cbrnr it seems like making decimate private is a non-starter. Do you want to change this issue's title to reflect the smaller change proposed by @larsoner:

> We could add extra layers of protection other ways, though. Not sure if we do this already but we could check info['lowpass'] and if it's not some suitable amount below the Nyquist frequency (we have chosen such a level for epochs.decimate(...) already) we could by default raise an error by adding a new param like on_bad_lowpass='error' -- I don't love the name but hopefully you get the idea. Anyone who has correctly/safely done raw.filter(..., h_freq=...).decimate(...) will have no change in behavior, but users who are doing something dangerous will now get an error by default, which they can turn into a warning or ignore like usual with on_* kwargs.

or should we instead close as not planned?

cbrnr

changed the title from ~~Resampling is confusing, so make `raw.decimate()` private~~ to Consider unifying `raw.resample()` and `raw.decimate()` on Jul 13, 2023

cbrnr

Contributor, Author

@agramfort our filter design as per EEGLAB is super great, but it has little to do with designing a filter for resampling. We're relying on SciPy for resampling, which includes appropriate anti-aliasing filters.

@drammock I've changed the title to reflect what might be a good compromise. We could add a new antialiasing=True parameter to raw.resample(), which if False would be the current raw.decimate() behavior. This would solve the problem of having two resampling-related methods directly in the object's namespace, which is the main problem I wanted to address.
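As a rough illustration of this compromise, the sketch below works on a plain NumPy array rather than a Raw object. The function name `resample_sketch` and its signature are hypothetical; `antialiasing` is the parameter name proposed above, and `scipy.signal.resample` stands in for the FFT-based path.

```python
import numpy as np
from scipy import signal


def resample_sketch(data, sfreq, new_sfreq, antialiasing=True):
    """Hypothetical unified resampler for an array of shape (..., n_times)."""
    if antialiasing:
        # FFT-based resampling to an arbitrary number of samples
        n_new = int(round(data.shape[-1] * new_sfreq / sfreq))
        return signal.resample(data, n_new, axis=-1)
    # Without filtering, only integer subsampling factors are possible
    ratio = sfreq / new_sfreq
    decim = int(round(ratio))
    if decim < 1 or not np.isclose(ratio, decim):
        raise ValueError(
            f"Cannot go from {sfreq} Hz to {new_sfreq} Hz by slicing; "
            "the ratio must be an integer."
        )
    return data[..., ::decim]


rng = np.random.default_rng(0)
data = rng.standard_normal((2, 1000))          # 2 channels at 250 Hz
print(resample_sketch(data, 250.0, 100.0).shape)
print(resample_sketch(data, 250.0, 125.0, antialiasing=False).shape)
```

Note that the unfiltered branch has to reject non-integer ratios, so the two code paths accept different inputs even behind a single method name.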

agramfort

Member

> @drammock I've changed the title to reflect what might be a good compromise. We could add a new antialiasing=True parameter to raw.resample(), which if False would be the current raw.decimate() behavior. This would solve the problem of having two resampling-related methods directly in the object's namespace, which is the main problem I wanted to address.

that would be ok for me.

larsoner

Member

> This would solve the problem of having two resampling-related methods directly in the object's namespace, which is the main problem I wanted to address.

Ahh I misunderstood then -- based on your top post I (mis?)read that the danger and risks of using .decimate was the primary issue at hand. My suggestion of a new on_* kwarg above indeed is meant to deal with this aspect and not the idea of unification/namespace deduplication.

> > We could add a new antialiasing=True parameter to raw.resample(), which if False would be the current raw.decimate() behavior. This would solve the problem of having two resampling-related methods directly in the object's namespace, which is the main problem I wanted to address.
>
> that would be ok for me.

Point (4) I brought up above is still relevant though:

> Removing [the decimate method] will break some people's currently valid workflows, and we shouldn't do that without very good reasons (I don't think this burden has been met)

Our changing the API on people (especially when their code is fine!) has been a valid complaint of people for a while so I think our burden/justification for doing so should be fairly high. I still don't think the burden has been met to break people's code to achieve this naming clarification/unification.

Beyond that, I think the proposal to move only to resample is more drastic a change than it seems at first. We currently have raw.decimate, raw.resample, epochs.decimate, epochs.resample, and Epochs(..., decim=...). Historically, epochs.decimate has been around since 0.10, and raw.resample and Epochs(..., decim=...) since before that I think -- so our naming/use of these terms has been established and used by people for over 7 years now. To truly fully unify these, we'd need to move to, for example, Epochs(..., resample=..., antialias=...) and also remove epochs.decimate. To me, anything short of this and we still have the same redundancy problem somewhere that we have to explain, but changing all of these (including Epochs(..., decim=...)) will break a ton of end-user code.

One final conceptual point potentially in favor of keeping the functions separate -- the code and operations are actually quite different, and don't work in equivalent circumstances. Our resample relies on scipy.signal.resample, which 1) allows resampling to an arbitrary number of samples and 2) always (implicitly or explicitly) applies an antialiasing filter (there is no way to disable it). decimate on the other hand can only subsample by an integer factor (e.g., no 160 Hz to 100 Hz), so in the new API design people might think "oh I've low-passed so I'll set antialias=False and it should work" but it won't because they're trying to go from 250 Hz to 100 Hz for example.
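The integer-factor restriction can be checked with a few lines of standard-library arithmetic. In this sketch the helpers `rate_ratio` and `can_decimate` are hypothetical names, and `scipy.signal.resample_poly` is used only as one example of a resampler that handles rational ratios; it is not the function the comment above says MNE relies on.

```python
from fractions import Fraction

import numpy as np
from scipy import signal


def rate_ratio(sfreq, new_sfreq):
    """Return new_sfreq / sfreq as a reduced fraction (integer rates in Hz)."""
    return Fraction(int(new_sfreq), int(sfreq))


def can_decimate(sfreq, new_sfreq):
    """Plain subsampling works only if the ratio reduces to 1 / decim."""
    return rate_ratio(sfreq, new_sfreq).numerator == 1


print(can_decimate(250, 125))  # ratio 1/2, slicing every 2nd sample works
print(can_decimate(250, 100))  # ratio 2/5, not an integer factor
print(can_decimate(160, 100))  # ratio 5/8, not an integer factor

# A rational ratio needs upsampling by 2 and downsampling by 5
ratio = rate_ratio(250, 100)
x = np.random.default_rng(0).standard_normal(1000)
y = signal.resample_poly(x, ratio.numerator, ratio.denominator)
print(len(y))
```

So a user who has already low-passed their data still cannot simply switch off the filter for 250 Hz to 100 Hz; some form of interpolation is unavoidable there.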

cbrnr

Contributor, Author

This would be a big change indeed, and I'm not sure if it is worth it.

However, one comment re breaking people's code. I never understood why we have to maintain compatibility to ancient versions (here going back to at least version 0.10). Code written with a particular MNE version will continue to work with that particular version. With proper deprecation messages, I feel like users should be able to adapt their code to any new API we're changing. Again, this is for new code only – they don't need to change a single line in their historic code if they continue to use the corresponding historic MNE version.

Re my main point, two methods is the main problem, but people misusing (misunderstanding) the decimate() method is also relevant. In my experience, resample() should be used in 99% of all cases (that's why e.g. EEGLAB has only one resample function). I have yet to see an example where resample does not work (as in produces an error and forces you to apply an anti-aliasing filter yourself followed by decimate). In practice, users see both methods and are confused which one is the correct way to resample their data.

agramfort

Member

> This would be a big change indeed, and I'm not sure if it is worth it. However, one comment re breaking people's code. I never understood why we have to maintain compatibility to ancient versions (here going back to at least version 0.10). Users with analysis code written with a particular MNE version will continue to work with that particular version.

we have a 2 version deprecation policy so we do this. Deprecate with messages and guarantee it works for the 2 next versions if it works now without deprecation warning. it's very standard in open source.

> With proper deprecation messages, I feel like users should be able to adapt their code to any new API we're changing. Again, this is for *new* code only – they don't need to change a single line in their historic code if they continue to use the corresponding historic MNE version.

this assumes people read warnings and that people understand the code they were given. I agree it's annoying but the feedback we got many times suggests it's the reality.

> Re my main point, two methods is the main problem, but people misusing (misunderstanding) the decimate() method is also relevant. In my experience, resample() should be used in 99% of all cases (that's why e.g. EEGLAB has only one resample function). I have yet to see an example where resample does not work (as in produces an error and forces you to apply an anti-aliasing filter yourself followed by decimate). In practice, users see both methods and are confused which one is the correct way to resample their data.

can you see how explicit are our docstrings? Do we have see also sections? do we have the proper warnings? I would start there

cbrnr

Contributor, Author

> we have a 2 version deprecation policy so we do this. Deprecate with messages and guarantee it works for the 2 next versions if it works now without deprecation warning. it's very standard in open source.

Yes, I absolutely agree! We could even make it longer, but we should not be afraid to break people's 5 year old code if we deem the change important enough.

> this assumes people read warnings and that people understand the code they were given. I agree it's annoying but the feedback we got many times suggests it's the reality.

100% with you. But this also means they don't read our docs, where we already highlight the dangers and differences of decimate().

> can you see how explicit are our docstrings? Do we have see also sections? do we have the proper warnings? I would start there

Yes, we already have a lot of explanations, but maybe we can still improve them with what @larsoner has suggested. But I think at the end of the day, we won't be able to fully address the confusion between decimate and resample with only docs and warnings.

cbrnr

Contributor, Author

Another thought, our decimate function is equal to slicing. So a name change to e.g. slice would probably also clear up some confusion.
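The equivalence with slicing can be written down directly. This is a minimal sketch on a NumPy array; the function name `decimate_by_slicing` and the `offset` argument are illustrative assumptions and not the MNE implementation.

```python
import numpy as np


def decimate_by_slicing(data, sfreq, decim, offset=0):
    """Keep every `decim`-th sample along the last axis; no filtering."""
    if int(decim) != decim or decim < 1:
        raise ValueError("decim must be a positive integer")
    if not 0 <= offset < decim:
        raise ValueError("offset must satisfy 0 <= offset < decim")
    return data[..., offset::int(decim)], sfreq / decim


data = np.arange(20).reshape(2, 10)  # 2 channels, 10 samples at 100 Hz
out, new_sfreq = decimate_by_slicing(data, 100.0, decim=5, offset=1)
print(out)
print(new_sfreq)
```

Nothing in this operation looks at the frequency content of the data, which is why any low-pass filtering has to happen beforehand.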

10 remaining items

Milestone

1.5
Closed Aug 15, 2023, 100% complete

Development

BUG: Only add decimate to correct classes (mne-tools/mne-python)

Activity

larsoner commented on Jul 12, 2023

cbrnr commented on Jul 12, 2023

larsoner commented on Jul 12, 2023

agramfort commented on Jul 12, 2023

cbrnr commented on Jul 12, 2023

agramfort commented on Jul 12, 2023

drammock commented on Jul 12, 2023

cbrnr commented on Jul 13, 2023

agramfort commented on Jul 13, 2023

larsoner commented on Jul 13, 2023

cbrnr commented on Jul 13, 2023

agramfort commented on Jul 13, 2023

cbrnr commented on Jul 13, 2023

cbrnr commented on Jul 13, 2023
