Skip to content

Consider unifying raw.resample() and raw.decimate() #11797

Closed
@cbrnr

Description

@cbrnr

Resampling to a lower frequency requires an anti-aliasing filter, which is what raw.resample() automatically does. However, there's also a raw.decimate() method, which does not include an anti-aliasing filter. This is very dangerous (even though it is of course documented) and very confusing to users. (In addition, scipy.signal.decimate() includes the anti-aliasing filter, so the terminology is also inconsistent.)

I suggest to make raw.decimate() private (e.g. renaming it to raw._decimate()), so that users only see raw.resample(), which is the only method they should ever use (and for those rare cases where you absolutely want to skip the anti-aliasing filter, they can still use the private method).

Activity

larsoner

larsoner commented on Jul 12, 2023

@larsoner
Member

(and for those rare cases where you absolutely want to skip the anti-aliasing filter, they can still use the private method).

I don't think we should do this. End users should use methods if and only if they're public. We shouldn't say to people "if you want to accomplish X, you should use private method/attribute Y".

We could add extra layers of protection other ways, though. Not sure if we do this already but we could check info['lowpass'] and if's not some suitable amount below the Nyquist frequency (we have chosen such a level for epochs.decimate(...) already) we could by default raise an error by adding a new param like on_bad_lowpass='error' -- I don't love the name but hopefully you get the idea. Anyone who has correctly/safely done raw.filter(..., h_freq=...).decimate(...) will have no change in behavior, but users who are doing something dangerous will now get an error by default, which they can turn into a warning or ignore like usual with on_* kwargs.

cbrnr

cbrnr commented on Jul 12, 2023

@cbrnr
ContributorAuthor

I'd rather remove the public method. What you suggest is definitely an improvement over the current situation, but people should really use .resample(). Even if they applied a suitable lowpass filter, it doesn't hurt to have another filter with resample.

Or is there another reason why you would like to keep decimate around?

larsoner

larsoner commented on Jul 12, 2023

@larsoner
Member

A number of reasons

  1. Consistency with epochs.decimate
  2. It can be much faster than resample if you get unlucky with the rfft radix/next_fast_len (watch out for primes 😱 !)
  3. It's better not to use resample if you don't have to because some of its behaviors are less than ideal (e.g., assumes periodicity, see also MRG: Add polyphase resampling #5136)
  4. Removing it will break some people's currently valid workflows, and we shouldn't do that without very good reasons (I don't think this burden has been met)
  5. Whatever other reasons we might have had to add decimate in the first place that I haven't listed here (feel free to look back at the history)
agramfort

agramfort commented on Jul 12, 2023

@agramfort
Member
cbrnr

cbrnr commented on Jul 12, 2023

@cbrnr
ContributorAuthor

Sure, although I don't really understand the reason. Resampling = low pass filter + decimation.

agramfort

agramfort commented on Jul 12, 2023

@agramfort
Member
drammock

drammock commented on Jul 12, 2023

@drammock
Member

@cbrnr it seems like making decimate private is a non-starter. Do you want to change this issue's title to reflect the smaller change proposed by @larsoner:

We could add extra layers of protection other ways, though. Not sure if we do this already but we could check info['lowpass'] and if's not some suitable amount below the Nyquist frequency (we have chosen such a level for epochs.decimate(...) already) we could by default raise an error by adding a new param like on_bad_lowpass='error' -- I don't love the name but hopefully you get the idea. Anyone who has correctly/safely done raw.filter(..., h_freq=...).decimate(...) will have no change in behavior, but users who are doing something dangerous will now get an error by default, which they can turn into a warning or ignore like usual with on_* kwargs.

or should we instead close as not planned?

changed the title [-]Resampling is confusing, so make `raw.decimate()` private[/-] [+]Consider unifying `raw.resample()` and `raw.decimate()`[/+] on Jul 13, 2023
cbrnr

cbrnr commented on Jul 13, 2023

@cbrnr
ContributorAuthor

@agramfort our filter design as per EEGLAB is super great, but it has little to do with designing a filter for resampling. We're relying on SciPy for resampling, which includes appropriate anti-aliasing filters.

@drammock I've changed the title to reflect what might be a good compromise. We could add a new antialiasing=True parameter to raw.resample(), which if False would be the current raw.decimate() behavior. This would solve the problem of having two resampling-related methods directly in the object's namespace, which is the main problem I wanted to address.

agramfort

agramfort commented on Jul 13, 2023

@agramfort
Member
larsoner

larsoner commented on Jul 13, 2023

@larsoner
Member

This would solve the problem of having two resampling-related methods directly in the object's namespace, which is the main problem I wanted to address.

Ahh I misunderstood then -- based on your top post I (mis?)read that the danger and risks of using .decimate was the primary issue at hand. My suggestion of a new on_* kwarg above indeed is meant to deal with this aspect and not the idea of unification/namespace deduplication.

We could add a new antialiasing=True parameter to raw.resample(), which if False would be the current raw.decimate() behavior. This would solve the problem of having two resampling-related methods directly in the object's namespace, which is the main problem I wanted to address.

that would be ok for me.

Point (4) I brought up above is still relevant though:

  1. Removing [the decimate method] will break some people's currently valid workflows, and we shouldn't do that without very good reasons (I don't think this burden has been met)

Our changing the API on people (especially when their code is fine!) has been a valid complaint of people for a while so I think our burden/justification for doing so should be fairly high. I still don't think the burden has been met to break people's code to achieve this naming clarification/unification.

Beyond that, I think the proposal to move only to resample is more drastic a change than it seems at first. We currently have raw.decimate, raw.resample, and epochs.decimate, epochs.resample, and Epochs(..., decim=). Historically, epochs.decimate has been around since 0.10 and raw.resample and Epochs(..., decim=...) since before that I think -- so our naming/use of these terms has been established and used by people for over 7 years now. To truly fully unify these, we'd need to move for example Epochs(..., resample=..., antialias=...) and also remove epochs.decimate. To me anything short of this and we still have the same redundancy problem somewhere that we have to explain, but changing all of these (including Epochs(..., decim=...) will break a ton of end-user code.

One final conceptual point potentially in favor of keeping the functions separate -- the code and operations are actually quite different, and don't work in equivalent circumstances. Our resample relies on scipy.signal.resample, which 1) allows resampling to an arbitrary number of samples and 2) always (implicitly or explicitly) applies an antialiasing filter (there is no way to disable it). decimate on the other hand can only subsample by an integer factor (e.g., no 160 Hz to 100 Hz), so in the new API design people might think "oh I've low-passed so I'll set antialias=False and it should work" but it won't because they're trying to go from 250 Hz to 100 Hz for example.

cbrnr

cbrnr commented on Jul 13, 2023

@cbrnr
ContributorAuthor

This would be a big change indeed, and I'm not sure if it is worth it.

However, one comment re breaking people's code. I never understood why we have to maintain compatibility to ancient versions (here going back to at least version 0.10). Code written with a particular MNE version will continue to work with that particular version. With proper deprecation messages, I feel like users should be able to adapt their code to any new API we're changing. Again, this is for new code only – they don't need to change a single line in their historic code if they continue to use the corresponding historic MNE version.

Re my main point, two methods is the main problem, but people misusing (misunderstanding) the decimate() method is also relevant. In my experience, resample() should be used in 99% of all cases (that's why e.g. EEGLAB has only one resample function). I have yet to see an example where resample does not work (as in produces an error and forces you to apply an anti-aliasing filter yourself followed by decimate). In practice, users see both methods and are confused which one is the correct way to resample their data.

agramfort

agramfort commented on Jul 13, 2023

@agramfort
Member
cbrnr

cbrnr commented on Jul 13, 2023

@cbrnr
ContributorAuthor

we have a 2 version deprecation policy so we do this. Deprecate with messages and guarantee it works for the 2 next versions it it works now without deprecation warning. it's very standard in open source.

Yes, I absolutely agree! We could even make it longer, but we should not be afraid to break people's 5 year old code if we deem the change important enough.

this assumes people read warnings and that people understand the code they were given. I agree it's annoying but the feedback we got many times suggests it's the reality.

100% with you. But this also means they don't read our docs, where we already highlight the dangers and differences of decimate().

can you see how explicit are our docstrings? Do we have see also sections? do we have the proper warnings? I would start there

Yes, we already have a lot of explanations, but maybe we can still improve them with what @larsoner has suggested. But I think at the end of the day, we won't be able to fully address the confusion between decimate and resample with only docs and warnings.

cbrnr

cbrnr commented on Jul 13, 2023

@cbrnr
ContributorAuthor

Another thought, our decimate function is equal to slicing. So a name change to e.g. slice would probably also clear up some confusion.

10 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      Participants

      @agramfort@drammock@larsoner@cbrnr@jasmainak

      Issue actions

        Consider unifying `raw.resample()` and `raw.decimate()` · Issue #11797 · mne-tools/mne-python