Closed
Description
Resampling to a lower frequency requires an anti-aliasing filter, which is what raw.resample()
automatically does. However, there's also a raw.decimate()
method, which does not include an anti-aliasing filter. This is very dangerous (even though it is of course documented) and very confusing to users. (In addition, scipy.signal.decimate()
includes the anti-aliasing filter, so the terminology is also inconsistent.)
I suggest to make raw.decimate()
private (e.g. renaming it to raw._decimate()
), so that users only see raw.resample()
, which is the only method they should ever use (and for those rare cases where you absolutely want to skip the anti-aliasing filter, they can still use the private method).
Metadata
Metadata
Assignees
Labels
No labels
Activity
larsoner commentedon Jul 12, 2023
I don't think we should do this. End users should use methods if and only if they're public. We shouldn't say to people "if you want to accomplish X, you should use private method/attribute Y".
We could add extra layers of protection other ways, though. Not sure if we do this already but we could check
info['lowpass']
and if's not some suitable amount below the Nyquist frequency (we have chosen such a level forepochs.decimate(...)
already) we could by default raise an error by adding a new param likeon_bad_lowpass='error'
-- I don't love the name but hopefully you get the idea. Anyone who has correctly/safely doneraw.filter(..., h_freq=...).decimate(...)
will have no change in behavior, but users who are doing something dangerous will now get an error by default, which they can turn into a warning or ignore like usual withon_*
kwargs.cbrnr commentedon Jul 12, 2023
I'd rather remove the public method. What you suggest is definitely an improvement over the current situation, but people should really use
.resample()
. Even if they applied a suitable lowpass filter, it doesn't hurt to have another filter withresample
.Or is there another reason why you would like to keep
decimate
around?larsoner commentedon Jul 12, 2023
A number of reasons
epochs.decimate
resample
if you get unlucky with therfft
radix/next_fast_len
(watch out for primes 😱 !)resample
if you don't have to because some of its behaviors are less than ideal (e.g., assumes periodicity, see also MRG: Add polyphase resampling #5136)decimate
in the first place that I haven't listed here (feel free to look back at the history)agramfort commentedon Jul 12, 2023
cbrnr commentedon Jul 12, 2023
Sure, although I don't really understand the reason. Resampling = low pass filter + decimation.
agramfort commentedon Jul 12, 2023
drammock commentedon Jul 12, 2023
@cbrnr it seems like making
decimate
private is a non-starter. Do you want to change this issue's title to reflect the smaller change proposed by @larsoner:or should we instead close as
not planned
?[-]Resampling is confusing, so make `raw.decimate()` private[/-][+]Consider unifying `raw.resample()` and `raw.decimate()`[/+]cbrnr commentedon Jul 13, 2023
@agramfort our filter design as per EEGLAB is super great, but it has little to do with designing a filter for resampling. We're relying on SciPy for resampling, which includes appropriate anti-aliasing filters.
@drammock I've changed the title to reflect what might be a good compromise. We could add a new
antialiasing=True
parameter toraw.resample()
, which ifFalse
would be the currentraw.decimate()
behavior. This would solve the problem of having two resampling-related methods directly in the object's namespace, which is the main problem I wanted to address.agramfort commentedon Jul 13, 2023
larsoner commentedon Jul 13, 2023
Ahh I misunderstood then -- based on your top post I (mis?)read that the danger and risks of using
.decimate
was the primary issue at hand. My suggestion of a newon_*
kwarg above indeed is meant to deal with this aspect and not the idea of unification/namespace deduplication.Point (4) I brought up above is still relevant though:
Our changing the API on people (especially when their code is fine!) has been a valid complaint of people for a while so I think our burden/justification for doing so should be fairly high. I still don't think the burden has been met to break people's code to achieve this naming clarification/unification.
Beyond that, I think the proposal to move only to
resample
is more drastic a change than it seems at first. We currently haveraw.decimate
,raw.resample
, andepochs.decimate
,epochs.resample
, andEpochs(..., decim=)
. Historically,epochs.decimate
has been around since 0.10 andraw.resample
andEpochs(..., decim=...)
since before that I think -- so our naming/use of these terms has been established and used by people for over 7 years now. To truly fully unify these, we'd need to move for exampleEpochs(..., resample=..., antialias=...)
and also removeepochs.decimate
. To me anything short of this and we still have the same redundancy problem somewhere that we have to explain, but changing all of these (includingEpochs(..., decim=...)
will break a ton of end-user code.One final conceptual point potentially in favor of keeping the functions separate -- the code and operations are actually quite different, and don't work in equivalent circumstances. Our
resample
relies onscipy.signal.resample
, which 1) allows resampling to an arbitrary number of samples and 2) always (implicitly or explicitly) applies an antialiasing filter (there is no way to disable it).decimate
on the other hand can only subsample by an integer factor (e.g., no 160 Hz to 100 Hz), so in the new API design people might think "oh I've low-passed so I'll setantialias=False
and it should work" but it won't because they're trying to go from 250 Hz to 100 Hz for example.cbrnr commentedon Jul 13, 2023
This would be a big change indeed, and I'm not sure if it is worth it.
However, one comment re breaking people's code. I never understood why we have to maintain compatibility to ancient versions (here going back to at least version 0.10). Code written with a particular MNE version will continue to work with that particular version. With proper deprecation messages, I feel like users should be able to adapt their code to any new API we're changing. Again, this is for new code only – they don't need to change a single line in their historic code if they continue to use the corresponding historic MNE version.
Re my main point, two methods is the main problem, but people misusing (misunderstanding) the
decimate()
method is also relevant. In my experience,resample()
should be used in 99% of all cases (that's why e.g. EEGLAB has only oneresample
function). I have yet to see an example whereresample
does not work (as in produces an error and forces you to apply an anti-aliasing filter yourself followed bydecimate
). In practice, users see both methods and are confused which one is the correct way to resample their data.agramfort commentedon Jul 13, 2023
cbrnr commentedon Jul 13, 2023
Yes, I absolutely agree! We could even make it longer, but we should not be afraid to break people's 5 year old code if we deem the change important enough.
100% with you. But this also means they don't read our docs, where we already highlight the dangers and differences of
decimate()
.Yes, we already have a lot of explanations, but maybe we can still improve them with what @larsoner has suggested. But I think at the end of the day, we won't be able to fully address the confusion between
decimate
andresample
with only docs and warnings.cbrnr commentedon Jul 13, 2023
Another thought, our
decimate
function is equal to slicing. So a name change to e.g.slice
would probably also clear up some confusion.10 remaining items