Skip to content

RANSAC correlations (and logic) differ between PyPREP and MATLAB PREP #65

Closed
@a-hurst

Description

@a-hurst

This bit has been harder to compare than earlier parts, but I think I can say with a high degree of confidence now that RANSAC correlations differ between PyPREP and MATLAB PREP.

Before I get into the details, some other relevant stuff I've learned:

  • MatPREP's internal spherical_interpolation function and MNE's internal _make_interpolation_matrix functions produce identical outputs, given the same inputs! This is a huge relief, and one less thing we have to worry about or figure out.
  • Conversely, there's something weird about how MNE's mne.channels.read_custom_montage imports .elp files, like the one bundled with EEGLAB we've been using for the MatPREP data. Specifically, if I just load my pre-cleanlined and montage'd .set/.fdt files into MNE, the Cartesian X/Y/Z coordinates for the electrodes more-or-less line up with MATLAB (the X and Y axes are swapped) and the interpolation matrices match exactly. However, if I re-read the standard-10-5-cap385.elp montage into MNE and apply it to the file, the coordinates and interpolation matrices both differ quite a bit. Not sure what this is all about, but regardless I don't think it's a PyPREP bug.

Anyway, the gist of the difference (and why it's been so hard to dig into): MatPREP and PyPREP have two very different approaches of getting the same RANSAC correlation results. Here's a quick overview:

MATLAB PREP's RANSAC:

  1. Choose random channels for each RANSAC sample.
  2. Generate interpolation matrix for all channels for each RANSAC sample.
  3. For each 5 second RANSAC window in the data, calculate the predicted signal for each channel across each of the 50 RANSAC samples, and then correlate the predicted values with the actual values to get a correlation for each channel in that window.
  4. After calculating a matrix of correlations of size [number_of_ransac_windows, number_of_channels], apply thresholding and identify channels that fall below target.

PyPREP's RANSAC:

  1. Choose random channels for each RANSAC sample.
  2. Determine the amount of available RAM and split the channels-to-predict into chunks accordingly.
  3. For each chunk of channels to predict:
    • For each of the 50 RANSAC samples, calculate the interpolation matrix and predict the full signal for those channels, saving the output into one large eeg_predictions matrix.
    • After interpolating the signal for all channels in the chunk for all 50 samples, calculate the median EEG signal for each channel from those 50 individual predictions.
    • For each 5 second RANSAC window in the data, calculate the correlation between the median predicted signal and the actual signal for the channels in that chunk.
  4. After calculating a matrix of correlations of size [number_of_ransac_windows, number_of_channels], apply thresholding and identify channels that fall below target.

Anyway, the output correlations of each implementation differ, even with the same input. I know there's a deliberate difference somewhere regarding a median calculation, but I couldn't remember what it was and given how different the code layout is it hasn't popped out to me. @sappelhoff @yjmantilla @jasmainak do any of you remember what this difference was, so I can try reversing it and see if it makes things match up?

Also regardless of other factors, the MatPREP approach seems to have much lower RAM demands since at no point are they estimating signals across the whole duration of the file. Should we consider changing PyPREP to work in a similar manner?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions