Loss of precision when converting ephys data from rec to nwb #54

@rly

Currently, rec_to_nwb does the following:

  1. calls rec_to_binaries which converts the raw ephys voltage data from .rec to .mda format (dtype = int16; ADC units)
  2. parses the "raw_data_to_volts" key from the metadata YAML. According to a Jan 2021 Slack message from Loren, this value should always be set to 0.000000195 (or 1.95e-7)
  3. multiplies the above value by 1e6 to get the conversion factor from raw to uV (0.195). this matches the value stored in the .rec xml file headers (rawScalingToUv="0.19500000000000001")
  4. multiplies the raw int16 data (in ADC units) from the .mda file by the above value (0.195) and then casts the result back to int16, which truncates everything after the decimal point (0.99 -> 0)
  5. writes this transformed raw data (now in uV) to an NWB ElectricalSeries object named "e-series" with a 1e-6 conversion factor, used to convert the data to volts

Because of the data transformation in Step 4 above, there is a loss of precision. Let's say the original .rec file data has values:

>>> np.arange(10, dtype="int16")
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int16)

then after multiplying by 0.195 to get the data in uV, the values are:

>>> np.arange(10, dtype="int16") * 0.195
array([0.   , 0.195, 0.39 , 0.585, 0.78 , 0.975, 1.17 , 1.365, 1.56 ,
       1.755])

then after setting the dtype to int16, the values are:

>>> (np.arange(10, dtype="int16") * 0.195).astype("int16")
array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int16)

Note the loss of precision in the resulting output. If the original data has unique values -100 to 100 (201 possible), then the converted NWB file will have unique values -19 to 19 (39 possible). This could affect spike sorting and LFP filtering. The impact is probably small, but I think it is not zero.
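To check the 201 -> 39 claim directly, here is a minimal NumPy sketch. It assumes the 0.195 uV-per-ADC-unit factor quoted above. The variable names are illustrative and are not taken from rec_to_nwb.

```python
import numpy as np

RAW_TO_UV = 0.195  # uV per ADC unit, as quoted in this issue

# Every integer ADC value from -100 to 100 inclusive (201 unique values).
raw_adc = np.arange(-100, 101, dtype="int16")

# Mimic step 4: scale to uV, then cast back to int16 (truncates toward zero).
truncated_uv = (raw_adc * RAW_TO_UV).astype("int16")

n_raw = np.unique(raw_adc).size
n_truncated = np.unique(truncated_uv).size
print(n_raw, n_truncated, truncated_uv.min(), truncated_uv.max())
# prints: 201 39 -19 19
```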

For this reason, it is more common to store the raw, untransformed int16 ephys data (ADC units) from an acquisition system as the ElectricalSeries data, and to store the conversion factor alongside it (here: 0.000000195). The downside is that NWB users (such as Spyglass) have to remember to multiply the data by the conversion factor to get the data in volts. (The NWB team is working on improving this messaging.) Note that this makes using the data just a little slower and converting the data just a little faster.

I suggest that the ephys data be stored in the original ADC units, because currently some precision is lost, and the cost of multiplying during use is small.
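As a rough illustration of the trade-off, the sketch below compares the volts a reader would get from the two storage schemes. It uses plain NumPy arrays as stand-ins for the ElectricalSeries data and its conversion factor. It does not call pynwb or rec_to_nwb, and all names are hypothetical.

```python
import numpy as np

RAW_DATA_TO_VOLTS = 1.95e-7  # volts per ADC unit ("raw_data_to_volts" in the YAML)

raw_adc = np.arange(-100, 101, dtype="int16")  # stand-in for the .mda data

# Proposed scheme: store raw_adc unchanged with conversion = 1.95e-7.
# A reader multiplies on access to get volts.
volts_from_raw = raw_adc * RAW_DATA_TO_VOLTS

# Current scheme: store truncated uV as int16 with conversion = 1e-6.
stored_uv = (raw_adc * (RAW_DATA_TO_VOLTS * 1e6)).astype("int16")
volts_from_truncated = stored_uv * 1e-6

# Worst-case disagreement between the two schemes, reported in uV.
max_error_uv = np.abs(volts_from_truncated - volts_from_raw).max() * 1e6
print(f"max error from truncation: {max_error_uv:.3f} uV")
```

For this input range the worst-case discrepancy is just under 1 uV, i.e. up to one whole stored unit is lost to truncation, while the proposed scheme only adds a single multiply at read time.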
