Description
Currently, rec_to_nwb does the following:
1. Calls rec_to_binaries, which converts the raw ephys voltage data from .rec to .mda format (dtype = int16; ADC units).
2. Parses the "raw_data_to_volts" key from the metadata YAML. According to a Jan 2021 Slack message from Loren, this value should always be set to 0.000000195 (or 1.95e-7).
3. Multiplies the above value by 1e6 to get the conversion factor from raw to uV (0.195). This matches the value stored in the .rec XML file headers (rawScalingToUv="0.19500000000000001").
4. Multiplies the raw int16 data (in ADC units) from the .mda file by the above value (0.195) and then sets the dtype to int16, which truncates any values after the decimal point (0.99 -> 0).
5. Writes this transformed raw data (now in uV) to an NWB ElectricalSeries object named "e-series" with a 1e-6 conversion factor, used to convert the data to volts.
Because of the data transformation in Step 4 above, there is a loss of precision. Let's say the original .rec file data has values:
>>> np.arange(10, dtype="int16")
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int16)
then after multiplying by 0.195 to get the data in uV, the values are:
>>> np.arange(10, dtype="int16") * 0.195
array([0.   , 0.195, 0.39 , 0.585, 0.78 , 0.975, 1.17 , 1.365, 1.56 ,
       1.755])
then after setting the dtype to int16, the values are:
>>> (np.arange(10, dtype="int16") * 0.195).astype("int16")
array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int16)
Note the loss of precision in the resulting output. If the original data has unique values -100 to 100 (201 possible), then the converted NWB file will have unique values -19 to 19 (39 possible). This could affect spike sorting and LFP filtering. The effect is probably small, but I don't think it is zero.
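To make the -100 to 100 example concrete, here is a minimal sketch that counts the unique values before and after the scale-and-cast in Step 4. The array `raw` is made-up illustrative data, not output from rec_to_nwb, and the names `RAW_TO_UV`, `stored_uv`, and `max_error_uv` are only for this example.

```python
import numpy as np

RAW_TO_UV = 0.195  # uV per ADC count, the factor described in the steps above

# Hypothetical raw samples: every integer ADC value from -100 to 100.
raw = np.arange(-100, 101, dtype="int16")

# Mimic Step 4: scale to uV, then cast back to int16.
# The cast truncates toward zero (e.g. 0.975 -> 0, -19.5 -> -19).
stored_uv = (raw * RAW_TO_UV).astype("int16")

n_raw = np.unique(raw).size           # distinct values before conversion
n_stored = np.unique(stored_uv).size  # distinct values after conversion

# Worst-case gap between the exact uV value and the stored integer.
max_error_uv = np.max(np.abs(raw * RAW_TO_UV - stored_uv))

print(f"unique raw values:    {n_raw}")
print(f"unique stored values: {n_stored}")
print(f"stored range:         {stored_uv.min()} to {stored_uv.max()}")
print(f"max truncation error: {max_error_uv:.3f} uV")
```

Because the cast truncates rather than rounds, the error is one-sided in magnitude: every stored value is pulled toward zero by up to almost 1 uV.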
For the above reason, it is more common to store the raw, untransformed int16 ephys data (ADC units) from an acquisition system as the ElectricalSeries data, and to store the raw-to-volts factor (here: 0.000000195) as the conversion factor. However, NWB users (such as Spyglass) then have to remember to multiply the data by the conversion factor to get the data in volts. (The NWB team is working on improving this messaging...). Note that this makes using the data just a little slower and converting the data just a little faster.
I suggest that the ephys data be stored in the original ADC units, because currently some precision is lost, and the cost of multiplying during use is small.
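As a rough illustration of the trade-off, the sketch below compares the two storage schemes on the same toy data used earlier. It uses plain numpy only and does not call rec_to_nwb or pynwb; the variable names are hypothetical, and "conversion" in the comments stands for the ElectricalSeries conversion factor discussed above.

```python
import numpy as np

RAW_TO_VOLTS = 1.95e-7          # "raw_data_to_volts" value quoted above
RAW_TO_UV = RAW_TO_VOLTS * 1e6  # about 0.195 uV per ADC count

raw = np.arange(10, dtype="int16")  # toy ADC samples
exact_volts = raw * RAW_TO_VOLTS    # reference: full-precision voltage

# Current approach: store truncated uV as int16, conversion = 1e-6.
current_data = (raw * RAW_TO_UV).astype("int16")
current_volts = current_data * 1e-6

# Proposed approach: store raw ADC counts unchanged, conversion = 1.95e-7.
proposed_data = raw
proposed_volts = proposed_data * RAW_TO_VOLTS

# Largest deviation from the exact voltage, reported in uV.
current_error_uv = np.abs(current_volts - exact_volts).max() * 1e6
proposed_error_uv = np.abs(proposed_volts - exact_volts).max() * 1e6

print("current  data:", current_data, "max error (uV):", current_error_uv)
print("proposed data:", proposed_data, "max error (uV):", proposed_error_uv)
```

Both schemes store int16, so the file size is the same. The only difference for a reader is one multiplication by the conversion factor, which is needed to get volts in either scheme.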