Python 3.x support + PEP8 Cleanup#10
Conversation
EDF.py
Outdated
| # meas_info should have ['record_length', 'magic', 'hour', 'subject_id', 'recording_id', 'n_records', 'month', 'subtype', 'second', 'nchan', 'data_size', 'data_offset', 'lowpass', 'year', 'highpass', 'day', 'minute'] | ||
| # chan_info should have ['physical_min', 'transducers', 'physical_max', 'digital_max', 'ch_names', 'n_samps', 'units', 'digital_min'] | ||
| #################################################################################################### | ||
| return (str(buf) + ' ' * num) if (num >= 0) else buf[0:num] |
There was a problem hiding this comment.
We don't have tests to show this, but this looks exactly like the tye of thing that will break in Python 3. In the padding case, a string is returned. In the trim case, it's unclear whether a string or bytes are returned.
There was a problem hiding this comment.
This syntax is globally accepted and in my system, it has not break in both py2.7 and py3.x yet. You may test it individually and let me know if it does.
There was a problem hiding this comment.
This is one of the downsides to runtime typing. The syntax is valid, but you might still get an error depending on the types passed.
There was a problem hiding this comment.
I just checked the code and errors are unlikely. See,
padtrim() is a helper function used to standardize header content to pass into writebyte(). In Python 2.x, strings and bytes are synonymous unlike in python 3.x. If you check the logic of writebyte(), it has error-handling support which will deal with both bytes and string formats i.e. output will be a bytes value. If some other format is returned (such as float or boolean), which is very unlikely logically, then we have to worry. I am not sure how I should write a unit test for least likely scenario
There was a problem hiding this comment.
There is an advantageous to having consistent types internally. For example, this could be fixed by having the trim case also convert to str. Alternatively, if doing the preferred way of dealing with str in Python 3, we assume that everything internally is str and only the actual I/O operations are dealing with bytes and those are doing the conversion appropriately (then we don't need the conversion to str in the pad case). writebyte handles the output bit of that, and I think (but haven't checked) that all of the file.read are explicitly converted to str (or int).
EDF.py
Outdated
| self.init_properties() | ||
| if fname: | ||
| self.open(fname) | ||
|
|
||
| def init_properties(self): | ||
| self.fname = None | ||
| self.meas_info = None | ||
| self.chan_info = None | ||
| self.calibrate = None | ||
| self.offset = None | ||
| self.n_records = 0 | ||
| if fname: | ||
| self.open(fname) | ||
|
|
||
|
|
There was a problem hiding this comment.
I don't understand why this change was necessary. If it were possible to initialize the properties elsewhere, this would make sense, but that isn't the case. Even then, init_properties should be private: _init_properties.
There was a problem hiding this comment.
The function init_properties() contains a few value reset lines. These lines are actually called again in present line 118 when we are closing the EDF file. This function wrap only reduces the redundant lines and now the original call is covered in init()
I am explaining this after 10 months of original edit. If it sounds mistaken, let me know.
| #################################################################################################### | ||
| # the following are a number of helper functions to make the behaviour of this EDFReader | ||
| # class more similar to https://bitbucket.org/cleemesser/python-edf/ | ||
| #################################################################################################### |
There was a problem hiding this comment.
This section is covered in details in the added documentation -
Helper Methods: The following are a number of helper functions to make
the behaviour of this EDFReader class more similar to
https://bitbucket.org/cleemesser/python-edf/
get_signal_text_labels: Convert Signal Text Labels from unicode
to strings.
get_n_signals: Get Number of Channels.
get_signal_freqs: Get Signal Frequencies.
get_n_samples: Get Total Number of Samples.
read_signal: Reads Entire Signal Record and returns the
values as an array.
There was a problem hiding this comment.
There is value to having the documentation close to the relevant code.
EDF.py
Outdated
| implementation details, check out https://www.edfplus.info/index.html | ||
| """ | ||
| __version__ = "0.4.0" | ||
| __author__ = "https://github.com/robertoostenveld | https://github.com/bsandeepan95>" |
There was a problem hiding this comment.
I think traditionally this is name + email address. Note also the distinction between author, credits/contributora, and maintainera.
There was a problem hiding this comment.
Yes. I don't know robert's mail id so I put it like this. Even in original code, he has not added his email id or anything as far as I have checked. I wished to do it properly using his name as author and ours as contributors but I don't have the necessary resources. Hence I had to wait till this code review.
@robertoostenveld please send me your email id so I can add it properly.
@palday you too.
There was a problem hiding this comment.
If you look at the output of git log locally, you'll see committer email addresses. 😄
|
One thing I forgot: |
|
@palday I am new to this pull requests so I there are some changes that I wish to do right away and commit but it seems there are many such ones and github is providing me a "Commit Change" button. If I use that button, will it make numerous new comments for each change I save, or will it stage them together later when the pull is successful? Seeking your guidance. |
|
I think "Commit change" creates a new commit for every single change, but lots of small commits are fine -- especially because they'll all be squashed during the merge. Thanks for doing this: I know it's a lot of effort, and we do appreciate this. My review is critical, but it's meant to be constructive. |
EDF.py
Outdated
| return (str(buf) + ' ' * num) if (num >= 0) else buf[0:num] | ||
|
|
||
|
|
||
| def writebyte(file, content, encoding='utf8'): |
There was a problem hiding this comment.
| def writebyte(file, content, encoding='utf8'): | |
| def writebyte(file, content, encoding='ascii'): |
writebyte should default to using ASCII encoding because that's what the EDF spec actually requires.
There was a problem hiding this comment.
I remember you asking me to do this a year ago. I forgot. Apologies. 😅
| try: | ||
| # Py3 onwards bytes and strings are separate data format | ||
| content = bytes(content, encoding) | ||
| except TypeError: | ||
| # Py2.7 Support | ||
| pass | ||
| except Exception as e: | ||
| print(type(e)) | ||
| print(str(e)) | ||
| print( | ||
| "If you see this message, please go to " + | ||
| "https://github.com/bids-standard/pyedf/issues" + | ||
| " and open an issue there regarding this. Thank you!") | ||
| finally: | ||
| file.write(content) |
There was a problem hiding this comment.
| try: | |
| # Py3 onwards bytes and strings are separate data format | |
| content = bytes(content, encoding) | |
| except TypeError: | |
| # Py2.7 Support | |
| pass | |
| except Exception as e: | |
| print(type(e)) | |
| print(str(e)) | |
| print( | |
| "If you see this message, please go to " + | |
| "https://github.com/bids-standard/pyedf/issues" + | |
| " and open an issue there regarding this. Thank you!") | |
| finally: | |
| file.write(content) | |
| if type(content) != bytes: | |
| content = content.encode(encoding) | |
| file.write(content) |
(This works in Python 2.7 and Python 3 -- I haven't tested it in older Python 2, but am not interesting in supporting that!)
Also, note that the finally clause here doesn't make much sense: if there has been a conversion error, then you don't want to try to write the bytes out to file. It would be better to simply leave out the extra except Exception as e and finally clauses and have file.write be after the try block. But in this case, there's an even easier solution (see suggestion).
There was a problem hiding this comment.
Since I implemented Error Handling block here, I used finally clause to help people understand the code flow. Since there is only one line left to do after the block, and I know developers who can indent such lines one level inside the except clause thinking it as a mistake on our end. I wish to avoid clearing up such confusion in the future if anyone faces this.
I added the except Exception as e clause to ensure people know what to do if they face such error. I can always do away with this. This is only there for least expected scenarios.
bytes is not a supported type in Python 2.7 so the if clause will not work there. However, as we have already added the support in except TypeError clause, how about we put the content = content.encode(encoding) line in there? This will ensure that encoding is done correctly for Python 2.7.
There was a problem hiding this comment.
Python 2.7.17 (default, Jul 20 2020, 15:37:01)
[GCC 7.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "a"
>>> type(x)
<type 'str'>
>>> type(x) == bytes
True
>>> type(x) == str
TrueThere was a problem hiding this comment.
Right. bytes is a synonym for str in Python 2.7.
I mean, if we are to do ascii encoding either way, why not enforce the line in the TypeError block, which will be triggered in Python 2.7 since default str encoding scheme is utf-8.
There was a problem hiding this comment.
Okay, I ran a bit of code for testing the error handling block in Python 2.7 and I feel the block is necessary. Run the following example -
# -*- coding: utf-8 -*-
enc = 'ascii'
x = 'µ Õ ß'
print x
try:
y = x.encode(enc)
except Exception as e:
print("Error Message: " + str(e))
print("Error Type: " + str(type(e)))
y = 'fail'
print y
print (x == y)The output will be like the following -
Python 2.7.16 (default, Jul 13 2019, 16:01:51)
[GCC 8.3.0] on linux2
µ Õ ß
Error Message: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
Error Type: <type 'exceptions.UnicodeDecodeError'>
fail
FalseSo errors will occur if there are non-ascii characters, and the code will fail. Now the question is if it is better to let the code fail, or if we should handle the error gracefully and display it to the console for the developer. Let me know your opinion about this @palday .
Please do. I am grateful for your review. It is making me rethink some of my past coding choices and shaping my perspective for future work. I may sometimes come out difficult to communicate, but I am trying to improve my own bugs too here. 😄 Strict review is necessary. Else we get unexpected drivers issues that discourage me from using Ubuntu Linux in my AMD Vega Graphics Workstation. >__> |
|
@palday, check the new commits |
|
I need to write some tests to check this, but it's on my radar. |
|
@palday I know we agreed to add the latest documentations in a later PR, but will it help the tests if I add them in this PR? |
|
Having gone back through the history of this PR .... I'm somewhat surprised that it didn't take advantage of the work already done in #4. But we can't change the past. I think the best solution for now is to merge this into a There seem to be some merge conflicts (which I don't quite understand since |
Kindly review the code and let me know if the commits are okay. I will add the docstrings for functions and methods in a separate pull request as @palday asked me to do earlier.