feat: Add check_time_intervals_duration to verify TimeIntervals table duration and include unit tests.
#635
base: dev
Conversation
…le duration and include unit tests.
for more information, see https://pre-commit.ci
…st practices for tables
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff            @@
##              dev     #635     +/-  ##
=========================================
+ Coverage   73.03%   76.95%   +3.92%
=========================================
  Files          47       47
  Lines        1587     1610     +23
=========================================
+ Hits         1159     1239     +80
+ Misses        428      371     -57
```
h-mayorquin left a comment:
Some questions.
```python
    return None


@register_check(importance=Importance.CRITICAL, neurodata_type=TimeIntervals)
```
Should this be a critical error? We are basically saying that they won't ever be such cases. Seems too strong to me.
We should have a way of handling "things that are usually errors but there is a small probability that the user knows what they are doing and they can move forward"
The solution could be a non-trivial barrier to enabling this, such as an environment variable that skips this error (a CLI argument would be hard to propagate to DANDI), or something along those lines.
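One possible shape for such an escape hatch is a comma-separated opt-out list in an environment variable. This is only a sketch of the idea; the variable name `NWBINSPECTOR_SKIP_CRITICAL_CHECKS` and the helper below are hypothetical, not part of the inspector's API:

```python
import os


def critical_check_is_skipped(check_name: str) -> bool:
    """Return True if the user has explicitly opted out of a critical check.

    Hypothetical convention: a comma-separated list of check names, e.g.
    NWBINSPECTOR_SKIP_CRITICAL_CHECKS="check_time_intervals_duration".
    """
    skipped = os.environ.get("NWBINSPECTOR_SKIP_CRITICAL_CHECKS", "")
    return check_name in {name.strip() for name in skipped.split(",") if name.strip()}
```

Because reading an environment variable requires no CLI plumbing, the same opt-out would work unchanged when the inspector runs inside DANDI.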
```python
        end_times.append(float(time_intervals["stop_time"][-1]))

    # Check for other time columns
    for column_name in time_intervals.colnames:
```
I think that all the values in the time columns should be smaller than `max(time_intervals["stop_time"].data)`, right? Maybe that could be a check on its own.
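A standalone version of that idea might look like the sketch below. To keep it self-contained, a plain dict of lists stands in for a `TimeIntervals` table, and the `_time` suffix convention for identifying time columns is an assumption of this sketch:

```python
def time_columns_exceeding_max_stop_time(columns: dict[str, list[float]]) -> list[str]:
    """Return names of time columns containing values larger than max(stop_time).

    Sketch only: `columns` is a plain dict standing in for a TimeIntervals
    table, and columns are treated as time columns if their name ends in
    "_time" (an assumed convention).
    """
    stop_times = columns.get("stop_time", [])
    if not stop_times:
        return []
    max_stop = max(stop_times)
    offending = []
    for name, values in columns.items():
        if name.endswith("_time") and name != "stop_time":
            if any(value > max_stop for value in values):
                offending.append(name)
    return offending
```

A real check would wrap this in `@register_check` and return an `InspectorMessage` naming the offending columns.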
```python
    end_times = []

    # Check for start_time and stop_time columns
    if "start_time" in time_intervals.colnames and len(time_intervals["start_time"]) > 0:
```
Here and with the other time columns we are assuming that the time columns are already well-ordered. Is there a way of running checks in order? Maybe this check does not make sense if the other fails and the output will confuse rather than clarify if the ascending order check fails.
Yes, this check requires that the times are in order. If they are not, this check may fail to raise a message. I suppose we could just read all the data. That's probably fine in most cases -- I doubt we'll come across many datasets where these columns are large. Do you think it's better to just read all of the time arrays?
It is definitely safer.
If we want to be more efficient maybe we can just combine these two checks:
nwbinspector/src/nwbinspector/checks/_tables.py
Lines 52 to 80 in d8382c9
```python
@register_check(importance=Importance.BEST_PRACTICE_VIOLATION, neurodata_type=TimeIntervals)
def check_time_interval_time_columns(
    time_intervals: TimeIntervals, nelems: Optional[int] = NELEMS
) -> Optional[InspectorMessage]:
    """
    Check that time columns are in ascending order.

    Parameters
    ----------
    time_intervals: TimeIntervals
    nelems: int, optional
        Only check the first {nelems} elements. This is useful in case the columns are
        very long so you don't need to load the entire array into memory. Use None to
        load the entire arrays.
    """
    unsorted_cols = []
    for column in time_intervals.columns:
        if column.name == "start_time":
            if not is_ascending_series(column.data, nelems):
                unsorted_cols.append(column.name)
    if unsorted_cols:
        return InspectorMessage(
            message=(
                f"{unsorted_cols} are time columns but the values are not in ascending order. "
                "All times should be in seconds with respect to the session start time."
            )
        )
    return None
```
To avoid reading them twice.
I think either way is fine.
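A single-pass combination of the two checks could look roughly like this. Again a sketch: it walks a plain list of floats rather than a real `TimeIntervals` column, and the function name and return shape are illustrative, not the inspector's API:

```python
def scan_time_column(values: list[float], max_stop: float) -> dict[str, bool]:
    """Walk a time column once, recording both whether it is ascending and
    whether every value stays at or below max_stop.

    Sketch only: real checks operate on TimeIntervals columns and return
    InspectorMessage objects rather than a dict of flags.
    """
    ascending = True
    within_bound = True
    previous = float("-inf")
    for value in values:
        if value < previous:
            ascending = False
        if value > max_stop:
            within_bound = False
        previous = value
    return {"ascending": ascending, "within_bound": within_bound}
```

Reading each column once and deriving both messages from the result would avoid loading the same data twice.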
a more modular PR for #628