
Conversation

@bendichter (Contributor)

a more modular PR for #628

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.95%. Comparing base (d8382c9) to head (c944bb5).


@@            Coverage Diff             @@
##              dev     #635      +/-   ##
==========================================
+ Coverage   73.03%   76.95%   +3.92%     
==========================================
  Files          47       47              
  Lines        1587     1610      +23     
==========================================
+ Hits         1159     1239      +80     
+ Misses        428      371      -57     
Files with missing lines              Coverage Δ
------------------------------------  --------------------------
src/nwbinspector/checks/__init__.py   100.00% <ø> (ø)
src/nwbinspector/checks/_tables.py    97.88% <100.00%> (+0.40%) ⬆️

... and 4 files with indirect coverage changes


@h-mayorquin (Contributor) left a comment

Some questions.

return None


@register_check(importance=Importance.CRITICAL, neurodata_type=TimeIntervals)
@h-mayorquin (Contributor):

Should this be a critical error? We are basically saying that there will never be such cases, which seems too strong to me.

We should have a way of handling "things that are usually errors, but there is a small probability that the user knows what they are doing and can move forward".

The solution could be a non-trivial barrier to enabling this, like an environment variable that skips this error (a CLI argument would be hard to propagate to DANDI), or something of the like.
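A minimal sketch of that escape hatch, assuming a hypothetical environment variable name (neither the variable nor the helper below is an existing nwbinspector setting):

```python
import os

# Hypothetical sketch: gate a CRITICAL check behind an environment variable so
# that expert users (and pipelines like DANDI's, where CLI flags are hard to
# propagate) can opt out. NWBINSPECTOR_SKIP_CRITICAL_TIME_CHECKS is an
# illustrative name, not a real nwbinspector option.
def critical_check_enabled() -> bool:
    return os.environ.get("NWBINSPECTOR_SKIP_CRITICAL_TIME_CHECKS", "0") != "1"
```

The check body would then return early (no InspectorMessage) when `critical_check_enabled()` is False, keeping the default behavior strict while leaving a deliberate, non-trivial opt-out.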

end_times.append(float(time_intervals["stop_time"][-1]))

# Check for other time columns
for column_name in time_intervals.colnames:
@h-mayorquin (Contributor):

I think that all the times in the time columns should be smaller than max(time_intervals["stop_time"].data), right? Maybe that could be a check on its own.
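That proposed standalone check could be sketched like this, using plain dicts and lists in place of the hdmf column objects (illustrative only; not an existing nwbinspector function):

```python
# Hypothetical sketch of the proposed check: every value in any "*_time"
# column should be <= the maximum of the stop_time column. Returns the
# names of offending columns.
def find_times_past_last_stop(columns: dict[str, list[float]]) -> list[str]:
    max_stop = max(columns["stop_time"])
    return [
        name
        for name, values in columns.items()
        if name.endswith("_time") and any(v > max_stop for v in values)
    ]
```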

end_times = []

# Check for start_time and stop_time columns
if "start_time" in time_intervals.colnames and len(time_intervals["start_time"]) > 0:
@h-mayorquin (Contributor):

Here and with the other time columns, we are assuming that the time columns are already well-ordered. Is there a way of running checks in order? Maybe this check does not make sense if the other one fails, and the output will confuse rather than clarify if the ascending-order check fails.

@bendichter (Contributor, Author):

Yes, this check requires that the times are in order. If they are not, this check may fail to raise a message. I suppose we could just read all the data. That's probably fine in most cases -- I doubt we'll come across many datasets where these arrays are large. Do you think it's better to just read all of the time arrays?
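The trade-off hinges on the `nelems` slicing convention used by the `is_ascending_series` helper these checks rely on: slicing with `data[:None]` reads the whole array, so passing `None` means "check everything". A toy stand-in for that helper (an illustration, not nwbinspector's actual implementation):

```python
from typing import Optional, Sequence

# Toy version of an ascending-order helper: check only the first `nelems`
# values; nelems=None reads the full series, since data[:None] == data[:].
def is_ascending_series(data: Sequence[float], nelems: Optional[int] = None) -> bool:
    subset = data[:nelems]
    return all(a <= b for a, b in zip(subset, subset[1:]))
```

With a small default `nelems` the check is cheap but can miss disorder past the prefix; with `nelems=None` it is exhaustive at the cost of loading the whole column.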

@h-mayorquin (Contributor):

It is definitely safer.

If we want to be more efficient, maybe we can just combine these two checks:

@register_check(importance=Importance.BEST_PRACTICE_VIOLATION, neurodata_type=TimeIntervals)
def check_time_interval_time_columns(
    time_intervals: TimeIntervals, nelems: Optional[int] = NELEMS
) -> Optional[InspectorMessage]:
    """
    Check that time columns are in ascending order.

    Parameters
    ----------
    time_intervals : TimeIntervals
    nelems : int, optional
        Only check the first `nelems` elements. This is useful in case the columns are
        very long, so you don't need to load the entire array into memory. Use None to
        load the entire arrays.
    """
    unsorted_cols = []
    for column in time_intervals.columns:
        # Cover start_time, stop_time, and any other "*_time" column
        if column.name.endswith("_time"):
            if not is_ascending_series(column.data, nelems):
                unsorted_cols.append(column.name)
    if unsorted_cols:
        return InspectorMessage(
            message=(
                f"{unsorted_cols} are time columns but the values are not in ascending order. "
                "All times should be in seconds with respect to the session start time."
            )
        )
    return None

To avoid reading them twice.

I think either way is fine.

