Bugfixes and improvements in Evaluation module #228
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master     #228      +/-   ##
==========================================
+ Coverage   92.46%   92.57%   +0.10%
==========================================
  Files          13       13
  Lines        2788     2814      +26
  Branches      379      383       +4
==========================================
+ Hits         2578     2605      +27
- Misses        152      153       +1
+ Partials       58       56       -2
```
Force-pushed from d4cb619 to 38e102e.
remrama left a comment:
Thank you for the cleanup! Looks great, and I'm happy to see this module get sanitized; it makes it much more approachable for future features. Nice bug catches as well 🫠
Oh and I figure the docstrings are being skipped for development purposes but will be unskipped prior to v0.8?
| # | Issue | Status | Notes |
|---|-------|--------|-------|
| 6 | **Log transformation** missing | ⚠️ Deferred | Planned: `log_transform` param + Euser et al. (2008) back-transform. Separate PR. |
| 7 | **Individual discrepancy heatmap** (`indDiscr.R`) | ❌ | No equivalent; data available via `get_sleep_stats()` |
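For context on the deferred log-transform item, here is a minimal standalone sketch of Euser et al. (2008)-style back-transformed limits of agreement. The function name `log_loa` and its shape are illustrative assumptions, not the planned YASA API:

```python
import numpy as np

def log_loa(ref, test):
    """Bland-Altman limits of agreement on log-transformed data,
    back-transformed to multiplicative (ratio) limits, in the spirit
    of Euser et al. (2008). Hypothetical helper, not YASA code."""
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    # Differences on the log scale are log-ratios of each pair
    diff = np.log(test) - np.log(ref)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    # Back-transform: exp() turns log-scale limits into ratio limits
    return np.exp(bias), np.exp(bias - half_width), np.exp(bias + half_width)
```

The back-transformed values read as multiplicative factors (e.g. a bias factor of 1.05 means the test device measures 5% higher on average), which is why the log transform is attractive for skewed sleep statistics.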
Separate PR that can go with ROC curves (for general plotting improvements to this module).
Oh, and RE: my separate …
Force-pushed from 38e102e to d4cb619.
All PR comments addressed!
Fantastic, thanks @raphaelvallat.
1. `evaluation.py` line ~341 — Removed the sentence "This corresponds to R's `metricsType="sum"` in the Menghini et al. 2021 pipeline." from the `pooled` parameter docstring.
2. `evaluation.py` — Renamed the `n_sleeps` property to `n_sessions` in `EpochByEpochAgreement` (making it consistent with `SleepStatsAgreement.n_sessions`), updated all 12 internal references.
3. `evaluation.py` line ~423 — Fixed the column order in the `get_agreement_bystage` docstring to match the actual output: `fbeta`, `npv`, `precision`, `recall`, `specificity`, `support` (alphabetical).
4. `evaluation.py` line ~431 — Changed `df.values.T` → `df.to_numpy().T` in the scorer function.
5. `evaluation.py` line ~1007 — Added `strict=True` to the `zip()` call for parametric LoA computation.
6. `evaluation.py` line ~1342 — Replaced the one-liner comment on `_unit()` with a multi-line docstring explaining that it handles all stats from `sleep_statistics()`.
7. `tests/test_evaluation.py` line 16 — Renamed `N_NIGHTS` → `N_SESSIONS` throughout the test file, and updated `test_n_sleeps` → `test_n_sessions` with `ebe.n_sleeps` → `ebe.n_sessions`.
Force-pushed from b39ccb3 to 5a84f9c.
Critical bugfixes and several improvements in Evaluation module.
I asked Claude to do a thorough comparison of the YASA Evaluation pipeline against the Menghini R pipeline.
@remrama please review the proposed bugfixes and the new `report` method that replaces `get_table()`. These are described in `evaluation_review.md`. I also renamed some of the column names/parameters (e.g. `"parm"` → `"param"`). Note that I've also started adding a validation of the main metrics against the Menghini pipeline using their sample dataset (not committed yet).
Once we have finalized the code changes in this PR, I'll open another PR with a dedicated tutorial comparing the Menghini pipeline versus YASA.
Update: I'm just seeing your recent changes in the evaluation branch..! Let me know how you'd like to move forward. We can merge your branch/PR first.