curate format-dates: Support lower/upper bounds on incomplete dates#1773
curate format-dates: Support lower/upper bounds on incomplete dates#1773victorlin wants to merge 5 commits into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## victorlin/update-format-dates #1773 +/- ##
=================================================================
- Coverage 73.63% 73.57% -0.06%
=================================================================
Files 81 81
Lines 8580 8663 +83
Branches 1750 1781 +31
=================================================================
+ Hits 6318 6374 +56
- Misses 1968 1984 +16
- Partials 294 305 +11 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
c5e9eda to
3390ab8
Compare
There was a problem hiding this comment.
This made me wonder how the command would handle the field if --target-date-field did not have a valid date, e.g. if the date was unknown. I pushed up a test file in 0634ea1 to show the current behavior, but unclear if the behavior is desired based comments on Slack.
There was a problem hiding this comment.
Thanks, empty target date values (XXXX-XX-XX after normalization) could end up being a common use case if there are reasonable min/max bounds such as (clade root date, collection date), so it's good to think about the scenarios. I've added your test file to this PR.
My thinking is that it would make sense to apply bounds to empty target date values only if min and max are both set (i.e. 620021c). If only one is set, the intended interval would be very broad as either (-∞,max] or [min,+∞) which, for downstream usage, doesn't seem that much different from leaving it as XXXX-XX-XX.
There was a problem hiding this comment.
My thinking is that it would make sense to apply bounds to empty target date values only if min and max are both set (i.e. 620021c). If only one is set, the intended interval would be very broad as either (-∞,max] or [min,+∞) which, for downstream usage, doesn't seem that much different from leaving it as XXXX-XX-XX.
Sounds reasonable to me. I briefly wondered if only min is defined if we should automatically set max to today, but probably better for user to explicitly set max to today as desired.
There was a problem hiding this comment.
+1 for making it an explicit setting. While AmbiguousDate and everything using it will impose the current date as a maximum, I'm not a fan of the behavior and see it as out of scope and prone to bugs such as #1171.
Preparing for the upcoming bounds feature which will be fundamentally different from the current behavior.
This is fundamentally different from date format normalization and is
done using a separate code path, but can still be beneficial to include
in the format-dates command because:
1. Applying date format normalization before applying bounds ensures
that any existing ambiguity is in a resolvable format. This is still
achievable with a separate command but it would be up to the user to
get the order right:
augur curate format-dates … |
augur curate restrict-dates …
2. With the existing functionality re-scoped to "normalizing date
formats", the new operation of "applying bounds" arguably falls
within the scope of date formatting.
The trade-off off with the current implementation is that more
complicated failure reporting logic is necessary to bridge the two code
paths.
Both are marked as optional when registering the arguments, however passing neither will effectively be a no-op which was previously not allowed when --date-fields was required.
3390ab8 to
2248e2a
Compare
victorlin
left a comment
There was a problem hiding this comment.
PR is ready for review, but I'm feeling unsure about the overall design. Starting a comment thread below, looking for opinions!
| @@ -1,15 +1,30 @@ | |||
| """ | |||
| Format date fields to ISO 8601 dates (YYYY-MM-DD). | |||
| Format date fields to ISO 8601-like formats. | |||
There was a problem hiding this comment.
The more I work on this feature, the more I feel like it should be implemented separately from the existing augur curate format-dates. In the commit message of 45dd11d I present an argument for keeping them together, but left a note at the end about the internal trade off. I'm also unsure about if the new scoping to differentiate the two functionalities now provided by augur curate format-dates is clear.
There was a problem hiding this comment.
Without having looked at the PR (yet), just this comment, I think I agree that this feels like a distinct operation from format-dates; one that is probably worthy of its own sub-augur curate command, something like augur curate apply-date-bounds.
There was a problem hiding this comment.
Agree that splitting this into a separate command seems clearer and a good separation of concerns, especially if we might want to add other bound options (just starting thinking that we might want something like --target-min-date "2019-12-01" to provide a specific min date rather than a date field).
The new command can always provide hint to run augur curate format-dates before applying bounds.
There was a problem hiding this comment.
Thanks both! I'll close this PR, re-scope the issue, and create a separate PR with the same functionality in a new command.
| original_date = get_numerical_date_from_value(record[args.target_date_field], fmt="%Y-%m-%d") | ||
|
|
||
| # Keep exact dates as-is | ||
| # Maybe worth adding a warning if the date is out of bounds? |
There was a problem hiding this comment.
Given the errors in the cases where an incomplete date is out-of-bounds, I would be surprised that an out-of-bounds complete date didn't provoke similar behavior. I think a warning is a minimum, and would ask, why not an error?
There was a problem hiding this comment.
+1 for error/warning if out of bounds even if the date is an exact date.
(Made me think that it would be useful to have a option such as --target-max-date today to warn about future dates in the metadata.)
There was a problem hiding this comment.
Good point. I'll update to error in the new PR.
|
Closing per #1773 (comment). Will create a separate PR with the same functionality in a new command. |
Description of proposed changes
See #1494 for design description
Related issue(s)
#1494
Checklist
augur curate format-dates#1816