release janitor 2.2.0 🎉

sfirke · sfirke · commit 11079f03d1be · 2023-02-01T11:26:21.000-05:00
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: janitor
 Title: Simple Tools for Examining and Cleaning Dirty Data
-Version: 2.1.0.9000
+Version: 2.2.0
 Authors@R: c(person("Sam", "Firke", email = "samuel.firke@gmail.com", role = c("aut", "cre")),
     person("Bill", "Denney", email = "wdenney@humanpredictions.com", role = "ctb"),
     person("Chris", "Haid", email = "chrishaid@gmail.com", role = "ctb"),
@@ -9,14 +9,12 @@ Authors@R: c(person("Sam", "Firke", email = "samuel.firke@gmail.com", role = c("
     person("Jonathan", "Zadra", email = "jonathan.zadra@sorensonimpact.com", role = "ctb"))
 Description: The main janitor functions can: perfectly format data.frame column
     names; provide quick counts of variable combinations (i.e., frequency
-    tables and crosstabs); and isolate duplicate records. Other janitor functions
+    tables and crosstabs); and explore duplicate records. Other janitor functions
     nicely format the tabulation results. These tabulate-and-report functions
     approximate popular features of SPSS and Microsoft Excel. This package
     follows the principles of the "tidyverse" and works well with the pipe function
     %>%. janitor was built with beginning-to-intermediate R users in mind and is
-    optimized for user-friendliness. Advanced R users can already do everything
-    covered here, but with janitor they can do it faster and save their thinking for
-    the fun stuff.
+    optimized for user-friendliness.
 URL: https://github.com/sfirke/janitor,
     https://sfirke.github.io/janitor/
 BugReports: https://github.com/sfirke/janitor/issues
diff --git a/NEWS.md b/NEWS.md
@@ -1,27 +1,27 @@
-# janitor 2.1.0.9000 (unreleased, under development)
+# janitor 2.2.0 (2023-02-01)
 
 ## Breaking changes
 
+These are all minor breaking changes resulting from enhancements and are not expected to affect the vast majority of users.
+
 * A new `...` argument was added to `row_to_names()`, preceding the `remove_row` argument, as part of the new `find_header()` functionality.  If code previously used `remove_row` as an unnamed argument, it will now error.  If code previously used the unsupported behavior of passing anything other than `TRUE` or `FALSE` to `remove_row`, unexpected results may occur.
 
 * Microsoft Excel incorrectly has a leap day on 29 February 1900 (see https://docs.microsoft.com/en-us/office/troubleshoot/excel/wrongly-assumes-1900-is-leap-year).  `excel_numeric_to_date()` did not account for this error, and now it does.  Dates returned from `excel_numeric_to_date()` that precede 1 March 1900 will now be one day later compared to previous versions (i.e. what was 1 Feb 1900 is now 2 Feb 1900), and dates that Excel presents as 29 Feb 1900 will become `as.POSIXct(NA)`.  (#423, thanks **@billdenney** for fixing)
 
 * A minor breaking change is that the time zone is now always set for `excel_numeric_to_date()` and `convert_date()`.  The default timezone is `Sys.timezone()`, previously it was an empty string (`""`). (#422, thanks **@billdenney** for fixing)
 
-* A minor breaking change affects `make_clean_names()` (and therefore `clean_names()`). `make_clean_names()` now uses the `unique_sep` argument from `snakecase::to_any_case()` to handle de-duplication of names. The incremental suffix counter is now one less than in the past (i.e., a de-duplicated variable with a suffix of `_2` becomes `_1`, `_3` becomes `_2`, etc.). This change also results in a new feature and the ability to allow for duplicate names by setting `unique_sep = NULL`. (#495, thanks **@JasonAizkalns** for fixing and **@billdenney** and **@sfirke** for the guidance)
-
 * `get_dupes()` results are now sorted first by descending order of `dupe_count`, then alphabetically by sorting variables. (#493)
 
 * There are several minor breaking changes resulting from enhancements to `adorn_ns()`:
-  * The addition of the new argument `format_func` means that previous calls relying on `,,,` as shorthand to get to the `...` column selection argument may now require an extra comma
+  * The addition of the new argument `format_func` means that previous calls relying on `,,,` as shorthand to get to the `...` column selection argument may now require an extra comma.
   * `adorn_ns()` now defaults to displaying numbers of >3 digits with `big.mark = ","`, as part of the default value of the new `format_func` argument.  E.g., `1234`  is now `1,234`.
   * `adorn_ns()` no longer prints leading whitespace when `position = "front"` - this is not a visible change in the printed result and it would be rare that this affects any code.
 
 * When the first column of the data.frame input to `adorn_totals()` is a factor and a totals row is added to the bottom, that column now remains a factor, with "Total" or other user-specified totals name added to its factor levels (#494).
 
 ## New features
 
-* `row_to_names()` now has a new helper function, `find_header()` to help find the row that contains the names.  It can be used by passing `row_number="find_header"`, and see the documentation of `row_to_names()` and `find_header()` for more examples. (fix #429)
+* `row_to_names()` now has a new helper function, `find_header()`, to help find the row that contains the names.  It can be used by passing `row_number="find_header"`.  See the documentation of `row_to_names()` and `find_header()` for more examples. (fix #429)
 
 * `remove_empty()` has a new argument, `cutoff` which allows rows or columns to be removed if at least the `cutoff` fraction of the data are missing.  (fix #446, thanks to **@jzadra** for suggesting the feature and **@billdenney** for fixing)
 
@@ -37,13 +37,10 @@
 
 ## Minor features
 
-* Some warning messages now have classes so that they can be specifically suppressed with suppressWarnings(..., class="the_class_to_suppress").  To find the class of a warning you typically must look at the code where the error is occurring.  (#452, thanks to **@mgacc0** for suggesting and **@billdenney** for fixing)
-
-* `make_clean_names()` (and therefore `clean_names()`) issues a warning if the mu or micro symbol is in the names and it is not or may not be handled by a `replace` argument value.  (#448, thanks **@IndrajeetPatil** for reporting and **@billdenney** for fixing)  The rationale is that standard transliteration would convert "[mu]g" to "mg" when it would be more typically be converted to "ug" for use as a unit.  A new, unexported constant (janitor:::mu_to_u) was added to help with mu to "u" replacements.
+* `make_clean_names()` (and therefore `clean_names()`) issues a warning if the mu or micro symbol is in the names and it is not or may not be handled by a `replace` argument value.  (#448, thanks **@IndrajeetPatil** for reporting and **@billdenney** for fixing)  The rationale is that standard transliteration would convert `"[mu]g"` to `"mg"` when it would more typically be converted to `"ug"` for use as a unit.  A new, unexported constant (`janitor:::mu_to_u`) was added to help with mu to "u" replacements.
 
 * `excel_numeric_to_date()` now warns when times are converted to `NA` due to hours that do not exist because of daylight savings time (fix #420, thanks **@Geomorph2** for reporting and **@billdenney** for fixing).  It also warns when inputs are not positive, since Excel only supports values down to 1 (#423).
 
-
 * If a `tabyl()` or similar data.frame is sorted (e.g., with `dplyr::arrange()`), then has `adorn_totals()` and/or `adorn_percentages()` called on it, followed by `adorn_ns()`, the Ns will be sorted correctly to match the tabyl they're being adorned on. (fix #407)
 
 * `clean_names()` now supports all object types that have either names or dimnames (#481, @DanChaltiel).
@@ -54,9 +51,11 @@
 
 * `make_clean_names()` now allows for duplicate names to be returned by specifying `TRUE` to the new `allow_dupes` argument (#495, @JasonAizkalns).
 
+* Some warning messages now have classes so that they can be specifically suppressed with `suppressWarnings(..., classes="the_class_to_suppress")`.  To find the class of a warning you typically must look at the code where the warning is raised.  (#452, thanks to **@mgacc0** for suggesting and **@billdenney** for fixing)
+
 ## Bug fixes
 
-* `adorn_percentages()` was refactored for compatibility with `dplyr` package versions > 1.0.99 (#490)
+* `adorn_percentages()` was refactored for compatibility with `dplyr` package versions >= 1.1.0 (#490)
 
 * When a numeric variable is supplied as the 2nd variable (column) or 3rd variable (list) of a `tabyl`, the resulting columns or list are now sorted in numeric order, not alphabetic. (#438, thanks **@daaronr** for reporting and **@mattroumaya** for fixing)
 
diff --git a/cran-comments.md b/cran-comments.md
@@ -1,26 +1,30 @@
 # Submission
-2020-12-28
+2023-02-01
 
 ## Submission summary
-An accumulation of small enhancements and bug fixes.  No breaking changes.
+An accumulation of enhancements and bug fixes.  Breaking changes only for edge cases.
+
+Notably, this fixes the current test failures on CRAN for this package, resulting from
+changes introduced in the latest version of the dplyr package.
 
 ### Test environments
 
 #### Windows
-* Windows 10 with R-release 4.0.3 (local)
-* Windows 10 with R Under development (unstable) (2020-12-19 r79650) via win-builder, checked 2020-12-28
+* Windows 10 with R-release 4.2.2 (local)
+* Windows Server 2022 x64 (build 20348) with R Under development (unstable) (2023-01-31 r83741 ucrt) via win-builder, checked 2023-02-01
 
 #### Linux
-* ubuntu 20.04 R-release 4.0.3 (Github CI)
-* ubuntu 20.04 R-devel R Under development (unstable) (2020-12-28) (Github CI)
+* Ubuntu 22.04 R-release 4.2.2 (GitHub CI)
+* Ubuntu 22.04 R-devel R Under development (unstable) (2023-02-01) (GitHub CI)
+* Ubuntu 22.04 R-oldrel 4.1.3 (GitHub CI)
 
 #### Mac
-* Mac OS with R-release (Github CI)
+* Mac OS 12.6.2 with R-release 4.2.2 (GitHub CI)
 
 ### R CMD check results
 0 errors | 0 warnings | 0 notes
 
 ### Downstream dependencies
 This does not negatively affect downstream dependencies.
 
-I ran a revdepcheck, it succeeded for 30 packages and I manually investigated the others to verify that errors were the result of time-outs and that the janitor changes do not affect those packages.
+revdepcheck passed for 101 of 101 packages (98 from CRAN, 3 from Bioconductor).