janitor 2.2.0
Breaking changes
These are all minor breaking changes resulting from enhancements and are not expected to affect the vast majority of users.
- A new `...` argument was added to `row_to_names()`, preceding the `remove_row` argument, as part of the new `find_header()` functionality. If code previously used `remove_row` as an unnamed argument, it will now error. If code previously used the unsupported behavior of passing anything other than `TRUE` or `FALSE` to `remove_row`, unexpected results may occur.
- Microsoft Excel incorrectly has a leap day on 29 February 1900 (see https://docs.microsoft.com/en-us/office/troubleshoot/excel/wrongly-assumes-1900-is-leap-year). `excel_numeric_to_date()` did not account for this error, and now it does. Dates returned from `excel_numeric_to_date()` that precede 1 March 1900 will now be one day later compared to previous versions (i.e., what was 1 Feb 1900 is now 2 Feb 1900), and dates that Excel presents as 29 Feb 1900 will become `as.POSIXct(NA)`. (#423, thanks @billdenney for fixing)
- A minor breaking change is that the time zone is now always set for `excel_numeric_to_date()` and `convert_date()`. The default time zone is `Sys.timezone()`; previously it was an empty string (`""`). (#422, thanks @billdenney for fixing)
- `get_dupes()` results are now sorted first by descending order of `dupe_count`, then alphabetically by sorting variables. (#493)
- There are several minor breaking changes resulting from enhancements to `adorn_ns()`:
  - The addition of the new argument `format_func` means that previous calls relying on `,,,` as shorthand to get to the `...` column selection argument may now require an extra comma.
  - `adorn_ns()` now defaults to displaying numbers of >3 digits with `big.mark = ","`, as part of the default value of the new `format_func` argument. E.g., `1234` is now `1,234`.
  - `adorn_ns()` no longer prints leading whitespace when `position = "front"`. This is not a visible change in the printed result, and it would be rare that this affects any code.
- When the first column of the data.frame input to `adorn_totals()` is a factor and a totals row is added to the bottom, that column now remains a factor, with "Total" or other user-specified totals name added to its factor levels (#494).
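
The Excel leap-day change above is easiest to see as date arithmetic. The following is a minimal base-R sketch of the idea, not janitor's actual implementation: the function name `excel_serial_to_date()` is hypothetical, it assumes Excel's 1900 date system where serial 1 is 1 January 1900, and it returns `Date` values rather than the `POSIXct` values that `excel_numeric_to_date()` can return.

```r
# Hypothetical helper illustrating the correction; not part of janitor.
excel_serial_to_date <- function(x) {
  # Start with all-NA dates so unsupported serials stay missing.
  out <- as.Date(rep(NA_real_, length(x)), origin = "1970-01-01")

  # Serials 1..59 fall before Excel's fictitious 29 Feb 1900,
  # so they count days from 31 Dec 1899.
  early <- !is.na(x) & x >= 1 & x < 60
  # Serials from 61 onward are shifted by the fictitious day,
  # which is why the familiar origin is 30 Dec 1899.
  late <- !is.na(x) & x >= 61

  out[early] <- as.Date(x[early], origin = "1899-12-31")
  out[late] <- as.Date(x[late], origin = "1899-12-30")

  # Serial 60 (Excel's "29 Feb 1900") and values below 1 remain NA.
  out
}

format(excel_serial_to_date(c(1, 59, 60, 61)))
# "1900-01-01" "1900-02-28" NA "1900-03-01"
```

Applying the `1899-12-30` origin to every serial, as is commonly done, yields dates one day too early for serials below 60, which matches the one-day shift described above.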
New features
- `row_to_names()` now has a new helper function, `find_header()`, to help find the row that contains the names. It can be used by passing `row_number = "find_header"`. See the documentation of `row_to_names()` and `find_header()` for more examples. (fix #429)
- `remove_empty()` has a new argument, `cutoff`, which allows rows or columns to be removed if at least the `cutoff` fraction of the data are missing. (fix #446, thanks to @jzadra for suggesting the feature and @billdenney for fixing)
- A new function `sas_numeric_to_date()` has been added to convert SAS dates, times, and datetimes to R objects (fix #475, thanks to @billdenney for suggesting and implementing)
- A new function `single_value()` has been added to ensure that only a single value or missing values are present in a vector (fix #428)
- A new function `get_one_to_one()` has been added to find columns that map 1:1 to each other, even if the values within the columns differ (fix #291, @billdenney)
- `adorn_ns()` contains a new `format_func` argument so that the user can format the Ns to their liking, e.g., changing the `big.mark` character. (#444)
- `clean_names()` can now be called on a database connection in a dbplyr code pipeline (#467)
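
As an illustration of the `find_header()` helper, here is a short sketch using a made-up data.frame. The `raw` data and its column names are assumptions for the example; it also assumes that, with no further arguments, `find_header()` selects the first row without missing values, so check the function documentation before relying on that.

```r
library(janitor)

# Hypothetical raw import: a title row sits above the real header row.
raw <- data.frame(
  X1 = c("Quarterly report", "id", "1", "2"),
  X2 = c(NA, "value", "10", "20"),
  stringsAsFactors = FALSE
)

# Let row_to_names() locate the header row itself.
cleaned <- row_to_names(raw, row_number = "find_header")
names(cleaned)

# remove_row now comes after `...`, so pass it by name, not by position.
kept <- row_to_names(raw, row_number = 2, remove_row = FALSE)
```

Naming `remove_row` explicitly also keeps older scripts working after the argument reordering noted under the breaking changes.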
Minor features
- `make_clean_names()` (and therefore `clean_names()`) issues a warning if the mu or micro symbol is in the names and it is not or may not be handled by a `replace` argument value. (#448, thanks @IndrajeetPatil for reporting and @billdenney for fixing) The rationale is that standard transliteration would convert `"[mu]g"` to `"mg"` when it would more typically be converted to `"ug"` for use as a unit. A new, unexported constant (`janitor:::mu_to_u`) was added to help with mu to "u" replacements.
- `excel_numeric_to_date()` now warns when times are converted to `NA` due to hours that do not exist because of daylight saving time (fix #420, thanks @Geomorph2 for reporting and @billdenney for fixing). It also warns when inputs are not positive, since Excel only supports values down to 1 (#423).
- If a `tabyl()` or similar data.frame is sorted (e.g., with `dplyr::arrange()`), then has `adorn_totals()` and/or `adorn_percentages()` called on it, followed by `adorn_ns()`, the Ns will now be sorted correctly to match the tabyl they're being adorned on. (fix #407)
- `clean_names()` now supports all object types that have either names or dimnames (#481, @DanChaltiel).
- `adorn_pct_formatting()` uses the locale-dependent value of `decimal.mark` as a decimal separator, e.g., in locales where `getOption("OutDec")` is `,` it will print percentages in the format `"12,34%"`. This character can also be set manually with `options(OutDec = ",")`. (#451)
- `adorn_totals(where = "row")` now preserves factor class and levels of the first column of the input data.frame (#494).
- `make_clean_names()` now allows for duplicate names to be returned by specifying `TRUE` to the new `allow_dupes` argument (#495, @JasonAizkalns).
- Some warning messages now have classes so that they can be specifically suppressed with `suppressWarnings(..., classes = "the_class_to_suppress")`. To find the class of a warning, you typically must look at the code where the warning is raised. (#452, thanks to @mgacc0 for suggesting and @billdenney for fixing)
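
The classed-warning mechanism can be sketched in base R alone. The function and the class name `hypothetical_janitor_warning` below are invented for illustration and are not classes that janitor defines; the example assumes R >= 3.6.0, where `warningCondition()` and the `classes` argument of `suppressWarnings()` are available.

```r
# Hypothetical function that raises a warning carrying a custom class.
warn_with_class <- function() {
  warning(warningCondition(
    "something looks off",
    class = "hypothetical_janitor_warning"
  ))
  "finished"
}

# One way to discover a warning's classes without reading the source:
# catch it and inspect the condition object.
found_classes <- tryCatch(warn_with_class(), warning = function(w) class(w))

# Suppress only warnings of that class; other warnings would still surface.
result <- suppressWarnings(
  warn_with_class(),
  classes = "hypothetical_janitor_warning"
)
```

Catching the condition with `tryCatch()` stops evaluation at the warning, so it is suited to interactive inspection rather than production code.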
Bug fixes
- `adorn_percentages()` was refactored for compatibility with `dplyr` package versions >= 1.1.0 (#490)
- When a numeric variable is supplied as the 2nd variable (column) or 3rd variable (list) of a `tabyl`, the resulting columns or list are now sorted in numeric order, not alphabetic. (#438, thanks @daaronr for reporting and @mattroumaya for fixing)
- `tabyl()` now succeeds when the second variable is named `"n"` (#445).
- `adorn_ns()` can act on a single-column data.frame input with custom Ns supplied if the variable to adorn is specified with `...` (#456).
- `adorn_totals()` on a one-way tabyl preserves the `tabyl_type` attribute so that a subsequent call to `adorn_pct_formatting()` works correctly on one-way tabyls (#523).