Releases: easystats/datawizard
datawizard 0.12.0
BREAKING CHANGES
- The argument `include_na` in `data_tabulate()` and `data_summary()` has been
  renamed to `remove_na`. Consequently, to mimic the former behaviour, `FALSE` and
  `TRUE` need to be switched (i.e. `remove_na = TRUE` is equivalent to the former
  `include_na = FALSE`).
- Class names for objects returned by `data_tabulate()` have been changed to
  `datawizard_table` and `datawizard_crosstable` (resp. the plural forms,
  `*_tables`), to provide a clearer and more consistent naming scheme.
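As a rough illustration of the switched logical value (the vector `x` is made up for this example, and the call assumes datawizard 0.12.0 or later):

```r
library(datawizard)

# Toy vector with one missing value (illustrative data only)
x <- c(1, 2, 2, NA)

# Former spelling, shown as a comment for comparison:
# data_tabulate(x, include_na = FALSE)

# Same request with the renamed argument and the flipped value
tab <- data_tabulate(x, remove_na = TRUE)
tab

# The returned object carries the new class name
class(tab)
```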
CHANGES
- `data_select()` can directly rename selected variables when a named vector
  is provided in `select`, e.g. `data_select(mtcars, c(new1 = "mpg", new2 = "cyl"))`.
- `data_tabulate()` gains an `as.data.frame()` method, to return the frequency
  table as a data frame. The structure of the returned object is a nested data
  frame, where the first column contains the name of the variable for which
  frequencies were calculated, and the second column contains the frequency table.
- `demean()` (and `degroup()`) now also work for cross-classified designs, or
  more generally, for data with multiple grouping or cluster variables (i.e.
  `by` can now specify more than one variable).
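A short sketch of the renaming behaviour of `data_select()`, reusing the call from the entry above (the object name `renamed` is only illustrative):

```r
library(datawizard)

# Select two columns of mtcars and rename them in one step
# by passing a named character vector to `select`
renamed <- data_select(mtcars, c(new1 = "mpg", new2 = "cyl"))

names(renamed)
head(renamed, 3)
```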
datawizard 0.11.0
BREAKING CHANGES
- Arguments named `group` or `group_by` are deprecated and will be removed
  in a future release. Please use `by` instead. This affects the following
  functions in datawizard (#502):
  - `data_partition()`
  - `demean()` and `degroup()`
  - `means_by_group()`
  - `rescale_weights()`
- The following aliases are deprecated and will be removed in a future release (#504):
  - `get_columns()`, use `data_select()` instead.
  - `data_find()` and `find_columns()`, use `extract_column_names()` instead.
  - `format_text()`, use `text_format()` instead.
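A minimal sketch of the replacements named above, using the built-in `iris` data. The choice of columns is an assumption made purely for illustration:

```r
library(datawizard)

# `by` takes the place of the deprecated `group` / `group_by` arguments
d <- demean(iris, select = "Sepal.Length", by = "Species")
names(d)

# extract_column_names() takes the place of find_columns() and data_find()
sepal_cols <- extract_column_names(iris, select = starts_with("Sepal"))
sepal_cols
```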
CHANGES
- `recode_into()` is more relaxed regarding checking the type of `NA` values.
  If you recode into a numeric variable, and one of the recode values is `NA`,
  you no longer need to use `NA_real_` for numeric `NA` values.
- Improved documentation for some functions.
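The relaxed `NA` handling in `recode_into()` might look like the sketch below. The vector `x` and the cut-offs are invented for illustration, and the call assumes datawizard 0.11.0 or later:

```r
library(datawizard)

# Toy numeric vector (illustrative data only)
x <- c(1, 5, 10)

# A plain `NA` is used as recode value next to numeric recode values;
# before this change, `NA_real_` would have been needed here
out <- recode_into(
  x < 3 ~ 1,
  x > 7 ~ NA,
  default = 0
)
out
```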
BUG FIXES
- `data_to_long()` did not work for data frames whose columns had attributes
  (like labelled data).
datawizard 0.10.0
BREAKING CHANGES
- The following arguments were deprecated in 0.5.0 and are now removed:
  - in `data_to_wide()`: `colnames_from`, `rows_from`, `sep`
  - in `data_to_long()`: `colnames_to`
  - in `data_partition()`: `training_proportion`
NEW FUNCTIONS
- `data_summary()`, to compute summary statistics of (grouped) data frames.
- `data_replicate()`, to expand a data frame by replicating rows based on another
  variable that contains the counts of replications per row.
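The two new functions could be used roughly as follows. The summary names `MW` and `SD` and the toy data frame `counts` are assumptions for this sketch:

```r
library(datawizard)

# Summary statistics of mpg, computed separately for each level of `am`
smry <- data_summary(mtcars, MW = mean(mpg), SD = sd(mpg), by = "am")
smry

# Toy data: column `n` holds the number of replications per row
counts <- data.frame(score = c(10, 20), n = c(2, 3))

# Each row is repeated `n` times
expanded <- data_replicate(counts, expand = "n")
expanded
```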
CHANGES
- `data_modify()` gets three new arguments, `.at`, `.if` and `.modify`, to modify
  variables at specific positions or based on logical conditions.
- `data_tabulate()` was revised and gets several new arguments: a `weights`
  argument, to compute weighted frequency tables. `include_na` allows including
  or omitting missing values from the table. Furthermore, a `by` argument was added,
  to compute crosstables (#479, #481).
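A brief sketch of the new `data_modify()` arguments, applied to the first rows of `iris` (the subset `d` is chosen only to keep the output short):

```r
library(datawizard)

d <- iris[1:5, ]

# Modify a variable at a specific position, selected by name
out_at <- data_modify(d, .at = "Species", .modify = as.numeric)

# Modify all variables that meet a logical condition
out_if <- data_modify(d, .if = is.numeric, .modify = round)

str(out_at)
out_if
```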
datawizard 0.9.1
CHANGES
- `rescale()` gains `multiply` and `add` arguments, to expand ranges by a given
  factor or value.
- `to_factor()` and `to_numeric()` now support class `haven_labelled`.
BUG FIXES
- `to_numeric()` now correctly deals with inversed factor levels when
  `preserve_levels = TRUE`.
- `to_numeric()` inversed order of value labels when `dummy_factors = FALSE`.
- `convert_to_na()` now preserves attributes for factors when `drop_levels = TRUE`.
datawizard 0.9.0
NEW FUNCTIONS
- `row_means()`, to compute row means, optionally only for the rows with at
  least `min_valid` non-missing values.
- `contr.deviation()` for sum-deviation contrast coding of factors.
- `means_by_group()`, to compute mean values of variables, grouped by levels
  of specified factors.
- `data_seek()`, to seek for variables in a data frame, based on their
  column names, variable labels, value labels or factor levels. Searching for
  labels only works for "labelled" data, i.e. when variables have a `label` or
  `labels` attribute.
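A small sketch of `row_means()` with the `min_valid` argument. The data frame `dat` is invented for this example:

```r
library(datawizard)

# Toy data with missing values scattered across rows
dat <- data.frame(
  c1 = c(1, 2, NA, 4),
  c2 = c(NA, 2, NA, 5),
  c3 = c(NA, 4, NA, NA),
  c4 = c(2, 3, 7, 8)
)

# Row means are only computed for rows with at least 2 non-missing values;
# row 3 has a single valid value and therefore yields NA
rm2 <- row_means(dat, min_valid = 2)
rm2
```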
CHANGES
- `recode_into()` gains an `overwrite` argument to skip overwriting already
  recoded cases when multiple recode patterns apply to the same case.
- `recode_into()` gains a `preserve_na` argument to preserve `NA` values
  when recoding.
- `data_read()` now passes the `encoding` argument to `data.table::fread()`.
  This allows reading files with non-ASCII characters.
- `datawizard` moves from the GPL-3 license to the MIT license.
- `unnormalize()` and `unstandardize()` now work with grouped data (#415).
- `unnormalize()` now errors instead of emitting a warning if it doesn't have the
  necessary info (#415).
BUG FIXES
- Fixed issue in `labels_to_levels()` when values of labels were not in sorted
  order and values were not sequentially numbered.
- Fixed issues in `data_write()` when writing labelled data into SPSS format
  and vectors were of a different type than the value labels.
- Fixed issue in `recode_into()` with probably wrong case number printed in the
  warning when several recode patterns match to one case.
- Fixed issue in `recode_into()` when original data contained `NA` values and
  `NA` was not included in the recode pattern.
- Fixed issue in `data_filter()` where functions containing a `=` (e.g. when
  naming arguments, like `grepl(pattern, x = a)`) were mistakenly seen as
  faulty syntax.
- Fixed issue in `empty_column()` for strings with invalid multibyte strings.
  For such data frames or files, `empty_column()` or `data_read()` no longer
  fail.
datawizard 0.8.0
BREAKING CHANGES
- The following re-exported functions from `{insight}` have now been removed:
  `object_has_names()`, `object_has_rownames()`, `is_empty_object()`,
  `compact_list()`, `compact_character()`.
- Argument `na.rm` was renamed to `remove_na` throughout `{datawizard}` functions.
  `na.rm` is kept for backward compatibility, but will be deprecated and later
  removed in future updates.
- The way expressions are defined in `data_filter()` was revised. The `filter`
  argument was replaced by `...`, which allows separating multiple expressions with
  a comma (which are then combined with `&`). Furthermore, expressions can now also be
  defined as strings, or be provided as character vectors, to allow string-friendly
  programming.
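The revised `data_filter()` interface might be used as sketched below. The filter conditions on `mtcars` are arbitrary examples:

```r
library(datawizard)

# Several expressions separated by a comma are combined with `&`
f1 <- data_filter(mtcars, vs == 0, am == 1)

# The same filter, defined as a string
f2 <- data_filter(mtcars, "vs == 0 & am == 1")

nrow(f1)
nrow(f2)
```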
CHANGES
- Weighted functions (`weighted_sd()`, `weighted_mean()`, ...) gain a `remove_na`
  argument, to remove or keep missing and infinite values. By default,
  `remove_na = TRUE`, i.e. missing and infinite values are removed.
- `reverse_scale()`, `normalize()` and `rescale()` gain an `append` argument
  (similar to other data frame methods of transformation functions), to append
  recoded variables to the input data frame instead of overwriting existing
  variables.
NEW FUNCTIONS
- `rowid_as_column()` to complement `rownames_as_column()` (and to mimic
  `tibble::rowid_to_column()`). Note that its behavior is different from
  `tibble::rowid_to_column()` for grouped data. See the Details section in the
  docs.
- `data_unite()`, to merge values of multiple variables into one new variable.
- `data_separate()`, as counterpart to `data_unite()`, to separate a single
  variable into multiple new variables.
- `data_modify()`, to create new variables, or modify or remove existing
  variables in a data frame.
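A minimal sketch combining two of the new functions. The derived column `kpl` and its conversion factor are assumptions made for illustration:

```r
library(datawizard)

# Add a row ID column in front of the first rows of mtcars
d <- rowid_as_column(head(mtcars, 3))

# Create a new variable from an existing one (approximate km per litre)
d <- data_modify(d, kpl = mpg * 0.425)

# Remove an existing variable by setting it to NULL
d <- data_modify(d, qsec = NULL)

names(d)
```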
MINOR CHANGES
- `to_numeric()` for variables of type `Date`, `POSIXct` and `POSIXlt` now
  includes the class name in the warning message.
- Added a `print()` method for `center()`, `standardize()`, `normalize()` and
  `rescale()`.
BUG FIXES
- `standardize_parameters()` now works when the package namespace is in the model
  formula (#401).
- `data_merge()` no longer yields a warning for `tibbles` when `join = "bind"`.
- `center()` and `standardize()` did not work for grouped data frames (of class
  `grouped_df`) when `force = TRUE`.
- The `data.frame` method of `describe_distribution()` returns `NULL` instead of
  an error if no valid variables were passed (for example a factor variable with
  `include_factors = FALSE`) (#421).
datawizard 0.7.1
BREAKING CHANGES
- `add_labs()` was renamed to `assign_labels()`. Since `add_labs()` existed
  only for a few days, there will be no alias for backwards compatibility.
NEW FUNCTIONS
- `labels_to_levels()`, to use value labels of factors as their levels.
MINOR CHANGES
- `data_read()` now checks if the imported object actually is a data frame (or
  coercible to a data frame), and if not, no longer errors, but gives an
  informative warning of the type of object that was imported.
BUG FIXES
- Fix test for CRAN check on Mac OS arm64
datawizard 0.7.0
BREAKING CHANGES
- In selection patterns, expressions like `-var1:var3` to exclude all variables
  between `var1` and `var3` are no longer accepted. The correct expression is
  `-(var1:var3)`. This is for 2 reasons:
  - to be consistent with the behavior for numerics (`-1:2` is not accepted but
    `-(1:2)` is);
  - to be consistent with `dplyr::select()`, which throws a warning and only
    uses the first variable in the first expression.
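The accepted exclusion syntax could look like this sketch on `iris` (the column range is picked only for illustration):

```r
library(datawizard)

# Exclude all variables from Sepal.Length to Petal.Length;
# the parentheses around the range are required
kept <- data_select(iris, -(Sepal.Length:Petal.Length))

names(kept)
```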
NEW FUNCTIONS
- `recode_into()`, similar to `dplyr::case_when()`, to recode values from one
  or more variables into a new variable.
- `mean_sd()` and `median_mad()` for summarizing vectors to their mean (or
  median) and a range of one SD (or MAD) above and below.
- `data_write()` as counterpart to `data_read()`, to write data frames into
  CSV, SPSS, SAS, Stata files and many other file types. One advantage over
  existing functions to write data in other packages is that labelled (numeric)
  data can be converted into factors (with value labels used as factor levels)
  even for text formats like CSV and similar. This allows exporting "labelled"
  data into those file formats, too.
- `add_labs()`, to manually add value and variable labels as attributes to
  variables. These attributes are stored as `"label"` and `"labels"` attributes,
  similar to the `labelled` class from the haven package.
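A short sketch of `recode_into()` in the style of `dplyr::case_when()`. The vector `x` and the category labels are made up for this example:

```r
library(datawizard)

x <- 1:20

# Each formula maps a logical condition to a recode value;
# cases matching no condition receive the `default`
grp <- recode_into(
  x > 15 ~ "a",
  x > 10 & x <= 15 ~ "b",
  default = "c"
)

table(grp)
```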
MINOR CHANGES
- `data_rename()` gets a `verbose` argument.
- `winsorize()` now errors if the threshold is incorrect (previously, it provided
  a warning and returned the unchanged data). The argument `verbose` is now
  useless but is kept for backward compatibility. The documentation now contains
  details about the valid values for `threshold` (#357).
- In all functions that have arguments `select` and/or `exclude`, there is now
  one warning per misspelled variable. The previous behavior was to have only one
  warning.
- Fixed inconsistent behaviour in `standardize()` when only one of the arguments
  `center` or `scale` was provided (#365).
- `unstandardize()` and `replace_nan_inf()` now work with select helpers (#376).
- Added informative warning and error messages to `reverse()`. Furthermore, the
  docs now describe the `range` argument more clearly (#380).
- `unnormalize()` errors with unexpected inputs (#383).
BUG FIXES
- `empty_columns()` (and therefore `remove_empty_columns()`) now correctly detects
  columns containing only `NA_character_` (#349).
- Select helpers now work in custom functions when the argument is called `select`
  (#356).
- Fix unexpected warning in `convert_na_to()` when `select` is a list (#352).
- Fixed issue with correct labelling of numeric variables with more than nine
  unique values and associated value labels.
datawizard 0.6.5
MAJOR CHANGES
- Etienne Bacher is the new maintainer.
MINOR CHANGES
- `standardize()`, `center()`, `normalize()` and `rescale()` can be used in
  model formulas, similar to `base::scale()`.
- `data_codebook()` now includes the proportion for each category/value, in
  addition to the counts. Furthermore, if data contains tagged `NA` values,
  these are included in the frequency table.
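Using a transformation inside a model formula might look like the following sketch. The model itself is an arbitrary example on `mtcars`:

```r
library(datawizard)

# standardize() is applied to the predictor directly in the formula,
# in the same way base::scale() could be used
m <- lm(mpg ~ standardize(hp), data = mtcars)

coef(m)
```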
BUG FIXES
- `center(x)` now works correctly when `x` is a single value and either
  `reference` or `center` is specified (#324).
- Fixed issue in `data_codebook()`, which failed for labelled vectors when
  values of labels were not in sorted order.
datawizard 0.6.4
NEW FUNCTIONS
- `data_codebook()`: to generate codebooks of data frames.
- New functions to deal with duplicates: `data_duplicated()` (keep all duplicates,
  including the first occurrence) and `data_unique()` (returns the data, excluding
  all duplicates except one instance of each, based on the selected method).
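The two duplicate helpers could be used roughly as follows. The data frame `df` is invented for illustration:

```r
library(datawizard)

# Toy data: ids 1 and 3 occur twice
df <- data.frame(
  id = c(1, 2, 3, 1, 3),
  score = c(10, 20, NA, 40, 50)
)

# All rows that belong to a duplicated id, including first occurrences
dups <- data_duplicated(df, select = "id")
dups

# One row per id; keep = "best" prefers the row with the fewest missing values
uniq <- data_unique(df, select = "id", keep = "best")
uniq
```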
MINOR CHANGES
- `.data.frame` methods should now preserve custom attributes.
- The `include_bounds` argument in `normalize()` can now also be a numeric
  value, defining the limit to the upper and lower bound (i.e. the distance
  to 1 and 0).
- `data_filter()` now works with grouped data.
BUG FIXES
- `data_read()` no longer prints a message for empty columns when the data
  actually had no empty columns.
- `data_to_wide()` now drops columns that are not in `id_cols` (if specified),
  `names_from`, or `values_from`. This is the behaviour observed in `tidyr::pivot_wider()`.