Releases: easystats/datawizard
datawizard 0.6.3
MAJOR CHANGES
- There is a new publication about the {datawizard} package:
  https://joss.theoj.org/papers/10.21105/joss.04684
- Fixes failing tests due to changes in R-devel.
- data_to_long() and data_to_wide() have had significant performance
  improvements, sometimes as high as a ten-fold speedup.
MINOR CHANGES
- When column names are misspelled, most functions now suggest which existing
  columns might have been meant.
- Miscellaneous performance gains.
- convert_to_na() now requires argument na to be of class 'Date' to convert
  specific dates to NA. For example, convert_to_na(x, na = "2022-10-17")
  must be changed to convert_to_na(x, na = as.Date("2022-10-17")).
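
A minimal sketch of the changed convert_to_na() call, assuming {datawizard} 0.6.3 or later is installed. The vector x and its dates are made up for illustration.

```r
library(datawizard)

# Hypothetical example vector of dates
x <- as.Date(c("2022-10-16", "2022-10-17", "2022-10-18"))

# The value to be replaced must itself be a Date, not a character string
out <- convert_to_na(x, na = as.Date("2022-10-17"))
print(out)
```

Passing the date as a plain character string is the older usage that this release no longer accepts for Date vectors.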
BUG FIXES
- data_to_long() and data_to_wide() now correctly keep the date format.
datawizard 0.6.2
BREAKING CHANGES
- Methods for grouped data frames (.grouped_df) no longer support
  dplyr::group_by() for {dplyr} before version 0.8.0.
- empty_columns() and remove_empty_columns() now also remove columns that
  contain only empty characters. Likewise, empty_rows() and
  remove_empty_rows() remove observations in which all values are missing or
  empty characters.
CHANGES
- data_arrange() now works with data frames that were grouped using
  data_group() (#274).
- data_read() gains a convert_factors argument, to turn off automatic
  conversion from numeric variables into factors.
datawizard 0.6.1
- Updates tests for upcoming changes in the {tidyselect} package (#267).
datawizard 0.6.0
BREAKING CHANGES
- The minimum needed R version has been bumped to 3.6.
- The following deprecated functions have been removed:
  data_cut(), data_recode(), data_shift(), data_reverse(), data_rescale(),
  data_to_factor(), data_to_numeric().
- New text_format() alias is introduced for format_text(), the latter of which
  will be removed in the next release.
- New recode_values() alias is introduced for change_code(), the latter of which
  will be removed in the next release.
- data_merge() now errors if columns specified in by are not in both datasets.
- Using negative values in arguments select and exclude now removes the columns
  from the selection/exclusion. The previous behavior was to start the
  selection/exclusion from the end of the dataset, which was inconsistent with
  the use of "-" with other selecting possibilities.
NEW FUNCTIONS
- data_peek(): to peek at values and type of variables in a data frame.
- coef_var(): to compute the coefficient of variation.
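
For orientation, the coefficient of variation is the standard deviation divided by the mean. The base-R sketch below computes that ratio by hand. cv_by_hand() is a hypothetical helper written for this note, not the package's implementation, and coef_var() may offer further options.

```r
# Hypothetical helper: standard deviation relative to the mean
cv_by_hand <- function(x) {
  stats::sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
}

# Made-up example values
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
cv_by_hand(x)
```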
CHANGES
- data_filter() will give more informative messages on malformed syntax of
  the filter argument.
- It is now possible to use curly brackets to pass variable names to
  data_filter(). See the examples section in the documentation of
  data_filter().
- The regex argument was added to functions that use select-helpers and did
  not already have this argument.
- Select helpers starts_with(), ends_with(), and contains() now accept
  several patterns, e.g. starts_with("Sep", "Petal").
- Arguments select and exclude that are present in most functions have been
  improved to work in loops and in custom functions. For example, the following
  code now works:

```r
foo <- function(data) {
  i <- "Sep"
  find_columns(data, select = starts_with(i))
}
foo(iris)

for (i in c("Sepal", "Sp")) {
  head(iris) |>
    find_columns(select = starts_with(i)) |>
    print()
}
```

- There is now a vignette summarizing the various ways to select or exclude
  variables in most {datawizard} functions.
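
The multi-pattern select helpers and the regex argument from the list above can be sketched as follows. This assumes an installed {datawizard} in which data_select() accepts the select and regex arguments. data_select() is used here only because it returns the selected columns directly.

```r
library(datawizard)

# Several patterns in one select helper:
# columns whose names start with "Sep" or with "Petal"
sel <- data_select(iris, select = starts_with("Sep", "Petal"))
colnames(sel)

# Treat the select string as a regular expression:
# columns whose names end with "Width"
widths <- data_select(iris, select = "Width$", regex = TRUE)
colnames(widths)
```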
datawizard 0.5.1
- Fixes tests for {poorman} update.
datawizard 0.5.0
MAJOR CHANGES
- The following statistical transformation functions have been renamed to drop
  the data_*() prefix, since they do not work exclusively with data frames, but
  are typically used first of all with vectors, and therefore had misleading
  names:
  - data_cut() -> categorize()
  - data_recode() -> change_code()
  - data_shift() -> slide()
  - data_reverse() -> reverse()
  - data_rescale() -> rescale()
  - data_to_factor() -> to_factor()
  - data_to_numeric() -> to_numeric()

  Note that these functions also have .data.frame() methods and still work
  for data frames as well. Former function names are still available as aliases,
  but will be deprecated and removed in a future release.
- Bumps the needed minimum R version to 3.5.
- Removed deprecated function data_findcols(). Please use its replacement,
  data_find().
- Removed alias extract() for data_extract() function since it collided with
  tidyr::extract().
- Argument training_proportion in data_partition() is deprecated. Please use
  proportion now.
- Given his continued and significant contributions to the package, Etienne
  Bacher (@etiennebacher) is now included as an author.
- unstandardise() now works for center(x).
- unnormalize() now works for change_scale(x).
- reshape_wider() now follows tidyr::pivot_wider() syntax more consistently.
  Arguments colnames_from, sep, and rows_from are deprecated and should be
  replaced by names_from, names_sep, and id_cols respectively.
  reshape_wider() also gains an argument names_glue (#182, #198).
- Similarly, reshape_longer() now follows tidyr::pivot_longer() syntax more
  consistently. Argument colnames_to is deprecated and
  should be replaced by names_to. reshape_longer() also gains new arguments:
  names_prefix, names_sep, names_pattern, and values_drop_na (#189).
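
The tidyr-style argument names can be sketched with a small round trip. This assumes an installed {datawizard} in which data_to_wide() and data_to_long() accept the arguments named in the list above. The data frame long and its columns are hypothetical.

```r
library(datawizard)

# Hypothetical long-format data: two measurements per person
long <- data.frame(
  id = c(1, 1, 2, 2),
  time = c("t1", "t2", "t1", "t2"),
  score = c(10, 12, 9, 11)
)

# Long to wide, using the new argument names
wide <- data_to_wide(
  long,
  id_cols = "id",
  names_from = "time",
  values_from = "score"
)
print(wide)

# Wide back to long
back <- data_to_long(
  wide,
  select = c("t1", "t2"),
  names_to = "time",
  values_to = "score"
)
print(back)
```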
CHANGES
- Some of the text formatting helpers (like text_concatenate()) gain an
  enclose argument, to wrap text elements with surrounding characters.
- winsorize() now accepts "raw" and "zscore" methods (in addition to
  "percentile"). Additionally, when robust is set to TRUE together with
  method = "zscore", it winsorizes via the median and median absolute deviation
  (MAD); otherwise via the mean and standard deviation (@rempsyc, #177, #49, #47).
- data_partition() now allows creating multiple partitions from the data,
  returning multiple training sets and a remaining test set.
- Functions like center(), normalize() or standardize() no longer fail
  when data contains infinite values (Inf).
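
To illustrate the idea behind the "zscore" method and its robust variant, here is a base-R sketch. winsorize_z() is a hypothetical helper written for this note. It assumes that values beyond the threshold are pulled back to the boundary, and it is not the package's implementation.

```r
# Hypothetical helper: clamp values lying more than `threshold`
# spread units away from the center.
# robust = TRUE uses median and MAD, otherwise mean and SD.
winsorize_z <- function(x, threshold = 2, robust = FALSE) {
  center <- if (robust) stats::median(x) else mean(x)
  spread <- if (robust) stats::mad(x) else stats::sd(x)
  lower <- center - threshold * spread
  upper <- center + threshold * spread
  pmin(pmax(x, lower), upper)
}

# Made-up data with one extreme value
x <- c(1, 2, 3, 4, 5, 100)
winsorize_z(x, threshold = 2, robust = TRUE)
```

With a single large outlier, the median and MAD are barely affected, so the robust variant caps the outlier much more tightly than the mean-and-SD variant would.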
NEW FUNCTIONS
- row_to_colnames() and colnames_to_row() to move a row to column names, and
  column names to row (@etiennebacher, #169).
BUG FIXES
- Fixed wrong column names in data_to_wide() (#173).
datawizard 0.4.1
BREAKING CHANGES
- Added the standardize.default() method (moved from package effectsize),
  to be consistent in that the default-method now is in the same package as the
  generic. standardize.default() behaves exactly like in effectsize and
  particularly works for regression model objects. effectsize now re-exports
  standardize() from datawizard.
NEW FUNCTIONS
- data_shift() to shift the value range of numeric variables.
- data_recode() to recode old into new values.
- data_to_factor() as counterpart to data_to_numeric().
- data_tabulate() to create frequency tables of variables.
- data_read() to read (import) data files (from text, or foreign statistical
  packages).
- unnormalize() as counterpart to normalize(). This function only works for
  variables that have been normalized with normalize().
- data_group() and data_ungroup() to create grouped data frames, or to remove
  the grouping information from grouped data frames.
CHANGES
- data_find() was added as alias to find_columns(), to have consistent
  name patterns for the datawizard functions. data_findcols() will be
  removed in a future update and usage is discouraged.
- The select argument (and thus, also the exclude argument) now also
  accepts functions testing for logical conditions, e.g. is.numeric() (or
  is.numeric), or any user-defined function that selects the variables for
  which the function returns TRUE (like: foo <- function(x) mean(x) > 3).
- Arguments select and exclude now allow the negation of select-helpers,
  like -ends_with(""), -is.numeric or -Sepal.Width:Petal.Length.
- Many functions now get a .default method, to capture unsupported classes.
  This now yields a message and returns the original input, and hence, the
  .data.frame methods won't stop due to an error.
- The filter argument in data_filter() can also be a numeric vector, to
  indicate row indices of those rows that should be returned.
- convert_to_na() gets methods for variables of class logical and Date.
- convert_to_na() for factors (and data frames) gains a drop_levels argument,
  to drop unused levels that have been replaced by NA.
- data_to_numeric() gains two more arguments, preserve_levels and lowest,
  to give better control of conversion of factors.
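
The function-based and negated selections described in the list above can be sketched as follows. This assumes an installed {datawizard} that provides data_select(). The choice of iris columns is only for illustration.

```r
library(datawizard)

# A function as selector: keep columns for which is.numeric() returns TRUE
num <- data_select(iris, select = is.numeric)
colnames(num)

# A negated select helper: drop columns whose names end with "Width"
no_width <- data_select(iris, select = -ends_with("Width"))
colnames(no_width)
```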
BUG FIXES
- When logicals were passed to center() or standardize() and force = TRUE,
  these were not properly converted to numeric variables.
datawizard 0.4.0
MAJOR CHANGES
- data_match() now returns filtered data by default. Old behavior (returning
  row indices) can be restored by setting return_indices = TRUE.
- The following functions are now re-exported from the {insight} package:
  object_has_names(), object_has_rownames(), is_empty_object(),
  compact_list(), compact_character().
- data_findcols() will become deprecated in future updates. Please use the
  new replacements find_columns() and get_columns().
- The vignette Analysing Longitudinal or Panel Data has now moved to the
  parameters package.
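
The changed default of data_match() can be sketched like this, assuming an installed {datawizard} with the return_indices argument described above. The matching values for vs and am are arbitrary.

```r
library(datawizard)

# New default: returns the rows of mtcars where vs == 0 and am == 1
matched <- data_match(mtcars, data.frame(vs = 0, am = 1))
nrow(matched)

# Former behavior: return the indices of the matching rows instead
idx <- data_match(mtcars, data.frame(vs = 0, am = 1), return_indices = TRUE)
idx
```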
NEW FUNCTIONS
- To convert rownames to a column, and vice versa: rownames_as_column()
  and column_as_rownames() (@etiennebacher, #80).
- find_columns() and get_columns() to find column names or retrieve
  subsets of data frames, based on various select-methods (including
  select-helpers). These functions will supersede data_findcols() in the
  future.
- data_filter() as complement for data_match(), which works with logical
  expressions for filtering rows of data frames.
- For computing weighted centrality measures and dispersion: weighted_mean(),
  weighted_median(), weighted_sd() and weighted_mad().
- To replace NA in vectors and data frames: convert_na_to() (@etiennebacher, #111).
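
A short sketch of the weighted mean, assuming an installed {datawizard} in which weighted_mean() takes a weights argument. The values and weights are made up, and the second expression computes the same quantity by hand for comparison.

```r
library(datawizard)

x <- c(1, 2, 3, 4)
w <- c(1, 1, 1, 5) # hypothetical weights

wm <- weighted_mean(x, weights = w)
wm

# The same quantity computed by hand
sum(x * w) / sum(w)
```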
MINOR CHANGES
- The select argument in several functions (like data_remove(),
  reshape_longer(), or data_extract()) now allows the use of select-helpers
  for selecting variables based on specific patterns.
- data_extract() gains new arguments to allow type-safe return values,
  i.e. always return a vector or a data frame. Thus, data_extract()
  can now be used to select multiple variables or pull a single variable
  from data frames.
- data_match() gains a match argument, to indicate with which logical
  operation matching results should be combined.
- Improved support for labelled data for many functions, i.e. returned
  data frame will preserve value and variable label attributes, where
  possible and applicable.
- describe_distribution() now works with lists (@etiennebacher, #105).
- data_rename() doesn't use pattern anymore to rename the columns if
  replacement is not provided (@etiennebacher, #103).
- data_rename() now adds a suffix to duplicated names in replacement
  (@etiennebacher, #103).
BUG FIXES
- data_to_numeric() produced wrong results for factors when
  dummy_factors = TRUE and factor contained missing values.
- data_match() produced wrong results when data contained missing values.
- Fixed CRAN check issues in data_extract() when more than one variable
  was extracted from a data frame.
datawizard 0.3.0
- New functions:
  - To find or remove empty rows and columns in a data frame: empty_rows(),
    empty_columns(), remove_empty_rows(), remove_empty_columns(), and
    remove_empty().
  - To check for names: object_has_names() and object_has_rownames().
  - To rotate data frames: data_rotate().
  - To reverse score variables: data_reverse().
  - To merge/join multiple data frames: data_merge() (or its alias
    data_join()).
  - To cut (recode) data into groups: data_cut().
  - To replace specific values with NAs: convert_to_na().
  - To replace Inf and NaN values with NAs: replace_nan_inf().
- Arguments cols, before and after in data_relocate() can now also be
  numeric values, indicating the position of the destination column.
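
A minimal sketch of a numeric destination in data_relocate(), assuming an installed {datawizard}. The column to move is passed by position in the call because the name of that argument may differ between releases.

```r
library(datawizard)

# Move Species to the front by giving the destination as a column position
out <- data_relocate(iris, "Species", before = 1)
colnames(out)
```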
datawizard 0.2.3
- New functions:
  - to work with lists: is_empty_object() and compact_list()
  - to work with strings: compact_character()