
Commit c8fc5af

Merge pull request #577 from SebKrantz/development
Development
2 parents: 827b84f + 4530dcd


49 files changed: +10468 -679 lines changed

.Rbuildignore

+5
@@ -26,3 +26,8 @@ man/figures
 _cache$
 _snaps
 ^CITATION\.cff$
+^\.DS_Store$
+^revdep$
+\.orig$
+
+

DESCRIPTION

+2 -2
@@ -1,7 +1,7 @@
 Package: collapse
 Title: Advanced and Fast Data Transformation
 Version: 2.0.14
-Date: 2024-05-01
+Date: 2024-05-19
 Authors@R: c(
 person("Sebastian", "Krantz", role = c("aut", "cre"),
 email = "[email protected]",
@@ -28,7 +28,7 @@ Description: A C/C++ based package for advanced data transformation and
 (grouped, weighted) summary statistics, powerful tools to work with nested data,
 fast data object conversions, functions for memory efficient R programming, and
 helpers to effectively deal with variable labels, attributes, and missing data.
-It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf',
+It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units',
 'plm' (panel-series and data frames), and 'xts'/'zoo'.
 URL: https://sebkrantz.github.io/collapse/,
 https://github.com/SebKrantz/collapse,

NAMESPACE

+1
@@ -405,6 +405,7 @@ importFrom("stats", "as.formula", "complete.cases", "cor", "cov", "var", "pt",
 export(fncol)
 export(fdim)
 export(as_numeric_factor)
+export(as_integer_factor)
 export(as_character_factor)
 export(as.numeric_factor)
 export(as.character_factor)

NEWS.md

+6
@@ -1,5 +1,9 @@
 # collapse 2.0.14
 
+* Updated '*collapse* and *sf*' vignette to reflect the recent support for *units* objects, and added a few more examples.
+
+* Fixed a bug in `join()` where a full join silently became a left join if there are no matches between the tables (#574). Thanks @D3SL for reporting.
+
 * Added function `group_by_vars()`: A standard evaluation version of `fgroup_by()` that is slimmer and safer for programming, e.g. `data |> group_by_vars(ind1) |> collapg(custom = list(fmean = ind2, fsum = ind3))`. Or, using *magrittr*:
 ```r
 library(magrittr)
@@ -15,6 +19,8 @@ data %>%
 }
 ```
 
+* Added function `as_integer_factor()` to turn factors/factor columns into integer vectors. `as_numeric_factor()` already exists, but is memory inefficient for most factors where levels can be integers.
+
 * `join()` now internally checks if the rows of the joined datasets match exactly. This check, using `identical(m, seq_row(y))`, is inexpensive, but, if `TRUE`, saves a full subset and deep copy of `y`. Thus `join()` now inherits the intelligence already present in functions like `fsubset()`, `roworder()` and `funique()` - a key for efficient data manipulation is simply doing less.
 
 * In `join()`, if `attr = TRUE`, the `count` option to `fmatch()` is always invoked, so that the attribute attached always has the same form, regardless of `verbose` or `validate` settings.
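
Review note: the row-match shortcut described in the `join()` entry above can be illustrated with a small base R sketch. This is a simplified assumption of the idea, not the package's implementation: `match()` and `seq_len(nrow(y))` stand in for `fmatch()` and `seq_row(y)`, and the data frames `x` and `y` are hypothetical.

```r
# Hypothetical tables whose rows already line up on the key `id`
x <- data.frame(id = 1:3, a = c("u", "v", "w"))
y <- data.frame(id = 1:3, b = c(1.5, 2.5, 3.5))

# For each row of x, the matching row of y (stand-in for fmatch())
m <- match(x$id, y$id)

if (identical(m, seq_len(nrow(y)))) {
  # Rows match exactly: y can be used as is, no subset or copy needed
  y_aligned <- y
} else {
  # General case: reorder/subset y to align with x
  y_aligned <- y[m, , drop = FALSE]
}

res <- cbind(x, y_aligned["b"])
```

The `identical()` comparison is a single pass over an integer vector, which is cheap relative to subsetting every column of `y`.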

R/global_macros.R

+2 -2
@@ -105,7 +105,7 @@ get_collapse <- function(opts = NULL) if(is.null(opts)) as.list(.op) else if(len
 "%r-%", "%r*%", "%r/%", "%r+%", "%rr%", "add_stub", "add_vars",
 "add_vars<-", "all_funs", "all_identical", "all_obj_equal", "allNA",
 "alloc", "allv", "any_duplicated", "anyv", "as_character_factor",
-"as_factor_GRP", "as_factor_qG", "as_numeric_factor", "as.character_factor",
+"as_factor_GRP", "as_factor_qG", "as_numeric_factor", "as_integer_factor", "as.character_factor",
 "as.factor_GRP", "as.factor_qG", "as.numeric_factor", "atomic_elem",
 "atomic_elem<-", "av", "av<-", "B", "BY", "BY.data.frame", "BY.default",
 "BY.matrix", "cat_vars", "cat_vars<-", "char_vars", "char_vars<-",
@@ -177,7 +177,7 @@ get_collapse <- function(opts = NULL) if(is.null(opts)) as.list(.op) else if(len
 .COLLAPSE_ALL <- sort(unique(c("%-=%", "%!=%", "%!iin%", "%!in%", "%*=%", "%/=%", "%+=%", "%=%", "%==%", "%c-%", "%c*%", "%c/%", "%c+%",
 "%cr%", "%iin%", "%r-%", "%r*%", "%r/%", "%r+%", "%rr%", "add_stub", "add_vars", "add_vars<-", "all_funs",
 "all_identical", "all_obj_equal", "allNA", "alloc", "allv", "any_duplicated", "anyv", "as_character_factor",
-"as_factor_GRP", "as_factor_qG", "as_numeric_factor", "atomic_elem", "atomic_elem<-", "av", "av<-", "B", "BY",
+"as_factor_GRP", "as_factor_qG", "as_numeric_factor", "as_integer_factor", "atomic_elem", "atomic_elem<-", "av", "av<-", "B", "BY",
 "cat_vars", "cat_vars<-", "char_vars", "char_vars<-", "cinv", "ckmatch", "collap", "collapg", "collapv", "colorder",
 "colorderv", "copyAttrib", "copyMostAttrib", "copyv", "D", "dapply", "date_vars", "Date_vars", "date_vars<-",
 "Date_vars<-", "descr", "Dlog", "fact_vars", "fact_vars<-", "fbetween", "fcompute", "fcomputev", "fcount",

R/small_helper.R

+10
@@ -501,6 +501,16 @@ as_numeric_factor <- function(X, keep.attr = TRUE) {
   res
 }
 
+as_integer_factor <- function(X, keep.attr = TRUE) {
+  if(is.atomic(X)) if(keep.attr) return(ffka(X, as.integer)) else
+    return(as.integer(attr(X, "levels"))[X])
+  res <- duplAttributes(lapply(unattrib(X),
+    if(keep.attr) (function(y) if(is.factor(y)) ffka(y, as.integer) else y) else
+    (function(y) if(is.factor(y)) as.integer(attr(y, "levels"))[y] else y)), X)
+  if(inherits(X, "data.table")) return(alc(res))
+  res
+}
+
 as_character_factor <- function(X, keep.attr = TRUE) {
   if(is.atomic(X)) if(keep.attr) return(ffka(X, tochar)) else
     return(as.character.factor(X))
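
Review note: the `keep.attr = FALSE` branch of `as_integer_factor()` above relies on a base R idiom: convert the levels once, then expand them by indexing with the factor itself, whose integer codes act as the index. A minimal base R sketch of that idiom follows; no collapse functions are called, and the example factor is made up for illustration.

```r
# Hypothetical factor whose levels are integer-like strings: "10", "20", "30"
f <- factor(c(10, 20, 10, 30))

# Pitfall: as.integer() on a factor returns the internal codes, not the values
codes <- as.integer(f)

# Idiom: convert the levels once, then index with the factor
as_int <- as.integer(attr(f, "levels"))[f]
as_dbl <- as.numeric(attr(f, "levels"))[f]

typeof(as_int)  # "integer"
typeof(as_dbl)  # "double"
```

Only the levels are parsed from character, so the conversion cost depends on the number of distinct levels rather than the vector length. On typical R builds an integer element takes 4 bytes against 8 for a double, which is the memory saving the NEWS entry alludes to.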

README.md

+1 -1
@@ -21,7 +21,7 @@
 * To facilitate complex data transformation, exploration and computing tasks in R.
 * To help make R code fast, flexible, parsimonious and programmer friendly.
 
-It further implements a [class-agnostic approach to R programming](https://sebkrantz.github.io/collapse/articles/collapse_object_handling.html), supporting base R, *tibble*, *grouped_df* (*tidyverse*), *data.table*, *sf*, *pseries*, *pdata.frame* (*plm*), and preserving many others (e.g. *units*, *xts*/*zoo*, *tsibble*).
+It further implements a [class-agnostic approach to R programming](https://sebkrantz.github.io/collapse/articles/collapse_object_handling.html), supporting base R, *tibble*, *grouped_df* (*tidyverse*), *data.table*, *sf*, *units*, *pseries*, *pdata.frame* (*plm*), and *xts*/*zoo*.
 
 **Key Features:**
 

_pkgdown.yml

+1 -1
@@ -210,6 +210,7 @@ articles:
 contents:
 - collapse_documentation
 - collapse_for_tidyverse_users
+- collapse_and_sf
 - collapse_object_handling
 - title: Legacy (Pre v1.7)
 desc: Vignettes that cover functionality of versions <1.7. These
@@ -219,5 +220,4 @@ articles:
 - collapse_and_dplyr
 - collapse_and_data.table
 - collapse_and_plm
-- collapse_and_sf
 

man/collapse-documentation.Rd

+1 -1
@@ -26,7 +26,7 @@ The following table fully summarizes the contents of \emph{\link{collapse}}. The
 \link[=fast-data-manipulation]{Fast Data Manipulation} \tab\tab Fast and flexible select, subset, summarise, mutate/transform, sort/reorder, combine, join, reshape, rename and relabel data. Some functions modify by reference and/or allow assignment. In addition a set of (standard evaluation) functions for fast selecting, replacing or adding data frame columns, including shortcuts to select and replace variables by data type.
 \tab\tab \code{\link[=fselect]{fselect(<-)}}, \code{\link[=fsubset]{fsubset/ss}}, \code{\link{fsummarise}}, \code{\link{fmutate}}, \code{\link{across}}, \code{\link[=ftransform]{(f/set)transform(v)(<-)}}, \code{\link[=fcompute]{fcompute(v)}}, \code{\link[=roworder]{roworder(v)}}, \code{\link[=colorder]{colorder(v)}}, \code{\link{rowbind}}, \code{\link{join}}, \code{\link{pivot}}, \code{\link[=frename]{(f/set)rename}}, \code{\link[=relabel]{(set)relabel}}, \code{\link[=get_vars]{get_vars(<-)}}, \code{\link[=add_vars]{add_vars(<-)}}, \code{\link[=num_vars]{num_vars(<-)}}, \code{\link[=cat_vars]{cat_vars(<-)}}, \code{\link[=char_vars]{char_vars(<-)}}, \code{\link[=fact_vars]{fact_vars(<-)}}, \code{\link[=logi_vars]{logi_vars(<-)}}, \code{\link[=date_vars]{date_vars(<-)}} \cr \cr \cr
 
-\link[=quick-conversion]{Quick Data Conversion} \tab\tab Quick conversions: data.frame <> data.table <> tibble <> matrix (row- or column-wise) <> list | array > matrix, data.frame, data.table, tibble | vector > factor, matrix, data.frame, data.table, tibble; and converting factors / all factor columns. \tab\tab \code{qDF}, \code{qDT}, \code{qTBL}, \code{qM}, \code{qF}, \code{mrtl}, \code{mctl}, \code{as_numeric_factor}, \code{as_character_factor} \cr \cr \cr
+\link[=quick-conversion]{Quick Data Conversion} \tab\tab Quick conversions: data.frame <> data.table <> tibble <> matrix (row- or column-wise) <> list | array > matrix, data.frame, data.table, tibble | vector > factor, matrix, data.frame, data.table, tibble; and converting factors / all factor columns. \tab\tab \code{qDF}, \code{qDT}, \code{qTBL}, \code{qM}, \code{qF}, \code{mrtl}, \code{mctl}, \code{as_numeric_factor}, \code{as_integer_factor}, \code{as_character_factor} \cr \cr \cr
 
 \link[=advanced-aggregation]{Advanced Data Aggregation} \tab\tab Fast and easy (weighted and parallelized) aggregation of multi-type data, with different functions applied to numeric and categorical variables. Custom specifications allow mappings of functions to variables + renaming. \tab\tab \code{collap(v/g)} \cr \cr \cr
 
man/collapse-package.Rd

+1 -1
@@ -12,7 +12,7 @@ Advanced and Fast Data Transformation
 \item To help make R code fast, flexible, parsimonious and programmer friendly. % \emph{collapse} is a fast %to facilitate (advanced) data manipulation in R % To achieve the latter,
 % collapse provides a broad set.. -> Nah, its not a misc package
 }
-It is made compatible with the \emph{tidyverse}, \emph{data.table}, \emph{sf} and the \emph{plm} approach to panel data, and non-destructively handles other classes such as \emph{xts}.
+It is made compatible with the \emph{tidyverse}, \emph{data.table}, \emph{sf}, \emph{units}, \emph{xts/zoo}, and the \emph{plm} approach to panel data.
 
 }
 \section{Getting Started}{

man/quick-conversion.Rd

+6 -3
@@ -8,6 +8,7 @@
 \alias{mctl}
 \alias{mrtl}
 \alias{as_numeric_factor}
+\alias{as_integer_factor}
 \alias{as_character_factor}
 %- Also NEED an '\alias' for EACH other topic documented here.
 \title{Quick Data Conversion}
@@ -18,7 +19,7 @@ Fast, flexible and precise conversion of common data objects, without method dis
 \item \code{qM} converts vectors, higher-dimensional arrays, data frames and suitable lists to matrix.
 \item \code{mctl} and \code{mrtl} column- or row-wise convert a matrix to list, data frame or \emph{data.table}. They are used internally by \code{qDF/qDT/qTBL}, \code{\link{dapply}}, \code{\link{BY}}, etc\dots
 \item \code{\link{qF}} converts atomic vectors to factor (documented on a separate page).
-\item \code{as_numeric_factor} and \code{as_character_factor} convert factors, or all factor columns in a data frame / list, to character or numeric (by converting the levels).
+\item \code{as_numeric_factor}, \code{as_integer_factor}, and \code{as_character_factor} convert factors, or all factor columns in a data frame / list, to character or numeric (by converting the levels).
 }
 }
 \usage{
@@ -37,12 +38,13 @@ mrtl(X, names = FALSE, return = "list")
 # Converting factors or factor columns
 
 as_numeric_factor(X, keep.attr = TRUE)
+as_integer_factor(X, keep.attr = TRUE)
 as_character_factor(X, keep.attr = TRUE)
 
 }
 %- maybe also 'usage' for other objects documented here.
 \arguments{
-\item{X}{a vector, factor, matrix, higher-dimensional array, data frame or list. \code{mctl} and \code{mrtl} only accept matrices, \code{as_numeric_factor} and \code{as_character_factor} only accept factors, data frames or lists.}
+\item{X}{a vector, factor, matrix, higher-dimensional array, data frame or list. \code{mctl} and \code{mrtl} only accept matrices, \code{as_numeric_factor}, \code{as_integer_factor} and \code{as_character_factor} only accept factors, data frames or lists.}
 \item{row.names.col}{can be used to add an column saving names or row.names when converting objects to data frame using \code{qDF/qDT/qTBL}. \code{TRUE} will add a column \code{"row.names"}, or you can supply a name e.g. \code{row.names.col = "variable"}. With \code{qM}, the argument has the opposite meaning, and can be used to select one or more columns in a data frame/list which will be used to create the rownames of the matrix e.g. \code{qM(iris, row.names.col = "Species")}. In this case the column(s) can be specified using names, indices, a logical vector or a selector function. See Examples.}
 \item{keep.attr}{logical. \code{FALSE} (default) yields a \emph{hard} / \emph{thorough} object conversion: All unnecessary attributes are removed from the object yielding a plain matrix / data.frame / \emph{data.table}. \code{FALSE} yields a \emph{soft} / \emph{minimal} object conversion: Only the attributes 'names', 'row.names', 'dim', 'dimnames' and 'levels' are modified in the conversion. Other attributes are preserved. See also \code{class}.}
 \item{class}{if a vector of classes is passed here, the converted object will be assigned these classes. If \code{NULL} is passed, the default classes are assigned: \code{qM} assigns no class, \code{qDF} a class \code{"data.frame"}, and \code{qDT} a class \code{c("data.table", "data.frame")}. If \code{keep.attr = TRUE} and \code{class = NULL} and the object already inherits the default classes, further inherited classes are preserved. See Details and the Example. }
@@ -77,7 +79,8 @@ The default \code{keep.attr = FALSE} ensures \emph{hard} conversions so that all
 \code{qM} - returns a matrix\cr
 \code{mctl}, \code{mrtl} - return a list, data frame or \emph{data.table} \cr
 \code{qF} - returns a factor\cr
-\code{as_numeric_factor} - returns X with factors converted to numeric variables\cr
+\code{as_numeric_factor} - returns X with factors converted to numeric (double) variables\cr
+\code{as_integer_factor} - returns X with factors converted to integer variables\cr
 \code{as_character_factor} - returns X with factors converted to character variables
 }
 % \note{
