NEWS.md
+7-4
@@ -8,11 +8,12 @@
8
8
9
9
*`num_vars()` (and thus also `cat_vars()` and `collap()`) were changed to a simpler C-definition of numeric data types which is more in-line with `is.numeric()`: `is_numeric_C <- function(x) typeof(x) %in% c("integer", "double") && !inherits(x, c("factor", "Date", "POSIXct", "yearmon", "yearqtr"))`. The previous definition was: `is_numeric_C_old <- function(x) typeof(x) %in% c("integer", "double") && (!is.object(x) || inherits(x, c("ts", "units", "integer64")))`. Thus, the definition changed from including only certain classes to excluding the most important classes. Thanks @maouw for flagging this (#727).
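To make the change concrete, the two definitions quoted above can be run side by side in base R. This is only an illustrative sketch: the test objects, including the class name `myclass`, are assumptions chosen for the example and do not come from the changelog.

```r
# New and old definitions, copied from the changelog entry above
is_numeric_C <- function(x) typeof(x) %in% c("integer", "double") &&
  !inherits(x, c("factor", "Date", "POSIXct", "yearmon", "yearqtr"))
is_numeric_C_old <- function(x) typeof(x) %in% c("integer", "double") &&
  (!is.object(x) || inherits(x, c("ts", "units", "integer64")))

# Illustrative inputs (assumed for this example)
examples <- list(
  plain    = c(1.5, 2.5),
  factor   = factor(c("a", "b")),
  date     = as.Date("2025-01-01"),
  ts       = ts(1:4),
  difftime = as.difftime(c(1, 2), units = "hours"),
  custom   = structure(1:3, class = "myclass")  # hypothetical class
)

res <- data.frame(
  old = sapply(examples, is_numeric_C_old),
  new = sapply(examples, is_numeric_C)
)
print(res)
```

Under these definitions, only classed objects outside the exclusion list change status: the `difftime` and `myclass` vectors are rejected by the old rule and accepted by the new one, while factors and dates are rejected by both.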
10
10
11
-
* New improved quantile algorithm in `fquantile()` and `fnth()` (see below) does not support zero weights anymore, i.e. the code runs through, but elements with zero weights are no longer ignored by the algorithm. Thus is because the new algorithm makes it difficult to skip zero weight elements 'on the fly'.
12
-
13
11
### Bug Fixes
14
12
15
-
* Fixed some issues using *collapse* and the *tidyverse* together, particularly regarding tidyverse methods for 'grouped_df'.
13
+
* Fixed some issues using *collapse* and the *tidyverse* together, particularly regarding tidyverse methods for 'grouped_df' - thanks @NicChr (#645).
14
+
15
+
* More consistent handling of zero-length inputs - they are now also returned in `fmean()` and `fmedian()`/`fnth()` instead of returning `NA` (#628).
* The weighted quantile algorithm in `fquantile()` was changed and now uses a more theoretically sound method following [excellent notes](https://htmlpreview.github.io/?https://github.com/mjskay/uncertainty-examples/blob/master/weighted-quantiles.html) by [Matthew Kay](https://github.com/mjskay). It now also supports quantile type 4, but it does not support zero weights anymore (see above). *Note* that the existing *collapse* algorithm [already had very goood](https://github.com/mjskay/uncertainty-examples/issues/2) properties after a bug fix in v2.0.17, but the new algorithm is more theoretically sound and also faster.
41
+
* The weighted quantile algorithm in `fquantile()`/`fnth()` was improved to a more theoretically sound method following [excellent notes](https://htmlpreview.github.io/?https://github.com/mjskay/uncertainty-examples/blob/master/weighted-quantiles.html) by [Matthew Kay](https://github.com/mjskay). It now also supports quantile type 4, but it does not skip zero weights anymore, as the new algorithm makes it difficult to skip them 'on the fly'. *Note* that the existing *collapse* algorithm [already had very good](https://github.com/mjskay/uncertainty-examples/issues/2) properties after a bug fix in v2.0.17, but the new algorithm is more exact and also faster.
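Since zero-weight elements are no longer skipped, code that relied on the old behavior may want to drop them before computing quantiles. The following is one possible workaround, not something the changelog prescribes; `drop_zero_weights()` is a hypothetical helper, and dropping missing weights as well is an additional assumption.

```r
# Hypothetical helper (not part of collapse): keep only strictly positive weights.
# Assumption: elements with missing weights are dropped as well.
drop_zero_weights <- function(x, w) {
  stopifnot(length(x) == length(w))
  keep <- !is.na(w) & w > 0
  list(x = x[keep], w = w[keep])
}

x <- c(10, 20, 30, 40)
w <- c(1, 0, 2, 1)
d <- drop_zero_weights(x, w)
d$x  # 10 30 40
d$w  # 1 2 1

# With collapse attached, the filtered vectors could then be passed on,
# e.g. fquantile(d$x, w = d$w)
```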
42
+
43
+
* The *collapse* [**arXiv article**](https://arxiv.org/abs/2403.05038) has been updated and significantly enhanced. It is an excellent resource to get an overview of the package.
@@ -80,7 +80,7 @@ In addition there are several [vignettes](<https://sebkrantz.github.io/collapse/
80
80
81
81
### Article on arXiv
82
82
83
-
An [**article**](https://arxiv.org/abs/2403.05038) on *collapse* has been submitted to the [Journal of Statistical Software](https://www.jstatsoft.org/) in March 2024.
83
+
An [**article**](https://arxiv.org/abs/2403.05038) on *collapse* was submitted to the [Journal of Statistical Software](https://www.jstatsoft.org/) in March 2024 and updated/revised in February 2025.
84
84
85
85
### Presentation at [useR 2022](https://user2022.r-project.org)
@@ -104,12 +104,12 @@ For v1.9.0 \code{fnth} was completely rewritten in C and offers significantly en
104
104
If \code{n > 1}, the result is equivalent to (column-wise) \code{sort(x, partial = n)[n]}. Internally, \code{n} is converted to a probability using \code{p = (n-1)/(NROW(x)-1)}, and that probability is applied to the set of non-missing elements to find the \code{as.integer(p*(fnobs(x)-1))+1L}'th element (which corresponds to option \code{ties = "min"}). % Note that it is necessary to subtract and add 1 so that \code{n = 1} corresponds to \code{p = 0} and \code{n = NROW(x)} to \code{p = 1}. %So if \code{n > 1} is used in the presence of missing values, and the default \code{ties = "mean"} is enabled, the resulting element could be the average of two elements.
105
105
When using grouped computations with \code{n > 1}, \code{n} is transformed to a probability \code{p = (n-1)/(NROW(x)/ng-1)} (where \code{ng} contains the number of unique groups in \code{g}).
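The ungrouped conversion of n to a probability described above can be traced in base R. This is a minimal sketch of the stated formulas only, not the C implementation; nth_index() is a hypothetical name, and sum(!is.na(x)) stands in for fnobs(x).

```r
# Hypothetical helper: position of the n'th element among the sorted
# non-missing values, following the formulas in the text.
nth_index <- function(x, n) {
  p <- (n - 1) / (length(x) - 1)    # n expressed as a probability over all elements
  nobs <- sum(!is.na(x))            # stand-in for fnobs(x)
  as.integer(p * (nobs - 1)) + 1L   # truncation corresponds to ties = "min"
}

x <- c(5, NA, 1, 9, 3)
k <- nth_index(x, 3)  # p = 0.5, nobs = 4, so as.integer(1.5) + 1L = 2
sort(x)[k]            # sort() drops the NA; the 2nd smallest value is 3
```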
106
106
107
-
If weights are used and \code{ties = "q5"-"q9"}, weighted continuous quantile estimation is done as described in \code{\link{fquantile}}.
107
+
If weights are used and \code{ties = "q4"-"q9"}, weighted continuous quantile estimation is done as described in \code{\link{fquantile}}.
108
108
109
109
For \code{ties \%in\% c("mean", "min", "max")}, a target partial sum of weights \code{p*sum(w)} is calculated, and the weighted n'th element is the element k such that all elements smaller than k have a sum of weights \code{<= p*sum(w)}, and all elements larger than k have a sum of weights \code{<= (1-p)*sum(w)}. If the partial-sum of weights (\code{p*sum(w)}) is reached exactly for some element k, then (summing from the lower end) both k and k+1 would qualify as the weighted n'th element. If the weight of element k+1 is zero, k, k+1 and k+2 would qualify... . If \code{n > 1}, k is chosen (consistent with the unweighted behavior). %(ensuring that \code{fnth(x, n)}) and \code{fnth(x, n, w = rep(1, NROW(x)))}, always provide the same outcome)
110
110
If \code{0 < n < 1}, the \code{ties} option regulates how to resolve such conflicts, yielding lower (\code{ties = "min"}: k), upper (\code{ties = "max"}: k+2) or average weighted (\code{ties = "mean"}: mean(k, k+1, k+2)) n'th elements.
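The qualifying condition for the weighted n'th element can be checked directly in base R. The sketch below only restates the two inequalities from the text and is not the package's C code; qualifying_elements() is a hypothetical name, and distinct values in x are assumed.

```r
# Hypothetical helper: return every element that satisfies both inequalities.
qualifying_elements <- function(x, w, p) {
  o <- order(x)
  xs <- x[o]; ws <- w[o]
  total <- sum(ws)
  below <- cumsum(ws) - ws      # sum of weights of all smaller elements
  above <- total - cumsum(ws)   # sum of weights of all larger elements
  xs[below <= p * total & above <= (1 - p) * total]
}

# Target sum reached exactly at k = 2: both k and k+1 qualify
a <- qualifying_elements(c(4, 1, 3, 2), w = c(1, 1, 1, 1), p = 0.5)  # 2 3

# Element k+1 has zero weight: k, k+1 and k+2 qualify
b <- qualifying_elements(c(1, 2, 3, 4, 5), w = c(1, 1, 0, 1, 1), p = 0.5)  # 2 3 4
```

With ties = "min", "max" or "mean", the result would then be the first, the last or the average of the returned elements.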
\code{frange} is considerably more efficient than \code{\link{range}}, requiring only one pass through the data instead of two. For probabilities 0 and 1, \code{fquantile} internally calls \code{frange}.
43
43
44
-
Following \href{https://doi.org/10.2307/2684934}{Hyndman and Fan (1996)}, the quantile type-\eqn{i} quantile function of the sample \eqn{X} can be written as a weighted average of two order statistics:
44
+
Following \href{https://www.tandfonline.com/doi/abs/10.1080/00031305.1996.10473566}{Hyndman and Fan (1996)}, the quantile type-\eqn{i} quantile function of the sample \eqn{X} can be written as a weighted average of two order statistics:
@@ -57,7 +57,7 @@ We can then first find the largest value \eqn{l} such that the cumulative normal
57
57
For a more detailed exposition \href{https://htmlpreview.github.io/?https://github.com/mjskay/uncertainty-examples/blob/master/weighted-quantiles.html}{see these excellent notes} by Matthew Kay. See also the R implementation of weighted quantiles type 7 in the Examples below.
58
58
}
59
59
\note{
60
-
The new weighted quantile algorithm from v2.1.0 does not skip zero weights anymore as this is technically very difficult (it is not clear if \eqn{j} hits a zero weight element whether one should move forward or backward to find an alternative). Thus, all non-missing elements are considered and weights should be strictily positive.
60
+
The new weighted quantile algorithm from v2.1.0 does not skip zero weights anymore as this is technically very difficult (it is not clear if \eqn{j} hits a zero weight element whether one should move forward or backward to find an alternative). Thus, all non-missing elements are considered and weights should be strictly positive.
61
61
}
62
62
\value{
63
63
A vector of quantiles. If \code{names = TRUE}, \code{fquantile} generates names as \code{paste0(round(probs * 100, 1), "\%")} (in C).
\item{column}{(optional) name for an extra column to generate in the output indicating which dataset a record came from. \code{TRUE} calls this column \code{".join"} (inspired by STATA's '_merge' column). By default this column is generated as the last column, but, if \code{keep.col.order = FALSE}, it is placed after the 'on' columns. The column is a factor variable with levels corresponding to the dataset names (inferred from the input) or \code{"matched"} for matched records. Alternatively, it is possible to specify a list of 2, where the first element is the column name, and the second a length 3 (!) vector of levels e.g. \code{column = list("joined", c("x", "y", "x_y"))}, where \code{"x_y"} replaces \code{"matched"}. The column has an additional attribute \code{"on.cols"} giving the join columns corresponding to the factor levels. See Examples. }