Description
The documentation for `substr()` could be improved, as was discussed at the R Contribution Office Hour, 2024-03-14.
Notes from that meeting:
- trying to get a regex that would find three consecutive letters in a row
- was able to do with stringr package
- tried with base::substr
  - all but first elements of `start` and `stop` are ignored, doesn't seem to be well documented
- tried with substring
  - works, but has different arguments `first` and `last`, which do seem to be vectorized
- should base::substr have a warning when arguments are ignored?
  - maybe a lot of other base R functions behave like this
  - maybe warn "start and stop don't have the same length as x so only values up to the length of x will be used" or something like that.
- should it be documented better?
  - maybe should point in direction of base::substring instead of substr as more appropriate here
  - maybe need clearer documentation about when to use substr vs substring
- it is documented that substr returns a value of the same length as x. So if x is repeated, start and stop are recycled/vectorized.
- could add example where some elements are ignored.
- is documented in stringr: https://stringr.tidyverse.org/articles/from-base.html#str_sub-extract-and-replace-substrings-from-a-character-vector
- Reading the substring() source code: it extends text to the length of the longest argument and then calls the internal substr code, which recycles first and last to match.

      function (text, first, last = 1000000L)
      {
          if (!is.character(text))
              text <- as.character(text)
          # n is the length of the longest argument
          n <- max(lt <- length(text), length(first), length(last))
          # repeat a non-empty text up to that length
          if (lt && lt < n)
              text <- rep_len(text, length.out = n)
          .Internal(substr(text, as.integer(first), as.integer(last)))
      }
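As a rough check of the behaviour described in these notes, the following base R sketch compares substr() and substring() on a single string with vector positions. The variable names x, first and last and the input values are only illustrative. The final comparison assumes, as the source above suggests, that substring() amounts to repeating text and then calling substr().

```r
# Illustrative inputs (not from the original discussion)
x <- "abcdef"
first <- 1:3
last <- 3:5

# substr() returns a value of the same length as x, so only the first
# elements of start and stop are used here
substr(x, first, last)
# [1] "abc"

# substring() recycles its arguments to the longest length
substring(x, first, last)
# [1] "abc" "bcd" "cde"

# Repeating x by hand makes substr() use every start/stop pair
n <- max(length(x), length(first), length(last))
substr(rep_len(x, n), first, last)
# [1] "abc" "bcd" "cde"

identical(substring(x, first, last), substr(rep_len(x, n), first, last))
# [1] TRUE
```

An example along these lines could serve as the documentation example where some elements are ignored.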
Maybe add something like this to substr:

    n <- length(x)
    if (n < length(start) || n < length(stop)) {
        warning(sprintf(ngettext(n,
            "start or stop are longer than x. Only the first %i value will be used.",
            "start or stop are longer than x. Only the first %i values will be used."),
            n))
    }
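To see how the proposed check might behave, here is a minimal sketch that wraps it around substr() in a helper called substr_checked(). The helper name is hypothetical and exists only for illustration; an actual change would presumably live inside substr() itself.

```r
# Hypothetical wrapper for illustration only; not part of base R
substr_checked <- function(x, start, stop) {
  n <- length(x)
  if (n < length(start) || n < length(stop)) {
    warning(sprintf(ngettext(n,
        "start or stop are longer than x. Only the first %i value will be used.",
        "start or stop are longer than x. Only the first %i values will be used."),
      n))
  }
  substr(x, start, stop)
}

# Extra start/stop values trigger the warning; the result is unchanged
res <- withCallingHandlers(
  substr_checked("abcdef", 1:3, 3:5),
  warning = function(w) {
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
res
# [1] "abc"

# Matching lengths: no warning is raised
substr_checked(rep("abcdef", 3), 1:3, 3:5)
# [1] "abc" "bcd" "cde"
```

The sketch only warns and otherwise leaves the return value of substr() untouched, so existing code would keep producing the same results.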
ACTION
- propose improvement to documentation
- suggest adding a warning (could be opened as separate bug report)