Improve substr() documentation #5

Open
@EllaKaye

Description

The documentation for substr() could be improved, as discussed at the R Contribution Office Hour on 2024-03-14.

Notes from that meeting:

  • was trying to write a regex that would find three consecutive letters

  • was able to do this with the stringr package

  • tried with base::substr

    • all but the first elements of start and stop are silently ignored (x is a single string); this doesn't seem to be well documented
  • tried with substring

    • works, but its arguments are named first and last (rather than start and stop), and these are vectorized
  • should base::substr have a warning when arguments are ignored?

    • maybe a lot of other base R functions behave like this

    • maybe warn something like "start or stop is longer than x, so only the first length(x) values will be used".

  • should it be documented better?

    • maybe the docs should point to base::substring as more appropriate than substr for this case

    • maybe need clearer documentation about when to use substr vs substring

  • it is documented that substr returns a value the same length as x. So if x is repeated to the length of start and stop, start and stop are recycled/vectorized.

  • could add an example to ?substr where some elements are ignored.

  • this behaviour is documented in stringr: https://stringr.tidyverse.org/articles/from-base.html#str_sub-extract-and-replace-substrings-from-a-character-vector

  • Reading the substring() source code: it repeats text to the length of the longest argument and then calls the internal substr, so every element of first and last is used.

    function (text, first, last = 1000000L)
    {
      if (!is.character(text))
        text <- as.character(text)
      n <- max(lt <- length(text), length(first), length(last))
      if (lt && lt < n)
        text <- rep_len(text, length.out = n)
      .Internal(substr(text, as.integer(first), as.integer(last)))
    }

    Maybe add something like this to substr:

    # warn when elements of start/stop beyond length(x) would be silently ignored
    n <- length(x)
    if (n < length(start) || n < length(stop)) {
      warning(sprintf(ngettext(n,
                               "start or stop is longer than x. Only the first %i value will be used.",
                               "start or stop is longer than x. Only the first %i values will be used."),
                      n))
    }
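A minimal R sketch of the behaviour described in these notes, of the kind that could serve as the suggested "some elements are ignored" example for ?substr. It assumes the task was to extract every run of three consecutive characters from a single string; the string "abcdef" and the variable names are illustrative only.

```r
# Assumed task: every run of three consecutive characters in one string.
x <- "abcdef"
starts <- 1:(nchar(x) - 2)   # 1 2 3 4
stops  <- starts + 2         # 3 4 5 6

# substr(): the result has length(x) == 1, so only starts[1] and stops[1]
# are used; the remaining elements are silently ignored.
substr(x, starts, stops)
#> [1] "abc"

# substring(): text is repeated to the length of the longest argument,
# so every start/stop pair is used.
substring(x, starts, stops)
#> [1] "abc" "bcd" "cde" "def"

# Workaround with substr(): repeat x so it is as long as start and stop.
substr(rep(x, length(starts)), starts, stops)
#> [1] "abc" "bcd" "cde" "def"
```

Repeating x with rep() is the workaround noted above; substring() performs the equivalent repetition internally before calling the same internal substr.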

ACTION

  • propose improvement to documentation
  • suggest adding a warning (could be opened as a separate bug report)
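To try the proposed warning without patching base R, the check from the notes can be wrapped around base::substr(). This is only a sketch: the name substr_warn() is hypothetical and not part of base R, and the message wording is a placeholder.

```r
# Hypothetical wrapper (NOT part of base R): applies the proposed length
# check, then delegates to base::substr() unchanged.
substr_warn <- function(x, start, stop) {
  n <- length(x)
  if (n < length(start) || n < length(stop)) {
    warning(sprintf(ngettext(n,
                             "start or stop is longer than x. Only the first %i value will be used.",
                             "start or stop is longer than x. Only the first %i values will be used."),
                    n))
  }
  base::substr(x, start, stop)
}

# Longer start/stop: the warning is reported as a message, result is "abc".
res <- withCallingHandlers(
  substr_warn("abcdef", 1:4, 3:6),
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
res

# Matching lengths: no warning, returns c("abc", "hij").
substr_warn(c("abcdef", "ghijkl"), 1:2, 3:4)
```

One detail a real patch would need to settle: as written, the check also fires when x has length zero, where substr() simply returns character(0).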

Labels: Documentation (Issues in the documentation), Imperial 2024 (Issues reserved for Imperial R Dev Day 2024), needs patch (Implement the agreed fix and prepare a patch for review)
