get_description() and parse_description() assume native encoding

https://github.com/datacamp/r-package-parser/blob/cb48a0368626a6f2d3ce66020e7a270d2775e2d4/R/processing.R#L36

Calling `parse()` on the text of the `Authors@R` field fails when that field contains non-ASCII characters and the DESCRIPTION file is not in the native encoding of the system processing the package (UTF-8 here). Typical examples are packages that declare `Encoding: latin1` and have accented characters in author names, e.g.:

    res <- process_package("https://cran.r-project.org/src/contrib/flexrsurv_1.4.1.tar.gz", "flexrsurv", "cran")
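
The symptom can also be reproduced without downloading anything. The following is a minimal sketch, assuming a made-up package `demo` with a made-up author name; it writes a DESCRIPTION whose bytes are latin1 and reads it back the way `get_description()` currently does:

```r
# Hypothetical fixture: a DESCRIPTION file stored as latin1 bytes.
# 0xE9 is "é" in latin1 and is not a valid UTF-8 sequence on its own.
e_acute <- rawToChar(as.raw(0xe9))
desc_path <- tempfile("DESCRIPTION")
writeLines(
  c("Package: demo",
    "Encoding: latin1",
    paste0('Authors@R: person("Ren', e_acute, '", "Dupont", role = "aut")')),
  desc_path,
  useBytes = TRUE  # write the bytes unchanged, no re-encoding
)

# Same read as in get_description(): nothing marks the declared encoding.
out <- read.dcf(desc_path)[1, ]
authors <- out[["Authors@R"]]

Encoding(authors)   # the string carries no encoding mark
validUTF8(authors)  # and its bytes are not valid UTF-8

# In a UTF-8 locale this parse is expected to fail on the stray byte;
# in other locales it may succeed, so the outcome is only captured here.
res <- tryCatch(parse(text = authors), error = function(e) conditionMessage(e))
```

Because the string is unmarked, R treats it as being in the native encoding, which is what goes wrong on a UTF-8 system.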

Proper handling of package descriptions is provided by the [**desc**](https://CRAN.R-project.org/package=desc) package. However, a simple fix that adds support for latin1 packages alongside UTF-8 is to mark the `Encoding()` of the fields in `get_description()`, as `tools:::.read_description()` does:

```
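get_description <- function(pkg_folder) {
  desc_path <- file.path(pkg_folder, "DESCRIPTION")
  out <- read.dcf(desc_path)[1, ]
  # Use `[` rather than `[[`: a missing Encoding field then yields NA
  # instead of a "subscript out of bounds" error.
  if (identical(unname(out["Encoding"]), "latin1")) {
    # Mark non-ASCII fields so they are re-encoded correctly later on.
    Encoding(out) <- "latin1"
  }
  as.list(out)
}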
```
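
To check the patch in isolation, here is a small sketch that runs it against a throwaway package folder. The folder, the package name `demo` and the author name are all made up for illustration, and the function is repeated (with a guard for a missing `Encoding` field) only so that the example runs on its own:

```r
# Proposed fix, repeated so that the example is self-contained.
get_description <- function(pkg_folder) {
  desc_path <- file.path(pkg_folder, "DESCRIPTION")
  out <- read.dcf(desc_path)[1, ]
  # out["Encoding"] is NA (not an error) when the field is absent.
  if (identical(unname(out["Encoding"]), "latin1")) {
    Encoding(out) <- "latin1"
  }
  as.list(out)
}

# Hypothetical fixture: a package folder whose DESCRIPTION is latin1.
# 0xE9 is "é" in latin1.
pkg_folder <- tempfile("pkg")
dir.create(pkg_folder)
e_acute <- rawToChar(as.raw(0xe9))
writeLines(
  c("Package: demo",
    "Encoding: latin1",
    paste0("Author: Ren", e_acute, " Dupont")),
  file.path(pkg_folder, "DESCRIPTION"),
  useBytes = TRUE  # write the bytes unchanged, no re-encoding
)

desc <- get_description(pkg_folder)
Encoding(desc$Author)                 # now marked as latin1
author_utf8 <- enc2utf8(desc$Author)  # conversion uses the mark
validUTF8(author_utf8)                # the converted string is valid UTF-8
```

Pure ASCII fields such as `Package` stay unmarked, since R only marks strings that contain non-ASCII bytes.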

This might fix https://github.com/datacamp/RDocumentation-app/issues/386.
