This repository was archived by the owner on Jun 2, 2021. It is now read-only.

specify encoding when updating the dataset #6

Open
n0542344 wants to merge 1 commit into Covid19R:master from n0542344:master

@n0542344

Dear Rami Krispin!

Thanks for your awesome coronavirus package; it makes working with COVID-19 data in R very convenient!

Nevertheless, I had an issue: I wasn't able to update the dataset on my machine (running a Debian Stable-based OS and R 3.5.2) because I got the following error:

invalid multibyte string at '<f0><8a><cb><fa>'

When looking into the update_dataset() function (in the R/data_refresh.R file), I realized that this was probably caused by the read.csv() call. When I ran rio::import() on the same target (it uses data.table::fread() by default), the issue disappeared. I assumed the error had to do with the encoding, so I added the option fileEncoding = "UTF-8" to the read.csv() call, which solved the problem.
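A minimal sketch of the one-line fix described above, assuming a UTF-8 locale (as in the sessionInfo() excerpt below). The temporary file and its two rows are hypothetical stand-ins for the downloaded dataset, not the package's actual update_dataset() code; the point is only that fileEncoding = "UTF-8" tells read.csv() how to decode the file's bytes:

```r
# Build a small CSV containing non-ASCII country names (hypothetical example data)
path <- tempfile(fileext = ".csv")
lines <- c("country,cases", "C\u00f4te d'Ivoire,10", "Cura\u00e7ao,5")

# Write the raw UTF-8 bytes regardless of the current locale
writeLines(enc2utf8(lines), path, useBytes = TRUE)

# Declare the file's encoding so read.csv() re-encodes it correctly on read
df <- read.csv(path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
print(df)
```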

Since other people might run into the same issue, I'm submitting this pull request, which adds only this single line.

If you have any questions, don't hesitate to write to me.

Best wishes, Alex (life-science PhD student from Vienna, Austria)


P.S.: here is an excerpt of my utils::sessionInfo() output:

R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: PureOS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

@RamiKrispin
Member

Hi @n0542344 ,

Thanks for the PR!

There was a parsing issue. I tried adding the encoding as suggested in this PR, but it did not work. I think the main cause was that I read and wrote the CSV files inconsistently: in some places I used the write_csv() and read_csv() functions from the readr package, and in others I used read.csv(). Switching all reads and writes to the readr package solved the issue. Still testing it...
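As a rough illustration of the consistent-readr approach (a sketch under assumptions, not the package's actual code): readr's write_csv() always writes UTF-8 and read_csv() assumes UTF-8 by default, so using the pair for every write and read keeps both sides of the round trip in the same encoding. The data frame below is hypothetical example data:

```r
library(readr)

# Hypothetical data with non-ASCII country names (illustrative values only)
df <- data.frame(
  country = c("C\u00f4te d'Ivoire", "Cura\u00e7ao"),
  cases = c(10, 5),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")

# write_csv() emits UTF-8; read_csv() decodes UTF-8 by default,
# so the same encoding is used on both sides of the round trip
write_csv(df, path)
back <- read_csv(
  path,
  col_types = cols(country = col_character(), cases = col_double())
)

print(back)
```

Passing an explicit col_types also avoids relying on readr's column guessing and silences the "Parsed with column specification" message.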

On a side note, this repo is an old version of the coronavirus package and is no longer active (I should probably remove it). The main repo is here:
https://github.com/RamiKrispin/coronavirus

@RamiKrispin
Member

In the meantime, I would recommend installing the master branch:

https://github.com/RamiKrispin/coronavirus

That seems to be working:

x <- coronavirus::refresh_coronavirus_jhu()
Parsed with column specification:
cols(
  date = col_date(format = ""),
  province = col_character(),
  country = col_character(),
  lat = col_double(),
  long = col_double(),
  type = col_character(),
  cases = col_double()
)
Parsed with column specification:
cols(
  location = col_character(),
  location_code = col_character(),
  location_code_type = col_character()
)
> max(x$date)
[1] "2021-05-26"

@n0542344
Author

n0542344 commented May 28, 2021 via email

@RamiKrispin
Member

Yes, planning to archive this repo, thx!

@RamiKrispin
Member

I pushed the changes (read.csv -> read_csv) to CRAN; please let me know if you have any issues refreshing the data.

https://cran.r-project.org/web/packages/coronavirus/index.html

@n0542344
Author

n0542344 commented May 30, 2021 via email
