Description
It takes some time to copy a big file. If the copy process is terminated before it finishes, the destination file is left truncated and therefore corrupted. Unfortunately, unless the user knows the expected file size, it is hard to check the file's integrity without reading it. However, reading a partially copied file crashes the R session with the latest release of `fst` (0.8.8, per the session info below). Below is a simple reproducible example:
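```r
# Build a data frame with 1e8 rows: an integer id plus five double columns
data <- data.frame(id = 1:1e8)
for (i in 1:5) {
  cat(i, "\n")  # progress indicator
  data[[paste0("x", i)]] <- rnorm(1e8)
}
# Write it to disk; this produces a multi-gigabyte file
fst::write_fst(data, "~/data/fst-test.fst")
```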
```sh
cd ~/data
cp fst-test.fst fst-test-1.fst
```
While `cp` is running, press `Ctrl+Z` to suspend the process, then kill it (for example with `kill %1`). This leaves a truncated `fst-test-1.fst`.
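If you would rather not interrupt `cp` by hand, a truncated copy can also be produced deterministically by copying only part of the bytes. The helper below, `truncated_copy()`, is a hypothetical sketch (it is not part of `fst`); it streams the file in chunks so it also works on multi-gigabyte files.

```r
# Hypothetical helper, not part of fst: copy only the first `fraction` of the
# bytes of `src` into `dest`, mimicking a `cp` that was killed part-way.
truncated_copy <- function(src, dest, fraction = 0.5, chunk_size = 1e6) {
  src <- path.expand(src)
  dest <- path.expand(dest)
  stopifnot(file.exists(src), fraction > 0, fraction < 1)
  n_target <- floor(file.size(src) * fraction)

  con_in <- file(src, open = "rb")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(dest, open = "wb")
  on.exit(close(con_out), add = TRUE)

  remaining <- n_target
  while (remaining > 0) {
    # Read one chunk, but never more than the bytes still needed
    bytes <- readBin(con_in, what = "raw", n = min(chunk_size, remaining))
    if (length(bytes) == 0) break  # source ended early
    writeBin(bytes, con_out)
    remaining <- remaining - length(bytes)
  }
  invisible(n_target - remaining)  # number of bytes written
}
```

For example, `truncated_copy("~/data/fst-test.fst", "~/data/fst-test-1.fst")` should leave a file about half the size of the original. Comparing `file.size()` of the source and the copy is a cheap integrity check, but only while the source is still at hand, which is exactly the limitation described above.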
Now start a new R session. Reading the partially copied file crashes it as follows:
```
> fst::read_fst("fst-test-1.fst")
Loading required namespace: data.table
*** caught segfault ***
address 0xffff80c10bb77458, cause 'memory not mapped'
Traceback:
1: .Call(`_fst_fstretrieve`, fileName, columnSelection, startRow, endRow, oldFormat)
2: fstretrieve(fileName, columns, from, to, old_format)
3: fst::read_fst("fst-test-1.fst")
```
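Because this is a segfault in compiled code rather than an R error, `tryCatch()` cannot intercept it. Until `fst` validates truncated files itself, one possible workaround is to attempt the read in a separate R process, so that only the child dies. The helpers `run_isolated()` and `fst_readable()` below are hypothetical sketches, not part of `fst`; they assume `Rscript` lives in `R.home("bin")` and that `fst` is installed.

```r
# Hypothetical helpers, not part of fst: run R code in a child process so a
# segfault there cannot take down the current session.
run_isolated <- function(expr, args = character()) {
  rscript <- file.path(R.home("bin"), "Rscript")
  cmd_args <- c("-e", shQuote(expr), if (length(args)) shQuote(args))
  # With stdout/stderr = FALSE the output is discarded and the exit status
  # of the child process is returned (0 means success)
  system2(rscript, cmd_args, stdout = FALSE, stderr = FALSE)
}

# TRUE if fst::read_fst() on `path` finishes normally in a child process.
# Assumes fst is installed; a healthy file is read here and again afterwards.
fst_readable <- function(path) {
  path <- normalizePath(path, mustWork = TRUE)
  run_isolated("invisible(fst::read_fst(commandArgs(trailingOnly = TRUE)[1]))",
               args = path) == 0
}
```

With the truncated file from this report, `fst_readable("fst-test-1.fst")` should return `FALSE` instead of crashing the calling session. The cost is reading a healthy file twice, so this is a stopgap rather than a fix.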
My session info:
```
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.4 parallel_3.4.4 tools_3.4.4 Rcpp_0.12.17 fst_0.8.8
```