Commit 6eeb3e3

Merge branch 'release' into 'master'
2 parents 4079b1f + 43b04d1 commit 6eeb3e3

89 files changed: 10856 additions, 6418 deletions

.Rbuildignore

Lines changed: 4 additions & 1 deletion
@@ -4,8 +4,11 @@
 \.dll$
 \.a$
 \.Rmd$
+LZ4/LICENSE$
 \.md$
+^docs$
 \.png$
 \.yml$
 dataset\.fst$
-res - readme\.fst$
+^res - readme\.fst$
+^_pkgdown\.yml$

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -16,3 +16,4 @@
 *.txt
 *.zip
 .Rproj.user
+*.TMP

.travis.yml

Lines changed: 11 additions & 7 deletions
@@ -11,13 +11,6 @@ os:
   - linux
   - osx
 
-matrix:
-  exclude:
-    - r: release
-      os: osx
-    - r: devel
-      os: osx
-
 r_packages:
   - covr
   - lintr
@@ -26,6 +19,17 @@ r_packages:
   - testthat
   - data.table
 
+matrix:
+  exclude:
+    - r: release
+      os: osx
+    - r: devel
+      os: osx
+
+addons:
+  apt:
+    update: true
+
 after_success:
   - Rscript -e 'library(covr); codecov(quiet = FALSE)'
 

DESCRIPTION

Lines changed: 4 additions & 3 deletions
@@ -5,8 +5,8 @@ Description: Multithreaded serialization of compressed data frames using the
     'fst' format. The 'fst' format allows for random access of stored data and
     compression with the LZ4 and ZSTD compressors created by Yann Collet. The ZSTD
     compression library is owned by Facebook Inc.
-Version: 0.8.4
-Date: 2018-01-25
+Version: 0.8.6
+Date: 2018-05-15
 Authors@R: c(
     person("Mark", "Klik", email = "[email protected]", role = c("aut", "cre", "cph")),
     person("Yann", "Collet", role = c("ctb", "cph"),
@@ -25,7 +25,8 @@ Suggests:
     bit64,
     data.table,
     lintr,
-    nanotime
+    nanotime,
+    crayon
 License: AGPL-3 | file LICENSE
 Copyright: This package includes sources from the LZ4 library written
     by Yann Collet, sources of the ZSTD library owned by Facebook, Inc.

NAMESPACE

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,8 @@ export(write.fst)
 export(write_fst)
 importFrom(Rcpp,sourceCpp)
 importFrom(parallel,detectCores)
+importFrom(utils,capture.output)
 importFrom(utils,packageVersion)
 importFrom(utils,str)
+importFrom(utils,tail)
 useDynLib(fst, .registration = TRUE)

NEWS.md

Lines changed: 30 additions & 10 deletions
@@ -1,15 +1,35 @@
 
-**If you are viewing this file on CRAN, please check latest news on GitHub [here](https://github.com/fstpackage/fst/blob/develop/NEWS.md).**
+# fst 0.8.6
 
-### Changes in v0.8.4
+Version 0.8.6 of the `fst` package brings clearer printing of `fst_table` objects. It also includes optimizations for controlling the number of threads used by the package during reads and writes and after a fork has ended. The `LZ4` and `ZSTD` compression libraries are updated to their latest (and fastest) releases. UTF-8 encoded column names are now correctly stored in the `fst` format.
+
+## New features
+
+* More advanced printing generic of the `fst_table` reference object, showing column types, (possible) keys, and the table header and footer data (issue #131, thanks @renkun-ken for reporting and discussions).
+
+* User has more control over the number of threads used by fst. Option 'fst_threads' can now be used to initialize the number of threads when the package is first loaded (issue #132, thanks to @karldw for the pull request).
+
+* Option 'fst_restore_after_fork' can be used to select the threading behaviour after a fork has ended. Like the `data.table` package, `fst` switches back to a single thread when a fork is detected (using OpenMP in a fork can lead to problems). Unlike `data.table`, the `fst` package restores the number of threads to it's previous setting when the fork ends. If this leads to unexpected problems, the user can set the 'fst_restore_after_fork' option to FALSE to disable that.
+
+## Bugs solved
+
+* Character encoding of column names correctly stored in the `fst` format (issue #144, thanks @shrektan for reporting and discussions).
+
+## Documentation
+
+* Improved accuracy of fst_table documentation regarding random row access (issue #143, thanks @martinblostein for pointed out the unclarity)
+
+* Improved documentation on background threads during `write_fst()` and `read_fst()` (issue #121, thanks @krlmlr for suggestions and discussion)
+
+# fst 0.8.4
 
 The v0.8.4 release brings a `data.frame` interface to the `fst` package. Column and row selection can now be done directly from the `[` operator. In addition, it fixes some issues and prepares the package for the next build toolchain of CRAN.
 
-#### New features
+## New features
 
 * A `data.frame` interface was added to the package. The user can create a reference object to a `fst` file with method `fst`. That reference can be used like a `data.frame` and will automatically make column- and row- selections in the referenced `fst` file.
 
-#### Bugs solved
+## Bugs solved
 
 * Build issues with the dev build of R have been fixed. In particular, `fst` now builds correctly with the Clang 6.0 toolchain which will be released by CRAN shortly (thanks @kevinushey for reporting the problem and CRAN maintainers for the advance warning.
 
@@ -19,14 +39,14 @@ The v0.8.4 release brings a `data.frame` interface to the `fst` package. Column
 
 * An error was fixed where using `fst` as a dependency in another package and building that package in RStudio, crashed RStudio. The problem was that RStudio uses a fork to build or document a package. That fork made `fst` use OpenMP library methods, which leads to crashes on macOS. After the fix, no calls to any OpenMP library method are now made from `fst` when it's run from a forked process (issue #100 and issue #109, thanks to @eipi10, @PeteHaitch, @kevinushey, @thierrygosselin, @xiaodaigh and @jzzcutler for reporting the problem and help fix it).
 
-#### Documentation
+## Documentation
 
 * Documentation for method `write_fst` was improved (issue #123, thanks @krlmlr for reporting and submitting a pull request).
 
 
-### Changes in v0.8.2
+# fst 0.8.2
 
-#### New features
+## New features
 
 * Package `fst` has support for multi-threading using OpenMP. Compression, decompression and disk IO have been largely parallelized for (much) improved performance.
 
@@ -60,7 +80,7 @@ The v0.8.4 release brings a `data.frame` interface to the `fst` package. Column
 
 * The core C++ code with the API to read and write `fst` files, and use compression and hashing now lives in a separate library called [`fstlib`](https://github.com/fstpackage/fstlib). Although not visible to the user, this is a major development allowing `fst` to be implemented for other languages than `R` (with comparable performance).
 
-#### Bugs solved
+## Bugs solved
 
 * Tilde-expansion in `write_fst` not correctly processed. _Thanks @HughParsonage, @PoGibas._
 
@@ -76,11 +96,11 @@ The v0.8.4 release brings a `data.frame` interface to the `fst` package. Column
 
 * Stack imbalance warnings under centain conditions. _Thanks @ryankennedyio_
 
-#### Benchmarks
+## Benchmarks
 
 Thanks to @mattdowle, @st-pasha, @phillc73 for valuable discussions on `fst` benchmarks and how to accurately perform (and present) them.
 
-#### Additional credits
+## Additional credits
 
 * Special thanks to @arunsrinivasan for a lot of valuable discussions on the future direction of the `fst` package, I hope `fst` may continue to benefit from your experience!
 
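
Note on the two options named in the 0.8.6 news entry: a minimal sketch of how a user might set them, assuming they are ordinary R options set through `options()`. The values `2` and `FALSE` are illustrative only, and the `library(fst)` call is left commented out because it assumes fst 0.8.6 or later is installed. The news text says 'fst_threads' is used when the package is first loaded; it does not say when 'fst_restore_after_fork' is read, so setting both before loading is a cautious assumption.

```r
# Set the options before fst is loaded. Per the NEWS entry, 'fst_threads'
# initializes the thread count when the package is first loaded.
options(fst_threads = 2)                  # illustrative value

# Opt out of restoring the previous thread count after a fork has ended.
options(fst_restore_after_fork = FALSE)

# library(fst)  # assumption: fst >= 0.8.6 is installed

# Both are plain R options and can be inspected as such.
getOption("fst_threads")
getOption("fst_restore_after_fork")
```

Placing these calls in a project `.Rprofile` would apply them to every session that later loads the package.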

R/RcppExports.R

Lines changed: 4 additions & 0 deletions
@@ -37,3 +37,7 @@ hasopenmp <- function() {
     .Call(`_fst_hasopenmp`)
 }
 
+restore_after_fork <- function(restore) {
+    invisible(.Call(`_fst_restore_after_fork`, restore))
+}
+

R/fst.R

Lines changed: 32 additions & 16 deletions
@@ -22,41 +22,44 @@
 
 #' Read and write fst files.
 #'
-#' Read and write data frames from and to a fast-storage (fst) file.
+#' Read and write data frames from and to a fast-storage (`fst`) file.
 #' Allows for compression and (file level) random access of stored data, even for compressed datasets.
-#' When using a \code{data.table} object for \code{x}, the key (if any) is preserved,
+#' Multiple threads are used to obtain high (de-)serialization speeds but all background threads are
+#' re-joined before `write_fst` and `read_fst` return (reads and writes are stable).
+#' When using a `data.table` object for `x`, the key (if any) is preserved,
 #' allowing storage of sorted data.
-#' Methods \code{read_fst} and \code{write_fst} are equivalent to \code{read.fst} and \code{write.fst} (but the
+#' Methods `read_fst` and `write_fst` are equivalent to `read.fst` and `write.fst` (but the
 #' former syntax is preferred).
 #'
 #' @param x a data frame to write to disk
 #' @param path path to fst file
 #' @param compress value in the range 0 to 100, indicating the amount of compression to use.
-#' Lower values mean larger file sizes.
-#' @param uniform_encoding If TRUE, all character vectors will be assumed to have elements with equal encoding.
+#' Lower values mean larger file sizes. The default compression is set to 50.
+#' @param uniform_encoding If `TRUE`, all character vectors will be assumed to have elements with equal encoding.
 #' The encoding (latin1, UTF8 or native) of the first non-NA element will used as encoding for the whole column.
 #' This will be a correct assumption for most use cases.
-#' If \code{uniform.encoding} is set to FALSE, no such assumption will be made and all elements will be converted
+#' If `uniform.encoding` is set to `FALSE`, no such assumption will be made and all elements will be converted
 #' to the same encoding. The latter is a relatively expensive operation and will reduce write performance for
 #' character columns.
-#' @return \code{read_fst} returns a data frame with the selected columns and rows. \code{read_fst}
-#' invisibly returns \code{x} (so you can use this function in a pipeline).
+#' @return `read_fst` returns a data frame with the selected columns and rows. `read_fst`
+#' invisibly returns `x` (so you can use this function in a pipeline).
 #' @examples
 #' # Sample dataset
 #' x <- data.frame(A = 1:10000, B = sample(c(TRUE, FALSE, NA), 10000, replace = TRUE))
 #'
-#' # Uncompressed
-#' write_fst(x, "dataset.fst") # filesize: 41 KB
-#' y <- read_fst("dataset.fst") # read uncompressed data
+#' # Default compression
+#' write_fst(x, "dataset.fst") # filesize: 17 KB
+#' y <- read_fst("dataset.fst") # read fst file
 #'
-#' # Compressed
+#' # Maximum compression
 #' write_fst(x, "dataset.fst", 100) # fileSize: 4 KB
-#' y <- read_fst("dataset.fst") # read compressed data
+#' y <- read_fst("dataset.fst") # read fst file
 #'
 #' # Random access
 #' y <- read_fst("dataset.fst", "B") # read selection of columns
 #' y <- read_fst("dataset.fst", "A", 100, 200) # read selection of columns and rows
 #' @export
+#' @md
 write_fst <- function(x, path, compress = 50, uniform_encoding = TRUE) {
   if (!is.character(path)) stop("Please specify a correct path.")
 
@@ -156,7 +159,7 @@ print.fstmetadata <- function(x, ...) {
 #'
 #' @export
 read_fst <- function(path, columns = NULL, from = 1, to = NULL, as.data.table = FALSE, old_format = FALSE) {
-  fileName <- normalizePath(path, mustWork = TRUE)
+  fileName <- normalizePath(path, mustWork = FALSE)
 
   if (!is.null(columns)) {
     if (!is.character(columns)) {
@@ -200,8 +203,21 @@ read_fst <- function(path, columns = NULL, from = 1, to = NULL, as.data.table =
     return(res)
   }
 
-  as.data.frame(res$resTable, row.names = NULL, stringsAsFactors = FALSE,
-    optional = TRUE)
+  # use setters from data.table to improve performance
+  if (requireNamespace("data.table")) {
+
+    data.table::setattr(res$resTable, "class", "data.frame")
+    data.table::setattr(res$resTable, "row.names", 1:length(res$resTable[[1]]))
+
+    return(res$resTable)
+  }
+
+  res_table <- res$resTable
+
+  class(res_table) <- "data.frame"
+  attr(res_table, "row.names") <- 1:length(res$resTable[[1]])
+
+  res_table
 }
 
 
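
Note on the last `read_fst` hunk: the change replaces the `as.data.frame()` call with direct attribute assignment on the result list. The base-R fallback path can be sketched in isolation as below. The list `res_table` is a hypothetical stand-in for `res$resTable`, assumed here to be a named list of equal-length columns; the sketch does not call into `fst` or `data.table`.

```r
# Hypothetical stand-in for res$resTable: a named list of equal-length columns.
res_table <- list(A = 1:3, B = c("x", "y", "z"))

# Promote the list to a data.frame by setting its attributes directly,
# rather than going through as.data.frame().
class(res_table) <- "data.frame"
attr(res_table, "row.names") <- seq_along(res_table[[1]])

is.data.frame(res_table)
dim(res_table)
```

The sketch uses `seq_along()` where the commit uses `1:length(...)`. In R, `1:length(x)` evaluates to `c(1, 0)` when `x` has length zero, so the two differ for a zero-row result; whether `read_fst` can reach this code with zero rows is not shown in the diff.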
