Merged
8 changes: 3 additions & 5 deletions README.md
@@ -71,9 +71,7 @@ Use `convert()` to convert a single SAS file to Parquet in Hive
partition format:

``` r
-library(fastreg)
-
-convert(
+fastreg::convert(
path = "path/to/file.sas7bdat",
output_dir = "path/to/output_dir/"
)
@@ -84,13 +82,13 @@ Use `use_targets_template()` to copy a
multiple registers in parallel into your project:

``` r
-use_targets_template()
+fastreg::use_targets_template()
```

Use `read_register()` to read a Parquet register as a DuckDB table:

``` r
-read_register("path/to/parquet_register/")
+fastreg::read_register("path/to/parquet_register/")
```

See `vignette("fastreg")` for a complete guide.
8 changes: 3 additions & 5 deletions README.qmd
@@ -71,9 +71,7 @@ Use `convert()` to convert a single SAS file to Parquet in Hive
partition format:

```{r, eval = FALSE}
-library(fastreg)
-
-convert(
+fastreg::convert(
path = "path/to/file.sas7bdat",
output_dir = "path/to/output_dir/"
)
@@ -84,13 +82,13 @@ Use `use_targets_template()` to copy a
multiple registers in parallel into your project:

```{r, eval = FALSE}
-use_targets_template()
+fastreg::use_targets_template()
```

Use `read_register()` to read a Parquet register as a DuckDB table:

```{r, eval = FALSE}
-read_register("path/to/parquet_register/")
+fastreg::read_register("path/to/parquet_register/")
```

See `vignette("fastreg")` for a complete guide.
22 changes: 13 additions & 9 deletions vignettes/fastreg.qmd
@@ -22,6 +22,12 @@ Parquet files. A *register* in this context refers to a collection of
related data files that belong to the same dataset, typically with
yearly snapshots (e.g., `bef2020.sas7bdat`,`bef2021.sas7bdat`).

+::: callout-note
+We use package prefixes (`fastreg::`) throughout the documentation
+rather than `library()` calls, to make the package origin of each
+function explicit and avoid naming conflicts.
+:::
+
## Why Parquet?

[Parquet](https://parquet.apache.org/) is a columnar storage file format
@@ -46,12 +52,10 @@ registers, `bef` and `lmdb`:
```{r prepare}
#| code-fold: true
#| code-summary: "Show setup code"
-library(fastreg)
-
sas_dir <- fs::path_temp("sas-dir")
fs::dir_create(sas_dir)

-bef_list <- simulate_register(
+bef_list <- fastreg::simulate_register(
"bef",
c("", "1999", "1999_1", "2020"),
n = 1000
@@ -62,13 +66,13 @@ bef_list <- simulate_register(
x |> dplyr::mutate("koen" = sample(c(1, 2), 1000, replace = TRUE))
})

-lmdb_list <- simulate_register(
+lmdb_list <- fastreg::simulate_register(
"lmdb",
c("2020", "2021"),
n = 1000
)

-save_as_sas(
+fastreg::save_as_sas(
c(bef_list, lmdb_list),
sas_dir
)
@@ -135,7 +139,7 @@ single SAS file to a year-partitioned Parquet format:
sas_file <- fs::path(sas_dir, "bef2020.sas7bdat")
output_file_dir <- fs::path_temp("output-file-dir")

-convert(
+fastreg::convert(
path = sas_file,
output_dir = output_file_dir
)
@@ -178,15 +182,15 @@ function. In this example, we're outputting it to a temporary directory.
pipeline_dir <- fs::path_temp("pipeline-dir")
fs::dir_create(pipeline_dir)

-use_targets_template(path = pipeline_dir)
+fastreg::use_targets_template(path = pipeline_dir)
```

Once the `_targets.R` file is created, open it and edit the `config`
section:

```{r config}
config <- list(
-sas_paths = list_sas_files(fs::path_temp("sas-dir")),
+sas_paths = fastreg::list_sas_files(fs::path_temp("sas-dir")),
output_dir = fs::path(pipeline_dir, "parquet-registers")
)
```
@@ -218,7 +222,7 @@ You can pass a directory to read a full partitioned register or a file
path to read a single Parquet file:

```{r read-file}
-file <- read_register(output_file_dir)
+file <- fastreg::read_register(output_file_dir)
file
```
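
The callout note added to `vignettes/fastreg.qmd` motivates the `fastreg::` prefix by pointing to naming conflicts. A minimal sketch of that effect, using only base R rather than `fastreg` itself: the local `sd()` definition below is a hypothetical stand-in for any function that shadows a package function of the same name.

```r
# Hypothetical user-defined function that shares its name with stats::sd().
sd <- function(x) "masked"

# The unqualified call finds the user's definition first.
unqualified <- sd(c(1, 2, 3))

# The qualified call always reaches the version exported by the stats package.
qualified <- stats::sd(c(1, 2, 3))

print(unqualified)
print(qualified)
```

The same reasoning applies to `fastreg::convert()`: the qualified call does not depend on what else is attached or defined in the session.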
