@nbc nbc commented May 25, 2025

This PR adds two utility functions:

  • export_parquet uses COPY TO to export a parquet file from a tbl_lazy
  • create_view creates a view based on a tbl_lazy

The vignette explains how to materialize data using these two functions, as well as with the lesser-known dbplyr::compute() function.
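For illustration, the two materialization routes mentioned above can be sketched roughly as follows. This is not the code from this PR: `export_parquet_sketch()` is a hypothetical helper written only for this example, and it assumes that wrapping the SQL rendered by dbplyr in DuckDB's `COPY (...) TO ... (FORMAT PARQUET)` statement is sufficient.

```r
library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "my_tbl", data.frame(a = c(1, 2, 3)))

# A lazy query; nothing is pulled into R at this point.
lazy <- tbl(con, "my_tbl") |> filter(a > 1)

# Hypothetical helper (an assumption, not the PR's implementation):
# render the lazy query to SQL and let DuckDB write the result itself.
export_parquet_sketch <- function(lazy_tbl, path) {
  con <- dbplyr::remote_con(lazy_tbl)
  query <- dbplyr::sql_render(lazy_tbl)
  sql <- paste0(
    "COPY (", query, ") TO ",
    DBI::dbQuoteString(con, path),
    " (FORMAT PARQUET)"
  )
  DBI::dbExecute(con, sql)
  invisible(path)
}

out_path <- tempfile(fileext = ".parquet")
export_parquet_sketch(lazy, out_path)

# Alternative: materialize the query as a temporary table inside the database.
tmp <- compute(lazy, name = "filtered_tmp")
```

The `COPY` route writes the result to disk without collecting it into R, while `compute()` keeps the result inside the database as a temporary table.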

Fixes #207, #630

@nbc nbc force-pushed the feature/export_and_view branch from b89ba2e to a0fbed6 Compare May 26, 2025 09:00
@krlmlr krlmlr (Collaborator) commented Jun 17, 2025

Thanks for the effort. The code looks good, but I'm still not ready to assume maintenance for it.

I labeled the issues as "help wanted" before duckplyr 1.1.0. Writing to Parquet works there:

library(tidyverse)
library(duckdb)
#> Loading required package: DBI

con <- dbConnect(duckdb::duckdb())

dbWriteTable(con, "my_tbl", data.frame(a = 1))

dbplyr_tbl <- tbl(con, "my_tbl")

dbplyr_tbl %>%
  duckplyr::as_duckdb_tibble() |>
  duckplyr::compute_parquet("my_tbl.parquet")
#> # A duckplyr data frame: 1 variable
#>       a
#>   <dbl>
#> 1     1

duckplyr_parquet <-
  duckplyr::read_parquet_duckdb("my_tbl.parquet")
duckplyr_parquet
#> # A duckplyr data frame: 1 variable
#>       a
#>   <dbl>
#> 1     1

Created on 2025-06-17 with reprex v2.1.1

As for creating a view, dbplyr or a related package is a better fit, since this functionality seems general enough to work across databases.
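To make the "general enough" point concrete, here is a minimal sketch of a view helper that relies only on DBI and dbplyr. `create_view_sketch()` is a hypothetical name used for this example, not an existing function in either package, and the `CREATE OR REPLACE VIEW` syntax is an assumption that holds for DuckDB and PostgreSQL but not for every backend.

```r
library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "my_tbl", data.frame(a = c(1, 2, 3)))

# Hypothetical helper (an assumption, not part of dbplyr):
# turn the SQL behind a lazy table into a named view.
create_view_sketch <- function(lazy_tbl, name) {
  con <- dbplyr::remote_con(lazy_tbl)
  sql <- paste0(
    "CREATE OR REPLACE VIEW ",
    DBI::dbQuoteIdentifier(con, name),
    " AS ",
    dbplyr::sql_render(lazy_tbl)
  )
  DBI::dbExecute(con, sql)
  # Return a lazy table that points at the new view.
  dplyr::tbl(con, name)
}

v <- create_view_sketch(tbl(con, "my_tbl") |> filter(a > 1), "my_view")
```

Nothing in the helper is DuckDB-specific apart from the connection, which is why it could plausibly live in dbplyr or a companion package.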

I'd like to keep the package vignette-free for shorter build times. Otherwise, each run of R CMD build . will have to install the package.

Happy to see this code thrive elsewhere!

