
Should collect 'fully' materialize a duckplyr_df? #724

@TimTaylor


Calling str() on a collected duckplyr_df is very slow on first use.

I assumed collect fully materialized the object. Is this a misunderstanding on my part?

library(duckplyr)
packageVersion("duckplyr")
#> [1] '1.1.0.9000'
base_url <- "https://blobs.duckdb.org/flight-data-partitioned/Year=2024/data_0.parquet"
flights_parquet <- read_parquet_duckdb(base_url)
x <- collect(flights_parquet)
system.time(str(x))
#>    user  system elapsed 
#>   4.898   0.623   5.539
system.time(str(x))
#>    user  system elapsed 
#>   0.014   0.000   0.014
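One way to narrow this down is to time a full read of each column twice and compare the two passes. The sketch below uses base R only. `time_column_reads()` is a hypothetical helper, not part of duckplyr, and it assumes the delay comes from columns being read lazily on first access, which this report does not confirm.

```r
# Hypothetical helper (not part of duckplyr): time two consecutive full
# reads of every column in a data frame.
time_column_reads <- function(df) {
  read_once <- function(col) {
    # length(unique(col)) has to look at every element of the column
    system.time(length(unique(col)))[["elapsed"]]
  }
  first <- vapply(df, read_once, numeric(1))
  second <- vapply(df, read_once, numeric(1))
  data.frame(
    column = names(df),
    first = first,
    second = second,
    row.names = NULL
  )
}

# Usage on an ordinary data frame; with the collected object from the
# report this would be time_column_reads(x).
timings <- time_column_reads(data.frame(a = 1:10, b = letters[1:10]))
```

If the first pass is much slower than the second for some columns, that would suggest those columns are only read when first touched, rather than at `collect()` time.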
