Enhance the bulk download functions #60

@chriscarrollsmith

Description

Teal has asked that we create a function that downloads all the bulk datasets with a single function call and stores them in some optimized but usable file format. That could be parquet, DuckDB, or something else entirely. (I am attending a DuckDB workshop Thursday.)
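One way to picture the single-call download is a function that takes the list of dataset ids, fetches each one, and writes one file per dataset into a local store. This is a minimal Python sketch only: the function name, the injected `fetch` callable, and the JSON output are all illustrative assumptions; the real version would hit the bulk API and write parquet or DuckDB files instead.

```python
import json
from pathlib import Path

def ids_bulk_download_all(dataset_ids, fetch, dest_dir):
    """Hypothetical sketch: download every bulk dataset in one call.

    dataset_ids: iterable of dataset identifiers.
    fetch: callable mapping a dataset id to its rows (injected so the
           network layer stays swappable and testable; a real
           implementation would call the bulk API here).
    dest_dir: directory that acts as the local store. JSON is used
              only as a stdlib stand-in for parquet/DuckDB output.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for ds_id in dataset_ids:
        path = dest / f"{ds_id}.json"
        path.write_text(json.dumps(fetch(ds_id)))
        written.append(path)
    return written
```

Injecting the fetcher keeps the storage-format decision (parquet vs. DuckDB) separate from the download logic, so the format can change without touching the loop.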

Once the data is downloaded, we have some choices: either let the user manage and load those files themselves, or else treat the downloaded files like a cache and allow the user to fetch data from them using ids_get.
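The cache option could look like the following sketch: check the local store first and fall back to the normal API path on a miss. All names here are hypothetical (this is not the package's actual `ids_get` signature), and JSON again stands in for the parquet/DuckDB files the cache would really hold.

```python
import json
from pathlib import Path

def ids_get_cached(dataset_id, cache_dir, api_fetch):
    """Hypothetical sketch: serve a request from the bulk-download
    cache when possible, else fall back to the normal query path.

    api_fetch: callable implementing the ordinary API request.
    """
    cached = Path(cache_dir) / f"{dataset_id}.json"
    if cached.exists():
        return json.loads(cached.read_text())
    return api_fetch(dataset_id)
```

The appeal of this shape is that the user-facing call stays the same whether the data comes from disk or from the API.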

We also discussed the possibility of using this bulk-download-and-cache pattern on the backend of ids_get (with an informative warning or interactive prompt) in the event that the user requests more data than a normal query can handle. (We need to consider, though, how to serve this data to the user, since we probably don't want to load it all into memory. If they are expecting a return value, but we instead create a parquet file or something, then this may not actually be an improvement over simply erroring and telling the user to use the bulk API.)
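On the memory question, one alternative to returning a materialized table or a bare file path is returning something lazy that yields rows on demand. The sketch below uses a Python generator over stdlib sqlite3 purely as a stand-in; DuckDB's Python client offers an analogous lazy pattern through relations and batched fetches, and the function name here is invented for illustration.

```python
import sqlite3

def ids_scan(db_path, query, params=()):
    """Hypothetical sketch: yield result rows one at a time instead of
    loading the whole result set into memory.

    sqlite3 stands in for DuckDB here; both are embedded, single-file
    engines, which is what makes this return shape workable.
    """
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(query, params)
        for row in cur:
            yield row
    finally:
        con.close()
```

A caller can then filter or aggregate incrementally, which sidesteps the "return value vs. file on disk" dilemma without pulling the full dataset into memory.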
