Teal has asked that we create a function that downloads all the bulk datasets in a single call and stores them in an optimized but still usable file format. That could be parquet, DuckDB, or something else entirely. (I am attending a DuckDB workshop Thursday.)
Once the data is downloaded, we have a choice: either let the user manage and load those files themselves, or treat the downloaded files as a cache and let the user fetch data from them through ids_get.
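To make the cache option concrete, here is a minimal sketch of the download-and-cache pattern. It is illustrative only: the package itself is written in R, the names `bulk_download()` and `fetch_cached()` are hypothetical, and stdlib SQLite stands in for whichever format (parquet, DuckDB, etc.) we settle on.

```python
# Hypothetical sketch: SQLite stands in for the parquet/DuckDB file format
# still under discussion; bulk_download()/fetch_cached() are invented names.
import sqlite3
from pathlib import Path

CACHE_DIR = Path("ids_cache")  # hypothetical local cache location


def bulk_download(fetch, cache_db="bulk.sqlite"):
    """Download every bulk dataset once and store it in one local file.

    `fetch` is a stand-in for the real HTTP download; here it returns a
    dict mapping series id -> list of (counterpart, year, value) rows.
    """
    CACHE_DIR.mkdir(exist_ok=True)
    con = sqlite3.connect(CACHE_DIR / cache_db)
    con.execute(
        "CREATE TABLE IF NOT EXISTS ids "
        "(series TEXT, counterpart TEXT, year INTEGER, value REAL)"
    )
    for series, rows in fetch().items():
        con.executemany(
            "INSERT INTO ids VALUES (?, ?, ?, ?)",
            [(series, c, y, v) for c, y, v in rows],
        )
    con.commit()
    con.close()
    return CACHE_DIR / cache_db


def fetch_cached(series, cache_db="bulk.sqlite"):
    """Serve an ids_get-style request from the local cache, not the API."""
    con = sqlite3.connect(CACHE_DIR / cache_db)
    rows = con.execute(
        "SELECT counterpart, year, value FROM ids WHERE series = ?",
        (series,),
    ).fetchall()
    con.close()
    return rows
```

Under this design, ids_get could first check for a populated cache and only fall back to the API when the cache is absent; the user never has to know which file format sits underneath.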
We also discussed using this bulk-download-and-cache pattern on the backend of ids_get (with an informative warning or interactive prompt) whenever the user requests more data than a normal query can handle. We need to consider, though, how to serve that data back to the user, since we probably don't want to load it all into memory. If the user expects a return value but we instead produce a parquet file or similar, this may not actually improve on simply erroring and telling them to use the bulk API.
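One possible middle ground for the memory concern above: rather than returning everything at once or returning nothing, hand back an iterator that streams matching rows from the on-disk cache in chunks. Again a hedged sketch with invented names, using stdlib SQLite in place of a DuckDB/parquet lazy query:

```python
# Hypothetical sketch: stream rows from the on-disk cache in chunks so a
# too-large request never has to fit in memory. stream_cached() is an
# invented name; SQLite stands in for DuckDB/parquet.
import sqlite3


def stream_cached(db_path, series, chunk_size=10_000):
    """Yield matching rows lazily; the caller chooses to iterate or collect."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(
            "SELECT counterpart, year, value FROM ids WHERE series = ?",
            (series,),
        )
        while True:
            chunk = cur.fetchmany(chunk_size)
            if not chunk:
                break
            yield from chunk
    finally:
        con.close()
```

Whether a lazy handle like this is friendlier than erroring and pointing the user at the bulk API is exactly the open design question; DuckDB's lazy evaluation over parquet would give the same behavior without custom chunking code.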