Description
- Load data in background: Users query as normal while the data is copied to Databend Cloud at the same time. Once the load is complete, users can query more efficiently.
There is no COPY here: we can transform the Parquet files into Fuse engine files directly. For example:
Users can create a table:
CREATE table xx ... location='s3://<user-bucket-path>' CONNECTION=...
If the location contains Parquet files that were not created by the Fuse engine, we can query them in the normal way:
- list all the Parquet files
- query them without any optimization (since they have no Fuse indexes)
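To make the cost of this unoptimized path concrete, here is a minimal Python sketch. It is not Databend code: the in-memory `store` dict stands in for the user's bucket, and `list_parquet_files` and `full_scan` are hypothetical helper names. It only illustrates that, without indexes, every listed file has to be read.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
# Hypothetical stand-in for an object store: key -> rows of one Parquet file.
Store = Dict[str, List[Row]]


def list_parquet_files(store: Store, prefix: str) -> List[str]:
    """Return every key under `prefix` that looks like a Parquet file."""
    return sorted(
        key for key in store
        if key.startswith(prefix) and key.endswith(".parquet")
    )


def full_scan(
    store: Store, prefix: str, predicate: Callable[[Row], bool]
) -> Tuple[List[Row], int]:
    """Read every file and filter row by row; no file can be skipped."""
    files = list_parquet_files(store, prefix)
    rows = [row for key in files for row in store[key] if predicate(row)]
    return rows, len(files)


# Assumed sample data, for illustration only.
store: Store = {
    "tbl/a.parquet": [{"id": 1}, {"id": 5}],
    "tbl/b.parquet": [{"id": 10}, {"id": 20}],
    "tbl/_meta.json": [],
    "other/c.parquet": [{"id": 10}],
}

rows, files_read = full_scan(store, "tbl/", lambda r: r["id"] >= 10)
```

Here both files under `tbl/` are read even though only one of them contains matching rows.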
If the user then runs an optimization such as:
optimize table xx; -- this statement syntax is a demo
We can:
- create min/max and all other Fuse indexes for the Parquet files without loading them
- convert all Parquet files to Fuse engine files, and store some metadata in metasrv
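As a rough illustration of why a min/max index helps, the Python sketch below builds per-file min/max values for one column and uses them to skip files that cannot match a range predicate. It is a sketch under stated assumptions, not the Fuse engine's actual index format: `build_minmax_index` and `prune` are hypothetical names, and rows are scanned in memory for simplicity, whereas a real implementation could likely take these values from the statistics stored in Parquet file footers.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def build_minmax_index(
    files: Dict[str, List[Row]], column: str
) -> Dict[str, Tuple[int, int]]:
    """Map each non-empty file to the (min, max) of `column`."""
    index: Dict[str, Tuple[int, int]] = {}
    for name, rows in files.items():
        values = [row[column] for row in rows]
        if values:
            index[name] = (min(values), max(values))
    return index


def prune(index: Dict[str, Tuple[int, int]], lo: int, hi: int) -> List[str]:
    """Keep only files whose [min, max] range overlaps [lo, hi]."""
    return sorted(
        name for name, (mn, mx) in index.items()
        if mx >= lo and mn <= hi
    )


# Assumed sample data, for illustration only.
files = {
    "a.parquet": [{"id": 1}, {"id": 5}],
    "b.parquet": [{"id": 10}, {"id": 20}],
    "c.parquet": [{"id": 30}, {"id": 42}],
}

index = build_minmax_index(files, "id")
candidates = prune(index, 8, 25)  # files that may hold 8 <= id <= 25
```

With this sample data only `b.parquet` survives pruning, so a query for `id` between 8 and 25 would read one file instead of three.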
I think @dantengsky has some ideas on this.
Originally posted by @BohuTANG in #7211 (comment)