Zero-copy converting for a location with many parquet files to fuse engine table


> * Load data in background: Users query as normal while the data is copied to Databend Cloud at the same time. Once the load is ready, users can query in a more efficient way.

There is no `COPY` here: we can transform the Parquet files to fuse engine files directly. For example:

Users can create a table:
```
CREATE table xx ... location='s3://<user-bucket-path>'  CONNECTION=...
```

If the location contains Parquet files that were not created by the fuse engine, we can query them in the normal way:
1. list all the Parquet files
2. query them without any optimization (since they have no fuse indexes)

If the user then runs an optimization such as:
```
optimize table xx; -- this statement syntax is a demo
```
We can:
1. create min/max and all other fuse indexes for the Parquet files without loading them
2. convert all Parquet files to fuse engine files, and store some metadata in metasrv
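
To make step 1 concrete, here is a minimal sketch of how a min/max index lets a query skip files. It is not Databend code: `FileStats`, `files_to_scan`, the file names, and the value ranges are all hypothetical, and it assumes the per-file min/max values can be collected from the column statistics in each Parquet footer rather than by reading the data itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical index entry: min/max of one column in one Parquet file."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(index, lower, upper):
    """Return the paths whose [min, max] range overlaps the query range [lower, upper]."""
    return [
        stats.path
        for stats in index
        if stats.max_value >= lower and stats.min_value <= upper
    ]


# Hypothetical statistics, as if collected from each file's footer metadata.
index = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 201, 300),
]

# Without an index, every listed file has to be scanned.
unoptimized = [stats.path for stats in index]

# With the min/max index, only files that can contain matching rows are scanned.
pruned = files_to_scan(index, lower=150, upper=180)
print(pruned)  # ['part-1.parquet']
```

For a predicate like `WHERE col BETWEEN 150 AND 180`, the unoptimized path reads all three files while the indexed path reads one. The real fuse indexes would presumably track more than a single integer column, so treat this only as an illustration of the pruning idea.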

I think @dantengsky has some ideas on this.

_Originally posted by @BohuTANG in https://github.com/datafuselabs/databend/issues/7211#issuecomment-1229847434_
