
feat: support load/select parquet files into a single variant column. #18028


Merged
merged 9 commits into from
May 30, 2025

Conversation

youngsofun
Member

@youngsofun youngsofun commented May 28, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Example:

create table t(data variant, filename string, file_row_number int);
copy into t from (select $1, metadata$filename, metadata$file_row_number from @data/parquet/);
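
To make the row shape concrete, here is a minimal Python sketch of what the COPY above is expected to produce for each Parquet row, assuming `$1` yields the whole row as one object. It is only an illustration, not Databend code: `rows_for_table`, the sample records, the file name, and the 0-based row numbering are all assumptions introduced here.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def rows_for_table(
    filename: str,
    records: Iterable[Dict[str, Any]],
    first_row_number: int = 0,  # assumption: the real numbering base may differ
) -> Iterator[Tuple[Dict[str, Any], str, int]]:
    """Yield (data, filename, file_row_number) tuples, one per Parquet row."""
    for row_number, record in enumerate(records, start=first_row_number):
        # The whole record becomes the single variant value;
        # columns are not split into separate table columns.
        yield dict(record), filename, row_number


# Hypothetical file content: two rows with columns `id` and `name`.
sample = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
result = list(rows_for_table("data/parquet/f1.parquet", sample))
```

Each tuple lines up with the three columns of table `t`: the variant `data`, the `filename`, and the `file_row_number`.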

Arrow types not supported: Timestamp with timezone, Decimal with negative scale, Time32, Time64, and Map with a non-string key.
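
The list of unsupported types can be read as a small set of rules. The sketch below encodes them in Python as a rough aid for checking a schema by hand. `ArrowType` is a hypothetical, simplified type description (not a class from any Arrow library), treating only `Utf8` as a string key is an assumption, and how nested types are handled is not stated in this PR, so the check is deliberately flat.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArrowType:
    """Hypothetical, simplified description of an Arrow type."""
    name: str                       # e.g. "Timestamp", "Decimal", "Time32", "Map"
    timezone: Optional[str] = None  # only meaningful for Timestamp
    scale: int = 0                  # only meaningful for Decimal
    key_type: Optional[str] = None  # only meaningful for Map


def unsupported_reason(t: ArrowType) -> Optional[str]:
    """Return the matching entry from the list above, or None if none matches."""
    if t.name == "Timestamp" and t.timezone is not None:
        return "Timestamp with timezone"
    if t.name == "Decimal" and t.scale < 0:
        return "Decimal with negative scale"
    if t.name in ("Time32", "Time64"):
        return t.name
    # Assumption: only "Utf8" counts as a string key in this sketch.
    if t.name == "Map" and t.key_type != "Utf8":
        return "Map with non-string key"
    return None
```

For instance, `unsupported_reason(ArrowType("Timestamp", timezone="UTC"))` returns `"Timestamp with timezone"`, while a plain `ArrowType("Timestamp")` returns `None`.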

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label May 28, 2025
@youngsofun youngsofun force-pushed the parquet_variant branch 2 times, most recently from 7402440 to bba3c46 Compare May 28, 2025 21:43
@youngsofun youngsofun requested review from b41sh and sundy-li May 29, 2025 00:33
@sundy-li
Member

sundy-li commented May 29, 2025

So `$1` from Parquet means querying all columns into a single variant column?

What if the Parquet file has only one column?

@sundy-li sundy-li requested a review from KKould May 29, 2025 07:02
@youngsofun
Member Author

> So `$1` from Parquet means querying all columns into a single variant column?

Yes.

> What if the Parquet file has only one column?

The same: the Parquet root record is mapped to a variant object.
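
A small Python sketch of that answer, using JSON text as a stand-in for the variant value (an assumption made for illustration; `to_variant_json` and the sample records are hypothetical): a one-column record still serializes as an object keyed by the column name, not as the bare column value.

```python
import json
from typing import Any, Dict


def to_variant_json(record: Dict[str, Any]) -> str:
    """Serialize a root record as one JSON object keyed by column name."""
    return json.dumps(record)


multi = to_variant_json({"id": 1, "name": "a"})
single = to_variant_json({"id": 1})
print(multi)   # {"id": 1, "name": "a"}
print(single)  # {"id": 1}  (an object with one key, not the bare value 1)
```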

@youngsofun
Member Author

This CI failure seems unrelated to this PR:
https://github.com/databendlabs/databend/actions/runs/15316437392/job/43096139604?pr=18028

0: query result mismatch:
[SQL] select count() from list_stage(location=> '@stage_av') where name like '%_sg%';
[Diff] (-expected|+actual)
-   1
+   6
at tests/sqllogictests/suites/ee/03_ee_vacuum/03_0004_auto_vacuum.test:194

cc @dantengsky

@dantengsky
Member

> This CI failure seems unrelated to this PR:
> https://github.com/databendlabs/databend/actions/runs/15316437392/job/43096139604?pr=18028
> ...

@youngsofun Thanks for letting me know. It is fixed (and merged) in #18038.

@BohuTANG BohuTANG merged commit 4382b95 into databendlabs:main May 30, 2025
78 checks passed
Development

Successfully merging this pull request may close these issues.

Feature: support query parquet files as a variant column.
5 participants