spec: Variant lower/upper bounds #12658
LGTM
Our original thinking was that the bounds would be collected from shredded column stats during the shredding process. But it does seem reasonable to me that bounds and shredding can be separated: if a writer knows the bounds and chooses not to shred, the bounds can still be used for pruning.
Improve the wording Co-authored-by: Ryan Blue <[email protected]>
Looks good to me. I think we need to clarify a bit here.
format/spec.md
Outdated
* `$` -- the Variant root value
* `$['user.name']` -- the field `"user.name"` in the root value that is a Variant object
* `$['location']['latitude']` -- the field `latitude` in a nested `location` object
* `$['ids']` -- the `ids` array
When I read the thread you raised on the dev list, I liked that you used "tags" as the example. Maybe we should change some of these to match the examples in the variant shredding spec?
Let me do that to match as much as possible.
Clarify some sentences. Co-authored-by: Russell Spitzer <[email protected]>
Thanks @RussellSpitzer @rdblue @Fokko @flyrain @huaxingao @XBaith for reviewing and everyone for voting. Since the vote passed, I'll go ahead and merge.
This is to revise the bounds specification for Variant. In summary:
The writer determines which fields to collect bounds for in a Variant column. Field bounds are stored as serialized Variant objects, where each key is a normalized JSON path identifying a field, and each value is the corresponding lower or upper bound.
E.g.
For a Variant column with the schema as follows:
The collected bound object looks like:
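
As a rough sketch of the idea (not the example from this PR), the snippet below shows how a writer might build path-keyed lower and upper bound objects for the fields it chooses to track. Everything here is an assumption made for illustration: the helper names `normalized_path`, `lookup`, and `collect_bounds` are hypothetical, plain Python dicts stand in for serialized Variant objects, the sample rows are made up, and each tracked field is assumed to hold values of a single comparable type. Escaping of special characters in field names is not handled.

```python
from typing import Any, Dict, Iterable, Optional, Sequence, Tuple


def normalized_path(fields: Sequence[str]) -> str:
    """Build a path such as $['location']['latitude'] from field names.

    An empty sequence yields "$", the Variant root.
    """
    return "$" + "".join(f"['{name}']" for name in fields)


def lookup(value: Any, fields: Sequence[str]) -> Optional[Any]:
    """Walk nested dicts by field name; return None if any step is missing."""
    for name in fields:
        if not isinstance(value, dict) or name not in value:
            return None
        value = value[name]
    return value


def collect_bounds(
    rows: Iterable[dict], tracked: Sequence[Sequence[str]]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (lower, upper) dicts keyed by normalized path.

    Only the fields listed in `tracked` are considered, mirroring the
    statement that the writer decides which fields get bounds.
    """
    lower: Dict[str, Any] = {}
    upper: Dict[str, Any] = {}
    for row in rows:
        for fields in tracked:
            v = lookup(row, fields)
            if v is None:
                continue  # field absent in this row; it contributes nothing
            key = normalized_path(fields)
            if key not in lower or v < lower[key]:
                lower[key] = v
            if key not in upper or v > upper[key]:
                upper[key] = v
    return lower, upper


# Made-up rows standing in for values of one Variant column.
rows = [
    {"user.name": "bob", "location": {"latitude": 37.7}},
    {"user.name": "alice", "location": {"latitude": 40.1}},
    {"ids": [1, 2]},
]
tracked = [("user.name",), ("location", "latitude")]
lower, upper = collect_bounds(rows, tracked)
```

Here `lower` and `upper` each map a normalized path to a single bound value. A reader that finds no entry for a path would have no bound to prune on for that field.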