-
Couldn't load subscription status.
- Fork 118
[For Review Only] Add APIs for deletion vector writes #1269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| use crate::scan::ScanBuilder; | ||
| use crate::schema::{MapType, SchemaRef, StructField, StructType}; | ||
| use crate::snapshot::Snapshot; | ||
| use roaring::RoaringTreemap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to expose 3rd party types in the Public API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't think we should. That will be a nightmare over ffi as well.
I think we can start with taking just a vec of indexes, or something similar, and we can add more fancy APIs as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally makes sense to me. I'd still like a quick 1DD that considers if there are any alternatives and that we get sign-off from a few folks
| use crate::scan::ScanBuilder; | ||
| use crate::schema::{MapType, SchemaRef, StructField, StructType}; | ||
| use crate::snapshot::Snapshot; | ||
| use roaring::RoaringTreemap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't think we should. That will be a nightmare over ffi as well.
I think we can start with taking just a vec of indexes, or something similar, and we can add more fancy APIs as needed.
| self | ||
| } | ||
|
|
||
| /// Enable the writing of deletion vectors. This must be set before creating a scan builder to read data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get the comment. A txn doesn't create a scan, so why do I need to set this if I want to scan?
| // * new_deleted_rows - The collection of new deletion vectors to apply to the files. Each | ||
| // element corresponds a non-filtered row in file_metadata. | ||
| // | ||
| // * existing_deleted_rows - The collection of existing deletion vectors corresponding to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this? the data in the file_metadata already has the associated dvinfo. We might have to call visit_scan_metadata in here, but we're already going to loop over each entry in file_metadata so let's just read the dvinfo there.
Perhaps you were thinking that we could save the I/O since we might have already read the dv in order to figure out which new rows to delete?
|
closing will have new APIs with implementation up in a bit. |
🥞 Stacked PR
Use this link to review incremental changes.