Skip to content

Conversation

@emkornfield
Copy link
Collaborator

@emkornfield emkornfield commented Sep 8, 2025

use crate::scan::ScanBuilder;
use crate::schema::{MapType, SchemaRef, StructField, StructType};
use crate::snapshot::Snapshot;
use roaring::RoaringTreemap;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to expose 3rd party types in the Public API?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think we should. That will be a nightmare over ffi as well.

I think we can start with taking just a vec of indexes, or something similar, and we can add more fancy APIs as needed.

Copy link
Collaborator

@nicklan nicklan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally makes sense to me. I'd still like a quick 1DD that considers if there are any alternatives and that we get sign-off from a few folks

use crate::scan::ScanBuilder;
use crate::schema::{MapType, SchemaRef, StructField, StructType};
use crate::snapshot::Snapshot;
use roaring::RoaringTreemap;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think we should. That will be a nightmare over ffi as well.

I think we can start with taking just a vec of indexes, or something similar, and we can add more fancy APIs as needed.

self
}

/// Enable the writing of deletion vectors. This must be set before creating a scan builder to read data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get the comment. A txn doesn't create a scan, so why do I need to set this if I want to scan?

// * new_deleted_rows - The collection of new deletion vectors to apply to the files. Each
// element corresponds a non-filtered row in file_metadata.
//
// * existing_deleted_rows - The collection of existing deletion vectors corresponding to the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this? the data in the file_metadata already has the associated dvinfo. We might have to call visit_scan_metadata in here, but we're already going to loop over each entry in file_metadata so let's just read the dvinfo there.

Perhaps you were thinking that we could save the I/O since we might have already read the dv in order to figure out which new rows to delete?

@emkornfield
Copy link
Collaborator Author

closing will have new APIs with implementation up in a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change Change that require a major version bump

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants