perf: don't use a vstack before indexing #25
@felix0097 Here is the no-vstack version. I don't want to merge it yet because it is generic over sparse and dense, and with sparse it doesn't help (and is more complicated). An overview of the two pipelines for comparison (after data fetching):
main

What's on main now will stack together the fetched chunks and then yield `batch_size` subsets from the vstacked result. If `preload_gpu` is enabled, the vstacking occurs just after the data is loaded onto the GPU.

this branch

With this branch, the chunks are either left alone or moved to the GPU if preloading is enabled. Batches are then yielded from them based on `batch_size`.

I put this on a branch first to make sure it benefits dense. If it does, I'll put the feature behind a flag, and we can turn it on for dense only (or for sparse too, if I have missed something perf-wise).
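
For anyone comparing the two pipelines, here is a rough, dense-only sketch of the difference. It is not the code in this PR: `iter_vstack` and `iter_no_vstack` are hypothetical names, the chunks are assumed to be plain NumPy arrays, and batches are assumed to be taken sequentially (no shuffling, no GPU preloading).

```python
import numpy as np


def iter_vstack(chunks, batch_size):
    """Roughly the main behaviour: stack all chunks once, then slice batches."""
    stacked = np.vstack(chunks)  # copies every chunk into one new array
    for start in range(0, stacked.shape[0], batch_size):
        yield stacked[start:start + batch_size]


def iter_no_vstack(chunks, batch_size):
    """Roughly the branch behaviour: yield batches from the chunks directly."""
    pending, n_pending = [], 0
    for chunk in chunks:
        offset, n_rows = 0, chunk.shape[0]
        while offset < n_rows:
            # Take only as many rows as the current batch still needs.
            take = min(batch_size - n_pending, n_rows - offset)
            pending.append(chunk[offset:offset + take])  # basic slice: a view
            n_pending += take
            offset += take
            if n_pending == batch_size:
                # Copy only when a batch straddles a chunk boundary.
                yield pending[0] if len(pending) == 1 else np.concatenate(pending)
                pending, n_pending = [], 0
    if pending:  # final, possibly short, batch
        yield pending[0] if len(pending) == 1 else np.concatenate(pending)


chunks = [
    np.arange(12).reshape(6, 2),
    np.arange(12, 20).reshape(4, 2),
    np.arange(20, 26).reshape(3, 2),
]
batches_main = list(iter_vstack(chunks, batch_size=4))
batches_branch = list(iter_no_vstack(chunks, batch_size=4))
```

Under these assumptions both generators yield the same batches in the same order; the second one only copies rows for batches that cross a chunk boundary, which is where the dense win would come from if there is one.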