chore(engine): Create framework for query execution #17131

Draft
wants to merge 14 commits into
base: main

chaudum
Contributor

@chaudum chaudum commented Apr 11, 2025

What this PR does / why we need it

This PR implements basic query execution using pull-based iterators. These iterators are defined by the Pipeline interface.

// Pipeline represents a data processing pipeline that can read Arrow records.
// It provides methods to read data, access the current record, and close resources.
type Pipeline interface {
	// Read reads the next value into its state.
	// It returns an error if reading fails, and EOF once the pipeline is exhausted.
	// The implementation must retain the returned error in its state and return it with subsequent Value() calls.
	Read() error
	// Value returns the current value in state.
	Value() (arrow.Record, error)
	// Close closes the resources of the pipeline.
	// The implementation must close all of the pipeline's inputs.
	Close()
	// Inputs returns the inputs of the pipeline.
	Inputs() []Pipeline
	// Transport returns the type of transport of the implementation.
	Transport() Transport
}

Pipelines can be chained: a parent pulls results from its children by calling Read() on them.


This PR supersedes #17110


TODO

  • Implement iterator for DataObjScan
  • Implement iterator for Filter
  • Implement iterator for Projection
  • Implement sorting by timestamp column, not by column index 2
  • Correctly implement memory management for batches (arrow.Record)
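
For the sorting item, one possible shape is to resolve the sort column by name and then apply the resulting row order to every column. The sketch below only illustrates that idea and makes several assumptions: Table and Column are hypothetical stand-ins for an Arrow record, the column name "timestamp" is a guess, values are plain int64, and all columns are assumed to have the same length.

```go
package main

import (
	"fmt"
	"sort"
)

// Column and Table are hypothetical stand-ins for an Arrow record's columns.
type Column struct {
	Name   string
	Values []int64
}

type Table struct {
	Columns []Column
}

// columnIndex returns the position of the named column, or -1 if absent.
func columnIndex(t Table, name string) int {
	for i, c := range t.Columns {
		if c.Name == name {
			return i
		}
	}
	return -1
}

// sortByColumn returns a copy of t with rows ordered by ascending values of
// the named column. The input is left untouched.
func sortByColumn(t Table, name string) (Table, error) {
	idx := columnIndex(t, name)
	if idx < 0 {
		return Table{}, fmt.Errorf("column %q not found", name)
	}
	key := t.Columns[idx].Values

	// Compute the row permutation once, based on the key column.
	perm := make([]int, len(key))
	for i := range perm {
		perm[i] = i
	}
	sort.SliceStable(perm, func(i, j int) bool {
		return key[perm[i]] < key[perm[j]]
	})

	// Apply the same permutation to every column.
	out := Table{Columns: make([]Column, len(t.Columns))}
	for c, col := range t.Columns {
		vals := make([]int64, len(perm))
		for row, src := range perm {
			vals[row] = col.Values[src]
		}
		out.Columns[c] = Column{Name: col.Name, Values: vals}
	}
	return out, nil
}
```

Looking the column up by name keeps the sort correct if a projection reorders or drops columns, which a hard-coded index would not survive.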

@chaudum chaudum force-pushed the chaudum/chaudum/query-execution-pull-iterators branch 2 times, most recently from 34815a7 to 42b1064 Compare April 14, 2025 08:14
chaudum added 7 commits April 14, 2025 14:51
This is mainly for constructing simple plans for testing in the executor
package.

Signed-off-by: Christian Haudum <[email protected]>
Physical expressions take an arrow.Record as input for their evaluation
and return a ColumnVector.

Signed-off-by: Christian Haudum <[email protected]>
@chaudum chaudum force-pushed the chaudum/chaudum/query-execution-pull-iterators branch from 42b1064 to 514c765 Compare April 14, 2025 12:51
@owen-d owen-d force-pushed the chaudum/chaudum/query-execution-pull-iterators branch from bc5af5e to 063b39b Compare April 14, 2025 20:34
@owen-d owen-d force-pushed the chaudum/chaudum/query-execution-pull-iterators branch from f31a2cc to bb6138a Compare April 15, 2025 00:23
@owen-d owen-d force-pushed the chaudum/chaudum/query-execution-pull-iterators branch from bb6138a to 054aa86 Compare April 15, 2025 00:33
2 participants