Fengttt spill doit mpool #23234
base: main
Unscrew. Expect tons of bugs ...
Now we have something that compiles, but there will be tons of bugs ...
And fix a few bugs.
yeah, I know ...
Get rid of totally unnecessary agg mem manager, let mpool do its job.
This has to go ...
Speechless. WTF are we doing -- a damn alloc goes through about 6 or 7 levels of abstraction: MPool.Alloc -> ManagedAllocator -> MetricsAllocator -> ShardedAllocator -> ClassAllocator -> fixedSizeMmapAlloc
User description
What type of PR is this?
Which issue(s) this PR fixes:
issue #3433
What this PR does / why we need it:
Give it another try.
PR Type
Bug fix, Enhancement, Tests
Description
- Refactor memory management: Removed the AggMemoryManager interface abstraction and migrated all aggregation functions to use *mpool.MPool directly for simpler, more efficient memory management
- Implement group operator spilling: Added new Group and MergeGroup operators with spill-to-disk support in pkg/sql/colexec/group/ for memory-constrained execution of aggregations
- Add streaming serialization: Implemented SaveIntermediateResult(), SaveIntermediateResultOfChunk(), and UnmarshalFromReader() methods across all aggregation functions to support intermediate result persistence and spilling
- Migrate hash tables to mpool: Updated int64_hash_map and string_hash_map to use *mpool.MPool instead of malloc.Allocator for consistent memory tracking
- Refactor batch serialization: Replaced the Aggs field with ExtraBuf1 and ExtraBuf2 for flexible extra data storage, and added UnmarshalFromReader() for streaming deserialization
- Fix mpool allocation bugs: Corrected NoLock flag handling, refactored detail tracking with stack-based keys, increased the growth threshold from 256 to 4096, and added a FreeSlice() helper
- Add file system operations: Implemented raw file operation methods (EnsureDir(), OpenFile(), CreateFile(), RemoveFile()) in the ETL and subpath filesystems for spill support
- Support distinct aggregations in spilling: Enhanced result handling with distinct value tracking and serialization via distinctFill() and distinctMerge() methods
- Add streaming I/O utilities: Implemented reader/writer utility functions in pkg/container/types/encoding.go for type serialization/deserialization
- Simplify allocator API: Removed the NoHints and IgnoreMunmapError parameters from deallocate calls across the malloc and morpc packages
- Support MaxDop query parameter: Added logic to respect MaxDop in query execution planning
- Update window operator: Refactored to use colexec.ExprEvalVector and a separate batAggs field for aggregation handling
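Per the mpool.go entry in the file walkthrough, the NoLock fix replaces mp.noLock = (flag & NoFixed) with mp.noLock = (flag & NoLock). The sketch below illustrates why reading the wrong bit is easy to miss: a pool created with only NoFixed silently stops taking its lock, while a pool that asks for NoLock keeps locking. The flag values, the pool struct and both constructors are hypothetical stand-ins, not mpool's real definitions.

```go
package main

import "fmt"

// Hypothetical flag bits; mpool's real constants may differ.
const (
	NoFixed = 1 << iota // skip fixed-size pooling
	NoLock              // caller promises single-goroutine use
)

// pool is a stand-in for an mpool.MPool-like struct.
type pool struct {
	noFixed bool
	noLock  bool
}

// newPoolBuggy derives noLock from the NoFixed bit, as before the fix.
func newPoolBuggy(flag int) pool {
	return pool{noFixed: flag&NoFixed != 0, noLock: flag&NoFixed != 0}
}

// newPoolFixed derives each field from its own bit.
func newPoolFixed(flag int) pool {
	return pool{noFixed: flag&NoFixed != 0, noLock: flag&NoLock != 0}
}

func main() {
	// A pool that only asked for NoFixed must still lock.
	fmt.Println(newPoolBuggy(NoFixed).noLock) // true: locking silently skipped
	fmt.Println(newPoolFixed(NoFixed).noLock) // false
	fmt.Println(newPoolFixed(NoLock).noLock)  // true
}
```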
File Walkthrough
4 files
jsonagg_test.go
Refactor memory manager usage to use mpool directly
pkg/sql/colexec/aggexec/jsonagg_test.go
- hackAggMemoryManager() calls with mpool.MustNewZero() to use memory pool directly
- mg.Mp() to mg since memory pool is now passed directly
- unmarshal() calls to pass mg as first parameter instead of nil
- marshalToBytes() call to handle additional return value
aggFrame_test.go
Remove AggMemoryManager abstraction, use mpool directly
pkg/sql/colexec/aggexec/aggFrame_test.go
- hackManager wrapper struct and hackAggMemoryManager() function
- MarshalBinary() method to avgDemoCtx struct for binary marshaling support
- *mpool.MPool directly instead of AggMemoryManager interface
- mpool.MustNewZeroNoFixed() and pass memory pool directly
- free() calls to distinctHash objects in tests
types.go
Remove AggMemoryManager interface, add serialization support
pkg/sql/colexec/aggexec/types.go
- AggMemoryManager interface and SimpleAggMemoryManager struct
- *mpool.MPool directly instead of AggMemoryManager
- AggFuncExec interface: SaveIntermediateResult(), SaveIntermediateResultOfChunk(), UnmarshalFromReader() with binary encoding
- MarshalToBuffer() and UnmarshalFromReader() methods to AggFuncExecExpression
managed_allocator_test.go
Update deallocate calls to remove NoHints parameter
pkg/common/malloc/managed_allocator_test.go
- Deallocate() calls to remove NoHints parameter (now uses default)
34 files
helper.go
Add group operator spill helper functions
pkg/sql/colexec/group/helper.go
- …loading
- ResHashRelated struct for hash table management with spill support
- …disk, and loading spilled data
- …creation
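The helper.go entry describes spilling the group hash table to disk and loading it back. Here is a minimal sketch of the general technique (hash-partitioned spilling for SUM(val) GROUP BY key). It assumes int64 keys and values, a fixed fan-out of four buckets, and one temp file per bucket created with os.CreateTemp. The names spillAgg, numBuckets and readBucket are hypothetical. The real operator's ResHashRelated type, bucket layout and file format are not shown in this summary.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

const numBuckets = 4 // hypothetical spill fan-out

// spillAgg computes SUM(val) GROUP BY key with at most maxGroups groups in
// memory; overflow is hash-partitioned into per-bucket temp files.
type spillAgg struct {
	maxGroups int
	groups    map[int64]int64
	buckets   [numBuckets]*os.File
}

// add folds one input row into memory and spills if the budget is exceeded.
func (a *spillAgg) add(key, val int64) error {
	a.groups[key] += val
	if len(a.groups) > a.maxGroups {
		return a.spill()
	}
	return nil
}

// spill appends each (key, partial sum) to its bucket file, then clears memory.
// A given key always hashes to the same bucket.
func (a *spillAgg) spill() error {
	for key, sum := range a.groups {
		b := int(uint64(key) % numBuckets)
		if a.buckets[b] == nil {
			f, err := os.CreateTemp("", "group-spill-*")
			if err != nil {
				return err
			}
			a.buckets[b] = f
		}
		if err := binary.Write(a.buckets[b], binary.LittleEndian, [2]int64{key, sum}); err != nil {
			return err
		}
	}
	a.groups = map[int64]int64{}
	return nil
}

// readBucket replays one spill file into part, then closes and deletes it.
func readBucket(f *os.File, part map[int64]int64) error {
	defer func() { f.Close(); os.Remove(f.Name()) }()
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return err
	}
	for {
		var rec [2]int64
		err := binary.Read(f, binary.LittleEndian, &rec)
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		part[rec[0]] += rec[1] // a key appears once per spill round
	}
}

// finish flushes the rest, then merges one bucket at a time, so only that
// bucket's groups are resident while it is rebuilt and emitted.
func (a *spillAgg) finish() (map[int64]int64, error) {
	if err := a.spill(); err != nil {
		return nil, err
	}
	out := map[int64]int64{}
	for _, f := range a.buckets {
		if f == nil {
			continue
		}
		part := map[int64]int64{}
		if err := readBucket(f, part); err != nil {
			return nil, err
		}
		for key, sum := range part { // "emit" the finished bucket
			out[key] = sum
		}
	}
	return out, nil
}

func main() {
	agg := &spillAgg{maxGroups: 2, groups: map[int64]int64{}}
	for _, r := range [][2]int64{{1, 10}, {2, 20}, {3, 30}, {1, 5}, {4, 1}, {2, 2}} {
		if err := agg.add(r[0], r[1]); err != nil {
			panic(err)
		}
	}
	res, err := agg.finish()
	fmt.Println(res[1], res[2], res[3], res[4], err) // 15 22 30 1 <nil>
}
```

A key always hashes to the same bucket, so each bucket can be merged on its own after input ends. During the merge, peak memory is one bucket's groups rather than the whole table.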
exec2.go
Add group operator execution with spill support
pkg/sql/colexec/group/exec2.go
- Prepare() and Call() methods for group operator state machine
- …functions
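exec2.go describes Prepare() and Call() as driving a state machine for the group operator. Purely as an illustration, here is one way such a machine can be shaped for a blocking aggregation. It consumes every input batch, then hands out results in fixed-size chunks, then reports completion. The groupOp type, the opState values, the next callback and chunkSize are assumptions. The actual operator's states and pipeline API are not listed here.

```go
package main

import "fmt"

type opState int

const (
	stateBuild  opState = iota // consume all input and aggregate
	stateOutput                // emit finished groups chunk by chunk
	stateEnd                   // nothing left to return
)

// groupOp is a hypothetical COUNT(*) GROUP BY operator driven by Call().
type groupOp struct {
	state     opState
	next      func() ([]string, bool) // pulls one input batch; false at end
	chunkSize int
	counts    map[string]int
	keys      []string // groups in first-seen order
	pos       int      // next group to emit
}

// Prepare (re)initializes per-query state.
func (op *groupOp) Prepare() {
	op.state, op.counts, op.keys, op.pos = stateBuild, map[string]int{}, nil, 0
}

// Call advances the state machine and returns the next output chunk,
// or nil once every group has been emitted.
func (op *groupOp) Call() map[string]int {
	for {
		switch op.state {
		case stateBuild:
			batch, ok := op.next()
			if !ok {
				op.state = stateOutput // input exhausted: switch to emitting
				continue
			}
			for _, k := range batch {
				if _, seen := op.counts[k]; !seen {
					op.keys = append(op.keys, k)
				}
				op.counts[k]++
			}
		case stateOutput:
			if op.pos >= len(op.keys) {
				op.state = stateEnd
				continue
			}
			end := op.pos + op.chunkSize
			if end > len(op.keys) {
				end = len(op.keys)
			}
			chunk := make(map[string]int, end-op.pos)
			for _, k := range op.keys[op.pos:end] {
				chunk[k] = op.counts[k]
			}
			op.pos = end
			return chunk
		default:
			return nil
		}
	}
}

func main() {
	input := [][]string{{"a", "b", "a"}, {"c", "a"}}
	op := &groupOp{chunkSize: 2, next: func() ([]string, bool) {
		if len(input) == 0 {
			return nil, false
		}
		b := input[0]
		input = input[1:]
		return b, true
	}}
	op.Prepare()
	for chunk := op.Call(); chunk != nil; chunk = op.Call() {
		fmt.Println(chunk) // map[a:3 b:1], then map[c:1]
	}
}
```

The loop inside Call() lets one invocation fall through several states, for example from the last input batch straight into the first output chunk. This keeps the caller's contract simple: every call either returns data or signals the end.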
result.go
Refactor aggregation result handling with distinct support
pkg/sql/colexec/aggexec/result.go
- AggMemoryManager interface with direct *mpool.MPool parameter throughout initialization functions
- hasDistinct parameter to result initialization and tracking distinct aggregations
- …data serialization via marshalToBytes(), unmarshalFromBytes(), and reader-based I/O
- distinctFill() and distinctMerge() methods to handle distinct value tracking in aggregation results
- setupT() method for type-safe vector setup and getNthChunkSize() for chunk size queries
string_hash_map.go
Migrate string hash map to mpool allocator
pkg/container/hashtable/string_hash_map.go
- malloc.Allocator with *mpool.MPool for memory management
- rawData and rawDataDeallocators fields, using mpool.MakeSlice() for allocation
- allocate() to work with slice count instead of byte size
- Free(), ResizeOnDemand(), and Size() methods to use mpool API
- AllGroupHash() method to retrieve hash values for all groups
int64_hash_map.go
Migrate int64 hash map to mpool allocator
pkg/container/hashtable/int64_hash_map.go
- malloc.Allocator with *mpool.MPool for memory management
- rawData and rawDataDeallocators fields, using mpool.MakeSlice() for allocation
- allocate() to work with slice count instead of byte size
- Free(), ResizeOnDemand(), and Size() methods to use mpool API
- AllGroupHash() method to retrieve hash values for all groups
batch.go
Refactor batch serialization to use extra buffers
pkg/container/batch/batch.go
- MarshalBinary() and UnmarshalBinaryWithAnyMp() Aggs field handling with ExtraBuf1 and ExtraBuf2 for flexible extra data storage
- MarshalBinaryWithBuffer() with optional buffer reset parameter
- UnmarshalFromReader() for streaming deserialization from io.Reader
- Clean() and IsEmpty() to work with new buffer fields instead of aggregations
jsonagg.go
Add streaming serialization to JSON aggregation
pkg/sql/colexec/aggexec/jsonagg.go
- marshal() to handle three-part return from marshalToBytes() including distinct data
- SaveIntermediateResult() and SaveIntermediateResultOfChunk() methods for buffer-based serialization
- UnmarshalFromReader() for streaming deserialization with distinct support
- unmarshal() to pass nil for distinct data parameter and use mp parameter
- *mpool.MPool directly and initialize distinctHash with memory pool
concat.go
Refactor group concat with distinct in result
pkg/sql/colexec/aggexec/concat.go
- distinctHash field from groupConcatExec struct
- marshal() to handle three-part return from marshalToBytes() with distinct data
- SaveIntermediateResult(), SaveIntermediateResultOfChunk(), and UnmarshalFromReader() methods
- unmarshal() to pass distinct data from groups parameter
- Fill() to use distinctFill() method from result object instead of embedded hash
- merge() to use distinctMerge() from result object
- *mpool.MPool and pass hasDistinct to result initialization
fromFixedRetBytes.go
Add streaming serialization to fixed-to-bytes aggregation
pkg/sql/colexec/aggexec/fromFixedRetBytes.go
- AggMemoryManager parameter with *mpool.MPool in constructor and initialization
- SaveIntermediateResult(), SaveIntermediateResultOfChunk(), and UnmarshalFromReader() methods for streaming serialization
- marshal() to validate distinct data is nil
- unmarshal() to pass nil for distinct data parameter
- init() to create distinctHash with memory pool when distinct is enabled
- …hasDistinct parameter
count.go
Refactor aggregation serialization and memory management
pkg/sql/colexec/aggexec/count.go
- bytes, io, and moerr packages
- marshal() to handle distinct hash data returned from marshalToBytes()
- SaveIntermediateResult(), SaveIntermediateResultOfChunk(), and UnmarshalFromReader() for intermediate result serialization
- *mpool.MPool instead of AggMemoryManager
- unmarshal() to pass groups parameter to unmarshalFromBytes()
- distinctHash.mp field
approx_count.go
Add intermediate result serialization for approx count
pkg/sql/colexec/aggexec/approx_count.go
- bytes, io, and moerr packages
- SaveIntermediateResult(), SaveIntermediateResultOfChunk(), and UnmarshalFromReader()
- marshal() to validate that distinct data is nil
- *mpool.MPool instead of AggMemoryManager
- unmarshal() calls to pass nil for distinct parameter
median.go
Implement serialization for median aggregation function
pkg/sql/colexec/aggexec/median.go
- bytes and io packages
- marshal() to validate distinct is nil
- *mpool.MPool instead of AggMemoryManager
- distinctHash.mp and pass distinct flag to result initialization
fromFixedRetFixed.go
Add serialization support for fixed-to-fixed aggregations
pkg/sql/colexec/aggexec/fromFixedRetFixed.go
- bytes, io, and moerr packages
- marshal() to validate distinct is nil
- *mpool.MPool instead of AggMemoryManager
- newDistinctHash() call to pass memory pool parameter
encoding.go
Add streaming I/O utility functions for types
pkg/container/types/encoding.go
- …from/to readers and writers
- WriteSizeBytes(), ReadInt64(), ReadUint64(), WriteInt64(), WriteUint64(), ReadBool(), ReadInt32(), WriteInt32(), ReadInt32AsInt(), ReadByte(), ReadByteAsInt(), ReadType(), ReadSizeBytes(), ReadSizeBytesMp()
- …memory pool integration
window.go
Refactor window operator aggregation handling
pkg/sql/colexec/window/window.go
- group.ExprEvalVector to colexec.ExprEvalVector
- ctr.bat.Aggs to ctr.batAggs field
- MakeAgg() call to use proc.Mp() instead of full process object
- mergegroup package
window.go
Implement serialization for window functions
pkg/sql/colexec/aggexec/window.go
- bytes and io packages
- i64Slice type with MarshalBinary() method
- *mpool.MPool instead of AggMemoryManager
- unmarshal() to pass nil for distinct parameter
fromBytesRetFixed.go
Add serialization for bytes-to-fixed aggregations
pkg/sql/colexec/aggexec/fromBytesRetFixed.go
- bytes, io, and moerr packages
- marshal() to validate distinct is nil
- *mpool.MPool instead of AggMemoryManager
- newDistinctHash() call to pass memory pool parameter
fromBytesRetBytes.go
Add serialization for bytes-to-bytes aggregations
pkg/sql/colexec/aggexec/fromBytesRetBytes.go
- bytes, io, and moerr packages
- marshal() to validate distinct is nil
- *mpool.MPool instead of AggMemoryManager
- newDistinctHash() call to pass memory pool parameter
remoterun.go
Migrate MergeGroup to group package
pkg/sql/compile/remoterun.go
- mergegroup package
- PreAllocSize field from Group pipeline instruction
- MergeGroup type from mergegroup.MergeGroup to group.MergeGroup
- DecodeMergeGroup() function signature to use group.MergeGroup
- SpillMem field assignment for MergeGroup
strhashmap.go
Update string hashmap to use memory pool
pkg/common/hashmap/strhashmap.go
- NewStrHashMap() to accept *mpool.MPool parameter instead of using nil
- UnmarshalBinary() and UnmarshalFrom() to use *mpool.MPool instead of malloc.Allocator
- AllGroupHash() method to return all group hash codes
distinct.go
Refactor distinct hash to use memory pool
pkg/sql/colexec/aggexec/distinct.go
- mp field to distinctHash struct to store memory pool reference
- newDistinctHash() to accept and store *mpool.MPool parameter
- grows() to pass memory pool to NewStrHashMap()
- marshalToBuffers() method for selective marshaling based on flags
- unmarshal() and added unmarshalFromReader() for reader-based deserialization
- hashtable.DefaultAllocator() to memory pool
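Several entries above move hash-table storage from byte-oriented malloc.Allocator calls to mpool.MakeSlice(), and allocate() now counts elements rather than bytes. distinct.go likewise drops hashtable.DefaultAllocator() in favor of the memory pool. The hypothetical sketch below shows that shape. A pool tracks only bytes in use, and typed helpers take an element count and charge n * sizeof(T). The names trackingPool, makeSlice and freeSlice are illustrative, and mpool's real signatures may differ. The helpers are package-level generic functions because Go methods cannot declare their own type parameters.

```go
package main

import (
	"fmt"
	"unsafe"
)

// trackingPool is a hypothetical stand-in for *mpool.MPool: it only counts
// bytes in use, which is what makes per-pool memory limits possible.
type trackingPool struct {
	inUse int64
	limit int64
}

// makeSlice allocates n elements of T and charges n*sizeof(T) bytes to p.
// Callers think in element counts; the pool converts to bytes.
func makeSlice[T any](p *trackingPool, n int) ([]T, error) {
	var zero T
	size := int64(n) * int64(unsafe.Sizeof(zero))
	if p.inUse+size > p.limit {
		return nil, fmt.Errorf("pool limit exceeded: need %d, have %d", size, p.limit-p.inUse)
	}
	p.inUse += size
	return make([]T, n), nil
}

// freeSlice returns the bytes charged for s (by capacity) to p.
func freeSlice[T any](p *trackingPool, s []T) {
	var zero T
	p.inUse -= int64(cap(s)) * int64(unsafe.Sizeof(zero))
}

func main() {
	p := &trackingPool{limit: 1 << 10}
	hashes, err := makeSlice[uint64](p, 64) // 64 elements * 8 bytes
	if err != nil {
		panic(err)
	}
	fmt.Println(len(hashes), p.inUse) // 64 512
	freeSlice(p, hashes)
	fmt.Println(p.inUse) // 0
}
```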
inthashmap.go
Update integer hashmap to use memory pool
pkg/common/hashmap/inthashmap.go
- NewIntHashMap() to accept *mpool.MPool parameter instead of using nil
- UnmarshalBinary() and UnmarshalFrom() to use *mpool.MPool instead of malloc.Allocator
- AllGroupHash() method to return all group hash codes
sub_path.go
Add raw file operation methods to subpath filesystem
pkg/fileservice/sub_path.go
- os package
- EnsureDir() method to create directories
- OpenFile(), CreateFile(), RemoveFile(), and CreateAndRemoveFile() methods for raw file operations
compile.go
Support MaxDop query parameter and migrate MergeGroup
pkg/sql/compile/compile.go
- mergegroup package
- MaxDop query parameter when calculating DOP
- MergeGroup type reference from mergegroup.MergeGroup to group.MergeGroup
- constructMergeGroup() call to pass plan node parameter
operator.go
Remove PreAllocSize and migrate MergeGroup type
pkg/sql/compile/operator.go
- mergegroup package
- PreAllocSize field from Group operator duplication
- …constructGroup()
- MergeGroup type from mergegroup.MergeGroup to group.MergeGroup
- constructMergeGroup() to accept plan node and set SpillMem
local_etl_fs.go
Add raw file operation methods to ETL filesystem
pkg/fileservice/local_etl_fs.go
- EnsureDir() method for directory creation
- OpenFile() for opening files in read-write mode
- CreateFile() for creating or truncating files
- RemoveFile() for file deletion
- CreateAndRemoveFile() for temporary file operations
vector.go
Add streaming deserialization for vectors
pkg/container/vector/vector.go
- io package
- UnmarshalWithReader() method for streaming deserialization from reader
- …space from reader
sendfunc.go
Update dispatch send functions for batch marshaling
pkg/sql/colexec/dispatch/sendfunc.go
- MarshalBinaryWithBuffer() calls to pass true parameter
var_pop.go
Add MarshalBinary methods to var_pop contexts
pkg/sql/plan/function/agg/var_pop.go
- math before other imports
- MarshalBinary() methods to three context types for binary marshaling support
buffer.go
Update spool buffer to use extra buffers instead of aggs
pkg/container/pSpool/buffer.go
- sync before other imports
- bat.Aggs cleanup with ExtraBuf1 and ExtraBuf2 cleanup
types.go
Refactor window container aggregation handling
pkg/sql/colexec/window/types.go
- group.ExprEvalVector to colexec.ExprEvalVector
- batAggs field to container struct
- freeAggFun() to use batAggs instead of bat.Aggs
copy.go
Update spool copy to use extra buffers
pkg/container/pSpool/copy.go
- math before other imports
- bat.Aggs with ExtraBuf1 and ExtraBuf2 in batch copying
var_sample.go
Add MarshalBinary methods to var_sample contexts
pkg/sql/plan/function/agg/var_sample.go
- MarshalBinary() methods to three context types for binary marshaling support
sample.go
Update sample operator for memory pool integration
pkg/sql/colexec/sample/sample.go
- hashAndSample() method signature to accept *process.Process parameter
- NewIntHashMap() and NewStrHashMap() calls to pass proc.Mp() parameter
5 files
mpool.go
Refactor mpool detail tracking and fix allocation bugs
pkg/common/mpool/mpool.go
- recordAlloc() and recordFree() to accept detail key string parameter instead of computing it internally
- getDetailK() method to compute stack-based detail key for memory tracking
- mp.noLock = (flag & NoFixed) to mp.noLock = (flag & NoLock)
- Alloc(), Free(), Grow() methods to use detail key parameter variants
- allocWithDetailK(), freeWithDetailK(), freePtr(), growWithDetailK(), reAllocWithDetailK() internal methods
- FreeSlice() helper function for freeing slices allocated from mpool
- IgnoreMunmapError flag from deallocate call
checked_allocator.go
Simplify allocator deallocation call
pkg/common/malloc/checked_allocator.go
- DoNotReuse hint flag from deallocator call
- Deallocate() call to use no parameters
codec.go
Remove NoHints parameter from deallocate calls
pkg/common/morpc/codec.go
- malloc.NoHints parameter from all Deallocate() calls
- …method
checked_allocator_test.go
Remove NoHints parameter from deallocate calls
pkg/common/malloc/checked_allocator_test.go
- NoHints parameter from all Deallocate() calls (3 instances)
managed_allocator.go
Remove hints parameter from deallocate method
pkg/common/malloc/managed_allocator.go
- hints parameter from Deallocate() method signature
- deallocate() call to not pass hints parameter
9 files
timewin_test.go
Simplify test memory management by removing wrapper
pkg/sql/colexec/timewin/timewin_test.go
- testAggMemoryManager wrapper struct and newTestAggMemoryManager() factory function
- mpool.MustNewZeroNoFixed() directly instead of memory manager wrapper
- *mpool.MPool instead of through interface
hash_test.go
Update hash table tests for mpool integration
pkg/container/hashtable/hash_test.go
- *mpool.MPool instead of DefaultAllocator()
- Init() and UnmarshalBinary() calls to pass memory pool directly
- DefaultAllocator() function in test cases
result_test.go
Update aggregation result tests for new API
pkg/sql/colexec/aggexec/result_test.go
- SimpleAggMemoryManager wrapper with direct *mpool.MPool usage
- init() calls to pass hasDistinct parameter (set to false for tests)
- marshalToBytes() and unmarshalFromBytes() calls to handle three-part return with distinct data
inthashmap_test.go
Update int hash map tests for mpool parameter
pkg/common/hashmap/inthashmap_test.go
- NewIntHashMap() calls to pass *mpool.MPool parameter
- UnmarshalBinary() calls to use memory pool instead of hashtable.DefaultAllocator()
- hashtable package
strhashmap_test.go
Update string hash map tests for mpool parameter
pkg/common/hashmap/strhashmap_test.go
- NewStrHashMap() calls to pass *mpool.MPool parameter
- UnmarshalBinary() calls to use memory pool instead of hashtable.DefaultAllocator()
- hashtable package
median_test.go
Simplify median test to use memory pool directly
pkg/sql/colexec/aggexec/median_test.go
- hackAggMemoryManager() with direct mpool.MustNewZeroNoFixed() calls
- mp variable directly
batch_test.go
Update batch tests for extra buffer fields
pkg/container/batch/batch_test.go
- ExtraBuf1 and ExtraBuf2 fields in batch marshaling tests
- MarshalBinaryWithBuffer() call to pass true parameter
- UnmarshalFromReader() method
mpool_test.go
Update mpool tests for realloc method signature
pkg/common/mpool/mpool_test.go
- reAlloc() calls to use reAllocWithDetailK() with detail key parameter
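The mpool entries describe a stack-based detail key. getDetailK() computes it, recordAlloc()/recordFree() and the *WithDetailK() variants receive it, and the updated tests pass one to reAllocWithDetailK(). The summary does not show how the key is built. What follows is a hypothetical sketch that keys allocations by call site using runtime.Caller. The names detailKey, tracker, allocWithKey and freeWithKey are illustrative. Computing the key once at the public entry point and passing it down means the stack depth is fixed in one place, which is one plausible reason for the parameter change.

```go
package main

import (
	"fmt"
	"runtime"
)

// detailKey returns "file:line" for a frame above it on the stack:
// skip=0 is the function that called detailKey, skip=1 is its caller.
func detailKey(skip int) string {
	_, file, line, ok := runtime.Caller(skip + 1)
	if !ok {
		return "unknown"
	}
	return fmt.Sprintf("%s:%d", file, line)
}

// tracker is a hypothetical per-call-site byte counter.
type tracker struct{ bytes map[string]int }

// Alloc computes the key once, at the public entry point, so internal
// helpers never need to know how deep in the stack they are.
func (t *tracker) Alloc(n int) ([]byte, string) {
	k := detailKey(1) // the code that called Alloc
	return t.allocWithKey(n, k), k
}

func (t *tracker) allocWithKey(n int, k string) []byte {
	t.bytes[k] += n
	return make([]byte, n)
}

func (t *tracker) freeWithKey(b []byte, k string) { t.bytes[k] -= len(b) }

func main() {
	tr := &tracker{bytes: map[string]int{}}
	var keys []string
	for i := 0; i < 2; i++ {
		_, k := tr.Alloc(16) // same call site on both iterations
		keys = append(keys, k)
	}
	b, k2 := tr.Alloc(8) // a different call site
	fmt.Println(keys[0] == keys[1], keys[0] == k2, tr.bytes[keys[0]]) // true false 32
	tr.freeWithKey(b, k2)
	fmt.Println(tr.bytes[k2]) // 0
}
```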
remoterun_test.go
Update remoterun tests for MergeGroup migration
pkg/sql/compile/remoterun_test.go
- mergegroup package
- MergeGroup type from mergegroup.MergeGroup to group.MergeGroup
2 files
types2.go
Add new group operator with spill support
pkg/sql/colexec/group/types2.go
- Group and MergeGroup operators for aggregation with spill-to-disk support
- …and spill buckets for memory-constrained execution
- …with configurable spill memory thresholds
- Prepare(), Call(), Free(), Reset() for pipeline integration
mergeGroup.go
Add merge group operator for result aggregation
pkg/sql/colexec/group/mergeGroup.go
- MergeGroup operator for merging partial aggregation results
- …from lower operators
- …support
- Prepare() and Call() methods for pipeline execution
70 files