perf: add FastReadMode with preloaded shared strings and FastRows iterator (C1) #2325
Open
AdamDrewsTR wants to merge 3 commits into
FastReadMode option:
- Add FastReadMode bool to Options struct; when enabled, preloads the entire shared strings table (xl/sharedStrings.xml) into a []string slice at first use, trading memory for O(1) per-cell lookups without ReadAt syscalls
- Add fastSST/fastSSTLoaded fields to File struct for the preloaded table
- Add preloadSharedStrings: SAX-parses sharedStrings.xml once, building a flat slice indexed by shared string index

FastRows iterator:
- Add FastRows struct with bufio.Reader-based byte scanning (256KB buffer)
- Add RowsFast method requiring FastReadMode; returns ErrParameterInvalid otherwise
- Direct byte scanning for `<row>`, `<c>`, `<v>`, `<is>` elements without xml.Decoder overhead; typically 30-50% faster than standard Rows
- Reusable row and cell buffers to minimize allocations
- Supports shared strings (type=s), inline strings, and boolean cells
- colRefToIndex helper for fast column letter parsing

Cell read optimizations:
- cellXMLHandler: hand-parse `<v>` via readCharData instead of DecodeElement, saving ~8M allocs per 100K×20 file; use locals for `<f>` and `<is>` to keep the caller's xlsxC on the stack
- xlsxSI.String(): fast path for simple strings with no rich text runs, avoiding strings.Builder allocation
- getValueFrom: add fastSST fast path for t="s" cells; skip formattedValue when raw=true or style=0 for shared strings, inline strings, and numeric cells
Codecov Report

✅ All modified and coverable lines are covered by tests.

| Coverage Diff | master | #2325 | +/- |
|---|---|---|---|
| Coverage | 99.60% | 99.60% | |
| Files | 32 | 33 | +1 |
| Lines | 26791 | 27171 | +380 |
| Hits | 26685 | 27065 | +380 |
| Misses | 55 | 55 | |
| Partials | 51 | 51 | |
Add FastReadMode option and RowsFast() method that provides a high-performance streaming row reader using byte-level XML parsing.

Includes:
- FastRows struct with zero-allocation row iteration
- preloadSharedStrings for efficient shared string table access
- colRefToIndex, readCharData optimizations for cellXMLHandler
- getValueFrom fast paths for fastSST, raw values, and zero-style cells
- Comprehensive test coverage (100% diff coverage)
- Add xlsxC.reset() method to clear cell struct for reuse
- Add cellBuf field to Rows struct for per-cell allocation avoidance
- Add numCols field to learn column count for slice pre-allocation
- Use colRefToIndex fast path when FastReadMode has preloaded SST
- Skip sharedStringsReader call when fastSSTLoaded is true

These optimizations reduce allocations in the traditional Rows.Columns() path, complementing the FastRows raw parser for read-heavy workloads.
## Summary

Add a `FastReadMode` option and `FastRows` iterator for high-performance bulk reading of large spreadsheets.

### FastReadMode Option

- Add `FastReadMode bool` to `Options` struct; when enabled, preloads the entire shared strings table (`xl/sharedStrings.xml`) into a `[]string` slice at first use, trading memory for O(1) per-cell lookups without `ReadAt` syscalls
- Add `fastSST`/`fastSSTLoaded` fields to `File` struct for the preloaded table
- Add `preloadSharedStrings`: SAX-parses `sharedStrings.xml` once, building a flat slice indexed by shared string index

### FastRows Iterator

- Add `FastRows` struct with `bufio.Reader`-based byte scanning (256KB buffer)
- Add `RowsFast` method requiring `FastReadMode`; returns `ErrParameterInvalid` otherwise
- Direct byte scanning for `<row>`, `<c>`, `<v>`, `<is>` elements without `xml.Decoder` overhead; typically 30-50% faster than standard `Rows`
- Reusable row and cell buffers to minimize allocations
- Supports shared strings (`t="s"`), inline strings, and boolean cells
- `colRefToIndex` helper for fast column letter parsing

### Cell Read Optimizations

- `cellXMLHandler`: hand-parse `<v>` via `readCharData` instead of `DecodeElement`, saving ~8M allocs per 100K×20 file; use locals for `<f>` and `<is>` to keep the caller's `xlsxC` on the stack
- `xlsxSI.String()`: fast path for simple strings with no rich text runs, avoiding `strings.Builder` allocation
- `getValueFrom`: add `fastSST` fast path for `t="s"` cells; skip `formattedValue` when `raw=true` or `style=0` for shared strings, inline strings, and numeric cells

### Usage