Optimize the coordinates conversion and some internal functions performance #2320
Merged
- ColumnNumberToName: precompute all 16384 column names at init for O(1) lookup, eliminating per-call byte slice allocation
- CoordinatesToCellName: avoid unnecessary string concatenation with empty "$" prefix in non-absolute case
- namespaceStrictToTransitional: skip replacement loop entirely when no Strict namespace URIs are present (fast path for >99% of XLSX files)
- isNumeric: replace math/big.Float with strconv.ParseFloat, removing the math/big dependency and reducing allocations
- bstrUnmarshal: skip regex matching when "_x" is not present in the string, avoiding ~54 MB of regex allocations per 100K rows
- workSheetReader: cache readBytes result in a local variable to avoid reading the same sheet data twice during XML parsing
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##           master    #2320   +/-   ##
=======================================
  Coverage   99.60%   99.60%
=======================================
  Files          32       32
  Lines       26791    26803   +12
=======================================
+ Hits        26685    26697   +12
  Misses         55       55
  Partials       51       51
xuri (Member) approved these changes on May 22, 2026 and left a comment:
Thanks for your contribution. I've made some updates based on your branch.
Summary
This PR optimizes several frequently-called functions to reduce heap allocations and improve throughput for large spreadsheet operations. Each change targets a specific hot path identified through profiling.
Changes
`ColumnNumberToName`: O(1) lookup table (~16KB init cost)
Precomputes all 16,384 valid column names at package init into a flat `[]string` slice. Subsequent calls become a bounds check plus a slice index, with zero allocations.
Before: Each call allocated a `[]byte` and computed the column name via a division loop.
After: Single table lookup.
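As a rough illustration of the lookup-table idea, the sketch below builds the table once and then serves names by index. The names `maxColumns`, `columnNames`, `buildColumnNames`, and `columnName` are hypothetical stand-ins chosen for this example, not the identifiers used in the PR, and the error handling is simplified.

```go
package main

import (
	"errors"
	"fmt"
)

// maxColumns is the 16,384-column limit mentioned in the description.
const maxColumns = 16384

// columnNames[i] holds the name of column i+1 ("A", "B", ..., "XFD").
// It is filled once, during package initialization.
var columnNames = buildColumnNames()

// buildColumnNames runs the division loop once per column so that later
// lookups do not have to repeat it.
func buildColumnNames() []string {
	names := make([]string, maxColumns)
	for i := range names {
		var buf [3]byte // "XFD" is the longest valid name
		pos := len(buf)
		// Column names are bijective base 26: there is no zero digit,
		// hence the n-1 adjustments.
		for n := i + 1; n > 0; n = (n - 1) / 26 {
			pos--
			buf[pos] = byte('A' + (n-1)%26)
		}
		names[i] = string(buf[pos:])
	}
	return names
}

// columnName is a hypothetical stand-in for ColumnNumberToName:
// a bounds check followed by a slice index.
func columnName(num int) (string, error) {
	if num < 1 || num > maxColumns {
		return "", errors.New("column number out of range")
	}
	return columnNames[num-1], nil
}

func main() {
	for _, n := range []int{1, 26, 27, 16384} {
		name, err := columnName(n)
		fmt.Println(n, name, err)
	}
}
```

The trade-off is a fixed amount of memory and a one-time cost at startup in exchange for allocation-free lookups afterwards.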
`CoordinatesToCellName`: avoid empty-prefix concatenation
When `abs` is not set (the common case), the old code concatenated `"" + colName + "" + rowStr`, producing unnecessary string copies. The new code returns `colName + strconv.Itoa(row)` directly and early-returns for the absolute case.
`namespaceStrictToTransitional`: fast path for Transitional files
The vast majority of XLSX files use the Transitional namespace. All Strict namespace URIs contain `"purl.oclc.org"`, so a single `bytes.Contains` check can skip the entire replacement loop and avoid allocating a copy of the sheet XML. This applies to >99% of real-world files.
`isNumeric`: replace `math/big.Float` with `strconv.ParseFloat`
The `big.Float` parser allocates significantly more than `strconv.ParseFloat` for the same input. This also removes the `math/big` import entirely. The digit-counting step uses `strings.Count` instead of `strings.ReplaceAll` to avoid allocating a modified copy of the string.
`bstrUnmarshal`: skip regex when no escape sequences present
Over 99% of cell values contain no `_x` escape sequences. A simple `strings.Contains(s, "_x")` guard skips the regex entirely for these cells. In profiling, this eliminated ~54 MB of regex-related allocations per 100K rows.
`workSheetReader`: avoid double-reading sheet data
The original code called `f.readBytes(name)` twice: once for `getRootElement` and once for `Decode`. Caching the result in a local variable halves the I/O for worksheets read from temp files.
Benchmark Impact
These are foundational optimizations: the impact compounds with sheet size, since `ColumnNumberToName`, `CoordinatesToCellName`, `bstrUnmarshal`, and `namespaceStrictToTransitional` are called per cell or per sheet. Individually each saves microseconds; collectively they reduce GC pressure significantly for large workbooks.
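The two substring guards listed under Changes (for `namespaceStrictToTransitional` and `bstrUnmarshal`) follow the same pattern: a cheap `Contains` check decides whether the expensive path runs at all. The sketch below is a simplified illustration, not the PR's code. The function names `toTransitional` and `unescape`, the two-entry `nsPairs` table, and the `escapeRe` pattern are assumptions made for this example; the real implementations cover more namespaces and more escape-handling cases.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// nsPairs is an illustrative subset of Strict -> Transitional namespace
// URIs; the real mapping is assumed to be larger.
var nsPairs = []struct{ strict, transitional string }{
	{"http://purl.oclc.org/ooxml/spreadsheetml/main",
		"http://schemas.openxmlformats.org/spreadsheetml/2006/main"},
	{"http://purl.oclc.org/ooxml/officeDocument/relationships",
		"http://schemas.openxmlformats.org/officeDocument/2006/relationships"},
}

// toTransitional returns content unchanged (same backing array) when no
// Strict URI can be present, and only otherwise runs the replacement loop.
func toTransitional(content []byte) []byte {
	if !bytes.Contains(content, []byte("purl.oclc.org")) {
		return content // fast path: no loop, no copy
	}
	for _, p := range nsPairs {
		content = bytes.ReplaceAll(content, []byte(p.strict), []byte(p.transitional))
	}
	return content
}

// escapeRe is a simplified pattern for _xHHHH_ escape sequences.
var escapeRe = regexp.MustCompile(`_x[0-9A-Fa-f]{4}_`)

// unescape decodes _xHHHH_ sequences, but never touches the regex engine
// for strings that do not contain "_x".
func unescape(s string) string {
	if !strings.Contains(s, "_x") {
		return s // fast path for the overwhelming majority of cells
	}
	return escapeRe.ReplaceAllStringFunc(s, func(m string) string {
		v, err := strconv.ParseUint(m[2:6], 16, 16)
		if err != nil {
			return m
		}
		return string(rune(v))
	})
}

func main() {
	fmt.Println(unescape("plain text"))
	fmt.Println(unescape("a_x0041_b"))
	fmt.Println(string(toTransitional([]byte(`xmlns="http://purl.oclc.org/ooxml/spreadsheetml/main"`))))
}
```

The guard can only produce false positives (a string containing `_x` that is not a real escape still falls through to the regex), so behavior is unchanged; only the common case gets cheaper.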
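For the `isNumeric` change, a minimal sketch of a `strconv.ParseFloat`-based check might look like the following. This is not the PR's implementation: `parseNumeric` and `lenWithoutDots` are hypothetical names, the character whitelist is an assumption added here because `ParseFloat` on its own also accepts forms such as `Inf`, `NaN`, and hexadecimal floats, and which character the real digit-counting step excludes is likewise assumed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseNumeric reports whether s is a plain decimal number (optionally
// with sign and exponent) and returns its value.
func parseNumeric(s string) (float64, bool) {
	if s == "" {
		return 0, false
	}
	// Assumed pre-filter: reject anything ParseFloat would accept that a
	// spreadsheet cell should not treat as a number.
	for i := 0; i < len(s); i++ {
		switch c := s[i]; {
		case c >= '0' && c <= '9', c == '.', c == '-', c == '+', c == 'e', c == 'E':
		default:
			return 0, false
		}
	}
	v, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, false
	}
	return v, true
}

// lenWithoutDots gives the same result as
// len(strings.ReplaceAll(s, ".", "")) without building the stripped copy.
func lenWithoutDots(s string) int {
	return len(s) - strings.Count(s, ".")
}

func main() {
	for _, s := range []string{"1.5", "-3", "1e5", "abc", "NaN", ""} {
		v, ok := parseNumeric(s)
		fmt.Printf("%q -> %v %v\n", s, v, ok)
	}
	fmt.Println(lenWithoutDots("123.456"))
}
```

Counting occurrences and subtracting yields the same length as stripping the characters first, which is why `strings.Count` can replace `strings.ReplaceAll` when only the count is needed.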