
Optimize the coordinates conversion and some internal functions performance #2320

Merged
xuri merged 2 commits into qax-os:master from AdamDrewsTR:perf/lib-micro-optimizations on May 22, 2026


@AdamDrewsTR
Contributor

Summary

This PR optimizes several frequently-called functions to reduce heap allocations and improve throughput for large spreadsheet operations. Each change targets a specific hot path identified through profiling.

Changes

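ColumnNumberToName: O(1) lookup table (about 300 KB at init)

Precomputes all 16,384 valid column names at package init into a flat []string slice. On 64-bit platforms the table costs about 256 KB of string headers (16 bytes each) plus roughly 48 KB of name bytes. Subsequent calls reduce to a bounds check and a slice index, with zero allocations.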

Before: Each call allocated a []byte and computed the column name via a division loop.
After: Single table lookup.
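
A minimal sketch of the lookup-table approach, assuming 1-based column numbers and the 16,384-column (XFD) limit named above. The names `columnNames`, `columnName`, and `columnNumberToName` are hypothetical and are not excelize's actual internals.

```go
package main

import "fmt"

// maxColumns is the XLSX column limit (column XFD).
const maxColumns = 16384

// columnNames is built once at package init. Index 0 is unused so that
// columnNames[n] holds the name of column n.
var columnNames = func() []string {
	names := make([]string, maxColumns+1)
	for n := 1; n <= maxColumns; n++ {
		names[n] = columnName(n)
	}
	return names
}()

// columnName converts a 1-based column number to letters with the
// bijective base-26 division loop (1 -> "A", 27 -> "AA").
func columnName(n int) string {
	var buf [3]byte // "XFD" is the longest name within the limit
	i := len(buf)
	for n > 0 {
		n--
		i--
		buf[i] = byte('A' + n%26)
		n /= 26
	}
	return string(buf[i:])
}

// columnNumberToName validates the range; on success it is a slice index
// into the precomputed table and does not allocate.
func columnNumberToName(num int) (string, error) {
	if num < 1 || num > maxColumns {
		return "", fmt.Errorf("column number %d out of range [1, %d]", num, maxColumns)
	}
	return columnNames[num], nil
}
```

The trade-off is memory for time: the table stays resident for the life of the process in exchange for removing the per-call conversion.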

CoordinatesToCellName — avoid empty-prefix concatenation

When abs is not set (the common case), the old code concatenated "" + colName + "" + rowStr, producing unnecessary string copies. The new code returns colName + strconv.Itoa(row) directly and early-returns for the absolute case.
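
The restructured control flow can be sketched as follows, assuming the variadic `abs ...bool` flag the description refers to. `columnName` and `coordinatesToCellName` are hypothetical names, and the range check is a stand-in for whatever validation excelize actually performs.

```go
package main

import (
	"fmt"
	"strconv"
)

// columnName converts a 1-based column number to letters (1 -> "A", 28 -> "AB").
func columnName(n int) string {
	var buf [3]byte
	i := len(buf)
	for n > 0 {
		n--
		i--
		buf[i] = byte('A' + n%26)
		n /= 26
	}
	return string(buf[i:])
}

// coordinatesToCellName builds "A1"-style references. The common relative
// case joins only the column name and row number; the absolute case
// ("$A$1") returns early from its own branch.
func coordinatesToCellName(col, row int, abs ...bool) (string, error) {
	if col < 1 || col > 16384 || row < 1 {
		return "", fmt.Errorf("invalid cell reference [%d, %d]", col, row)
	}
	colName := columnName(col)
	for _, a := range abs {
		if a {
			return "$" + colName + "$" + strconv.Itoa(row), nil
		}
	}
	return colName + strconv.Itoa(row), nil
}
```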

namespaceStrictToTransitional — fast path for Transitional files

The vast majority of XLSX files use the Transitional namespace. All Strict namespace URIs contain "purl.oclc.org", so a single bytes.Contains check can skip the entire replacement loop and avoid allocating a copy of the sheet XML. This applies to >99% of real-world files.
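
A sketch of the fast path, assuming the conversion is a series of byte replacements. `convertNamespaces` and `strictToTransitional` are hypothetical names, and the two URI pairs are a partial list for illustration only; a complete converter needs every Strict namespace.

```go
package main

import "bytes"

// strictMarker appears in every Strict namespace URI.
var strictMarker = []byte("purl.oclc.org")

// strictToTransitional is a partial, illustrative Strict -> Transitional map.
var strictToTransitional = [][2]string{
	{"http://purl.oclc.org/ooxml/spreadsheetml/main",
		"http://schemas.openxmlformats.org/spreadsheetml/2006/main"},
	{"http://purl.oclc.org/ooxml/officeDocument/relationships",
		"http://schemas.openxmlformats.org/officeDocument/2006/relationships"},
}

// convertNamespaces returns content itself (same backing array, no copy)
// when no Strict URI is present; otherwise it rewrites each Strict URI.
func convertNamespaces(content []byte) []byte {
	if !bytes.Contains(content, strictMarker) {
		return content // fast path: Transitional file, nothing to replace
	}
	for _, pair := range strictToTransitional {
		content = bytes.ReplaceAll(content, []byte(pair[0]), []byte(pair[1]))
	}
	return content
}
```

A single scan for the shared substring replaces one scan per namespace, and skipping `bytes.ReplaceAll` avoids its copy of the input.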

isNumeric — replace math/big.Float with strconv.ParseFloat

The big.Float parser allocates significantly more than strconv.ParseFloat for the same input. This also removes the math/big import entirely. The digit-counting step uses strings.Count instead of strings.ReplaceAll to avoid allocating a modified copy of the string.
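
A hedged sketch of the ParseFloat-based check. The three-value return shape (is numeric, digit count, value) is an assumption for illustration, as is rejecting "_", NaN, and Inf up front; `strconv.ParseFloat` accepts spellings such as "NaN" and "Inf" that a spreadsheet cell value probably should not treat as numbers.

```go
package main

import (
	"math"
	"strconv"
	"strings"
)

// isNumeric reports whether s parses as a finite float64, the number of
// digits in its plain decimal form, and the parsed value.
func isNumeric(s string) (bool, int, float64) {
	if strings.Contains(s, "_") {
		return false, 0, 0
	}
	f, err := strconv.ParseFloat(s, 64)
	if err != nil || math.IsNaN(f) || math.IsInf(f, 0) {
		return false, 0, 0
	}
	plain := strconv.FormatFloat(f, 'f', -1, 64)
	// Counting separators avoids the copy that
	// len(strings.ReplaceAll(plain, ".", "")) would allocate.
	digits := len(plain) - strings.Count(plain, ".") - strings.Count(plain, "-")
	return true, digits, f
}
```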

bstrUnmarshal — skip regex when no escape sequences present

Over 99% of cell values contain no _x escape sequences. A simple strings.Contains(s, "_x") guard skips the regex entirely for these cells. In profiling, this eliminated ~54 MB of regex-related allocations per 100K rows.
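
The guard can be illustrated with a simplified decoder for `_xHHHH_` escapes. `decodeBSTR` is a hypothetical name; unlike a full implementation, this sketch decodes each escape to a single rune and does not pair UTF-16 surrogates.

```go
package main

import (
	"regexp"
	"strconv"
	"strings"
)

// bstrEscape matches one escaped code unit such as "_x000D_".
var bstrEscape = regexp.MustCompile(`_x[0-9A-Fa-f]{4}_`)

// decodeBSTR returns s untouched when the cheap substring check finds no
// "_x", so the regex engine runs only for the rare escaped values.
func decodeBSTR(s string) string {
	if !strings.Contains(s, "_x") {
		return s // fast path: no escape can be present
	}
	return bstrEscape.ReplaceAllStringFunc(s, func(m string) string {
		// m is "_xHHHH_"; m[2:6] is the four hex digits.
		code, err := strconv.ParseUint(m[2:6], 16, 16)
		if err != nil {
			return m
		}
		return string(rune(code))
	})
}
```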

workSheetReader — avoid double-reading sheet data

The original code called f.readBytes(name) twice — once for getRootElement and once for Decode. Caching the result in a local variable halves the I/O for worksheets read from temp files.
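
The read-once pattern is sketched below under stated assumptions: `source`, `rootElement`, `worksheet`, and `readWorksheet` are hypothetical stand-ins for excelize's part storage, root-element check, and worksheet type, and the `reads` counter exists only to make the single read observable.

```go
package main

import (
	"bytes"
	"encoding/xml"
)

// source is a hypothetical stand-in for workbook part storage.
type source struct {
	parts map[string][]byte
	reads int // number of readBytes calls
}

func (s *source) readBytes(name string) []byte {
	s.reads++
	return s.parts[name]
}

// rootElement returns the local name of the first start element.
func rootElement(data []byte) (string, error) {
	dec := xml.NewDecoder(bytes.NewReader(data))
	for {
		tok, err := dec.Token()
		if err != nil {
			return "", err
		}
		if se, ok := tok.(xml.StartElement); ok {
			return se.Name.Local, nil
		}
	}
}

type worksheet struct {
	XMLName xml.Name `xml:"worksheet"`
	Rows    []struct {
		R int `xml:"r,attr"`
	} `xml:"sheetData>row"`
}

// readWorksheet fetches the part once and reuses the same slice for both
// the root-element check and the full decode.
func readWorksheet(s *source, name string) (*worksheet, string, error) {
	data := s.readBytes(name)
	root, err := rootElement(data)
	if err != nil {
		return nil, "", err
	}
	var ws worksheet
	if err := xml.NewDecoder(bytes.NewReader(data)).Decode(&ws); err != nil {
		return nil, "", err
	}
	return &ws, root, nil
}
```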

Benchmark Impact

These are foundational optimizations, and their impact compounds with sheet size: ColumnNumberToName, CoordinatesToCellName, and bstrUnmarshal can run once per cell, and namespaceStrictToTransitional once per sheet. Individually, each change saves little per call; collectively, they significantly reduce GC pressure for large workbooks.

- ColumnNumberToName: precompute all 16384 column names at init for O(1) lookup,
  eliminating per-call byte slice allocation
- CoordinatesToCellName: avoid concatenating an empty sign prefix (which is "$"
  only for absolute references) in the non-absolute case
- namespaceStrictToTransitional: skip replacement loop entirely when no Strict
  namespace URIs are present (fast path for >99% of XLSX files)
- isNumeric: replace math/big.Float with strconv.ParseFloat, removing the
  math/big dependency and reducing allocations
- bstrUnmarshal: skip regex matching when "_x" is not present in the string,
  avoiding ~54 MB of regex allocations per 100K rows
- workSheetReader: cache readBytes result in a local variable to avoid reading
  the same sheet data twice during XML parsing
@codecov

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.60%. Comparing base (4bebb61) to head (d4cbd14).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2320   +/-   ##
=======================================
  Coverage   99.60%   99.60%           
=======================================
  Files          32       32           
  Lines       26791    26803   +12     
=======================================
+ Hits        26685    26697   +12     
  Misses         55       55           
  Partials       51       51           
Flag Coverage Δ
unittests 99.60% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


AdamDrewsTR changed the title from "Optimize hot-path functions to reduce allocations and improve throughput" to "Optimize hot-path functions to reduce allocations and improve throughput (1)" on May 12, 2026
AdamDrewsTR changed the title from "Optimize hot-path functions to reduce allocations and improve throughput (1)" to "Optimize hot-path functions to reduce allocations and improve throughput (1A)" on May 12, 2026
AdamDrewsTR changed the title from "Optimize hot-path functions to reduce allocations and improve throughput (1A)" to "Optimize hot-path functions to reduce allocations and improve throughput (A1)" on May 12, 2026
xuri added the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files.) on May 13, 2026
xuri moved this to Performance in Excelize v2.11.0 on May 21, 2026
xuri changed the title from "Optimize hot-path functions to reduce allocations and improve throughput (A1)" to "Optimize the coordinates conversion and some internal functions performance" on May 21, 2026
Member

xuri left a comment


Thanks for your contribution. I've made some updates based on your branch.

xuri merged commit 7240c79 into qax-os:master on May 22, 2026
21 checks passed