
perf: add FastReadMode with preloaded shared strings and FastRows iterator (C1)#2325

Open
AdamDrewsTR wants to merge 3 commits into
qax-os:masterfrom
AdamDrewsTR:feat/fast-read-mode

@AdamDrewsTR
Contributor

Summary

Add a FastReadMode option and FastRows iterator for high-performance bulk reading of large spreadsheets.

FastReadMode Option

  • Add FastReadMode bool to Options struct; when enabled, preloads the entire shared strings table (xl/sharedStrings.xml) into a []string slice at first use, trading memory for O(1) per-cell lookups without ReadAt syscalls
  • Add fastSST/fastSSTLoaded fields to File struct for the preloaded table
  • Add preloadSharedStrings: SAX-parses sharedStrings.xml once, building a flat slice indexed by shared string index

FastRows Iterator

  • Add FastRows struct with bufio.Reader-based byte scanning (256KB buffer)
  • Add RowsFast method requiring FastReadMode; returns ErrParameterInvalid otherwise
  • Direct byte scanning for <row>, <c>, <v>, <is> elements without xml.Decoder overhead — typically 30-50% faster than standard Rows
  • Reusable row and cell buffers to minimize allocations
  • Supports shared strings (t="s"), inline strings, and boolean cells
  • colRefToIndex helper for fast column letter parsing
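
As an illustration of the column letter parsing mentioned in the last bullet, here is a minimal sketch. The name `colIndexFromRef`, its signature, and the zero-based return convention are assumptions made for this example; the PR's actual `colRefToIndex` may differ.

```go
package main

import "fmt"

// colIndexFromRef converts the leading column letters of a cell reference
// such as "AB12" into a zero-based column index (A=0, Z=25, AA=26).
// It returns -1 if the reference does not start with a letter.
// Hypothetical helper for illustration only.
func colIndexFromRef(ref string) int {
	n := 0
	for i := 0; i < len(ref); i++ {
		c := ref[i]
		switch {
		case c >= 'A' && c <= 'Z':
			n = n*26 + int(c-'A') + 1
		case c >= 'a' && c <= 'z':
			n = n*26 + int(c-'a') + 1
		default:
			// First non-letter byte (the row number) ends the column part.
			return n - 1
		}
	}
	return n - 1
}

func main() {
	for _, ref := range []string{"A1", "Z9", "AA1", "XFD1048576"} {
		fmt.Println(ref, colIndexFromRef(ref))
	}
}
```

Working on bytes and stopping at the first non-letter avoids splitting the reference into separate column and row strings for every cell.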

Cell Read Optimizations

  • cellXMLHandler: hand-parse <v> via readCharData instead of DecodeElement, saving ~8M allocs per 100K×20 file; use locals for <f> and <is> to keep the caller's xlsxC on the stack
  • xlsxSI.String(): fast path for simple strings with no rich text runs, avoiding strings.Builder allocation
  • getValueFrom: add fastSST fast path for t="s" cells; skip formattedValue when raw=true or style=0 for shared strings, inline strings, and numeric cells
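
The idea of hand-parsing `<v>` instead of calling `DecodeElement` can be sketched as below. This is not the PR's implementation: `readCharDataSketch` and `cellValue` are hypothetical names, and the sketch assumes the element being read has no child elements, which is the case for `<v>`.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// readCharDataSketch collects the text of the element whose start tag was
// just consumed and returns at the next end tag. It assumes the element has
// no child elements.
func readCharDataSketch(dec *xml.Decoder) (string, error) {
	var buf []byte
	for {
		tok, err := dec.Token()
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.CharData:
			buf = append(buf, t...) // copy: the decoder reuses token bytes
		case xml.EndElement:
			return string(buf), nil
		}
	}
}

// cellValue scans one <c> element and returns the text of its <v> child.
func cellValue(cellXML string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(cellXML))
	for {
		tok, err := dec.Token()
		if err != nil {
			return "", err // io.EOF here means no <v> was found
		}
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == "v" {
			return readCharDataSketch(dec)
		}
	}
}

func main() {
	v, err := cellValue(`<c r="A1" t="s"><v>42</v></c>`)
	fmt.Println(v, err)
}
```

Compared with `DecodeElement`, this path skips reflection over a target struct and allocates only the final string plus a small byte buffer.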

Usage

f, err := excelize.OpenFile("large.xlsx", excelize.Options{FastReadMode: true})
if err != nil {
    return err
}
defer f.Close()

// Option 1: standard Rows, now served from the preloaded shared strings
rows, err := f.Rows("Sheet1")
if err != nil {
    return err
}
defer rows.Close()

// Option 2: the byte-scanning FastRows iterator
fastRows, err := f.RowsFast("Sheet1")
if err != nil {
    // RowsFast returns ErrParameterInvalid when FastReadMode is off
    return err
}
defer fastRows.Close()
for fastRows.Next() {
    row := fastRows.Row()
    // Process row
}

FastReadMode option:
- Add FastReadMode bool to Options struct; when enabled, preloads the entire
  shared strings table (xl/sharedStrings.xml) into a []string slice at first
  use, trading memory for O(1) per-cell lookups without ReadAt syscalls
- Add fastSST/fastSSTLoaded fields to File struct for the preloaded table
- Add preloadSharedStrings: SAX-parses sharedStrings.xml once, building a
  flat slice indexed by shared string index

FastRows iterator:
- Add FastRows struct with bufio.Reader-based byte scanning (256KB buffer)
- Add RowsFast method requiring FastReadMode; returns ErrParameterInvalid
  otherwise
- Direct byte scanning for <row>, <c>, <v>, <is> elements without
  xml.Decoder overhead -- typically 30-50% faster than standard Rows
- Reusable row and cell buffers to minimize allocations
- Supports shared strings (type=s), inline strings, and boolean cells
- colRefToIndex helper for fast column letter parsing

Cell read optimizations:
- cellXMLHandler: hand-parse <v> via readCharData instead of DecodeElement,
  saving ~8M allocs per 100Kx20 file; use locals for <f> and <is> to keep
  the caller's xlsxC on the stack
- xlsxSI.String(): fast path for simple strings with no rich text runs,
  avoiding strings.Builder allocation
- getValueFrom: add fastSST fast path for type=s cells; skip formattedValue
  when raw=true or style=0 for shared strings, inline strings, and numeric
  cells
@AdamDrewsTR AdamDrewsTR changed the title perf: add FastReadMode with preloaded shared strings and FastRows iterator perf: add FastReadMode with preloaded shared strings and FastRows iterator (C1) May 12, 2026
@codecov

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.60%. Comparing base (4bebb61) to head (b35eb7b).

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #2325    +/-   ##
========================================
  Coverage   99.60%   99.60%            
========================================
  Files          32       33     +1     
  Lines       26791    27171   +380     
========================================
+ Hits        26685    27065   +380     
  Misses         55       55            
  Partials       51       51            
Flag Coverage Δ
unittests 99.60% <100.00%> (+<0.01%) ⬆️

@xuri xuri added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 13, 2026
Add FastReadMode option and RowsFast() method that provides a high-performance
streaming row reader using byte-level XML parsing. Includes:

- FastRows struct with zero-allocation row iteration
- preloadSharedStrings for efficient shared string table access
- colRefToIndex, readCharData optimizations for cellXMLHandler
- getValueFrom fast paths for fastSST, raw values, and zero-style cells
- Comprehensive test coverage (100% diff coverage)
- Add xlsxC.reset() method to clear cell struct for reuse
- Add cellBuf field to Rows struct for per-cell allocation avoidance
- Add numCols field to learn column count for slice pre-allocation
- Use colRefToIndex fast path when FastReadMode has preloaded SST
- Skip sharedStringsReader call when fastSSTLoaded is true

These optimizations reduce allocations in the traditional Rows.Columns()
path, complementing the FastRows raw parser for read-heavy workloads.
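
The reuse pattern these commits describe (a resettable cell struct, a per-reader cell buffer, and a learned column count) might look roughly like the sketch below. All type and field names here are hypothetical simplifications, not the PR's `xlsxC`, `cellBuf`, or `numCols` definitions.

```go
package main

import "fmt"

// cell is a hypothetical, simplified stand-in for a parsed worksheet cell.
type cell struct {
	Ref, Type, Value string
}

// reset clears the cell so the same struct can be reused for the next cell.
func (c *cell) reset() { *c = cell{} }

// rowBuf keeps one scratch cell and one reusable output slice.
type rowBuf struct {
	scratch cell
	out     []string
	numCols int // widest row seen so far
}

// load copies values into the reused slice and returns it. The returned
// slice is only valid until the next call to load.
func (b *rowBuf) load(values []string) []string {
	if cap(b.out) < len(values) {
		b.out = make([]string, 0, len(values)) // grow only when a wider row appears
	}
	b.out = b.out[:0]
	for _, v := range values {
		b.scratch.reset()
		b.scratch.Value = v
		b.out = append(b.out, b.scratch.Value)
	}
	if len(b.out) > b.numCols {
		b.numCols = len(b.out)
	}
	return b.out
}

func main() {
	var b rowBuf
	fmt.Println(b.load([]string{"a", "b", "c"}))
	fmt.Println(b.load([]string{"x", "y"}), b.numCols)
}
```

The trade-off of this design is aliasing: because the backing array is shared between calls, a caller that needs to keep a row past the next iteration has to copy it first.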