Skip to content

Add dimension placeholder for accurate streaming worksheet dimensions (B3)#2323

Open
AdamDrewsTR wants to merge 6 commits into
qax-os:masterfrom
AdamDrewsTR:perf/dimension-placeholder
Open

Add dimension placeholder for accurate streaming worksheet dimensions (B3)#2323
AdamDrewsTR wants to merge 6 commits into
qax-os:masterfrom
AdamDrewsTR:perf/dimension-placeholder

Conversation

@AdamDrewsTR
Copy link
Copy Markdown
Contributor

Summary

Adds a dimension placeholder mechanism to StreamWriter so that the <dimension> element in streaming worksheets accurately reflects the data range, without requiring a full file rewrite.

Depends on: #2321 (buffered-writer enhancements — for WriteAt and written tracking)

Problem

The current StreamWriter writes bulkAppendFields(&sw.rawData, sw.worksheet, 3, 4) which includes the dimension element from the (empty) worksheet template. This means streaming worksheets always have an empty or incorrect dimension. While Excel recalculates dimensions on open, other XLSX readers (Google Sheets, LibreOffice, programmatic parsers) may rely on the <dimension> element for range detection.

Solution

  1. Placeholder: A fixed-width string <dimension ref="A1:XFD1048576"/> (32 bytes) is written at a known byte offset during NewStreamWriter, after the <sheetPr> element but before <sheetViews>.

  2. Tracking: SetRow tracks maxCol — the maximum column index seen across all rows.

  3. Overwrite: In Flush, after all XML is written and the buffer/file is complete, updateDimension() computes the actual dimension (e.g. A1:D100) and uses WriteAt to overwrite the placeholder in-place. The replacement is right-padded with spaces to match the placeholder length exactly, since XML ignores trailing whitespace in element content.

Why fixed-width?

The <dimension> element must appear before <sheetData> in the XLSX schema, but we don't know the final data range until all rows are written. A fixed-width placeholder avoids shifting all subsequent byte offsets — WriteAt simply overwrites the exact bytes without moving anything.

The placeholder uses the maximum possible cell reference (XFD1048576) so any real dimension string is guaranteed to be shorter or equal in length.

Compatibility

All existing tests pass. The dimension element is optional per ECMA-376 §18.3.1.35 — Excel recalculates it on open regardless. This change improves interoperability with readers that do honor the element.

…able thresholds

- Add bufio.Writer layer after temp file threshold is crossed, reducing disk
  write syscalls from per-Write to per-buffer-full (default 128 KiB)
- Add scratch [24]byte field for strconv.Append* to avoid heap allocations
  in WriteInt, WriteUint, and WriteFloat helper methods
- Track total bytes written for offset-based operations (WriteAt)
- Add WriteAt method for updating data at specific offsets in both in-memory
  and temp file modes
- Add CopyTo method with large read buffer to minimize Pread syscalls when
  copying data to ZIP writers
- Add Bytes method for direct access to in-memory buffer contents
- Add Reset method for clearing state without closing the temp file
- Add StreamingChunkSize option to control when streaming spills to disk
  (0 = default 16 MiB, -1 = never spill, keeps all data in memory)
- Add StreamingBufSize option to control bufio.Writer size aft- Add StreamingBufSize option to control bufio.Writer size aft- Add StreamingBufSize option to control bufio.Writer size aft- Add Streaminak memory usage during streaming writes
- Write a fixed-width dimension placeholder in NewStreamWriter at the position
  where the <dimension> element belongs, before any row data is written
- Track maxCol during SetRow to compute the actual data range
- In Flush, overwrite the placeholder with the real dimension (e.g.
  <dimension ref="A1:D100"/>) using bufferedWriter.WriteAt, padded to match
  the placeholder length
- This ensures Excel and other readers see the correct data range without
  requiring a full file rewrite, even for streaming worksheets that spill to
  temp files on disk
@AdamDrewsTR AdamDrewsTR changed the title Add dimension placeholder for accurate streaming worksheet dimensions Add dimension placeholder for accurate streaming worksheet dimensions (4) May 12, 2026
@AdamDrewsTR AdamDrewsTR changed the title Add dimension placeholder for accurate streaming worksheet dimensions (4) Add dimension placeholder for accurate streaming worksheet dimensions (B3) May 12, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.60%. Comparing base (4bebb61) to head (02afea9).

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #2323    +/-   ##
========================================
  Coverage   99.60%   99.60%            
========================================
  Files          32       32            
  Lines       26791    26908   +117     
========================================
+ Hits        26685    26802   +117     
  Misses         55       55            
  Partials       51       51            
Flag Coverage Δ
unittests 99.60% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Cover WriteInt, WriteUint, WriteFloat, Bytes, WriteAt, CopyTo, and
Reset methods in both in-memory and temp-file (bio) code paths.
Test NewStreamWriter with custom StreamingChunkSize (-1, positive)
and StreamingBufSize options.
@xuri xuri added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 13, 2026
@AdamDrewsTR
Copy link
Copy Markdown
Contributor Author

This will need to be rebased after #2321 is merged.

Add tests for IO error scenarios when the underlying temp file is
closed or broken:
- TestBufferedWriterWriteAtFlushError: Flush fails in WriteAt
- TestBufferedWriterCopyToFlushError: Flush fails in CopyTo
- TestBufferedWriterCopyToSeekError: Seek fails in CopyTo
- TestBufferedWriterSyncWriteToError: WriteTo fails in Sync

This brings diff coverage from 95.56% to 100%.
Add SheetStats type and GetSheetStats() to retrieve worksheet statistics
populated during streaming writes. The StreamWriter now tracks cellCount
during SetRow and calls storeSheetStats() on Flush() to cache stats
(Rows, Cols, Cells, MaxCell) for downstream consumers.

This provides zero-cost statistics collection during streaming that
external code can access via GetSheetStats() without re-reading the sheet.
- Add TestStreamWriterDimensionOverflow to exercise updateDimension WriteAt
  error by setting dimensionOffset beyond buffer bounds
- Remove unreachable dimension length mismatch check (CoordinatesToCellName
  enforces max row/column limits, so dim never exceeds placeholder length)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants