Add dimension placeholder for accurate streaming worksheet dimensions (B3)#2323
Open
AdamDrewsTR wants to merge 6 commits into
Open
Add dimension placeholder for accurate streaming worksheet dimensions (B3)#2323AdamDrewsTR wants to merge 6 commits into
AdamDrewsTR wants to merge 6 commits into
Conversation
…able thresholds - Add bufio.Writer layer after temp file threshold is crossed, reducing disk write syscalls from per-Write to per-buffer-full (default 128 KiB) - Add scratch [24]byte field for strconv.Append* to avoid heap allocations in WriteInt, WriteUint, and WriteFloat helper methods - Track total bytes written for offset-based operations (WriteAt) - Add WriteAt method for updating data at specific offsets in both in-memory and temp file modes - Add CopyTo method with large read buffer to minimize Pread syscalls when copying data to ZIP writers - Add Bytes method for direct access to in-memory buffer contents - Add Reset method for clearing state without closing the temp file - Add StreamingChunkSize option to control when streaming spills to disk (0 = default 16 MiB, -1 = never spill, keeps all data in memory) - Add StreamingBufSize option to control bufio.Writer size aft- Add StreamingBufSize option to control bufio.Writer size aft- Add StreamingBufSize option to control bufio.Writer size aft- Add Streaminak memory usage during streaming writes
- Write a fixed-width dimension placeholder in NewStreamWriter at the position where the <dimension> element belongs, before any row data is written - Track maxCol during SetRow to compute the actual data range - In Flush, overwrite the placeholder with the real dimension (e.g. <dimension ref="A1:D100"/>) using bufferedWriter.WriteAt, padded to match the placeholder length - This ensures Excel and other readers see the correct data range without requiring a full file rewrite, even for streaming worksheets that spill to temp files on disk
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2323 +/- ##
========================================
Coverage 99.60% 99.60%
========================================
Files 32 32
Lines 26791 26908 +117
========================================
+ Hits 26685 26802 +117
Misses 55 55
Partials 51 51
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Cover WriteInt, WriteUint, WriteFloat, Bytes, WriteAt, CopyTo, and Reset methods in both in-memory and temp-file (bio) code paths. Test NewStreamWriter with custom StreamingChunkSize (-1, positive) and StreamingBufSize options.
Contributor
Author
|
This will need to be rebased after #2321 is merged. |
Add tests for IO error scenarios when the underlying temp file is closed or broken: - TestBufferedWriterWriteAtFlushError: Flush fails in WriteAt - TestBufferedWriterCopyToFlushError: Flush fails in CopyTo - TestBufferedWriterCopyToSeekError: Seek fails in CopyTo - TestBufferedWriterSyncWriteToError: WriteTo fails in Sync This brings diff coverage from 95.56% to 100%.
Add SheetStats type and GetSheetStats() to retrieve worksheet statistics populated during streaming writes. The StreamWriter now tracks cellCount during SetRow and calls storeSheetStats() on Flush() to cache stats (Rows, Cols, Cells, MaxCell) for downstream consumers. This provides zero-cost statistics collection during streaming that external code can access via GetSheetStats() without re-reading the sheet.
- Add TestStreamWriterDimensionOverflow to exercise updateDimension WriteAt error by setting dimensionOffset beyond buffer bounds - Remove unreachable dimension length mismatch check (CoordinatesToCellName enforces max row/column limits, so dim never exceeds placeholder length)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a dimension placeholder mechanism to
StreamWriterso that the<dimension>element in streaming worksheets accurately reflects the data range, without requiring a full file rewrite.Depends on: #2321 (buffered-writer enhancements — for
WriteAtandwrittentracking)Problem
The current
StreamWriterwritesbulkAppendFields(&sw.rawData, sw.worksheet, 3, 4)which includes the dimension element from the (empty) worksheet template. This means streaming worksheets always have an empty or incorrect dimension. While Excel recalculates dimensions on open, other XLSX readers (Google Sheets, LibreOffice, programmatic parsers) may rely on the<dimension>element for range detection.Solution
Placeholder: A fixed-width string
<dimension ref="A1:XFD1048576"/>(32 bytes) is written at a known byte offset duringNewStreamWriter, after the<sheetPr>element but before<sheetViews>.Tracking:
SetRowtracksmaxCol— the maximum column index seen across all rows.Overwrite: In
Flush, after all XML is written and the buffer/file is complete,updateDimension()computes the actual dimension (e.g.A1:D100) and usesWriteAtto overwrite the placeholder in-place. The replacement is right-padded with spaces to match the placeholder length exactly, since XML ignores trailing whitespace in element content.Why fixed-width?
The
<dimension>element must appear before<sheetData>in the XLSX schema, but we don't know the final data range until all rows are written. A fixed-width placeholder avoids shifting all subsequent byte offsets —WriteAtsimply overwrites the exact bytes without moving anything.The placeholder uses the maximum possible cell reference (
XFD1048576) so any real dimension string is guaranteed to be shorter or equal in length.Compatibility
All existing tests pass. The dimension element is optional per ECMA-376 §18.3.1.35 — Excel recalculates it on open regardless. This change improves interoperability with readers that do honor the element.