
Commit 910cbeb

fix: allocator EOA, SNOD sorting, B-tree key comparison (Issue #28, #29) (#31)
* fix: allocator EOA, SNOD sorting, B-tree key comparison (Issue #28, #29)
  - Bug A: update allocator after OHDR grows from attribute addition, so superblock EOA covers entire file (h5dump/h5py rejected files)
  - Bug B: B-tree v1 right key now uses lexicographically largest name (strcmp comparison per H5Gnode.c), not numerically largest heap offset
  - Bug C: SNOD entries sorted by name after insertion (per H5G__node_insert)
  - ObjectHeaderSizeFromParsed supports both v1 and v2 headers
  - 4 regression tests covering all three bugs

* chore: fix pre-existing gosec G115 and unused nolint directives
  Suppress safe integer conversions (signed-to-unsigned for binary serialization) and remove stale nolint directives that no longer match current golangci-lint rules.

* style: fix gofmt alignment in messages_write.go
1 parent 2f92dd0 commit 910cbeb

File tree

10 files changed: +337 additions, -23 deletions


CHANGELOG.md

Lines changed: 46 additions & 0 deletions
@@ -7,6 +7,52 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ---
 
+## [v0.13.11] - 2026-03-14
+
+### Bug Fix
+
+#### Write interoperability: attributes, SNOD sorting, B-tree keys (Issue #28, #29)
+
+Fixed 3 bugs that caused files with groups + attributes or multiple root-level datasets
+to be unreadable by h5dump, h5ls, and h5py.
+
+**Bug A: Superblock EOA too small after adding attributes**
+
+Adding an attribute to a dataset inside a group grew the V2 object header beyond the
+allocator's tracked end-of-file. The superblock EOA was not updated, causing h5dump/h5py
+to reject the file with "truncated file" or "actual len exceeds EOA" errors.
+
+Fix: after writing the modified object header, compare its new end address with the
+allocator's EOF and advance the allocator if needed. `ObjectHeaderSizeFromParsed()` now
+supports both v1 and v2 headers for future-proofing.
+
+**Bug B: B-tree v1 right key used numeric offset instead of lexicographic comparison**
+
+Per C reference (`H5Gnode.c:340-373`, `H5G__node_cmp2`), B-tree v1 group node keys are
+compared by looking up strings in the local heap and using `strcmp`. Our code set the right
+key to the numerically largest heap offset, which is not necessarily the offset of the
+lexicographically largest name. This caused h5dump/h5ls to miss entries whose names sorted
+after the right key's name.
+
+Fix: iterate all SNOD entries, resolve names from the local heap, and select the offset of
+the string that sorts last by Go `>` comparison (equivalent to `strcmp`).
+
+**Bug C: Symbol table node entries not sorted by name**
+
+Per C reference (`H5Gnode.c:573-591`, `H5G__node_insert`), the C library uses binary search
+with `strncmp` to find the insertion point, keeping entries sorted at all times. Our code
+appended entries in insertion order via `AddEntry`. When datasets were created in
+non-alphabetical order (e.g., `/uint` before `/float`), h5dump/h5ls could not find them.
+
+Fix: `sort.Slice` SNOD entries by name after each insertion in `linkToParent()`.
+
+**Validation**: 4 regression tests added. All 8 h5dump test scenarios pass. Full test suite
+green (all packages).
+
+Reported by [@vrv-bit](https://github.com/vrv-bit) in Issue #28 and Issue #29.
+
+---
+
 ## [v0.13.10] - 2026-03-06
 
 ### 🐛 Bug Fix

ROADMAP.md

Lines changed: 3 additions & 1 deletion
@@ -3,7 +3,7 @@
 > **Strategic Advantage**: We have official HDF5 C library as reference implementation!
 > **Approach**: Port proven algorithms, not invent from scratch - Senior Go Developer mindset
-**Last Updated**: 2026-03-06 | **Current Version**: v0.13.10 | **Strategy**: HDF5 2.0.0 compatible → security hardened → v1.0.0 LTS | **Milestone**: v0.13.10 RELEASED! (2026-03-06) → v1.0.0 LTS (Q3 2026)
+**Last Updated**: 2026-03-14 | **Current Version**: v0.13.11 | **Strategy**: HDF5 2.0.0 compatible → security hardened → v1.0.0 LTS | **Milestone**: v0.13.11 RELEASED! (2026-03-14) → v1.0.0 LTS (Q3 2026)
 
 ---
 
@@ -58,6 +58,8 @@ v0.13.8 (HOTFIX - EOA compatibility) ✅ RELEASED 2026-03-04
 v0.13.9 (HOTFIX - V2 object header checksum) ✅ RELEASED 2026-03-04
 ↓ (interoperability fix)
 v0.13.10 (BUGFIX - h5dump/h5ls/h5py interop) ✅ RELEASED 2026-03-06
+↓ (attribute + SNOD + B-tree key fixes)
+v0.13.11 (HOTFIX - write interop: attrs, sorting, keys) ✅ RELEASED 2026-03-14
 ↓ (maintenance continues)
 v0.13.x (maintenance phase) → Stable maintenance, bug fixes, minor enhancements
 ↓ (6-9 months production validation)

attribute_write.go

Lines changed: 15 additions & 0 deletions
@@ -294,6 +294,21 @@ func writeCompactAttribute(fw *FileWriter, objectAddr uint64, oh *core.ObjectHea
 		return fmt.Errorf("failed to write object header: %w", err)
 	}
 
+	// 6. Update allocator if the object header grew beyond currently tracked EOF.
+	// Adding an attribute message increases the OHDR size. If the OHDR is at the
+	// end of the file, the extra bytes extend past what the allocator knows about.
+	// Without this, the superblock EOA will be too small and h5dump/h5py will
+	// reject the file ("actual len exceeds EOA").
+	newHeaderSize := core.ObjectHeaderSizeFromParsed(oh)
+	objectHeaderEnd := objectAddr + newHeaderSize
+	allocator := fw.writer.Allocator()
+	if allocator.EndOfFile() < objectHeaderEnd {
+		bytesToAdvance := objectHeaderEnd - allocator.EndOfFile()
+		if _, allocErr := allocator.Allocate(bytesToAdvance); allocErr != nil {
+			return fmt.Errorf("failed to advance allocator past grown object header: %w", allocErr)
+		}
+	}
+
 	return nil
 }

dataset_read_hyperslab.go

Lines changed: 1 addition & 1 deletion
@@ -760,7 +760,7 @@ func generateChunkCoordinates(first, last []uint64) [][]uint64 {
 	// Calculate total number of chunks
 	totalChunks := 1
 	for i := 0; i < ndims; i++ {
-		totalChunks *= int(last[i] - first[i] + 1)
+		totalChunks *= int(last[i] - first[i] + 1) //nolint:gosec // G115: chunk count bounded by dataset dimensions
 	}
 
 	result := make([][]uint64, 0, totalChunks)

dataset_write.go

Lines changed: 11 additions & 11 deletions
@@ -238,7 +238,7 @@ func (h *arrayTypeHandler) GetInfo(config *datasetConfig) (*datatypeInfo, error)
 	// Calculate total array size (product of all dimensions * element size)
 	arraySize := uint32(1)
 	for _, dim := range config.arrayDims {
-		arraySize *= uint32(dim)
+		arraySize *= uint32(dim) //nolint:gosec // G115: array dims bounded by HDF5 format limits
 	}
 	arraySize *= baseInfo.size
 
@@ -320,13 +320,13 @@ func (h *enumTypeHandler) EncodeDatatypeMessage(info *datatypeInfo) ([]byte, err
 		offset := i * int(info.baseType.size)
 		switch info.baseType.size {
 		case 1:
-			valueBytes[offset] = byte(val)
+			valueBytes[offset] = byte(val) //nolint:gosec // G115: intentional signed-to-unsigned for serialization
 		case 2:
-			binary.LittleEndian.PutUint16(valueBytes[offset:], uint16(val))
+			binary.LittleEndian.PutUint16(valueBytes[offset:], uint16(val)) //nolint:gosec // G115: intentional signed-to-unsigned for serialization
 		case 4:
-			binary.LittleEndian.PutUint32(valueBytes[offset:], uint32(val))
+			binary.LittleEndian.PutUint32(valueBytes[offset:], uint32(val)) //nolint:gosec // G115: intentional signed-to-unsigned for serialization
 		case 8:
-			binary.LittleEndian.PutUint64(valueBytes[offset:], uint64(val))
+			binary.LittleEndian.PutUint64(valueBytes[offset:], uint64(val)) //nolint:gosec // G115: intentional signed-to-unsigned for serialization
 		}
 	}
 
@@ -1349,7 +1349,7 @@ func (dw *DatasetWriter) writeVLen(data interface{}) error {
 		// Convert sequence to bytes (little-endian)
 		seqBytes := make([]byte, len(seq)*4)
 		for j, val := range seq {
-			binary.LittleEndian.PutUint32(seqBytes[j*4:], uint32(val))
+			binary.LittleEndian.PutUint32(seqBytes[j*4:], uint32(val)) //nolint:gosec // G115: intentional signed-to-unsigned for serialization
 		}
 
 		// Write to global heap
@@ -1369,7 +1369,7 @@ func (dw *DatasetWriter) writeVLen(data interface{}) error {
 	for i, seq := range v {
 		seqBytes := make([]byte, len(seq)*8)
 		for j, val := range seq {
-			binary.LittleEndian.PutUint64(seqBytes[j*8:], uint64(val))
+			binary.LittleEndian.PutUint64(seqBytes[j*8:], uint64(val)) //nolint:gosec // G115: intentional signed-to-unsigned for serialization
 		}
 
 		heapID, err := dw.fileWriter.globalHeapWriter.WriteToGlobalHeap(seqBytes)
@@ -1650,7 +1650,7 @@ func encode1ByteIntegers(data interface{}, buf []byte) ([]byte, error) {
 	switch v := data.(type) {
 	case []int8:
 		for i, val := range v {
-			buf[i] = byte(val)
+			buf[i] = byte(val) //nolint:gosec // G115: intentional int8-to-byte for serialization
 		}
 	case []uint8:
 		copy(buf, v)
@@ -1665,7 +1665,7 @@ func encode2ByteIntegers(data interface{}, buf []byte) ([]byte, error) {
 	switch v := data.(type) {
 	case []int16:
 		for i, val := range v {
-			binary.LittleEndian.PutUint16(buf[i*2:], uint16(val))
+			binary.LittleEndian.PutUint16(buf[i*2:], uint16(val)) //nolint:gosec // G115: intentional signed-to-unsigned for serialization
 		}
 	case []uint16:
 		for i, val := range v {
@@ -1682,7 +1682,7 @@ func encode4ByteIntegers(data interface{}, buf []byte) ([]byte, error) {
 	switch v := data.(type) {
 	case []int32:
 		for i, val := range v {
-			binary.LittleEndian.PutUint32(buf[i*4:], uint32(val))
+			binary.LittleEndian.PutUint32(buf[i*4:], uint32(val)) //nolint:gosec // G115: intentional signed-to-unsigned for serialization
 		}
 	case []uint32:
 		for i, val := range v {
@@ -1699,7 +1699,7 @@ func encode8ByteIntegers(data interface{}, buf []byte) ([]byte, error) {
 	switch v := data.(type) {
 	case []int64:
 		for i, val := range v {
-			binary.LittleEndian.PutUint64(buf[i*8:], uint64(val))
+			binary.LittleEndian.PutUint64(buf[i*8:], uint64(val)) //nolint:gosec // G115: intentional signed-to-unsigned for serialization
 		}
 	case []uint64:
 		for i, val := range v {

group_write.go

Lines changed: 48 additions & 5 deletions
@@ -2,6 +2,7 @@ package hdf5
 
 import (
 	"fmt"
+	"sort"
 	"strings"
 
 	"github.com/scigolib/hdf5/internal/core"
@@ -294,6 +295,8 @@ func parsePath(path string) (parent, name string) {
 //
 // Returns:
 //   - error: If linking fails
+//
+//nolint:gocognit,gocyclo,cyclop // Complex but necessary: sorted insertion + string-based B-tree key update
 func (fw *FileWriter) linkToParent(parentPath, childName string, childAddr uint64) error {
 	// Get parent group metadata
 	var heapAddr, stNodeAddr, btreeAddr uint64
@@ -342,6 +345,26 @@ func (fw *FileWriter) linkToParent(parentPath, childName string, childAddr uint6
 		return fmt.Errorf("add entry to symbol table: %w", err)
 	}
 
+	// Step 4b: Sort SNOD entries by name (HDF5 format requirement).
+	// The C library expects symbol table entries sorted by strcmp on the name
+	// looked up from the local heap. Without sorting, h5dump/h5ls fail when
+	// entries are added in non-alphabetical order.
+	sort.Slice(stNode.Entries, func(i, j int) bool {
+		ni, nj := stNode.Entries[i].LinkNameOffset, stNode.Entries[j].LinkNameOffset
+		var si, sj string
+		if ni == nameOffset {
+			si = childName
+		} else {
+			si, _ = heap.GetString(ni)
+		}
+		if nj == nameOffset {
+			sj = childName
+		} else {
+			sj, _ = heap.GetString(nj)
+		}
+		return si < sj
+	})
+
 	// Step 5: Write updated heap
 	if err := heap.WriteTo(fw.writer, heapAddr); err != nil {
 		return fmt.Errorf("write heap: %w", err)
@@ -353,13 +376,33 @@ func (fw *FileWriter) linkToParent(parentPath, childName string, childAddr uint6
 		return fmt.Errorf("write symbol table: %w", err)
 	}
 
-	// Step 7: Update B-tree right key (key[1]) to reflect max name offset.
-	// Per HDF5 spec, B-tree v1 with N children has N+1 keys.
-	// Key[0] = 0 (left boundary), Key[N] = max name offset (right boundary).
-	// Without this, h5ls/h5dump cannot find children in the group.
+	// Step 7: Update B-tree right key (key[1]) to reflect the lexicographically
+	// largest name's local heap offset. Per HDF5 spec, B-tree v1 group nodes
+	// compare keys by looking up strings in the local heap and using strcmp.
+	// The right key must be the offset of the string that sorts LAST, not the
+	// numerically largest offset. Without this, h5dump/h5ls cannot find entries
+	// whose names sort after the right key's name.
+	//
+	// Note: heap.GetString() reads from heap.Data (on-disk snapshot). The entry
+	// we just added (at nameOffset) is only in heap.strings (not yet flushed to
+	// Data), so we use childName directly for that entry.
 	var maxNameOffset uint64
+	var maxName string
 	for _, e := range stNode.Entries {
-		if e.LinkNameOffset > maxNameOffset {
+		var entryName string
+		if e.LinkNameOffset == nameOffset {
+			// This is the entry we just added — use childName directly
+			// because heap.Data hasn't been updated yet.
+			entryName = childName
+		} else {
+			var nameErr error
+			entryName, nameErr = heap.GetString(e.LinkNameOffset)
+			if nameErr != nil {
+				continue
+			}
+		}
+		if entryName > maxName {
+			maxName = entryName
 			maxNameOffset = e.LinkNameOffset
 		}
 	}

internal/core/messages_write.go

Lines changed: 4 additions & 4 deletions
@@ -141,7 +141,7 @@ func encodeChunkedLayout(chunkDims []uint64, btreeAddress uint64, sb *Superblock
 
 	// Chunk dimensions (each 4 bytes, uint32)
 	for _, dim := range chunkDims {
-		binary.LittleEndian.PutUint32(buf[offset:], uint32(dim))
+		binary.LittleEndian.PutUint32(buf[offset:], uint32(dim)) //nolint:gosec // G115: chunk dims bounded by HDF5 format limits
 		offset += 4
 	}
 
@@ -252,8 +252,8 @@ func encodeDatatypeNumeric(dt *DatatypeMessage) ([]byte, error) {
 	}
 
 	properties = make([]byte, 12)
-	binary.LittleEndian.PutUint16(properties[0:2], 0)                 // bit_offset = 0
-	binary.LittleEndian.PutUint16(properties[2:4], uint16(dt.Size*8)) //nolint:gosec // G115: precision bits
+	binary.LittleEndian.PutUint16(properties[0:2], 0) // bit_offset = 0
+	binary.LittleEndian.PutUint16(properties[2:4], uint16(dt.Size*8))
 	properties[4] = epos
 	properties[5] = esize
 	properties[6] = mpos
@@ -265,7 +265,7 @@ func encodeDatatypeNumeric(dt *DatatypeMessage) ([]byte, error) {
 	// uint16: bit_precision (total bits)
 	properties = make([]byte, 4)
 	binary.LittleEndian.PutUint16(properties[0:2], 0) // bit_offset = 0
-	binary.LittleEndian.PutUint16(properties[2:4], uint16(dt.Size*8)) //nolint:gosec // G115: precision bits
+	binary.LittleEndian.PutUint16(properties[2:4], uint16(dt.Size*8)) //nolint:gosec // G115: dt.Size is 1/2/4/8, max value 64 fits uint16
 	}
 
 	// Build message: header (8 bytes) + properties

internal/core/objectheader_write.go

Lines changed: 25 additions & 0 deletions
@@ -514,6 +514,31 @@ func WriteObjectHeader(w io.WriterAt, addr uint64, oh *ObjectHeader, sb *Superbl
 	return nil
 }
 
+// ObjectHeaderSizeFromParsed calculates the on-disk size of an ObjectHeader
+// (as returned by ReadObjectHeader). This is used to determine how much space
+// the header occupies after modification (e.g., adding attributes).
+// Supports both v1 and v2 object headers.
+func ObjectHeaderSizeFromParsed(oh *ObjectHeader) uint64 {
+	if oh == nil {
+		return 0
+	}
+	if oh.Version != 1 && oh.Version != 2 {
+		return 0
+	}
+	ohw := &ObjectHeaderWriter{
+		Version:  oh.Version,
+		Flags:    oh.Flags,
+		Messages: make([]MessageWriter, len(oh.Messages)),
+	}
+	for i, msg := range oh.Messages {
+		ohw.Messages[i] = MessageWriter{
+			Type: msg.Type,
+			Data: msg.Data,
+		}
+	}
+	return ohw.Size()
+}
+
 // RewriteObjectHeaderV2 rewrites an object header v2 with updated messages.
 // This handles the case where we need to modify an existing object header
 // by reading it, modifying it, and writing it back.

internal/testing/mock_reader.go

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 // Package testing provides test utilities for HDF5 library testing.
-package testing //nolint:revive // internal test utilities, not a public package
+package testing
 
 import "errors"
