This example demonstrates Global Heap and variable-length string support:
- Opening files with variable-length strings
- Global Heap functionality
- String storage architecture
- Address tracking
```bash
go run main.go
```

Output:

```
Opening HDF5 file with variable-length strings: ../testdata/vlen_strings.h5
File opened successfully. Superblock version: 2
Offset size: 8 bytes
Length size: 8 bytes
=== Dataset: /vlen_strings ===
Address: 0x800
✓ Global Heap support implemented!
- ParseGlobalHeapReference: Extracts heap address + object index
- ReadGlobalHeapCollection: Loads heap collection from file
- GetObject: Retrieves string data from heap
Variable-length string support is ready! 🎯
```
The Global Heap is HDF5's storage mechanism for variable-length data:
- Variable-length strings: each element can have a different length
- Variable-length arrays: each element can hold a different number of items
- Object references: pointers to other objects
```
Dataset → Global Heap Reference → Global Heap → String Data
```
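Example:

```
Dataset contains:
  Reference 1: {heap_addr: 0x1000, object_index: 1}
  Reference 2: {heap_addr: 0x1000, object_index: 2}

Global Heap at 0x1000:
  Object 1: "Hello World"
  Object 2: "Variable Length String"
```

Object indices start at 1: the HDF5 file format reserves index 0 for the collection's free-space object.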
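The lookup chain in this example can be modeled with a toy in-memory map standing in for the file. This is a sketch only: `heapRef`, `heaps`, and `resolve` are hypothetical names for illustration, not this library's API, and real collections are read from disk rather than built in memory. Indices start at 1 because the HDF5 file format reserves index 0 for a collection's free-space object.

```go
package main

import "fmt"

// heapRef is a hypothetical stand-in for a Global Heap reference:
// the address of a heap collection plus an object index inside it.
type heapRef struct {
	HeapAddr    uint64
	ObjectIndex uint32
}

// heaps is a toy model of the file: collection address -> index -> object bytes.
// Hypothetical; a real reader loads each collection from the file.
type heaps map[uint64]map[uint32][]byte

// resolve follows reference -> heap collection -> object data.
func (h heaps) resolve(ref heapRef) (string, error) {
	coll, ok := h[ref.HeapAddr]
	if !ok {
		return "", fmt.Errorf("no global heap collection at 0x%x", ref.HeapAddr)
	}
	data, ok := coll[ref.ObjectIndex]
	if !ok {
		return "", fmt.Errorf("object %d not found in collection at 0x%x", ref.ObjectIndex, ref.HeapAddr)
	}
	return string(data), nil
}

func main() {
	// Index 0 is left unused: it is reserved for free space in real files.
	file := heaps{
		0x1000: {
			1: []byte("Hello World"),
			2: []byte("Variable Length String"),
		},
	}
	for _, r := range []heapRef{{0x1000, 1}, {0x1000, 2}} {
		s, err := file.resolve(r)
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
}
```

Keeping the collection keyed by address mirrors the fact that many references usually share one collection, so a reader can load each collection once and serve every string that points into it.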
```go
// Extract heap address and object index from reference bytes
heapAddr, objIndex := ParseGlobalHeapReference(refBytes, superblock)

// Read entire Global Heap collection from file
collection := ReadGlobalHeapCollection(file, heapAddr, superblock)

// Retrieve specific object data
stringData := collection.GetObject(objIndex)
```

```go
// When reading datasets with vlen strings:
strings, err := ds.ReadStrings()
if err != nil {
	log.Fatal(err)
}
// strings is []string with different lengths:
// ["short", "a much longer string", "x"]
```

```go
// Compound type with vlen string field:
// {
//   "id": int32,
//   "name": variable-length string
// }
compounds, err := ds.ReadCompound()
if err != nil {
	log.Fatal(err)
}
// Each compound's string field is resolved via the Global Heap
```

| Type | Storage | Example |
|---|---|---|
| Fixed-length | Stored inline in the dataset | Every string padded to 20 bytes |
| Variable-length | Global Heap references in the dataset | Each string keeps its natural length |
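Python h5py example:

```python
import h5py

with h5py.File('strings.h5', 'w') as f:
    # Fixed-length (20 bytes each)
    dt = h5py.string_dtype(encoding='utf-8', length=20)
    f.create_dataset('fixed', data=['hello', 'world'], dtype=dt)

    # Variable-length (via Global Heap)
    dt = h5py.string_dtype(encoding='utf-8')
    f.create_dataset('variable', data=['short', 'much longer string'], dtype=dt)
```

Global Heap collection layout:

```
+----------------------------------+
| Signature: "GCOL" (4 bytes)      |
+----------------------------------+
| Version (1 byte)                 |
| Reserved (3 bytes)               |
+----------------------------------+
| Collection size (length size)    |
+----------------------------------+
| Object 1 index (2 bytes)         |
| Reference count (2 bytes)        |
| Reserved (4 bytes)               |
| Object 1 size (length size)      |
| Object 1 data (padded to 8)      |
+----------------------------------+
| Object 2 ...                     |
+----------------------------------+
| ...                              |
+----------------------------------+
| Object 0: free space (optional)  |
+----------------------------------+
```

Each object's data is padded to a multiple of 8 bytes, and index 0 is reserved for the free-space object at the end of the collection.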
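The collection layout can be walked with `encoding/binary`. This is a minimal sketch based on the Global Heap description in the HDF5 file format specification, assuming little-endian fields and 8-byte lengths (the example file reports `Length size: 8 bytes`). `parseCollection`, `buildCollection`, and `gheapObject` are hypothetical names, not the library's `ReadGlobalHeapCollection`.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// gheapObject is one parsed object from a Global Heap collection (hypothetical type).
type gheapObject struct {
	Index    uint16
	RefCount uint16
	Data     []byte
}

// parseCollection decodes a GCOL collection held entirely in buf.
// Assumes little-endian fields and 8-byte lengths.
func parseCollection(buf []byte) (map[uint16]gheapObject, error) {
	if len(buf) < 16 || string(buf[:4]) != "GCOL" {
		return nil, errors.New("not a global heap collection")
	}
	// buf[4] is the version, buf[5:8] is reserved.
	size := binary.LittleEndian.Uint64(buf[8:16]) // includes this header
	if size > uint64(len(buf)) {
		return nil, errors.New("truncated collection")
	}
	objs := make(map[uint16]gheapObject)
	off := uint64(16)
	for off+16 <= size {
		idx := binary.LittleEndian.Uint16(buf[off:])
		if idx == 0 { // free-space object: no real objects follow
			break
		}
		ref := binary.LittleEndian.Uint16(buf[off+2:])
		n := binary.LittleEndian.Uint64(buf[off+8:]) // after 4 reserved bytes
		start := off + 16
		if n > size-start {
			return nil, fmt.Errorf("object %d overruns collection", idx)
		}
		objs[idx] = gheapObject{Index: idx, RefCount: ref, Data: buf[start : start+n]}
		off = start + (n+7)/8*8 // data is padded to a multiple of 8 bytes
	}
	return objs, nil
}

// buildCollection assembles a small in-memory collection for demonstration.
func buildCollection(strs ...string) []byte {
	buf := make([]byte, 16)
	copy(buf, "GCOL")
	buf[4] = 1 // version 1
	for i, s := range strs {
		hdr := make([]byte, 16)
		binary.LittleEndian.PutUint16(hdr[0:], uint16(i+1)) // indices start at 1
		binary.LittleEndian.PutUint16(hdr[2:], 1)           // reference count
		binary.LittleEndian.PutUint64(hdr[8:], uint64(len(s)))
		buf = append(buf, hdr...)
		data := make([]byte, (len(s)+7)/8*8)
		copy(data, s)
		buf = append(buf, data...)
	}
	binary.LittleEndian.PutUint64(buf[8:16], uint64(len(buf)))
	return buf
}

func main() {
	objs, err := parseCollection(buildCollection("Hello World", "Variable Length String"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(objs[1].Data), "|", string(objs[2].Data))
}
```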
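A Global Heap reference (heap ID) is the collection address (4 or 8 bytes, depending on the superblock's offset size) followed by a 4-byte object index, so it is 8 or 12 bytes in total. With 8-byte offsets:
- Bytes 0-7: Global Heap collection address
- Bytes 8-11: Object index

Inside a variable-length dataset element, the heap ID is preceded by a 4-byte length, giving 12 or 16 bytes per element.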
Cause: A very old HDF5 implementation or a custom vlen format.
Solution: File an issue and include the HDF5 file so it can be investigated.
Cause: Encoding mismatch (ASCII vs UTF-8).
Solution: Verify the encoding with h5dump and check the `CSET` line of the string datatype:

```bash
h5dump -d /dataset file.h5
```

See also:
- Example 05 - Complete feature demo
- Datatypes Guide - String type details
- Reading Data Guide - String reading
Part of the HDF5 Go Library v0.10.0-beta