Quick answers to common questions about the HDF5 Go library
- General Questions
- Features and Capabilities
- Performance
- Compatibility
- Development and Contributing
- Roadmap and Future
This is a pure Go implementation of the HDF5 file format for reading and writing HDF5 files. It requires no CGo or C dependencies, making it fully cross-platform and easy to deploy.
Advantages:
- ✅ Pure Go - No CGo, no C dependencies
- ✅ Cross-platform - Works on any Go-supported platform (Windows, Linux, macOS, ARM, etc.)
- ✅ Easy deployment - Single binary, no library dependencies
- ✅ Modern - Built with Go 1.25+ best practices
- ✅ Actively maintained - Regular updates and improvements
Trade-offs:
- ⚠️ Write support advancing
- ⚠️ Some advanced features missing - Virtual datasets, parallel I/O, SWMR (planned for future releases)
- ⚠️ Slightly slower - Pure Go is 2-3x slower than C for some operations (but fast enough for most use cases)
Pure Go Benefits:
- Cross-compilation - Compile for any platform from any platform
- No dependencies - Users don't need HDF5 C library installed
- Easier deployment - Single static binary
- Better debugging - Go debugger works perfectly
- Memory safety - No C memory management issues
- Simpler build - Just go build, no complex makefiles
CGo Drawbacks:
- Requires HDF5 C library installation
- Complex cross-compilation
- Potential memory leaks at Go/C boundary
- Slower function calls across CGo boundary
- Harder to debug
Decision: Pure Go provides better user experience with acceptable performance for both reading and writing. See ADR-001 for details.
For reading: Feature-complete! ✅ Production-ready for reading HDF5 files.
For writing: Advancing rapidly! ✅
Read Support:
- ✅ All datatypes (integers, floats, strings, compounds, arrays, enums, references, opaque)
- ✅ All dataset layouts (compact, contiguous, chunked)
- ✅ GZIP compression
- ✅ Groups and hierarchies
- ✅ Attributes (compact and dense)
- ✅ Both old (pre-1.8) and modern (1.8+) HDF5 files
Write Support:
- ✅ Datasets (contiguous/chunked/compact layouts, all datatypes)
- ✅ Dataset resizing with unlimited dimensions
- ✅ Variable-length datatypes (strings, ragged arrays)
- ✅ Groups (symbol table format)
- ✅ Attributes (dense & compact storage, RMW operations)
- ✅ Compression (GZIP, Shuffle, Fletcher32)
- ✅ Advanced datatypes (arrays, enums, references, opaque)
- ✅ Links (hard links full, soft/external MVP)
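The Shuffle filter listed above reorders bytes so that a following compressor sees longer runs of similar values. The sketch below illustrates the general byte-shuffle idea using only the standard library. It is not this library's implementation, and the names shuffle and unshuffle are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// shuffle regroups bytes so that byte 0 of every element comes first,
// then byte 1 of every element, and so on. Trailing bytes that do not
// fill a whole element are copied unchanged.
func shuffle(src []byte, elemSize int) []byte {
	dst := make([]byte, len(src))
	if elemSize <= 1 {
		copy(dst, src)
		return dst
	}
	n := len(src) / elemSize // number of whole elements
	for i := 0; i < n; i++ {
		for b := 0; b < elemSize; b++ {
			dst[b*n+i] = src[i*elemSize+b]
		}
	}
	copy(dst[n*elemSize:], src[n*elemSize:])
	return dst
}

// unshuffle reverses shuffle for the same element size.
func unshuffle(src []byte, elemSize int) []byte {
	dst := make([]byte, len(src))
	if elemSize <= 1 {
		copy(dst, src)
		return dst
	}
	n := len(src) / elemSize
	for i := 0; i < n; i++ {
		for b := 0; b < elemSize; b++ {
			dst[i*elemSize+b] = src[b*n+i]
		}
	}
	copy(dst[n*elemSize:], src[n*elemSize:])
	return dst
}

func main() {
	// Two 4-byte elements: A0 A1 A2 A3 and B0 B1 B2 B3.
	raw := []byte{0xA0, 0xA1, 0xA2, 0xA3, 0xB0, 0xB1, 0xB2, 0xB3}
	s := shuffle(raw, 4)
	fmt.Printf("% x\n", s)
	fmt.Println("round trip ok:", bytes.Equal(unshuffle(s, 4), raw))
}
```

Shuffling pays off mainly for multi-byte numeric types, where the high-order bytes of neighbouring values tend to be identical.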
Read Enhancements:
- ✅ Hyperslab selection (efficient data slicing) - 10-250x faster!
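To make the term concrete: a hyperslab is a regular selection described per dimension by a start, a count, and a stride. The following sketch applies such a selection to an in-memory, row-major 2D array. It only illustrates the selection arithmetic; the function selectHyperslab2D is hypothetical and is not part of this library's API.

```go
package main

import "fmt"

// selectHyperslab2D copies a strided rectangular selection out of a
// row-major 2D array. start, count and stride are given per dimension
// as (row, column).
func selectHyperslab2D(data []float64, cols int, start, count, stride [2]int) ([]float64, error) {
	if cols <= 0 || len(data)%cols != 0 {
		return nil, fmt.Errorf("data length %d is not a multiple of cols %d", len(data), cols)
	}
	dims := [2]int{len(data) / cols, cols}
	for d := 0; d < 2; d++ {
		if start[d] < 0 || count[d] <= 0 || stride[d] <= 0 {
			return nil, fmt.Errorf("invalid selection in dimension %d", d)
		}
		last := start[d] + (count[d]-1)*stride[d]
		if last >= dims[d] {
			return nil, fmt.Errorf("selection exceeds dimension %d (last index %d, size %d)", d, last, dims[d])
		}
	}
	out := make([]float64, 0, count[0]*count[1])
	for i := 0; i < count[0]; i++ {
		r := start[0] + i*stride[0]
		for j := 0; j < count[1]; j++ {
			c := start[1] + j*stride[1]
			out = append(out, data[r*cols+c])
		}
	}
	return out, nil
}

func main() {
	// A 4x5 array holding 0..19 in row-major order.
	data := make([]float64, 20)
	for i := range data {
		data[i] = float64(i)
	}
	// Select rows 1 and 3, columns 0, 2 and 4.
	sel, err := selectHyperslab2D(data, 5, [2]int{1, 0}, [2]int{2, 3}, [2]int{2, 2})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sel) // [5 7 9 15 17 19]
}
```

When the selection is applied while reading from the file, only the chunks that overlap it need to be touched, which is where the speedup over reading the whole dataset comes from.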
Quality metrics:
- Test coverage: 86.1%
- Lint issues: 0 (34+ linters)
- 57 reference test files
- 200+ test cases
Perfect for:
- 📊 Data scientists reading HDF5 datasets in Go
- 🔬 Researchers processing scientific data (astronomy, climate, physics)
- 🏢 Developers building data analysis tools
- 🚀 DevOps needing cross-platform HDF5 readers
- 🐳 Docker users wanting minimal container dependencies
Not ideal for (yet):
- Applications requiring all advanced HDF5 features (virtual datasets, parallel I/O, SWMR)
- Performance-critical loops requiring C-level speed
- Attribute modification/deletion (write-once only for now)
Yes! Full dataset reading is supported for:
- ✅ Numeric types: int32, int64, float32, float64 → []float64
- ✅ Strings: Fixed and variable-length → []string
- ✅ Compound types: Struct-like data → []map[string]interface{}
// Numeric datasets
data, err := ds.Read() // Returns []float64
// String datasets
strings, err := ds.ReadStrings() // Returns []string
// Compound datasets
compounds, err := ds.ReadCompound() // Returns []map[string]interface{}
See Reading Data Guide for details.
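One consequence of returning int64 data as []float64 is worth keeping in mind: a float64 has a 53-bit significand, so integers with a magnitude above 2^53 may be rounded. The sketch below shows the effect with the standard library alone. It assumes little-endian input bytes, and int64sToFloat64 and exactInFloat64 are hypothetical helpers, not functions of this library.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math/bits"
)

// exactInFloat64 reports whether v survives conversion to float64
// unchanged, i.e. whether its significant bits fit in 53 bits.
func exactInFloat64(v int64) bool {
	u := uint64(v)
	if v < 0 {
		u = -u // magnitude of v; also correct for the minimum int64
	}
	if u == 0 {
		return true
	}
	return bits.Len64(u)-bits.TrailingZeros64(u) <= 53
}

// int64sToFloat64 decodes little-endian int64 values, converts them to
// float64, and returns the indices of values that were rounded.
func int64sToFloat64(raw []byte) (vals []float64, inexact []int, err error) {
	if len(raw)%8 != 0 {
		return nil, nil, errors.New("input length is not a multiple of 8")
	}
	vals = make([]float64, len(raw)/8)
	for i := range vals {
		v := int64(binary.LittleEndian.Uint64(raw[i*8:]))
		vals[i] = float64(v)
		if !exactInFloat64(v) {
			inexact = append(inexact, i)
		}
	}
	return vals, inexact, nil
}

func main() {
	src := []int64{1, 1 << 53, 1<<53 + 1}
	raw := make([]byte, 8*len(src))
	for i, v := range src {
		binary.LittleEndian.PutUint64(raw[i*8:], uint64(v))
	}
	vals, inexact, err := int64sToFloat64(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("values:", vals)
	fmt.Println("rounded indices:", inexact)
}
```

For typical measurement data this never matters, but identifiers or nanosecond timestamps stored as int64 can exceed 2^53.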
Yes! Write support is advancing rapidly. ✅
What's supported:
// Create new HDF5 file
fw, err := hdf5.CreateForWrite("output.h5", hdf5.CreateTruncate)
if err != nil {
	log.Fatal(err)
}
defer fw.Close()
// Create groups
grp, _ := fw.CreateGroup("/experiments")
// Write datasets (all datatypes supported)
ds, _ := fw.CreateDataset("/data", hdf5.Float64, []uint64{100})
ds.Write(myFloat64Data)
// Advanced datatypes
arrDs, _ := fw.CreateDataset("/arrays", hdf5.ArrayFloat32, []uint64{10},
	hdf5.WithArrayDims([]uint64{3, 3})) // Array of 3x3 float32
enumDs, _ := fw.CreateDataset("/status", hdf5.EnumInt8, []uint64{5},
	hdf5.WithEnumValues([]string{"OK", "ERROR"}, []int64{0, 1}))
Current limitations:
- Some advanced filters
Quality: Feature complete write support with 100% HDF5 test suite pass rate!
See ROADMAP.md for future plans.
Yes! Full attribute reading support including variable-length strings:
// Group attributes
attrs, err := group.Attributes()
// Dataset attributes
attrs, err := dataset.Attributes()
// Access attribute values
for _, attr := range attrs {
	value, err := attr.ReadValue()
	if err != nil {
		log.Printf("Error reading %s: %v", attr.Name, err)
		continue
	}
	fmt.Printf("%s = %v (type: %T)\n", attr.Name, value, value)
}
Supported:
- ✅ Compact attributes (in object header)
- ✅ Dense attributes (fractal heap direct blocks)
- ✅ All datatypes including variable-length strings (v0.13.4+)
Note: Dense attributes (8+ attributes) fully supported via B-tree v2 and fractal heap.
Fully Supported (Read + Write):
| HDF5 Type | Go Type | Read | Write |
|---|---|---|---|
| H5T_INTEGER | int8-64, uint8-64 | ✅ | ✅ |
| H5T_FLOAT | float32, float64 | ✅ | ✅ |
| H5T_STRING | string | ✅ | ✅ |
| H5T_ARRAY | fixed arrays | ✅ | ✅ |
| H5T_ENUM | named integers | ✅ | ✅ |
| H5T_REFERENCE | object/region refs | ✅ | ✅ |
| H5T_OPAQUE | binary blobs | ✅ | ✅ |
| H5T_COMPOUND | struct-like | ✅ | ✅ |
| H5T_VLEN | variable-length | ✅ | ✅ |
Not Supported:
- H5T_TIME - deprecated in HDF5 since v1.4, never fully implemented
See Datatypes Guide for detailed type mapping.
Supported:
- ✅ GZIP/Deflate (filter ID 1) - Covers 95%+ of files
Not Yet Supported:
- ❌ SZIP (filter ID 2)
- ❌ LZF (filter ID 32000)
- ❌ BZIP2 (filter ID 307)
- ❌ Blosc, LZ4, Zstd (custom filters)
Workaround: Convert files to GZIP:
h5repack -f GZIP=6 input.h5 output.h5
GZIP compression fully supported (both reading and writing).
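Part of the reason GZIP is straightforward in pure Go is that the HDF5 deflate filter stores each chunk as a zlib stream, which the standard library can handle. The round trip below is a minimal sketch of that idea using compress/zlib. The helpers deflateChunk and inflateChunk are hypothetical and do not describe this library's internals.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// deflateChunk compresses raw chunk bytes into a zlib stream at the
// given compression level (0-9).
func deflateChunk(raw []byte, level int) ([]byte, error) {
	var buf bytes.Buffer
	w, err := zlib.NewWriterLevel(&buf, level)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(raw); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil { // Close flushes the final block
		return nil, err
	}
	return buf.Bytes(), nil
}

// inflateChunk decompresses a zlib stream back into raw chunk bytes.
func inflateChunk(comp []byte) ([]byte, error) {
	r, err := zlib.NewReader(bytes.NewReader(comp))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	// A repetitive 4 KiB chunk compresses well.
	raw := bytes.Repeat([]byte{0, 1, 2, 3}, 1024)
	comp, err := deflateChunk(raw, 6)
	if err != nil {
		fmt.Println("compress error:", err)
		return
	}
	back, err := inflateChunk(comp)
	if err != nil {
		fmt.Println("decompress error:", err)
		return
	}
	fmt.Printf("raw=%d bytes, compressed=%d bytes, round trip ok=%v\n",
		len(raw), len(comp), bytes.Equal(raw, back))
}
```

Level 6 mirrors the GZIP=6 setting used in the h5repack command above.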
Superblock Versions:
- ✅ Version 0 (HDF5 1.0 - 1.6)
- ❌ Version 1 (rare, not needed)
- ✅ Version 2 (HDF5 1.8+)
- ✅ Version 3 (HDF5 2.0.0 with checksums)
Object Header Versions:
- ✅ Version 1 (pre-HDF5 1.8)
- ✅ Version 2 (HDF5 1.8+)
File Formats:
- ✅ Traditional groups (symbol tables)
- ✅ Modern groups (object headers)
- ✅ Both old and new B-tree formats
Compatibility: Reads and writes files compatible with HDF5 1.0 (1998) through HDF5 2.0.0 (2025). Future HDF5 format updates will be added in subsequent releases.
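If you are unsure which superblock version a file uses, it can be checked without any HDF5 library. Per the HDF5 file format, the superblock starts with an 8-byte signature at offset 0 or at 512, 1024, 2048 and so on, followed immediately by the version byte. The sketch below inspects only the first 4 KiB, so a file with a larger user block would be missed; superblockVersion is a hypothetical helper, not part of this library.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"os"
)

// hdf5Signature is the 8-byte marker that starts every HDF5 superblock.
var hdf5Signature = []byte{0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'}

// superblockVersion scans prefix at offsets 0, 512, 1024, ... for the
// HDF5 signature and returns the version byte that follows it.
func superblockVersion(prefix []byte) (version int, offset int, ok bool) {
	for off := 0; off+9 <= len(prefix); {
		if bytes.Equal(prefix[off:off+8], hdf5Signature) {
			return int(prefix[off+8]), off, true
		}
		if off == 0 {
			off = 512
		} else {
			off *= 2
		}
	}
	return 0, 0, false
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: sbver <file.h5>")
		return
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Println("open error:", err)
		return
	}
	defer f.Close()

	// Read only the beginning of the file, never the whole file.
	prefix := make([]byte, 4096)
	n, err := io.ReadFull(f, prefix)
	if err != nil && !errors.Is(err, io.ErrUnexpectedEOF) && !errors.Is(err, io.EOF) {
		fmt.Println("read error:", err)
		return
	}
	if v, off, ok := superblockVersion(prefix[:n]); ok {
		fmt.Printf("superblock version %d at offset %d\n", v, off)
	} else {
		fmt.Println("no HDF5 signature found in the first 4 KiB")
	}
}
```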
Yes, with some considerations:
File Size:
- ✅ Files up to several GB work well
- ✅ Files up to 100+ GB can be read (not all loaded into memory at once)
- ⚠️ Memory usage scales with number of objects, not file size
Large Datasets:
- ✅ Chunked datasets can be any size (read chunk-by-chunk)
- ⚠️ Entire dataset loaded into memory on Read() (streaming API planned for future releases)
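Because a full read materialises every element, a quick size estimate from the dataset's dimensions can prevent surprises. The sketch below multiplies the dimensions by an element size with overflow checks and also counts the chunks that cover the dataset. It assumes 8 bytes per element, since numeric reads are returned as []float64; datasetBytes and chunkCount are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// mulChecked multiplies a and b and fails if the result overflows uint64.
func mulChecked(a, b uint64) (uint64, error) {
	hi, lo := bits.Mul64(a, b)
	if hi != 0 {
		return 0, errors.New("size overflows uint64")
	}
	return lo, nil
}

// datasetBytes returns the in-memory size of a dataset with the given
// dimensions and element size in bytes.
func datasetBytes(dims []uint64, elemSize uint64) (uint64, error) {
	total := elemSize
	for _, d := range dims {
		var err error
		if total, err = mulChecked(total, d); err != nil {
			return 0, err
		}
	}
	return total, nil
}

// chunkCount returns how many chunks are needed to cover the dataset,
// counting partially filled edge chunks as whole chunks.
func chunkCount(dims, chunk []uint64) (uint64, error) {
	if len(dims) != len(chunk) {
		return 0, errors.New("dims and chunk must have the same rank")
	}
	n := uint64(1)
	for i := range dims {
		if chunk[i] == 0 {
			return 0, errors.New("chunk dimension must be non-zero")
		}
		per := dims[i] / chunk[i]
		if dims[i]%chunk[i] != 0 {
			per++
		}
		var err error
		if n, err = mulChecked(n, per); err != nil {
			return 0, err
		}
	}
	return n, nil
}

func main() {
	dims := []uint64{10000, 10000}
	size, err := datasetBytes(dims, 8) // 8 bytes per float64 element
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	chunks, err := chunkCount(dims, []uint64{1000, 1000})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("about %d MB in memory, %d chunks on disk\n", size/1_000_000, chunks)
}
```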
Best Practices:
- Process datasets one at a time
- Use Walk() efficiently (don't repeat)
- Close files promptly
Example:
// Good: Process incrementally
file.Walk(func(path string, obj hdf5.Object) {
	if ds, ok := obj.(*hdf5.Dataset); ok {
		data, _ := ds.Read()
		processData(data) // Process immediately
		// data will be garbage collected
	}
})
Reading Speed: Typically 2-3x slower than C library for raw I/O.
Why acceptable:
- For most applications, I/O is not the bottleneck
- Decompression (GZIP) is already fast in Go
- Easier deployment and maintenance worth the trade-off
- Sufficient for scientific data analysis workflows
Performance:
- C-based libraries (gonum/hdf5, Python h5py): Fast (native C implementation)
- This library: Slower (pure Go implementation)
Expected trade-off:
- Pure Go is typically 2-3x slower than C for I/O-heavy operations
- For most scientific data analysis, file I/O is not the bottleneck
- Decompression (GZIP) and computation dominate processing time
Why pure Go is still worth it:
- Cross-platform deployment (single binary, no dependencies)
- Easier to build, maintain, and distribute
- Sufficient for typical scientific workflows
- Future optimization: SIMD, assembly, better algorithms
Note: Formal benchmarks planned for future releases. Performance varies by operation type, dataset size, and compression.
Currently: No - Each File instance should be used from a single goroutine.
Workaround: Open separate file handles per goroutine:
// Each goroutine opens its own handle
func processDataset(filename string, datasetPath string) {
	file, err := hdf5.Open(filename)
	if err != nil {
		// Return early: calling Close on a failed open is not safe
		log.Printf("open %s: %v", filename, err)
		return
	}
	defer file.Close()
	// Find and process dataset
	// ...
}
// Concurrent processing
var wg sync.WaitGroup
for _, dsPath := range datasetPaths {
	wg.Add(1)
	go func(path string) {
		defer wg.Done()
		processDataset("data.h5", path)
	}(dsPath)
}
wg.Wait()
Future: Full thread-safety with mutexes + SWMR mode planned for future releases.
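With many dataset paths, starting one goroutine and one file handle per path at the same time can exhaust file descriptors or memory, since each read loads a whole dataset. One way to cap this is a counting semaphore built from a buffered channel, sketched below with the standard library only. forEachLimited is a hypothetical helper; in real use its callback would open its own handle, read one dataset, and close the handle.

```go
package main

import (
	"fmt"
	"sync"
)

// forEachLimited calls work(path) for every path, running at most limit
// calls at the same time. It returns the first error encountered, if any.
func forEachLimited(paths []string, limit int, work func(path string) error) error {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // one slot per allowed concurrent worker
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, p := range paths {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit workers are already running
		go func(path string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when done
			if err := work(path); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(p)
	}
	wg.Wait()
	return firstErr
}

func main() {
	paths := []string{"/a", "/b", "/c", "/d"}
	err := forEachLimited(paths, 2, func(path string) error {
		// Placeholder for: open file, read dataset at path, close file.
		fmt.Println("processing", path)
		return nil
	})
	fmt.Println("done, err =", err)
}
```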
Current: No - entire dataset read into memory.
Future: Streaming/chunked reading API planned for future releases:
// Future API (not available yet)
reader, _ := ds.ChunkReader()
for reader.Next() {
	chunk := reader.Chunk() // Process one chunk at a time
	processChunk(chunk)
}
Workaround: Use Info() to check size before reading:
info, _ := ds.Info()
fmt.Println(info) // Check "Total size" before reading
// Only read if size is acceptable
if /* size < threshold */ {
	data, _ := ds.Read()
	processData(data)
}
All Go-supported platforms:
- ✅ Windows (7, 10, 11, Server)
- ✅ Linux (Ubuntu, Debian, CentOS, Fedora, Arch, etc.)
- ✅ macOS (Intel and Apple Silicon)
- ✅ FreeBSD, OpenBSD, NetBSD
- ✅ Solaris, AIX
Architectures:
- ✅ amd64 (x86_64)
- ✅ arm64 (Apple Silicon, ARM servers)
- ✅ 386 (32-bit x86)
- ✅ arm (32-bit ARM)
- ✅ All other Go-supported architectures
Pure Go = runs anywhere Go runs!
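In practice, targeting another platform only requires setting the GOOS and GOARCH environment variables at build time. The small program below reports which target a binary was built for, which is handy when verifying cross-compiled artifacts. buildTarget is a hypothetical helper, and the build command in the comment is just one example.

```go
package main

import (
	"fmt"
	"runtime"
)

// buildTarget reports the operating system and architecture the running
// binary was compiled for, in the GOOS/GOARCH form used by the Go toolchain.
func buildTarget() string {
	return runtime.GOOS + "/" + runtime.GOARCH
}

func main() {
	// Cross-compile from any host with, for example:
	//   GOOS=linux GOARCH=arm64 go build -o myapp-linux-arm64 .
	fmt.Println("built for", buildTarget())
}
```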
Yes! Perfect for Docker due to no C dependencies.
Minimal Dockerfile:
FROM golang:1.25-alpine
WORKDIR /app
COPY . .
RUN go get github.com/scigolib/hdf5
RUN go build -o myapp .
CMD ["./myapp"]
Benefits:
- No need for HDF5 C library in image
- Smaller image size
- Faster builds
- Cross-platform containers
Yes! Fully compatible with files created by Python h5py.
Tested with:
- h5py versions 2.x and 3.x
- NumPy arrays
- Pandas DataFrames (via to_hdf)
Example:
# Create file with Python
import h5py
import numpy as np
with h5py.File('data.h5', 'w') as f:
    f.create_dataset('numbers', data=np.arange(100))
    # An explicit string dtype stores the list as variable-length strings
    f.create_dataset('strings', data=['hello', 'world'], dtype=h5py.string_dtype())
// Read with Go
file, _ := hdf5.Open("data.h5")
defer file.Close()
file.Walk(func(path string, obj hdf5.Object) {
	// Works perfectly!
})
Usually yes, if they follow HDF5 standard format.
Tested with:
- ✅ MATLAB (save with '-v7.3' flag)
- ✅ IDL (HDF5 format)
- ✅ NASA HDF5 files
- ✅ Climate/weather model outputs (NetCDF4-based HDF5)
MATLAB Example:
% Save as HDF5 format
data = rand(100, 100);
save('data.mat', 'data', '-v7.3'); % -v7.3 uses HDF5
// Read with Go library
file, _ := hdf5.Open("data.mat")
// Works! MATLAB .mat v7.3 is HDF5 format
Note: Some tools add proprietary metadata. Core data reading works, but some metadata may not be fully parsed.
Partially. NetCDF4 is built on HDF5, so basic reading works:
// NetCDF4 files have .nc extension but use HDF5 format
file, err := hdf5.Open("climate.nc")
if err == nil {
	// Can read datasets
	// NetCDF metadata in attributes
}
Limitations: NetCDF-specific conventions (dimensions, coordinate variables) are not interpreted. You'll see raw HDF5 structure.
Future: Dedicated NetCDF4 support may be added in future versions.
Ways to contribute:
- 🐛 Report bugs - Open issues with detailed reproduction
- 💡 Suggest features - Request features via issues or discussions
- 📝 Improve documentation - Fix typos, add examples
- 🔧 Submit pull requests - Add features or fix bugs
- ⭐ Star the project - Show support on GitHub
- 📢 Spread the word - Tell others about the library
Getting started:
- Read CONTRIBUTING.md
- Check open issues
- Join discussions on GitHub
Git Flow:
- main branch: Stable releases only
- develop branch: Active development (default)
- Feature branches: feature/object-header-v1
- Release branches: release/vX.Y.Z
Process:
- Fork repository
- Create feature branch from develop
- Implement with tests
- Run golangci-lint run ./...
- Ensure go test ./... passes
- Open pull request to develop
Test Coverage: 76.3% (target: >70%)
Types of tests:
- Unit tests: Test individual functions
- Integration tests: Test with real HDF5 files
- Reference tests: Compare with h5dump output
Test files:
- 57 reference HDF5 files covering various formats
- Generated with Python h5py for reproducibility
Quality checks:
- 34+ linters (golangci-lint)
- Race detector (go test -race)
- Cross-platform testing (Windows, Linux, macOS)
Multiple levels:
- User guides: docs/guides/ (Installation, Reading Data, etc.)
- Architecture docs: docs/architecture/ (How it works)
- API reference: GoDoc (pkg.go.dev)
- Examples: examples/ (Working code)
- Development docs: docs/dev/ (for contributors, private)
Documentation principles:
- Clear examples
- Explain "why" not just "how"
- Keep up-to-date with code changes
Already Available:
- ✅ File creation with multiple superblock formats (v0, v2, v3)
- ✅ Dataset writing: contiguous and chunked layouts
- ✅ Dataset resizing with unlimited dimensions (NEW!)
- ✅ Variable-length datatypes: strings, ragged arrays (NEW!)
- ✅ Compression: GZIP, Shuffle filter, Fletcher32 checksum
- ✅ Groups: symbol table and dense formats
- ✅ Attributes: compact (0-7) and dense (8+) storage
- ✅ Attribute modification and deletion
- ✅ Links support (hard links full, soft/external MVP)
- ✅ Advanced datatypes: arrays, enums, references, opaque
- ✅ Legacy format support (v0 superblock + Object Header v1)
Read Enhancements:
- ✅ Hyperslab selection (data slicing) - 10-250x faster!
See ROADMAP.md for complete roadmap.
Completed:
- ✅ MVP write support
- ✅ Chunked datasets + compression
- ✅ Dense groups and attributes
- ✅ Legacy format support (v0 superblock)
- ✅ Dense storage RMW
- ✅ Attribute modification/deletion
- ✅ Links support (hard links full, soft/external full)
- ✅ Dataset resizing and extension
- ✅ Variable-length datatypes
- ✅ Hyperslab selection (read)
- ✅ Compound datatype writing
Future Enhancements:
- Thread-safety with mutexes
- SWMR (Single Writer Multiple Reader)
- Streaming API for large datasets
- Advanced filters (LZF, SZIP)
- Parallel I/O
- ✅ HDF5 2.0.0 supported (format specification v4.0, superblock v0-v3)
Goal: Minimal breaking changes.
Promise:
- Current reading API will remain stable
- Only additions and optional features
- Deprecations will be announced in advance
Version strategy:
- v0.x.x (current): Stable API, production-ready
- v0.14.0+ (future): Community-driven enhancements (compression filters, parallel I/O, SWMR mode)
- v1.0.0 (future): Production-ready stable LTS release with long-term support guarantee
See ROADMAP.md for versioning strategy.
Currently: Community support via GitHub issues and discussions.
Future: Commercial support, consulting, and training may be available. Contact via GitHub if interested.
Follow development:
- ⭐ Star the repo: https://github.com/scigolib/hdf5
- 👁️ Watch releases: Get notified of new versions
- 📖 Read CHANGELOG: See what's new
- 💬 Join discussions: Share ideas and feedback
Communication channels:
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: Q&A and community (coming soon)
- Release notes: Detailed changelog with each version
- Installation Guide - Setup and verification
- Quick Start Guide - Get started in 5 minutes
- Reading Data Guide - Comprehensive reading guide
- Datatypes Guide - Type conversion details
- Troubleshooting - Common issues and solutions
- ROADMAP - Future plans
- GitHub Issues: https://github.com/scigolib/hdf5/issues
- GitHub Discussions: https://github.com/scigolib/hdf5/discussions
- Documentation: https://github.com/scigolib/hdf5/tree/main/docs
Last Updated: 2025-11-13