
feat: ChunkIterator API and compression filters (LZF, BZIP2, SZIP) #16

Merged: kolkov merged 4 commits into main from feat/compression-filters-and-chunk-iterator on Jan 30, 2026

kolkov commented on Jan 30, 2026

Summary

  • Add ChunkIterator API for memory-efficient chunk-by-chunk dataset reading (TASK-031)
  • Add LZF compression filter with full read/write support (TASK-027)
  • Add BZIP2 decompression filter (read-only)
  • Add SZIP filter stub with informative error messages
  • Document the security review for CVE-2025-2309 and CVE-2025-2308 (not affected)
  • Fix LZF decompression for uncompressed/padded chunks

Test plan

  • All existing tests pass
  • New ChunkIterator tests (~400 lines)
  • LZF round-trip tests with real h5py files
  • BZIP2 decompression tests
  • Race detector clean
  • Linter clean (0 issues)

…K-031)

- Add ChunkIterator type with Next/Chunk/Err pattern (like bufio.Scanner)
- Add ChunkIteratorWithContext for cancellation support
- Add progress callbacks and reset functionality
- Fix B-tree key format to include nbytes field (HDF5 spec compliance)
- Fix chunked writer to update B-tree address in layout message
- Update README.md and CHANGELOG.md with new feature
LZF (ID 32000): Pure Go read+write, h5py/PyTables compatible
BZIP2 (ID 307): read via Go stdlib compress/bzip2 (decompression only), write stub
SZIP (ID 4): stub with descriptive errors (requires libaec)
Both CVEs confirmed NOT AFFECTED:
- CVE-2025-2309 (bitfield): data conversion not implemented
- CVE-2025-2308 (scale-offset): filter not implemented

Updated feature parity for LZF/BZIP2/SZIP filters.
- Handle uncompressed data when compression doesn't help (size == expected)
- Pad output with zeros when decompressed size < expected (sparse chunks)
- Add empty data check in applyLZF

Tested with h5ex_d_lzf.h5, which now reads correctly.

kolkov merged commit 5191c01 into main on Jan 30, 2026
6 checks passed
kolkov deleted the feat/compression-filters-and-chunk-iterator branch on January 30, 2026 at 00:32