The change log file hosting all releases with lists of new features and breaking changes. Best viewed here.
- New features
  - Add `reseed_each_epoch` option to `MapDataset.repeat` that allows replaying the first epoch exactly if set to False (True by default).
  - Introduces `grain.experimental.RebatchIterDataset` for efficient rebatching.
  - Migrates data loader to use dataset API under the hood.
- Breaking changes:
  - SliceMapDataset updated to use the full index relative to the parent dataset, instead of index % len(self).
- Deprecations:
  - Graduate `grain.experimental.apply_transformations` to `grain.{MapDataset|IterDataset}.apply`. The experimental API will soon be deprecated.
- Bug fixes
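
The `reseed_each_epoch` entry above can be pictured with a small sketch. It does not use Grain and is not Grain's implementation: the helper `shuffled_epochs` and its seeding rule (`seed + epoch`) are hypothetical, and it assumes the option matters for a randomized upstream step such as a shuffle. It only shows why reusing the seed replays the first epoch exactly.

```python
import random


def shuffled_epochs(records, num_epochs, seed, reseed_each_epoch=True):
    """Yield one shuffled copy of `records` per epoch (illustrative only)."""
    for epoch in range(num_epochs):
        # Hypothetical seeding rule: derive a per-epoch seed when reseeding,
        # otherwise reuse the base seed so every epoch has the same order.
        epoch_seed = seed + epoch if reseed_each_epoch else seed
        order = list(records)
        random.Random(epoch_seed).shuffle(order)
        yield order


# With reseeding disabled, the second epoch repeats the first one exactly.
replayed = list(shuffled_epochs(range(8), num_epochs=2, seed=42,
                                reseed_each_epoch=False))
# With reseeding enabled (the default), each epoch gets its own seed.
reseeded = list(shuffled_epochs(range(8), num_epochs=2, seed=42,
                                reseed_each_epoch=True))
```

In this sketch the first epoch is the same in both modes because both start from the base seed; only later epochs differ.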
- New features:
  - Adds Windows build.
  - Allow passing `read_kwargs` to `ParquetIterDataset` for configuring Parquet file reading.
  - `ThreadPrefetchDatasetIterator` now supports non-Grain iterators that support checkpointing.
  - Introduces API for device prefetch: `grain.experimental.device_put()` for easy CPU and device prefetching.
  - Introduces API for autotuning: given the user-provided RAM restrictions and a specific `IterDataset`, finds the number of processes for `mp_prefetch` and the buffer size for `PrefetchDatasetIterator`.
  - Allow passing `reader_options` to `ArrayRecordDataSource` for configuring array record file reading.
  - Introduces `grain.experimental.batch_and_pad` for padding a partial batch to avoid dropping batch remainder data.
  - Grain interleave optimization: allow creating more threads to keep starting iterators and prefetching elements in parallel.
  - Allow for alternative slicing of the data for `MultiprocessPrefetchIterDataset`. New slicing allows each worker process to read unique file shards, thus improving performance.
- Breaking changes:
  - Upgrades `array_record` and `protobuf`.
- Deprecations:
- Bug fixes
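
The idea behind the `grain.experimental.batch_and_pad` entry above, padding a partial final batch instead of dropping it, can be sketched in plain Python. This is not Grain's API or implementation: the function name `batch_with_padding`, its parameters, and the scalar `pad_value` are assumptions made for illustration only.

```python
def batch_with_padding(records, batch_size, pad_value=0):
    """Group records into fixed-size batches, padding the last one if short."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Keep the remainder: fill the missing slots with pad_value
        # rather than discarding the partial batch.
        yield batch + [pad_value] * (batch_size - len(batch))


# Five records with batch size 2 leave one record over; it is kept and padded.
batches = list(batch_with_padding(range(5), batch_size=2, pad_value=-1))
```

A real pipeline would usually also need a way to tell padded slots from real data, for example a mask; this sketch leaves that out.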
- New features:
  - Automatic publishing of releases to PyPI via GitHub Actions.
  - Nightly builds.
  - Introduced changelog.