v0.3.1

@mthrok released this 16 Apr 14:47 · 14 commits to main since this release · 2142c8a

SPDL v0.3.1 Release Notes

Highlights

PathVariants: conditional routing in pipelines. The new PathVariants building block routes each item to one of N processing paths, chosen by a router function, and then merges the outputs of all paths back into a single stream. The primary use case is caching: each item goes either to the full processing path or to a cache-lookup shortcut. PathVariants can be nested for hierarchical routing. Use it via PipelineBuilder.path_variants(router, paths, name=). (#1322)
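To illustrate the routing idea for the caching use case, here is a minimal sketch in plain Python. It does not use SPDL: route_items, full_path, cached_path, and the convention that the router returns an index into the list of paths are all hypothetical names and assumptions made for this example, and the sketch processes items sequentially rather than concurrently.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence

cache: Dict[str, str] = {}
path_log: List[str] = []  # records which path handled each item


def full_path(key: str) -> str:
    """Hypothetical expensive path: compute the value and store it."""
    path_log.append("full")
    value = key.upper()  # stand-in for real processing
    cache[key] = value
    return value


def cached_path(key: str) -> str:
    """Hypothetical shortcut path: look the value up in the cache."""
    path_log.append("cached")
    return cache[key]


def router(key: str) -> int:
    """Pick a path index: 1 (shortcut) on a cache hit, otherwise 0."""
    return 1 if key in cache else 0


def route_items(
    items: Iterable[str],
    router: Callable[[str], int],
    paths: Sequence[Callable[[str], str]],
) -> Iterator[str]:
    """Send each item down the path chosen by the router and
    merge all results into one output stream."""
    for item in items:
        yield paths[router(item)](item)


results = list(route_items(["a", "b", "a"], router, [full_path, cached_path]))
```

In this sketch the second occurrence of "a" takes the shortcut because the first occurrence populated the cache. How the real PathVariants orders merged outputs is not stated in these notes.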

Background tasks in Pipeline: pipelines can now run background tasks alongside the main data pipeline via the BackgroundTask / BackgroundTaskFactory abstractions. A built-in ProcessGroupStatsMonitor is included. It tracks CPU, RSS, and network I/O across all PIDs in the same process group, which is useful for per-rank monitoring with torchrun. (#1319, #1335, #1336)
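The general pattern of running a monitor next to the main work can be sketched with plain asyncio. This is not the BackgroundTask / BackgroundTaskFactory interface, which these notes do not show: monitor, run_with_background, and pipeline are hypothetical names, and sampling time.process_time() is only a stand-in for the CPU, RSS, and network statistics mentioned above.

```python
import asyncio
import contextlib
import time
from typing import Any, Coroutine, List


async def monitor(samples: List[float], interval: float) -> None:
    """Hypothetical background task: sample CPU time until cancelled."""
    while True:
        samples.append(time.process_time())
        await asyncio.sleep(interval)


async def run_with_background(
    main_coro: Coroutine[Any, Any, Any],
    background: Coroutine[Any, Any, None],
) -> Any:
    """Run `background` for exactly as long as `main_coro` runs."""
    task = asyncio.create_task(background)
    await asyncio.sleep(0)  # let the background task take its first sample
    try:
        return await main_coro
    finally:
        # Stop the background task once the main work finishes or fails.
        task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await task


async def pipeline() -> int:
    """Stand-in for the main data pipeline."""
    total = 0
    for i in range(5):
        await asyncio.sleep(0.01)
        total += i
    return total


async def main():
    samples: List[float] = []
    result = await run_with_background(pipeline(), monitor(samples, 0.01))
    return result, samples
```

Cancelling the monitor in a finally block ties its lifetime to the main work, so it stops even when the main coroutine raises.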

Aggregate pipe optimization: the aggregate stage now bulk-drains its input queue with get_nowait(), which reduces context-switch overhead. It stops draining as soon as the aggregator emits a result, so that backpressure is preserved. (#1310)
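The bulk-drain technique can be sketched with asyncio.Queue as follows. This is an illustration under assumptions, not SPDL's implementation: the aggregate function, the EOF sentinel, and the fixed batch_size aggregator are all hypothetical.

```python
import asyncio
from typing import Any, List

EOF = object()  # hypothetical end-of-stream sentinel


async def aggregate(
    in_q: asyncio.Queue, out_q: asyncio.Queue, batch_size: int
) -> None:
    """Group items from in_q into lists of batch_size and put them on out_q."""
    batch: List[Any] = []
    while True:
        # Suspend only when the input queue is empty.
        item = await in_q.get()
        while True:
            if item is EOF:
                if batch:
                    await out_q.put(batch)  # flush the final partial batch
                return
            batch.append(item)
            if len(batch) == batch_size:
                # Emit, then stop draining. If out_q is bounded and full,
                # this put waits, which applies backpressure upstream.
                await out_q.put(batch)
                batch = []
                break
            try:
                # Take already-queued items without yielding to the event loop.
                item = in_q.get_nowait()
            except asyncio.QueueEmpty:
                break  # nothing left to drain; go back to awaiting


async def main() -> List[List[int]]:
    in_q: asyncio.Queue = asyncio.Queue()
    out_q: asyncio.Queue = asyncio.Queue()
    for i in range(7):
        in_q.put_nowait(i)
    in_q.put_nowait(EOF)
    await aggregate(in_q, out_q, batch_size=3)
    batches = []
    while not out_q.empty():
        batches.append(out_q.get_nowait())
    return batches
```

The inner loop replaces one await per item with a single await followed by non-blocking reads, which is where the reduction in context switches comes from.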

New Features

  • Add PathVariants pipeline building block for conditional routing with PathVariantsConfig and PipelineBuilder.path_variants() API (#1322)
  • Add background task support to Pipeline with BackgroundTask, BackgroundTaskFactory, get_default_background_tasks(), and set_default_background_tasks() (#1319, #1335)
  • Add ProcessGroupStatsMonitor background task for tracking CPU, RSS, and network I/O across process groups (#1336, #1350)
  • Optimize aggregate pipe to reduce context switch overhead via bulk queue draining (#1310)
  • Attach pipeline config to error output when build fails, improving debuggability (#1330)

Bug Fixes

Other Changes

  • Replace DEF_DPtr preprocessor macro with DPtr<T, auto DeleteFunc> C++17 class template in libspdl, eliminating per-file macro invocations (#1345)
  • Misc libspdl fixes and improvements (#1344)
  • Overhaul pipeline node internals: split _Node into specialized _Node, _FanInNode, _FanOutNode dataclasses with explicit input/output queues; replace _PipeConfigBase base class with TypeAlias union for exhaustive type checking; introduce _SourceNode for stricter typing (#1320, #1321, #1325, #1326, #1328, #1329, #1331)

Documentation

  • Add architecture overview of SPDL Pipeline (#1341)
  • Add architecture overview of libspdl (#1338)
  • Add docstrings (#1337)
  • Add LLM fine-tuning example using SPDL data pipeline (#1347)