
Release v0.7.0 — counter-partitioned multi-chip RNG #50

Merged
scttfrdmn merged 1 commit into main from feat/release-0.7.0
Apr 28, 2026

@scttfrdmn
Collaborator

v0.7.0 release PR

Closes #20 for the CPU-testable items. The hardware validation items are captured in neuron-marked tests and are pending a manual trn1.32xlarge run.

Changes

  • pyproject.toml: 0.6.0 → 0.7.0
  • CHANGELOG.md: [0.7.0] - 2026-04-27 entry covering:
    • Generator partition API (partition_rank, partition_size, advance, position, advance_to)
    • Dispatch wiring for uniform/normal/exponential + _into variants
    • ProgramBuilder / GeneratorProgram partition context
    • Test coverage: 43 CPU tests, nki_simulator equivalence suite, neuron scaling tests
    • Architecture notes: counter unit distinction (4-sample CPU blocks vs 512-sample NKI batches), NEFF compile-time neutrality
    • Deferred: gamma/beta/chi_squared/truncated_normal partition wiring; hardware profiler/scaling numbers
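To make the partition API items above concrete, here is a minimal sketch of how position, advance, advance_to, and an element-based advance could behave on a counter-based generator. It is not this library's implementation: the `ToyGenerator` class, the `mix64` helper (a SplitMix64-style mixer standing in for a real counter-based cipher), the ceiling-division rule, and the choice to let `advance_to` move to any non-negative counter are all assumptions made for illustration. Only the method names and the 4-samples-per-counter CPU block width come from the changelog entry.

```python
MASK64 = (1 << 64) - 1
SAMPLES_PER_COUNTER = 4  # CPU block width from the changelog; NKI batches use 512


def mix64(x: int) -> int:
    """SplitMix64-style mixer (hypothetical stand-in for a real counter-based cipher)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


class ToyGenerator:
    """Illustrative counter-based generator; semantics are assumed, not the library's."""

    def __init__(self, seed: int) -> None:
        self._key = mix64(seed)
        self._counter = 0

    def position(self) -> int:
        """Return the current counter, in counter units (not samples)."""
        return self._counter

    def advance(self, n_counters: int) -> None:
        """Skip ahead by a number of counter units without generating samples."""
        self._counter += n_counters

    def advance_to(self, counter: int) -> None:
        """Jump to an absolute counter value."""
        if counter < 0:
            raise ValueError("counter must be non-negative")
        self._counter = counter

    def advance_by_elements(self, n_elements: int) -> None:
        """Skip enough whole counters to cover n_elements samples (ceiling division)."""
        self.advance(-(-n_elements // SAMPLES_PER_COUNTER))

    def block(self) -> list[float]:
        """Produce one block of samples in [0, 1) and consume one counter unit."""
        base = self._counter * SAMPLES_PER_COUNTER
        out = [(mix64(self._key ^ (base + lane)) >> 11) * 2.0**-53
               for lane in range(SAMPLES_PER_COUNTER)]
        self._counter += 1
        return out
```

Because each block depends only on the seed and the counter, skipping ahead costs nothing and a second generator positioned at the same counter reproduces the same block. The distinction between counter units and samples matters when converting element counts: under the assumed rule, 5 elements consume 2 counters on the CPU path.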

What's in this release (PRs merged to main)

| PR | Description |
| --- | --- |
| #47 | Generator partition API (advance, position, advance_to, _chip_counter_offset) |
| #48 | Dispatch wiring: distributions + ProgramBuilder/GeneratorProgram |
| #49 | NKI partition equivalence and hardware scaling tests |

Phase 4 (issue #20): bit-exact reproducibility across chip counts as a
first-class API property. A 1-chip and P-chip run with the same seed produce
the same combined stream byte-for-byte with zero cross-chip coordination.
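The reproducibility property can be illustrated with a small sketch. It assumes a contiguous partitioning scheme in which chip `rank` of `size` starts at counter `rank * per_chip`; the real `_chip_counter_offset` may compute its offset differently (for example, interleaved rather than contiguous). The functions `mix64`, `sample`, `stream`, and `partitioned_stream` are hypothetical names, and `mix64` is a SplitMix64-style mixer standing in for a real counter-based cipher.

```python
MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """SplitMix64-style mixer (hypothetical stand-in for a real counter-based cipher)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def sample(seed: int, counter: int) -> int:
    """One 64-bit output that depends only on (seed, counter)."""
    return mix64(mix64(seed) ^ counter)


def stream(seed: int, start: int, count: int) -> list[int]:
    """Outputs for counters start, start + 1, ..., start + count - 1."""
    return [sample(seed, start + i) for i in range(count)]


def partitioned_stream(seed: int, total: int, size: int) -> list[int]:
    """Concatenate per-chip streams; each chip computes its own offset locally."""
    if total % size != 0:
        raise ValueError("this sketch assumes total is divisible by size")
    per_chip = total // size
    combined: list[int] = []
    for rank in range(size):
        offset = rank * per_chip  # assumed contiguous offset rule
        combined.extend(stream(seed, offset, per_chip))
    return combined


single = stream(42, 0, 1024)
for chips in (1, 2, 4, 8):
    assert partitioned_stream(42, 1024, chips) == single
```

Each chip needs only the seed, its rank, and the partition size to find its starting counter, which is why no cross-chip communication is required in this scheme.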

- Generator partition API: partition_rank/size, advance, position,
  advance_to, _chip_counter_offset, _advance_by_elements (#47)
- Dispatch wiring: uniform/normal/exponential + _into variants route
  counter_offset through Generator partition state; ProgramBuilder and
  GeneratorProgram apply per-step partition offsets (#48)
- Simulator + hardware tests: 43 CPU tests, nki_simulator partition
  equivalence suite, neuron scaling/profiler tests (#49)

Deferred: gamma/beta/chi_squared/truncated_normal partition wiring
(data-dependent batch counts); hardware validation run pending.
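
One common way batch counts become data-dependent is rejection sampling, where the number of uniform draws consumed depends on the values drawn. Whether the deferred distributions in this library use rejection sampling is an assumption here, not something the PR states. The sketch below uses a toy rejection sampler and hypothetical helpers (`mix64`, `uniform`, `draws_needed`) to show that the counter position after producing a fixed number of samples varies with the seed, so a downstream chip cannot compute its starting offset from rank and size alone.

```python
MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """SplitMix64-style mixer (hypothetical stand-in for a real counter-based cipher)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def uniform(seed: int, counter: int) -> float:
    """Uniform value in [0, 1) that depends only on (seed, counter)."""
    return (mix64(mix64(seed) ^ counter) >> 11) * 2.0**-53


def draws_needed(seed: int, n_samples: int) -> int:
    """Count the uniform draws a toy rejection sampler consumes for n_samples accepts.

    Each trial draws a pair (x, y) and accepts x when y < x, which samples a
    triangular density on [0, 1). Roughly half of the trials are rejected.
    """
    counter = 0
    accepted = 0
    while accepted < n_samples:
        x = uniform(seed, counter)
        y = uniform(seed, counter + 1)
        counter += 2
        if y < x:
            accepted += 1
    return counter


counts = [draws_needed(seed, 100) for seed in range(8)]
print(counts)  # the counter consumed differs from seed to seed
```

For uniform, normal, and exponential outputs with a fixed draws-per-sample ratio, the offset for a given element index is known in advance; here it is only known after the preceding samples have been generated.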
@scttfrdmn merged commit 72164da into main on Apr 28, 2026
3 of 5 checks passed
Development

Successfully merging this pull request may close these issues.

Phase 4 — stream-partitioned RNG across NeuronCores