Skip to content

Conversation

@deepakn94
Copy link
Contributor

No description provided.

Copy link

@jkamalu jkamalu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming and scope of certain magic numbers and then some nits (which you can choose to ignore)

bin_buffer_file.readinto(sequence)
return sequence

NUM_MAX_RETRIES = 3
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these in all caps? Why don't we make these init args, or at least class args?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, though I didn't thread the two new arguments through all the classes from the DatasetConfig class since I think that's overkill.

@ericharper ericharper added the Final Review Apply this label to indicate that your PR is ready for final review. label Jan 9, 2026
@ericharper ericharper enabled auto-merge January 10, 2026 00:10
@ericharper ericharper added this pull request to the merge queue Jan 10, 2026
Merged via the queue into NVIDIA:main with commit 1f6edf0 Jan 10, 2026
48 of 52 checks passed
maanug-nv pushed a commit to maanug-nv/Megatron-LM that referenced this pull request Jan 10, 2026
…-application fault tolerance (NVIDIA#2836)

Signed-off-by: Deepak Narayanan <[email protected]>
@deepakn94 deepakn94 deleted the dnarayanan/dataloader_retry_loop branch January 10, 2026 06:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

complexity: low Final Review Apply this label to indicate that your PR is ready for final review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants