
[Llama 3.1] Updates MLLOG tags#790

Merged
ShriyaRishab merged 3 commits into mlcommons:master from Elnifio:update-llama31
Apr 8, 2025

Conversation

Elnifio (Contributor) commented Apr 3, 2025

No description provided.

@Elnifio Elnifio requested a review from a team as a code owner April 3, 2025 15:58
github-actions bot commented Apr 3, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

ShriyaRishab previously approved these changes Apr 3, 2025

ShriyaRishab (Contributor):

@Elnifio can you verify that the logs from this reference indeed pass the latest checkers?


    def log_validation_loss(self, metrics, step):
-       consumed_tokens = (step - self.init_global_step) * self.gbs * self.seq_len
+       consumed_tokens = step * self.gbs
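For context, the diff above distinguishes two quantities that are easy to conflate. A minimal sketch of the arithmetic, using the names from the diff (`gbs` for global batch size and `seq_len` for sequence length are assumptions based on the snippet, not verified against the reference code):

```python
# Illustrative only: each optimizer step consumes one global batch of samples,
# and each sample contributes seq_len tokens.

def consumed_samples(step: int, gbs: int) -> int:
    # Samples seen after `step` optimizer steps.
    return step * gbs

def consumed_tokens(step: int, gbs: int, seq_len: int) -> int:
    # Tokens seen after `step` optimizer steps.
    return step * gbs * seq_len

print(consumed_samples(10, 128))       # 1280
print(consumed_tokens(10, 128, 8192))  # 10485760
```

The renamed variable in the diff holds the first quantity, hence the reviewer's point that calling it `consumed_tokens` is misleading.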
Contributor:
shouldn't that be called consumed_samples?

Elnifio (Author):
Updated in the latest commit. Thanks for catching that!

    def on_validation_start(self, trainer, pl_module):
-       mllogger.end(key=constants.BLOCK_STOP, metadata={'epoch_num': self.consumed_tokens(trainer)})
        mllogger.start(key=constants.EVAL_START, metadata={'epoch_num': self.consumed_tokens(trainer)})
+       mllogger.end(key=constants.BLOCK_STOP, metadata={constants.SAMPLES_COUNT: self.consumed_tokens(trainer)})
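To make the tag change easier to see in isolation, here is a self-contained sketch of the pattern. It uses a stub logger rather than mlperf_logging's real `mllog` object, and the constant names mirror the diff but are illustrative, not the library's actual definitions:

```python
# Stub logger that records (event, key, metadata) tuples, standing in for
# the MLLOG logger used in the reference. Names below are assumptions.

SAMPLES_COUNT = "samples_count"  # assumed constant value, per the diff
BLOCK_STOP = "block_stop"
EVAL_START = "eval_start"

class StubLogger:
    def __init__(self):
        self.events = []

    def start(self, key, metadata):
        self.events.append(("start", key, metadata))

    def end(self, key, metadata):
        self.events.append(("end", key, metadata))

def on_validation_start(logger, samples):
    # Close the current training block and open an eval block, tagging
    # both with the number of samples consumed so far (not tokens).
    logger.end(key=BLOCK_STOP, metadata={SAMPLES_COUNT: samples})
    logger.start(key=EVAL_START, metadata={SAMPLES_COUNT: samples})

log = StubLogger()
on_validation_start(log, 1280)
print(log.events[0])  # ('end', 'block_stop', {'samples_count': 1280})
```

The substantive change in the diff is only the metadata key: `epoch_num` becomes `constants.SAMPLES_COUNT`, so the checker can match block boundaries to sample counts.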
Contributor:
Not sure where self.consumed_tokens() is defined, but does it return tokens or samples? If tokens, we need to switch it to samples. If samples, we need to rename it.

Elnifio (Author):
Updated in the latest commit.

Elnifio (Contributor, Author) commented Apr 4, 2025

@ShriyaPalsamudram @mmarcinkiewicz the latest run fails the RCP checker with the following issues:

  • common.yaml line 128, missing train_samples: fixed in my latest commit, where I renamed trained_samples to train_samples.
  • closed_llama31_405b.yaml line 35, incorrect opt_learning_rate_decay_schedule value: fixed in my latest commit, where I renamed "cosine with linear warmups" to "cosine with linear warmup".
  • common.yaml lines 97 and 103, missing epoch_num in the epoch_start and epoch_end tags: I'm wondering whether, similar to other tags, we should accept both epoch_num and samples_count in the metadata. What do you think?
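The relaxation proposed in the last bullet could be sketched as a check that accepts either metadata key. This is a hypothetical illustration, not the actual mlperf_logging compliance-checker code; `metadata_ok` and `ACCEPTED_KEYS` are invented names:

```python
# Hypothetical check: an epoch_start/epoch_stop tag passes if its metadata
# carries either 'epoch_num' or 'samples_count'.

ACCEPTED_KEYS = {"epoch_num", "samples_count"}

def metadata_ok(metadata: dict) -> bool:
    # True if at least one accepted progress key is present.
    return any(k in metadata for k in ACCEPTED_KEYS)

assert metadata_ok({"epoch_num": 1})
assert metadata_ok({"samples_count": 1280})
assert not metadata_ok({"step": 5})
```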

Elnifio (Contributor, Author) commented Apr 7, 2025

Further update on the comment above: I have run the RCP checker against the latest compliance checker from PR 414, and it passed without any errors or warnings, so I'd say both PRs are ready to merge.

@ShriyaRishab ShriyaRishab merged commit ece3d15 into mlcommons:master Apr 8, 2025
1 check passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 8, 2025
