@lepan-google
Collaborator

Add regression test for Llama3.1 70B GCSFuse NeMo recipe in main branch.

Ran tests locally

  • A3mega Llama3.1 70B GCSFuse NeMo main branch recipe: link

  • A3mega Llama3.1 70B GCSFuse NeMo storage-next branch recipe: link

  • A3mega Llama3.1 70B NeMo main branch recipe (original synthetic data recipe): link

@lepan-google lepan-google requested a review from mkmg May 19, 2025 05:57
cmds = (
"python3 "
f"{recipe_repo_root}/src/utils/checkpointing_metrics"
f"(cd {recipe_repo_root} && git checkout storage-next "
Collaborator

Do we want to pass the branch here so we can use the main branch when possible?

Collaborator Author

@lepan-google lepan-google May 19, 2025

Currently, only the log scraper in the storage-next branch can upload the final results to BQ tables. The log scraper in the main branch only calculates the results and prints them for the user. So for now we have to switch to the storage-next branch for metrics calculation and upload, even when we run the main branch recipe.
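
For reference, one way the branch could be passed in as the reviewer suggests is sketched below. This is only an illustration, not the code in this PR: `build_checkout_cmd` and its `scraper_branch` parameter are hypothetical names, and the default stays at `storage-next` because that is the only branch whose log scraper uploads to BQ today. Only the checkout step is shown, since the rest of the command is not visible in the snippet above.

```python
import shlex


def build_checkout_cmd(recipe_repo_root: str, scraper_branch: str = "storage-next") -> str:
    """Build the shell command that switches the recipe repo to the scraper branch.

    Hypothetical helper: the name and the `scraper_branch` parameter are
    assumptions for illustration, not part of the existing DAG code.
    """
    # Quote both values so unusual paths or branch names cannot break the shell command.
    root = shlex.quote(recipe_repo_root)
    branch = shlex.quote(scraper_branch)
    return f"(cd {root} && git checkout {branch})"


# Default keeps today's behavior; a caller can pass "main" once the
# main-branch log scraper supports uploading results.
print(build_checkout_cmd("/tmp/recipes"))
print(build_checkout_cmd("/tmp/recipes", scraper_branch="main"))
```

With a default like this, existing callers would not need to change, and switching to main later would be a one-argument change.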

3 participants