[doc] add examples and minor updates #1071

zyaoj · 2025-02-27T17:16:15Z

What does this PR do? Please describe:

Added a load-model example to showcase the model hub
Revised the dataset example to reflect the latest changes
Fixed nits in the tutorials

Does your PR introduce any breaking changes? If yes, please list them:

N/A

Check list:

Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
Did you read the contributor guideline?
Did you make sure that your PR does only one thing instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests?
Did you verify new and existing tests pass locally with your changes?
Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

artemru · 2025-03-20T16:24:51Z

doc/source/notebooks/dataset_gsm8k_sft.ipynb

@@ -34,7 +34,7 @@
    "    load_text_tokenizer,\n",
    "    setup_gangs,\n",
    ")\n",
-    "from fairseq2.recipes.config import GangSection\n",
+    "from fairseq2.recipes.config import GangSection, ModelSection\n",


Why define it as

dataset_config.name = "gsm8k_sft"
dataset_config.path = Path("/path/to/gsm8k_data/sft")

and not directly as

dataset_config = InstructionFinetuneDatasetSection(name="gsm8k_sft", path=Path("/path/to/gsm8k_data/sft"))

?

Same question for config = Config()  # instantiate an object

It would also be useful to say something about the expected data format in "/path/to/gsm8k_data/sft" (unless it's explained elsewhere).

artemru · 2025-03-20T16:27:18Z

doc/source/notebooks/dataset_gsm8k_sft.ipynb

+    "config = Config()  # instantiate an object\n",
+    "config.gang = GangSection(tensor_parallel_size=1)\n",
+    "config.dataset = dataset_config\n",
+    "config.model = ModelSection(name=\"llama3_1_8b\")\n",


Suggested change

-    "config = Config()  # instantiate an object\n",
-    "config.gang = GangSection(tensor_parallel_size=1)\n",
-    "config.dataset = dataset_config\n",
-    "config.model = ModelSection(name=\"llama3_1_8b\")\n",
+    "config = Config(gang=GangSection(tensor_parallel_size=1), dataset=dataset_config, model=ModelSection(name=\"llama3_1_8b\"))\n",

Would this work as well?
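
To make the question concrete, here is a minimal sketch of the two construction styles. It uses stand-in dataclasses, not the real fairseq2 classes: `DatasetSection`, `GangSketch`, and `ConfigSketch` are hypothetical names, and whether fairseq2's `Config` and section classes accept these keyword arguments is an assumption that would need checking against the library. For plain dataclasses whose fields all have defaults, the two styles produce equal objects:

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


# Hypothetical stand-ins for illustration only; these are NOT fairseq2 classes.
@dataclass
class DatasetSection:
    name: str = "default"
    path: Optional[Path] = None


@dataclass
class GangSketch:
    tensor_parallel_size: int = 1


@dataclass
class ConfigSketch:
    # default_factory gives each instance its own section objects.
    gang: GangSketch = field(default_factory=GangSketch)
    dataset: DatasetSection = field(default_factory=DatasetSection)


# Style 1: instantiate with defaults, then assign attributes one by one.
dataset_a = DatasetSection()
dataset_a.name = "gsm8k_sft"
dataset_a.path = Path("/path/to/gsm8k_data/sft")

config_a = ConfigSketch()
config_a.gang = GangSketch(tensor_parallel_size=1)
config_a.dataset = dataset_a

# Style 2: pass everything as keyword arguments to the constructor.
dataset_b = DatasetSection(name="gsm8k_sft", path=Path("/path/to/gsm8k_data/sft"))
config_b = ConfigSketch(gang=GangSketch(tensor_parallel_size=1), dataset=dataset_b)

# Dataclasses compare field by field, so both styles yield equal configs.
configs_match = config_a == config_b
```

The keyword form only works when every field is exposed as an `__init__` parameter, which holds for these stand-ins but is not verified here for the classes used in the notebook.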

artemru · 2025-03-20T16:34:44Z