Add huggingface example to hierarchical #3780
base: main
Conversation
Pull Request Overview
This PR adds HuggingFace language model training support to the hierarchical federated learning system and addresses latency issues by optimizing model serialization. The changes include removing inefficient .tolist() conversions that were causing performance bottlenecks in model transmission, adding model size logging for better monitoring, and providing a complete HuggingFace SFT example for hierarchical FL.
- Optimized model serialization by keeping numpy arrays instead of converting to lists
- Added comprehensive HuggingFace SFT example with task processor, job configuration, and data preprocessing utilities
- Enhanced logging with model size information for better monitoring and debugging
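The serialization optimization described above is easy to see in a standalone sketch. This is not NVFlare's actual FOBS serializer; `pickle` is used here only as a stand-in to illustrate why keeping numpy arrays is smaller and faster than converting them to Python lists:

```python
# Illustrative sketch (not NVFlare's FOBS serializer): compare the serialized
# size of a model dict kept as numpy arrays vs. converted to Python lists.
import pickle

import numpy as np

# hypothetical model: two float32 weight tensors
model = {
    "layer1.weight": np.random.rand(256, 256).astype(np.float32),
    "layer2.weight": np.random.rand(256, 10).astype(np.float32),
}

# keep numpy arrays: raw 4-byte float32 buffers are serialized directly
as_numpy = pickle.dumps(model)

# convert to lists (the removed .tolist() path): each element becomes a
# boxed Python float, inflating size and serialization time
as_lists = pickle.dumps({k: v.tolist() for k, v in model.items()})

print(f"numpy arrays: {len(as_numpy) / 1024:.1f} KB")
print(f"python lists: {len(as_lists) / 1024:.1f} KB")
```

The list form is several times larger because every float32 element is re-encoded as an individual Python float object instead of a contiguous binary buffer.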
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| nvflare/edge/assessors/model_update.py | Added model size logging with numpy import |
| nvflare/edge/assessors/buff_model_manager.py | Removed tolist() conversion and added model size logging |
| examples/tutorials/.../hf_sft_model.py (3 files) | Added model_name_or_path attribute to CausalLMModel class |
| examples/advanced/edge/utils/preprocess_dolly.py | New data preprocessing utility for Dolly dataset |
| examples/advanced/edge/jobs/processors/models/hf_sft_model.py | New HuggingFace model wrapper for edge processing |
| examples/advanced/edge/jobs/processors/hf_sft_task_processor.py | New comprehensive task processor for HuggingFace SFT training |
| examples/advanced/edge/jobs/processors/cifar10_pt_task_processor.py | Refactored to move setup logic from training method |
| examples/advanced/edge/jobs/hf_sft_job.py | New job configuration script for HuggingFace SFT federated learning |
| examples/advanced/edge/README.md | Updated documentation with HuggingFace example instructions |
Co-authored-by: Copilot <[email protected]>
chesterxgchen left a comment:
For newly added examples, please follow this convention:
- No need to add a src directory; just keep it flat.
- For the model file, no need to name it something like hf_sft_model.py; you are already inside the example's directory. Just call it model.py.
- The client-side training script should be called client.py.
- The job code should be called job.py (instead of hf_sft_job.py).
- Try to use one folder per job, and do not share code between jobs. Sharing code may look efficient to the example creator, but it is confusing for the user trying to figure out which parts are relevant. The example then does not give the end user the simplest way to understand the concept.

All new examples should ideally have a code structure like this:
client.py
server.py (may be missing for built-in FL algorithms)
model.py
job.py
download_data.py (optional)
prepare_data.py (optional)

Unless we make this consistent, it is very hard to do automated testing, and it is confusing to users as well. We name our files too freely: cifar10_fl_job.py, fl_job.py, cifar10_fedavt_train.py, etc. Let's be consistent.
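The convention above can be sketched as a directory tree (the layout is taken from the comment; `my_example/` is a placeholder name, and `server.py` and the data scripts are optional):

```
my_example/
├── client.py         # client-side training script
├── server.py         # optional: custom server logic (omit for built-in FL algorithms)
├── model.py          # model definition
├── job.py            # job configuration / submission
├── download_data.py  # optional
└── prepare_data.py   # optional
```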
```diff
+ # print model size in MB
+ model_size = sum([v.nbytes for v in new_model.values()]) / (1024 * 1024)
+ self.log_info(fl_ctx, f"new model size: {model_size:.2f} MB")
```

```diff
  # update the current model
- # convert new_model items from numpy arrays to lists for serialization
- new_model = {k: v.tolist() if isinstance(v, np.ndarray) else v for k, v in new_model.items()}
+ # Keep numpy arrays for efficient FOBS serialization through the hierarchy
+ # If converting to list, the model size will be much larger and slower to serialize
+ # Note that for cases where the device expects list (e.g. ExecuTorch simulation),
```
This could break the current ET examples with mobile; maybe we should merge this PR after the release.
Good point! Let's do this PR after the release to avoid breaking anything.
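The truncated comment in the diff above notes that some devices (e.g. an ExecuTorch simulation) still expect plain lists. One way to reconcile that with keeping numpy end-to-end is to convert only at the device boundary. This is a hedged sketch with a hypothetical helper, not NVFlare API:

```python
# Hypothetical helper (illustration only, not NVFlare API): keep numpy arrays
# throughout the hierarchy for efficient serialization, and convert to lists
# only when a device (e.g. an ExecuTorch simulation) requires them.
import numpy as np


def to_device_payload(model: dict, device_expects_lists: bool) -> dict:
    if not device_expects_lists:
        # keep numpy arrays: smallest, fastest serialization path
        return model
    # convert at the boundary only, leaving non-array values untouched
    return {k: v.tolist() if isinstance(v, np.ndarray) else v for k, v in model.items()}
```

This keeps the fast path the default while preserving compatibility with list-expecting devices.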
```python
from transformers import AutoModelForCausalLM


class CausalLMModel(torch.nn.Module):
```
Why is this class under src, and why can't it be named model.py? Similar to other tutorial notebooks: is this because of the video?
It definitely can; I was just avoiding making too many changes in one PR. Shall we do a holistic pass to update all examples to the new convention (after the release)?
Fixes # .
Description
Add an example of HuggingFace language model training to hierarchical FL; addresses the latency issue by removing the tolist() conversion.
Types of changes
./runtest.sh.