[HuggingFace] Multi-node client support #3755
ZiyueXu77
left a comment
Looks good! Some minor comments.
/build
- `...ning/part-4_advanced_federated_learning/chapter-8_federated_LLM_training/08.2_llm_sft/job.py` (resolved)
- `...ing/part-4_advanced_federated_learning/chapter-8_federated_LLM_training/08.3_llm_peft/job.py` (outdated, resolved)
- `...dvanced_federated_learning/chapter-8_federated_LLM_training/08.4_llm_quantization/sft_job.py` (resolved)
chesterxgchen
left a comment
Added a few suggestions.
Added a separate PR for the file restructuring in the LLM tutorial: #3850 @chesterxgchen

/build
Greptile Overview

Greptile Summary

This PR enables multi-node distributed training for NVFlare LLM fine-tuning across SLURM clusters. The implementation properly distinguishes between the global `rank` (used for FL operations) and `local_rank` (used to select CUDA devices), and introduces a wrapper script pattern to coordinate …

Critical Issues Found:

Other Issues:

Architecture is Sound:

Confidence Score: 0/5

Important Files Changed

File Analysis
Sequence Diagram

sequenceDiagram
participant SLURM as SLURM Master Node
participant Server as NVFlare Server
participant Client as NVFlare Client (Rank 0)
participant Wrapper as client_wrapper.sh
participant Node0 as Node 0 (Ranks 0-7)
participant Node1 as Node 1 (Ranks 8-15)
SLURM->>Server: Start NVFlare server
SLURM->>Client: Start NVFlare client
SLURM->>Client: Submit FL job via job.py
loop Each FL Round
Client->>Wrapper: Execute: bash client_wrapper.sh client.py
Wrapper->>Wrapper: Detect multi-node setup (NNODES=2)
Wrapper->>Node0: srun launches torchrun --node_rank=0
Wrapper->>Node1: srun launches torchrun --node_rank=1
Node0->>Node0: Spawn 8 processes (ranks 0-7)
Node1->>Node1: Spawn 8 processes (ranks 8-15)
Note over Node0,Node1: Only Rank 0 calls flare.receive()
Server->>Client: Send global model
Client->>Node0: Rank 0 receives model
Node0->>Node0: Rank 0 broadcasts model to ranks 1-7
Node0->>Node1: Rank 0 broadcasts model to ranks 8-15
Note over Node0,Node1: All 16 ranks train via PyTorch DDP
Node0->>Node0: Training with NCCL P2P/CUMEM
Node1->>Node1: Training with NCCL P2P/CUMEM
Node0->>Node1: Cross-node sync via InfiniBand RDMA
Note over Node0,Node1: Only Rank 0 calls flare.send()
Node0->>Client: Rank 0 sends trained model
Client->>Server: Submit model updates
Note over Node0,Node1: CRITICAL BUG: Non-rank-0 processes<br/>don't know when to exit loop<br/>(flare.is_running() not broadcast)
end
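The wrapper step in the diagram (`client_wrapper.sh` detecting a multi-node allocation and having `srun` start one `torchrun` per node, each with its own `--node_rank`) can be sketched in Python as below. This is an illustrative sketch, not the PR's actual `client_wrapper.sh`: the helper name `build_torchrun_cmd` is hypothetical, it assumes the node count and node index come from SLURM's `SLURM_NNODES` and `SLURM_NODEID` variables, and the master address is simply passed in (commonly the first host of the allocation). Port 29500 is torchrun's usual default.

```python
import os
import shlex
from typing import List, Mapping, Optional


def build_torchrun_cmd(
    script: str,
    gpus_per_node: int,
    master_addr: str,
    master_port: int = 29500,
    env: Optional[Mapping[str, str]] = None,
    script_args: Optional[List[str]] = None,
) -> List[str]:
    """Build the per-node torchrun command a wrapper script might launch.

    Hypothetical helper: node count and node index are read from the SLURM
    variables SLURM_NNODES and SLURM_NODEID, defaulting to a single node.
    """
    env = os.environ if env is None else env
    nnodes = int(env.get("SLURM_NNODES", "1"))
    node_rank = int(env.get("SLURM_NODEID", "0"))
    cmd = [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={gpus_per_node}",
        f"--node_rank={node_rank}",  # differs per node, e.g. 0 and 1
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        script,
    ]
    return cmd + list(script_args or [])


if __name__ == "__main__":
    # Node 1 of a 2-node, 8-GPU-per-node allocation (illustrative values).
    fake_env = {"SLURM_NNODES": "2", "SLURM_NODEID": "1"}
    print(shlex.join(build_torchrun_cmd("client.py", 8, "node0", env=fake_env)))
```

Every node receives the same `--nnodes`, `--master_addr` and `--master_port`; only `--node_rank` changes, which is what lets the 2 × 8 processes join one process group with global ranks 0-15.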
Additional Comments (1)
`examples/advanced/llm_hf/client.py`, lines 164-171

style: dataset info printing and the `logging_steps` calculation only happen on `local_rank == 0`, but should use `rank == 0` for multi-node consistency (otherwise both nodes print).
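Why `local_rank == 0` prints once per node while `rank == 0` prints once per job can be shown with a small sketch. It assumes the `RANK`, `LOCAL_RANK` and `WORLD_SIZE` environment variables that torchrun exports; `DistInfo`, `read_dist_info` and `should_log` are hypothetical names, not code from `client.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistInfo:
    rank: int        # global rank across all nodes: use for FL calls and logging
    local_rank: int  # rank within this node: use only to pick the CUDA device
    world_size: int


def read_dist_info(env: Optional[Mapping[str, str]] = None) -> DistInfo:
    """Read the RANK / LOCAL_RANK / WORLD_SIZE variables set by torchrun."""
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )


def should_log(info: DistInfo) -> bool:
    # Gate on the *global* rank so exactly one process in the whole job prints.
    return info.rank == 0


if __name__ == "__main__":
    # Simulate 2 nodes x 8 GPUs: global rank = node * 8 + local_rank.
    procs = [DistInfo(n * 8 + lr, lr, 16) for n in range(2) for lr in range(8)]
    print("local_rank == 0 printers:", sum(p.local_rank == 0 for p in procs))  # 2
    print("rank == 0 printers:", sum(should_log(p) for p in procs))  # 1
```

In a real training script the device would still come from `local_rank` (for example via `torch.cuda.set_device(info.local_rank)`), while anything done once per job, such as printing dataset info or calling `flare.receive()`/`flare.send()`, is gated on `rank`.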
10 files reviewed, 4 comments
YuanTingHsieh
left a comment
Good solution and reference for users; added some suggestions.
10 files reviewed, 2 comments
Additional Comments (1)
`examples/advanced/llm_hf/client.py`, lines 257-374

logic: critical deadlock: non-rank-0 processes hang forever when the FL session ends.

`flare.is_running()` only returns meaningful values on rank 0 (since `flare.init(rank=rank)` makes it a no-op on other ranks). When the FL server signals completion, only rank 0 exits the loop; all other processes keep waiting indefinitely at the `flare.receive()` on line 260. The loop-continuation signal needs to be broadcast before checking `flare.is_running()`:

```python
import nvflare.client as flare
import torch.distributed as dist

# `rank` is this process's global rank, the same value passed to flare.init(rank=rank)
while True:
    # Broadcast whether to continue (must happen before any rank-0-only operations)
    if rank == 0:
        should_continue = flare.is_running()
    else:
        should_continue = None
    if dist.is_initialized():
        continue_obj = [should_continue]
        dist.broadcast_object_list(continue_obj, src=0)
        should_continue = continue_obj[0]
    if not should_continue:
        break
    # rest of existing loop code...
    if rank == 0:
        input_model = flare.receive(timeout=600)
    # ... etc
```
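The suggested fix can be checked without GPUs or NVFlare using a thread-based simulation. This is a sketch under stated assumptions: threads play the ranks, the hypothetical `ThreadBroadcaster` stands in for `torch.distributed.broadcast_object_list`, and a simple round counter stands in for `flare.is_running()`. The property it demonstrates is that every rank takes part in every broadcast and therefore leaves the loop on the same iteration.

```python
import threading
from typing import List


class ThreadBroadcaster:
    """Hypothetical stand-in for dist.broadcast_object_list (threads act as ranks)."""

    def __init__(self, world_size: int):
        self._barrier = threading.Barrier(world_size)
        self._slot: List[object] = []

    def broadcast_object_list(self, objs: List[object], src: int, rank: int) -> None:
        if rank == src:
            self._slot = list(objs)
        self._barrier.wait()  # the source rank has published its objects
        objs[:] = self._slot
        self._barrier.wait()  # every rank has read before the next broadcast


def run_rank(rank: int, bc: ThreadBroadcaster, fl_rounds: int, done: List[int]) -> None:
    remaining = fl_rounds  # stand-in for flare.is_running(); only rank 0 consults it
    completed = 0
    while True:
        # Rank 0 decides; every rank receives the same decision.
        should_continue = remaining > 0 if rank == 0 else None
        decision = [should_continue]
        bc.broadcast_object_list(decision, src=0, rank=rank)
        if not decision[0]:
            break
        if rank == 0:
            remaining -= 1  # stand-in for flare.receive() / train / flare.send()
        completed += 1
    done[rank] = completed


def simulate(world_size: int, fl_rounds: int) -> List[int]:
    bc = ThreadBroadcaster(world_size)
    done = [-1] * world_size  # -1 marks a rank that never left the loop
    threads = [
        threading.Thread(target=run_rank, args=(r, bc, fl_rounds, done), daemon=True)
        for r in range(world_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    return done
```

`simulate(4, 3)` should report three completed rounds for every rank. Without the broadcast (each rank checking its own `is_running()` stand-in), ranks 1-3 would have no signal to stop, which is exactly the hang described above.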
10 files reviewed, 2 comments
try srun job on client use bash running script running but hanging successful multinode training consolidate documentation increase flare.init timeout add wandb use singleton job update doc update docs add multinode readme rename files enable multi-gpu use fallback tenborsboard logging fix simulator run Signed-off-by: Holger Roth <[email protected]> restore client name based on user id Fix federated Stats Advanced folders (#3753) 1) Clean up Advanced Federated-statistics to streamline the folder structure 2) df_stats won't repeat the hello-tabulare-stats, instead, focusing on more the implementation and configuration options 3) image_stats, new download data scripts 4) for both replace the implementation with recipes. Also add the adult.json and image_statistics.json to the demo folders so that the visualization notebook won't fail the unit tests <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. add wandb license formatting; print client name hello-pt: restore requirements installation and handle Colab (#3760) Fixes # . A few sentences describing the changes proposed in this pull request. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. fix code that overwrite the original notebook. 
Fix Colab issue for hello-pt (#3761) Fixes # . 1 )The conftest.py overwrite the original notebook, but in the process, skip a some cell as well. fix that 2) the hello-pt support on colab was broken A few sentences describing the changes proposed in this pull request. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Add colab support 3 (#3762) Fixes # . A few sentences describing the changes proposed in this pull request. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Update the swarm learning example. (#3759) Update the swarm learning example under exaemples/advanced/swarm_learning. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [x] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Co-authored-by: Chester Chen <[email protected]> Reset task_data in client after executor is run. 
(#3763) This fixes https://nvbugspro.nvidia.com/bug/5570625 The client relies on task_data in executor. This change clears the task_data from FLContext after the executor is run. The default file streaming size is changed to 0 so it has the same behavior as 2.6. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. --------- Co-authored-by: Chester Chen <[email protected]> Fix SNPAuthorizer issue (#3764) The AMD KDS has a rate limitation for the "fetch" endpoint. So we need to cache the ARK/ASK AND the VCEK as well. We also added exponential backoff to avoid hitting this rate limit. - Added a mechanism to cache the AMD VCEK based on the Chip ID and Reported TCB info - Added exponential backoff <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. fix image_stats.ipynb (#3768) Fixes # . A few sentences describing the changes proposed in this pull request. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. 
- [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Fixed XGB Plugin Compiling Errors (#3767) g++ 12 or newer cleaned up headers and the XGB plugin doesn't compile anymore. Added <cstdint> headers. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Co-authored-by: Chester Chen <[email protected]> Updates on llm and xgb examples (#3765) Fixes # . Updates for new transformer / peft / trl versions, some changed arg names, etc. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Co-authored-by: Chester Chen <[email protected]> Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]> fix notebooks in advanced directory 4 (#3769) Fixes # . A few sentences describing the changes proposed in this pull request. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. 
Update CC docs [skip ci] (#3756) Update CC docs Update CC docs <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. --------- Co-authored-by: Chester Chen <[email protected]> FedStats: Improve error messages (#3770) Fixes # . Adding some warnings that make it easier to debug in case stats are missing or have mismatching names across clients. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Fix Conftest.py (#3771) Make a mistake on last commit with conftest.py I assume the notebook only read from disk once. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. 
Remove production board text [skip ci] (#3772) Remove production board text Remove production board text <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Add missing init in tf recipes (#3773) Add missing init in tf recipes Add missing init in tf recipes <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Co-authored-by: Chester Chen <[email protected]> Hello-word documentation update [skip ci] (#3778) Read Me and Hello-world rst update <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Fix colab commands in hello-pt (#3781) Fixes # . Adjust the order of commands so the notebook can run directly on Colab. Fix download command to not require user prompt. 
<!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Lower the log severity on timeout. No change in logic. (#3782) When timeout happens, the log message is recorded at `ERROR` level. However, later logic may recover from timeout. Therefore, this log should be in `WARNING.` <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [x] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Misc Notebook updates (#3777) 1. Add kaggle download library support 2. make sure the xgboost can run the linked notebooks in the one notebook 3 Tutorials 3.1 fix Job Recipe Notebook POCEnv ==> PocEnv 3.2 skip ProdEnv execution 3.3 delete unit test file for notebooks folders 3.4 fix a bug in security notebook tutorial (file path) change the job CLI to FLARE API 3.5 add clean up section for keyCloak example ( stop docker, clean up POC) 3.6 misc changes to default value on the notebook A few sentences describing the changes proposed in this pull request. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. 
- [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Force new token generation when check a new job (#3787) Fixes FLARE-2671. This change ensures a new token is always generated when a new job arrives. Previously, if a job completed quickly and the existing token hadn’t yet expired (token expiration is set to 100 seconds), the same token could be reused for the next job. This would cause the nonce check to fail, as the nonce had already been seen. By forcing token regeneration for each new job, we avoid reuse and ensure the nonce remains unique. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Make admin server timeout configurable (#3786) Fixes #3730 . Make admin server timeout configurable VIA "admin_timeout" in "local/resources.json" inside a startup kit, for example: ``` $ cat ./workspace/example_project/prod_00/server1/local/resources.json.default { "format_version": 2, "servers": [ { "admin_storage": "transfer", "max_num_clients": 100, "heart_beat_timeout": 600, "download_job_url": "http://download.server.com/", "admin_timeout": 10.0 } ], ``` <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. 
--------- Co-authored-by: Chester Chen <[email protected]> Do not generate start_all.sh and use async grpc by default (#3783) Fixes https://jirasw.nvidia.com/browse/FLARE-2677 Made 2 changes to provisioning, 1. By default, it will not generate start_all.sh. Use -s option to generate it. 2. Use synch grpc driver for both client and server by default. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. [BioNeMo] Use decomposer register widget (#3784) update other bionemo examples Fixes # . Update bionemo examples and tutorial for 2.7 with decomposer register widget <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Signed-off-by: Holger Roth <[email protected]> Co-authored-by: Chester Chen <[email protected]> Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]> Include a section describing how to build KBS docker images [skip ci] (#3785) The build process for KBS requires a few steps and it's error prone. This PR adds a section of that deployment guide, which describes how to build KBS docker images directly. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). 
- [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [x] Documentation updated. Co-authored-by: Chester Chen <[email protected]> Fixed a certificate issue with newer OpenSSL (#3775) Newer OpenSSL (from Ubuntu 25.04) doesn't accept the certs generated by provision. This PR fixed the problem by adding Authority Key ID and Key Usage to the cert extensions. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. --------- Co-authored-by: Chester Chen <[email protected]> Co-authored-by: Isaac Yang <[email protected]> Add diagrams and docs improvements [skip ci] (#3788) Add diagrams and docs improvements <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Fix broken links (#3789) Fix broken links Fix broken links <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. 
- [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Co-authored-by: Chester Chen <[email protected]> Documentation structure and what is new updates [skip ci] (#3779) 1. Clean up the documents, fix missing reference 2. Update whats new to make sure it reflect key points 3. Update best practice to indicate 'best" is for lower-level API 4. remove duplicate files 5. update programming guides <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Fix image_stats integration test (#3790) Fix CI after changes in #3753 Add the old job into CI for testing. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Add System architecture and Security Architecture documentations [skip ci] (#3794) 1. Add system architecture 2. Add cellnet architecture 3. Add Security Architecture <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. 
- [ ] In-line docstrings updated. - [ ] Documentation updated. Updated CC related User Guide [skip ci] (#3795) Updated CC related user guide to match the current system. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Co-authored-by: Chester Chen <[email protected]> Add allow_out_ports (#3796) Follow changes in the CVM builder Follow changes in the CVM builder <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Architecture documentation fix [skip ci] (#3797) Fixes # . fix some mistakes <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. [HE Tutorial] add missing codes (#3793) Fixes # . Upgrade nvflare version and add missing codes. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). 
- [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. --------- Signed-off-by: Holger Roth <[email protected]> fix requirements bug (#3798) * FLARE-2676 <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Remove HA from docs (#3799) Remove references to HA from docs since it has been removed already. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Remove duplicate toctree (#3800) Remove duplicate toctree Remove duplicate toctree <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. 
--------- Co-authored-by: Copilot <[email protected]> Co-authored-by: Chester Chen <[email protected]> UPDATE Confidential computing documentation [skip ci] (#3801) <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. update CC documentation (#3802) Fixes # . A few sentences describing the changes proposed in this pull request. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Documents fixes [skip ci] (#3803) Fixes # . A few sentences describing the changes proposed in this pull request. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. Update documentation (#3804) Fixes # . A few sentences describing the changes proposed in this pull request. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). 
Add Deployment Guide for SecureAI reference (#3805)

- [x] Non-breaking change (fix or new feature that would not break existing functionality).

Tweak the documentation (#3809)

- [x] Non-breaking change (fix or new feature that would not break existing functionality).

Fix webpage links (#3812)

Fix broken links and other issues on the web page. Fix website link consistency: replace hardcoded documentation paths with version-aware template literals.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
Bump vite from 6.3.6 to 6.4.1 in /web (#3807)

Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 6.3.6 to 6.4.1.

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Update Hello-PT and Stats (#3810)

Fix a few things in the hello-pt example and notebook. Improve the TensorBoard logs by logging the loss at the end of each epoch.
Remove the duplicated cell in df_stats and install the quantile requirement.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).

---------

Signed-off-by: Holger Roth <[email protected]>

Bump astro from 5.13.2 to 5.14.4 in /web (#3776)

Bumps [astro](https://github.com/withastro/astro/tree/HEAD/packages/astro) from 5.13.2 to 5.14.4.

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Use Python 3.9 typing (#3611)

Use Python 3.9 typing syntax. Most of the changes implement [PEP 585](https://peps.python.org/pep-0585/), that is, replacing `Dict` with `dict`, `List` with `list`, and `Tuple` with `tuple`.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).

Signed-off-by: cyy <[email protected]>
Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Fix docs and add missing diagram (#3815)

Fix docs, add the missing diagram, and polish the hello-flower example.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
Simplify quantization code (#3612)

Refactor the quantization code in preparation for a new quantization scheme. The use of `QuantState.from_dict` and `QuantState.as_dict` assumes the latest bitsandbytes version.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).

---------

Signed-off-by: cyy <[email protected]>
Co-authored-by: Chester Chen <[email protected]>
Co-authored-by: Ziyue Xu <[email protected]>

Adjust supported minimum Python versions to 3.9 (#3665)

This PR lifts the minimum supported Python version to 3.9.

- [x] Breaking change (fix or new feature that would cause existing functionality to change).
- [x] Documentation updated.

Signed-off-by: Yuanyuan Chen <[email protected]>

Enhance MLflow receiver (#3657)

Previously, if users did not explicitly specify a `tracking_uri`, MLflow defaulted to `./mlruns`, a local directory on the FL server. This made it difficult for users to access logged metrics and artifacts, because the default path was not exposed or retrievable outside the server environment.

This PR sets a more accessible default `tracking_uri` when none is provided by the user. Specifically, the default is now:

```
file://[workspace]/[job_id]/mlflow
```

This lets users retrieve the logged metrics and artifacts through the FlareAPI, since they are stored in a job-specific, accessible path within the workspace.

- This PR depends on #3655.
- Ensures MLflow logs are stored in a consistent, retrievable location tied to the job and workspace.
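To make the new default concrete, here is a minimal Python sketch of the fallback logic. `resolve_tracking_uri` and its parameters are hypothetical and do not reflect the receiver's real signature; the sketch only assumes that the workspace root and job ID are known when the receiver is set up.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_tracking_uri(workspace: str, job_id: str, tracking_uri: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the new default.

    An explicitly configured tracking URI is used as-is. Otherwise, instead of
    MLflow's ./mlruns fallback, logs go to <workspace>/<job_id>/mlflow as a
    file:// URI, so they stay with the job and can be retrieved later.
    """
    if tracking_uri:
        return tracking_uri
    # abspath (not resolve) keeps the path as given without following symlinks.
    mlflow_dir = Path(os.path.abspath(workspace)) / job_id / "mlflow"
    # as_uri() builds a proper file:// URI (percent-encoding any special characters).
    return mlflow_dir.as_uri()
```

On a POSIX host, `resolve_tracking_uri("/tmp/ws", "job_123")` yields `file:///tmp/ws/job_123/mlflow`, while a user-supplied URI such as an MLflow server address is passed through unchanged.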
- [x] Non-breaking change (fix or new feature that would not break existing functionality).

Fix edge simulator (#3813)

When the task response is "RETRY", the simulator should keep retrying instead of returning and shutting down.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).

---------

Co-authored-by: Copilot <[email protected]>

Deployment guide of confidential ACI [skip ci] (#3816)

Detailed steps for deploying confidential ACI.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [x] Documentation updated.

Co-authored-by: Chester Chen <[email protected]>

Add Azure CVM deployment guide [skip ci] (#3820)

Add a document on deploying an Azure CVM and performing attestation.
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [x] Documentation updated.

Update release notes (#3818)

Fix a syntax error in ACI doc [skip ci] (#3823)

…nt [skip ci] Fix a minor document formatting issue.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [x] Documentation updated.

Update sub_start.sh with one additional option (#3821)

Add one more option, `--once`, to sub_start.sh. When sub_start.sh is invoked as `sub_start.sh --once`, it starts NVFlare directly. If startup fails, for example because of missing dependencies, sub_start.sh exits with the exit code returned by Python.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [x] Quick tests passed locally by running `./runtest.sh`.
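As a rough illustration of the `--once` semantics described above, the Python sketch below runs a Python start command exactly once and hands back its exit code unchanged, with no retry. It is not the actual shell logic of `sub_start.sh`; `start_once` is a hypothetical name, and the arguments passed in are assumed to be whatever launches NVFlare.

```python
import subprocess
import sys


def start_once(args: list[str]) -> int:
    """Hypothetical sketch of `--once`: run `python <args>` a single time.

    No restart or retry is attempted; the child's exit code (for example, a
    failure caused by a missing dependency) is returned so the caller can
    exit with it, e.g. via sys.exit(start_once(args)).
    """
    completed = subprocess.run([sys.executable, *args], check=False)
    return completed.returncode
```

For example, `start_once(["-c", "import sys; sys.exit(3)"])` returns `3`, which the wrapper would then propagate as its own exit status.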
Update cc docs (#3824)

Fixes FLARE-2689. Update CC docs.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).

Add FedNCA publication (#3806)

Adds our publication to the list of publications and talks.

- [x] Documentation updated.

---------

Co-authored-by: Holger Roth <[email protected]>

CC document updates and others (#3826)

CC document updates, CC provisioning tool updates, and updates to the authorizers for Azure.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [x] Quick tests passed locally by running `./runtest.sh`.

Add Tensor Stream component for efficient safetensors-based model tensor streaming (#3741)

Add GPU CC docs (#3825)

Add GPU CC docs.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
---------

Co-authored-by: Copilot <[email protected]>
Co-authored-by: Zhihong Zhang <[email protected]>

Update FAQ (#3832)

Update the FAQ with more up-to-date BYOC information.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).

---------

Co-authored-by: Holger Roth <[email protected]>
Co-authored-by: Copilot <[email protected]>

Android app enhancements for state management and status on UI (#3819)

Add missing statuses, fix state handling, and improve the display of status on the UI with training progress.

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
--------- Co-authored-by: Chester Chen <[email protected]> address comments address comments run on slurm address comments formatting New CC token verification mechanism (#3829) This PR introduces a new confidential computing token verification mechanism to replace the previous job-based verification approach. Previously, the verification mechanism was tied to specific jobs, which required generating a new set of tokens for each new job. This approach was inefficient and error-prone. The new mechanism provides a persistent, cross-site token validation system that ensures secure and consistent communication between components. 1. Client Registration When a client sends a registration request to the server: - The client includes its token in the request. - The server validates the client’s token. - The server responds with its own token. - The client validates the server’s token. 2. Periodic Cross-Site Validation Each site (server or client) periodically triggers a cross-site token validation event (e.g., every 5–10 minutes): - The initiating site (e.g., siteA) starts the validation event. - All sites, including siteA, generate new tokens for this event. - siteA validates tokens from all participating sites. 3. Failure Handling If any token validation fails: The affected site will shut itself down. Optionally, it may attempt to trigger a system-wide shutdown to prevent inconsistent states. 4. Benefits - Removes dependency on per-job token generation. - Enables periodic, automated validation to detect and isolate compromised sites. <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line docstrings updated. - [ ] Documentation updated. 
Add ADAQUANT quantization scheme (#3628) This PR adds a new quantization scheme: ADAQUANT, as introduced in the paper [Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning](https://www.ijcai.org/proceedings/2023/0394.pdf). ADAQUANT converts float tensors into integer tensors. Combined with an additional compression process to pack low-bit integers, it can reach near 10X quantisation rate, as indicated in the following test results: ``` 2025-08-24 11:10:17,096 - INFO - Quantized 147/147 params. Before quantization: 5716.26 MB. After quantization: 0.00 MB with meta: 602.34 MB. 2025-08-24 11:12:25,513 - INFO - Dequantized 147/147 params. Before dequantization: 5716.26 MB with meta: 602.34 MB. After dequantization: 5716.26 MB. ``` These results were reported by running according under `NVFlare/examples/advanced/llm_hf` with the command ``` python3 llm_hf_fl_job.py --client_ids dolly --data_path ${PWD}/dataset --workspace_dir ${PWD}/workspace/hf_sft_nf4 --job_dir ${PWD}/workspace/jobs/hf_sft_nf4 --train_mode SFT --quantize_mode adaquant ``` <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). - [ ] New tests added to cover the changes. - [X] Quick tests passed locally by running `./runtest.sh`. - [X] In-line docstrings updated. - [ ] Documentation updated. --------- Signed-off-by: cyy <[email protected]> Signed-off-by: Yuanyuan Chen <[email protected]> Add CC GPU notes (#3840) Fixes FLARE-2688. Add GPU passthrough CVM instructions <!--- Put an `x` in all the boxes that apply, and remove the not applicable items --> - [x] Non-breaking change (fix or new feature that would not break existing functionality). - [ ] Breaking change (fix or new feature that would cause existing functionality to change). 
- [ ] New tests added to cover the changes. - [ ] Quick tests passed locally by running `./runtest.sh`. - [ ] In-line do…
aa2a667 to
3504cd9
Compare
10 files reviewed, 3 comments
/build
10 files reviewed, 1 comment
Additional Comments (1)
examples/advanced/llm_hf/client.py, lines 143-148
logic: use `rank == 0` (global rank) instead of a `local_rank == 0` check. In a multi-node setup, each node has a process with `local_rank == 0`, so both nodes would write at once, causing a race condition on the shared filesystem.
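A minimal sketch of the distinction the comment describes, assuming a torchrun-style launcher that exports `RANK` and `LOCAL_RANK` as environment variables. The helper names `get_ranks` and `should_write_shared_files` are hypothetical and not taken from `client.py`: the global rank gates writes to shared storage, while the local rank would only select the CUDA device on each node.

```python
import os


def get_ranks():
    """Return (global_rank, local_rank) from torchrun-style env variables.

    Assumption: RANK and LOCAL_RANK are set by the launcher; both default
    to 0 so a plain single-process run still works.
    """
    rank = int(os.environ.get("RANK", "0"))
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    return rank, local_rank


def should_write_shared_files(rank: int) -> bool:
    # Only global rank 0 writes to the shared filesystem. Gating on
    # local_rank == 0 instead would let one process per node write,
    # which races when all nodes share the same output directory.
    return rank == 0


if __name__ == "__main__":
    rank, local_rank = get_ranks()
    # local_rank would pick the GPU on this node, e.g. cuda:{local_rank}.
    print(f"rank={rank} local_rank={local_rank} "
          f"writes_shared={should_write_shared_files(rank)}")
```

On a 2-node job, the process with `RANK=8, LOCAL_RANK=0` would skip the write under this check, whereas a `local_rank == 0` check would let it write alongside rank 0.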
10 files reviewed, 4 comments
/build
YuanTingHsieh
left a comment
LGTM!
Fixes # .
Description
Multi-Node Distributed Training Support for NVFlare LLM Fine-tuning
This PR enables multi-node distributed training with NVFlare for LLM fine-tuning across SLURM clusters.
Key Changes:
Impact:
Tested on: 2 nodes × 8 GPUs (16 total GPUs) with SLURM + InfiniBand
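To make the tested 2 × 8 topology concrete, the sketch below enumerates how global ranks map onto nodes, assuming the contiguous layout torchrun uses when every node runs the same number of processes (node `n` owns ranks `n * gpus_per_node` through `(n + 1) * gpus_per_node - 1`). `rank_layout` is a hypothetical helper for illustration only and is not part of this PR.

```python
def rank_layout(num_nodes: int, gpus_per_node: int) -> dict:
    """Map each (node_rank, local_rank) pair to its global rank.

    Assumes a contiguous block of global ranks per node, which is the
    usual torchrun layout when all nodes use the same nproc_per_node.
    """
    layout = {}
    for node_rank in range(num_nodes):
        for local_rank in range(gpus_per_node):
            layout[(node_rank, local_rank)] = node_rank * gpus_per_node + local_rank
    return layout


layout = rank_layout(num_nodes=2, gpus_per_node=8)
world_size = len(layout)

# Processes with local_rank == 0: one per node (global ranks 0 and 8).
local_zero = sorted(g for (node, local), g in layout.items() if local == 0)
print(world_size, local_zero)
```

This shows why a `local_rank == 0` check selects two processes in the tested setup, while `rank == 0` selects exactly one.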