Initial Feature Election algorithm version for Nvidia FLARE #3817
Greptile Overview
Greptile Summary
This PR introduces a comprehensive Feature Election framework for NVIDIA FLARE, enabling federated feature selection on tabular datasets. The implementation adds a new app_opt module with client-side executor, server-side controller, and high-level API components that allow multiple clients to collaboratively identify relevant features without sharing raw data. The framework supports various feature selection methods (Lasso, ElasticNet, mutual information, Random Forest, etc.) and includes both simulation capabilities for development and production deployment through FLARE job configuration. The implementation follows FLARE's established patterns with ScatterAndGather workflow inheritance and proper separation between controller/executor components, making it a plug-and-play addition to the existing ecosystem.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| nvflare/app_opt/feature_election/__init__.py | 4/5 | Module entry point exposing FeatureElection API, controller, executor, and utility functions |
| nvflare/app_opt/feature_election/feature_election.py | 3/5 | Main API class providing data preparation, simulation, FLARE job generation, and result management |
| nvflare/app_opt/feature_election/controller.py | 3/5 | Server-side controller implementing federated feature aggregation with weighted voting strategies |
| nvflare/app_opt/feature_election/executor.py | 2/5 | Client-side executor handling local feature selection with multiple algorithms and PyImpetus integration |
| tests/unit_test/app_opt/feature_election/test_feature_election.py | 4/5 | Comprehensive test suite covering initialization, feature selection methods, and FLARE integration |
| nvflare/app_opt/feature_election/README.md | 4/5 | Detailed documentation with installation, usage examples, API reference, and architecture overview |
| examples/advanced/feature_election/basic_usage.py | 3/5 | Tutorial-style example script demonstrating progressive usage patterns from basic to advanced |
| examples/advanced/feature_election/flare_deployment.py | 3/5 | Production deployment example showing complete FLARE workflow integration |
| examples/advanced/feature_election/requirements.txt | 1/5 | Incorrectly named file containing Markdown installation notes instead of pip requirements format |
| nvflare/app_opt/feature_election/INSTALLATION_NOTES.md | 5/5 | Clear installation guidance for maintainers on integrating dependencies into setup.py |
| examples/advanced/feature_election/INSTALLATION_NOTES.md | 4/5 | Duplicate installation notes for the examples directory with proper dependency structure |
Confidence score: 2/5
- This PR requires careful review due to multiple critical issues including missing imports, undefined variables, and potential circular dependencies
- Score lowered due to hardcoded paths that may not exist, undefined PPIMBC_Model reference in executor.py, scattered imports throughout methods, and basic error handling issues
- Pay close attention to nvflare/app_opt/feature_election/executor.py and nvflare/app_opt/feature_election/feature_election.py which contain the most critical issues
Sequence Diagram
sequenceDiagram
participant User
participant FeatureElection
participant Controller as "FeatureElectionController"
participant Executor as "FeatureElectionExecutor"
participant Client1 as "Client 1"
participant Client2 as "Client 2"
participant ClientN as "Client N"
User->>FeatureElection: "create_flare_job(job_name, output_dir)"
FeatureElection->>FeatureElection: "Generate server & client configs"
FeatureElection->>User: "Return job configuration paths"
User->>Client1: "Load data and set_data(X_train, y_train)"
User->>Client2: "Load data and set_data(X_train, y_train)"
User->>ClientN: "Load data and set_data(X_train, y_train)"
User->>Controller: "Submit FLARE job"
Controller->>Controller: "Initialize with freedom_degree, fs_method"
Controller->>Controller: "Start feature election workflow"
Controller->>Client1: "Send feature_selection task"
Controller->>Client2: "Send feature_selection task"
Controller->>ClientN: "Send feature_selection task"
Client1->>Executor: "execute(task_name='feature_election')"
Client2->>Executor: "execute(task_name='feature_election')"
ClientN->>Executor: "execute(task_name='feature_election')"
Executor->>Executor: "_perform_feature_selection()"
Executor->>Executor: "_evaluate_model() for initial score"
Executor->>Executor: "_evaluate_model() for FS score"
Client1->>Controller: "Return selected_features, feature_scores, num_samples"
Client2->>Controller: "Return selected_features, feature_scores, num_samples"
ClientN->>Controller: "Return selected_features, feature_scores, num_samples"
Controller->>Controller: "_extract_client_data(aggr_result)"
Controller->>Controller: "_aggregate_selections() - apply freedom_degree"
Controller->>Controller: "Generate global_feature_mask"
Controller->>Client1: "Send apply_mask task with global_feature_mask"
Controller->>Client2: "Send apply_mask task with global_feature_mask"
Controller->>ClientN: "Send apply_mask task with global_feature_mask"
Client1->>Executor: "_handle_apply_mask()"
Client2->>Executor: "_handle_apply_mask()"
ClientN->>Executor: "_handle_apply_mask()"
Executor->>Executor: "Apply global mask to local data"
Client1->>Controller: "Acknowledge mask applied"
Client2->>Controller: "Acknowledge mask applied"
ClientN->>Controller: "Acknowledge mask applied"
Controller->>User: "Feature election completed - results available"
User->>Controller: "Retrieve global_feature_mask and statistics"
Controller->>User: "Return election results and selected features"
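The last phase of the diagram, where each client applies `global_feature_mask` to its local data, amounts to column selection. The following is a minimal sketch of that step only; `apply_global_mask` is a hypothetical helper name and not the executor's actual `_handle_apply_mask()` implementation.

```python
import numpy as np


def apply_global_mask(X_local: np.ndarray, global_feature_mask) -> np.ndarray:
    """Keep only the columns elected by the server (hypothetical helper)."""
    mask = np.asarray(global_feature_mask, dtype=bool)
    if mask.shape[0] != X_local.shape[1]:
        raise ValueError("mask length must equal the number of local features")
    return X_local[:, mask]


# Three clients with different sample counts but the same 5 features
# (horizontal partitioning). The data here is synthetic.
rng = np.random.default_rng(0)
client_data = [rng.normal(size=(n, 5)) for n in (20, 35, 50)]
global_feature_mask = np.array([True, False, True, True, False])

reduced = [apply_global_mask(X, global_feature_mask) for X in client_data]
print([X.shape for X in reduced])
```

Every client ends up with the same reduced feature space, so later training rounds exchange smaller models.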
11 files reviewed, 12 comments
tests/unit_test/app_opt/feature_election/test_feature_election.py
Outdated
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The most recent change simplifies the requirements.txt file from a documentation-heavy format to a proper Python requirements specification. The developer replaced the comprehensive installation guide (which contained setup.py examples, integration instructions, and markdown documentation) with a clean two-line requirements file listing only the actual package dependencies: scikit-learn>=1.0.0 and PyImpetus>=0.0.6.
This change aligns with Python packaging conventions where requirements.txt should contain only package specifications that can be installed via pip install -r requirements.txt. The transformation converts what was essentially documentation disguised as a requirements file into its intended purpose - a dependency specification file that integrates properly with Python tooling and NVFlare's packaging system.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/advanced/feature_election/requirements.txt | 4/5 | Converted from documentation format to proper Python requirements specification with two dependencies |
Confidence score: 4/5
- This change is safe to merge with minimal risk as it properly formats a requirements file according to Python conventions
- Score reflects the positive nature of standardizing the file format, though the removed documentation content may need to be preserved elsewhere
- No files require special attention as this is a straightforward formatting improvement
1 file reviewed, no comments
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The current changes in this Pull Request include removing duplicate FLContext import on line 9 in the controller.py file, which addresses the duplicate import issue raised in previous reviews. The Feature Election Controller remains the core component that implements a federated feature selection algorithm within NVIDIA FLARE's architecture. It extends the ScatterAndGather workflow to enable multiple clients to collaboratively identify relevant features through weighted aggregation based on client-provided feature scores and sample sizes. The controller uses a configurable freedom_degree parameter to balance between conservative intersection-based and liberal union-based feature selection approaches. This implementation integrates with FLARE's existing distributed communication infrastructure and maintains compatibility with the framework's component architecture.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| nvflare/app_opt/feature_election/controller.py | 5/5 | Fixed duplicate FLContext import by removing line 9 import while keeping line 11 import |
Confidence score: 5/5
- This PR is safe to merge with minimal risk as it only addresses a minor code cleanup issue
- Score reflects the simple nature of removing a duplicate import statement with no functional changes
- No files require special attention as the change is trivial and improves code quality
1 file reviewed, 1 comment
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The change is a simple documentation formatting fix that adds a missing space in the README.md file for the feature election module. Specifically, it corrects a markdown list item from -**FLASH Repository** to - **FLASH Repository** by adding the required space after the dash character. This addresses a markdown formatting inconsistency identified in the previous review and aligns with standard markdown conventions where list items require a space after the dash. The change is purely cosmetic and focused solely on improving documentation readability without any functional impact.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| nvflare/app_opt/feature_election/README.md | 5/5 | Fixed markdown formatting by adding missing space in list item |
Confidence score: 5/5
- This PR is safe to merge with minimal risk as it only contains a trivial documentation formatting fix
- Score reflects a simple, isolated change with zero functional impact that addresses a previously identified formatting issue
- No files require special attention as this is a straightforward documentation improvement
1 file reviewed, no comments
Removed leftover testing code Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The latest changes address critical import and syntax issues in the feature election executor module. The developer fixed a runtime failure by adding proper error handling for the PyImpetus library import using a try-except block, allowing the module to load gracefully even when this optional dependency is missing. Additionally, a syntax error where PPIMBC_Model was incorrectly referenced has been corrected to PPIMBC, which is the actual class name from the PyImpetus library. These changes improve the robustness of the feature election module by handling optional dependencies properly and fixing incorrect API usage.
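The optional-import pattern described above can be sketched as follows. This is an illustration under assumptions, not the code in executor.py: the flag `PYIMPETUS_AVAILABLE` and the helper `require_pyimpetus` are hypothetical names, and only the class name `PPIMBC` comes from the review text.

```python
# Optional dependency: the module should still import when PyImpetus is absent.
try:
    from PyImpetus import PPIMBC  # class name as stated in the review above
    PYIMPETUS_AVAILABLE = True
except ImportError:
    PPIMBC = None
    PYIMPETUS_AVAILABLE = False


def require_pyimpetus():
    """Hypothetical guard: fail with a clear message only when the method is requested."""
    if not PYIMPETUS_AVAILABLE:
        raise ImportError("PyImpetus is required for this method: pip install PyImpetus")
    return PPIMBC
```

Deferring the failure to the point of use means users of the scikit-learn based methods never need the extra package.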
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| nvflare/app_opt/feature_election/executor.py | 5/5 | Fixed critical import handling for PyImpetus dependency and corrected class name reference |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- Score reflects proper error handling for optional dependencies and correction of a clear syntax error
- No files require special attention
1 file reviewed, no comments
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The recent change is a minor cleanup to the deployment example script examples/advanced/feature_election/flare_deployment.py. The modification removes two decorative print statements that drew borders around the "SERVER SETUP" section and shortens a redundant comment. This change is purely cosmetic: it removes the print("="*70) lines that created visual borders and truncates a comment about minimum hospital participation that was already clear from context. The change improves the cleanliness and professional appearance of the example script without affecting any functionality.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/advanced/feature_election/flare_deployment.py | 5/5 | Removed decorative print statements and shortened redundant comment for cleaner code presentation |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- Score reflects simple cosmetic changes that only affect console output formatting without touching any functional logic
- No files require special attention as this is purely aesthetic cleanup
1 file reviewed, no comments
… Most greptile issues resolved
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, focusing on import cleanup and code organization improvements. The developer addressed several previously identified issues in the Feature Election module by removing unused imports in the test file and improving import organization in the executor. The key changes include removing the unused load_election_results import from the test file and reorganizing sklearn imports in the executor while fixing PyImpetus import handling.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| tests/unit_test/app_opt/feature_election/test_feature_election.py | 4/5 | Cleaned up imports by removing unused load_election_results import |
| nvflare/app_opt/feature_election/executor.py | 3/5 | Improved import organization but introduced duplicate LogisticRegression import |
Confidence score: 3/5
- This PR contains mostly cleanup changes but has one notable issue that should be addressed
- Score lowered due to the duplicate LogisticRegression import in the executor file that could cause import confusion
- Pay close attention to nvflare/app_opt/feature_election/executor.py for the duplicate import issue
2 files reviewed, 1 comment
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The latest change is a focused code optimization in the feature election controller that improves efficiency and readability. Specifically, the modification extracts the calculation of min(n_additional, len(diff_scores)) into a variable k to eliminate redundant computation. This addresses a previous style comment about computing the same value twice. The change refactors lines 313-317 in the controller's weighted feature selection logic, where the partition index for selecting top-scoring features is now computed once and reused. The functionality remains identical - the algorithm still selects the top n_additional features from the difference set based on aggregated scores - but the implementation is cleaner and slightly more performant.
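As a rough illustration of the refactor described above, the sketch below computes `k = min(n_additional, len(diff_scores))` once and reuses it as the partition index. The function name and the use of `np.argpartition` are assumptions made for this example; the controller's actual lines are not shown in this thread.

```python
import numpy as np


def select_top_from_difference(diff_scores, n_additional):
    """Return indices (into diff_scores) of the highest-scoring candidates."""
    diff_scores = np.asarray(diff_scores, dtype=float)
    k = min(n_additional, len(diff_scores))  # computed once, reused below
    if k <= 0:
        return np.array([], dtype=int)
    # argpartition places the indices of the k largest scores in the last k slots
    top = np.argpartition(diff_scores, -k)[-k:]
    return np.sort(top)


print(select_top_from_difference([0.1, 0.9, 0.5, 0.7], 2))  # [1 3]
```

Clamping with `min` also covers the case where more features are requested than the difference set contains.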
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| nvflare/app_opt/feature_election/controller.py | 5/5 | Code optimization that extracts redundant calculation into a variable for better efficiency and readability |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- Score reflects a simple, well-focused optimization that eliminates redundant computation without changing functionality
- No files require special attention
1 file reviewed, no comments
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review. A single import statement in the flare_deployment.py file has been moved from inside the example_apply_mask_to_new_data() function to the module level. The from sklearn.datasets import make_classification import was relocated from line 193 (within a function) to line 12 at the top of the file alongside other imports. This is a minor code organization improvement that follows Python PEP 8 best practices by consolidating imports at the module level, making dependencies more visible and the codebase more maintainable.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/advanced/feature_election/flare_deployment.py | 5/5 | Moved sklearn.datasets.make_classification import from function to module level |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- Score reflects a simple import reorganization with no functional changes or breaking impacts
- No files require special attention
1 file reviewed, no comments
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The changes implement a comprehensive Feature Election algorithm for NVIDIA FLARE, providing horizontal federated feature selection capabilities for tabular datasets. This work originates from the FLASH framework that achieved the best student paper award at FLTA IEEE 2025.
The implementation adds six new files: a controller extending ScatterAndGather for weighted feature aggregation, an executor handling client-side feature selection with multiple algorithms (lasso, elastic net, chi2, mutual info, RFE, random forest, PyImpetus), a high-level Feature Election class providing simulation and job configuration capabilities, comprehensive unit tests, and usage examples. The framework enables multiple clients to collaboratively identify relevant features without sharing raw data by performing local feature selection and aggregating results server-side through weighted voting with intersection/union logic.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/advanced/feature_election/basic_usage.py | 5/5 | Added NVIDIA copyright header to example file demonstrating federated feature selection usage |
| tests/unit_test/app_opt/feature_election/test_feature_election.py | 4/5 | Comprehensive unit test suite covering all feature selection methods and edge cases |
| nvflare/app_opt/feature_election/__init__.py | 5/5 | Module initialization file establishing clean public API and proper package structure |
| nvflare/app_opt/feature_election/controller.py | 4/5 | Server-side controller implementing weighted feature aggregation with intersection/union logic |
| examples/advanced/feature_election/flare_deployment.py | 4/5 | Production deployment example demonstrating complete workflow from setup to result retrieval |
| nvflare/app_opt/feature_election/executor.py | 4/5 | Client-side executor supporting multiple feature selection algorithms with fallback mechanisms |
| nvflare/app_opt/feature_election/feature_election.py | 3/5 | High-level API class providing simulation capabilities and FLARE job configuration generation |
Confidence score: 3/5
- This PR adds significant new functionality but has several technical issues that need addressing before merge
- Score reflects concerns about import errors, missing dependencies, duplicate imports, and potential runtime failures with chi2 method on synthetic data
- Pay close attention to nvflare/app_opt/feature_election/feature_election.py and nvflare/app_opt/feature_election/executor.py for critical fixes needed
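The chi2 concern noted above can be reproduced with scikit-learn alone: `chi2` rejects negative feature values, and `make_classification` produces them. The sketch below shows one common workaround, scaling features into [0, 1] first. Whether the PR adopts this approach is not stated here, so treat it as an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# Synthetic data from make_classification contains negative values.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

try:
    chi2(X, y)
    chi2_failed = False
except ValueError:
    # scikit-learn raises because chi2 requires non-negative input
    chi2_failed = True

# Workaround for illustration: rescale every feature into [0, 1] before chi2.
X_scaled = MinMaxScaler().fit_transform(X)
selector = SelectKBest(chi2, k=5).fit(X_scaled, y)
mask = selector.get_support()
print(chi2_failed, int(mask.sum()))
```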
7 files reviewed, 2 comments
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The latest change adds a missing elif condition to properly handle the "ppimbc" method in the PyImpetus feature selection branch of the executor. Previously, when fs_method was set to "ppimbc", the code would fall through without creating a selector instance, likely causing runtime errors. The fix adds an explicit elif self.fs_method == "ppimbc": condition that creates a PPIMBC selector with identical parameters to the existing "pyimpetus" branch, ensuring both method names are properly supported. This change addresses a logical gap in the feature selection framework and aligns with the broader feature election system being added to NVIDIA FLARE, which supports multiple feature selection algorithms for federated tabular data processing.
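One way to keep method aliases from falling through is a lookup table in which both names map to the same builder and unknown names raise. This is a sketch of an alternative structure, not the `elif` fix applied in the PR; `build_pyimpetus_selector`, `SELECTOR_BUILDERS`, and `make_selector` are hypothetical names, and the builder returns a plain dict so the example runs without PyImpetus installed.

```python
from typing import Any, Callable, Dict


def build_pyimpetus_selector(**params: Any) -> Dict[str, Any]:
    # Stand-in for constructing the real selector; a dict keeps the sketch
    # runnable without the optional dependency.
    return {"kind": "PPIMBC", "params": params}


# Both accepted names point at the same builder, so neither can fall through.
SELECTOR_BUILDERS: Dict[str, Callable[..., Any]] = {
    "pyimpetus": build_pyimpetus_selector,
    "ppimbc": build_pyimpetus_selector,
}


def make_selector(fs_method: str, **params: Any) -> Any:
    try:
        builder = SELECTOR_BUILDERS[fs_method.lower()]
    except KeyError:
        raise ValueError(f"Unknown fs_method: {fs_method!r}") from None
    return builder(**params)
```

With this layout, an unsupported method name fails immediately with a `ValueError` instead of leaving the selector undefined.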
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| nvflare/app_opt/feature_election/executor.py | 5/5 | Added missing elif condition to handle "ppimbc" method in PyImpetus feature selection |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- Score reflects a simple bug fix that addresses a clear logical gap without introducing complexity or breaking changes
- No files require special attention
1 file reviewed, no comments
It's ready for review, as I fixed the licensing issue. I will maintain this code and add more informative examples.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The recent changes include fixing the requirements.txt file content, addressing import issues, correcting duplicate imports, and resolving conditional logic problems in the feature election module. The changes properly convert a markdown documentation file to actual Python requirements and fix several syntax and logic issues that were flagged in previous reviews.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| examples/advanced/feature_election/requirements.txt | 5/5 | Fixed by replacing markdown documentation with proper Python package requirements |
| nvflare/app_opt/feature_election/executor.py | 4/5 | Addressed duplicate imports and conditional logic issues but some encapsulation concerns remain |
| nvflare/app_opt/feature_election/controller.py | 4/5 | Fixed duplicate FLContext import and improved code structure |
| tests/unit_test/app_opt/feature_election/test_feature_election.py | 4/5 | Maintained comprehensive test coverage with proper parameterized testing |
| nvflare/app_opt/feature_election/feature_election.py | 4/5 | Core feature election logic remains stable with proper error handling |
| nvflare/app_opt/feature_election/__init__.py | 5/5 | Clean module initialization with proper API exports |
| examples/advanced/feature_election/basic_usage.py | 5/5 | Well-structured examples demonstrating feature election functionality |
| examples/advanced/feature_election/flare_deployment.py | 4/5 | Comprehensive deployment example for production usage |
Confidence score: 4/5
- This PR demonstrates significant improvement in addressing previous review concerns with most syntax and import issues resolved
- Score reflects successful resolution of critical issues like duplicate imports, requirements.txt format, and conditional logic, though some architectural concerns around method encapsulation and component path validation remain
- Pay close attention to executor.py for potential encapsulation issues and controller.py for component path validation in actual FLARE deployments
11 files reviewed, 10 comments
nvflare/app_opt/feature_election/tests/test_feature_election.py
Outdated
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Update feature_election.py
10 files reviewed, 1 comment
10 files reviewed, 2 comments
aa86026 to 5923d66
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
10 files reviewed, 3 comments
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
10 files reviewed, no comments
Additional Comments (1)
- examples/advanced/feature_election/requirements.txt, line 3: style: Remove trailing empty line for consistent file formatting
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
10 files reviewed, 1 comment
Hi @chesterxgchen, @holgerroth, this PR is ready for review from my side: all local tests pass (I do not have access to run CI, but the initial failures were caused by missing open-source licence headers). Could you please assign two reviewers, or approve a CI test when you have a moment? Thank you for your time and help!
10 files reviewed, no comments
10 files reviewed, no comments
10 files reviewed, 2 comments
nvflare/app_opt/feature_election/tests/test_feature_election.py
Outdated
@christofilojohn I have a simple question for you; I am trying to understand what this algorithm is intended for. Based on the linked FLASH GitHub repo, the algorithm is a way of selecting which features to include in training in federated settings. If all sites have the same features (intersection), it is horizontal learning; if you select a union, it is vertical FL. So what does this algorithm actually do (key ideas in simple terms, without much detail)? Why do we need it?
@chesterxgchen This algorithm is useful for communication efficiency while maintaining or improving performance in Horizontal Federated Learning settings (no Vertical FL testing or setup has been performed). It is a horizontal feature selection method that runs feature selection algorithms made for conventional machine learning on each client and aggregates the feature selection results, meaning the selected feature sets.
Thanks, this is very helpful. One more follow-up question: who is voting for the features, since the freedom degree is chosen by "voting", and what criteria decide the vote? I guess that determines why one feature is chosen over another, i.e. feature importance.
Each client (executor) has a different dataset and distribution due to the horizontal partitioning, so running the same feature selection method (such as Lasso or PyImpetus) leads to different results on each client. Each client selects some features and gives a preference score to all features, based on the feature selection method used. The server (controller) aggregates the selection vectors (binary choice and preference) according to the selected freedom degree value. The attached visual explains the process:
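To make the client-side "vote" concrete, here is a minimal sketch of one possible local step using scikit-learn's Lasso: the binary choice is the set of non-zero coefficients and the preference score is the absolute coefficient. The function name `local_lasso_vote`, the `alpha` value, and the synthetic data are assumptions for illustration; the executor in this PR may compute its scores differently.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler


def local_lasso_vote(X, y, alpha=0.01):
    """Return (selected_mask, feature_scores, num_samples) for one client."""
    X_std = StandardScaler().fit_transform(X)
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X_std, y)
    feature_scores = np.abs(model.coef_)  # preference score for every feature
    selected_mask = feature_scores > 0    # binary choice: non-zero coefficients
    return selected_mask, feature_scores, X.shape[0]


# Two clients holding different rows of the same 12-feature table.
X, y = make_classification(n_samples=300, n_features=12, n_informative=4, random_state=0)
clients = [(X[:150], y[:150]), (X[150:], y[150:])]
votes = [local_lasso_vote(Xc, yc) for Xc, yc in clients]

for mask, scores, n in votes:
    print(n, np.flatnonzero(mask).tolist())
```

Because each client sees different rows, the selected sets can differ even though the method and hyperparameters are identical, which is why a server-side aggregation rule is needed.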
@christofilojohn OK. Assume the researcher sets the degree of freedom to, say, 0.4; then each site selects features based on that degree of freedom (according to certain local selection algorithms) and sends them to the server for aggregation. This makes sense. But the graph actually confuses me, as did the original plot in the GitHub repo. If different degrees of freedom cause features to be selected from the intersection of clients 1 and 2, but not client 3, we are back to the Vertical FL case. But we intend to target the Horizontal FL case, so this seems to contradict the goal.
Each site chooses its features independently of the freedom degree. The mechanism is only needed in the server (controller) aggregation, to make a decision based on the clients' results (think of each small circle as one client's preferences and the Venn-like diagram as all the preferences that the server sees). The freedom degree tells the server how to handle these client vectors; for example, 0.1 means a stricter approach that limits the total number of features. Features in the intersection set are always selected. So it is just a way to control the aggregation behavior, and it does not influence each client's decision. After the election, a global mask is decided and all clients use it. Each client performs feature selection locally, and each possible selector has different hyperparameters (PyImpetus being relatively plug and play). For example, Lasso has a hyperparameter a (or C, depending on the literature) that changes its feature selection behavior. This approach offers three things: it lets existing ML pipelines with feature selection migrate to horizontal FL, it provides a simple pre-processing feature selection round that reduces model parameter size for future FL rounds, and it makes any conventional feature selection method that gives preference scores work in FL.
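The aggregation rule described in this reply can be sketched as below. The intersection is always kept, and candidates from the difference set (union minus intersection) are ranked by a sample-weighted mean of the client scores. How `freedom_degree` maps to the number of admitted candidates is an assumption here (a simple fraction of the difference set); the controller in this PR may use a different formula, and `aggregate_selections` is a hypothetical stand-in for its `_aggregate_selections()`.

```python
import numpy as np


def aggregate_selections(masks, scores, num_samples, freedom_degree):
    """Combine client votes into one global boolean mask (illustrative only)."""
    masks = np.asarray(masks, dtype=bool)      # shape: (n_clients, n_features)
    scores = np.asarray(scores, dtype=float)   # shape: (n_clients, n_features)
    weights = np.asarray(num_samples, dtype=float)
    weights = weights / weights.sum()

    intersection = masks.all(axis=0)           # selected by every client
    union = masks.any(axis=0)                  # selected by at least one client
    diff_idx = np.flatnonzero(union & ~intersection)

    agg_scores = weights @ scores              # sample-weighted mean per feature

    # ASSUMPTION: admit this fraction of the difference set.
    n_additional = int(round(freedom_degree * diff_idx.size))
    k = min(n_additional, diff_idx.size)

    global_mask = intersection.copy()
    if k > 0:
        top = np.argpartition(agg_scores[diff_idx], -k)[-k:]
        global_mask[diff_idx[top]] = True
    return global_mask


masks = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0], [1, 0, 1, 1, 1, 0]]
scores = [
    [0.9, 0.8, 0.6, 0.1, 0.0, 0.0],
    [0.8, 0.7, 0.1, 0.5, 0.0, 0.0],
    [0.9, 0.1, 0.7, 0.6, 0.3, 0.0],
]
num_samples = [100, 100, 200]
for fd in (0.0, 0.5, 1.0):
    print(fd, aggregate_selections(masks, scores, num_samples, fd).astype(int).tolist())
```

In this toy input, a freedom degree of 0 returns only the first feature (the intersection), 1 returns the union, and 0.5 adds the two difference-set features with the highest weighted scores.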
10 files reviewed, no comments
Just to help me understand better: the degree of freedom is used on the server side for aggregation. No freedom (= 0) is pure intersection; more freedom (> 0) can produce some of the union set. But each site is free to choose features based on its local selection algorithm. Suppose we have the same 5 features, F1 to F5, on each site (horizontal FL). After feature selection, site-1 selects F1 to F4 and site-2 selects F2 to F5. The degree of freedom is > 0, and the aggregated set is now F1, F2, F3, F4, F5. The FL model will be trained on F1 to F5 even though site-2 selected only F2 to F5. Is this understanding correct?
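The two extremes in the scenario above can be written out directly. This is only a worked illustration of the question's example with boolean masks; which set a given non-zero freedom degree actually produces depends on the controller's aggregation rule.

```python
import numpy as np

features = np.array(["F1", "F2", "F3", "F4", "F5"])
site_1 = np.array([True, True, True, True, False])   # selects F1..F4
site_2 = np.array([False, True, True, True, True])   # selects F2..F5

intersection = site_1 & site_2   # strictest outcome
union = site_1 | site_2          # most permissive outcome

print(features[intersection].tolist())  # ['F2', 'F3', 'F4']
print(features[union].tolist())         # ['F1', 'F2', 'F3', 'F4', 'F5']
```

Both sites hold all five columns, so if the union is elected, each site can train on F1 to F5 and the setting stays horizontal: the election changes which shared columns are used, not which columns a site owns.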
Feature Election for NVIDIA FLARE
Description
A plug-and-play horizontal federated feature selection framework for tabular datasets in NVIDIA FLARE.
This work originates from FLASH: A framework for Federated Learning with Attribute Selection and Hyperparameter optimization, a work presented at FLTA IEEE 2025 that received the best student paper award.
Feature election enables multiple clients with tabular datasets to collaboratively identify the most relevant features without sharing raw data. It works by running conventional feature selection algorithms on the client side (executor) and performing a weighted aggregation of their results on the server side (controller).
FLASH is available on GitHub
Types of changes
./runtest.sh.