Updating SDE comparison mode to accept two manifests#15500
Updating SDE comparison mode to accept two manifests#15500
Conversation
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
…. Updaetd logging system Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: Jorjeous <Jorjeous@users.noreply.github.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
… sharding with separate numeration. Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: Jorjeous <Jorjeous@users.noreply.github.com>
| import operator | ||
| import os | ||
| import pickle | ||
| import tarfile |
Check notice
Code scanning / CodeQL
Unused import Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 5 days ago
To fix an unused import, you remove the import statement (or, if the symbol is meant to be used, you instead add corresponding code that uses it). Here, the simplest and safest way, without changing functionality, is to delete the import tarfile line.
Concretely, in tools/speech_data_explorer/data_explorer.py, remove line 27 containing import tarfile. No additional methods, imports, or definitions are needed. All other imports and code remain unchanged.
| @@ -24,7 +24,6 @@ | ||
| import operator | ||
| import os | ||
| import pickle | ||
| import tarfile | ||
| import tempfile | ||
| from collections import defaultdict | ||
| from os.path import expanduser |
| for line in lines[1:]: # Skip header line | ||
| parts = line.split() | ||
| if len(parts) >= 4: | ||
| file_type = parts[0] |
Check notice
Code scanning / CodeQL
Unused local variable Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 5 days ago
In general, to fix an unused local variable you either remove the variable entirely or rename it to a clearly “unused” name (such as _ or unused_file_type) if it is kept for documentation or unpacking purposes. This preserves code clarity and avoids misleading readers into thinking the value is relevant to the logic.
Here, the right-hand side (parts[0]) has no side effects, so we can safely remove just the file_type assignment without affecting behavior. The best minimal fix is to delete the file_type = parts[0] line, leaving offset, size, and filename unchanged. No additional imports or definitions are needed, and no other parts of parse_dali_index depend on file_type. All edits occur in tools/speech_data_explorer/data_explorer.py within the shown parse_dali_index function.
| @@ -277,7 +277,6 @@ | ||
| for line in lines[1:]: # Skip header line | ||
| parts = line.split() | ||
| if len(parts) >= 4: | ||
| file_type = parts[0] | ||
| offset = int(parts[1]) | ||
| size = int(parts[2]) | ||
| filename = parts[3] |
|
|
||
| # Handle dual-manifest mode: merge the two manifests into one temp manifest | ||
| # and rewrite names_compared to use the auto-generated pred_text_{name} field names. | ||
| _merged_tmp = None # keep reference so temp file is not deleted |
Check notice
Code scanning / CodeQL
Unused global variable Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 5 days ago
To fix this while preserving behavior, keep the assignment (so the temp file object stays referenced) but rename the variable to follow the project’s “intentionally unused” naming pattern. That makes it clear that the variable is only there for its side effect (controlling object lifetime) and satisfies CodeQL.
Concretely, in tools/speech_data_explorer/data_explorer.py around line 1311, change _merged_tmp to something like _unused_merged_tmp_ref both in its declaration and where the return from merge_manifests is unpacked. No imports or additional definitions are needed. Functionality is unchanged: the temporary file’s reference is still kept alive; only the variable name changes.
| @@ -1308,10 +1308,10 @@ | ||
|
|
||
| # Handle dual-manifest mode: merge the two manifests into one temp manifest | ||
| # and rewrite names_compared to use the auto-generated pred_text_{name} field names. | ||
| _merged_tmp = None # keep reference so temp file is not deleted | ||
| _unused_merged_tmp_ref = None # keep reference so temp file is not deleted | ||
| if dual_manifest_mode: | ||
| model_name_1, model_name_2 = args.names_compared | ||
| merged_manifest_path, _merged_tmp = merge_manifests(args.manifest[0], args.manifest[1], model_name_1, model_name_2) | ||
| merged_manifest_path, _unused_merged_tmp_ref = merge_manifests(args.manifest[0], args.manifest[1], model_name_1, model_name_2) | ||
| data_filename = merged_manifest_path | ||
| args.names_compared = [f'pred_text_{model_name_1}', f'pred_text_{model_name_2}'] | ||
| logging.info(f"Dual-manifest mode: using merged manifest at {merged_manifest_path}") |
Important
The
Update branchbutton must only be pressed in very rare occassions.An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use thisGitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information