Added duplicate key detection forked_projects_util.py #1982
base: main
```diff
@@ -40,6 +40,20 @@ def is_fp_sorted(data):
     return list1 == list2


+def find_duplicate_json_keys(file_path):
+    from collections import Counter
+    import re
+
+    with open(file_path, "r") as f:
+        content = f.read()
+
+    # Extract all keys using regex: "key":
+    pattern = r'"([^"]+)"\s*:'
+    keys = re.findall(pattern, content)
+    counter = Counter(keys)
+    return [k for k, v in counter.items() if v > 1]
+
+
 def update_fp(data, file_path):
     sorted_data = dict(sorted(data.items()))

```
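The helper added in the diff can be exercised standalone. A minimal sketch (the temp-file demo and its keys are hypothetical, not from the PR):

```python
from collections import Counter
import re
import tempfile


def find_duplicate_json_keys(file_path):
    # Scan the raw file text, since json.load would collapse duplicates
    with open(file_path, "r") as f:
        content = f.read()
    # Extract all keys using regex: "key":
    pattern = r'"([^"]+)"\s*:'
    keys = re.findall(pattern, content)
    counter = Counter(keys)
    return [k for k, v in counter.items() if v > 1]


# Demo: a JSON file with a repeated "alpha" key
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write('{"alpha": 1, "beta": 2, "alpha": 3}')
    path = tmp.name

dups = find_duplicate_json_keys(path)  # only "alpha" appears twice
```

Note that the regex also matches any quoted string followed by a colon inside string *values*, which is a known limitation of scanning raw text.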
```diff
@@ -61,8 +75,11 @@ def check_stale_fp(data, file_path, log):
 def run_checks_sort_fp(file_path, log):
     with open(file_path, "r") as file:
         data = json.load(file)
-    if not is_fp_sorted(data):
-        log_esp_error(file_path, log, "Entries are not sorted")
+    dups = find_duplicate_json_keys(file_path)
+    if dups:
+        log_esp_error(file_path, log, f"Duplicate in forked-projects.json keys detected: {dups}")
+    elif not is_fp_sorted(data):
```
Contributor:
> Does the sorting check strictly increasing or just non-decreasing? Can we make it strictly increasing so that it subsumes the check for duplicates?

Contributor (Author):
> Making the sorting check strictly increasing would catch duplicates in the loaded data. But json.load() collapses duplicates when parsing, so any duplicates in the original file wouldn't be detected.
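The collapsing behavior the author describes can be seen directly (the key name here is hypothetical):

```python
import json

# A JSON document with a duplicated key
raw = '{"project": "old", "project": "new"}'

parsed = json.loads(raw)
# json.loads keeps only the last occurrence; the duplicate is silently lost,
# so a strictly-increasing sort check on `parsed` could never see it.
```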
```diff
+        log_esp_error(file_path, log, "Entries in forked-projects.json are not sorted")
     else:
         log_info(file_path, log, "There are no changes to be checked")
```
Contributor:
> Use the json library to read the file. It's presumably already done somewhere in this script.

Contributor (Author):
> json.load() is the standard way to parse JSON, but it automatically collapses duplicate keys by keeping only the last occurrence. Because of this behavior, any duplicate keys in the original file are lost during parsing. For that reason I did not use json.load().

Contributor:
> Thanks for clarifying about json.load(); I wasn't aware of the default behavior. It seems that json.loads(..., object_pairs_hook=...) allows for checking for duplicates (and doesn't require you to parse keys with a regex).