
Conversation

@jamesaoverton (Member) commented Jul 10, 2023

Read the git history of the dashboard/dashboard-results.yml file,
sum the number of passes for each FP* check,
and generate a summary.tsv table with rows for each date.

Summary by CodeRabbit

  • New Features
    • Added a tool that generates a date-based TSV summarizing passing FP* checks per commit from dashboard history, including per-date counts and totals.
  • Chores
    • Introduced supporting automation to extract and compile historical dashboard results into a summary file.

@coderabbitai bot (Contributor) commented Oct 8, 2025

📝 Walkthrough


Adds util/summarize.py, a script that iterates git commits affecting dashboard/dashboard-results.yml, loads each version, counts PASS statuses for FP*-prefixed checks across ontologies, and writes a date-keyed TSV (summary.tsv) with totals per check and overall totals.

Changes

Cohort / File(s): New summarization utility (util/summarize.py)
Summary of Changes: Introduces functions to list commits touching dashboard/dashboard-results.yml, read the YAML at each commit, aggregate PASS counts for FP*-prefixed results and totals, and write a per-date TSV via csv.DictWriter. Uses subprocess to invoke git and PyYAML for parsing.
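For orientation, a minimal sketch of what the two git-facing helpers could look like. The function names, the date format, and the exact git flags are assumptions for illustration, not necessarily what util/summarize.py does:

import subprocess
import yaml

PATH = "dashboard/dashboard-results.yml"

def list_commits(path=PATH):
    """Return (date, commit) pairs for every commit that touched `path`, oldest first."""
    # --follow tracks renames; %as %H yields one "YYYY-MM-DD HASH" line per commit
    out = subprocess.run(
        ["git", "log", "--follow", "--reverse", "--format=%as %H", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return [tuple(line.split(" ", 1)) for line in out.splitlines() if line.strip()]

def load_at_commit(commit, path=PATH):
    """Read `path` as it existed at `commit` and parse it with PyYAML."""
    out = subprocess.run(
        ["git", "show", f"{commit}:{path}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return yaml.safe_load(out)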

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Script as summarize.py
  participant Git as git
  participant YAML as PyYAML
  participant FS as File System

  User->>Script: Run main()
  Script->>Git: git log --follow --reverse dashboard/dashboard-results.yml
  Git-->>Script: [ (date, commit) ... ]

  loop For each commit
    Script->>Git: git show <commit>:dashboard/dashboard-results.yml
    Git-->>Script: YAML content
    Script->>YAML: yaml.safe_load(content)
    YAML-->>Script: data (ontologies, results)
    Script->>Script: summarize_run(data)\n- count PASS for FP* keys\n- compute totals
  end

  Script->>FS: Write summary.tsv (tab-delimited rows with date + counts)
  FS-->>Script: Success
  Script-->>User: Exit
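
The final step in the diagram, writing summary.tsv, could look roughly like the sketch below. The column handling mirrors the description above (one row per date, fieldnames taken from the last row and sorted case-insensitively), but write_summary and the exact row shape are assumptions:

import csv

def write_summary(rows, outfile="summary.tsv"):
    """Write one tab-delimited row per date, with a column per FP* check plus totals."""
    # Use the last (most complete) summary to determine the column set
    fieldnames = sorted(rows[-1].keys(), key=str.casefold)
    with open(outfile, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t", restval=0)
        writer.writeheader()
        writer.writerows(rows)

# Each row would combine the commit date with the per-check counts, e.g.:
# rows.append({"date": date, **summarize_run(data)})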

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and concisely summarizes the addition of a script that reads dashboard results and outputs them in a table, matching the primary change specified in the pull request.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00870c9 and ea5e36f.

📒 Files selected for processing (1)
  • util/summarize.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
util/summarize.py

1-1: Shebang is present but file is not executable (EXE001)
18-18: subprocess call: check for execution of untrusted input (S603)
19-19: Starting a process with a partial executable path (S607)
61-61: subprocess call: check for execution of untrusted input (S603)
62-62: Starting a process with a partial executable path (S607)

🔇 Additional comments (2)
util/summarize.py (2)

58-67: LGTM!

The function correctly retrieves file content from a specific commit and safely parses it using yaml.safe_load, which prevents arbitrary code execution from malicious YAML.


14-34: Confirm subprocess usage in list_commits is safe

list_commits is only invoked with a hardcoded path in main(), so there’s no untrusted input risk.
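
For context on why the S603/S607 hits are benign here: the git calls pass a fixed argument list rather than a shell string, so nothing user-controlled is interpolated into a command. An illustrative version of the pattern (the exact call in the script may differ):

import subprocess

path = "dashboard/dashboard-results.yml"  # hardcoded in main(), not user input
# argv is a list, so no shell parsing occurs; "git" is resolved via PATH, which is what S607 flags
subprocess.run(["git", "log", "--follow", "--reverse", "--", path], check=True)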

@@ -0,0 +1,94 @@
#!/usr/bin/env python3

⚠️ Potential issue | 🟡 Minor

Make the script executable.

The shebang is present but the file lacks execute permissions.

Run this command to fix:

chmod +x util/summarize.py

🤖 Prompt for AI Agents
In util/summarize.py at line 1 the shebang is present but the file lacks execute
permissions; make the script executable by setting the user (and optionally
group/other) execute bit, e.g. run chmod +x util/summarize.py from the repo root
and verify the permission change with ls -l util/summarize.py.
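
As an alternative to the shell command, the same permission change can be made from Python; this is just one hedged option using the standard library (stat flags shown are the user/group/other execute bits):

import os
import stat

p = "util/summarize.py"
# Add execute permission on top of the file's existing mode
os.chmod(p, os.stat(p).st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)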

Comment on lines +37 to +55
def summarize_run(data):
    """Given a dashboard-results YAML
    return a dictionary from principles to counts
    of FP* checks that passed,
    and the total number of ontologies."""
    result = defaultdict(int)
    for ontology in data["ontologies"]:
        result["total"] += 1
        if "results" not in ontology:
            continue
        for key, value in ontology["results"].items():
            if not key.startswith("FP"):
                continue
            if "status" not in value:
                continue
            if value["status"] != "PASS":
                continue
            result[key] += 1
    return result

⚠️ Potential issue | 🟠 Major

Add validation for required data structure.

The function accesses data["ontologies"] without validation. If the YAML file is malformed or missing this key in an old commit, the script will crash with a KeyError.

Apply this diff to add validation:

 def summarize_run(data):
     """Given a dashboard-results YAML
     return a dictionary from principles to counts
     of FP* checks that passed,
     and the total number of ontologies."""
     result = defaultdict(int)
+    if "ontologies" not in data:
+        return result
     for ontology in data["ontologies"]:
         result["total"] += 1
         if "results" not in ontology:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def summarize_run(data):
    """Given a dashboard-results YAML
    return a dictionary from principles to counts
    of FP* checks that passed,
    and the total number of ontologies."""
    result = defaultdict(int)
    if "ontologies" not in data:
        return result
    for ontology in data["ontologies"]:
        result["total"] += 1
        if "results" not in ontology:
            continue
        for key, value in ontology["results"].items():
            if not key.startswith("FP"):
                continue
            if "status" not in value:
                continue
            if value["status"] != "PASS":
                continue
            result[key] += 1
    return result
🤖 Prompt for AI Agents
In util/summarize.py around lines 37 to 55, the function reads
data["ontologies"] without validating input; add a guard that verifies data is a
dict and contains an "ontologies" key whose value is a list. If the check fails,
either raise a clear ValueError (e.g. "invalid dashboard-results: missing or
non-list 'ontologies'") or return an empty defaultdict(int) so the caller can
handle malformed YAML, then proceed with the existing loop using the validated
value.
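
A sketch of the stricter variant described in the prompt above, raising on malformed input instead of silently returning. The isinstance checks and error message are one possible shape, not a committed API; the loop is the original logic in condensed form:

from collections import defaultdict

def summarize_run(data):
    """Stricter variant: validate the YAML shape before counting PASS results."""
    result = defaultdict(int)
    # Fail loudly on malformed input rather than returning partial counts
    if not isinstance(data, dict) or not isinstance(data.get("ontologies"), list):
        raise ValueError("invalid dashboard-results: missing or non-list 'ontologies'")
    for ontology in data["ontologies"]:
        result["total"] += 1
        for key, value in ontology.get("results", {}).items():
            if key.startswith("FP") and value.get("status") == "PASS":
                result[key] += 1
    return result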

Comment on lines +83 to +84
    fieldnames = list(rows[-1].keys())
    fieldnames.sort(key=str.casefold)

⚠️ Potential issue | 🔴 Critical

Handle empty commits list.

If no commits are found for the file (e.g., file doesn't exist at path, or git history issue), rows[-1] will raise an IndexError.

Apply this diff to add validation:

+    if not rows:
+        print(f"No commits found for {path}")
+        return
+
     fieldnames = list(rows[-1].keys())
     fieldnames.sort(key=str.casefold)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    if not rows:
        print(f"No commits found for {path}")
        return

    fieldnames = list(rows[-1].keys())
    fieldnames.sort(key=str.casefold)
🤖 Prompt for AI Agents
In util/summarize.py around lines 83-84, the code assumes rows is non-empty and
uses rows[-1] which will raise IndexError when no commits are found; update the
function to first check if rows is empty and handle that case (e.g., return
early with an empty result or raise a clear ValueError/RuntimeError with a
descriptive message), otherwise proceed to compute fieldnames from rows[-1] and
sort them; ensure callers of this function can handle the new empty-case
behavior.

@anitacaron (Collaborator)

@jamesaoverton, have you generated the TSV file? Running this script would be a bit more complicated because we use a Docker image. I might run it only on the other repository used for the releases.

@jamesaoverton (Member, Author)

I barely remember this PR. I wanted a graph to show improvement in the dashboard scores over time, so I wrote this script, and put it here in case it's useful.

You can do what you want with it, or just close the PR without merging.
