Conversation
There was a problem hiding this comment.
Pull request overview
This PR replaces the prior Streamlit + SQLite “experiment” with a Click-based CLI workflow that treats the POA&M Excel workbook as the source of truth, generating diffs from weekly scanner outputs (Trivy/ZAP/CIS) and applying them back to the spreadsheet.
Changes:
- Added a CLI (
cli/cli.py) to run conversions, compute diffs, and apply updates to the POA&M workbook (including an interactive weekly workflow). - Introduced normalized
Finding/PoamEntrydata structures and shared diff/apply logic, plus scanner-specific adapters for Trivy, ZAP, and CIS. - Removed Streamlit UI, database schema/components, and related static assets; updated dependencies and docs accordingly.
Reviewed changes
Copilot reviewed 47 out of 49 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/zap/poam_generator.py | Generates ZAP POAM entries + IDs from ZAP findings. |
| tools/zap/diff.py | Compares ZAP findings to existing ZAP POAMs in Excel. |
| tools/zap/alerts.py | Converts ZAP CSV report rows into normalized Finding objects and JSON output. |
| tools/zap/init.py | Exposes ZAP conversion entry point. |
| tools/utils.py | Adds shared working-directory helper for CLI/tools. |
| tools/trivy/poam_generator.py | Generates Trivy POAM entries + IDs from Trivy findings. |
| tools/trivy/importer.py | Imports Trivy findings from a CSV into Finding objects. |
| tools/trivy/diff.py | Compares Trivy findings to existing Trivy POAMs in Excel. |
| tools/trivy/alerts.py | Converts GitHub code-scanning alerts JSON into POAM-flavored CSV (via jq). |
| tools/poam.py | Adds POAM Excel reader (PoamFile) + PoamEntry normalization utilities. |
| tools/github.py | Adds GitHub alerts download via gh api. |
| tools/findings.py | Adds normalized Finding dataclass + dictionary loader. |
| tools/diff_apply.py | Applies diff JSON outputs back into the POAM Excel workbook (open/closed/config findings). |
| tools/diff.py | Adds shared matching/diffing engine and diff JSON serialization. |
| tools/cis/splitter.py | Splits CIS “connected sheet” Excel into per-date CSV files. |
| tools/cis/poam_generator.py | Generates CIS POAM entries/IDs for configuration findings workflow. |
| tools/cis/diff.py | Diffs CIS findings vs “Configuration Findings” sheet entries. |
| tools/cis/converter.py | Converts CIS CSVs into normalized findings JSON. |
| tools/init.py | Initializes the tools package. |
| cli/cli.py | Implements Click CLI commands for Trivy/ZAP/CIS and POAM diff/apply workflows. |
| cli/init.py | Initializes the CLI package. |
| tests/test_poam_generator.py | Adds tests for Trivy POAM generation and ID sequencing. |
| tests/test_poam.py | Adds tests for convert_to_snake_case. |
| tests/test_diff_apply.py | Adds tests for merging diff JSON files. |
| tests/test_diff.py | Adds tests for matching logic and configuration findings behavior. |
| tests/conftest.py | Ensures project root is on sys.path for tests. |
| tests/cis/test_poam_generator.py | Adds tests for CIS POAM generator grouping behavior. |
| tests/cis/init.py | Initializes CIS tests package. |
| tests/init.py | Initializes tests package. |
| requirements.txt | Removes Streamlit stack; adds click/openpyxl/yaml/jq for CLI flow. |
| README.md | Updates top-level documentation to CLI-oriented workflow. |
| CLAUDE.md | Adds repository guidance for CLI + POA&M workflow. |
| .gitignore | Ignores working/ and macOS .DS_Store. |
| app/static/style.css | Removes Streamlit styling (no longer needed). |
| app/pages/settings.py | Removes Streamlit settings page. |
| app/pages/5_poam_export.py | Removes Streamlit POAM export page. |
| app/pages/4_settings.py | Removes Streamlit settings page variant. |
| app/pages/3_benchmark_detail.py | Removes Streamlit benchmark detail page. |
| app/pages/2_scan_history.py | Removes Streamlit scan history page. |
| app/pages/1_issue_detail.py | Removes Streamlit issue detail page. |
| app/main.py | Removes Streamlit application entry point. |
| app/database/schema.py | Removes SQLite schema and DB lifecycle code. |
| app/database/init.py | Removes database package init. |
| app/components/poam_exports.py | Removes Streamlit POAM export generation component. |
| app/components/data_processor.py | Removes Streamlit upload/data processing logic tied to SQLite. |
| app/components/init.py | Removes components package init. |
| app/components/IssuesTable.py | Removes Streamlit UI table component. |
| app/components/IssuesList.py | Removes Streamlit UI list component. |
| app/init.py | Removes Streamlit app package init. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "comments": "Comments", | ||
| "auto_approve": "Auto Approve", | ||
| "binding_operational_directive_22_01_tracking": "Binding Operational Directive 22-01 Tracking", | ||
| "binding_operational_directive_22_01_due_date": "Binding Operational Directive 22-01 Due Date", |
There was a problem hiding this comment.
dict_to_row() maps auto_approve and binding_operational_directive_22_01_tracking to header names that don't match the header strings used elsewhere in this repo (e.g., Trivy CSV uses "Auto-Approve" and "Binding Operational Directive 22-01 tracking"). Because apply_diff() only writes cells when the mapped header exists in open_headers, these fields will be silently skipped in the output workbook. Update the mapping keys to exactly match the POAM spreadsheet header text used by your inputs/templates.
| if diff_json.get("reopen_poams"): | ||
| reopen_ids = {p["poam_id"] for p in diff_json["reopen_poams"]} | ||
| poam_id_col = next(col for header, col in open_headers.items() if header == "POAM ID") | ||
|
|
||
| # Find and move rows | ||
| rows_to_delete = [] | ||
| for row in range(header_row + 1, closed_sheet.max_row + 1): | ||
| poam_id = closed_sheet.cell(row=row, column=poam_id_col).value | ||
| if poam_id in reopen_ids: |
There was a problem hiding this comment.
When reopening POAMs, poam_id_col is derived from open_headers but then used to read values from closed_sheet. If the "POAM ID" column position differs between the Open and Closed sheets, this will read the wrong column and fail to reopen items. Use closed_headers["POAM ID"] when scanning the closed sheet (and keep open_headers for writing to the open sheet).
| # Check WORKING environment variable first | ||
| working_env = os.getenv('WORKING') | ||
| if working_env: | ||
| working_dir = Path(working_env) | ||
| else: | ||
| # Fall back to pwd/working | ||
| working_dir = Path(os.getcwd()) / 'working' | ||
|
|
||
| working_dir.mkdir(exist_ok=True) | ||
| return working_dir No newline at end of file |
There was a problem hiding this comment.
ensure_working_dir() uses working_dir.mkdir(exist_ok=True) without parents=True. If WORKING is set to a nested path (e.g., working/2026-03-05), this will fail unless the parent directories already exist. Use mkdir(parents=True, exist_ok=True) to make the helper robust for both default and env-provided paths.
tools/zap/diff.py
Outdated
| def format_datetime(dt: datetime | str) -> str: | ||
| if isinstance(dt, str): | ||
| return dt | ||
| else: | ||
| return dt.strftime("%Y-%m-%d") if dt else None |
There was a problem hiding this comment.
This file also uses the PEP 604 union operator (datetime | str), which is invalid syntax on Python 3.9 (the repo’s stated minimum). This will prevent the module from importing under 3.9. Use typing.Union[datetime, str] (or similar) instead.
cli/cli.py
Outdated
| """Download Trivy alerts from GitHub code scanning API. | ||
|
|
||
| If destination is not specified, uses WORKING environment variable or pwd/working | ||
| and sets filename to trivy-alerts-<date>.json | ||
|
|
||
| Requires one of: | ||
| 1. GitHub CLI (gh) to be installed and authenticated via 'gh auth login' | ||
| 2. GitHub token provided via --token option or GITHUB_TOKEN environment variable |
There was a problem hiding this comment.
The download-alerts help text claims support for a GitHub token via --token or GITHUB_TOKEN, but the command implementation (and tools/github.py) only uses the gh CLI and does not define a --token option. Either implement token-based auth in download_trivy_alerts or update the docstring to avoid misleading users.
| """Download Trivy alerts from GitHub code scanning API. | |
| If destination is not specified, uses WORKING environment variable or pwd/working | |
| and sets filename to trivy-alerts-<date>.json | |
| Requires one of: | |
| 1. GitHub CLI (gh) to be installed and authenticated via 'gh auth login' | |
| 2. GitHub token provided via --token option or GITHUB_TOKEN environment variable | |
| """Download Trivy alerts from the GitHub code scanning API. | |
| If destination is not specified, uses WORKING environment variable or pwd/working | |
| and sets filename to trivy-alerts-<date>.json. | |
| Requires the GitHub CLI (gh) to be installed and authenticated via 'gh auth login'. |
cli/cli.py
Outdated
| @click.argument('poam_file', type=click.Path(exists=True)) | ||
| @click.argument('findings_file', type=click.Path(exists=True)) | ||
| @click.option('--json-output', type=click.Path(), help='Path to save JSON output') | ||
| def alerts_diff(poam_file: str, findings_file: str, json_output: Optional[str]) -> None: |
There was a problem hiding this comment.
cli.py defines three different Click commands using the same Python function name alerts_diff (Trivy/ZAP/CIS). While Click will still register the earlier functions at decoration time, reusing the same name makes stack traces, debugging, and static analysis confusing. Rename the functions (e.g., trivy_alerts_diff, zap_alerts_diff, cis_alerts_diff) while keeping the command names the same.
| def alerts_diff(poam_file: str, findings_file: str, json_output: Optional[str]) -> None: | |
| def zap_alerts_diff(poam_file: str, findings_file: str, json_output: Optional[str]) -> None: |
README.md
Outdated
| ### Available Commands | ||
|
|
||
| 3. Use the application: | ||
| - Upload your weekly CSV file using the file uploader | ||
| - Set the analysis date | ||
| - View the summary statistics and visualizations | ||
| - Track recurrances and resolutions through additional scans and uploads. | ||
| - Export findings as needed | ||
| 1. Download Google Sheets: | ||
| ```bash | ||
| # Using a Google Sheets URL | ||
| ./cli/cli.py download-gsheet "https://docs.google.com/spreadsheets/d/YOUR_SHEET_ID" | ||
|
|
||
| # Or using just the file ID | ||
| ./cli/cli.py download-gsheet "YOUR_SHEET_ID" | ||
| ``` |
There was a problem hiding this comment.
README documents a ./cli/cli.py download-gsheet ... command, but there is no corresponding Click command implemented in cli/cli.py (and no other module defines it). This will cause confusion for users following the README. Either add the command or remove/update this section to reflect the actual CLI surface.
| except Exception as e: | ||
| # If anything goes wrong, leave the half-edited copy for inspection | ||
| raise type(e)(f"Error applying diff changes. Incomplete edit saved as {editable_copy}. Error: {str(e)}") from e | ||
| finally: | ||
| wb.close() | ||
|
|
There was a problem hiding this comment.
The finally block always calls wb.close(), but wb is only defined after openpyxl.load_workbook() succeeds. If an exception occurs before wb is assigned, this will raise UnboundLocalError and mask the original failure. Initialize wb = None before the try, and guard the close (or use a context manager / contextlib.closing).
tools/diff.py
Outdated
| def format_datetime(dt: datetime | str) -> str: | ||
| if isinstance(dt, str): | ||
| return dt | ||
| else: | ||
| return dt.strftime("%Y-%m-%d") if dt else None |
There was a problem hiding this comment.
This uses the PEP 604 union operator (datetime | str), which is only valid syntax on Python 3.10+. The repo targets Python >=3.9 (see requirements), so this will raise a SyntaxError on 3.9. Replace with Union[datetime, str] (and import Union) or otherwise avoid | unions in runtime type annotations.
CLAUDE.md
Outdated
| - `diff.py` — calls `compare_findings_to_poams()` with the right POAM filter and generator | ||
| - `poam_generator.py` — generates `PoamEntry` objects with appropriate POAM IDs | ||
|
|
||
| POAM ID formats: Trivy → `YYYY-TRIVYXXXX`, CIS → `CIS-<CIS_ID>-XXXX` |
There was a problem hiding this comment.
CLAUDE.md states CIS POAM IDs use CIS-<CIS_ID>-XXXX, but the CIS POAM generator in this PR produces IDs in the YYYY-CISXXXX form (and tests use 2025-CIS...). Update this line to match the actual ID scheme to avoid propagating incorrect guidance.
| POAM ID formats: Trivy → `YYYY-TRIVYXXXX`, CIS → `CIS-<CIS_ID>-XXXX` | |
| POAM ID formats: Trivy → `YYYY-TRIVYXXXX`, CIS → `YYYY-CISXXXX` |
Fully replace the streamlit experiment with a CLI that uses the POAMs spreadsheet as the source of truth from week to week, rather than requiring a separately maintained database.
Fixes #6