Fix/63 normalize category #67

Open
vikask011 wants to merge 4 commits into ionfwsrijan:main from vikask011:fix/63-normalize-category

@vikask011

Contributor

Linked issue

Closes #63

What this PR does

This PR introduces a centralized category normalization utility to ensure dependency-related findings are categorized consistently across different scanners. The OSV, Semgrep, and Gitleaks scanners were updated to use the shared normalization logic, and unit tests were added to verify the expected behavior.
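
As a rough illustration of what a shared normalizer of this kind might look like, here is a minimal sketch. The function name `normalize_category`, the alias table, and the category labels are all hypothetical and are not taken from this PR's `categories.py`.

```python
"""Hypothetical sketch of a shared category normalizer; not the PR's actual code."""

# Assumed canonical labels; the real project may use different ones.
DEPENDENCY = "dependency"
SECRET = "secret"
CODE = "code"

# Assumed alias table mapping scanner-specific spellings to canonical labels.
_ALIASES = {
    "dependency": DEPENDENCY,
    "dependencies": DEPENDENCY,
    "sca": DEPENDENCY,
    "vulnerable-dependency": DEPENDENCY,
    "secret": SECRET,
    "secrets": SECRET,
    "sast": CODE,
    "code": CODE,
}


def normalize_category(raw, default=CODE):
    """Return a canonical category for a raw scanner label."""
    if not raw:
        return default
    # Fold case, whitespace, and underscore/space separators before the lookup.
    key = str(raw).strip().lower().replace("_", "-").replace(" ", "-")
    return _ALIASES.get(key, default)
```

Keeping the alias table in one module means each scanner only has to call the function, so a new spelling is handled by adding a single entry rather than editing every scanner.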

Type of change

  • Bug fix
  • New feature
  • ML model / training pipeline
  • Refactor (no behaviour change)
  • Documentation
  • Tests only

ML tier (if applicable)

  • Tier 1 — Triage
  • Tier 2 — Predictive
  • Tier 3 — Autonomous
  • Not ML-related

Changes

Backend

  • Added a shared category normalization utility (categories.py).
  • Updated OSV scanner to normalize dependency finding categories.
  • Updated Semgrep scanner to use centralized category normalization.
  • Updated Gitleaks scanner to use consistent category mapping.
  • Added unit tests covering category normalization behavior.

New dependencies

  • None.

Database / schema changes

  • None.

Testing

How did you test this?

  • Ran the new category normalization unit tests locally.
  • Tested the scanning workflow locally and verified findings were returned with normalized categories.
  • Verified that OSV, Semgrep, and Gitleaks findings are categorized consistently.
  • Confirmed no regressions in existing scanner behavior.
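
The unit tests mentioned above could take roughly the following shape. This is only a sketch: the stand-in `normalize_category` and its `"other"` fallback are assumptions made so the example is self-contained, and the PR's real tests may be structured differently.

```python
import unittest


def normalize_category(raw):
    """Stand-in for the shared utility; the real mapping is assumed to live in categories.py."""
    aliases = {
        "dependency": "dependency",
        "dependencies": "dependency",
        "sca": "dependency",
    }
    return aliases.get(str(raw or "").strip().lower(), "other")


class NormalizeCategoryTest(unittest.TestCase):
    def test_dependency_aliases_collapse_to_one_label(self):
        for raw in ("dependency", "Dependencies", " SCA "):
            with self.subTest(raw=raw):
                self.assertEqual(normalize_category(raw), "dependency")

    def test_unknown_or_missing_label_falls_back(self):
        for raw in (None, "", "something-else"):
            with self.subTest(raw=raw):
                self.assertEqual(normalize_category(raw), "other")
```

A file like this can be run with `python -m unittest`.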

Checklist

  • Tested locally end-to-end (upload ZIP or GitHub URL → scan → findings returned correctly)
  • New ML model falls back gracefully when model file is absent
  • No new console.error or unhandled Python exceptions introduced
  • Added or updated tests where applicable
  • requirements.txt / package.json updated if new dependencies added
  • New model files (.pkl, .pt, etc.) are gitignored, not committed

Anything reviewers should focus on

Please review the category normalization logic and ensure the mappings remain consistent across all scanners and existing finding classifications.

Screenshots (if UI changed)

Not applicable.

@Krishnx21 Krishnx21 left a comment

Thanks for the contribution.

I found a few issues that should be addressed before merging:

  • category is passed twice to the Finding constructor in multiple files, which will raise SyntaxError: keyword argument repeated: category.
  • There are duplicate return out statements that result in unreachable code.
  • Please verify that ml_features is defined before being used.

I've left inline comments with more details.
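
For readers unfamiliar with the first two points, the sketch below reproduces them in isolation. The `Finding` dataclass and the `collect` helper are hypothetical stand-ins, not the project's actual model or scanner code.

```python
from dataclasses import dataclass


@dataclass
class Finding:  # hypothetical stand-in for the project's Finding model
    title: str
    category: str


# Passing the same keyword twice is rejected when the source is compiled,
# so a module containing such a call cannot even be imported.
bad_source = 'Finding(title="leaked key", category="secret", category="secrets")'
try:
    compile(bad_source, "<example>", "eval")
    error = None
except SyntaxError as exc:
    error = exc.msg

print(error)  # the message reports the repeated keyword argument


def collect(raw_items):
    """Build findings with a single category argument and a single return."""
    out = []
    for item in raw_items:
        out.append(Finding(title=item["title"], category=item["category"]))
    return out  # a second `return out` after this line would never execute
```

Because the error is raised at compile time, every test that imports an affected scanner module would fail, not only the tests that exercise the duplicated call.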

Comment thread backend/app/scanners/gitleaks.py
Comment thread backend/app/scanners/gitleaks.py Outdated
@ionfwsrijan

Owner

@vikask011 Please fix the failing tests and address the mentor's review.

@vikask011 vikask011 requested a review from Krishnx21 June 9, 2026 07:40
@ionfwsrijan

Owner

@vikask011 The checks are still failing. Join our Discord server to connect with fellow contributors and mentors. Our mentors will help you out there.

https://discord.gg/FcXuyw2Rs

Development

Successfully merging this pull request may close these issues.

[Bug] Normalize Dependency Finding Categories

3 participants