feat(malware-check): add whitespace check to detect excessive spacing and invisible characters #1086

Open, wants to merge 2 commits into base: main

AmineRaouane
Member

@AmineRaouane AmineRaouane commented May 19, 2025

Summary

This PR adds a new heuristic that analyzes code to detect suspicious use of excessive spaces and invisible characters. It checks whether the amount of spacing and invisible Unicode characters exceeds a defined threshold.

Description of changes

  • Implemented the WhiteSpaces heuristic in a new Python module.
  • Registered the new heuristic inside the main heuristics.py file.
  • Created unit tests to verify the behavior of the WhiteSpacesAnalyzer heuristic.
  • Updated detect_malicious_metadata_check.py to integrate and execute the new heuristic logic during analysis.
  • The heuristic scans the source code for invisible characters and abnormally long runs of spaces.
  • This heuristic is combined with ForceSetup to justify high confidence in detection, as the presence of extra spaces alone could be due to poor formatting rather than malicious intent.
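
The idea behind the check can be sketched in a few lines of Python. This is only an illustration, not the code in this PR: the function names, the threshold values, the set of invisible characters, and the choice to ignore leading indentation are all assumptions made for the example.

```python
import re

# Hypothetical thresholds; the defaults used in this PR are not shown here.
REPEATED_SPACES_THRESHOLD = 50
INVISIBLE_CHARS_THRESHOLD = 1

# A small, non-exhaustive set of invisible Unicode characters:
# zero-width space, zero-width non-joiner, zero-width joiner,
# word joiner, and zero-width no-break space (BOM).
INVISIBLE_CHARS = frozenset("\u200b\u200c\u200d\u2060\ufeff")

_SPACE_RUN = re.compile(r" +")


def longest_space_run(line: str) -> int:
    """Return the longest run of consecutive spaces, ignoring leading indentation."""
    stripped = line.lstrip(" ")
    return max((len(m.group()) for m in _SPACE_RUN.finditer(stripped)), default=0)


def count_invisible_chars(source: str) -> int:
    """Count occurrences of the invisible characters listed above."""
    return sum(1 for ch in source if ch in INVISIBLE_CHARS)


def has_suspicious_whitespace(
    source: str,
    spaces_threshold: int = REPEATED_SPACES_THRESHOLD,
    invisible_threshold: int = INVISIBLE_CHARS_THRESHOLD,
) -> bool:
    """Return True if either count reaches its threshold."""
    if count_invisible_chars(source) >= invisible_threshold:
        return True
    return any(
        longest_space_run(line) >= spaces_threshold
        for line in source.splitlines()
    )


if __name__ == "__main__":
    # Padding that pushes a payload far off-screen to the right.
    padded = "import os" + " " * 200 + "; os.system('x')"
    print(has_suspicious_whitespace(padded))
    print(has_suspicious_whitespace("def f():\n    return 1\n"))
```

Ignoring indentation in this sketch keeps deeply nested but otherwise ordinary code from being flagged; whether the actual heuristic does the same is not stated in this description.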

Related issues

None

Checklist

  • I have reviewed the contribution guide.
  • My PR title and commits follow the Conventional Commits convention.
  • My commits include the "Signed-off-by" line.
  • I have signed my commits following the instructions provided by GitHub. Note that we run GitHub's commit verification tool to check the commit signatures. A green verified label should appear next to all of your commits on GitHub.
  • I have updated the relevant documentation, if applicable.
  • I have tested my changes and verified they work as expected.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label May 19, 2025
@behnazh-w behnazh-w changed the title feat(heuristics): add Whitespace Check to detect excessive spacing and invisible characters. feat(malware-check): add whitespace Check to detect excessive spacing and invisible characters. May 20, 2025
@behnazh-w behnazh-w changed the title feat(malware-check): add whitespace Check to detect excessive spacing and invisible characters. feat(malware-check): add whitespace check to detect excessive spacing and invisible characters. May 20, 2025
@behnazh-w behnazh-w changed the title feat(malware-check): add whitespace check to detect excessive spacing and invisible characters. feat(malware-check): add whitespace check to detect excessive spacing and invisible characters May 20, 2025
@behnazh-w behnazh-w requested a review from art1f1c3R May 26, 2025 05:15
@AmineRaouane AmineRaouane force-pushed the white-spaces-heuristic branch from db0e35c to 6978bd7 Compare May 26, 2025 10:58
@@ -600,3 +600,5 @@ major_threshold = 20
epoch_threshold = 3
# The number of days +/- the day of publish the calendar versioning day may be.
day_publish_error = 4
# THe threshold for the number of repeated spaces in a line from the source code.
Member

Small typo, "The" instead of "THe"

@@ -381,6 +383,10 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
failed({Heuristics.CLOSER_RELEASE_JOIN_DATE.value}),
forceSetup.

% Package released with excessive whitespace in the code .
{Confidence.HIGH.value}::trigger(malware_high_confidence_4) :-
Member

Could you walk me through the rationale of why we should combine WHITE_SPACES failing with the forceSetup rule and why these rules together are a malicious indicator with HIGH confidence?

Member Author

Because the combination implies not only that harmful code may be executed during installation, but also that there appears to be an effort to hide that code, which strongly suggests malicious intent.

Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
2 participants