feat(malware-check): add whitespace check to detect excessive spacing and invisible characters #1086

Open: wants to merge 15 commits into main

src/macaron/resources/pypi_malware_rules/obfuscation.yaml: 10 changes (10 additions, 0 deletions)
@@ -311,3 +311,13 @@ rules:
- pattern: os.writev(...)
- pattern: os.pwrite(...)
- pattern: os.pwritev(...)

  - id: obfuscation_excessive-spacing
    metadata:
      description: Detects the use of excessive spacing in code, which may indicate obfuscation or hidden code.
    message: Hidden code after excessive spacing
    languages:
      - python
    severity: WARNING
    pattern-either:
      - pattern-regex: '[ \t\n\r\f\v]{50,}[^ \t\n\r\f\v]+' # 50 is the threshold: a run of 50 or more whitespace characters followed by code is treated as obfuscation
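The rule's regex can be exercised outside Semgrep. The sketch below uses Python's `re` module as a stand-in for Semgrep's regex engine (an assumption: the two are expected to agree for this simple character-class pattern), and `find_hidden_code` is a hypothetical helper, not part of Macaron. It shows that 60 spaces before a hidden call trigger a match, 49 do not, and that newlines count toward the threshold too.

```python
import re

# Same pattern as the rule's pattern-regex; Python's `re` stands in for
# Semgrep's engine here (assumed equivalent for this pattern).
EXCESSIVE_SPACING = re.compile(r"[ \t\n\r\f\v]{50,}[^ \t\n\r\f\v]+")


def find_hidden_code(source: str) -> list[str]:
    """Hypothetical helper: return the token that follows each run of 50+ whitespace chars."""
    return [m.group().strip() for m in EXCESSIVE_SPACING.finditer(source)]


visible = 'print("Hello world!")'
hidden = visible + " " * 60 + ";__import__('os')"      # gap above the threshold
borderline = visible + " " * 49 + ";__import__('os')"  # gap one below the threshold

print(find_hidden_code(hidden))                 # [";__import__('os')"]
print(find_hidden_code(borderline))             # []
print(find_hidden_code("a" + "\n" * 50 + "b"))  # ['b'] (newlines count as spacing)
```

Because the trailing `[^ \t\n\r\f\v]+` stops at the next whitespace character, each match captures only the first token after the gap, which is enough to locate the hidden code.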
@@ -0,0 +1,25 @@
# Copyright (c) 2025 - 2025, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

"""
Running this code will not produce any malicious behavior, but code isolation measures are
in place for safety.
"""

import sys

# ensure no symbols are exported so this code cannot accidentally be used
__all__ = []
sys.exit()

def test_function():
    """
    All code to be tested will be defined inside this function, so it is all local to it. This is
    to isolate the code to be tested, as it exists to replicate the patterns present in malware
    samples.
    """
    sys.exit()

    # excessive spacing obfuscation
    def excessive_spacing_flow():
print("Hello world!")
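The sample's docstrings describe its isolation measures. A minimal sketch of why they work: importing the module raises `SystemExit` at the module-level `sys.exit()`, before `test_function` is even defined, while a static scanner such as Semgrep still reads the full source text. `SAMPLE`, `try_import`, and the module name `sample_module` are hypothetical names used only for this illustration.

```python
import importlib.util
import pathlib
import tempfile

# Hypothetical module reproducing the sample's isolation pattern.
SAMPLE = """
import sys

__all__ = []
sys.exit()

def test_function():
    sys.exit()
"""


def try_import(source: str) -> str:
    """Write `source` to a temporary module file and try to import it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "sample_module.py"
        path.write_text(source)
        spec = importlib.util.spec_from_file_location("sample_module", path)
        module = importlib.util.module_from_spec(spec)
        try:
            # Runs the module body; a module-level sys.exit() raises SystemExit here.
            spec.loader.exec_module(module)
        except SystemExit:
            return "import aborted by sys.exit()"
        return "imported"


print(try_import(SAMPLE))     # import aborted by sys.exit()
print(try_import("x = 1\n"))  # imported
```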
@@ -229,6 +229,21 @@
                    "end": 68
                }
            ]
        },
        "src.macaron.resources.pypi_malware_rules.obfuscation_excessive-spacing": {
            "message": "Hidden code after excessive spacing",
            "detections": [
                {
                    "file": "obfuscation/excessive_spacing.py",
                    "start": 24,
                    "end": 25
                },
                {
                    "file": "obfuscation/inline_imports.py",
                    "start": 27,
                    "end": 27
                }
            ]
        }
    },
    "disabled_sourcecode_rule_findings": {}
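The expected results record each detection as a start and end line. A match whose whitespace run crosses a newline spans two lines, while one confined to a single line has equal start and end, which is consistent with the two detection shapes above (24 to 25, and 27 to 27). The sketch below assumes 1-based line numbers and that `end` is the line holding the last matched character; `match_line_spans` is a hypothetical helper, not Macaron's or Semgrep's actual line-mapping code.

```python
import re

EXCESSIVE_SPACING = re.compile(r"[ \t\n\r\f\v]{50,}[^ \t\n\r\f\v]+")


def match_line_spans(source: str) -> list[tuple[int, int]]:
    """Hypothetical helper: 1-based (start_line, end_line) for each regex match."""
    spans = []
    for m in EXCESSIVE_SPACING.finditer(source):
        # Lines before the match start = newlines before m.start().
        start_line = source.count("\n", 0, m.start()) + 1
        # m.end() is exclusive, so the last matched character sits at m.end() - 1.
        end_line = source.count("\n", 0, m.end() - 1) + 1
        spans.append((start_line, end_line))
    return spans


# Trailing spaces, a newline, and indentation form one run that crosses lines.
multi_line = "def f():" + " " * 60 + "\n    print('hi')\n"
# The run of spaces stays on one line.
single_line = "print('hi')" + " " * 60 + ";__import__('os')\n"

print(match_line_spans(multi_line))   # [(1, 2)]
print(match_line_spans(single_line))  # [(1, 1)]
```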
@@ -24,7 +24,7 @@ def test_function():
     __import__('builtins')
     __import__('subprocess')
     __import__('sys')
-    __import__('os')
+    print("Hello world!") ;__import__('os')
     __import__('zlib')
     __import__('marshal')
     # these both just import builtins
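The changed line keeps `print("Hello world!")` visible while pushing `__import__('os')` out of view. Because `;` separates simple statements in Python, the hidden call is a real statement that executes, not a comment. The sketch parses such a line with the standard `ast` module; the 60-space gap is an assumed, illustrative width.

```python
import ast

# One physical line: a visible call, a long run of spaces, then a hidden call
# joined with ';' (the spacing width is illustrative).
line = 'print("Hello world!")' + " " * 60 + ";__import__('os')"

tree = ast.parse(line)
# Each top-level statement is an ast.Expr wrapping an ast.Call on a plain name.
called = [stmt.value.func.id for stmt in tree.body]

print(len(tree.body), called)  # 2 ['print', '__import__']
```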