Hey 👋
I'd like to discuss integrating my collection of malicious PyPI packages with the OSSF dataset, and first of all whether it makes sense to consider this at all.
The dataset is based on my private hunting, which I have been doing for more than a year, with automated publishing to the repo since last summer. All published reports are manually verified. I collect clearly malicious packages as well as less clear-cut cases. They are categorized; my idea would be to start with the malicious ones and eventually think about "probably pentest", i.e. potential research or pentest activity with low harm.
Besides the question of whether you're willing to accept data from a private person (I fully understand if not), I see the following potential issues so far:
- The data is not in OSV format. This is easy to change.
- No version tracking. I'm targeting packages created for malicious purposes, not hijacked versions, so there has been no need for it so far. I am thinking about including versions in the future, but verifying each version may create additional work for me.
- Descriptions are organized by campaign, and a campaign covers one or more packages. This means a description may not exactly fit every package if the campaign is split over multiple packages or evolves (examples: [1], [2]).
- I don't always clearly mark whether the malicious action happens on installation, on import, or when a specific function is called.
- Some campaigns can be too generic (e.g. https://bad-packages.kam193.eu/pypi/campaign/GENERIC-standard-pypi-install-pentest/)
- There is currently no S3 bucket with data to download, just the public repository.
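
As a rough illustration of the OSV-format point in the list above, here is a minimal sketch of how a single report could be mapped to an OSV-style record. The function `to_osv`, its parameters, and the package name `example-bad-pkg` are hypothetical and not taken from the repository. The `id` field is left out on the assumption that the receiving dataset assigns identifiers, and `"introduced": "0"` is used to mark all versions as affected because versions are not tracked yet.

```python
import json
from datetime import datetime, timezone


def to_osv(package_name, campaign_id, description, modified=None):
    """Build an OSV-style dict for one package (hypothetical helper)."""
    # Default to the current time; always express the timestamp in UTC.
    modified = (modified or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "modified": modified.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "summary": f"Malicious code in {package_name} (PyPI)",
        # The description is shared by the whole campaign, so it may not
        # fit every package exactly; the campaign ID is kept for context.
        "details": f"Campaign {campaign_id}: {description}",
        "affected": [
            {
                "package": {"ecosystem": "PyPI", "name": package_name},
                # No version tracking: treat every version as affected.
                "ranges": [
                    {"type": "ECOSYSTEM", "events": [{"introduced": "0"}]}
                ],
            }
        ],
    }


if __name__ == "__main__":
    record = to_osv(
        "example-bad-pkg",  # hypothetical package name
        "GENERIC-standard-pypi-install-pentest",
        "Package reports basic host information on installation.",
    )
    print(json.dumps(record, indent=2))
```

Once versions are tracked, the `ranges` entry could be replaced or complemented by an explicit `versions` list per package.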
Please let me know what you think, whether it makes sense to work on integrating my collection into the OSSF dataset, and whether you see any other issues :)
Here are some current statistics: https://bad-packages.kam193.eu/pypi/