
ICL-ml4csec/security_datasets

Security Datasets

Based on Awesome-Cybersecurity-Datasets, we aim to prepare a database of security datasets of ALL kinds with PREPROCESSING available :)

Let's collectively make our lives easier when we search for data to showcase our cool methods!!

Format of documents

We will worry about the directory structure later, but the following must be included with each note:

  • The name of the dataset
  • Relevant tags for the dataset (see below)
  • A brief description of it
  • The location or instructions on how to gain access to the dataset
  • BibTeX citation for the dataset
  • Pre-processing instructions for those who do not know how to use the dataset
  • (Optional but preferred) PyTorch Dataset class for how to load the dataset - with Train/Val/Test split options
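As a starting point for the optional loader, here is a minimal sketch of a map-style dataset with Train/Val/Test split options. Everything specific in it is an assumption for illustration: the class name `CsvSecurityDataset`, the idea that the dataset is a single CSV file with a `label` column, and the 80/10/10 default split are all hypothetical and should be adapted to each dataset. The fallback import is only there so the sketch also runs where PyTorch is not installed.

```python
import csv
import random

try:
    from torch.utils.data import Dataset
except ImportError:
    # Fallback so this sketch still runs without PyTorch installed.
    Dataset = object


class CsvSecurityDataset(Dataset):
    """Hypothetical map-style dataset over one CSV file with a label column."""

    def __init__(self, csv_path, split="train", label_column="label",
                 fractions=(0.8, 0.1, 0.1), seed=0):
        if split not in ("train", "val", "test"):
            raise ValueError(f"unknown split: {split!r}")
        with open(csv_path, newline="") as handle:
            rows = list(csv.DictReader(handle))

        # Shuffle with a fixed seed so all three splits see the same ordering
        # and therefore never overlap.
        indices = list(range(len(rows)))
        random.Random(seed).shuffle(indices)

        n_train = int(fractions[0] * len(rows))
        n_val = int(fractions[1] * len(rows))
        bounds = {
            "train": (0, n_train),
            "val": (n_train, n_train + n_val),
            "test": (n_train + n_val, len(rows)),  # test takes the remainder
        }
        start, stop = bounds[split]
        self.rows = [rows[i] for i in indices[start:stop]]
        self.label_column = label_column

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        # Return (features, label); features stay as raw strings here.
        row = dict(self.rows[index])
        label = row.pop(self.label_column)
        return row, label
```

Creating `CsvSecurityDataset(path, split="val")` with the same `seed` as the training split gives a disjoint validation subset. A real loader would also convert the string features into tensors as part of the pre-processing described in the note.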

Curated list of tags!!!

IMPORTANT STANDARDIZATION: This repo will only be useful if we tag the datasets accurately for easy lookup. The features of interest are listed below; if any tag is changed, the entire repository must be updated to stay consistent.

The tags are formatted to work in Obsidian, an organisational tool that can link Markdown files based on these tags in a cool UI.

| Tags | Purpose |
|---|---|
| #network_traffic, #host | |
| #urls, #domain_names | |
| #malware, #binaries | |
| #webapps, #software | |
| #email, #fraud, #phishing, #passwords | |
| #simulated/environment, #simulated/users, #real/attackers, #real/users | |
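Since consistent tagging is what makes the notes searchable, a small check can flag tags that are not in the curated list. The sketch below is only an illustration: the function name `find_unknown_tags` is hypothetical, and the regular expression is a rough approximation of how Obsidian recognises tags (a `#` at the start of a word, followed by letters, digits, underscores, hyphens, or `/` for nested tags), not its exact rule.

```python
import re

# Tags copied from the curated list above.
ALLOWED_TAGS = {
    "network_traffic", "host",
    "urls", "domain_names",
    "malware", "binaries",
    "webapps", "software",
    "email", "fraud", "phishing", "passwords",
    "simulated/environment", "simulated/users",
    "real/attackers", "real/users",
}

# A '#' not preceded by a non-space character, then a name that starts with a
# letter or underscore. Markdown headings ("# Title") do not match because the
# '#' is followed by a space.
TAG_PATTERN = re.compile(r"(?<!\S)#([A-Za-z_][\w/-]*)")


def find_unknown_tags(note_text):
    """Return the sorted tags in a note that are not in the curated list."""
    found = set(TAG_PATTERN.findall(note_text))
    return sorted(found - ALLOWED_TAGS)
```

For example, calling `find_unknown_tags` on a note containing `#malware, #ransomware` reports `ransomware` as unknown, which signals that either the note or the curated list needs updating.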

Still a work in progress...
