This project provides tools to scrape, search, and display information about the cyber threat groups listed on the MITRE ATT&CK website.
mitre-attack-scraper/
│
├── .git/ # Git repository files
├── .github/ # GitHub-specific files
│ └── workflows/ # GitHub Actions workflows
│ └── ci.yml # Continuous integration setup
├── .gitignore # Specifies intentionally untracked files to ignore
├── LICENSE # The license for the project
├── README.md # Project documentation
├── setup.py # Setup script for the project installation
├── requirements.txt # Python package dependencies
│
├── src/ # Source code directory
│ ├── __init__.py # Makes src a Python package
│ ├── scraper.py # Scraper implementation
│ ├── search.py # Search functionality
│ └── utils.py # Utility functions
│
├── tests/ # Unit tests directory
│ ├── __init__.py # Makes tests a Python package
│ ├── test_scraper.py # Test cases for the scraper
│ └── test_search.py # Test cases for the search functionality
│
├── data/ # Data files
│ ├── groups_info/ # Directory to store data.json
│ └── groups/ # Directory to store individual group data
├── docs/ # Documentation files
│ ├── CODE_OF_CONDUCT.md # Code of Conduct
│ ├── CONTRIBUTING.md # Contribution guidelines
│ ├── DEVELOPER_GUIDE.md # Developer guide
│ ├── INSTALLATION_GUIDE.md # Installation guide
│ ├── RELEASE_NOTES.md # Release notes
│ └── USER_GUIDE.md # User guide
└── notebooks/ # Jupyter notebooks for exploration
For detailed installation instructions, please see the Installation Guide.
The scraper.py script scrapes the MITRE ATT&CK website and saves the group data to a JSON file.
python src/scraper.py --output data/groups_info/data.json
--output: Specifies the file path to save the scraped data to. Default is data/groups_info/data.json.
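This README does not spell out how scraper.py parses the website. As a minimal sketch, assuming the ATT&CK groups listing renders as an HTML table whose data rows contain exactly four well-formed cells in the order ID, Name, Associated Groups, Description, the rows could be collected with Python's standard-library html.parser and saved as a JSON array of records in the format described in this README. GroupsTableParser, parse_groups, and save_groups are hypothetical names, and fetching the page itself is left out.

```python
import json
from html.parser import HTMLParser

# ASSUMPTION: column order of the groups table on the ATT&CK website.
COLUMNS = ["ID", "Name", "Associated Groups", "Description"]


class GroupsTableParser(HTMLParser):
    """Hypothetical parser that collects the text of each <td> cell per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, as dicts keyed by COLUMNS
        self._row = None    # cell strings of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join fragments (e.g. text inside <a> links) and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Header rows use <th>, so they have no <td> cells and are skipped here.
            if len(self._row) == len(COLUMNS):
                self.rows.append(dict(zip(COLUMNS, self._row)))
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_groups(html):
    """Return a list of group dicts extracted from the page HTML."""
    parser = GroupsTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def save_groups(groups, output_path):
    """Write the group list to output_path as JSON (e.g. data/groups_info/data.json)."""
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(groups, f, indent=4)
```

If the real page nests tables or omits closing </td> tags, this length-based row filter would need adjusting.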
The search.py script searches the group data by ID, name, or associated group. It also caches search results in data/groups/{GROUPID}.
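A minimal sketch of the three lookups, assuming data.json holds a list of objects with the fields described in the data format section of this README, and that matching is case-insensitive (the usage examples pass zirconium and g0128 in lower case). It also assumes an associate search means an exact match against one entry of the comma-separated Associated Groups field. The function names are hypothetical and are not the actual search.py API.

```python
def find_by_id(groups, group_id):
    """Return the group whose ID matches group_id (case-insensitive), or None."""
    target = group_id.strip().upper()
    for group in groups:
        if group.get("ID", "").upper() == target:
            return group
    return None


def find_by_name(groups, name):
    """Return the first group whose Name matches name (case-insensitive), or None."""
    target = name.strip().casefold()
    for group in groups:
        if group.get("Name", "").casefold() == target:
            return group
    return None


def find_by_associate(groups, associate):
    """Return every group whose Associated Groups field lists associate.

    ASSUMPTION: the field is a comma-separated string and matching is exact
    per entry, ignoring case and surrounding whitespace.
    """
    target = associate.strip().casefold()
    matches = []
    for group in groups:
        aliases = [a.strip().casefold() for a in group.get("Associated Groups", "").split(",")]
        if target in aliases:
            matches.append(group)
    return matches
```

ID and name lookups return a single record because IDs are unique, while an associated-group name may be listed under several groups, so that lookup returns a list.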
- Search by name:
  python src/search.py --name zirconium --input data/groups_info/data.json
- Search by ID:
  python src/search.py --id g0128 --input data/groups_info/data.json
- Search by associated group:
  python src/search.py --associate "Violet Typhoon" --input data/groups_info/data.json
- --name: Search for a group by its name.
- --id: Search for a group by its ID.
- --associate: Search for all groups mentioning a specific associated group.
- --input: Specifies the file path to read the group data from. Default is data/groups_info/data.json.
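The options for search.py map naturally onto Python's argparse module. A minimal sketch, assuming --name, --id, and --associate are mutually exclusive and exactly one of them is required per run; build_parser is a hypothetical name and the project's actual argument handling may differ.

```python
import argparse


def build_parser():
    """Hypothetical argument parser mirroring the documented search.py options."""
    parser = argparse.ArgumentParser(description="Search MITRE ATT&CK group data.")
    # ASSUMPTION: exactly one search mode is chosen per invocation.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--name", help="Search for a group by its name.")
    mode.add_argument("--id", help="Search for a group by its ID.")
    mode.add_argument("--associate", help="Search for all groups mentioning a specific associated group.")
    parser.add_argument(
        "--input",
        default="data/groups_info/data.json",
        help="Path of the group data file (default: %(default)s).",
    )
    return parser
```

For example, build_parser().parse_args(["--id", "g0128"]) yields a namespace with id set to "g0128", name and associate set to None, and input falling back to data/groups_info/data.json. Passing two modes at once makes argparse exit with a usage error.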
The utils.py script contains utility functions shared between the scraper and search scripts.
- load_groups_data(filepath): Loads group data from a JSON file.
- display_group_info(group): Displays information about a specific group in a formatted manner.
- cache_group_data(group, cache_dir): Caches group data in a specified directory.
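The three utils.py signatures come from this README; the bodies below are a minimal sketch, not the project's implementation. The sketch assumes data.json contains a JSON array of group objects and that cache_group_data writes one file per group named <ID>.json inside cache_dir (for example data/groups/G0092.json); that naming, the FIELDS constant, and the format_group_info helper are hypothetical.

```python
import json
import os

# Field order used when displaying a group; follows the record layout in this README.
FIELDS = ("ID", "Name", "Associated Groups", "Description")


def load_groups_data(filepath):
    """Load group data from a JSON file (assumed to hold a list of group objects)."""
    with open(filepath, encoding="utf-8") as f:
        return json.load(f)


def format_group_info(group):
    """Hypothetical helper: render one group as 'Field: value' lines."""
    return "\n".join(f"{key}: {group.get(key, '')}" for key in FIELDS)


def display_group_info(group):
    """Print one group's information in a readable layout."""
    print(format_group_info(group))


def cache_group_data(group, cache_dir):
    """Write group to <cache_dir>/<ID>.json and return the file path.

    ASSUMPTION: one file per group ID; the real cache layout may differ.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{group['ID']}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(group, f, indent=4)
    return path
```

Splitting formatting from printing keeps display_group_info easy to test, since the formatted text can be checked without capturing stdout.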
Detailed documentation can be found in the docs directory.
The data is stored in JSON format, with each group having the following structure:
{
"ID": "G0092",
"Name": "TA505",
"Associated Groups": "Hive0065, Spandex Tempest, CHIMBORAZO",
"Description": "TA505 is a cyber criminal group that has been active since at least 2014. TA505 is known for frequently changing malware, driving global trends in criminal malware distribution, and ransomware campaigns involving Clop."
}
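Because Associated Groups is stored as a single comma-separated string rather than a JSON array, code that consumes data.json has to split it. A minimal sketch, assuming every record carries the four keys shown in the structure above; the Group class and its from_record method are hypothetical and not part of the project.

```python
from dataclasses import dataclass, field
from typing import List

# Keys every record in data.json is assumed to carry.
REQUIRED_KEYS = ("ID", "Name", "Associated Groups", "Description")


@dataclass
class Group:
    """Hypothetical typed view of one group record."""

    id: str
    name: str
    associated_groups: List[str] = field(default_factory=list)
    description: str = ""

    @classmethod
    def from_record(cls, record):
        """Build a Group from a raw JSON dict, raising ValueError if keys are missing."""
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"record is missing keys: {missing}")
        # "Associated Groups" is one comma-separated string; split it and drop
        # empty entries so that "" becomes [] rather than [""].
        aliases = [a.strip() for a in record["Associated Groups"].split(",") if a.strip()]
        return cls(
            id=record["ID"],
            name=record["Name"],
            associated_groups=aliases,
            description=record["Description"],
        )
```

Applied to the TA505 record above, associated_groups becomes ["Hive0065", "Spandex Tempest", "CHIMBORAZO"].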
Contributions are welcome! Please read our Developer Guide, along with CONTRIBUTING.md and CODE_OF_CONDUCT.md in the docs directory, for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.
This project uses the MITRE ATT&CK framework. For more information, visit the MITRE ATT&CK website.