Add auto extraction of FireHol lists. Closes #548 #642
base: develop
- Add Celery beat schedule for weekly FireHol extraction
- Register FireHolList model in Django admin
- Expose firehol_categories field in Feeds API responses
- Add firehol_categories to IOC admin list display
- Improve error handling with specific exception types
- Simplify verbose comments in firehol.py
- Merge conflicting migrations from develop branch
- Update serializer tests for new firehol_categories field

All 267 tests passing
This PR follows the existing …
mlodic left a comment
thanks for the PR!
greedybear/cronjobs/firehol.py (outdated)
self._update_ioc(line, source)

except requests.RequestException as e:
    self.log.error(f"Network error fetching {source}: {e}")
The RequestException handler only needs to wrap requests.get; there's no need to wrap all the logic.
Will move the RequestException to only wrap the network call.
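Once only requests.get sits inside the try block, the remaining work is plain text parsing, which cannot raise a RequestException and so needs no network error handling. A minimal sketch of that parsing step, assuming the downloaded body follows the usual FireHol layout (# comment lines, then one IP or CIDR entry per line); parse_firehol_entries is a hypothetical helper, not the PR's code:

```python
import ipaddress


def parse_firehol_entries(text: str) -> list[str]:
    """Return the IP/CIDR entries from a FireHol list body.

    Hypothetical helper: skips blank lines and '#' comments and
    drops any line that is not a valid IP address or network.
    """
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # header comments and blank lines carry no entries
        try:
            # strict=False accepts both single IPs (as /32 or /128) and CIDR blocks
            ipaddress.ip_network(line, strict=False)
        except ValueError:
            continue  # skip malformed lines instead of aborting the whole list
        entries.append(line)
    return entries
```

For example, a body containing "# header", "1.2.3.4", "10.0.0.0/8" and one malformed line yields ["1.2.3.4", "10.0.0.0/8"].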
greedybear/cronjobs/firehol.py (outdated)
self.log.info(f"Processing {source} from {url}")
try:
    response = requests.get(url, timeout=60)
    if response.status_code != 200:
A response.raise_for_status() call is enough, and more comprehensive than this check.
Makes sense! Will use raise_for_status() for cleaner error handling.
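Combining both review points, the download step might look like the sketch below: only the request itself is guarded, and raise_for_status() turns any 4xx/5xx response into a requests.HTTPError, which subclasses requests.RequestException and is therefore caught by the same handler. The function name fetch_firehol_list and its return-None-on-failure contract are assumptions for illustration:

```python
import logging
from typing import Optional

import requests

log = logging.getLogger(__name__)


def fetch_firehol_list(source: str, url: str, timeout: int = 60) -> Optional[str]:
    """Download one FireHol list; return its body, or None on any HTTP/network error."""
    try:
        response = requests.get(url, timeout=timeout)
        # Raises requests.HTTPError for any 4xx/5xx status code
        response.raise_for_status()
    except requests.RequestException as e:
        # HTTPError, ConnectionError and Timeout all subclass RequestException
        log.error(f"Network error fetching {source}: {e}")
        return None
    # Anything done with the body happens outside the try block
    return response.text
```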
greedybear/cronjobs/firehol.py (outdated)
except Exception as e:
    self.log.exception(f"Unexpected error processing {source}: {e}")

def _update_ioc(self, ip_address, source):
I wouldn't update the already existing IOCs, because the information extracted from FireHol should be considered "new" intelligence. Instead, I would touch the place where a new IOC is normally added by the existing routines. There I expect a lookup of the newly found IP address in the FireHolList, and ONLY for recently added IP addresses (the "added" field should be queried). If it's present, then populate firehol_categories.
Considering this, we should also add a routine at the end of this cron job that deletes old FireHol entries, since they are no longer useful in the database. That table could become really big, so it is important to keep it clean.
Makes sense to only enrich newly discovered IOCs rather than retroactively updating all existing ones. Will:
- Remove immediate IOC updates during extraction
- Add a batch enrichment step for recently added IOCs
- Add cleanup routine for old FireHolList entries
Thanks for the guidance!
- Extract base_path variable for FireHol URLs
- Narrow RequestException scope to only wrap network call
- Use raise_for_status() for cleaner HTTP error handling
- Only enrich recently added IOCs (within 24h) instead of all existing ones
- Add cleanup routine to delete FireHolList entries older than 30 days
- Update tests to match new enrichment behavior

All 267 tests passing
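In Django the cleanup step would most likely be a single bulk query such as FireHolList.objects.filter(added__lt=cutoff).delete(), assuming the model's timestamp field is named added as the review suggests. The same cutoff rule can be sketched in plain Python so it can be checked without a database; FireHolEntry and prune_old_entries are hypothetical stand-ins for the model and the cron step, using the 30-day window from the commit message above:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # retention window from the commit message


@dataclass
class FireHolEntry:
    ip_address: str
    source: str
    added: datetime  # when the entry was extracted


def prune_old_entries(entries, now=None, retention=RETENTION):
    """Keep only entries added within the retention window.

    Mirrors filter(added__lt=cutoff).delete(): entries strictly older
    than the cutoff are dropped, entries exactly at the cutoff are kept.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - retention
    return [entry for entry in entries if entry.added >= cutoff]
```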
mlodic left a comment
great progress!
greedybear/cronjobs/firehol.py (outdated)
self._cleanup_old_entries()

def _enrich_recent_iocs(self):
    """
My idea was to leverage the FireHol data only for fresh new data: to do that, we need to touch the iocs_from_hits method, where newly found IOCs are actually saved. This is important to keep the data consistent.
That way, once a version is deployed, new IOCs will be populated with the new data.
The old ones, even if collected in the last day, should not be touched, because the information collected from FireHol should be considered fresh only at the time of extraction. So this method must be moved to where the IOC objects are populated.
You're absolutely right: enriching recent IOCs retroactively could lead to inconsistencies.
I've moved the FireHol enrichment logic to the iocs_from_hits function, where IOC objects are initially created, so FireHol categories are now populated at IOC creation time.
The _enrich_recent_iocs method has been removed from the FireHolCron class as it's no longer needed.
Changes made:
- greedybear/cronjobs/extraction/utils.py: Added FireHol enrichment at IOC creation
- greedybear/cronjobs/firehol.py: Removed _enrich_recent_iocs method
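The rule settled on here is that FireHol categories are written only when an IOC is first created, and existing IOCs are never re-enriched. A minimal in-memory sketch of that rule, assuming a dict keyed by IP address in place of the IOC table and a precomputed IP-to-categories mapping; save_ioc_from_hit, the IOC dataclass and its attack_count field are hypothetical and do not reflect GreedyBear's actual iocs_from_hits code:

```python
from dataclasses import dataclass, field


@dataclass
class IOC:
    name: str
    attack_count: int = 1  # hypothetical stand-in for fields updated on every hit
    firehol_categories: list = field(default_factory=list)


def save_ioc_from_hit(store, ip, firehol_lookup):
    """Create or update an IOC; FireHol data is applied only on creation."""
    ioc = store.get(ip)
    if ioc is not None:
        # Existing IOC: record the new activity but never touch firehol_categories
        ioc.attack_count += 1
        return ioc
    # New IOC: enrich with whatever FireHol reports for this IP right now
    ioc = IOC(name=ip, firehol_categories=sorted(firehol_lookup.get(ip, ())))
    store[ip] = ioc
    return ioc
```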
"blocklist_de": f"{base_path}/blocklist_de.ipset",
"greensnow": f"{base_path}/greensnow.ipset",
"bruteforceblocker": f"{base_path}/bruteforceblocker.ipset",
"dshield": f"{base_path}/dshield.netset",
Ah, one more thing about this: this is a netset, so there won't be any match with the current logic. For netsets, you should use the ipaddress library to check whether an IP address is inside an IP network.
Great catch! The dshield.netset contains network ranges, not individual IPs, so the current exact match logic won't work.
I'll update the enrichment logic to use the ipaddress library to check network membership. Specifically, I need to:
- When looking up FireHol categories in iocs_from_hits, check whether the IP address is contained within any of the stored network ranges
- Use ipaddress.ip_address() and ipaddress.ip_network() to perform proper CIDR matching
I'll push an update shortly that handles both:
- Exact IP matches (for .ipset files like blocklist_de, greensnow, bruteforceblocker)
- Network range membership (for .netset files like dshield)
Thanks for pointing this out!
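The two matching modes listed above map directly onto the standard-library ipaddress module: exact lookups for .ipset sources and network membership for .netset sources. A minimal sketch, assuming the FireHol entries have already been loaded into plain dicts keyed by source name; firehol_categories_for and that dict layout are hypothetical, since the real code would query FireHolList instead:

```python
import ipaddress


def firehol_categories_for(ip, ipset_entries, netset_entries):
    """Return the sorted FireHol sources that list `ip`.

    ipset_entries:  {source: set of IP strings}    -> exact match (.ipset)
    netset_entries: {source: list of CIDR strings} -> membership (.netset)
    """
    addr = ipaddress.ip_address(ip)  # raises ValueError for invalid input
    categories = set()
    for source, ips in ipset_entries.items():
        if ip in ips:  # exact string match against single addresses
            categories.add(source)
    for source, cidrs in netset_entries.items():
        for cidr in cidrs:
            network = ipaddress.ip_network(cidr, strict=False)
            # Membership is simply False (not an error) when IP versions differ
            if addr in network:
                categories.add(source)
                break
    return sorted(categories)
```

Parsing every CIDR on each lookup keeps the sketch short; with many IOCs per run, parsing the netset networks once and reusing them would be cheaper.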
- Move FireHol category enrichment from a separate job step to iocs_from_hits(), where IOCs are created, ensuring only fresh data is applied at extraction time
- Add support for CIDR network ranges (netsets) using the ipaddress library
- Remove _enrich_recent_iocs() method, as enrichment now happens at IOC creation
- Update enrichment logic to handle both exact IP matches (.ipset) and network range membership (.netset) for proper dshield.netset support
- Update test to reflect new behavior where FireHolCron only downloads data and enrichment happens automatically during IOC creation
Description
Implements auto-extraction of FireHol lists to enhance IOC classification and improve threat intelligence.
This feature enables GreedyBear to:
Changes
- … firehol_categories field to IOC model for classification metadata
- blocklist_de: IP addresses involved in attacks
- greensnow: Known scanning IPs
- bruteforceblocker: Brute force attack sources
- dshield: DShield top attackers (CIDR blocks)
- … firehol_categories in Feeds API responses

Related issues
Closes #548
Type of change
Checklist
- … develop.
- … (Black, Flake, Isort) gave 0 errors.