@opbot-xd opbot-xd commented Dec 22, 2025

Description

Implements auto-extraction of FireHol lists to enhance IOC classification and improve threat intelligence.

This feature enables GreedyBear to:

  • Track unique IPs: Identify IOCs that aren't widespread mass scanners
  • Classify threat sources: Categorize IPs based on FireHol list categories (brute force, spam, etc.)
  • Enrich feeds: Provide users with more context about the nature of threats

Changes

  • Created FireHolList model to store IPs from FireHol blocklists
  • Added firehol_categories field to IOC model for classification metadata
  • Implemented FireHolCron job to fetch 4 key lists weekly:
    • blocklist_de: IP addresses involved in attacks
    • greensnow: Known scanning IPs
    • bruteforceblocker: Brute force attack sources
    • dshield: DShield top attackers (CIDR blocks)
  • Added extract_firehol_lists Celery task with weekly scheduling (Sundays 4:15 AM)
  • Registered FireHolList in Django admin for data management
  • Exposed firehol_categories in Feeds API responses
  • Added comprehensive unit tests with 100% coverage
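
FireHol list files are plain text: a header of `#` comment lines, then one entry per line. `.ipset` files hold single addresses, while `.netset` files such as dshield hold CIDR blocks. Below is a minimal parsing sketch that assumes that layout. `parse_firehol_list` is a hypothetical helper, not the code in this PR:

```python
import ipaddress
from typing import List, Union

Entry = Union[
    ipaddress.IPv4Address, ipaddress.IPv6Address,
    ipaddress.IPv4Network, ipaddress.IPv6Network,
]


def parse_firehol_list(text: str) -> List[Entry]:
    """Parse the body of a FireHol .ipset/.netset file into address objects.

    Lines starting with '#' are header comments; every other non-empty line
    is assumed to hold one IP address or one CIDR block.
    """
    entries: List[Entry] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        try:
            if "/" in line:
                # CIDR block, as found in .netset files such as dshield
                entries.append(ipaddress.ip_network(line, strict=False))
            else:
                entries.append(ipaddress.ip_address(line))
        except ValueError:
            continue  # skip malformed lines rather than aborting the whole list
    return entries
```

For example, `parse_firehol_list("# header\n1.2.3.4\n10.0.0.0/8\n")` returns `[IPv4Address('1.2.3.4'), IPv4Network('10.0.0.0/8')]`.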

Related issues

Closes #548

Type of change

  • New feature (non-breaking change which adds functionality).

Checklist

  • I have read and understood the rules about how to Contribute to this project.
  • The pull request is for the branch develop.
  • I have added documentation of the new features.
  • Linters (Black, Flake, Isort) gave 0 errors.
  • I have added tests for the feature/bug I solved. All the tests (new and old ones) gave 0 errors.
  • If changes were made to an existing model/serializer/view, the docs were updated and regenerated.
  • If the GUI has been modified:
    • I have provided a screenshot of the result in the PR.
    • I have created new frontend tests for the new component or updated existing ones.

- Add Celery beat schedule for weekly FireHol extraction
- Register FireHolList model in Django admin
- Expose firehol_categories field in Feeds API responses
- Add firehol_categories to IOC admin list display
- Improve error handling with specific exception types
- Simplify verbose comments in firehol.py
- Merge conflicting migrations from develop branch
- Update serializer tests for new firehol_categories field

All 267 tests passing
@opbot-xd
Contributor Author

This PR follows the existing MassScanners pattern for consistency with the codebase architecture.

@opbot-xd opbot-xd marked this pull request as ready for review December 23, 2025 21:23
@regulartim regulartim requested a review from mlodic December 24, 2025 08:29
Member

@mlodic mlodic left a comment

thanks for the PR!

                self._update_ioc(line, source)

        except requests.RequestException as e:
            self.log.error(f"Network error fetching {source}: {e}")
Member

The RequestException handler only needs to wrap the requests.get call; there's no need to wrap all the logic.

Contributor Author

@opbot-xd opbot-xd Dec 24, 2025

Will move the RequestException to only wrap the network call.

        self.log.info(f"Processing {source} from {url}")
        try:
            response = requests.get(url, timeout=60)
            if response.status_code != 200:
Member

A response.raise_for_status() call is enough, and more comprehensive than this check.

Contributor Author

Makes sense! Will use raise_for_status() for cleaner error handling.
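
The two review suggestions combine naturally. Keep only the network call inside `try`, and let `raise_for_status()` turn 4xx/5xx responses into `requests.HTTPError`. Because `requests.HTTPError` is a subclass of `requests.RequestException`, the same handler catches it. Below is a sketch under those assumptions. `fetch_firehol_list` and its injectable `get` parameter are hypothetical; `get` exists only so the function can be exercised without network access:

```python
import logging

import requests

logger = logging.getLogger(__name__)


def fetch_firehol_list(source, url, log=logger, get=requests.get):
    """Download one FireHol list and return its text, or None on failure.

    `get` defaults to requests.get; it is a parameter only so the function
    can be tested without network access.
    """
    log.info(f"Processing {source} from {url}")
    try:
        # Only the network call is guarded by the RequestException handler
        response = get(url, timeout=60)
        # Raises requests.HTTPError (a RequestException subclass) on 4xx/5xx
        response.raise_for_status()
    except requests.RequestException as e:
        log.error(f"Network error fetching {source}: {e}")
        return None
    return response.text
```

Parsing the returned text then happens outside the `try` block. A parsing bug therefore surfaces as its own exception instead of being logged as a network error.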

        except Exception as e:
            self.log.exception(f"Unexpected error processing {source}: {e}")

    def _update_ioc(self, ip_address, source):
Member

I wouldn't update the already existing IOCs, because the information extracted from FireHol should be considered "new" intelligence. Instead, I would touch the place where a new IOC is usually added by the existing routines. There I'd expect a lookup of the newly found IP address in the FireHolList, and ONLY for recently added IP addresses (the "added" parameter should be queried). If the IP is present, populate firehol_categories.

Member

Considering this, we should also add a routine at the end of this cron that deletes old FireHol entries, because they are no longer useful in the database. The database could grow really large, so it is important to keep it clean.
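
At its core, the cleanup computes a cutoff and drops anything added before it. Below is a minimal pure-Python sketch. It assumes each stored entry records when it was added, and it uses the 30-day retention window from the follow-up commit. `FireHolEntry` and its field names are hypothetical stand-ins for the real model:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class FireHolEntry:
    """Hypothetical stand-in for a FireHolList row; field names are assumptions."""

    ip_address: str
    source: str
    added: datetime


def cleanup_old_entries(
    entries: List[FireHolEntry], now: datetime, retention_days: int = 30
) -> List[FireHolEntry]:
    """Return the entries that are still inside the retention window.

    With the Django ORM the same cutoff would usually drive one bulk delete,
    e.g. FireHolList.objects.filter(added__lt=cutoff).delete(), assuming the
    model stores its insertion time in an `added` field.
    """
    cutoff = now - timedelta(days=retention_days)
    # Anything added strictly before the cutoff is considered stale
    return [entry for entry in entries if entry.added >= cutoff]
```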

Contributor Author

Makes sense to only enrich newly discovered IOCs rather than retroactively updating all existing ones. Will:

  1. Remove immediate IOC updates during extraction
  2. Add a batch enrichment step for recently added IOCs
  3. Add a cleanup routine for old FireHolList entries

Thanks for the guidance!

- Extract base_path variable for FireHol URLs
- Narrow RequestException scope to only wrap network call
- Use raise_for_status() for cleaner HTTP error handling
- Only enrich recently added IOCs (within 24h) instead of all existing ones
- Add cleanup routine to delete FireHolList entries older than 30 days
- Update tests to match new enrichment behavior

All 267 tests passing
@opbot-xd opbot-xd requested a review from mlodic December 24, 2025 13:48
Member

@mlodic mlodic left a comment

great progress!

        self._cleanup_old_entries()

    def _enrich_recent_iocs(self):
        """
Member

My idea was to use the FireHol data only for brand-new data. To do that, we need to touch the "iocs_from_hits" method, where we actually save newly found IOCs. This is important to keep the data consistent.
That way, once a version is deployed, new IOCs will be populated with the new data.
Old IOCs, even those collected in the last day, should not be touched, because the information collected from FireHol should be considered fresh only at the time of extraction. So this method must be moved to where the IOC objects are populated.
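
The point is that enrichment happens exactly once, when the IOC is first created, and never retroactively. Below is a simplified sketch of that control flow. It assumes an in-memory dict stands in for the IOC table, and that `firehol_index` maps an IP to the list names it appeared in at the last extraction. `iocs_from_hits_sketch` and the `hits` counter are hypothetical, and only exact IP matches are handled, for brevity:

```python
from typing import Dict, List, Set


def iocs_from_hits_sketch(
    hit_ips: List[str],
    iocs: Dict[str, dict],
    firehol_index: Dict[str, Set[str]],
) -> Dict[str, dict]:
    """Hypothetical, heavily simplified stand-in for iocs_from_hits().

    FireHol categories are attached only when an IOC is created here;
    IOCs that already exist keep whatever categories they were created with.
    """
    for ip in hit_ips:
        ioc = iocs.get(ip)
        if ioc is None:
            # New IOC: enrich it with the FireHol data as it stands right now
            ioc = {
                "name": ip,
                "firehol_categories": sorted(firehol_index.get(ip, set())),
                "hits": 0,
            }
            iocs[ip] = ioc
        # New or existing: other bookkeeping (here a hit counter) still runs,
        # but firehol_categories of an existing IOC is never recomputed
        ioc["hits"] += 1
    return iocs
```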

Contributor Author

You're absolutely right, enriching recent IOCs retroactively could lead to inconsistencies.

I've moved the FireHol enrichment logic to the iocs_from_hits function, where IOC objects are initially created, so FireHol categories are now populated at IOC creation time.

The _enrich_recent_iocs method has been removed from the FireHolCron class as it's no longer needed.

Changes made:

"blocklist_de": f"{base_path}/blocklist_de.ipset",
"greensnow": f"{base_path}/greensnow.ipset",
"bruteforceblocker": f"{base_path}/bruteforceblocker.ipset",
"dshield": f"{base_path}/dshield.netset",
Member

Ah, one thing about this: this is a netset, so there won't be any match with the current logic. For netsets, you should use the ipaddress library to check whether an IP address is inside an IP network.
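
The failure mode is easy to reproduce. An exact string (or field-equality) comparison never matches an address against a CIDR entry, whereas an `ipaddress` membership test does. Below is a minimal sketch; `matches_entry` is a hypothetical helper:

```python
import ipaddress


def matches_entry(ip: str, entry: str) -> bool:
    """True if `ip` equals a single-address entry or lies inside a CIDR entry."""
    address = ipaddress.ip_address(ip)
    # ip_network() also accepts a bare address, treating it as a /32 (IPv4) or /128 (IPv6)
    network = ipaddress.ip_network(entry, strict=False)
    # Membership test; evaluates to False when the IP versions differ
    return address in network
```

For example, `"1.2.3.4" == "1.2.3.0/24"` is `False`, while `matches_entry("1.2.3.4", "1.2.3.0/24")` is `True`.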

Contributor Author

Great catch! The dshield.netset contains network ranges, not individual IPs, so the current exact match logic won't work.

I'll update the enrichment logic to use the ipaddress library to check network membership. Specifically, I need to:

  1. When looking up FireHol categories in iocs_from_hits, check if the IP address is contained within any of the stored network ranges
  2. Use ipaddress.ip_address() and ipaddress.ip_network() to perform proper CIDR matching

I'll push an update shortly that handles both:

  • Exact IP matches (for .ipset files like blocklist_de, greensnow, bruteforceblocker)
  • Network range membership (for .netset files like dshield)
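
One way to cover both cases in a single lookup is to keep a set-backed dictionary for exact `.ipset` entries and a list of networks for `.netset` entries. Below is a sketch under those assumptions. `FireHolIndex` is a hypothetical class, not the PR's implementation:

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List, Set, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class FireHolIndex:
    """Hypothetical lookup helper: exact matches for .ipset entries,
    CIDR membership checks for .netset entries (e.g. dshield)."""

    def __init__(self) -> None:
        self._exact: Dict[str, Set[str]] = defaultdict(set)
        self._networks: List[Tuple[Network, str]] = []

    def add(self, entry: str, source: str) -> None:
        """Register one line of a FireHol list under its source name."""
        if "/" in entry:
            self._networks.append((ipaddress.ip_network(entry, strict=False), source))
        else:
            # Normalise the textual form so equivalent addresses share one key
            self._exact[str(ipaddress.ip_address(entry))].add(source)

    def categories(self, ip: str) -> List[str]:
        """Return the sorted names of every list that contains `ip`."""
        address = ipaddress.ip_address(ip)
        found = set(self._exact.get(str(address), set()))
        # Linear scan over networks; `in` is False when IP versions differ
        found.update(source for network, source in self._networks if address in network)
        return sorted(found)
```

A linear scan over the stored networks is acceptable while the netsets stay small. A much larger netset would call for a smarter structure.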

Thanks for pointing this out!

- Move FireHol category enrichment from separate job step to iocs_from_hits()
  where IOCs are created, ensuring only fresh data is applied at extraction time
- Add support for CIDR network ranges (netsets) using ipaddress library
- Remove _enrich_recent_iocs() method as enrichment now happens at IOC creation
- Update enrichment logic to handle both exact IP matches (.ipset) and
  network range membership (.netset) for proper dshield.netset support
- Update test to reflect new behavior where FireHolCron only downloads data,
  enrichment happens automatically during IOC creation
@opbot-xd opbot-xd requested a review from mlodic December 29, 2025 19:47