Skip to content

Add bloom checking endpoints #1243

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 11, 2025
Merged

Add bloom checking endpoints #1243

merged 7 commits into from
Apr 11, 2025

Conversation

tomchop
Copy link
Collaborator

@tomchop tomchop commented Apr 1, 2025

No description provided.

@tomchop tomchop requested review from udgover and Copilot April 1, 2025 16:03
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds endpoints for checking bloom filters against a microservice along with corresponding tests.

  • Introduces new API endpoints (/search and /search/raw) in the bloom module.
  • Updates the main FastAPI router to include the bloom endpoints.
  • Adds comprehensive tests to verify successful and error responses from the bloom endpoints.

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated no comments.

File Description
tests/apiv2/bloom.py Adds tests for the new bloom endpoints, including error handling scenarios.
core/web/webapp.py Registers the new bloom endpoints in the API router.
core/web/apiv2/bloom.py Implements the bloom checking endpoints interacting with the microservice.
Files not reviewed (1)
  • yeti.conf.sample: Language not supported
Comments suppressed due to low confidence (2)

tests/apiv2/bloom.py:21

  • [nitpick] The test method 'testSomething' is ambiguous; consider renaming it to clearly indicate its purpose.
def testSomething(self) -> None:

tests/apiv2/bloom.py:34

  • The test 'testConnectionError' does not simulate a connection error scenario; consider patching requests.post to raise a ConnectionError to properly test the error handling branch.
def testConnectionError(self) -> None:

@tomchop tomchop requested a review from sebdraven April 1, 2025 17:37
Copy link

@jleaniz jleaniz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left some comments. Mostly just some clarifications to see if I understood how this works under the hood.

def search(httpreq: Request, request: BloomSearchRequest) -> list[BloomHit]:
"""Checks the bloomcheck microservice for hits."""
try:
response = requests.post(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit weird to me but if i understand it correctly, the Yeti API server makes an http call to the bloom service (which is separate), to get the status hit/no hit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right! How can I make it less weird?

Copy link

@jleaniz jleaniz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left some comments. Mostly just some clarifications to see if I understood how this works under the hood.

"""Checks the bloomcheck microservice for hits."""
values = await httpreq.body()
try:
response = requests.post(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this call synchronous? The fastapi method is async though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

body is async(), so we have to await it to get the result.

@tomchop tomchop requested a review from jleaniz April 9, 2025 22:05
Copy link
Collaborator

@udgover udgover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No specific feedback on the code itself. However I've two questions :)

  • I was thinking bloomcheck would be processed from an analytics plugin rather than a dedicated API endpoint. Nevertheless, I do not have strong opinions about refactoring.
  • I wonder how we should deal with false positive. I've these two examples in mind:
    • If I'm checking a malicious sample against a "known good" bloomfilter, there's a chance it will be seen as known good. If I'm building an export of malicious hashes to enrich logs, if this sample was dropped or executed, it won't be matched resulting to a false negative.
    • If I'm checking a usual operating system file against a "known bad" bloomfilter, there's a chance this non malicious file will be flagged as malicious. If I'm building an export of malicious hashes to enrich logs, this non malicious file will be flagged as malicious and will potentially raise lot's of false positive.

@tomchop
Copy link
Collaborator Author

tomchop commented Apr 10, 2025

We'll adjust the documentation to be super clear, but given that bloom filters are there to provide a yes / no answer, and condense bigger databases into Yeti, I don't think it's yeti's job to provide more context about it (otherwise, we'd just use our database / index)

Copy link
Collaborator

@udgover udgover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tomchop tomchop merged commit 3cf07d5 into main Apr 11, 2025
2 checks passed
@tomchop tomchop deleted the bloom branch April 11, 2025 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants