-
Notifications
You must be signed in to change notification settings - Fork 299
Add bloom checking endpoints #1243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds endpoints for checking bloom filters against a microservice along with corresponding tests.
- Introduces new API endpoints (/search and /search/raw) in the bloom module.
- Updates the main FastAPI router to include the bloom endpoints.
- Adds comprehensive tests to verify successful and error responses from the bloom endpoints.
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated no comments.
File | Description |
---|---|
tests/apiv2/bloom.py | Adds tests for the new bloom endpoints, including error handling scenarios. |
core/web/webapp.py | Registers the new bloom endpoints in the API router. |
core/web/apiv2/bloom.py | Implements the bloom checking endpoints interacting with the microservice. |
Files not reviewed (1)
- yeti.conf.sample: Language not supported
Comments suppressed due to low confidence (2)
tests/apiv2/bloom.py:21
- [nitpick] The test method 'testSomething' is ambiguous; consider renaming it to clearly indicate its purpose.
def testSomething(self) -> None:
tests/apiv2/bloom.py:34
- The test 'testConnectionError' does not simulate a connection error scenario; consider patching requests.post to raise a ConnectionError to properly test the error handling branch.
def testConnectionError(self) -> None:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, left some comments. Mostly just some clarifications to see if I understood how this works under the hood.
def search(httpreq: Request, request: BloomSearchRequest) -> list[BloomHit]: | ||
"""Checks the bloomcheck microservice for hits.""" | ||
try: | ||
response = requests.post( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit weird to me but if i understand it correctly, the Yeti API server makes an http call to the bloom service (which is separate), to get the status hit/no hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right! How can I make it less weird?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, left some comments. Mostly just some clarifications to see if I understood how this works under the hood.
"""Checks the bloomcheck microservice for hits.""" | ||
values = await httpreq.body() | ||
try: | ||
response = requests.post( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this call synchronous? The fastapi method is async though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
body is async(), so we have to await
it to get the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No specific feedback on the code itself. However I've two questions :)
- I was thinking bloomcheck would be processed from an analytics plugin rather than a dedicated API endpoint. Nevertheless, I do not have strong opinions about refactoring.
- I wonder how we should deal with false positive. I've these two examples in mind:
- If I'm checking a malicious sample against a "known good" bloomfilter, there's a chance it will be seen as known good. If I'm building an export of malicious hashes to enrich logs, if this sample was dropped or executed, it won't be matched resulting to a false negative.
- If I'm checking a usual operating system file against a "known bad" bloomfilter, there's a chance this non malicious file will be flagged as malicious. If I'm building an export of malicious hashes to enrich logs, this non malicious file will be flagged as malicious and will potentially raise lot's of false positive.
We'll adjust the documentation to be super clear, but given that bloom filters are there to provide a yes / no answer, and condense bigger databases into Yeti, I don't think it's yeti's job to provide more context about it (otherwise, we'd just use our database / index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
No description provided.