Skip to content

[searchfox_api] Ignore the search results that are scattered in too many files #4854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

suhaibmujahid
Copy link
Member

Will mitigate, but not fix, #4842

@kitsiosk
Copy link

Hi! 🤖 The test below is automatically generated and serves as a regression test for this PR because it:

  • passes, and
  • fails in the codebase before the PR.
def test_search_ignores_excessive_paths():
    import logging
    from unittest.mock import patch, MagicMock
    from bugbug.code_search.searchfox_api import search

    # Set up logging to capture warnings
    logger = logging.getLogger('bugbug.code_search.searchfox_api')
    logger.setLevel(logging.WARNING)
    log_capture = MagicMock()
    logger.addHandler(logging.StreamHandler(log_capture))

    # Mock the utils.get_session().get() to return a mock response
    mock_response = MagicMock()
    mock_response.text = """
    var results = {
        "normal": {
            "Definitions (symbol_name)": [
                {"path": "file1.js", "lines": [{"line": "function symbol_name() {}"}]},
                {"path": "file2.js", "lines": [{"line": "function symbol_name() {}"}]},
                {"path": "file3.js", "lines": [{"line": "function symbol_name() {}"}]},
                {"path": "file4.js", "lines": [{"line": "function symbol_name() {}"}]},
                {"path": "file5.js", "lines": [{"line": "function symbol_name() {}"}]},
                {"path": "file6.js", "lines": [{"line": "function symbol_name() {}"}]},
                {"path": "file7.js", "lines": [{"line": "function symbol_name() {}"}]},
                {"path": "file8.js", "lines": [{"line": "function symbol_name() {}"}]},
                {"path": "file9.js", "lines": [{"line": "function symbol_name() {}"}]},
                {"path": "file10.js", "lines": [{"line": "function symbol_name() {}"}]},
                {"path": "file11.js", "lines": [{"line": "function symbol_name() {}"}]}
            ]
        }
    };
    """
    mock_response.raise_for_status = MagicMock()

    with patch('bugbug.utils.get_session', return_value=MagicMock(get=MagicMock(return_value=mock_response))):
        # Call the search function with a symbol name that would trigger too many paths
        result = search('dummy_commit_hash', 'symbol_name')

    # Assert that the result is empty due to too many paths
    assert result == []

    # Assert that a warning was logged
    log_capture.write.assert_any_call("Too many paths found for symbol symbol_name: 11 paths. Skipping this symbol.\n")

If you find this regression test useful, feel free to insert it to your test suite.
Our automated pipeline inserted the test at the end of the tests/test_repository.py file before running it.

This is part of our research at the ZEST group of University of Zurich in collaboration with Mozilla.
If you have any suggestions, questions, or simply want to learn more, feel free to contact us at [email protected] and [email protected].

Click to see which lines were covered.
diff --git a/bugbug/code_search/searchfox_api.py b/bugbug/code_search/searchfox_api.py
--- a/bugbug/code_search/searchfox_api.py
+++ b/bugbug/code_search/searchfox_api.py
@@ -5,6 +5,7 @@
 
 import io
 import json
+import logging# ✅ Covered by above test
 import re
 from typing import Iterable, Literal
 
@@ -19,6 +20,8 @@
     register_function_search,
 )
 from bugbug.repository import SOURCE_CODE_TYPES_TO_EXT
+# ✅ Covered by above test
+logger = logging.getLogger(__name__)# ✅ Covered by above test
 
 
 def get_line_number(elements: Iterable[HtmlElement], position: Literal["start", "end"]):
@@ -201,6 +204,13 @@
                         definitions.append(value)
 
     paths = list(set(definition["path"] for definition in definitions))
+    if len(paths) > 10:# ✅ Covered by above test
+        logger.warning(# ✅ Covered by above test
+            "Too many paths found for symbol %s: %d paths. Skipping this symbol.",# ✅ Covered by above test
+            symbol_name,# ✅ Covered by above test
+            len(paths),# ✅ Covered by above test
+        )# ✅ Covered by above test
+        return []# ✅ Covered by above test
 
     return sum((get_functions(commit_hash, path, symbol_name) for path in paths), [])
 

Line coverage* achieved: 100.0%

* Line coverage is calculated over the lines added in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants