Add `get_authorized_comments_from_merge_request` function #362

ljavorsk · 2025-11-18T14:01:04Z

This function helps to scrape relevant comments on the MR

gemini-code-assist

Code Review

This pull request introduces a new function to fetch authorized comments from a GitLab merge request, along with the necessary data models and comprehensive unit tests. The implementation is solid, but I have a few suggestions to improve robustness and maintainability. Specifically, the error handling can be made more precise by catching specific exceptions, and the main function could be refactored for better readability. I've also included a suggestion to improve the data models for better type safety.

gemini-code-assist · 2025-11-18T14:04:55Z

mcp_server/gitlab_tools.py

+                except Exception as e:
+                    # Thread starter is not an authorized member
+                    continue


Catching a broad Exception and simply continuing is risky. It can hide unrelated bugs (e.g., network issues, API changes) by silencing them. The intention is to skip non-project members, so you should catch the specific exception that python-gitlab raises for a "not found" error, which is gitlab.exceptions.GitlabGetError. This makes your error handling more precise and robust. This also applies to the except block for replies on lines 453-455. You'll need to import this exception: from gitlab.exceptions import GitlabGetError.

This is intended

gemini-code-assist · 2025-11-18T14:04:55Z

common/models.py

+    created_at: str = Field(
+        description="ISO 8601 timestamp when reply was created"
+    )


For better type safety and to leverage Pydantic's parsing capabilities, consider changing the type of created_at from str to datetime. Pydantic will automatically parse ISO 8601 formatted strings into datetime objects. This makes working with dates and times in the rest of the application easier and less error-prone. You'll need to add from datetime import datetime at the top of the file.

gemini-code-assist · 2025-11-18T14:04:55Z

common/models.py

+    created_at: str = Field(
+        description="ISO 8601 timestamp when comment was created"
+    )


Similar to the CommentReply model, consider changing the type of created_at from str to datetime for better type safety and easier manipulation of date/time values.

gemini-code-assist · 2025-11-18T14:04:55Z

mcp_server/gitlab_tools.py

+                    first_member = project.members_all.get(first_author_id)
+
+                    # Only process thread if starter is Developer+
+                    if first_member.access_level >= 30:


The access level 30 is a magic number. While it's explained in the docstring, defining it as a module-level constant (e.g., DEVELOPER_ACCESS_LEVEL = 30) would improve readability and maintainability. This constant could then be used here and on line 445.

Alternatively, make it a function, e.g. is_trusted_user(id).

gemini-code-assist · 2025-11-18T14:04:55Z

mcp_server/gitlab_tools.py

+        def get_authorized_comments():
+            # Get discussions instead of notes to capture thread context
+            discussions = mr._raw_pr.discussions.list(get_all=True)
+            project = mr.target_project.gitlab_repo
+
+            logger.info(f"Found {len(discussions)} discussions in MR")
+
+            authorized_comments = []
+            for discussion in discussions:
+                discussion_id = discussion.id
+                notes = discussion.attributes.get("notes", [])
+
+                if not notes:
+                    continue
+
+                # Process first note (thread starter)
+                first_note = notes[0]
+
+                # Skip system notes (like "added 1 commit")
+                if first_note.get("system", False): continue
+
+                first_author_id = first_note["author"]["id"]
+                first_author_username = first_note["author"]["username"]
+
+                try:
+                    first_member = project.members_all.get(first_author_id)
+
+                    # Only process thread if starter is Developer+
+                    if first_member.access_level >= 30:
+                        file_path = ""
+                        line_number = None
+                        line_type = ""
+
+                        position = first_note.get("position")
+
+                        if position:
+                            file_path = (position.get("new_path", "") or position.get("old_path", ""))
+                            new_line = position.get("new_line")
+                            old_line = position.get("old_line")
+
+                            if new_line and old_line:
+                                # Both present = unchanged/context line
+                                line_number = new_line
+                                line_type = "unchanged"
+                            elif new_line:
+                                line_number = new_line
+                                line_type = "new"
+                            elif old_line:
+                                line_number = old_line
+                                line_type = "old"
+
+                        replies = []
+                        for reply_note in notes[1:]:
+                            if reply_note.get("system", False): continue
+
+                            reply_author_id = reply_note["author"]["id"]
+                            reply_author_username = reply_note["author"]["username"]
+
+                            try:
+                                reply_member = project.members_all.get(reply_author_id)
+
+                                if reply_member.access_level >= 30:
+                                    replies.append(
+                                        CommentReply(
+                                            author=reply_author_username,
+                                            message=reply_note["body"],
+                                            created_at=reply_note["created_at"],
+                                        )
+                                    )
+                            except Exception as e:
+                                # Reply author is not an authorized member
+                                continue
+
+                        authorized_comments.append(
+                            MergeRequestComment(
+                                author=first_author_username,
+                                message=first_note["body"],
+                                created_at=first_note["created_at"],
+                                file_path=file_path,
+                                line_number=line_number,
+                                line_type=line_type,
+                                discussion_id=discussion_id,
+                                replies=replies,
+                            )
+                        )
+                except Exception as e:
+                    # Thread starter is not an authorized member
+                    continue
+
+            return authorized_comments
+


The get_authorized_comments function is quite long and has multiple levels of nesting, which reduces readability and makes it harder to maintain. Consider refactoring it by extracting parts of the logic into smaller, single-purpose helper functions. For example, you could have functions for:

Processing a single discussion.

Checking if a user is authorized.

Extracting comment details (like file path and line number) from a note.

This would make the main loop cleaner and the overall logic easier to follow.

nforro · 2025-11-18T15:36:48Z

mcp_server/gitlab_tools.py

+                notes = discussion.attributes.get("notes", [])
+
+                if not notes:
+                    continue


Suggested change

notes = discussion.attributes.get("notes", [])

if not notes:

continue

if not (notes := discussion.attributes.get("notes")):

continue

nforro · 2025-11-18T15:53:30Z

mcp_server/gitlab_tools.py

+                first_note = notes[0]
+
+                # Skip system notes (like "added 1 commit")
+                if first_note.get("system", False): continue


Suggested change

if first_note.get("system", False): continue

if first_note.get("system"):

continue

nforro · 2025-11-18T15:56:53Z

mcp_server/gitlab_tools.py

+                try:
+                    first_member = project.members_all.get(first_author_id)
+
+                    # Only process thread if starter is Developer+
+                    if first_member.access_level >= 30:


Do the membership check in one place and fail early, please.

nforro · 2025-11-18T15:59:33Z

mcp_server/gitlab_tools.py

+
+                        replies = []
+                        for reply_note in notes[1:]:
+                            if reply_note.get("system", False): continue


Never do oneline conditions, please:

Suggested change

if reply_note.get("system", False): continue

if reply_note.get("system"):

continue

nforro · 2025-11-18T16:03:38Z

mcp_server/gitlab_tools.py

+                    first_member = project.members_all.get(first_author_id)
+
+                    # Only process thread if starter is Developer+
+                    if first_member.access_level >= 30:


Alternatively, make it a function, e.g. is_trusted_user(id).

This function helps to scrape relevant comments on the MR

ljavorsk · 2025-11-19T11:23:53Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a new function get_authorized_comments_from_merge_request to fetch comments from a GitLab merge request, filtering for those made by authorized users (Developer role or higher). The changes include new Pydantic models for comments and replies, the main function logic with several helpers, and comprehensive unit tests.

My review focuses on improving the robustness and performance of the new functionality. I've identified a couple of areas for improvement:

The exception handling for checking user authorization can be made more specific to avoid masking unrelated errors.
The current implementation may cause performance issues on merge requests with many comments due to excessive API calls (N+1 problem). I've suggested an optimization to fetch project members in a single batch.

Overall, the changes are well-structured and the addition of tests is great. Addressing these points will make the new tool more reliable and efficient.

gemini-code-assist · 2025-11-19T11:25:32Z

mcp_server/gitlab_tools.py

+async def get_authorized_comments_from_merge_request(
+    merge_request_url: Annotated[str, Field(description="URL of the merge request")],
+) -> list[MergeRequestComment]:
+    """
+    Gets all comments from a merge request, filtered to only include
+    comments from authorized members with Developer role or higher.
+    """


This implementation can lead to a large number of API calls (an N+1 problem). For each comment and reply, _is_authorized_member is called, which in turn makes an API call to GitLab to get member details. On a merge request with many comments, this can be very slow and might hit API rate limits.

To optimize this, you can fetch all project members once at the beginning of get_authorized_comments and store the IDs of authorized members in a set. Then, you can check against this set instead of making an API call for each author.

Here's an example of how you could implement this inside the get_authorized_comments function:

# Inside get_authorized_comments function project = mr.target_project.gitlab_repo try: all_members = project.members_all.list(get_all=True) authorized_ids = { member.id for member in all_members if member.access_level >= DEVELOPER_ACCESS_LEVEL } except Exception as e: logger.warning(f"Could not fetch all project members, falling back to individual checks: {e}") authorized_ids = None def is_authorized(author_id: int) -> bool: if authorized_ids is not None: return author_id in authorized_ids return _is_authorized_member(project, author_id) # Then, use is_authorized() instead of _is_authorized_member() for checks. # You would also need to adapt _process_reply to use this new check.

gemini-code-assist · 2025-11-19T11:25:32Z

mcp_server/gitlab_tools.py

+    except Exception:
+        return False


Catching a broad Exception is risky as it can mask unexpected errors (like network issues or authentication problems), treating them as a non-member scenario. It's better to catch the specific exception that python-gitlab raises when a user is not found, which is gitlab.exceptions.GitlabGetError.

This will make the function more robust. You'll need to add import gitlab.exceptions at the top of the file.

Suggested change

except Exception:

return False

except gitlab.exceptions.GitlabGetError:

return False

ljavorsk · 2025-11-19T11:42:56Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a new function get_authorized_comments_from_merge_request to fetch comments from GitLab merge requests, filtering for authorized users. The implementation is well-structured, using helper functions and correctly handling asynchronous operations with blocking API calls via asyncio.to_thread. The accompanying Pydantic models are clearly defined, and the unit tests are comprehensive, covering various scenarios including filtering, replies, and line-specific comments.

My review includes a couple of suggestions to improve robustness and maintainability:

Making dictionary access safer to prevent potential KeyError exceptions from unexpected API responses.
Adding a type hint to a helper function to improve code clarity.

Overall, this is a solid contribution that adds valuable functionality.

gemini-code-assist · 2025-11-19T11:44:58Z

mcp_server/gitlab_tools.py

        raise ToolError(f"Failed to get failed jobs from merge request: {e}") from e
+
+
+def _get_authorized_member_ids(project) -> set[int]:


The project parameter lacks a type hint. Based on its usage (project.members_all.list), it appears to be a python-gitlab project object. Adding a type hint would improve code clarity and allow static analysis tools to catch potential type-related errors. Using a string literal for the type hint is a good practice here to avoid adding an import solely for type annotation.

Suggested change

def _get_authorized_member_ids(project) -> set[int]:

def _get_authorized_member_ids(project: "gitlab.v4.objects.Project") -> set[int]:

Why the 👎?

I thought this is not necessary, but I can add it if you believe it's needed

gemini-code-assist · 2025-11-19T11:44:58Z

mcp_server/gitlab_tools.py

+    author_id = note["author"]["id"]
+    if author_id not in authorized_member_ids:
+        return None
+
+    return CommentReply(
+        author=note["author"]["username"],
+        message=note["body"],
+        created_at=note["created_at"],
+    )


Directly accessing dictionary keys like note["author"]["id"] can lead to a KeyError if the API response structure varies, which would crash the function. To make the code more robust against unexpected API responses, it's safer to use .get() for dictionary access. I've suggested a change that also wraps the CommentReply creation in a try...except block to gracefully handle any other missing keys. This principle of defensive coding against external API data should also be applied to similar data access patterns in get_authorized_comments (e.g., lines 477, 491-493).

author_info = note.get("author") if not author_info or author_info.get("id") not in authorized_member_ids: return None try: return CommentReply( author=author_info["username"], message=note["body"], created_at=note["created_at"], ) except KeyError as e: logger.warning(f"Skipping reply due to missing key in note: {e}") return None

nforro · 2025-11-19T12:33:25Z

mcp_server/gitlab_tools.py

        raise ToolError(f"Failed to get failed jobs from merge request: {e}") from e
+
+
+def _get_authorized_member_ids(project: "gitlab.v4.objects.Project") -> set[int]:


Why is the type hint in quotes? If you want to avoid additional imports, you could pass ogr's GitlabProject instead, or make this a nested function without type hints (but since it's now called at one place only, you could also just inline it).

mcp_server/gitlab_tools.py

nforro · 2025-11-19T12:38:09Z

mcp_server/gitlab_tools.py

+        return set()
+
+
+def _extract_position_info(note: dict) -> tuple[str, int | None, str]:


Will this be updated in the future to include more metadata? I'm afraid this is not enough to identify code/diff context properly.

We can add something in the future once we test it with the agent and see what could help there

This for instance doesn't cover line ranges, which is rather important IMHO, especially with suggestions.

nforro · 2025-11-19T12:42:30Z

mcp_server/gitlab_tools.py

+            return None
+
+        return CommentReply(
+            author=author.get("username", "unknown"),


I think None is just fine, we don't have to invent string constants.

Suggested change

author=author.get("username", "unknown"),

author=author.get("username"),

nforro · 2025-11-19T12:44:37Z

mcp_server/gitlab_tools.py

+        return CommentReply(
+            author=author.get("username", "unknown"),
+            message=note.get("body", ""),
+            created_at=note.get("created_at", ""),


I doubt this can be missing, but most importantly I'm not sure what datetime would be derived from an empty string. If this can really be missing, then it should be None in that case. Same with message above, if there is none, it should be None.

Co-authored-by: Nikola Forró <[email protected]>

ljavorsk · 2025-11-19T14:00:54Z

/gemini review

nforro · 2025-11-19T14:23:14Z

mcp_server/gitlab_tools.py

+                            file_path=file_path,
+                            line_number=line_number,
+                            line_type=line_type,
+                            discussion_id=getattr(discussion, "id", ""),


Suggested change

discussion_id=getattr(discussion, "id", ""),

discussion_id=discussion.id,

Or, if it can be empty (can it though?):

Suggested change

discussion_id=getattr(discussion, "id", ""),

discussion_id=getattr(discussion, "id", None),

nforro

There is a lot of code duplication in the tests, couldn't it be parametrized?

ljavorsk · 2025-11-19T16:15:17Z

I reduced the code in tests. Thanks for suggestion

gemini-code-assist bot reviewed Nov 18, 2025

View reviewed changes

nforro reviewed Nov 18, 2025

View reviewed changes

Add get_authorized_comments_from_merge_request function

c3df700

This function helps to scrape relevant comments on the MR

ljavorsk force-pushed the main branch from c7d16f0 to f538f4c Compare November 19, 2025 11:23

gemini-code-assist bot reviewed Nov 19, 2025

View reviewed changes

ljavorsk force-pushed the main branch 2 times, most recently from b2dc5de to 81eb629 Compare November 19, 2025 11:42

gemini-code-assist bot reviewed Nov 19, 2025

View reviewed changes

ljavorsk force-pushed the main branch 2 times, most recently from e4bbd15 to c7c3aa4 Compare November 19, 2025 12:28

nforro reviewed Nov 19, 2025

View reviewed changes

Spliting to private functions and fixing small issues

1d0abce

ljavorsk force-pushed the main branch from c7c3aa4 to 1d0abce Compare November 19, 2025 13:28

Update mcp_server/gitlab_tools.py

74eda2f

Co-authored-by: Nikola Forró <[email protected]>

nforro reviewed Nov 19, 2025

View reviewed changes

nforro approved these changes Nov 19, 2025

View reviewed changes

ljavorsk added 2 commits November 19, 2025 17:08

Reduce duplicates in the tests for comments in MR

5f852ea

Merge commit 'refs/pull/362/head' of github.com:packit/ai-workflows

666c788

ljavorsk merged commit b296d5b into packit:main Nov 19, 2025
7 checks passed

	if first_note.get("system", False): continue
	if first_note.get("system"):
	continue

	if reply_note.get("system", False): continue
	if reply_note.get("system"):
	continue

		raise ToolError(f"Failed to get failed jobs from merge request: {e}") from e


		def _get_authorized_member_ids(project) -> set[int]:

	def _get_authorized_member_ids(project) -> set[int]:
	def _get_authorized_member_ids(project: "gitlab.v4.objects.Project") -> set[int]:

		return set()


		def _extract_position_info(note: dict) -> tuple[str, int \| None, str]:

	author=author.get("username", "unknown"),
	author=author.get("username"),

	discussion_id=getattr(discussion, "id", ""),
	discussion_id=discussion.id,

Add get_authorized_comments_from_merge_request function #362

Add get_authorized_comments_from_merge_request function #362

Uh oh!

Conversation

ljavorsk commented Nov 18, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ljavorsk commented Nov 19, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

ljavorsk commented Nov 19, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nforro Nov 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Add `get_authorized_comments_from_merge_request` function #362

Add `get_authorized_comments_from_merge_request` function #362

nforro Nov 19, 2025 •

edited

Loading