-
Notifications
You must be signed in to change notification settings - Fork 22
Add get_authorized_comments_from_merge_request function
#362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new function to fetch authorized comments from a GitLab merge request, along with the necessary data models and comprehensive unit tests. The implementation is solid, but I have a few suggestions to improve robustness and maintainability. Specifically, the error handling can be made more precise by catching specific exceptions, and the main function could be refactored for better readability. I've also included a suggestion to improve the data models for better type safety.
| except Exception as e: | ||
| # Thread starter is not an authorized member | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching a broad Exception and simply continuing is risky. It can hide unrelated bugs (e.g., network issues, API changes) by silencing them. The intention is to skip non-project members, so you should catch the specific exception that python-gitlab raises for a "not found" error, which is gitlab.exceptions.GitlabGetError. This makes your error handling more precise and robust. This also applies to the except block for replies on lines 453-455. You'll need to import this exception: from gitlab.exceptions import GitlabGetError.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intended
common/models.py
Outdated
| created_at: str = Field( | ||
| description="ISO 8601 timestamp when reply was created" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better type safety and to leverage Pydantic's parsing capabilities, consider changing the type of created_at from str to datetime. Pydantic will automatically parse ISO 8601 formatted strings into datetime objects. This makes working with dates and times in the rest of the application easier and less error-prone. You'll need to add from datetime import datetime at the top of the file.
common/models.py
Outdated
| created_at: str = Field( | ||
| description="ISO 8601 timestamp when comment was created" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mcp_server/gitlab_tools.py
Outdated
| first_member = project.members_all.get(first_author_id) | ||
|
|
||
| # Only process thread if starter is Developer+ | ||
| if first_member.access_level >= 30: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, make it a function, e.g. is_trusted_user(id).
| def get_authorized_comments(): | ||
| # Get discussions instead of notes to capture thread context | ||
| discussions = mr._raw_pr.discussions.list(get_all=True) | ||
| project = mr.target_project.gitlab_repo | ||
|
|
||
| logger.info(f"Found {len(discussions)} discussions in MR") | ||
|
|
||
| authorized_comments = [] | ||
| for discussion in discussions: | ||
| discussion_id = discussion.id | ||
| notes = discussion.attributes.get("notes", []) | ||
|
|
||
| if not notes: | ||
| continue | ||
|
|
||
| # Process first note (thread starter) | ||
| first_note = notes[0] | ||
|
|
||
| # Skip system notes (like "added 1 commit") | ||
| if first_note.get("system", False): continue | ||
|
|
||
| first_author_id = first_note["author"]["id"] | ||
| first_author_username = first_note["author"]["username"] | ||
|
|
||
| try: | ||
| first_member = project.members_all.get(first_author_id) | ||
|
|
||
| # Only process thread if starter is Developer+ | ||
| if first_member.access_level >= 30: | ||
| file_path = "" | ||
| line_number = None | ||
| line_type = "" | ||
|
|
||
| position = first_note.get("position") | ||
|
|
||
| if position: | ||
| file_path = (position.get("new_path", "") or position.get("old_path", "")) | ||
| new_line = position.get("new_line") | ||
| old_line = position.get("old_line") | ||
|
|
||
| if new_line and old_line: | ||
| # Both present = unchanged/context line | ||
| line_number = new_line | ||
| line_type = "unchanged" | ||
| elif new_line: | ||
| line_number = new_line | ||
| line_type = "new" | ||
| elif old_line: | ||
| line_number = old_line | ||
| line_type = "old" | ||
|
|
||
| replies = [] | ||
| for reply_note in notes[1:]: | ||
| if reply_note.get("system", False): continue | ||
|
|
||
| reply_author_id = reply_note["author"]["id"] | ||
| reply_author_username = reply_note["author"]["username"] | ||
|
|
||
| try: | ||
| reply_member = project.members_all.get(reply_author_id) | ||
|
|
||
| if reply_member.access_level >= 30: | ||
| replies.append( | ||
| CommentReply( | ||
| author=reply_author_username, | ||
| message=reply_note["body"], | ||
| created_at=reply_note["created_at"], | ||
| ) | ||
| ) | ||
| except Exception as e: | ||
| # Reply author is not an authorized member | ||
| continue | ||
|
|
||
| authorized_comments.append( | ||
| MergeRequestComment( | ||
| author=first_author_username, | ||
| message=first_note["body"], | ||
| created_at=first_note["created_at"], | ||
| file_path=file_path, | ||
| line_number=line_number, | ||
| line_type=line_type, | ||
| discussion_id=discussion_id, | ||
| replies=replies, | ||
| ) | ||
| ) | ||
| except Exception as e: | ||
| # Thread starter is not an authorized member | ||
| continue | ||
|
|
||
| return authorized_comments | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The get_authorized_comments function is quite long and has multiple levels of nesting, which reduces readability and makes it harder to maintain. Consider refactoring it by extracting parts of the logic into smaller, single-purpose helper functions. For example, you could have functions for:
- Processing a single discussion.
- Checking if a user is authorized.
- Extracting comment details (like file path and line number) from a note.
This would make the main loop cleaner and the overall logic easier to follow.
mcp_server/gitlab_tools.py
Outdated
| notes = discussion.attributes.get("notes", []) | ||
|
|
||
| if not notes: | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| notes = discussion.attributes.get("notes", []) | |
| if not notes: | |
| continue | |
| if not (notes := discussion.attributes.get("notes")): | |
| continue |
mcp_server/gitlab_tools.py
Outdated
| first_note = notes[0] | ||
|
|
||
| # Skip system notes (like "added 1 commit") | ||
| if first_note.get("system", False): continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if first_note.get("system", False): continue | |
| if first_note.get("system"): | |
| continue |
mcp_server/gitlab_tools.py
Outdated
| try: | ||
| first_member = project.members_all.get(first_author_id) | ||
|
|
||
| # Only process thread if starter is Developer+ | ||
| if first_member.access_level >= 30: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the membership check in one place and fail early, please.
mcp_server/gitlab_tools.py
Outdated
|
|
||
| replies = [] | ||
| for reply_note in notes[1:]: | ||
| if reply_note.get("system", False): continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never do oneline conditions, please:
| if reply_note.get("system", False): continue | |
| if reply_note.get("system"): | |
| continue |
mcp_server/gitlab_tools.py
Outdated
| first_member = project.members_all.get(first_author_id) | ||
|
|
||
| # Only process thread if starter is Developer+ | ||
| if first_member.access_level >= 30: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, make it a function, e.g. is_trusted_user(id).
This function helps to scrape relevant comments on the MR
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new function get_authorized_comments_from_merge_request to fetch comments from a GitLab merge request, filtering for those made by authorized users (Developer role or higher). The changes include new Pydantic models for comments and replies, the main function logic with several helpers, and comprehensive unit tests.
My review focuses on improving the robustness and performance of the new functionality. I've identified a couple of areas for improvement:
- The exception handling for checking user authorization can be made more specific to avoid masking unrelated errors.
- The current implementation may cause performance issues on merge requests with many comments due to excessive API calls (N+1 problem). I've suggested an optimization to fetch project members in a single batch.
Overall, the changes are well-structured and the addition of tests is great. Addressing these points will make the new tool more reliable and efficient.
| async def get_authorized_comments_from_merge_request( | ||
| merge_request_url: Annotated[str, Field(description="URL of the merge request")], | ||
| ) -> list[MergeRequestComment]: | ||
| """ | ||
| Gets all comments from a merge request, filtered to only include | ||
| comments from authorized members with Developer role or higher. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation can lead to a large number of API calls (an N+1 problem). For each comment and reply, _is_authorized_member is called, which in turn makes an API call to GitLab to get member details. On a merge request with many comments, this can be very slow and might hit API rate limits.
To optimize this, you can fetch all project members once at the beginning of get_authorized_comments and store the IDs of authorized members in a set. Then, you can check against this set instead of making an API call for each author.
Here's an example of how you could implement this inside the get_authorized_comments function:
# Inside get_authorized_comments function
project = mr.target_project.gitlab_repo
try:
all_members = project.members_all.list(get_all=True)
authorized_ids = {
member.id for member in all_members
if member.access_level >= DEVELOPER_ACCESS_LEVEL
}
except Exception as e:
logger.warning(f"Could not fetch all project members, falling back to individual checks: {e}")
authorized_ids = None
def is_authorized(author_id: int) -> bool:
if authorized_ids is not None:
return author_id in authorized_ids
return _is_authorized_member(project, author_id)
# Then, use is_authorized() instead of _is_authorized_member() for checks.
# You would also need to adapt _process_reply to use this new check.
mcp_server/gitlab_tools.py
Outdated
| except Exception: | ||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching a broad Exception is risky as it can mask unexpected errors (like network issues or authentication problems), treating them as a non-member scenario. It's better to catch the specific exception that python-gitlab raises when a user is not found, which is gitlab.exceptions.GitlabGetError.
This will make the function more robust. You'll need to add import gitlab.exceptions at the top of the file.
| except Exception: | |
| return False | |
| except gitlab.exceptions.GitlabGetError: | |
| return False |
b2dc5de to
81eb629
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new function get_authorized_comments_from_merge_request to fetch comments from GitLab merge requests, filtering for authorized users. The implementation is well-structured, using helper functions and correctly handling asynchronous operations with blocking API calls via asyncio.to_thread. The accompanying Pydantic models are clearly defined, and the unit tests are comprehensive, covering various scenarios including filtering, replies, and line-specific comments.
My review includes a couple of suggestions to improve robustness and maintainability:
- Making dictionary access safer to prevent potential
KeyErrorexceptions from unexpected API responses. - Adding a type hint to a helper function to improve code clarity.
Overall, this is a solid contribution that adds valuable functionality.
mcp_server/gitlab_tools.py
Outdated
| raise ToolError(f"Failed to get failed jobs from merge request: {e}") from e | ||
|
|
||
|
|
||
| def _get_authorized_member_ids(project) -> set[int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The project parameter lacks a type hint. Based on its usage (project.members_all.list), it appears to be a python-gitlab project object. Adding a type hint would improve code clarity and allow static analysis tools to catch potential type-related errors. Using a string literal for the type hint is a good practice here to avoid adding an import solely for type annotation.
| def _get_authorized_member_ids(project) -> set[int]: | |
| def _get_authorized_member_ids(project: "gitlab.v4.objects.Project") -> set[int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the 👎?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this is not necessary, but I can add it if you believe it's needed
mcp_server/gitlab_tools.py
Outdated
| author_id = note["author"]["id"] | ||
| if author_id not in authorized_member_ids: | ||
| return None | ||
|
|
||
| return CommentReply( | ||
| author=note["author"]["username"], | ||
| message=note["body"], | ||
| created_at=note["created_at"], | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Directly accessing dictionary keys like note["author"]["id"] can lead to a KeyError if the API response structure varies, which would crash the function. To make the code more robust against unexpected API responses, it's safer to use .get() for dictionary access. I've suggested a change that also wraps the CommentReply creation in a try...except block to gracefully handle any other missing keys. This principle of defensive coding against external API data should also be applied to similar data access patterns in get_authorized_comments (e.g., lines 477, 491-493).
author_info = note.get("author")
if not author_info or author_info.get("id") not in authorized_member_ids:
return None
try:
return CommentReply(
author=author_info["username"],
message=note["body"],
created_at=note["created_at"],
)
except KeyError as e:
logger.warning(f"Skipping reply due to missing key in note: {e}")
return Nonee4bbd15 to
c7c3aa4
Compare
mcp_server/gitlab_tools.py
Outdated
| raise ToolError(f"Failed to get failed jobs from merge request: {e}") from e | ||
|
|
||
|
|
||
| def _get_authorized_member_ids(project: "gitlab.v4.objects.Project") -> set[int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the type hint in quotes? If you want to avoid additional imports, you could pass ogr's GitlabProject instead, or make this a nested function without type hints (but since it's now called at one place only, you could also just inline it).
| return set() | ||
|
|
||
|
|
||
| def _extract_position_info(note: dict) -> tuple[str, int | None, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this be updated in the future to include more metadata? I'm afraid this is not enough to identify code/diff context properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add something in the future once we test it with the agent and see what could help there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This for instance doesn't cover line ranges, which is rather important IMHO, especially with suggestions.
mcp_server/gitlab_tools.py
Outdated
| return None | ||
|
|
||
| return CommentReply( | ||
| author=author.get("username", "unknown"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think None is just fine, we don't have to invent string constants.
| author=author.get("username", "unknown"), | |
| author=author.get("username"), |
mcp_server/gitlab_tools.py
Outdated
| return CommentReply( | ||
| author=author.get("username", "unknown"), | ||
| message=note.get("body", ""), | ||
| created_at=note.get("created_at", ""), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt this can be missing, but most importantly I'm not sure what datetime would be derived from an empty string. If this can really be missing, then it should be None in that case. Same with message above, if there is none, it should be None.
Co-authored-by: Nikola Forró <[email protected]>
|
/gemini review |
| file_path=file_path, | ||
| line_number=line_number, | ||
| line_type=line_type, | ||
| discussion_id=getattr(discussion, "id", ""), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| discussion_id=getattr(discussion, "id", ""), | |
| discussion_id=discussion.id, |
Or, if it can be empty (can it though?):
| discussion_id=getattr(discussion, "id", ""), | |
| discussion_id=getattr(discussion, "id", None), |
nforro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot of code duplication in the tests, couldn't it be parametrized?
|
I reduced the code in tests. Thanks for suggestion |
This function helps to scrape relevant comments on the MR