Skip to content

Conversation

@ljavorsk
Copy link
Collaborator

This function helps to scrape relevant comments on the MR

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new function to fetch authorized comments from a GitLab merge request, along with the necessary data models and comprehensive unit tests. The implementation is solid, but I have a few suggestions to improve robustness and maintainability. Specifically, the error handling can be made more precise by catching specific exceptions, and the main function could be refactored for better readability. I've also included a suggestion to improve the data models for better type safety.

Comment on lines 469 to 509
except Exception as e:
# Thread starter is not an authorized member
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Catching a broad Exception and simply continuing is risky. It can hide unrelated bugs (e.g., network issues, API changes) by silencing them. The intention is to skip non-project members, so you should catch the specific exception that python-gitlab raises for a "not found" error, which is gitlab.exceptions.GitlabGetError. This makes your error handling more precise and robust. This also applies to the except block for replies on lines 453-455. You'll need to import this exception: from gitlab.exceptions import GitlabGetError.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended

common/models.py Outdated
Comment on lines 302 to 304
created_at: str = Field(
description="ISO 8601 timestamp when reply was created"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better type safety and to leverage Pydantic's parsing capabilities, consider changing the type of created_at from str to datetime. Pydantic will automatically parse ISO 8601 formatted strings into datetime objects. This makes working with dates and times in the rest of the application easier and less error-prone. You'll need to add from datetime import datetime at the top of the file.

common/models.py Outdated
Comment on lines 312 to 314
created_at: str = Field(
description="ISO 8601 timestamp when comment was created"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the CommentReply model, consider changing the type of created_at from str to datetime for better type safety and easier manipulation of date/time values.

first_member = project.members_all.get(first_author_id)

# Only process thread if starter is Developer+
if first_member.access_level >= 30:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The access level 30 is a magic number. While it's explained in the docstring, defining it as a module-level constant (e.g., DEVELOPER_ACCESS_LEVEL = 30) would improve readability and maintainability. This constant could then be used here and on line 445.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, make it a function, e.g. is_trusted_user(id).

Comment on lines 384 to 512
def get_authorized_comments():
# Get discussions instead of notes to capture thread context
discussions = mr._raw_pr.discussions.list(get_all=True)
project = mr.target_project.gitlab_repo

logger.info(f"Found {len(discussions)} discussions in MR")

authorized_comments = []
for discussion in discussions:
discussion_id = discussion.id
notes = discussion.attributes.get("notes", [])

if not notes:
continue

# Process first note (thread starter)
first_note = notes[0]

# Skip system notes (like "added 1 commit")
if first_note.get("system", False): continue

first_author_id = first_note["author"]["id"]
first_author_username = first_note["author"]["username"]

try:
first_member = project.members_all.get(first_author_id)

# Only process thread if starter is Developer+
if first_member.access_level >= 30:
file_path = ""
line_number = None
line_type = ""

position = first_note.get("position")

if position:
file_path = (position.get("new_path", "") or position.get("old_path", ""))
new_line = position.get("new_line")
old_line = position.get("old_line")

if new_line and old_line:
# Both present = unchanged/context line
line_number = new_line
line_type = "unchanged"
elif new_line:
line_number = new_line
line_type = "new"
elif old_line:
line_number = old_line
line_type = "old"

replies = []
for reply_note in notes[1:]:
if reply_note.get("system", False): continue

reply_author_id = reply_note["author"]["id"]
reply_author_username = reply_note["author"]["username"]

try:
reply_member = project.members_all.get(reply_author_id)

if reply_member.access_level >= 30:
replies.append(
CommentReply(
author=reply_author_username,
message=reply_note["body"],
created_at=reply_note["created_at"],
)
)
except Exception as e:
# Reply author is not an authorized member
continue

authorized_comments.append(
MergeRequestComment(
author=first_author_username,
message=first_note["body"],
created_at=first_note["created_at"],
file_path=file_path,
line_number=line_number,
line_type=line_type,
discussion_id=discussion_id,
replies=replies,
)
)
except Exception as e:
# Thread starter is not an authorized member
continue

return authorized_comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The get_authorized_comments function is quite long and has multiple levels of nesting, which reduces readability and makes it harder to maintain. Consider refactoring it by extracting parts of the logic into smaller, single-purpose helper functions. For example, you could have functions for:

  • Processing a single discussion.
  • Checking if a user is authorized.
  • Extracting comment details (like file path and line number) from a note.

This would make the main loop cleaner and the overall logic easier to follow.

Comment on lines 394 to 397
notes = discussion.attributes.get("notes", [])

if not notes:
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
notes = discussion.attributes.get("notes", [])
if not notes:
continue
if not (notes := discussion.attributes.get("notes")):
continue

first_note = notes[0]

# Skip system notes (like "added 1 commit")
if first_note.get("system", False): continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if first_note.get("system", False): continue
if first_note.get("system"):
continue

Comment on lines 408 to 412
try:
first_member = project.members_all.get(first_author_id)

# Only process thread if starter is Developer+
if first_member.access_level >= 30:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the membership check in one place and fail early, please.


replies = []
for reply_note in notes[1:]:
if reply_note.get("system", False): continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never do oneline conditions, please:

Suggested change
if reply_note.get("system", False): continue
if reply_note.get("system"):
continue

first_member = project.members_all.get(first_author_id)

# Only process thread if starter is Developer+
if first_member.access_level >= 30:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, make it a function, e.g. is_trusted_user(id).

This function helps to scrape relevant comments on the MR
@ljavorsk
Copy link
Collaborator Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new function get_authorized_comments_from_merge_request to fetch comments from a GitLab merge request, filtering for those made by authorized users (Developer role or higher). The changes include new Pydantic models for comments and replies, the main function logic with several helpers, and comprehensive unit tests.

My review focuses on improving the robustness and performance of the new functionality. I've identified a couple of areas for improvement:

  • The exception handling for checking user authorization can be made more specific to avoid masking unrelated errors.
  • The current implementation may cause performance issues on merge requests with many comments due to excessive API calls (N+1 problem). I've suggested an optimization to fetch project members in a single batch.

Overall, the changes are well-structured and the addition of tests is great. Addressing these points will make the new tool more reliable and efficient.

Comment on lines +440 to +460
async def get_authorized_comments_from_merge_request(
merge_request_url: Annotated[str, Field(description="URL of the merge request")],
) -> list[MergeRequestComment]:
"""
Gets all comments from a merge request, filtered to only include
comments from authorized members with Developer role or higher.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This implementation can lead to a large number of API calls (an N+1 problem). For each comment and reply, _is_authorized_member is called, which in turn makes an API call to GitLab to get member details. On a merge request with many comments, this can be very slow and might hit API rate limits.

To optimize this, you can fetch all project members once at the beginning of get_authorized_comments and store the IDs of authorized members in a set. Then, you can check against this set instead of making an API call for each author.

Here's an example of how you could implement this inside the get_authorized_comments function:

# Inside get_authorized_comments function
project = mr.target_project.gitlab_repo

try:
    all_members = project.members_all.list(get_all=True)
    authorized_ids = {
        member.id for member in all_members
        if member.access_level >= DEVELOPER_ACCESS_LEVEL
    }
except Exception as e:
    logger.warning(f"Could not fetch all project members, falling back to individual checks: {e}")
    authorized_ids = None

def is_authorized(author_id: int) -> bool:
    if authorized_ids is not None:
        return author_id in authorized_ids
    return _is_authorized_member(project, author_id)

# Then, use is_authorized() instead of _is_authorized_member() for checks.
# You would also need to adapt _process_reply to use this new check.

Comment on lines 400 to 401
except Exception:
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Catching a broad Exception is risky as it can mask unexpected errors (like network issues or authentication problems), treating them as a non-member scenario. It's better to catch the specific exception that python-gitlab raises when a user is not found, which is gitlab.exceptions.GitlabGetError.

This will make the function more robust. You'll need to add import gitlab.exceptions at the top of the file.

Suggested change
except Exception:
return False
except gitlab.exceptions.GitlabGetError:
return False

@ljavorsk ljavorsk force-pushed the main branch 2 times, most recently from b2dc5de to 81eb629 Compare November 19, 2025 11:42
@ljavorsk
Copy link
Collaborator Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new function get_authorized_comments_from_merge_request to fetch comments from GitLab merge requests, filtering for authorized users. The implementation is well-structured, using helper functions and correctly handling asynchronous operations with blocking API calls via asyncio.to_thread. The accompanying Pydantic models are clearly defined, and the unit tests are comprehensive, covering various scenarios including filtering, replies, and line-specific comments.

My review includes a couple of suggestions to improve robustness and maintainability:

  • Making dictionary access safer to prevent potential KeyError exceptions from unexpected API responses.
  • Adding a type hint to a helper function to improve code clarity.

Overall, this is a solid contribution that adds valuable functionality.

raise ToolError(f"Failed to get failed jobs from merge request: {e}") from e


def _get_authorized_member_ids(project) -> set[int]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The project parameter lacks a type hint. Based on its usage (project.members_all.list), it appears to be a python-gitlab project object. Adding a type hint would improve code clarity and allow static analysis tools to catch potential type-related errors. Using a string literal for the type hint is a good practice here to avoid adding an import solely for type annotation.

Suggested change
def _get_authorized_member_ids(project) -> set[int]:
def _get_authorized_member_ids(project: "gitlab.v4.objects.Project") -> set[int]:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the 👎?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this is not necessary, but I can add it if you believe it's needed

Comment on lines 438 to 446
author_id = note["author"]["id"]
if author_id not in authorized_member_ids:
return None

return CommentReply(
author=note["author"]["username"],
message=note["body"],
created_at=note["created_at"],
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Directly accessing dictionary keys like note["author"]["id"] can lead to a KeyError if the API response structure varies, which would crash the function. To make the code more robust against unexpected API responses, it's safer to use .get() for dictionary access. I've suggested a change that also wraps the CommentReply creation in a try...except block to gracefully handle any other missing keys. This principle of defensive coding against external API data should also be applied to similar data access patterns in get_authorized_comments (e.g., lines 477, 491-493).

    author_info = note.get("author")
    if not author_info or author_info.get("id") not in authorized_member_ids:
        return None

    try:
        return CommentReply(
            author=author_info["username"],
            message=note["body"],
            created_at=note["created_at"],
        )
    except KeyError as e:
        logger.warning(f"Skipping reply due to missing key in note: {e}")
        return None

@ljavorsk ljavorsk force-pushed the main branch 2 times, most recently from e4bbd15 to c7c3aa4 Compare November 19, 2025 12:28
raise ToolError(f"Failed to get failed jobs from merge request: {e}") from e


def _get_authorized_member_ids(project: "gitlab.v4.objects.Project") -> set[int]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the type hint in quotes? If you want to avoid additional imports, you could pass ogr's GitlabProject instead, or make this a nested function without type hints (but since it's now called at one place only, you could also just inline it).

return set()


def _extract_position_info(note: dict) -> tuple[str, int | None, str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be updated in the future to include more metadata? I'm afraid this is not enough to identify code/diff context properly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add something in the future once we test it with the agent and see what could help there

Copy link
Member

@nforro nforro Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This for instance doesn't cover line ranges, which is rather important IMHO, especially with suggestions.

return None

return CommentReply(
author=author.get("username", "unknown"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think None is just fine, we don't have to invent string constants.

Suggested change
author=author.get("username", "unknown"),
author=author.get("username"),

return CommentReply(
author=author.get("username", "unknown"),
message=note.get("body", ""),
created_at=note.get("created_at", ""),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt this can be missing, but most importantly I'm not sure what datetime would be derived from an empty string. If this can really be missing, then it should be None in that case. Same with message above, if there is none, it should be None.

@ljavorsk
Copy link
Collaborator Author

/gemini review

file_path=file_path,
line_number=line_number,
line_type=line_type,
discussion_id=getattr(discussion, "id", ""),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
discussion_id=getattr(discussion, "id", ""),
discussion_id=discussion.id,

Or, if it can be empty (can it though?):

Suggested change
discussion_id=getattr(discussion, "id", ""),
discussion_id=getattr(discussion, "id", None),

Copy link
Member

@nforro nforro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of code duplication in the tests, couldn't it be parametrized?

@ljavorsk
Copy link
Collaborator Author

I reduced the code in tests. Thanks for suggestion

@ljavorsk ljavorsk merged commit b296d5b into packit:main Nov 19, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants