Add identity events for Git commits #8

Merged
sduenas merged 2 commits into
chaoss:main from
jjmerchante:git-identities
Sep 29, 2025

Conversation

@jjmerchante

Contributor

This PR introduces functionality to extract and eventize identities from Git commit data. It handles authors, committers, and signers, creating events for each identity found.

Comment thread chronicler/events/core/git.py Outdated
Comment thread tests/test_git.py Outdated
Comment thread chronicler/eventizer.py Outdated
@jjmerchante jjmerchante force-pushed the git-identities branch 5 times, most recently from f6f2175 to d98ed65 Compare September 24, 2025 11:32
Comment thread chronicler/events/core/git.py Outdated
Comment thread chronicler/events/core/git.py Outdated
}

AUTHOR_P2P_REGEX = re.compile(
r"(?P<first_authors>.* .*) ([aA][nN][dD]|&|\+) (?P<last_author>.* .*) (?P<email>.*)"

Member

Can we find a better regex? This one doesn't look helpful, because we aren't capturing the email with the expression.

Also, I think we need some good examples of how people use these fields. I remember we added pair programming support to ELK long ago. Maybe you can find where we introduced it and look at the examples we used for that.

Contributor Author

I used the regular expression from ELK: https://github.com/chaoss/grimoirelab-elk/blob/main/grimoire_elk/enriched/git.py#L81.

It looks like there is a single email, in the format `name one and name two <foo@bar>`. One example is this commit: mezuro/kalibro@a9f25be.

In ELK the email is ignored. I don’t know who I should assign the email to. To both of them?

The second regular expression from ELK is incorrect: it looks for Co-author in the Author and Commit fields, but Co-author appears in the commit message, not in the Author field.
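
To illustrate the point about co-authors living in the commit message, here is a minimal sketch that scans a message for trailer lines. It assumes the GitHub-style `Co-authored-by: Name <email>` trailer; the pattern and the `find_co_authors` helper are hypothetical and are not taken from ELK or from this PR.

```python
import re

# Hypothetical pattern (an assumption, not code from ELK or this PR):
# matches one "Co-authored-by: Name <email>" trailer per line.
CO_AUTHOR_TRAILER = re.compile(
    r"^Co-authored-by:[ \t]*(?P<name>[^<\n]*?)[ \t]*<(?P<email>[^>\n]+)>[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_co_authors(message):
    """Return (name, email) pairs for every co-author trailer in a commit message."""
    return [
        (m.group("name"), m.group("email"))
        for m in CO_AUTHOR_TRAILER.finditer(message)
    ]


# Made-up commit message used only for illustration.
message = """Fix parser

Co-authored-by: Jane Rae <jane@example.com>
Co-authored-by: John Doe <jdoe@example.com>
"""

co_authors = find_co_authors(message)
```

The search runs over the message text only, so the Author and Commit fields never need to be inspected for co-authors.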

Member

The original project we used to incorporate pair programming was Cloud Foundry. Maybe you can find some more examples there to add to the tests.

I suggest that the email address should be part of the regular expression, maybe as an optional field. If you do this, you won't have to look for the email address later.
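
One way to read this suggestion is a pattern whose trailing `<email>` group is optional. This is only a sketch: the `PAIR_AUTHORS` pattern and the `parse_pair` helper are hypothetical names and are not the code that was merged.

```python
import re

# Hypothetical variant of the pattern under review: the trailing "<email>"
# part is wrapped in an optional group, so it is captured when present.
PAIR_AUTHORS = re.compile(
    r"^(?P<first_authors>.+?)\s+(?:and|&|\+)\s+(?P<last_author>.+?)"
    r"(?:\s+<(?P<email>[^>]+)>)?$",
    re.IGNORECASE,
)


def parse_pair(author):
    """Return (first_authors, last_author, email) or None if there is no pair."""
    match = PAIR_AUTHORS.match(author.strip())
    if match is None:
        return None
    # The email group is None when the author string carries no "<...>" part.
    return (
        match.group("first_authors"),
        match.group("last_author"),
        match.group("email"),
    )


with_email = parse_pair("John Smith and John Doe <pair@example.com>")
without_email = parse_pair("John Smith & John Doe")
single_author = parse_pair("John Smith <john@example.com>")
```

With the email captured by the same expression, no second pass over the author string is needed to find it.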

Contributor Author

I can see more examples in this archived repository: https://github.com/cloudfoundry/cflinuxfs2. They follow the same pattern: `Name Surname, Name Surname and Name Surname <oneemail@domain.com>`. The email sometimes belongs to the first person and other times to the last.

I have improved the regular expression. The email is now detected by its surrounding `<>` and added as a separate identity (with no name or username).

I included a test with that pattern.
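
As a rough sketch of the behaviour described above, the following splits a pair programming author string into one identity per name plus a nameless identity for the shared email. The regex is the one quoted in this review; the `split_pair_identities` helper, the comma splitting, and the dict layout are assumptions made for illustration, not the merged implementation.

```python
import re

# Regex as quoted in this review thread.
AUTHOR_P2P_REGEX = re.compile(
    r"(?P<first_authors>.+?)\s+(?:[aA][nN][dD]|&|\+)\s+(?P<last_author>.+?)\s+<(?P<email>[^>]+)>"
)


def split_pair_identities(author):
    """Hypothetical helper: one identity per name, plus one for the shared email."""
    match = AUTHOR_P2P_REGEX.match(author)
    if match is None:
        return []
    # Assumption: the leading authors are separated by commas.
    names = [name.strip() for name in match.group("first_authors").split(",")]
    names.append(match.group("last_author").strip())
    identities = [{"name": name, "email": None} for name in names if name]
    # The single email cannot be attributed to one person, so it becomes
    # its own identity with no name.
    identities.append({"name": None, "email": match.group("email")})
    return identities


identities = split_pair_identities(
    "John Smith, John Doe and Jane Rae <pair@example.com>"
)
```

Keeping the email as its own identity sidesteps the question of which author it belongs to.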

Comment thread chronicler/events/core/git.py Outdated
Comment thread chronicler/events/core/git.py Outdated
Comment thread chronicler/events/core/git.py Outdated
Comment thread chronicler/events/core/git.py Outdated
Comment thread chronicler/events/core/git.py Outdated
Comment thread chronicler/events/core/git.py Outdated
@jjmerchante jjmerchante force-pushed the git-identities branch 4 times, most recently from 73a62f1 to a0d850a Compare September 26, 2025 11:55
Comment thread chronicler/events/core/git.py Outdated
Comment thread chronicler/events/core/git.py Outdated
Comment thread chronicler/events/core/git.py
Comment thread chronicler/events/core/git.py

@sduenas sduenas left a comment

Member

The PR looks good. I'm suggesting minor changes, mainly in the documentation.

Comment thread docs/events.md Outdated
Comment on lines +119 to +126
| Field | Type | Description |
|-------|------|---------------------------------------------------|
| name | `String` | Name of the contributor |
| username | `String` | Username of the contributor |
| email | `String` | Email of the contributor |
| uuid | `String` | Unique identifier of the contributor |
| role | `String` | Role of the contributor. |
| source | `String` | Source of the identity. Always `git` for commits. |

Member

Can you align the text and format of the table so it can be read when not rendered with a markdown reader?
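
For reference, the fields documented in the table above could be modeled roughly as follows. This is a sketch only: the `GitIdentity` class, the `"author"` role value, and the placeholder uuid are assumptions, since the thread does not show how the project represents identities or computes the uuid.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class GitIdentity:
    """Hypothetical container mirroring the documented identity fields."""

    uuid: str                       # unique identifier of the contributor
    role: str                       # role of the contributor (value assumed below)
    name: Optional[str] = None
    username: Optional[str] = None
    email: Optional[str] = None
    source: str = "git"             # documented as always `git` for commits


# The uuid is a placeholder; how it is derived is not shown in this thread.
identity = GitIdentity(
    uuid="<uuid>",
    role="author",
    name="Jane Rae",
    email="jane@example.com",
)
record = asdict(identity)
```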

Comment thread docs/events.md
Comment thread chronicler/events/core/git.py
Comment thread chronicler/events/core/git.py Outdated
Comment on lines +62 to +64
AUTHOR_P2P_REGEX = re.compile(
r"(?P<first_authors>.+?)\s+(?:[aA][nN][dD]|&|\+)\s+(?P<last_author>.+?)\s+<(?P<email>[^>]+)>"
)

Member

Suggested change
AUTHOR_P2P_REGEX = re.compile(
r"(?P<first_authors>.+?)\s+(?:[aA][nN][dD]|&|\+)\s+(?P<last_author>.+?)\s+<(?P<email>[^>]+)>"
)
# Pair programming regex. Some matching examples are:
# - John Smith, John Doe and Jane Rae <pairprogramming@example.com>
# - John Smith, John Doe & Jane Rae <pairprogramming@example>
# - John Smith and John Doe <pairprogramming@example>
GIT_AUTHORS_REGEX = re.compile(
r"(?P<first_authors>.+?)\s+(?:[aA][nN][dD]|&|\+)\s+(?P<last_author>.+?)\s+<(?P<email>[^>]+)>"
)

This commit introduces functionality to extract and
eventize identities from Git commit data. It handles
authors, committers, and signers, creating events for
each identity found.

Signed-off-by: Jose Javier Merchante <jjmerchante@bitergia.com>

This commit updates the tests workflow to continue
when the coveralls action fails as proposed by one
of their incidents:

https://status.coveralls.io/incidents/v5mcbrsbhgt4

Signed-off-by: Jose Javier Merchante <jjmerchante@bitergia.com>

@sduenas sduenas left a comment

Member

LGTM

@sduenas sduenas merged commit 5def5e7 into chaoss:main Sep 29, 2025
5 checks passed