Add identity events for Git commits #8
f6f2175 to d98ed65
}


AUTHOR_P2P_REGEX = re.compile(
    r"(?P<first_authors>.* .*) ([aA][nN][dD]|&|\+) (?P<last_author>.* .*) (?P<email>.*)"
Can we find a better regex? This one doesn't look helpful because the expression isn't really capturing the email.
Also, I think we need some good examples of how people use these fields. I remember we added pair programming support to ELK long ago. Maybe you can find where we introduced it and look at the examples we used for that.
I used the regular expression from ELK: https://github.com/chaoss/grimoirelab-elk/blob/main/grimoire_elk/enriched/git.py#L81.
It looks like there is a single email, in the format `name one and name two <foo@bar>`. One example is this commit: mezuro/kalibro@a9f25be.
In ELK the email is ignored. I don’t know who I should assign the email to. To both of them?
The second regular expression from ELK is incorrect; it obtains Co-author from Author and Commit fields, but Co-author is in the Commit message, not in the Author field.
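To make the question about the email concrete, here is a minimal sketch that runs the pattern from the diff under review against the single-email format described above. Only the pattern comes from this PR; the sample string is illustrative.

```python
import re

# Pattern as it appears in the diff under review (taken from ELK).
AUTHOR_P2P_REGEX = re.compile(
    r"(?P<first_authors>.* .*) ([aA][nN][dD]|&|\+) (?P<last_author>.* .*) (?P<email>.*)"
)

# Illustrative author string in the single-email format.
match = AUTHOR_P2P_REGEX.match("name one and name two <foo@bar>")

first_authors = match.group("first_authors")
last_author = match.group("last_author")
email = match.group("email")

# The email group is a bare `.*`, so it keeps the angle brackets.
print(first_authors, "|", last_author, "|", email)
```

Since the `email` group accepts any trailing text, the angle brackets end up inside the captured value, which is one argument for matching `<...>` explicitly.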
The original project we used to incorporate pair programming was Cloud Foundry. Maybe you can find some more examples there to add to the tests.
I suggest that the email address should be part of the regular expression, maybe as an optional field. If you do this, you won't have to look for the email address later.
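As a rough illustration of that suggestion, the sketch below makes the `<email>` part an optional group. `OPTIONAL_EMAIL_REGEX` and the sample strings are hypothetical and are not the pattern the PR finally adopted.

```python
import re

# Hypothetical variant: the trailing `<email>` is optional. The `$`
# anchor forces the lazy name group to run up to the email or the end.
OPTIONAL_EMAIL_REGEX = re.compile(
    r"(?P<first_authors>.+?)\s+(?:and|&|\+)\s+(?P<last_author>.+?)"
    r"(?:\s+<(?P<email>[^>]+)>)?$",
    re.IGNORECASE,
)

with_email = OPTIONAL_EMAIL_REGEX.match("John Smith and John Doe <pair@example.com>")
without_email = OPTIONAL_EMAIL_REGEX.match("John Smith and John Doe")

print(with_email.group("email"))     # address without the angle brackets
print(without_email.group("email"))  # None when no address is present
```

With the address captured by the same expression, no second pass over the author string is needed to find it.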
I can see more examples in this archived repository: https://github.com/cloudfoundry/cflinuxfs2. They follow the same pattern: `Name Surname, Name Surname and Name Surname <oneemail@domain.com>`. The email sometimes comes from the first person and other times from the last.
I have improved the regular expression. The email is detected using <> and it is added as a new identity (without a name and username).
I included a test with that pattern.
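The approach described here could look roughly like the sketch below. The pattern is the one from the updated diff in this PR; `split_pair_identities` and the dict layout are hypothetical, and leaving `email` empty on the named identities is an assumption.

```python
import re

# Pattern from the updated diff: the email must sit inside <>.
AUTHOR_P2P_REGEX = re.compile(
    r"(?P<first_authors>.+?)\s+(?:[aA][nN][dD]|&|\+)\s+(?P<last_author>.+?)\s+<(?P<email>[^>]+)>"
)


def split_pair_identities(author):
    """Return one identity per name plus an email-only identity.

    Hypothetical helper; the dict layout is assumed for illustration.
    """
    match = AUTHOR_P2P_REGEX.match(author)
    if match is None:
        return []

    # Names before the separator are comma-separated; the last one follows it.
    names = [name.strip() for name in match.group("first_authors").split(",")]
    names.append(match.group("last_author").strip())

    identities = [
        {"name": name, "username": None, "email": None} for name in names if name
    ]
    # The shared address cannot be attributed to a single person, so it
    # becomes its own identity without a name or username.
    identities.append(
        {"name": None, "username": None, "email": match.group("email")}
    )
    return identities


identities = split_pair_identities(
    "John Smith, John Doe and Jane Rae <pairprogramming@example.com>"
)
```

Keeping the address as a separate identity avoids guessing whether it belongs to the first or the last person listed.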
73a62f1 to a0d850a
a0d850a to 8005822
sduenas left a comment
The PR looks good. I'm suggesting minor changes, mainly in the documentation.
| Field | Type | Description |
|-------|------|---------------------------------------------------|
| name | `String` | Name of the contributor |
| username | `String` | Username of the contributor |
| email | `String` | Email of the contributor |
| uuid | `String` | Unique identifier of the contributor |
| role | `String` | Role of the contributor. |
| source | `String` | Source of the identity. Always `git` for commits. |
Can you align the text and format of the table so it can be read when not rendered with a markdown reader?
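One way to do this without hand-editing is to pad each cell to its column width. The helper below is a minimal sketch, not part of the PR: `align_markdown_table` is a hypothetical name, it assumes every row has the same number of cells and no cell contains a literal `|`, and it does not preserve alignment colons in the separator row.

```python
def is_separator(cell):
    """True for a header-separator cell such as `---` or `:---:`."""
    return cell != "" and set(cell) <= set("-:")


def align_markdown_table(text):
    """Pad every cell so the columns line up when read as plain text."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
    ]
    # Column widths come from the content rows only, never the dashes.
    content = [row for row in rows if not all(is_separator(c) for c in row)]
    widths = [
        max(3, *(len(row[i]) for row in content)) for i in range(len(rows[0]))
    ]
    lines = []
    for row in rows:
        if all(is_separator(c) for c in row):
            cells = ["-" * width for width in widths]
        else:
            cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


table = """
| Field | Type | Description |
|-------|------|---------------------------------------------------|
| name | `String` | Name of the contributor |
| source | `String` | Source of the identity. Always `git` for commits. |
"""
aligned = align_markdown_table(table)
print(aligned)
```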
AUTHOR_P2P_REGEX = re.compile(
    r"(?P<first_authors>.+?)\s+(?:[aA][nN][dD]|&|\+)\s+(?P<last_author>.+?)\s+<(?P<email>[^>]+)>"
)
AUTHOR_P2P_REGEX = re.compile(
    r"(?P<first_authors>.+?)\s+(?:[aA][nN][dD]|&|\+)\s+(?P<last_author>.+?)\s+<(?P<email>[^>]+)>"
)

# Pair programming regex. Some matching examples are:
# - John Smith, John Doe and Jane Rae <pairprogramming@example.com>
# - John Smith, John Doe & Jane Rae <pairprogramming@example>
# - John Smith and John Doe <pairprogramming@example>
GIT_AUTHORS_REGEX = re.compile(
    r"(?P<first_authors>.+?)\s+(?:[aA][nN][dD]|&|\+)\s+(?P<last_author>.+?)\s+<(?P<email>[^>]+)>"
)
This commit introduces functionality to extract and eventize identities from Git commit data. It handles authors, committers, and signers, creating events for each identity found.
Signed-off-by: Jose Javier Merchante <jjmerchante@bitergia.com>
This commit updates the tests workflow to continue when the coveralls action fails, as suggested in one of their incident reports: https://status.coveralls.io/incidents/v5mcbrsbhgt4
Signed-off-by: Jose Javier Merchante <jjmerchante@bitergia.com>
8005822 to 51363d8
This PR introduces functionality to extract and eventize identities from Git commit data. It handles authors, committers, and signers, creating events for each identity found.
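As a rough picture of what this involves, the sketch below collects one identity record per author, committer, and signer, from which such events could be built. Everything here is an assumption made for illustration: the commit keys (`Author`, `Commit`, `Signed-off-by`), the role names, and the helper `identities_from_commit` are hypothetical, and `uuid` generation is left out because this conversation does not describe it.

```python
import re

# Hypothetical parser for "Name <email>" strings.
IDENTITY_REGEX = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^>]*)>$")

# Assumed mapping from commit fields to identity roles.
ROLE_FIELDS = {
    "Author": "author",
    "Commit": "committer",
    "Signed-off-by": "signer",
}


def identities_from_commit(commit):
    """Return one identity record per person found in the commit."""
    identities = []
    for field, role in ROLE_FIELDS.items():
        values = commit.get(field, [])
        if isinstance(values, str):
            values = [values]  # single-valued fields such as Author
        for value in values:
            match = IDENTITY_REGEX.match(value.strip())
            if match is None:
                continue
            identities.append(
                {
                    "name": match.group("name") or None,
                    "username": None,
                    "email": match.group("email") or None,
                    "role": role,
                    "source": "git",  # always `git` for commits
                }
            )
    return identities


commit = {
    "Author": "Jane Rae <jane@example.com>",
    "Commit": "John Doe <john@example.com>",
    "Signed-off-by": ["Jane Rae <jane@example.com>"],
}
records = identities_from_commit(commit)
```

The same person can appear under several roles, so each record keeps its role instead of being merged with the others.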