-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/extract-notes: Implement note extraction per version #1
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think this is, overall, great. I have some abstract thoughts about the direction I'd like to go with this, but I don't necessarily think we should hold this for them. I also left a few minor nitpicks on the implementation here; you can take them as suggestions, they're not required for merge. Just things I noticed that stuck out to me.
I think, overall, how I'd like a package like parser
to exist is to be a real, actual parser, not a regular expression. Regular expressions are tricky to maintain and parse; just today we had a release snafu because the regex powering the changelog parsing wasn't flexible enough to account for a valid changelog.
That being said, I think any parser package should be flexible enough to deal with different types of CHANGELOGs. For example, different header markers, having the word "Release" before the version name, whatever.
My thoughts on that are nascent, but it feels like a configurable "section break" token could allow us to split into sections (in this example, it may be \n##
), a "header break" token could allow us to split into header and body (in this case, it could just be the first \n
in the section) and then a "version" token could allow us to parse the version out (in this case, it could be the first v
in the header).
None of this is to suggest that you do this, I just wanted to record my thoughts as I read through the PR somewhere, as it seems like an interesting and potentially more reliable design direction to head in for a parser
package when I or anyone else gets some time to spend on it.
But in the interest of pragmatism, I think this is a fine start and unlocks some nice usage with fairly minimal cost. As we're on version 0 still, (technically, no releases) I think we can break the interface of the parser package when we get around to it, if necessary, to make some of these changes.
I will respond to each inline comment individually, but to your comment about regex - I understand and agree. My choice of regex came purely from the perspective of it being universal enough to deal with any changelog format. I assumed that universal changelog format was something we want to support by looking at the logic which generates changelog - i.e. I don't see any Markdown specific logic - just plain text templates that can contain anything, even invalid Markdown (for better or worse). In other words this was a practical choice of trying to provide parity/compatibility with the current state of the generating part of go-changelog. If we do go down the road of the parser understanding/prescribing certain formats like Markdown - which I think is a good idea and makes the whole thing less fragile and more maintainable - I think we should first ensure that we're actually generating a valid and parseable changelog. That doesn't mean we have to be generating changelog dynamically and abandon the TL;DR I went for consistency, rather than best solution here. |
I like the section break token idea, but I'm not sure whether parsing the header/version the way you're suggesting would be accurate enough. I would probably still advocate for a regexp pattern there, which should however be much more readable as it would only need to match against a single line. So perhaps to maintain the balance between fragility, readability and accuracy we need to leverage both real Markdown parser and regexp? 🤔 |
This would be super helpful to remove shell scripting workarounds like:
|
minor cleanup of changelog-gen command, rename to changelog-entry
Use case
Some projects may not be at such a scale that they would want or need to adopt any changelog generation mechanism. They may however still benefit from some automation when uploading the relevant changelog at release time, for example to GitHub Releases.
The output is assumed to match the output of
changelog-build
, given the default template and default regular expression in the parser.Implementation Details
The parser was designed with some other future use cases in mind - e.g.
SectionRange().BodyRange
can be used to identify the range where notes are and replace the relevant section of changelog; This allows regenerating an "unreleased" changelog straight after merging a PR.SectionRange().HeaderRange
can be used to identify the range of header and replace(Unreleased)
with a release dateNext version (Unreleased)
withVERSION (RELEASE-DATE)
The parser is as accurate as the regular expression that drives it, but I think the default one is flexible enough (basically just assumes valid Markdown
##
header, no particular versioning) to cover format that most HashiCorp projects use for their Changelogs.Usage
This also allows easy plugging into GoReleaser's interface, e.g.