Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/extract-notes: Implement note extraction per version #1

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

radeksimko
Copy link
Member

Use case

Some projects may not be at such a scale that they would want or need to adopt any changelog generation mechanism. They may however still benefit from some automation when uploading the relevant changelog at release time, for example to GitHub Releases.

The output is assumed to match the output of changelog-build, given the default template and default regular expression in the parser.

Implementation Details

The parser was designed with some other future use cases in mind - e.g.

  • SectionRange().BodyRange can be used to identify the range where notes are and replace the relevant section of changelog; This allows regenerating an "unreleased" changelog straight after merging a PR.
  • SectionRange().HeaderRange can be used to identify the range of header and replace (Unreleased) with a release date
    • possibly even automate Changelog updates in Vagrant's style - i.e. replace Next version (Unreleased) with VERSION (RELEASE-DATE)

The parser is as accurate as the regular expression that drives it, but I think the default one is flexible enough (basically just assumes valid Markdown ## header, no particular versioning) to cover format that most HashiCorp projects use for their Changelogs.

Usage

terraform-provider-google $ changelog-extract-notes -version 3.15.0
FEATURES:
...
IMPROVEMENTS:
...
BUG FIXES:
...

This also allows easy plugging into GoReleaser's interface, e.g.

$ goreleaser --release-notes <(changelog-extract-notes -version $GIT_TAG)

Copy link
Contributor

@paddycarver paddycarver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this is, overall, great. I have some abstract thoughts about the direction I'd like to go with this, but I don't necessarily think we should hold this for them. I also left a few minor nitpicks on the implementation here; you can take them as suggestions, they're not required for merge. Just things I noticed that stuck out to me.

I think, overall, how I'd like a package like parser to exist is to be a real, actual parser, not a regular expression. Regular expressions are tricky to maintain and parse; just today we had a release snafu because the regex powering the changelog parsing wasn't flexible enough to account for a valid changelog.

That being said, I think any parser package should be flexible enough to deal with different types of CHANGELOGs. For example, different header markers, having the word "Release" before the version name, whatever.

My thoughts on that are nascent, but it feels like a configurable "section break" token could allow us to split into sections (in this example, it may be \n## ), a "header break" token could allow us to split into header and body (in this case, it could just be the first \n in the section) and then a "version" token could allow us to parse the version out (in this case, it could be the first v in the header).

None of this is to suggest that you do this, I just wanted to record my thoughts as I read through the PR somewhere, as it seems like an interesting and potentially more reliable design direction to head in for a parser package when I or anyone else gets some time to spend on it.

But in the interest of pragmatism, I think this is a fine start and unlocks some nice usage with fairly minimal cost. As we're on version 0 still, (technically, no releases) I think we can break the interface of the parser package when we get around to it, if necessary, to make some of these changes.

@radeksimko
Copy link
Member Author

I will respond to each inline comment individually, but to your comment about regex - I understand and agree.

My choice of regex came purely from the perspective of it being universal enough to deal with any changelog format. I assumed that universal changelog format was something we want to support by looking at the logic which generates changelog - i.e. I don't see any Markdown specific logic - just plain text templates that can contain anything, even invalid Markdown (for better or worse).

In other words this was a practical choice of trying to provide parity/compatibility with the current state of the generating part of go-changelog.

If we do go down the road of the parser understanding/prescribing certain formats like Markdown - which I think is a good idea and makes the whole thing less fragile and more maintainable - I think we should first ensure that we're actually generating a valid and parseable changelog. That doesn't mean we have to be generating changelog dynamically and abandon the text/template, but we should at least run the output through the same markdown parser which would be then parsing it to validate that it's actually parseable.

TL;DR I went for consistency, rather than best solution here.

@radeksimko
Copy link
Member Author

My thoughts on that are nascent, but it feels like a configurable "section break" token could allow us to split into sections (in this example, it may be \n## ), a "header break" token could allow us to split into header and body (in this case, it could just be the first \n in the section) and then a "version" token could allow us to parse the version out (in this case, it could be the first v in the header).

I like the section break token idea, but I'm not sure whether parsing the header/version the way you're suggesting would be accurate enough. I would probably still advocate for a regexp pattern there, which should however be much more readable as it would only need to match against a single line.

So perhaps to maintain the balance between fragility, readability and accuracy we need to leverage both real Markdown parser and regexp? 🤔

@bflad
Copy link
Contributor

bflad commented Oct 15, 2020

This would be super helpful to remove shell scripting workarounds like:

sed -n -e "1{/# /d;}" -e "2{/^$/d;}" -e "/# $(git describe --abbrev=0 --exclude="$(git describe --abbrev=0 --match='v*.*.*' --tags)" --match='v*.*.*' --tags | tr -d v)/q;p" CHANGELOG.md > /tmp/RELEASE-NOTES.md

@hashicorp-cla
Copy link

hashicorp-cla commented Mar 12, 2022

CLA assistant check
All committers have signed the CLA.

mikemorris pushed a commit to mikemorris/go-changelog that referenced this pull request Apr 6, 2022
minor cleanup of changelog-gen command, rename to changelog-entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants