Add AI-powered release notes generator #5621
Thank you for your contribution. A few high-level comments before I go any further:
Agree with @rishabh6788.
def main():
    parser = argparse.ArgumentParser(description="Simple OpenSearch changelog processor")
    parser.add_argument("--token", help="GitHub token")
Read GITHUB_TOKEN from the environment variables rather than accepting it as an input. It also should not be needed for reading files. Check out release_notes for the existing approach.
As discussed, this is how your code should look after restructuring.
@gaiksaya Feel free to add your thoughts and comments.
We already have a git client at https://github.com/opensearch-project/opensearch-build/tree/main/src/git for git-related activities.
I'm fine with reusing the release notes class, but not with folding this into the git client: the current implementation works in the context of a checked-out git repo rather than through the GitHub APIs, and we need metadata for a git commit, such as the associated pull request numbers, which afaik
Force-pushed from 77c369f to 2e37c0c.
src/run_releasenotes_check.py
Outdated
# Generate AI-powered release notes
check_if_exists_then_delete([table_filename, urls_filename])

def get_baseline_date_from_github_tag(current_version: str) -> str:
This whole logic is not required. Given that we have a process to generate a release after the tag is cut, just call GET /repos/{owner}/{repo}/releases/latest, take its published_at date, and assume that is the baseline date for now.
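A minimal sketch of the reviewer's suggestion, using only the standard library. The helper names fetch_latest_release, baseline_date_from_release, and get_baseline_date are hypothetical; reading the token from GITHUB_TOKEN follows the earlier review comment. Per GitHub's REST documentation, releases/latest returns the most recent non-draft, non-prerelease release.

```python
import json
import os
import urllib.request
from datetime import datetime
from typing import Optional

GITHUB_API = "https://api.github.com"


def baseline_date_from_release(release: dict) -> str:
    """Turn a release's published_at timestamp into a YYYY-MM-DD baseline date."""
    # GitHub returns UTC timestamps such as "2025-06-24T17:05:12Z".
    published = datetime.strptime(release["published_at"], "%Y-%m-%dT%H:%M:%SZ")
    return published.strftime("%Y-%m-%d")


def fetch_latest_release(owner: str, repo: str, token: Optional[str] = None) -> dict:
    """Call GET /repos/{owner}/{repo}/releases/latest and return the decoded JSON body."""
    headers = {"Accept": "application/vnd.github+json"}
    # Prefer an explicit token, else fall back to the GITHUB_TOKEN environment variable.
    token = token or os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    url = f"{GITHUB_API}/repos/{owner}/{repo}/releases/latest"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def get_baseline_date(owner: str, repo: str) -> str:
    """Use the publish date of the latest release as the baseline date."""
    return baseline_date_from_release(fetch_latest_release(owner, repo))
```

For example, get_baseline_date("opensearch-project", "opensearch-build") would return the publish date of that repository's latest release, assuming a release is published there after each tag cut as described above. No checkout is needed.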
src/run_releasenotes_check.py
Outdated
raise ValueError(f"Failed to determine baseline date: {e}. Please use --date to specify the start date.")

# Get baseline date - use provided date or get from GitHub tag
current_version = manifests[0].build.version
nit: rename this to current_release_version.
src/run_releasenotes_check.py
Outdated
# Generate AI release notes for each component
for i, manifest in enumerate(manifests):
    manifest_path = args.manifest[i].name if i < len(args.manifest) else None
    for component in manifest.components.select():
This should be for component in manifest.components.select(focus=components, platform='linux'):, where components is args.components.
src/run_releasenotes_check.py
Outdated
for component in manifest.components.select():
    if hasattr(component, "repository"):
        # Filter by component if specified
        if args.component and args.component.lower() not in component.name.lower():
This can be deleted after fixing the component parsing logic.
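To illustrate why exact focus matching can replace the manual check, here is a minimal sketch. Component and select below are hypothetical stand-ins, not opensearch-build's manifest classes, and the assumption that an empty platforms list means "all platforms" is mine. The substring check from the snippet above would match more than intended: with this example data, a filter of security also picks up security-analytics.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Component:
    """Hypothetical stand-in for a manifest component."""
    name: str
    platforms: List[str] = field(default_factory=list)


def select(components: Iterable[Component], focus: Optional[List[str]] = None,
           platform: Optional[str] = None) -> List[Component]:
    """Keep components named exactly in focus that also support platform."""
    selected = []
    for component in components:
        if focus and component.name not in focus:
            continue
        # Assumption: an empty platforms list means the component supports every platform.
        if platform and component.platforms and platform not in component.platforms:
            continue
        selected.append(component)
    return selected


def substring_filter(components: Iterable[Component], needle: str) -> List[Component]:
    """The case-insensitive substring check from the snippet above, for comparison."""
    return [c for c in components if needle.lower() in c.name.lower()]
```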
I have yet to go through the code in depth, but here are some high-level pointers.
src/git/github_api_extension.py
Outdated
response = self._make_github_request(changelog_url)
if response and response.status_code == 200:
    import base64
nit: keep all the imports at the top
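With base64 imported at module level, decoding a GitHub contents API response can be a small helper. This is a sketch: the name decode_contents_response is hypothetical, and it assumes the JSON body of GET /repos/{owner}/{repo}/contents/{path}, which carries the file in a base64-encoded content field alongside an encoding field.

```python
import base64  # module-level import, as the reviewer asks
from typing import Optional


def decode_contents_response(payload: dict) -> Optional[str]:
    """Decode the file body from a GitHub contents API response."""
    if payload.get("encoding") != "base64":
        return None
    # GitHub wraps the base64 text with newlines; b64decode discards them by default.
    return base64.b64decode(payload["content"]).decode("utf-8")
```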
src/git/github_api_extension.py
Outdated
def get_changelog_from_github(self) -> Optional[str]:
    """Get CHANGELOG.md content from GitHub API."""
    if not self.github_token:
        logging.warning("No GitHub token provided, cannot fetch changelog via API")
Would this warning be displayed during a normal run, or only in debug mode?
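In Python's logging module, a record is emitted when its level is at or above the logger's effective level, and WARNING sits above both INFO and DEBUG. The sketch below checks this with Logger.isEnabledFor; the assumption that a normal run uses INFO and a debug run uses DEBUG is illustrative, not taken from the PR.

```python
import logging


def visible_levels(logger: logging.Logger) -> dict:
    """Report which standard levels this logger would actually emit."""
    levels = {"DEBUG": logging.DEBUG, "INFO": logging.INFO, "WARNING": logging.WARNING}
    return {name: logger.isEnabledFor(level) for name, level in levels.items()}


# Assumption: a normal run configures INFO, and a debug run configures DEBUG.
normal = logging.getLogger("release_notes.normal")
normal.setLevel(logging.INFO)
debug = logging.getLogger("release_notes.debug")
debug.setLevel(logging.DEBUG)
```

Under either configuration the warning is shown; it would be hidden only if the level were raised above WARNING.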
src/git/github_api_extension.py
Outdated
page += 1

# Rate limiting
time.sleep(0.1)
Would this be enough?
src/git/github_api_extension.py
Outdated
if until_date:
    params["until"] = until_date
Is this required? until_date should default to now.
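A sketch of the query parameters for GET /repos/{owner}/{repo}/commits with the until parameter dropped entirely. Leaving until out keeps the range open-ended, so the listing runs up to the latest commit. The helper name build_commit_params is hypothetical, as is widening a bare YYYY-MM-DD date to midnight UTC; GitHub documents since as an ISO 8601 timestamp and caps per_page at 100.

```python
from typing import Dict


def build_commit_params(since_date: str, page: int = 1, per_page: int = 100) -> Dict[str, str]:
    """Query parameters for one page of GET /repos/{owner}/{repo}/commits."""
    return {
        # Assumption: a YYYY-MM-DD baseline is widened to midnight UTC on that day.
        "since": f"{since_date}T00:00:00Z",
        "per_page": str(per_page),  # 100 is the maximum page size for this endpoint
        "page": str(page),
        # No "until": omitting it means "up to now".
    }
```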
except Exception as e:
    logging.warning(f"Failed to create Bedrock client: {e}")
    return None
We want to hard fail if client creation does not go through. Maybe raise an error instead of returning None?
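One way to hard fail, sketched with a generic factory so it runs without AWS credentials. ClientCreationError and create_client_or_fail are hypothetical names; in the PR the factory would presumably wrap boto3.client("bedrock-runtime", region_name=...), with the region left as an assumption.

```python
import logging
from typing import Any, Callable


class ClientCreationError(RuntimeError):
    """Raised when the LLM client cannot be created, so the run stops instead of continuing with None."""


def create_client_or_fail(factory: Callable[[], Any]) -> Any:
    """Build a client via factory, turning any failure into a hard error."""
    try:
        return factory()
    except Exception as e:
        logging.error(f"Failed to create Bedrock client: {e}")
        # Chain the original exception so the root cause stays in the traceback.
        raise ClientCreationError("Failed to create Bedrock client") from e
```

Callers then never need to check for None: either they get a client, or the run stops with the underlying cause attached.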
    else:
        baseline_version = f"{major-1}.0.0" if major > 0 else "2.0.0"
else:
    baseline_version = "3.0.0"
How did we decide on 3.0.0 as the baseline? Also, have we considered patch versions?
import requests
from bs4 import BeautifulSoup
nit: imports at the top
def _commit_release_notes(self, repo: GitRepository, component_name: str):
    """Commit release notes changes."""
    try:
        # Create release branch
        branch_name = f"{self.version}-release-notes"
        repo.execute(f"git checkout -b {branch_name}")

        # Add and commit
        repo.execute("git add release-notes/")
        repo.execute(f'git commit -m "Add {self.version} release notes for {component_name}"')

        logging.info(f"Committed release notes on branch {branch_name}")

    except Exception as e:
        logging.warning(f"Failed to commit release notes: {e}")
@rishabh6788 Since we already have the token for retrieval, I'm wondering: do we want to open the PR from here, or let CI take care of it?
Just realized: CI is the easier place when everything runs in the context of the build repo, as with the final release notes PR, which is raised in the build repo. Here, however, we will be creating PRs for each component repo, which is checked out through the GitRepository Python class and deleted as soon as processing is over, so it is much easier to add the pull request logic in Python code than in Jenkins.
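If the PR is opened from Python, the GitHub REST endpoint POST /repos/{owner}/{repo}/pulls takes a title, head, base, and body. The sketch below is hypothetical: the helper names, the default base of main, and the PR body text are assumptions, and it presumes the branch created by _commit_release_notes has already been pushed, which the snippet above does not do.

```python
import json
import os
import urllib.request
from typing import Optional


def build_pull_request_payload(version: str, component_name: str, branch_name: str,
                               base: str = "main") -> dict:
    """Request body for POST /repos/{owner}/{repo}/pulls."""
    return {
        "title": f"Add {version} release notes for {component_name}",
        "head": branch_name,  # must already be pushed to the remote
        "base": base,         # assumption: release notes target main
        "body": f"Adds release notes for {component_name} {version}.",
    }


def open_pull_request(owner: str, repo: str, payload: dict, token: Optional[str] = None) -> str:
    """Create the pull request and return its html_url."""
    token = token or os.environ["GITHUB_TOKEN"]
    request = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["html_url"]
```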
parser.add_argument(
    "--baseline-date",
    help="Baseline date for commit analysis (format: YYYY-MM-DD)"
)
The --date argument on line 35 can serve as the baseline. Check out its description; they should be the same, right?
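A sketch of a single --date flag doubling as the baseline, with format validation done by argparse itself. The existing --date definition is not shown in this thread, so the parser description and the parse_date helper are assumptions.

```python
import argparse
from datetime import date, datetime


def parse_date(value: str) -> date:
    """argparse type= callback that accepts only YYYY-MM-DD."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid date {value!r}, expected YYYY-MM-DD") from None


parser = argparse.ArgumentParser(description="Release notes check")
parser.add_argument(
    "--date",
    type=parse_date,
    help="Baseline date for commit analysis (format: YYYY-MM-DD)",
)
```

A malformed value such as --date 24/06/2025 makes argparse print the error and exit, so the rest of the script can rely on a real date object.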
As discussed, please move the LLM-related code in
Hi @RileyJergerAmazon @rishabh6788, this is a huge PR, so can we please get some test cases?
I would also appreciate it if the README were updated in both the root and the workflow subfolder.
Thanks!
src/git/git_repository.py
Outdated
self.dir = os.path.realpath(self.temp_dir.name)
self.dir = os.path.abspath(self.temp_dir.name)
What is the reason for this change?
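For context on why the swap matters: os.path.realpath resolves symbolic links, while os.path.abspath only makes a path absolute and normalizes it. The two can differ for a temporary checkout, for example on macOS, where /var is a symlink to /private/var. The small demo below (hypothetical helper names) shows the difference with an explicit symlink.

```python
import os
import tempfile


def compare_paths(path: str) -> tuple:
    """Return (abspath, realpath) for the same input."""
    return os.path.abspath(path), os.path.realpath(path)


def demo() -> tuple:
    """Check that abspath keeps a symlink while realpath resolves it to its target."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)  # start from a symlink-free base directory
        target = os.path.join(base, "checkout")
        os.mkdir(target)
        link = os.path.join(base, "link-to-checkout")
        os.symlink(target, link)
        absolute, resolved = compare_paths(link)
        return absolute == link, resolved == target
```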
src/git/github_api_extension.py
Outdated
def get_changelog_from_github(self) -> Optional[str]:
    """Get CHANGELOG.md content from GitHub API."""
    if not self.github_token:
What is the fallback path if the GitHub token is not found and you can't use the GitHub API to retrieve it? Also, can't we simply use HTTP to get the raw content of CHANGELOG.md? Why the API?
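A sketch of the plain-HTTP alternative the reviewer raises: public repositories serve file contents at raw.githubusercontent.com/{owner}/{repo}/{ref}/{path} without a token. The helper names and the default ref of main are assumptions; a release branch or tag may be the better ref in practice.

```python
import urllib.error
import urllib.request
from typing import Optional


def raw_changelog_url(owner: str, repo: str, ref: str = "main") -> str:
    """Raw-content URL for CHANGELOG.md at the given branch, tag, or commit."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/CHANGELOG.md"


def fetch_raw_changelog(owner: str, repo: str, ref: str = "main") -> Optional[str]:
    """Fetch CHANGELOG.md over plain HTTPS; public repos need no token."""
    try:
        with urllib.request.urlopen(raw_changelog_url(owner, repo, ref), timeout=30) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return None  # no CHANGELOG.md on this ref
        raise
```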
src/git/github_api_extension.py
Outdated
page += 1

# Rate limiting
time.sleep(0.1)
I really don't like using sleep here.
But if we really need it, we should probably sleep for 1 second instead of 0.1.
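Instead of a fixed sleep, the pause can be driven by the headers GitHub sends with each response: x-ratelimit-remaining and x-ratelimit-reset (a UTC epoch in seconds) for the primary limit, and retry-after (seconds) for secondary limits. This is a sketch; seconds_to_wait is a hypothetical helper that waits only when GitHub asks it to.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """How long to pause before the next GitHub API call, based on response headers."""
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    # Secondary rate limits: GitHub may send Retry-After in seconds.
    if "retry-after" in lowered:
        return float(lowered["retry-after"])
    # Primary rate limit: wait only once the quota is exhausted, until the reset time.
    if lowered.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in lowered:
        return max(0.0, float(lowered["x-ratelimit-reset"]) - now)
    return 0.0
```

In the pagination loop this would become time.sleep(seconds_to_wait(response.headers)), which is a no-op while quota remains.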
Force-pushed from fa4ca8a to 7eb1730.
baseline_date = self.date
try:
    # Check out the opensearch-build repository to get the last tag date
    with GitRepository(
We don't need this. As I mentioned, just call the releases API; it gives you the latest release details without checking out the repo.
"""Generate AI-powered release notes for a component."""
with TemporaryDirectory(chdir=True) as work_dir:
    # Initialize processor with initial date
    processor = Processor(build_version, self.date)
There is no need to re-initialize this for each component; please move it into the __init__ method.
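A sketch of the suggested restructuring: build the processor once in __init__ and reuse it for every component. Both classes are hypothetical stand-ins (the PR's real Processor and generator classes are not shown in full here), and the generate method returns a placeholder string purely to show that the shared instance is used.

```python
class Processor:
    """Hypothetical stand-in for the PR's Processor; counts how often it is built."""
    instances = 0

    def __init__(self, build_version: str, date: str):
        Processor.instances += 1
        self.build_version = build_version
        self.date = date


class ReleaseNotesGenerator:
    """Hypothetical wrapper that builds the processor once and reuses it."""

    def __init__(self, build_version: str, date: str):
        self.processor = Processor(build_version, date)

    def generate(self, component_name: str) -> str:
        # Placeholder output; the point is that self.processor is shared across calls.
        return f"{component_name}: {self.processor.build_version} since {self.processor.date}"
```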
release_notes = ReleaseNotesComponents.from_component(component, build_version, build_qualifier, repo.dir)

# Try to fetch CHANGELOG.md directly from GitHub
content = processor.fetch_changelog_from_github(component)
This can be further simplified to:
changelog_exist = os.path.isfile(os.path.join(repo.dir, 'CHANGELOG.md'))
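Building on that one-liner, a sketch that reads the changelog straight from the already checked-out repo and returns None when there is none, so no API call or token is involved. read_local_changelog is a hypothetical helper; repo_dir corresponds to repo.dir above.

```python
import os
from typing import Optional


def read_local_changelog(repo_dir: str) -> Optional[str]:
    """Return CHANGELOG.md from a checked-out repo, or None if the repo has none."""
    path = os.path.join(repo_dir, "CHANGELOG.md")
    changelog_exist = os.path.isfile(path)  # the reviewer's one-line check
    if not changelog_exist:
        return None
    with open(path, encoding="utf-8") as f:
        return f.read()
```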
Signed-off-by: Riley Jerger <[email protected]>
This fixes the edit made in the previous commit. Signed-off-by: RileyJergerAmazon <[email protected]> Signed-off-by: Riley Jerger <[email protected]>
Did not mean to edit this file; just want it to be the same as main. Signed-off-by: RileyJergerAmazon <[email protected]> Signed-off-by: Riley Jerger <[email protected]>
Signed-off-by: Riley Jerger <[email protected]>
Signed-off-by: Rishabh Singh <[email protected]> Signed-off-by: Riley Jerger <[email protected]>
<[email protected]> Signed-off-by: Riley Jerger <[email protected]>
Signed-off-by: Riley Jerger <[email protected]>
Signed-off-by: Riley Jerger <[email protected]>
Force-pushed from c3f502a to 38c9aa0.
src/git/git_commit_processor.py
Outdated
    response.raise_for_status()
    return response.json()
except requests.exceptions.RequestException as e:
    print(f"Error making request to {url}: {e}")
Please use a logger instead of a print statement.
Thanks!
Converted print statements to logging statements in git_commit_processor.py, ai_release_notes_generator.py, and run_releasenotes_check.py for better log management. Signed-off-by: Riley Jerger <[email protected]>
Added unit tests for the GitHubCommitProcessor class and updated existing tests to accommodate the recent changes from print statements to logging. Signed-off-by: Riley Jerger <[email protected]>
Enhanced AI release notes generation with improved prompts, updated dependencies, and fixed test cases. Signed-off-by: Riley Jerger <[email protected]>
Enhanced git repository functionality, updated prompts for AI release notes, and improved test coverage. Signed-off-by: Riley Jerger <[email protected]>
Enhanced documentation for AI release notes generation, including prerequisites and improved prompt instructions for changelog parsing. Signed-off-by: Riley Jerger <[email protected]>
Force-pushed from 3b02ff9 to 3cc6a59.
Signed-off-by: Rishabh Singh <[email protected]>
Signed-off-by: Rishabh Singh <[email protected]>
Signed-off-by: Rishabh Singh <[email protected]>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5621      +/-   ##
==========================================
+ Coverage   96.45%   96.47%   +0.01%
==========================================
  Files         398      404        +6
  Lines       17550    18372      +822
==========================================
+ Hits        16928    17724      +796
- Misses        622      648       +26
Signed-off-by: Rishabh Singh <[email protected]>
Signed-off-by: Rishabh Singh <[email protected]>
Force-pushed from 6ad0500 to 278f719.
Signed-off-by: Rishabh Singh <[email protected]>
Force-pushed from 278f719 to 90877ce.
Description
Issues Resolved
Closes #5614 (Generate release notes using LLMs for OS and OSD components).
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.