Conversation

RileyJergerAmazon
Contributor

Description

  • Add simple_changelog_processor.py to src/release_notes_workflow/
  • Implements OpenSearch release notes generation using AI analysis
  • Follows existing codebase patterns with proper imports and structure
  • Uses TemporaryDirectory and GitRepository for safe operations
  • Integrates with AWS Bedrock Claude for intelligent categorization
  • Supports both API-based and git-based repository processing
  • Includes comprehensive error handling and retry logic

Issues Resolved

Closes Generate release notes using LLMs for OS and OSD components. (#5614).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@rishabh6788
Collaborator

Thank you for your contribution. A few high-level comments before I go any further:

@gaiksaya
Member

gaiksaya commented Jul 7, 2025

Agree with @rishabh6788
Instead of a bulky single script, let's try to make it object oriented. It's easier to maintain, test (tests need to be added), and extend in the future.
I would also recommend using logging instead of print statements.
Also, I think we can take the input manifest as the input and iterate over all the repos. This can be done at the CI level, where everything can run in parallel, or here at the workflow level.


def main():
    parser = argparse.ArgumentParser(description="Simple OpenSearch changelog processor")
    parser.add_argument("--token", help="GitHub token")
Member

Search for GITHUB_TOKEN in the environment variables rather than taking it as an input. Also, it should not be needed for reading files. Check out release_notes for the existing approach.
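
A minimal sketch of this suggestion, assuming the token lives in the GITHUB_TOKEN environment variable. The helper name github_headers and its env parameter are hypothetical and only illustrate the pattern; they are not part of the PR.

```python
import os
from typing import Dict, Mapping, Optional


def github_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build request headers, adding auth only when GITHUB_TOKEN is set.

    `env` defaults to os.environ; it is a parameter only so the helper
    can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        # Public files can be read without this header.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

With no token set, the helper still returns usable headers, which matches the point that a token should not be required just to read files.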

@rishabh6788
Collaborator

rishabh6788 commented Jul 7, 2025

As discussed, this is how your code should look after restructuring.

  • Start with run_generate_change_log.py, which will have the main() method, something like:
def main():
    """Main workflow orchestration following established patterns."""
    # Parse arguments
    args = ReleaseNotesArgs()

    # Configure logging
    console.configure(level=logging.INFO if args.verbose else logging.WARNING)

    # Load manifest
    manifest = InputManifest.from_file(args.manifest)

    # Start processing in a temp directory context that gets cleaned up after execution
    with TemporaryDirectory(keep=args.keep, chdir=True) as work_dir:

        # Inside the temp dir context, loop over all components and filter if component args are provided
        for component in manifest.components.select(focus=components, platform=target.platform):
            logging.info(f"Building {component.name}")
            # Make the decision here whether you want to go the CHANGELOG route or the commit route
            # See https://github.com/opensearch-project/opensearch-build/blob/main/src/run_build.py#L87C9-L103

...
...
...
  • You can have a GitHub client class to handle all git-related calls, like:
from typing import Dict, List, Optional

import requests


class GitHubClient:
    """GitHub API client with rate limiting and error handling."""

    def __init__(self, token: str):
        self.token = token
        self.headers = {"Authorization": f"token {token}"}
        self.session = requests.Session()
        self.session.headers.update(self.headers)

    def get_changelog(self, repo_name: str) -> Optional[str]:
        # process change log code
        pass

    def get_commits_since_date(self, repo_name: str, since_date: str) -> List[Dict]:
        pass

    def _make_request(self, url: str, params: Optional[dict] = None) -> Optional[requests.Response]:
        pass
  • Data handling and processing can have a changelog processing method and a commit processing method, like:
class DataProcessor:

    def process_changelog(self, repo_name: str, data: str) -> Dict[str, Any]:
        """Process changelog data."""
        pass

    def extract_commits(self, repo_name: str, baseline_date: str) -> List[Dict]:
        """Extract commits since baseline."""
        pass
  • The main execution can happen inside the AIReleaseNotesGenerator class:
class AIReleaseNotesGenerator:
    """AI-powered release notes generator using AWS Bedrock."""

    def __init__(<insert params>):
        self.bedrock_client = self._create_bedrock_client()
        self.github_client = GitHubClient(target.github_token)

    def generate_release_notes(self, repositories: List[str]) -> Dict[str, Any]:
        """Generate AI-powered release notes."""

    def process_repository(self, repo_name: str) -> Dict[str, Any]:
        """Process individual repository and make decision on changelog vs commit"""
          

@gaiksaya Feel free to add your thoughts and comments.

@gaiksaya
Member

gaiksaya commented Jul 7, 2025

We already have a git client here: https://github.com/opensearch-project/opensearch-build/tree/main/src/git for git-related activities.
Check if we can add to it or extend it rather than reinventing it for another use case.
Also, for releaseNotesGenerator, maybe add an AI option to the existing workflow https://github.com/opensearch-project/opensearch-build/blob/main/src/release_notes_workflow/release_notes.py (main class + args, etc.)

@rishabh6788
Collaborator

rishabh6788 commented Jul 7, 2025

We already have a git client here: https://github.com/opensearch-project/opensearch-build/tree/main/src/git for git-related activities. Check if we can add to it or extend it rather than reinventing it for another use case. Also, for releaseNotesGenerator, maybe add an AI option to the existing workflow https://github.com/opensearch-project/opensearch-build/blob/main/src/release_notes_workflow/release_notes.py (main class + args, etc.)

I'm good with reusing the release notes class but not inclined towards the git client, because the current implementation works in the context of a checked-out git repo rather than the GitHub APIs. We need metadata for each git commit, such as the associated pull request numbers, which afaik git log will not give. There is no git command for it either, since a pull request is a GitHub concept and not a git one.
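
To illustrate the point about pull request metadata: the GitHub REST API exposes GET /repos/{owner}/{repo}/commits/{commit_sha}/pulls, which lists the pull requests associated with a commit. The sketch below only builds that URL and extracts the PR numbers from an already-decoded response; the function names are hypothetical and no request is made.

```python
from typing import Any, Dict, List

API_ROOT = "https://api.github.com"


def commit_pulls_url(owner: str, repo: str, commit_sha: str) -> str:
    """URL of the 'pull requests associated with a commit' endpoint."""
    return f"{API_ROOT}/repos/{owner}/{repo}/commits/{commit_sha}/pulls"


def pull_request_numbers(payload: List[Dict[str, Any]]) -> List[int]:
    """Extract PR numbers from the decoded JSON list returned by that endpoint."""
    return sorted(item["number"] for item in payload if "number" in item)
```

An empty list simply means GitHub knows of no pull request for that commit, which a caller would have to handle (for example by falling back to the commit message).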

@RileyJergerAmazon RileyJergerAmazon force-pushed the AI_release_notes_generation branch 2 times, most recently from 77c369f to 2e37c0c Compare July 8, 2025 22:35
# Generate AI-powered release notes
check_if_exists_then_delete([table_filename, urls_filename])

def get_baseline_date_from_github_tag(current_version: str) -> str:
Collaborator

This whole logic is not required. Given that we have a process to generate a release after the tag cut, just call GET /repos/{owner}/{repo}/releases/latest, read the published_at date, and assume that is the baseline date for now.
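
A minimal sketch of that approach, assuming the response is the standard release object whose published_at field is an ISO 8601 UTC timestamp such as 2025-06-24T18:30:00Z. The function names are hypothetical, and the fetch helper uses only the standard library rather than whatever HTTP client the PR uses.

```python
import json
from datetime import datetime, timezone
from urllib.request import Request, urlopen

API_ROOT = "https://api.github.com"


def latest_release_url(owner: str, repo: str) -> str:
    """URL of the 'get the latest release' endpoint."""
    return f"{API_ROOT}/repos/{owner}/{repo}/releases/latest"


def baseline_date_from_release(release: dict) -> str:
    """Return the release's published_at date as YYYY-MM-DD."""
    published = datetime.strptime(release["published_at"], "%Y-%m-%dT%H:%M:%SZ")
    return published.replace(tzinfo=timezone.utc).date().isoformat()


def fetch_latest_release(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch the latest release object (performs a network call)."""
    request = Request(
        latest_release_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Usage would be baseline_date_from_release(fetch_latest_release("opensearch-project", "opensearch-build")), with no repository checkout involved.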

raise ValueError(f"Failed to determine baseline date: {e}. Please use --date to specify the start date.")

# Get baseline date - use provided date or get from GitHub tag
current_version = manifests[0].build.version
Collaborator

nit: current_release_version

# Generate AI release notes for each component
for i, manifest in enumerate(manifests):
    manifest_path = args.manifest[i].name if i < len(args.manifest) else None
    for component in manifest.components.select():
Collaborator

This should be: for component in manifest.components.select(focus=components, platform='linux'):

Collaborator

components is args.components.

for component in manifest.components.select():
    if hasattr(component, "repository"):
        # Filter by component if specified
        if args.component and args.component.lower() not in component.name.lower():
Collaborator

This can be deleted after fixing the component parsing logic.

Member

@gaiksaya gaiksaya left a comment

I have yet to go through the code in depth, but here are some overview pointers.


response = self._make_github_request(changelog_url)
if response and response.status_code == 200:
    import base64
Member

nit: keep all the imports at the top

def get_changelog_from_github(self) -> Optional[str]:
    """Get CHANGELOG.md content from GitHub API."""
    if not self.github_token:
        logging.warning("No GitHub token provided, cannot fetch changelog via API")
Member

Would the warning logger be displayed during normal run mode or would it be only for debug mode?
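
For context on this question: in Python's logging module a record is emitted when its level is at or above the logger's effective level, so a logger configured at WARNING (the non-verbose setting in the main() sketch earlier in this thread) still shows warnings. The small check below demonstrates that with a throwaway logger; the logger name and helper are hypothetical.

```python
import logging
from typing import List


def visible_levels(configured_level: int) -> List[str]:
    """Return the names of the levels a logger at `configured_level` would emit."""
    logger = logging.getLogger("release_notes_demo")  # throwaway logger name
    logger.setLevel(configured_level)
    candidates = (
        ("DEBUG", logging.DEBUG),
        ("INFO", logging.INFO),
        ("WARNING", logging.WARNING),
        ("ERROR", logging.ERROR),
    )
    return [name for name, level in candidates if logger.isEnabledFor(level)]
```

Under that assumption, the warning would be displayed in both normal and verbose runs, while info messages would appear only in verbose runs.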

page += 1

# Rate limiting
time.sleep(0.1)
Member

Would this be enough?

Comment on lines 62 to 63
if until_date:
    params["until"] = until_date
Member

Is this required? until_date should default to now.
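
A sketch of the simplification this implies, assuming the GitHub "list commits" endpoint, whose since and until query parameters are both optional. Leaving until out means no upper bound is applied, so commits up to the present are returned. The helper name and defaults are hypothetical.

```python
from typing import Dict, Optional


def commit_query_params(
    since: str,
    until: Optional[str] = None,
    page: int = 1,
    per_page: int = 100,
) -> Dict[str, str]:
    """Build query parameters for listing commits; `until` is sent only if given."""
    params = {"since": since, "page": str(page), "per_page": str(per_page)}
    if until:
        params["until"] = until
    return params
```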

Comment on lines 52 to 54
except Exception as e:
    logging.warning(f"Failed to create Bedrock client: {e}")
    return None
Member

We want to hard fail if client creation does not go through. Maybe change it to error instead of returning None?
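
One way to hard fail, sketched under the assumption that client creation is wrapped in a single factory call. BedrockClientError and create_client_or_fail are hypothetical names; the real code would call its own client constructor inside the factory.

```python
import logging
from typing import Any, Callable


class BedrockClientError(RuntimeError):
    """Raised when the Bedrock client cannot be created (hypothetical type)."""


def create_client_or_fail(factory: Callable[[], Any]) -> Any:
    """Call `factory`; log at error level and re-raise instead of returning None."""
    try:
        return factory()
    except Exception as e:
        logging.error(f"Failed to create Bedrock client: {e}")
        raise BedrockClientError("Bedrock client creation failed") from e
```

Raising keeps the original exception attached as the cause, so the failure stops the run and the root error remains visible in the traceback.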

else:
baseline_version = f"{major-1}.0.0" if major > 0 else "2.0.0"
else:
baseline_version = "3.0.0"
Member

How did we decide on 3.0.0 as baseline? Also have we considered patch version?

Comment on lines 349 to 350
import requests
from bs4 import BeautifulSoup
Member

nit: imports at the top

Comment on lines 473 to 487
def _commit_release_notes(self, repo: GitRepository, component_name: str):
    """Commit release notes changes."""
    try:
        # Create release branch
        branch_name = f"{self.version}-release-notes"
        repo.execute(f"git checkout -b {branch_name}")

        # Add and commit
        repo.execute("git add release-notes/")
        repo.execute(f'git commit -m "Add {self.version} release notes for {component_name}"')

        logging.info(f"Committed release notes on branch {branch_name}")

    except Exception as e:
        logging.warning(f"Failed to commit release notes: {e}")
Member

@rishabh6788 Since we already have the token for retrieval, I'm wondering whether we want to generate the PR from here or let CI take care of it.

Collaborator

Just realized that it is easier in CI because everything runs in the context of the build repo, as the final release notes PR is raised in the build repo. Here, however, we will be creating PRs for each component repo, which gets checked out in the context of the GitRepository Python class and is deleted as soon as processing is over, so it is much easier to add the pull request logic in Python code than in Jenkins.

Comment on lines 56 to 59
parser.add_argument(
    "--baseline-date",
    help="Baseline date for commit analysis (format: YYYY-MM-DD)"
)
Member

The --date on line 35 can be the baseline. Check out its description. It should be the same, right?

@rishabh6788
Collaborator

As discussed, please move the LLM-related code into the llms directory and move all the prompts to a constants file, prompts.py, in the llms repo.
Also, simplify the generate method in the ReleaseNotes class as shown below:

def generate(self, component: InputComponentFromSource, build_version: str, build_qualifier: str):
    with TemporaryDirectory(chdir=True) as work_dir:
        with GitRepository(
                component.repository,
                component.ref,
                os.path.join(work_dir.name, component.name),
                component.working_directory
        ) as repo:

            release_notes = ReleaseNotesComponents.from_component(component, build_version, build_qualifier, repo.dir)

            if os.path.isfile(os.path.join(repo.dir, 'CHANGELOG.md')):
                # Do all the CHANGELOG-related LLM stuff here, add file in the repo's release-notes folder
                pass
            else:
                # Get the git commit messages and do the LLM stuff here, add file in the repo's release-notes folder
                pass
            create_pr()

Member

@peterzhuamazon peterzhuamazon left a comment

Hi @RileyJergerAmazon @rishabh6788, this is a huge PR, so can we please get some test cases?

I would also appreciate it if the README is updated in both the root and the workflows subfolder.

Thanks!

Comment on lines 31 to 32
self.dir = os.path.realpath(self.temp_dir.name)
self.dir = os.path.abspath(self.temp_dir.name)
Member

Reason for the change right here?


def get_changelog_from_github(self) -> Optional[str]:
    """Get CHANGELOG.md content from GitHub API."""
    if not self.github_token:
Member

What is the alternative path if the GitHub token is not found and you can't use the GitHub API to retrieve it? Also, can't we simply use HTTP to get the raw content of CHANGELOG.md? Why the API?
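
A sketch of the raw-content alternative, assuming a public repository hosted on github.com. Raw files are served from raw.githubusercontent.com under /{owner}/{repo}/{ref}/{path}, and no token is needed for public repositories. The helper name is hypothetical and it only builds the URL; it does not download anything.

```python
from urllib.parse import urlparse


def raw_changelog_url(repository_url: str, ref: str, filename: str = "CHANGELOG.md") -> str:
    """Map a GitHub repository URL and ref to the raw URL of a file at the repo root."""
    path = urlparse(repository_url).path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    owner, repo = path.split("/")[:2]
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{filename}"
```

A 404 from the resulting URL would indicate that the repository has no CHANGELOG.md at that ref, which could trigger the commit-based path.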

page += 1

# Rate limiting
time.sleep(0.1)
Member

I really don't like using sleep here.
But if we really need it, we should probably sleep for 1 second instead of 0.1.
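
An alternative to a fixed sleep, sketched here as an assumption rather than what the PR does: GitHub API responses carry X-RateLimit-Remaining and X-RateLimit-Reset (epoch seconds) headers, so the client can wait only when the quota is actually exhausted. The helper name is hypothetical, and it takes a plain mapping, so header names must match exactly.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request, based on rate-limit headers.

    Returns 0.0 while requests remain; otherwise the time left until the
    reset timestamp. `now` is injectable so the logic can be tested.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)
```

A pagination loop could then call time.sleep(seconds_until_reset(response.headers)) after each page, which is a no-op in the common case.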

@RileyJergerAmazon RileyJergerAmazon force-pushed the AI_release_notes_generation branch 2 times, most recently from fa4ca8a to 7eb1730 Compare July 14, 2025 18:32
baseline_date = self.date
try:
    # Check out the opensearch-build repository to get the last tag date
    with GitRepository(
Collaborator

We don't need this. Like I mentioned, just call the releases API and it should give you the latest release details without checking out the repo.

"""Generate AI-powered release notes for a component."""
with TemporaryDirectory(chdir=True) as work_dir:
    # Initialize processor with initial date
    processor = Processor(build_version, self.date)
Collaborator

There is no need to re-initialize this for each component. Please move it into the __init__ method.
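
A minimal sketch of the requested change, using stand-in classes. StubProcessor and ReleaseNotesSketch are hypothetical and only mirror the shape of the PR's Processor and ReleaseNotes; the counter exists to show that construction happens once.

```python
class StubProcessor:
    """Stand-in for the PR's Processor; counts how often it is constructed."""

    instances = 0

    def __init__(self, build_version: str, date: str) -> None:
        StubProcessor.instances += 1
        self.build_version = build_version
        self.date = date


class ReleaseNotesSketch:
    """Stand-in for the release notes class that owns a single processor."""

    def __init__(self, build_version: str, date: str) -> None:
        # Built once here and shared by every generate() call.
        self.processor = StubProcessor(build_version, date)

    def generate(self, component_name: str) -> str:
        # Per-component work reuses self.processor instead of creating a new one.
        return f"{component_name}@{self.processor.build_version}"
```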

release_notes = ReleaseNotesComponents.from_component(component, build_version, build_qualifier, repo.dir)

# Try to fetch CHANGELOG.md directly from GitHub
content = processor.fetch_changelog_from_github(component)
Collaborator

This can be further simplified to the following:
changelog_exist = os.path.isfile(os.path.join(repo.dir, 'CHANGELOG.md'))

RileyJergerAmazon and others added 8 commits July 21, 2025 20:51
Signed-off-by: Riley Jerger <[email protected]>
This is to fix the editing the commit made before.

Signed-off-by: RileyJergerAmazon <[email protected]>
Signed-off-by: Riley Jerger <[email protected]>
Did not mean to edit this file just want it to be the same as main.

Signed-off-by: RileyJergerAmazon <[email protected]>
Signed-off-by: Riley Jerger <[email protected]>
Signed-off-by: Rishabh Singh <[email protected]>
Signed-off-by: Riley Jerger <[email protected]>
Signed-off-by: Riley Jerger <[email protected]>
Signed-off-by: Riley Jerger <[email protected]>
@RileyJergerAmazon RileyJergerAmazon force-pushed the AI_release_notes_generation branch from c3f502a to 38c9aa0 Compare July 21, 2025 20:53
    response.raise_for_status()
    return response.json()
except requests.exceptions.RequestException as e:
    print(f"Error making request to {url}: {e}")
Member

Please use a logger instead of print statement.
Thanks!

Converted print statements to logging statements in git_commit_processor.py, ai_release_notes_generator.py, and run_releasenotes_check.py for better log management.

Signed-off-by: Riley Jerger <[email protected]>
Added unit tests for the GitHubCommitProcessor class and updated existing tests to accommodate the recent changes from print statements to logging.

Signed-off-by: Riley Jerger <[email protected]>
Enhanced AI release notes generation with improved prompts, updated dependencies, and fixed test cases.

Signed-off-by: Riley Jerger <[email protected]>
Enhanced git repository functionality, updated prompts for AI release notes, and improved test coverage.

Signed-off-by: Riley Jerger <[email protected]>
Enhanced documentation for AI release notes generation, including prerequisites and improved prompt instructions for changelog parsing.

Signed-off-by: Riley Jerger <[email protected]>
@rishabh6788 rishabh6788 force-pushed the AI_release_notes_generation branch from 3b02ff9 to 3cc6a59 Compare July 28, 2025 23:04

codecov bot commented Jul 29, 2025

Codecov Report

❌ Patch coverage is 87.81559% with 111 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.47%. Comparing base (3ec12d5) to head (90877ce).
⚠️ Report is 4 commits behind head on main.

Files with missing lines                                  Patch %   Lines
src/run_releasenotes_check.py                             11.76%    75 Missing ⚠️
src/git/git_commit_processor.py                           78.08%    32 Missing ⚠️
src/release_notes_workflow/release_notes.py               97.61%    1 Missing ⚠️
src/validation_workflow/rpm/validation_rpm.py             0.00%     1 Missing ⚠️
tests/tests_git/test_git_commit_processor.py              99.27%    1 Missing ⚠️
...ests/tests_llms/test_ai_release_notes_generator.py     99.51%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5621      +/-   ##
==========================================
+ Coverage   96.45%   96.47%   +0.01%     
==========================================
  Files         398      404       +6     
  Lines       17550    18372     +822     
==========================================
+ Hits        16928    17724     +796     
- Misses        622      648      +26     


Signed-off-by: Rishabh Singh <[email protected]>
Signed-off-by: Rishabh Singh <[email protected]>
@rishabh6788 rishabh6788 force-pushed the AI_release_notes_generation branch 2 times, most recently from 6ad0500 to 278f719 Compare July 29, 2025 21:38
Signed-off-by: Rishabh Singh <[email protected]>
@rishabh6788 rishabh6788 force-pushed the AI_release_notes_generation branch from 278f719 to 90877ce Compare July 29, 2025 21:46
@rishabh6788 rishabh6788 merged commit 1a5523f into opensearch-project:main Jul 29, 2025
16 of 17 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In Review to ✅ Done in Engineering Effectiveness Board Jul 29, 2025
4 participants