This repository contains tools and automation for collecting and analyzing Pull Request (PR) statistics for Jenkins plugins. It helps track open, merged, and failing PRs across the Jenkins ecosystem.
The system collects PR data from GitHub repositories related to Jenkins plugins, processes it, and uploads statistics to Google Sheets for analysis. Collection runs automatically via GitHub Actions and can also be run manually when needed.
- `jenkins-pr-collector.go` - Main data collection script written in Go
  - Queries GitHub's GraphQL API to fetch PR data for Jenkins plugins (see the illustrative sketch after this list)
  - Usage: `go run jenkins-pr-collector.go -start "YYYY-MM-DD" -end "YYYY-MM-DD" -output "output_file.json"`
  - Logs output to stdout/stderr for monitoring
- `collect-monthly.sh` - Collects PR data for a specific month
  - Parameters:
    - `YYYY-MM`: Target month (optional, defaults to last month)
    - `UPDATE_SHEETS`: Boolean flag to update Google Sheets (optional, defaults to false)
  - Creates monthly data files in `data/monthly/`
  - Updates consolidated data files in `data/consolidated/`
  - Usage: `./collect-monthly.sh "2024-03" true`
  - Logs progress and errors to stdout
- `count_prs.sh` - Counts pull requests for specified repositories
  - Takes a text file containing repository names and a year
  - Generates repository-specific PR statistics
  - Usage: `./count_prs.sh repos.txt 2024`
  - Outputs counts to stdout and generates a summary report
- `compute-stats.sh` - Generates detailed PR statistics for specific users
  - Analyzes PR patterns and contributions
  - Parameters:
    - List of GitHub usernames (comma-separated)
    - Date range (start and end dates)
  - Usage: `./compute-stats.sh user1,user2 YYYY-MM-DD YYYY-MM-DD`
  - Outputs a detailed statistics report
- `group-prs.sh` - Processes and groups PR data by title and status
  - Called by `collect-monthly.sh`
  - Requires a `plugins.json` file for plugin information
  - Usage: `./group-prs.sh "input_file.json" "plugins.json"`
  - Logs grouping statistics to stdout
- `retry-collection.sh` - Bulk data collection script with a retry mechanism
  - Collects data from July 2024 onwards
  - Implements exponential backoff for failed attempts
  - Updates Google Sheets only after all data is collected
  - Usage: `./retry-collection.sh`
  - Logs retry attempts and progress to stdout
- `upload_to_sheets.py` - Python script for uploading data to Google Sheets
  - Requires Google Sheets API credentials
  - Called by other scripts when `UPDATE_SHEETS` is true
  - Logs upload status and any API errors to stdout
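
To give a feel for the kind of GraphQL request the collector makes, here is a minimal, self-contained sketch of searching GitHub for plugin PRs in a date range. The query string, search filter, and field selection are illustrative assumptions, not the actual code in `jenkins-pr-collector.go`:

```go
// Minimal sketch: query GitHub's GraphQL search API for PRs in a date range.
// The search filter and selected fields are illustrative assumptions, not the
// fields used by jenkins-pr-collector.go.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	query := `
	  query($search: String!) {
	    search(query: $search, type: ISSUE, first: 50) {
	      nodes {
	        ... on PullRequest { title state url mergedAt repository { nameWithOwner } }
	      }
	    }
	  }`

	// Hypothetical search filter: PRs in the jenkinsci org created in March 2024.
	variables := map[string]string{
		"search": "org:jenkinsci is:pr created:2024-03-01..2024-03-31",
	}

	body, _ := json.Marshal(map[string]interface{}{"query": query, "variables": variables})
	req, _ := http.NewRequest("POST", "https://api.github.com/graphql", bytes.NewReader(body))
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GH_TOKEN"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// Print the raw response; the real collector parses this into PR records.
	var result map[string]interface{}
	json.NewDecoder(resp.Body).Decode(&result)
	out, _ := json.MarshalIndent(result, "", "  ")
	fmt.Println(string(out))
}
```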
```
.
├── data/
│   ├── monthly/          # Monthly PR data files
│   ├── consolidated/     # Consolidated data files
│   ├── archive/          # Archived data (older than 6 months)
│   └── backup/           # Backup directory for data files
├── .github/
│   └── workflows/        # GitHub Actions workflow files
├── updatecli/
│   ├── updatecli.d/      # Updatecli manifests
│   └── .                 # Configuration values for Updatecli
└── scripts/              # Collection and processing scripts
```
- Monthly Collection (2nd of each month)
  - Runs a full data collection for the previous month
  - Updates consolidated statistics
  - Updates Google Sheets
  - Creates a backup of all data before running
  - Logs available in GitHub Actions run history
  - Expected duration: 15–30 minutes
- Daily Updates (midnight UTC)
  - Updates the current month's data
  - Updates open and failing PR statistics
  - Updates Google Sheets with the latest data
  - Creates a backup of current data
  - Logs available in GitHub Actions run history
  - Expected duration: 5–10 minutes
- Daily Check (midnight UTC)
  - Checks for updates to the `top-250-plugins.csv` file from the upstream source
  - Creates a pull request when changes are detected
  - Updates the local file with the latest content
  - Logs available in GitHub Actions run history
  - Expected duration: 1–2 minutes
The workflows require proper authentication to access GitHub's API. Set up the following:
- GitHub Token:
  - Go to repository Settings → Secrets and variables → Actions
  - Add a new repository secret named `GH_TOKEN` or `PAT_TOKEN`
  - Use a Personal Access Token (PAT) with the following permissions:
    - `repo` (full repository access)
    - `read:org` (read organization data)
    - `read:user` (read user data)
  - The token must have sufficient scope to access Jenkins organization repositories
- Updatecli GitHub Token:
  - The Updatecli workflow uses the default `GITHUB_TOKEN`
  - No additional configuration is needed, as the workflow uses the built-in token
- Workflow Permissions:
  - Go to repository Settings → Actions → General
  - Under "Workflow permissions", select "Read and write permissions"
  - Check "Allow GitHub Actions to create and approve pull requests"
- Runs every Tuesday at 07:18 UTC
- Tests the PR collector functionality
- Creates a pull request with updated statistics
- Uses Docker for isolated testing environment
- Logs available in GitHub Actions run history
- Expected duration: 10–15 minutes
- All automated runs log their output to GitHub Actions
- Access logs through the "Actions" tab in the repository
- Logs are retained for 90 days
- Each run includes:
  - Setup steps
  - Script execution output
  - Error messages (if any)
  - Completion status
- Scripts log to stdout/stderr
- Key information logged includes:
  - Start and end times of operations
  - Number of PRs processed
  - API rate limit status (see the sketch below)
  - Error messages and retry attempts
  - Google Sheets update status
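
For context on what "API rate limit status" refers to, the sketch below shows one way a script could surface GitHub's rate limit information after an API call. It is illustrative only and not code from this repository:

```go
// Illustrative only: print GitHub rate limit headers after a REST API call.
// The endpoint choice and output format are assumptions, not this repository's code.
package main

import (
	"fmt"
	"net/http"
	"os"
)

func main() {
	req, _ := http.NewRequest("GET", "https://api.github.com/rate_limit", nil)
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GH_TOKEN"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, "rate limit check failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// GitHub reports quota usage in standard response headers.
	fmt.Println("remaining:", resp.Header.Get("X-RateLimit-Remaining"))
	fmt.Println("resets at (unix):", resp.Header.Get("X-RateLimit-Reset"))
}
```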
- GitHub Actions Status
  - Check the Actions tab for failed runs
  - Review logs for rate limit warnings
  - Verify backup creation
- Data Integrity
  - Verify monthly files are created
  - Check consolidated data updates
  - Confirm Google Sheets updates
- Storage Management
  - Monitor backup directory size
  - Check archive rotation
  - Verify data retention policies
- Clone the repository:

  ```
  git clone https://github.com/your-org/alpha-omega-stats.git
  cd alpha-omega-stats
  ```

- Install dependencies:

  ```
  # Go dependencies
  go mod download

  # Python dependencies
  python -m venv venv
  source venv/bin/activate  # or `venv\Scripts\activate` on Windows
  pip install -r requirements.txt
  ```

- Set up credentials:
  - Create a GitHub token with the necessary permissions
  - Set up Google Sheets API credentials
  - Configure environment variables as needed
```
# This will collect all data from July 2024 onwards
./retry-collection.sh

# Collect data for a specific month
./collect-monthly.sh "YYYY-MM" true
```

Here are some common usage examples:

```
# Collect PR data for March 2024 and update Google Sheets
./collect-monthly.sh "2024-03" true

# Process and group PRs from a JSON file
./group-prs.sh "data/monthly/prs_2024_03.json" "plugins.json"

# Collect historical data with automatic retries
./retry-collection.sh
```

The `collect-monthly.sh` script collects PR data for a specific month:
- First argument: Month in YYYY-MM format (optional, defaults to last month)
- Second argument: Whether to update Google Sheets (optional, defaults to false)
The `group-prs.sh` script organizes pull requests by plugin:
- First argument: JSON file containing PR data
- Second argument: Plugin configuration file
The `retry-collection.sh` script performs bulk data collection with a retry mechanism:
- No arguments required
- Collects all data from July 2024 onwards
- Implements automatic retries with exponential backoff
```mermaid
sequenceDiagram
    participant GitHub as GitHub Actions
    participant Runner as Workflow Runner
    participant Checkout as Checkout Code
    participant EnvSetup as Environment Setup (Go, Python, CLI)
    participant Script as Collection/Update Script
    participant Artifact as Data Artifacts

    GitHub->>Runner: Trigger workflow (Scheduled/Manual)
    Runner->>Checkout: Checkout repository
    Runner->>EnvSetup: Set up environments & install dependencies (jq, GitHub CLI, Python deps)
    EnvSetup-->>Runner: Environment ready
    Runner->>Script: Execute script based on event type
    alt Scheduled Monthly
        Script->>Script: Run collect-monthly.sh
    else Daily/Manual
        Script->>Script: Run update-daily.sh
    end
    Script->>Artifact: Upload updated PR JSON artifacts
    Artifact-->>Runner: Artifacts stored
```
```mermaid
sequenceDiagram
    participant Client as GraphQLClient
    participant API as GitHub GraphQL API
    participant Retry as Retry Logic
    participant Storage as Partial Data Storage

    Client->>API: Execute GraphQL query
    API-->>Client: Response/Error
    alt Error is retryable?
        Client->>Retry: Call isRetryableError
        Retry-->>Client: Error qualifies, initiate exponential backoff
        loop Up to max attempts
            Client->>API: Retry GraphQL query
        end
    else Successful Response
        Client->>Storage: Save partial data if needed
        Client-->>Client: Process and return data
    end
```
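
The diagram above describes the collector's retry behavior. A minimal sketch of that pattern in Go follows; the helper names (`runQuery`), the placeholder `isRetryableError` logic, and the backoff parameters are assumptions for illustration, not the collector's actual implementation:

```go
// Minimal sketch of retry with exponential backoff around a GraphQL call.
// runQuery, the placeholder isRetryableError logic, and the backoff parameters
// are illustrative assumptions, not the actual code in jenkins-pr-collector.go.
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// isRetryableError reports whether an error is worth retrying
// (e.g. transient network failures or rate limiting).
func isRetryableError(err error) bool {
	return err != nil // placeholder: real logic would inspect the error or HTTP status
}

// runQuery stands in for a single GraphQL request.
func runQuery(client *http.Client) error {
	return errors.New("simulated transient failure")
}

func main() {
	client := &http.Client{Timeout: 30 * time.Second}

	const maxAttempts = 5
	backoff := 2 * time.Second

	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = runQuery(client)
		if err == nil || !isRetryableError(err) {
			break
		}
		fmt.Printf("attempt %d failed (%v), retrying in %s\n", attempt, err, backoff)
		time.Sleep(backoff)
		backoff *= 2 // exponential backoff between attempts
	}
	if err != nil {
		fmt.Println("giving up:", err)
	}
}
```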
- Check that the automated collection ran successfully on the 2nd
  - Review GitHub Actions logs
  - Verify data files are created
  - Check Google Sheets updates
- Verify data in Google Sheets is updated
  - Check the latest data timestamp
  - Verify all sheets are updated
  - Review data consistency
- Review any failed collections in the GitHub Actions logs
  - Check for rate limit issues
  - Review error messages
  - Plan retries if needed
- Review and clean up archived data
  - Verify archive rotation
  - Check storage usage
  - Clean up old backups
- Verify backup integrity
  - Test backup restoration
  - Check backup completeness
  - Update the backup strategy if needed
- Update dependencies as needed
  - Check for security updates
  - Review dependency versions
  - Test updates in development
- Rate Limiting
  - The scripts include built-in retry mechanisms with exponential backoff
  - Check the GitHub API quota in the logs
  - Adjust collection timing if needed
  - Monitor rate limit headers in responses
- Failed Collections
  - Check the logs in `data/monthly/` for specific errors
  - Use `retry-collection.sh` to retry failed periods
  - Verify GitHub token permissions
  - Review network connectivity issues
- Google Sheets Issues
  - Verify API credentials are valid
  - Check that the Python virtual environment is activated
  - Review logs for API errors
  - Verify sheet permissions
- Data Inconsistencies
  - Compare monthly and consolidated data
  - Check for missing or duplicate entries
  - Verify data format consistency
  - Review archive integrity
- Fork the repository
- Create a feature branch
- Submit a pull request with a clear description of changes
This project is licensed under the MIT License - see the LICENSE file for details.