This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
- Monthly PR collection: `./collect-monthly.sh "YYYY-MM" [true/false]` - Collects PR data for the specified month, with an optional Google Sheets update
- Daily updates: `./update-daily.sh` - Updates the current month's data and failing PR statistics
- Bulk collection with retries: `./retry-collection.sh` - Collects all data from July 2024 onwards with exponential backoff
- JDK 25 testing: `./test-jdk-25.sh` - Tests the top 250 plugins with JDK 25 and writes results to CSV
- Install JDK versions: `./install-jdk-versions.sh` - Installs multiple JDK versions for compatibility testing
- Go dependencies: `go mod download && go mod tidy`
- Go build: `go build jenkins-pr-collector.go` - Builds the main collector binary
- Go run directly: `go run jenkins-pr-collector.go -start YYYY-MM-DD -end YYYY-MM-DD -output file.json`
- Python environment: `python -m venv venv && source venv/bin/activate && pip install -r requirements.txt`
- Environment check: `./check-env.sh` - Validates required tools and credentials
- Find JUnit 5 migration PRs: `./find-junit5-prs.sh` - Searches for JUnit 5 migration-related PRs
- Test plugin builds: `./test-pr-builds.sh` - Tests building plugins from PR branches
- Validate plugin list: `./validate-top-plugins.sh` - Validates the `top-250-plugins.csv` format
- Analyze JUnit 5 PRs: `./analyze-junit5-prs.sh` - Analyzes JUnit 5 migration patterns
- Generate reports: `./generate-report.sh` - Creates consolidated statistics reports
- Filter PRs: `./filter-prs.sh input.json` - Filters PR data by Jenkins-related criteria
- Group PRs: `./group-prs.sh input.json plugins.json` - Groups PRs by repository/plugin
- Count PRs: `./count_prs.sh repos.txt year` - Counts PRs for specific repositories
- Process PRs: `./process_prs.sh` - General PR data processing pipeline
- Build analyzer: `(cd github-profile-tools && go build -o ../github-user-analyzer ./cmd/github-user-analyzer)` - Builds the GitHub profile analyzer binary
- Run tests: `(cd github-profile-tools && go test ./...)` - Runs the unit test suite
- Test specific package: `(cd github-profile-tools && go test -v ./internal/cache)` - Tests a specific package with verbose output
- Analyze user: `./github-user-analyzer -user=username` - Generates a comprehensive GitHub profile analysis with all templates by default
- Analyze with specific template: `./github-user-analyzer -user=username -template=resume` - Generates a profile with a specific template (resume, technical, executive, ats)
- Analyze with custom usernames: `./github-user-analyzer -user=username -docker-user=dockerhub_user -discourse-user=discourse_user` - Uses separate usernames for different platforms
- Cache management: `./github-user-analyzer -cache-stats` - Shows cache statistics, `./github-user-analyzer -clear-cache` - Clears the cache, `./github-user-analyzer -force-refresh` - Forces a refresh, ignoring the cache
- Analyze with token: `./github-user-analyzer -user=username -token="$GITHUB_TOKEN"` - Uses an explicit GitHub token for API access
- `jenkins-pr-collector.go` - Main Go application that queries the GitHub GraphQL API to collect PR data
- Shell scripts ecosystem - Bash scripts orchestrate data collection, processing, and reporting
- Python integration - Handles Google Sheets uploads and data processing via `upload_to_sheets.py`
- GitHub Actions workflows - Automated scheduling and execution in `.github/workflows/`
- GitHub Profile Tools - Standalone Go application for analyzing GitHub user profiles and generating professional documentation
- Collection: `jenkins-pr-collector.go` fetches PR data via the GitHub GraphQL API
- Processing: Scripts filter, group, and transform raw PR data
- Storage: Data is stored in the `data/` directory (`monthly/`, `consolidated/`, `archive/`)
- Reporting: Processed data is uploaded to Google Sheets and stored as JSON artifacts
- Raw PR data: JSON files containing GitHub PR objects with metadata
- Filtered data: PRs filtered by criteria (Jenkins-related, specific time periods)
- Grouped data: PRs organized by plugin/repository for analysis
- Build results: CSV files with plugin build status and JDK compatibility
- GitHub API: Requires the `GITHUB_TOKEN` or `PAT_TOKEN` environment variable with `repo`, `read:org`, `read:user` scopes
- Google Sheets: Requires a `GOOGLE_CREDENTIALS` JSON service account file (set via environment or file path)
- Rate limiting: Built-in exponential backoff and retry mechanisms for both GitHub and Google APIs
- pr-stats.yml: Monthly collection (2nd of month at 00:00 UTC) and daily updates (midnight UTC)
- test-jdk-25.yml: Weekly JDK 25 compatibility testing (Tuesdays at 00:00 UTC) with 6-hour timeout
- updatecli.yml: Daily updates to top-250-plugins.csv from upstream Jenkins data
- auto-merge-bot-prs.yml: Automatically merges bot-created PRs with "automation" label
- pr-collector-test.yml: Weekly testing of PR collector functionality (Tuesdays at 07:18 UTC)
- generate-top-plugins.yml: Updates plugin popularity data
- run-update-daily-on-merge.yml: Triggers daily update when changes are merged
- release-github-profile-tools.yml: Automated releases of GitHub Profile Tools with cross-platform binaries
The GitHub Profile Tools binary will be automatically released through GitHub Actions:
- Trigger: Tag-based releases using semantic versioning (v1.0.0, v1.1.0, etc.)
- Platforms: Cross-platform binaries for Windows (x64), Linux (x64, ARM64), and macOS (x64, ARM64)
- Artifacts: Compressed binaries with checksums for verification
- Changelog: Auto-generated release notes from commit messages and PR titles
All builds use ubuntu-latest with Go cross-compilation:
- Windows x64: `GOOS=windows GOARCH=amd64` → `github-user-analyzer-windows-amd64.exe`
- Linux x64: `GOOS=linux GOARCH=amd64` → `github-user-analyzer-linux-amd64`
- Linux ARM64: `GOOS=linux GOARCH=arm64` → `github-user-analyzer-linux-arm64`
- macOS x64: `GOOS=darwin GOARCH=amd64` → `github-user-analyzer-darwin-amd64`
- macOS ARM64: `GOOS=darwin GOARCH=arm64` → `github-user-analyzer-darwin-arm64`
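The naming pattern in the matrix above can be expressed as a small Go helper. This is only an illustrative sketch: `artifactName` and the `targets` slice are hypothetical names, and the release workflow may well derive these names in shell or YAML instead.

```go
package main

import "fmt"

// target is one GOOS/GOARCH pair from the build matrix.
type target struct {
	goos, goarch string
}

// targets lists the five platforms named in this document.
var targets = []target{
	{"windows", "amd64"},
	{"linux", "amd64"},
	{"linux", "arm64"},
	{"darwin", "amd64"},
	{"darwin", "arm64"},
}

// artifactName (hypothetical helper) builds the binary name for a target.
// Only Windows binaries get the .exe suffix.
func artifactName(goos, goarch string) string {
	name := "github-user-analyzer-" + goos + "-" + goarch
	if goos == "windows" {
		name += ".exe"
	}
	return name
}

func main() {
	for _, t := range targets {
		fmt.Printf("GOOS=%s GOARCH=%s -> %s\n", t.goos, t.goarch, artifactName(t.goos, t.goarch))
	}
}
```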
- Tag Creation: Create and push a version tag (e.g., `git tag v1.0.0 && git push origin v1.0.0`)
- Automated Build: GitHub Action builds binaries for all supported platforms
- Testing: Run basic smoke tests on each binary
- Packaging: Compress binaries and generate SHA256 checksums
- Release: Create GitHub release with auto-generated changelog and download links
- Notification: Optional notifications to relevant channels
- `data/monthly/` - Monthly PR collection files (`prs_YYYY_MM.json`, `filtered_`, `grouped_`)
- `data/consolidated/` - Aggregated data across time periods
- `data/archive/` - Older data files (>6 months)
- `data/junit5/` - JUnit 5 migration analysis data
- `data/profiles/` - Generated GitHub profile analyses and templates
- `data/cache/` - Cached analysis data for efficient template regeneration and incremental updates
- `data/progress/` - Temporary progress files for resuming interrupted analyses
- `github-profile-tools/` - GitHub profile analyzer Go application
  - `cmd/github-user-analyzer/` - Main CLI application entry point
  - `internal/github/` - GitHub API client with exponential backoff and retry logic (8 attempts)
  - `internal/profile/` - Profile analysis logic with incremental processing (50 repos per page)
  - `internal/cache/` - File-based cache system with TTL, compression, and thread-safety
  - `internal/docker/` - Docker Hub integration and expertise scoring
  - `internal/discourse/` - Discourse community engagement analysis
  - `internal/markdown/` - Template generator for multiple profile formats
  - `templates/` - Profile generation templates (resume, technical, executive, ats)
- `updatecli/` - Updatecli configuration for dependency updates
- `.github/workflows/` - GitHub Actions automation workflows
- Raw PR data: `prs_YYYY_MM.json` - Complete GitHub PR objects from the GraphQL API
- Filtered data: `filtered_prs_YYYY_MM.json` - Jenkins-related PRs only
- Grouped data: `grouped_prs_YYYY_MM.json` - PRs organized by repository/plugin
- JDK compatibility: `jdk-25-build-results.csv` - Plugin build results with JDK 25
- Plugin list: `top-250-plugins.csv` - Most popular Jenkins plugins for testing
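For tooling that needs to locate a month's files, the naming scheme above can be derived from a year and month. The sketch below assumes a zero-padded two-digit month, as the `YYYY_MM` placeholder suggests; `monthlyFiles` is a hypothetical helper, not a function from the collector.

```go
package main

import (
	"fmt"
	"time"
)

// monthlyFiles (hypothetical helper) returns the raw, filtered, and grouped
// file names for one month, following the prs_YYYY_MM.json convention.
func monthlyFiles(year int, month time.Month) (raw, filtered, grouped string) {
	stamp := fmt.Sprintf("%04d_%02d", year, int(month))
	raw = "prs_" + stamp + ".json"
	filtered = "filtered_" + raw
	grouped = "grouped_" + raw
	return raw, filtered, grouped
}

func main() {
	raw, filtered, grouped := monthlyFiles(2024, time.July)
	fmt.Println(raw)
	fmt.Println(filtered)
	fmt.Println(grouped)
}
```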
- Scripts use `set -e` for immediate exit on errors
- Rate limiting handled with exponential backoff for GitHub and Google APIs
- Partial data preservation during collection failures with resume capability
- Comprehensive logging to stdout/stderr and dedicated log files
- Debug logging to `build-debug.log`, `fetch_prs_debug.log`, and other log files
- File-based cache with gzip compression and JSON serialization
- TTL support with configurable expiration (default 24 hours)
- Thread-safe operations with proper mutex locking
- Cache key types: profile, repositories, organizations, contributions, languages, skills
- Cache management: Statistics, clearing, force refresh, and invalidation by user
- Storage location: `data/cache/` with files named by cache key type and username
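The cache properties listed above (gzip-compressed JSON files, a TTL, and mutex-guarded access) can be sketched as follows. This is a minimal illustration, not the code in `internal/cache`: the `fileCache` and `entry` types, the `newTempCache` helper, and the `<keyType>_<user>.json.gz` file name are all assumptions.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"time"
)

// entry is a hypothetical on-disk envelope: the payload plus its expiry time.
type entry struct {
	ExpiresAt time.Time       `json:"expires_at"`
	Data      json.RawMessage `json:"data"`
}

// fileCache is a hypothetical file-backed cache guarded by one mutex.
type fileCache struct {
	mu  sync.Mutex
	dir string
	ttl time.Duration
}

// newTempCache creates a cache in a fresh temporary directory (demo only).
func newTempCache(ttl time.Duration) (*fileCache, error) {
	dir, err := os.MkdirTemp("", "cache-demo")
	if err != nil {
		return nil, err
	}
	return &fileCache{dir: dir, ttl: ttl}, nil
}

// path names the file by key type and username (assumed layout).
func (c *fileCache) path(keyType, user string) string {
	return filepath.Join(c.dir, keyType+"_"+user+".json.gz")
}

// Set serializes v to JSON, wraps it with an expiry, gzips it, and writes it.
func (c *fileCache) Set(keyType, user string, v interface{}) error {
	raw, err := json.Marshal(v)
	if err != nil {
		return err
	}
	env, err := json.Marshal(entry{ExpiresAt: time.Now().Add(c.ttl), Data: raw})
	if err != nil {
		return err
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(env); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	return os.WriteFile(c.path(keyType, user), buf.Bytes(), 0o644)
}

// Get reports false for a missing or expired entry; otherwise it fills out.
func (c *fileCache) Get(keyType, user string, out interface{}) (bool, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	f, err := os.Open(c.path(keyType, user))
	if os.IsNotExist(err) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	defer f.Close()
	zr, err := gzip.NewReader(f)
	if err != nil {
		return false, err
	}
	defer zr.Close()
	var e entry
	if err := json.NewDecoder(zr).Decode(&e); err != nil {
		return false, err
	}
	if time.Now().After(e.ExpiresAt) {
		return false, nil
	}
	if err := json.Unmarshal(e.Data, out); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	c, err := newTempCache(24 * time.Hour) // default TTL named in this document
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(c.dir)
	if err := c.Set("profile", "octocat", map[string]int{"repos": 8}); err != nil {
		panic(err)
	}
	var got map[string]int
	ok, err := c.Get("profile", "octocat", &got)
	fmt.Println(ok, err, got)
}
```

A single mutex keeps the sketch short. A real implementation might prefer a read/write lock or per-key locking to reduce contention.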
- Page-by-page processing: Repositories fetched in batches of 50 to handle large profiles
- Progress saving: Analysis state saved after each major step for resumability
- Graceful degradation: Continues with partial data if API calls fail
- Progress files: Stored in `data/progress/` for interrupted analysis resumption
- Analysis steps: 8 sequential steps from basic info → Docker/Discourse → insights generation
- Retry logic: Up to 8 attempts with exponential backoff and jitter
- Retryable errors: Infrastructure failures (502, 503, 504), rate limits, timeouts, stream cancellations
- Rate limit handling: Respects GitHub API rate limits with automatic backoff
- Context support: All API calls respect context timeouts (default 6 hours for large analyses)
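The retry behavior described above can be sketched roughly as below. It is an illustration under stated assumptions, not the client in `internal/github`: the `retry`, `backoffDelay`, and `isRetryableStatus` names are hypothetical, treating HTTP 429 as the rate-limit signal is an assumption, and the jitter range (up to 50% of the delay) is chosen only for the example.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"net/http"
	"time"
)

// maxAttempts mirrors the "up to 8 attempts" figure in this document.
const maxAttempts = 8

// isRetryableStatus covers 502/503/504 plus 429 (the 429 case is an assumption).
func isRetryableStatus(code int) bool {
	switch code {
	case http.StatusBadGateway, http.StatusServiceUnavailable,
		http.StatusGatewayTimeout, http.StatusTooManyRequests:
		return true
	}
	return false
}

// backoffDelay returns base * 2^attempt plus random jitter of up to half that.
func backoffDelay(base time.Duration, attempt int) time.Duration {
	d := base << uint(attempt)
	jitter := time.Duration(rand.Int63n(int64(d)/2 + 1))
	return d + jitter
}

// retry runs op until it succeeds, hits a non-retryable status,
// exhausts maxAttempts, or the context is cancelled.
func retry(ctx context.Context, base time.Duration, op func() (int, error)) error {
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		status, err := op()
		switch {
		case err != nil:
			lastErr = err // transport errors and timeouts are retried
		case isRetryableStatus(status):
			lastErr = fmt.Errorf("retryable status %d", status)
		case status >= 400:
			return fmt.Errorf("non-retryable status %d", status)
		default:
			return nil
		}
		if attempt == maxAttempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoffDelay(base, attempt)):
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	calls := 0
	err := retry(context.Background(), time.Millisecond, func() (int, error) {
		calls++
		if calls < 3 {
			return http.StatusBadGateway, nil // simulate two 502 responses
		}
		return http.StatusOK, nil
	})
	fmt.Println(calls, err)
}
```

Jitter spreads retries from concurrent callers so they do not all hit the API at the same instant after a shared failure.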
- Docker Hub: Fetches public repositories, pull counts, expertise scoring (0-10 scale)
- Docker expertise: Proficiency levels from beginner to expert based on usage patterns
- Docker file detection: Scans for Dockerfile, docker-compose.yml, docker-bake.hcl, .dockerignore
- Discourse: Jenkins community engagement analysis with separate username support
- Optional platforms: Both Docker Hub and Discourse are optional and gracefully handle missing data
- Comprehensive unit tests for cache system covering concurrency, expiration, data integrity
- String conversion: Use `strconv.Itoa()` instead of `string(rune())` to avoid control character injection
- Test coverage: Critical paths tested including error scenarios and edge cases
- Test location: `github-profile-tools/internal/cache/*_test.go`, `internal/profile/*_test.go`
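The string conversion point is easy to get wrong, so here is a short demonstration. Converting an integer through `rune` yields the Unicode code point with that value, which for small numbers is a control character. `strconv.Itoa` yields the decimal digits. The `pageLabel` helper is hypothetical and exists only to show the safe form in context.

```go
package main

import (
	"fmt"
	"strconv"
)

// pageLabel (hypothetical helper) embeds a number in a string safely.
func pageLabel(n int) string {
	return "page-" + strconv.Itoa(n)
}

func main() {
	n := 7
	wrong := string(rune(n)) // code point U+0007 (BEL), a control character
	right := strconv.Itoa(n) // the decimal text "7"
	fmt.Printf("%q %q\n", wrong, right) // prints "\a" "7"
	fmt.Println(pageLabel(n))           // prints page-7
}
```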
- Scoped cache keys: Prevents cache poisoning when the same GitHub user has different Docker/Discourse usernames
- Scope format: `"docker:{dockerUsername},discourse:{discourseUsername}"` appended to the cache key
- Scope conditions: Only applied when Docker/Discourse usernames differ from the GitHub username
- Implementation: `GetUserProfileKeyWithScope()`, `GetUserProfileWithCustomUsernames()`, `SetUserProfileWithCustomUsernames()`
- Cache invalidation: `DeleteByPrefix()` removes all scoped variants when invalidating user cache
  - Pattern: `"profile_username"` matches `"profile_username"` and `"profile_username_scope:..."`
  - Ensures `-force-refresh` and cache clearing work with scoped keys
- Files: `internal/cache/manager.go`, `internal/cache/storage.go`, `internal/profile/cache.go`
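The scoping rules above can be illustrated with a small sketch. The function names `scopedProfileKey` and `matchesUser` are hypothetical and are not the signatures of `GetUserProfileKeyWithScope()` or `DeleteByPrefix()`. The sketch also assumes that an empty custom username means "same as the GitHub username" and that both fields are written to the scope whenever either one differs.

```go
package main

import (
	"fmt"
	"strings"
)

// scopedProfileKey (hypothetical) builds the profile cache key, appending a
// scope only when a Docker or Discourse username differs from the GitHub one.
func scopedProfileKey(githubUser, dockerUser, discourseUser string) string {
	key := "profile_" + githubUser
	if dockerUser == "" {
		dockerUser = githubUser // assumption: empty means "same as GitHub"
	}
	if discourseUser == "" {
		discourseUser = githubUser
	}
	if dockerUser == githubUser && discourseUser == githubUser {
		return key
	}
	return key + "_scope:docker:" + dockerUser + ",discourse:" + discourseUser
}

// matchesUser (hypothetical) reports whether key is the unscoped key for
// githubUser or one of its scoped variants.
func matchesUser(key, githubUser string) bool {
	base := "profile_" + githubUser
	return key == base || strings.HasPrefix(key, base+"_scope:")
}

func main() {
	plain := scopedProfileKey("alice", "", "")
	scoped := scopedProfileKey("alice", "alice-docker", "alice")
	fmt.Println(plain)
	fmt.Println(scoped)
	fmt.Println(matchesUser(scoped, "alice"), matchesUser("profile_alicia", "alice"))
}
```

One design note: a bare prefix match on `profile_bob` would also match `profile_bobby`. The sketch avoids that by matching on the `_scope:` separator. Whether the real `DeleteByPrefix()` guards against this is not stated here.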
- Code duplication: Refactor `runAnalysis()` and `runAnalysisWithCache()` in `cmd/github-user-analyzer/main.go`
  - Both functions share similar structure and logic
  - Extract common analysis workflow into shared helper function
  - Reduce maintenance burden and improve code clarity
  - Identified in CodeRabbit review comment
- Progress file cache key alignment: Ensure progress files use the same scoped key format as the cache
  - Store `dockerUsername` and `discourseUsername` in the `ProgressData` struct
  - Validate on resume that usernames match the requested analysis
  - Prevents resuming with wrong Docker/Discourse username data
  - Identified in CodeRabbit review comment on `analyzer.go:1182-1246`
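One possible shape for the progress-file alignment item is sketched below. The real `ProgressData` struct is not shown in this document, so every field name, JSON tag, and the `ValidateResume` method here are hypothetical. The point is only that a resume should be rejected when the saved usernames differ from the requested ones.

```go
package main

import "fmt"

// ProgressData here is a hypothetical stand-in for the real struct,
// extended with the two usernames the task above proposes to store.
type ProgressData struct {
	Username          string `json:"username"`
	DockerUsername    string `json:"docker_username"`
	DiscourseUsername string `json:"discourse_username"`
	CompletedSteps    int    `json:"completed_steps"`
}

// ValidateResume (hypothetical) returns an error when saved progress was
// produced for a different combination of usernames than the one requested.
func (p ProgressData) ValidateResume(user, dockerUser, discourseUser string) error {
	if p.Username != user {
		return fmt.Errorf("progress belongs to GitHub user %q, not %q", p.Username, user)
	}
	if p.DockerUsername != dockerUser {
		return fmt.Errorf("progress used Docker user %q, not %q", p.DockerUsername, dockerUser)
	}
	if p.DiscourseUsername != discourseUser {
		return fmt.Errorf("progress used Discourse user %q, not %q", p.DiscourseUsername, discourseUser)
	}
	return nil
}

func main() {
	saved := ProgressData{
		Username:          "alice",
		DockerUsername:    "alice-docker",
		DiscourseUsername: "alice",
		CompletedSteps:    3,
	}
	// A mismatch means the saved state should be discarded, not resumed.
	if err := saved.ValidateResume("alice", "other-docker", "alice"); err != nil {
		fmt.Println("starting fresh:", err)
	}
}
```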