
Commit 6a942b2

Prioritize review fetching in cron scraper to prevent missing approvals
PROBLEM: PR #25042 and 115 other PRs were missing from the database because the scraper was timing out before completing review fetching. The old flow was:

1. Fetch all open PRs from GitHub
2. Create/update PR records
3. Scrape CI checks for each PR (SLOW - times out here)
4. Fetch reviews for all PRs (never reached on timeout)

This meant new PRs had no approval data even though the approvals existed on GitHub.

SOLUTION:
- Increased timeout from 12 to 13 minutes for more processing time
- Moved review fetching INSIDE the PR processing loop
- Reviews are now fetched immediately after CI checks for each PR
- Even if the scraper times out, all PRs processed so far have approval data

IMPACT: Prevents "missing approval" issues like PR #25042, which had an approval from PhilipDeFraties 3 days ago but showed "None" in the dashboard.

Changes:
- scripts/render_cron_scraper_fixed.rb:245: max_runtime 12 min -> 13 min
- scripts/render_cron_scraper_fixed.rb:303-327: Added inline review fetching
- scripts/render_cron_scraper_fixed.rb:347-377: Removed redundant Step 5
1 parent 0dd2cf7 commit 6a942b2

1 file changed: scripts/render_cron_scraper_fixed.rb (30 additions, 35 deletions)
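In outline, the change moves review fetching from a separate post-loop pass into the per-PR body of the scrape loop. A condensed sketch of the new flow, assuming the batch/PR iteration shape implied by the diff (the two helper names are hypothetical stand-ins for the inline code shown below):

```ruby
# Condensed sketch of the reworked flow; scrape_ci_checks and
# fetch_and_store_reviews are hypothetical names for the inline code below.
pr_scope.find_in_batches(batch_size: 5) do |batch|
  # Bail out before Render's 15-minute hard limit kills the job mid-run
  break if Time.now - start_time > max_runtime

  batch.each do |pr|
    scrape_ci_checks(pr)        # existing per-PR CI check scraping
    fetch_and_store_reviews(pr) # NEW: runs per PR, so approvals survive a timeout
  end
end
```

Because reviews are persisted as each PR is processed, a timeout now costs only the unprocessed tail of the list instead of every PR's approvals.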
@@ -242,7 +242,7 @@
 total_prs = pr_scope.count
 processed = 0
 start_time = Time.now
-max_runtime = 12 * 60 # 12 minutes max runtime (leaving buffer for Render's 15 min limit)
+max_runtime = 13 * 60 # 13 minutes max runtime (leaving buffer for Render's 15 min limit)

 pr_scope.find_in_batches(batch_size: 5) do |batch| # Reduced batch size
   # Check if we're approaching the time limit
@@ -300,6 +300,32 @@
       )
     end

+    # PRIORITY: Fetch reviews immediately after checks so PR has approvals even if we timeout later
+    begin
+      reviews = github_service.pull_request_reviews(pr.number)
+
+      # Update reviews
+      reviews.each do |review_data|
+        PullRequestReview.find_or_create_by(
+          pull_request_id: pr.id,
+          github_id: review_data.id
+        ).update!(
+          user: review_data.user.login,
+          state: review_data.state,
+          submitted_at: review_data.submitted_at
+        )
+      end
+
+      # Update statuses - Always update approval summary
+      pr.update_backend_approval_status!
+      pr.update_ready_for_backend_review!
+      pr.update_approval_status!
+
+      logger.info " ✓ Updated reviews and approvals for PR ##{pr.number}"
+    rescue => review_error
+      logger.error " ✗ Error fetching reviews for PR ##{pr.number}: #{review_error.message}"
+    end
+
     scrape_success += 1

     # Reduced delay between requests
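The diff calls github_service.pull_request_reviews without showing its definition. A plausible sketch, assuming an Octokit-backed service (the class name, constructor, and repo handling are assumptions; only the pull_request_reviews Octokit call is a known API):

```ruby
require 'octokit'

# Hypothetical shape of the github_service used above; the real class is not
# part of this diff. Assumes the octokit gem and a repo in "owner/name" form.
class GithubService
  def initialize(token:, repo:)
    @client = Octokit::Client.new(access_token: token)
    @repo = repo
  end

  # Returns review objects exposing .id, .user.login, .state, and
  # .submitted_at - exactly the fields the scraper persists.
  def pull_request_reviews(pr_number)
    @client.pull_request_reviews(@repo, pr_number)
  end
end
```

Note also that the loop keys reviews on (pull_request_id, github_id) via find_or_create_by, so re-running the scraper after a timeout updates existing review rows instead of duplicating them.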
@@ -317,40 +343,9 @@
 end

 logger.info "Scraped #{scrape_success} PRs successfully, #{scrape_errors} errors"
+logger.info "Note: Reviews are now fetched inline during check scraping for better reliability"

-# Step 5: Update PR reviews and approval statuses
-logger.info "Updating PR reviews and approval statuses..."
-review_errors = 0
-
-pr_scope.find_each do |pr|
-  begin
-    # Fetch reviews
-    reviews = github_service.pull_request_reviews(pr.number)
-
-    # Update reviews
-    reviews.each do |review_data|
-      PullRequestReview.find_or_create_by(
-        pull_request_id: pr.id,
-        github_id: review_data.id
-      ).update!(
-        user: review_data.user.login,
-        state: review_data.state,
-        submitted_at: review_data.submitted_at
-      )
-    end
-
-    # Update statuses - Always update approval summary
-    pr.update_backend_approval_status!
-    pr.update_ready_for_backend_review!
-    pr.update_approval_status!
-
-  rescue => e
-    logger.error "Error updating reviews for PR ##{pr.number}: #{e.message}"
-    review_errors += 1
-  end
-end
-
-# Step 6: Update cache with completion time
+# Step 5: Update cache with completion time
 Rails.cache.write('last_refresh_time', Time.current)

 # Clean up old webhook events if they exist
@@ -369,7 +364,7 @@
 logger.info "Total time: #{total_time.round(2)} seconds"
 logger.info "API calls used: #{api_calls_used}"
 logger.info "Remaining API calls: #{final_rate_limit.remaining}/#{final_rate_limit.limit}"
-logger.info "Review errors: #{review_errors}"
+logger.info "PRs processed: #{scrape_success} success, #{scrape_errors} errors"
 logger.info "=" * 60

 # Run daily metrics capture once per day between 7-8am UTC (when this job runs at 7am)
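A hypothetical Rails-console spot check for the case named in the commit message (the PullRequest model and a pull_request_reviews association are assumed from the pull_request_id column in the diff):

```ruby
# Hypothetical verification; model and association names are assumed.
pr = PullRequest.find_by(number: 25042)
pr.pull_request_reviews.where(state: 'APPROVED').pluck(:user, :submitted_at)
# After a successful cron run this should include PhilipDeFraties's approval
# instead of returning an empty array.
```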
