Fix test history cron job crash and optimize performance #4942

Open
DanielRyanSmith wants to merge 1 commit into main from test-history-optimizations

@DanielRyanSmith

Contributor

Overview

This PR fixes a ValueError crash in the test history cron job and implements major performance optimizations (caching, parallelization, and checkpointing) to speed up processing and catch up on months of missing history.

Root Cause / Motivation

  • Bug: The cron job crashed when the WPT API returned a start time without microseconds (e.g. 2025-10-02T15:34:39Z), because the script strictly expected %Y-%m-%dT%H:%M:%S.%fZ.
  • Performance: The script was very slow because it downloaded and uploaded large (~10MB) JSON status files from GCS for each of the 4 browsers sequentially, for every single revision. During a catch-up run, this network overhead was prohibitive.
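
The parsing bug described above can be illustrated with a minimal sketch. The function name `parse_wpt_datetime` and the tuple of formats are hypothetical stand-ins for illustration; the actual `_parse_datetime` helper in this PR may be structured differently.

```python
from datetime import datetime

# Formats tried in order: with microseconds first, then without.
# (Assumed ordering; the PR only states that both forms are supported.)
_DATETIME_FORMATS = ('%Y-%m-%dT%H:%M:%S.%fZ', '%Y-%m-%dT%H:%M:%SZ')


def parse_wpt_datetime(value: str) -> datetime:
    """Parse a UTC timestamp string with or without fractional seconds."""
    for fmt in _DATETIME_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
    raise ValueError(f'Unrecognized datetime format: {value!r}')
```

With this shape, a start time such as `2025-10-02T15:34:39Z` parses through the second format instead of raising on the first, while genuinely malformed input still fails with a `ValueError`.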

Detailed Changelog

  • process_test_history.py:
    • Added _parse_datetime helper to support datetime parsing with/without microseconds.
    • Implemented global in-memory cache _prev_test_statuses_cache for GCS status files.
    • Refactored process_single_run to run within thread-local NDB contexts and use the GCS cache.
    • Parallelized browser processing using ThreadPoolExecutor (4 workers).
    • Implemented deterministic keys for TestHistoryEntry based on SHA-256 hashes of test names to ensure idempotency.
    • Implemented checkpointing in main loop to commit date and flush GCS cache every 20 revisions.
    • Increased Datastore batch write size to 500.
    • Added --force CLI flag to bypass empty Datastore check when manually setting start date.
    • Pre-compiled whitespace regex for faster string substitution.
    • Printed main() return value on exit.
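
As a rough illustration of the deterministic-key idea, the sketch below derives a stable key name from a SHA-256 hash of the test name. The function name `history_entry_key_name` and the inclusion of a `run_id` prefix are assumptions made for this example; the PR states only that keys are based on SHA-256 hashes of test names.

```python
import hashlib


def history_entry_key_name(run_id: int, test_name: str) -> str:
    """Build a stable key name so repeated writes overwrite the same entity.

    Hypothetical layout: '<run_id>_<sha256 hex digest of the test name>'.
    """
    digest = hashlib.sha256(test_name.encode('utf-8')).hexdigest()
    return f'{run_id}_{digest}'
```

Because the same inputs always produce the same key name, a retried batch would overwrite existing entities rather than create duplicates, which is the idempotency property the changelog describes.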

- Fix ValueError crash in get_aligned_run_info by supporting datetime strings both with and without microseconds (fallback to %Y-%m-%dT%H:%M:%SZ).
- Implement in-memory caching for GCS recent statuses to avoid redundant downloads/uploads, reducing GCS traffic by ~90%.
- Parallelize browser processing (Chrome, Edge, Firefox, Safari) using ThreadPoolExecutor with thread-local NDB contexts.
- Implement deterministic key names for TestHistoryEntry using SHA-256 hashes of test names to ensure idempotency and prevent duplicate entries on retries.
- Implement checkpointing to commit processed date to Datastore and flush cached GCS statuses only every 20 revisions.
- Increase Datastore batch write size from 200 to 500 to optimize write throughput.
- Add --force CLI flag to allow manual start date override when Datastore is not empty.
- Pre-compile regular expressions to optimize whitespace substitution in loops.
- Print main() return value on exit to show timeout/completion status.
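
The caching and checkpointing behavior can be sketched roughly as follows. Everything here is a simplified assumption for illustration: `process_revisions`, `_flush`, and the `load`, `save`, and `commit` callables are hypothetical stand-ins for the GCS download, GCS upload, and Datastore date commit, and the real script processes browsers in parallel rather than in a plain loop.

```python
from typing import Callable, Dict, Iterable, Sequence

CHECKPOINT_INTERVAL = 20  # Revisions between flushes, per the PR description.


def _flush(cache: Dict[str, dict],
           save: Callable[[str, dict], None],
           commit: Callable[[str], None],
           last_revision: str) -> None:
    """Upload every cached status file, then record progress."""
    for browser, statuses in cache.items():
        save(browser, statuses)
    # Commit last, so a crash mid-flush never records progress
    # that the uploaded status files do not yet reflect.
    commit(last_revision)


def process_revisions(
    revisions: Iterable[str],
    load: Callable[[str], dict],
    save: Callable[[str, dict], None],
    commit: Callable[[str], None],
    browsers: Sequence[str] = ('chrome', 'edge', 'firefox', 'safari'),
) -> None:
    cache: Dict[str, dict] = {}
    last_revision = None
    pending = 0
    for revision in revisions:
        for browser in browsers:
            if browser not in cache:
                # Download once per browser instead of once per revision.
                cache[browser] = load(browser)
            # Placeholder for the real per-test status update.
            cache[browser][revision] = 'processed'
        last_revision = revision
        pending += 1
        if pending == CHECKPOINT_INTERVAL:
            _flush(cache, save, commit, last_revision)
            pending = 0
    if pending:
        # Flush whatever remains after the final partial interval.
        _flush(cache, save, commit, last_revision)
```

In this sketch the status files stay in memory after each flush, so a run over many revisions downloads each browser's file once and uploads it once per checkpoint rather than once per revision. Whether the real implementation keeps or clears the cache after a flush is not stated in the PR.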

TAG=agy
CONV=9096c6ce-d7f3-4a97-aa8d-31e76c7337c5