
Conversation

@dougyster (Collaborator) commented on Jan 1, 2026

Motivation

Improve the quality of the failure monitor by showing test-level failures for each job failure, along with other fixes and design improvements.

Modifications

ci_failures_analysis.py

Accuracy Tests

(https://github.com/sgl-project/sglang/actions/runs/20641823449)

Benchmarking and Profiling

N/A

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @dougyster, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the CI failure monitoring system by introducing the capability to analyze failures at the individual test file level. It enhances the existing job-level failure tracking with more detailed insights, allowing developers to quickly pinpoint the root cause of CI instability. The changes include fetching job logs, parsing test summaries, and presenting this granular information in an improved, interactive GitHub summary report, making it easier to identify and address persistent test failures.
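
For context, here is a minimal sketch of how job logs can be fetched and a pytest-style summary parsed. The endpoint is the standard GitHub REST API route for downloading job logs; the function names, regexes, and summary format are illustrative assumptions, not the PR's actual implementation.

import os
import re
import requests

def fetch_job_log(owner, repo, job_id, token):
    """Download the raw log text for a single workflow job."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/actions/jobs/{job_id}/logs",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text

def parse_test_summary(log_text):
    """Extract passed/total counts and failed test files from a pytest-style log."""
    failed_files = sorted(
        {os.path.basename(m.group(1)) for m in re.finditer(r"FAILED (\S+\.py)", log_text)}
    )
    summary = re.search(r"(\d+) failed, (\d+) passed", log_text)
    if summary is None:
        return None  # no recognizable summary line in this log
    failed, passed = int(summary.group(1)), int(summary.group(2))
    return {"passed": passed, "total": passed + failed, "failed_tests": failed_files}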

Highlights

  • Detailed Test-Level Failure Analysis: Introduced new functionality to fetch and parse job logs to identify specific failing test files within a CI job. This includes extracting test summaries, counting passed/total tests, and listing individual failed Python test files.
  • Enhanced Failure Tracking: Implemented logic to track total failures and current consecutive failure streaks for individual test files. This allows for more granular identification of flaky or consistently failing tests; a minimal sketch of this bookkeeping appears after this list.
  • Improved Reporting for Broken Jobs: Added a new analysis step that focuses on jobs with existing failure streaks (>=2) or high failure rates (>=50%). For these critical jobs, the system now retrieves and presents detailed test-level failure information directly in the report.
  • Expanded Run History Tracking: Increased the number of recent job runs tracked for analysis from 5 to 10, providing a broader historical context for job and test stability.
  • Refined GitHub Summary Output: The GitHub Actions summary now includes collapsible sections for detailed test failures within problematic jobs, highlighting tests with high streaks and providing direct links to job logs. The report structure has also been adjusted, moving runner health to a more prominent position and introducing distinct categories for jobs with high intermittent failure rates.
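
A minimal sketch of the streak bookkeeping referenced above, assuming each tracked test carries a newest-first recent_runs list whose entries have a failed key of True, False, or None; the field names are placeholders, not the PR's actual data model.

def failure_stats(recent_runs):
    """Compute totals from a newest-first run history for one test file."""
    total_failures = sum(1 for run in recent_runs if run.get("failed") is True)
    current_streak = 0
    for run in recent_runs:
        if run.get("failed") is True:
            current_streak += 1
        elif run.get("failed") is False:
            break  # the most recent passing run ends the streak
        # None (cancelled/skipped) neither extends nor breaks the streak
    return {"total_failures": total_failures, "current_streak": current_streak}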


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.

@gemini-code-assist bot left a comment


Code Review

This pull request significantly enhances the CI failure monitor by introducing test-level failure analysis. It adds functionality to fetch and parse job logs to identify specific failing tests within a job, track their failure streaks, and report them in a new collapsible section in the GitHub Actions summary. The reporting has been refactored from console output to a more detailed and user-friendly markdown format. The changes are well-structured, but there are a few opportunities to improve code quality by reducing duplication and adhering to Python best practices, which I've detailed in my comments.
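
As an illustration of the collapsible reporting described above, a section can be appended to the Actions summary through the GITHUB_STEP_SUMMARY file; the table columns and field names below are placeholders rather than the PR's exact output.

import os

def write_job_details(job_name, test_failures):
    """Append a collapsible per-job test-failure table to the Actions summary."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return  # not running inside GitHub Actions
    lines = [
        f"<details><summary>{job_name}: {len(test_failures)} failing test file(s)</summary>",
        "",
        "| Test file | Total failures | Current streak |",
        "| --- | --- | --- |",
    ]
    for test_file, stats in sorted(test_failures.items()):
        lines.append(
            f"| {test_file} | {stats['total_failures']} | {stats['current_streak']} |"
        )
    lines += ["", "</details>", ""]
    with open(summary_path, "a") as f:
        f.write("\n".join(lines) + "\n")

The report in this PR additionally highlights tests with high streaks and links each entry to the corresponding job logs.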

Returns:
    Dict with passed/total counts and list of failed tests, or None if no summary found.
"""
import re

Severity: medium

For better code organization and to adhere to standard Python style (PEP 8), module imports should be at the top of the file. Placing import re inside the function can lead to repeated, unnecessary imports if the function is called multiple times. Please move it to the top of the file with the other imports.

for match in re.finditer(r"(\S+\.py)", failed_section):
    full_path = match.group(1)
    # Extract just the filename from the path
    test_file = full_path.split("/")[-1] if "/" in full_path else full_path

Severity: medium

Using os.path.basename() is a more idiomatic and robust way to extract a filename from a path in Python than manual string splitting, and it makes the intent of the code clearer. The os module is already imported in this file.

Suggested change
-    test_file = full_path.split("/")[-1] if "/" in full_path else full_path
+    test_file = os.path.basename(full_path)

Args:
    recent_runs: List of recent run info dicts with job_id, job_url, conclusion, etc.
    debug: Enable debug logging

Severity: medium

The docstring for analyze_test_failures_for_job includes a debug argument that is not defined in the function's signature. This can be misleading for future developers. Please remove it to keep the documentation accurate.
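
For instance, the docstring can simply mirror the actual signature (this sketch assumes the method takes only recent_runs, as implied by the quoted hunk):

def analyze_test_failures_for_job(self, recent_runs):
    """Analyze per-test failures across a job's recent runs.

    Args:
        recent_runs: List of recent run info dicts with job_id, job_url, conclusion, etc.
    """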

Comment on lines +238 to +245
test_failures[test_file]["recent_runs"].append(
    {
        "run_number": run_info.get("run_number"),
        "job_url": run_info.get("job_url"),
        "status": "❌",
        "failed": True,
    }
)

Severity: medium

The dictionary for a recent_runs entry is created in multiple places with the same structure (e.g., lines 253-260, 264-271, etc.). To improve maintainability and reduce code duplication, consider creating a private helper method to construct this dictionary.

For example:

def _create_run_entry(self, run_info, status, failed):
    return {
        "run_number": run_info.get("run_number"),
        "job_url": run_info.get("job_url"),
        "status": status,
        "failed": failed,
    }

You could then call it like self._create_run_entry(run_info, "❌", True).

Comment on lines +284 to +294
else:
    # Other conclusion (cancelled, skipped, etc.) - don't reset streaks, mark as unknown
    for test_file in test_failures.keys():
        test_failures[test_file]["recent_runs"].append(
            {
                "run_number": run_info.get("run_number"),
                "job_url": run_info.get("job_url"),
                "status": "⚪",
                "failed": None,
            }
        )

Severity: medium

This else block, which handles conclusions like 'cancelled' or 'skipped', is identical to the else block on lines 261-271 that handles a job failure where the test summary could not be parsed. This code duplication can be avoided by restructuring the conditional logic to group all cases that should result in an "unknown" status.
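
One way to factor out the shared behavior, as a sketch: both the unparsed-failure branch and the cancelled/skipped branch would call a small helper like the one below (the entry shape follows the quoted hunks; the helper name is illustrative).

def _record_unknown_runs(test_failures, run_info):
    """Append an 'unknown' entry for every tracked test without resetting streaks."""
    for details in test_failures.values():
        details["recent_runs"].append(
            {
                "run_number": run_info.get("run_number"),
                "job_url": run_info.get("job_url"),
                "status": "⚪",
                "failed": None,
            }
        )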
