@Dayuxiaoshui
Contributor

This patch fixes issue #18423, where meta_schedule.tune_tir crashes during initial population sampling when the RewriteParallelVectorizeUnroll postprocessor encounters blocks that violate compact dataflow requirements.

The crash occurred when:

  • A block reads from and writes to the same buffer
  • RewriteParallelVectorizeUnroll tries to parallelize or vectorize the loops around that block
  • A ScheduleError is thrown and propagates through parallel_for_dynamic
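
The failure mode above can be modelled in a few lines. This is only an illustration of how one unhandled error in a parallel worker aborts the whole batch: Python's ThreadPoolExecutor stands in for parallel_for_dynamic, and ScheduleError, sample_candidate, and sample_population are hypothetical names, not TVM APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


class ScheduleError(Exception):
    """Hypothetical stand-in for tir::ScheduleError."""


def sample_candidate(index: int) -> int:
    # Assumption for illustration: candidate 2 contains a block that
    # reads from and writes to the same buffer.
    if index == 2:
        raise ScheduleError(f"candidate {index}: block violates compact dataflow")
    return index


def sample_population(num_candidates: int) -> List[int]:
    # No error handling inside the workers: the exception raised in a worker
    # is re-raised when its result is collected, aborting the whole call.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(sample_candidate, i) for i in range(num_candidates)]
        return [future.result() for future in futures]
```

In this model, one bad candidate is enough to lose every other candidate in the batch, which is the behaviour the patch sets out to avoid.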

Solution:

  • Added exception handling in RewriteParallelVectorizeUnroll::Apply to catch ScheduleError and skip problematic blocks instead of crashing
  • Added exception handling in ThreadedTraceApply::Apply to catch exceptions from postprocessors and treat them as normal failures
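
A rough sketch of the second point, written as a Python model rather than the actual C++ change: an exception raised by a postprocessor is handled the same way as a postprocessor returning false. All names here (ScheduleError, Postproc, apply_postprocs) are hypothetical, and a schedule is modelled as a plain list of applied steps.

```python
from typing import Callable, List, Optional


class ScheduleError(Exception):
    """Hypothetical stand-in for tir::ScheduleError."""


# A postprocessor mutates the schedule and returns True on success,
# or False on an ordinary, expected failure.
Postproc = Callable[[List[str]], bool]


def apply_postprocs(schedule: List[str], postprocs: List[Postproc]) -> Optional[List[str]]:
    """Return the schedule if every postprocessor succeeds, else None."""
    for postproc in postprocs:
        try:
            ok = postproc(schedule)
        except Exception:
            # A thrown error is treated like a normal failure, so the caller
            # drops this candidate instead of the whole session crashing.
            ok = False
        if not ok:
            return None
    return schedule
```

A caller would then count a None result as one failed sample and move on to the next candidate.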

This makes the tuning process more robust by gracefully handling schedule errors instead of crashing the entire tuning session.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Dayuxiaoshui, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the stability and resilience of the MetaSchedule tuning process. It specifically targets and resolves a critical crash that occurred when the RewriteParallelVectorizeUnroll postprocessor encountered un-schedulable blocks, such as those with conflicting read/write operations on the same buffer. By introducing robust exception handling within both the postprocessor itself and the ThreadedTraceApply utility, the system can now gracefully manage and log these scheduling failures, allowing the tuning session to continue without interruption, thereby enhancing the overall user experience and reliability.

Highlights

  • Crash Fix: Addresses a crash in meta_schedule.tune_tir during initial population sampling.
  • Root Cause: The crash was due to RewriteParallelVectorizeUnroll encountering blocks that violate compact dataflow requirements, specifically when a block reads and writes to the same buffer, leading to a ScheduleError.
  • Solution in Postprocessor: Implemented try-catch blocks in RewriteParallelVectorizeUnroll::Apply to catch ScheduleError and skip problematic blocks, preventing crashes.
  • Robustness in Utility: Added broader exception handling in ThreadedTraceApply::Apply to catch any ScheduleError or std::exception from postprocessors, treating them as graceful failures instead of crashes.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a crash in meta_schedule.tune_tir by introducing exception handling for tir::ScheduleError. The changes in RewriteParallelVectorizeUnroll::Apply correctly catch scheduling errors, log them, and skip problematic blocks, which enhances the robustness of the tuning process. Similarly, ThreadedTraceApply::Apply is updated to handle exceptions from postprocessors, treating them as failures without crashing. The changes are logical and directly address the issue described. I have one suggestion for src/meta_schedule/utils.h to refactor the error handling logic for better code clarity and maintainability.

Previously, catching ScheduleError and continuing could leave the schedule
partially mutated if operations failed mid-sequence. This change ensures that
when a ScheduleError occurs, the postprocessor returns false immediately,
causing the candidate to be discarded safely rather than continuing with
a potentially corrupted schedule state.

This matches the error handling pattern used in ThreadedTraceApply::Apply
and addresses the safety concern about partial mutations.
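
The difference between the two strategies can be seen in a small Python model. It is a sketch under stated assumptions, not the TVM implementation: a schedule is a list of applied steps, each block is rewritten in three steps, and every name (ScheduleError, rewrite_block, apply_skip_and_continue, apply_fail_fast) is hypothetical.

```python
from typing import List


class ScheduleError(Exception):
    """Hypothetical stand-in for tir::ScheduleError."""


def rewrite_block(schedule: List[str], block: str, fail_on: str) -> None:
    # Apply three rewrites in sequence; an error at `fail_on` leaves the
    # earlier steps for this block already recorded in the schedule.
    for step in ("parallel", "vectorize", "unroll"):
        if step == fail_on:
            raise ScheduleError(f"{block}: cannot apply {step}")
        schedule.append(f"{block}:{step}")


def apply_skip_and_continue(schedule: List[str], blocks: List[str], bad_block: str) -> bool:
    for block in blocks:
        try:
            rewrite_block(schedule, block, "vectorize" if block == bad_block else "")
        except ScheduleError:
            continue  # the half-rewritten block stays in the schedule
    return True  # reported as success despite the partial mutation


def apply_fail_fast(schedule: List[str], blocks: List[str], bad_block: str) -> bool:
    for block in blocks:
        try:
            rewrite_block(schedule, block, "vectorize" if block == bad_block else "")
        except ScheduleError:
            return False  # signal failure so the caller discards the candidate
    return True
```

With blocks A and B and a failure in A, the skip-and-continue variant returns True while the schedule still carries A's partial rewrite; the fail-fast variant returns False, so that schedule is never used.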
@Dayuxiaoshui
Contributor Author

@tlopex
