Upgrade filter #41

Open · wants to merge 25 commits into main

Conversation

kyleheadley commented:

Add a filter database with categories, tags, regexes, and notes (for GitHub issue links or whatever).

Includes summary text at the end of the output.

Demonstrated here using a special case on vsftpd to force a pass while the other tests show standard results.

@kyleheadley (Author) commented:

I have no attachment to the content of the database, so I'd prefer that it not be reviewed here. But I also don't mind if it's changed significantly, either as a GitHub "suggestion" commit here or as a push later.

@mattmccutchen-cci (Collaborator) left a comment:

Thanks for driving this feature forward; as we know, it will be helpful to guide development of 3C. Here's one round of review: I have some concerns about the design/behavior as well as a bunch of smaller style concerns.

> I have no attachment to the content of the database, so I'd prefer that it not be reviewed here. But I also don't mind if it's changed significantly, either as a GitHub "suggestion" commit here or as a push later.

Well, if the database is going to be merged, it needs to be reviewed like anything else. If you're not attached to it, one option is to put a smaller database in this PR that we can easily review and save the full database in another branch to review later.

Comment on lines 541 to 543
2>&1 | ${{github.workspace}}/depsfolder/actions/filter-errors.py ''' +
'''${{github.workspace}}/depsfolder/actions''' + f'''/{binfo.name}_errors.csv ''' +
'''${{github.workspace}}/depsfolder/actions/benchmark_errors.csv'''
Collaborator:

  1. Using a multi-line f-string and doubling the literal curly braces (see examples earlier in this file) will probably reduce the mess compared to this string concatenation.
  2. If feasible, find and add an example of a benchmark-specific filter rule to demonstrate how that feature is intended to be used and that it works correctly.
  3. Please ensure that the formatting of the code you changed complies with yapf. If you like, you could try one of the tools here to format only the modified lines; if so, please share your experience for our future use. Or you could just format the whole file (yapf3 -i generate-workflow.py); I know that will make some unrelated changes to code that other people previously added with wrong formatting, but if you can tolerate that, it might be the most expedient solution.
  4. To reduce clutter here, consider passing only the benchmark name and having filter-errors.py find the full file paths based on sys.argv[0].
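Suggestions 1 and 4 might be sketched roughly as follows; the function names, step text, and directory layout here are illustrative, not the actual generate-workflow.py or filter-errors.py code:

```python
import os

# Hypothetical sketch of suggestion 1: a single f-string with doubled
# braces emits the literal ${{github.workspace}} that GitHub Actions
# expects, avoiding mixed concatenation of plain and f-strings.
def filter_step(benchmark_name: str) -> str:
    return (f'2>&1 | ${{{{github.workspace}}}}/depsfolder/actions/'
            f'filter-errors.py {benchmark_name}')

# Hypothetical sketch of suggestion 4: filter-errors.py could derive
# the database paths from its own location (sys.argv[0]), so the
# workflow only has to pass the benchmark name.
def database_paths(benchmark: str, script_path: str):
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return (os.path.join(script_dir, f'{benchmark}_errors.csv'),
            os.path.join(script_dir, 'benchmark_errors.csv'))
```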

Author:

Do you have any suggestions? Dynamic code creation through string manipulation with multiple special cases is far from my strength as a coder, and Python is not a language I use often.

category,tag,regex,note
bounds,expr_unknown_bounds,"^expression has unknown bounds(.*)$",
bounds,unmet_bounds,"^argument does not meet declared bounds for (.*) parameter$",
error,ifdef,"^#endif without #if$",
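As a rough sketch of how a database in this format could drive the filter (assumed behavior, not necessarily what filter-errors.py actually does):

```python
import csv
import io
import re

# A few rules in the CSV format under review (category,tag,regex,note).
RULES_CSV = '''category,tag,regex,note
bounds,expr_unknown_bounds,"^expression has unknown bounds(.*)$",
bounds,unmet_bounds,"^argument does not meet declared bounds for (.*) parameter$",
error,ifdef,"^#endif without #if$",
'''

def load_rules(text):
    # Each row becomes (category, tag, compiled regex); the note column
    # is informational only.
    return [(row['category'], row['tag'], re.compile(row['regex']))
            for row in csv.DictReader(io.StringIO(text))]

def classify(line, rules):
    # First matching rule wins; an unmatched error line would presumably
    # make the benchmark run fail.
    for category, tag, regex in rules:
        if regex.search(line):
            return category, tag
    return None
```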
Collaborator:

  1. As I mentioned on Friday, a potential alternative to using custom regexes that might be less error-prone would be to redact the errors in a standard way as summarize-benchmark-errors.py does and then match specific redacted error templates. Document why you chose the custom-regex approach. (I know I used a custom regex prior to this PR, but this deserves reconsideration before we scale up to a large number of rules.) I'm guessing the reason is that you want to be able to group error templates that differ in unimportant ways (e.g., use of undeclared identifier '<OMITTED>' with and without did you mean '<OMITTED>'?) or write rules that are narrower than an error template (e.g., matching individual program elements in a benchmark that are causing problems for 3C without ignoring instances of the same error template elsewhere in the benchmark).
  2. Many of the tags you have chosen are very terse, and we're going to have to look at this file all the time to remind ourselves of what they mean; Mike already started to experience this at the meeting on Friday. Please make them more descriptive; I don't think making them a bit longer is going to make the output significantly harder to read. You could draw inspiration from the internal Clang diagnostic kind names in clang/include/clang/Basic/DiagnosticSemaKinds.td, making adjustments as appropriate if tags do not correspond one-to-one to Clang diagnostic kinds.
  3. What is the value of going ahead and adding a bunch of error entries now, as opposed to doing it as we triage the errors and link them to GitHub issues? Just having the summary in the logs for each benchmark rather than having to use summarize-benchmark-errors.py even when we only care about one benchmark?
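For reference, the redaction alternative in point 1 might look something like this; the exact scheme summarize-benchmark-errors.py uses may differ:

```python
import re

# Assumed redaction scheme: replace single-quoted specifics with
# '<OMITTED>' so errors differing only in identifiers collapse to one
# template that can then be matched exactly instead of by custom regex.
def redact(error: str) -> str:
    return re.sub(r"'[^']*'", "'<OMITTED>'", error)

# Note the limitation the custom-regex approach avoids: the two
# "undeclared identifier" variants (with and without "did you mean")
# still redact to *different* templates, so grouping them would need
# either two template entries or a broader pattern.
```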

@@ -0,0 +1,39 @@
category,tag,regex,note
bounds,expr_unknown_bounds,"^expression has unknown bounds(.*)$",
Collaborator:

We should verify that the errors designated here as "known bounds inference limitations" actually are limitations that we don't expect to fix. Ideally, we would find example code for each error message where the error is clearly caused by such a limitation; the code could be part of one of our benchmarks or could be a simple handwritten example. But I would also be open to trusting the judgment of a second person if it would get the review done faster. Is this something I should study up and do myself, or would it be more efficient for someone else to do it (maybe John or Aravind)?

Author:

This seems well beyond the scope of a simple filter script.

Collaborator:

Plan from the 2021-09-23 meeting: Split this PR into two PRs, the first with the script and a small example database and the second with the full database, more or less as I previously suggested.

2>&1 | ${{github.workspace}}/depsfolder/actions/filter-bounds-inference-errors.py'''
2>&1 | ${{github.workspace}}/depsfolder/actions/filter-errors.py ''' +
'''${{github.workspace}}/depsfolder/actions''' + f'''/{binfo.name}_errors.csv ''' +
'''${{github.workspace}}/depsfolder/actions/benchmark_errors.csv'''
if variant.alltypes else '')
Collaborator:

Since the new filter will cover some known errors unrelated to bounds inference limitations, it needs to run even when variant.alltypes is off, but the filter should accept errors due to bounds inference limitations only when variant.alltypes is on. Maybe the best way to achieve this is to pass a flag to filter-errors.py that controls whether the bounds category is considered acceptable?
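The suggested flag could be implemented along these lines; the flag name and category sets are illustrative, not the actual filter-errors.py interface:

```python
import argparse

def accepted_categories(argv):
    # Hypothetical flag: the workflow would pass --accept-bounds only
    # for the alltypes variant, so bounds-category errors are accepted
    # there but reported as failures in non-alltypes runs.
    parser = argparse.ArgumentParser()
    parser.add_argument('--accept-bounds', action='store_true')
    args = parser.parse_args(argv)
    categories = {'error'}           # known non-bounds 3C bugs: always accepted
    if args.accept_bounds:
        categories.add('bounds')     # bounds inference limitations: alltypes only
    return categories
```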

Author:

Wasn't the original intent that non-alltypes runs would have no compile errors at all? So there's no reason to filter them. If we change this policy (which we may, as Mike suggested recently), I agree we should reevaluate the usage of the filter.

Collaborator:

As discussed at the 2021-09-23 meeting, both the alltypes and non-alltypes variants may be affected by known 3C bugs that we want to temporarily accept in order to have a useful pass/fail status. While it's possible that some bugs could affect one variant and not the other, for simplicity, we'll use the union of the accepted errors for now and set up separate filters if/when it seems to be worth it. However, the known bounds inference errors should still be accepted only in the alltypes variant; if they somehow occur in non-alltypes, we want to know.

@mattmccutchen-cci (Collaborator) left a comment:

Here are responses to some of the threads that I think are more important right now. I'll come back to the rest later.


filter-errors.py Outdated
print()
print('Encountered errors:')
for e in error_list:
if (e['tag'] in output_tags) or (seen_tags[e['tag']] == 0): continue
Collaborator:

Under my first proposal, we can still override a tag's category as long as the added rule has the same regex. Under my second proposal, we can achieve the same effect by using a different tag name, which might be helpful for documentation anyway. So let's try for some way to avoid the confusion.

