Upgrade filter #41

Open · wants to merge 25 commits into main

Conversation

kyleheadley commented:

Add a filter database with categories, tags, regexes, and notes (for GitHub issue links or whatever).

Includes summary text at the end of the output.

Demonstrated here using a special case on vsftpd to force a pass while the other tests show standard results.

@kyleheadley (Author) commented:

I have no attachment to the content of the database, so I'd prefer that it not be reviewed here. But I also don't mind if it's changed significantly, either as a GitHub "suggestion" commit here or as a push later.

@mattmccutchen-cci (Collaborator) left a comment:

Thanks for driving this feature forward; as we know, it will be helpful to guide development of 3C. Here's one round of review: I have some concerns about the design/behavior as well as a bunch of smaller style concerns.

> I have no attachment to the content of the database, so I'd prefer that it not be reviewed here. But I also don't mind if it's changed significantly, either as a GitHub "suggestion" commit here or as a push later.

Well, if the database is going to be merged, it needs to be reviewed like anything else. If you're not attached to it, one option is to put a smaller database in this PR that we can easily review and save the full database in another branch to review later.

Comment on lines 541 to 543
2>&1 | ${{github.workspace}}/depsfolder/actions/filter-errors.py ''' +
'''${{github.workspace}}/depsfolder/actions''' + f'''/{binfo.name}_errors.csv ''' +
'''${{github.workspace}}/depsfolder/actions/benchmark_errors.csv'''
Collaborator:

  1. Using a multi-line f-string and doubling the literal curly braces (see examples earlier in this file) will probably reduce the mess compared to this string concatenation.
  2. If feasible, find and add an example of a benchmark-specific filter rule to demonstrate how that feature is intended to be used and that it works correctly.
  3. Please ensure that the formatting of the code you changed complies with yapf. If you like, you could try one of the tools here to format only the modified lines; if so, please share your experience for our future use. Or you could just format the whole file (yapf3 -i generate-workflow.py); I know that will make some unrelated changes to code that other people previously added with wrong formatting, but if you can tolerate that, it might be the most expedient solution.
  4. To reduce clutter here, consider passing only the benchmark name and having filter-errors.py find the full file paths based on sys.argv[0].
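Suggestions 1 and 4 might be sketched roughly as follows; the function names, step text, and directory layout here are illustrative, not the actual generate-workflow.py or filter-errors.py code:

```python
import os

# Hypothetical sketch of suggestion 1: a single f-string with doubled
# braces emits the literal ${{github.workspace}} that GitHub Actions
# expects, avoiding mixed concatenation of plain and f-strings.
def filter_step(benchmark_name: str) -> str:
    return (f'2>&1 | ${{{{github.workspace}}}}/depsfolder/actions/'
            f'filter-errors.py {benchmark_name}')

# Hypothetical sketch of suggestion 4: filter-errors.py could derive
# the database paths from its own location (sys.argv[0]), so the
# workflow only has to pass the benchmark name.
def database_paths(benchmark: str, script_path: str):
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return (os.path.join(script_dir, f'{benchmark}_errors.csv'),
            os.path.join(script_dir, 'benchmark_errors.csv'))
```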

Author:

Do you have any suggestions? Dynamic code creation through string manipulation with multiple special cases is far from my strength as a coder, and Python is not a language I use often.

category,tag,regex,note
bounds,expr_unknown_bounds,"^expression has unknown bounds(.*)$",
bounds,unmet_bounds,"^argument does not meet declared bounds for (.*) parameter$",
error,ifdef,"^#endif without #if$",
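As a rough sketch of how a database in this format could drive the filter (assumed behavior, not necessarily what filter-errors.py actually does):

```python
import csv
import io
import re

# A few rules in the CSV format under review (category,tag,regex,note).
RULES_CSV = '''category,tag,regex,note
bounds,expr_unknown_bounds,"^expression has unknown bounds(.*)$",
bounds,unmet_bounds,"^argument does not meet declared bounds for (.*) parameter$",
error,ifdef,"^#endif without #if$",
'''

def load_rules(text):
    # Each row becomes (category, tag, compiled regex); the note column
    # is informational only.
    return [(row['category'], row['tag'], re.compile(row['regex']))
            for row in csv.DictReader(io.StringIO(text))]

def classify(line, rules):
    # First matching rule wins; an unmatched error line would presumably
    # make the benchmark run fail.
    for category, tag, regex in rules:
        if regex.search(line):
            return category, tag
    return None
```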
Collaborator:

  1. As I mentioned on Friday, a potential alternative to using custom regexes that might be less error-prone would be to redact the errors in a standard way as summarize-benchmark-errors.py does and then match specific redacted error templates. Document why you chose the custom-regex approach. (I know I used a custom regex prior to this PR, but this deserves reconsideration before we scale up to a large number of rules.) I'm guessing the reason is that you want to be able to group error templates that differ in unimportant ways (e.g., use of undeclared identifier '<OMITTED>' with and without did you mean '<OMITTED>'?) or write rules that are narrower than an error template (e.g., matching individual program elements in a benchmark that are causing problems for 3C without ignoring instances of the same error template elsewhere in the benchmark).
  2. Many of the tags you have chosen are very terse, and we're going to have to look at this file all the time to remind ourselves of what they mean; Mike already started to experience this at the meeting on Friday. Please make them more descriptive; I don't think making them a bit longer is going to make the output significantly harder to read. You could draw inspiration from the internal Clang diagnostic kind names in clang/include/clang/Basic/DiagnosticSemaKinds.td, making adjustments as appropriate if tags do not correspond one-to-one to Clang diagnostic kinds.
  3. What is the value of going ahead and adding a bunch of error entries now, as opposed to doing it as we triage the errors and link them to GitHub issues? Just having the summary in the logs for each benchmark rather than having to use summarize-benchmark-errors.py even when we only care about one benchmark?
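For reference, the redaction alternative in point 1 might look something like this; the exact scheme summarize-benchmark-errors.py uses may differ:

```python
import re

# Assumed redaction scheme: replace single-quoted specifics with
# '<OMITTED>' so errors differing only in identifiers collapse to one
# template that can then be matched exactly instead of by custom regex.
def redact(error: str) -> str:
    return re.sub(r"'[^']*'", "'<OMITTED>'", error)

# Note the limitation the custom-regex approach avoids: the two
# "undeclared identifier" variants (with and without "did you mean")
# still redact to *different* templates, so grouping them would need
# either two template entries or a broader pattern.
```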

@@ -0,0 +1,39 @@
category,tag,regex,note
bounds,expr_unknown_bounds,"^expression has unknown bounds(.*)$",
Collaborator:

We should verify that the errors designated here as "known bounds inference limitations" actually are limitations that we don't expect to fix. Ideally, we would find example code for each error message where the error is clearly caused by such a limitation; the code could be part of one of our benchmarks or could be a simple handwritten example. But I would also be open to trusting the judgment of a second person if it would get the review done faster. Is this something I should study up and do myself, or would it be more efficient for someone else to do it (maybe John or Aravind)?

Author:

This seems well beyond the scope of a simple filter script.

Collaborator:

Plan from the 2021-09-23 meeting: Split this PR into two PRs, the first with the script and a small example database and the second with the full database, more or less as I previously suggested.

2>&1 | ${{github.workspace}}/depsfolder/actions/filter-bounds-inference-errors.py'''
2>&1 | ${{github.workspace}}/depsfolder/actions/filter-errors.py ''' +
'''${{github.workspace}}/depsfolder/actions''' + f'''/{binfo.name}_errors.csv ''' +
'''${{github.workspace}}/depsfolder/actions/benchmark_errors.csv'''
if variant.alltypes else '')
Collaborator:

Since the new filter will cover some known errors unrelated to bounds inference limitations, it needs to run even when variant.alltypes is off, but the filter should accept errors due to bounds inference limitations only when variant.alltypes is on. Maybe the best way to achieve this is to pass a flag to filter-errors.py that controls whether the bounds category is considered acceptable?
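The suggested flag could be implemented along these lines; the flag name and category sets are illustrative, not the actual filter-errors.py interface:

```python
import argparse

def accepted_categories(argv):
    # Hypothetical flag: the workflow would pass --accept-bounds only
    # for the alltypes variant, so bounds-category errors are accepted
    # there but reported as failures in non-alltypes runs.
    parser = argparse.ArgumentParser()
    parser.add_argument('--accept-bounds', action='store_true')
    args = parser.parse_args(argv)
    categories = {'error'}           # known non-bounds 3C bugs: always accepted
    if args.accept_bounds:
        categories.add('bounds')     # bounds inference limitations: alltypes only
    return categories
```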

Author:

Wasn't the original intent that non-alltypes runs would have no compile errors at all? So there's no reason to filter them. If we change this policy (which we may, as Mike suggested recently), I agree we should reevaluate the usage of the filter.

Collaborator:

As discussed at the 2021-09-23 meeting, both the alltypes and non-alltypes variants may be affected by known 3C bugs that we want to temporarily accept in order to have a useful pass/fail status. While it's possible that some bugs could affect one variant and not the other, for simplicity, we'll use the union of the accepted errors for now and set up separate filters if/when it seems to be worth it. However, the known bounds inference errors should still be accepted only in the alltypes variant; if they somehow occur in non-alltypes, we want to know.

@mattmccutchen-cci (Collaborator) left a comment:

Here are responses to some of the threads that I think are more important right now. I'll come back to the rest later.


filter-errors.py Outdated
print()
print('Encountered errors:')
for e in error_list:
if (e['tag'] in output_tags) or (seen_tags[e['tag']] == 0): continue
Collaborator:

Under my first proposal, we can still override a tag's category as long as the added rule has the same regex. Under my second proposal, we can achieve the same effect by using a different tag name, which might be helpful for documentation anyway. So let's try for some way to avoid the confusion.

