Skip to content

One clang pass #488

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Mar 23, 2021
Merged

One clang pass #488

merged 15 commits into from
Mar 23, 2021

Conversation

kyleheadley
Copy link
Member

@kyleheadley kyleheadley commented Mar 15, 2021

This PR changes 3C architecture at the top level. Instead of calling clang to generate an ASTs for each files for each step, we generate all ASTs once and reuse them. This avoids problems of AST differences between runs, and gives a little speed boost, but at the cost of extra memory usage, and disabling of the Diagnostic Verifier.

Without the Diagnostic Verifier we have some tests that need to be refactored, see #503.

@mattmccutchen-cci
Copy link
Member

In brief testing, I noticed a few problems:

  1. 3c does not exit nonzero when there is an error diagnostic.
  2. The diagnostic verifier does not seem to be enabled: the -verify option has no effect and the original diagnostics are just reported. (Indeed, the VerifyDiagnosticOutput variable is unreferenced.) This would have caused the regression tests with expected errors to fail when those errors got reported directly, except that (1) hid the problem.

Perhaps we should have regression tests specifically for (1) and (2). I can write them if you want.

I'm also concerned that the use of FileEntry::tryGetRealPathName rather than our getCanonicalFilePath may lead to writing outside the base directory via symlinks, given that the comments in FileManager::fillRealPathName claim that symlinks are not resolved. However, I haven't yet been able to confirm via testing whether this problem can happen.

@mattmccutchen-cci
Copy link
Member

mattmccutchen-cci commented Mar 16, 2021

I looked into the canonical file path stuff a bit more. Apparently, the reason why this PR triggered the assertion failure is that the path returned by FileEntry::getName may be expressed relative to the directory field of the compilation database entry, whereas getCanonicalFilePath tries to interpret the path relative to 3C's working directory. ClangTool::run temporarily changes the working directory to the compilation database directory field, so we got what we wanted. But this PR is no longer using ClangTool::run, so there is no longer a guarantee that the two directories are the same.

Under the benchmark workflow prior to mwhicks1/3c-actions#7, 3C's working directory was set to clang/tools/3c/utils/port_tools of the 3C's repository, so we saw the problem. In mwhicks1/3c-actions#7, I changed the workflow to set the working directory to the project base directory for my own convenience. Interestingly, at least in the case of vsftpd, this masked the getCanonicalFilePath problem because the project base directory equals the compilation database directory field. This could be an argument for having the workflow set the working directory to some arbitrary directory to flush out anything in the system that makes improper assumptions about the working directory; I may work on that separately. (Update: Submitted mwhicks1/3c-actions#11.)

Re the proposed code in this PR: In one simple test I performed, the value returned by FileEntry::tryGetRealPathName seemed to have ultimately come from RealFileSystem::openFileForRead, which uses the same underlying LLVM library code to resolve symlinks that getCanonicalFilePath does. So if we know that all of our file entries will take this code path, it's fine to just use FileEntry::tryGetRealPathName. If we want to be super safe against unusual scenarios or possible future changes to Clang, we could put the return value of FileEntry::tryGetRealPathName through getCanonicalFilePath again. That might be worth doing at least in RewriteUtils, because that code is the critical gate against writing files we shouldn't, especially if we add an "overwrite files in place" option to 3C in the future. If we do the check there, it may not be worth also doing it in PersistentSourceLoc because that may have a more significant performance cost, and the worst that can happen if PersistentSourceLoc gets a wrong answer is that the program analysis doesn't do what we want; any actual attempt to modify a file outside the base dir would trigger the "unwritable change" error in RewriteUtils. In any event, I now know that having a fallback that passes the return value of FileEntry::getName directly to getCanonicalFilePath is not a good idea since it may trip the assertion or (if we're really unlucky) match an unrelated file in another directory and silently give an incorrect answer.

I don't understand why PersistentSourceLoc::mkPSL has the other fallbacks that it does. Ideally they would be documented or replaced with assertions, but that's outside the scope of this PR.

I'll be interested to see if no longer using ClangTool::run breaks any other assumptions I've naively made. 🙂

@mattmccutchen-cci
Copy link
Member

Perhaps we should have regression tests specifically for (1) and (2). I can write them if you want.

Submitted in #489. If you combine #489 with this PR (either in a temporary branch or by updating this PR after #489 is merged), you should see failures indicating the problems in this PR that need to be fixed.

@kyleheadley
Copy link
Member Author

@mattmccutchen-cci thanks for the error report. Exiting non-zero is an easy fix, but the diagnostic verifier will be more difficult.

I have to make sure clang parses the annotations, and then keep them open to be satisfied later. I may or may not be able to localize them to one pass, i.e. annotations will have to mention clang warnings along with rewriter errors. I expect there will be options we can decide on once I have a handle on their state-machine. I'm uncertain how a makefile with -verify on some files will interact.

@mattmccutchen-cci
Copy link
Member

I may or may not be able to localize them to one pass, i.e. annotations will have to mention clang warnings along with rewriter errors.

I think this issue already exists but is not a problem because the regression tests that use the diagnostic verifier just turn off the Clang warnings with -Wno-everything. You don't have to address the issue in this PR unless you want to.

I'm uncertain how a makefile with -verify on some files will interact.

So far, the regression tests only use 3c -verify with a single translation unit; I don't know what happens with multiple translation units. Again, if you want to research it, fine, but I think it's also fine to wait to deal with this until we need to write a regression test that uses 3c -verify with multiple translation units.

ClangTool &Tool = getGlobalClangTool();
Tool.buildASTs(ASTs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels odd to me to build the ASTs here, since the name of the method is addVariables.

Perhaps there should be another method call in _3CInterface that you specifically invoke to the build the ASTs, and then the subsequent passes can check to confirm the ASTs have been built already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. I'll do a minor refactor once the functionality is stable and I agree this needs its own little section.

GenericAction<VariableAdderConsumer, ProgramInfo>>(GlobalProgramInfo);
VariableAdderConsumer VA = VariableAdderConsumer(GlobalProgramInfo, nullptr);
for (auto &TU : ASTs) {
VA.HandleTranslationUnit(TU->getASTContext());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No return code form this call? Tool.run() below returns an exit code that could cause a return value of false.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Same question for the passes below.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. fixed.

RewriteConsumer RC = RewriteConsumer(GlobalProgramInfo);
for (auto &TU : ASTs) {
RC.HandleTranslationUnit(TU->getASTContext());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After rewriting happens, do you need to delete/free the ASTs? Maybe add another method to free them, to complement the one I'm suggesting you add, to build them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I could tell the destructors handled everything. That might be different if we had to save/load for memory, or maintain the diagnostics for verification. If you can think of anything else, I'll check it out and add it.

@mwhicks1
Copy link
Member

Are there other places where we are using Tool.run() but should change to use the saved-away ASTs? For example, I am wondering if the _3CInterface is the only place these things change. I feel like I've seen Tool.run in other places.

@kyleheadley
Copy link
Member Author

Are there other places where we are using Tool.run() but should change to use the saved-away ASTs? For example, I am wondering if the _3CInterface is the only place these things change. I feel like I've seen Tool.run in other places.

greping through lib/3C for "Tool" and "run" provide no other instances. Do you know of somewhere else to look?

@kyleheadley
Copy link
Member Author

benchmarks

@kyleheadley kyleheadley marked this pull request as ready for review March 19, 2021 19:44
@kyleheadley
Copy link
Member Author

I manually verified that the disabled tests do produce the intended diagnostics after this PR. The only one that didn't was the symlink one that instead gave an error about too many symlink levels.

Copy link
Member

@mattmccutchen-cci mattmccutchen-cci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I manually verified that the disabled tests do produce the intended diagnostics after this PR.

Thanks, I appreciate it.

The only one that didn't was the symlink one that instead gave an error about too many symlink levels.

Interesting: I ran that test and it gave the correct diagnostics, so I think we're good. I don't know what happened in your environment.

Copy link
Collaborator

@john-h-kastner john-h-kastner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One very small change in a log message. Otherwise, this looks good.

Co-authored-by: John Kastner <[email protected]>
…possible addition to #488) (#508)

* Slight improvements and comments for file path canonicalization.

In particular, getCanonicalFilePath (the version that asserts) is no
longer used.

* Add comments about blocks around diagnostic generation.
@kyleheadley kyleheadley merged commit ad4c801 into main Mar 23, 2021
mattmccutchen-cci added a commit that referenced this pull request Mar 26, 2021
preprocess-before-conversion branch.

It's no longer needed with the current expand-macros approach and would
need a new regression test, which I don't want to deal with now. Plus,
in its current form, it seems to have a semantic conflict with #488,
which might fix the icecast file path issue.
mattmccutchen-cci added a commit that referenced this pull request Mar 30, 2021
PR #488 made `3c -verify` verify only the compiler diagnostics, but none
of our regression tests actually use that functionality. Instead, one
regression test (macro_function_call) used `3c -verify` to try to test
the absence of 3C warnings, and we were unaware that the test wasn't
testing what it was supposed to. I think it's best to make `3c -verify`
an error for now so we don't make that mistake again.

I'm deleting the addVerifyAdjuster code because I think we're unlikely
to ever want to use it. If we do, it should be easy enough to bring back
by reverting this commit.
mattmccutchen-cci added a commit that referenced this pull request Mar 30, 2021
PR #488 made `3c -verify` cover only the compiler diagnostics, but none
of our regression tests actually use that functionality. Instead, one
regression test (macro_function_call) used `3c -verify` to try to test
the absence of 3C warnings, and we were unaware that the test wasn't
testing what it was supposed to. I think it's best to make `3c -verify`
an error for now so we don't make that mistake again.

I'm deleting the addVerifyAdjuster code because I think we're unlikely
to ever want to use it. If we do, it should be easy enough to bring back
by reverting this commit.
mattmccutchen-cci added a commit that referenced this pull request Mar 30, 2021
PR #488 made `3c -verify` cover only the compiler diagnostics, but none
of our regression tests actually use that functionality. Instead, one
regression test (macro_function_call) used `3c -verify` to try to test
the absence of 3C warnings, and we were unaware that the test wasn't
testing what it was supposed to. I think it's best to make `3c -verify`
an error for now so we don't make that mistake again.
mattmccutchen-cci added a commit that referenced this pull request Jun 10, 2021
This feels a little cleaner than manually scanning the stderr and lets
us check the diagnostics more precisely, for what that's worth. We
couldn't do this before #488 both because 3C exited before the
diagnostics could be verified and because 3C didn't support custom
verify prefixes. #488 solved the first problem, and this PR solves the
second.
mattmccutchen-cci added a commit that referenced this pull request Jun 11, 2021
Re-enable diagnostic verification in regression tests where it still
succeeds. It remains disabled in some tests in which failures were
introduced while it was disabled; follow-up on that is included in #503.

Addresses #503.
mattmccutchen-cci added a commit that referenced this pull request Jun 11, 2021
This feels a little cleaner than manually scanning the stderr and lets
us check the diagnostics more precisely, for what that's worth. We
couldn't do this before #488 both because 3C exited before the
diagnostics could be verified and because 3C didn't support custom
verify prefixes. #488 solved the first problem, and this PR solves the
second.
mattmccutchen-cci added a commit that referenced this pull request Jun 14, 2021
Re-enable diagnostic verification in regression tests where it still
succeeds. It remains disabled in some tests in which failures were
introduced while it was disabled; #609 is for that.

Fixes #503.
mattmccutchen-cci added a commit that referenced this pull request Jun 14, 2021
This feels a little cleaner than manually scanning the stderr and lets
us check the diagnostics more precisely, for what that's worth. We
couldn't do this before #488 both because 3C exited before the
diagnostics could be verified and because 3C didn't support custom
verify prefixes. #488 solved the first problem, and this PR solves the
second.
mattmccutchen-cci added a commit that referenced this pull request Jun 14, 2021
Re-enable diagnostic verification in regression tests where it still
succeeds. It remains disabled in some tests in which failures were
introduced while it was disabled; #609 is for that.

Fixes #503.
mattmccutchen-cci added a commit that referenced this pull request Jun 14, 2021
This feels a little cleaner than manually scanning the stderr and lets
us check the diagnostics more precisely, for what that's worth. We
couldn't do this before #488 both because 3C exited before the
diagnostics could be verified and because 3C didn't support custom
verify prefixes. #488 solved the first problem, and this PR solves the
second.
mattmccutchen-cci added a commit that referenced this pull request Jun 14, 2021
* Reverse name of 3c -disccty option to reflect what it actually does.

By default, 3c adds the -f3c-tool compiler option to disable the Checked
C type checker. 3c's -disccty option (apparently an abbreviation for
"disable Checked C type checker") made 3c _not_ pass -f3c-tool, hence
leaving the type checker _enabled_. 3C's internal flag variable,
DisableCCTypeChecker, was also backwards, though interestingly, the
`3c -help` text was correct ("Do not disable checked c type checker").

This commit renames -disccty to -enccty and renames DisableCCTypeChecker
to EnableCCTypeChecker. We take the opportunity to further improve the
`3c -help` text too.

* Completely remove the old `3c -verify` option.

Now that 3C only runs one compiler "invocation" per translation unit,
with the new diagnostic verifier implementation, callers will be able to
just use the `-Xclang -verify` compiler option. And given that we may
want to use other diagnostic verifier options (e.g., `-verify=PREFIX` or
`-verify-ignore-unexpected`), let's standardize on using the original
compiler options rather than trying to define `3c` options wrapping all
of them.

* Switch from ClangTool::buildASTs to a custom _3CASTBuilderAction.

Replace the existing ArgumentsAdjusters with modifications to the option
data structures in _3CASTBuilderAction. More will be added to
_3CASTBuilderAction in the subsequent commits.

Fixes #536.

* Canonicalize -I directory paths in _3CASTBuilderAction.

This works per translation unit, so hopefully it is a complete and
correct fix to #515 that benefits all callers of 3C.

Remove the previous crude workaround from convert_project.

Fixes #515.

* Make diagnostic verification work after the redesign of #488.

Re-enable diagnostic verification in regression tests where it still
succeeds. It remains disabled in some tests in which failures were
introduced while it was disabled; #609 is for that.

Fixes #503.

* Migrate the difftypes* tests to the diagnostic verifier.

This feels a little cleaner than manually scanning the stderr and lets
us check the diagnostics more precisely, for what that's worth. We
couldn't do this before #488 both because 3C exited before the
diagnostics could be verified and because 3C didn't support custom
verify prefixes. #488 solved the first problem, and this PR solves the
second.

* Document diagnostic verification in 3C's CONTRIBUTING.md.
@Machiry Machiry deleted the one_clang_pass branch January 22, 2022 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants