Description
I'm filing an issue that I believe is extremely rare and probably not worth fixing on its own, but I think it's valuable to record what I've learned, particularly so that I can refer to it in documenting what #598 does and does not fix.
The root cause of #529 was that for some reason, the following code in PersistentSourceLoc
isn't able to get a canonical file path from Clang for code inside macro calls:
checkedc-clang/clang/lib/3C/PersistentSourceLoc.cpp
Lines 66 to 79 in e240faa
In those cases, we fall back to the original
SM.getPresumedLoc(SL).getFilename()
, which I'll call the "plain" path.
According to my tests, for the main source file, the plain path just comes from the compiler command line of the ToolAction
, which in turn comes from the compilation database or (if a compilation database is not used) from the SourceFiles
variable in 3C.cpp
but may be affected by ArgumentsAdjuster
s. For an #include
d file, the plain path is the concatenation of the -I
path and the path in the #include
directive. (When an #include
is implicitly resolved relative to the file in which it appears, the dirname of the plain path of that file plays the role of the -I
path.) However, due to a cache in the FileManager (see FileManager::UniqueRealFiles
), if the same 3C run opens a file with the same inode number several times (via symlinks or probably even hard links), it always uses the same plain path: perhaps the first one that was determined according to the rules above.
However, in general, the plain path is not guaranteed to be the same as the canonical path. We currently do two things to address that problem:
- For the main source file, after any lookup in the compilation database, 3C re-canonicalizes the path via an
ArgumentsAdjuster
. This was added in Ensure presumed source file paths are canonical when using a compilation database. #532 and completely fixes the case of the main source file (which was the case of Failure rewriting static function declaration in macro #529). - For
#include
d files,convert_project
adds-extra-arg-before=-I
options with the canonical paths of all the-I
paths in the compilation database. The use of-extra-arg-before
options that affect all translation units is a hack that could lead to incorrect behavior if-I
paths intentionally differ among translation units, but modulo that caveat, it completely works aroundClangTool::run
chdir
call corrupts internal Clang file path cache (?) #515 for callers that useconvert_project
. (I discovered and documented this inconvert_project
: Add option to expand macros before running 3C. #516 after I tried to remove the workaround and saw benchmark failures due toClangTool::run
chdir
call corrupts internal Clang file path cache (?) #515. Prior toconvert_project
: Add option to expand macros before running 3C. #516,convert_project
did a similar thing but only made the paths absolute, not canonical. I suspect that behavior was added was only because we didn't realize that 3C was capable of taking-I
options from the compilation database, not because we had run intoClangTool::run
chdir
call corrupts internal Clang file path cache (?) #515. But I could be wrong about this, and in any event, it isn't important now.) I later realized during preparation of Ensure presumed source file paths are canonical when using a compilation database. #532 that the same workaround also significantly helps with this issue, and I added a comment to that effect in Ensure presumed source file paths are canonical when using a compilation database. #532.
However, the plain and canonical paths can still differ in one case: an #include
where the portion of the path in the #include
directive (as opposed to the -I
option) is non-canonical. It seems that I overlooked this case while working on #532. This can potentially have several bad effects (that I can think of so far):
-
A program element that 3C matches based on
PersistentSourceLoc
file path may not get matched if it appears once in a macro and once outside a macro. AFAIK, this is currently only an issue for static functions (as originally seen in Failure rewriting static function declaration in macro #529) because they can be declared multiple times and they use thePersistentSourceLoc
file path to distinguish between static functions of the same name in different translation units. It's unusual to#include
a static function from another file; there are some cases such as Fixes for errors in the "array_ptr" interfaces to some libc functions declared in checkedc_extensions.h, and removal of the conditional in the snprintf declaration in stdio_checked.h checkedc/checkedc#456, but it would be even more unusual to put the function inside a macro.For program elements that 3C matches based on the complete
PersistentSourceLoc
rather than just the file path (such as structs as in structs are not global : good or bad (relates to preprocessor, in part) ? #499), having a non-macro occurrence and a macro occurrence at the exact samePersistentSourceLoc
falls into the category of "PSL collision" cases that we consider pathological and don't care about at all right now. -
3C decides whether the element is inside the base directory based on the plain path, which may be incorrect. However, regardless of the base directory, the element will be unwritable because of the macro, so the only impact is a wrong reason in the root-cause warning.
One complete fix would be to have the PersistentSourceLoc
code quoted above canonicalize the file path itself, as I contemplated in the comment quoted above. If we can find a way to ensure that Clang always gives us a canonical path in the first place, that might be preferable, but it might be hard to be confident that we've handled all cases.