-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[analyzer][NFC] Introduce framework for checker families #139256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The checker classes (i.e. classes derived from `CheckerBase` via the utility template `Checker<...>`) act as intermediates between the user and the analyzer engine, so they have two interfaces: - On the frontend side, they have a public name, can be enabled or disabled, can accept checker options and can be reported as the source of bug reports. - On the backend side, they can handle various checker callbacks and they "leave a mark" on the `ExplodedNode`s that are created by them. (These `ProgramPointTag` marks are internal: they appear in debug logs and can be queried by checker logic; but the user doesn't see them.) In a significant majority of the checkers there is 1:1 correspondence between these sides, but there are also many checker classes where several related user-facing checkers share the same backend class. Historically each of these "multi-part checker" classes had its own hacks to juggle its multiple names, which led to lots of ugliness like lazy initialization of `mutable std::unique_ptr<BugType>` members and redundant data members (when a checker used its custom `CheckNames` array and ignored the inherited single `Name`). My recent commit 2709998 tried to unify and standardize these existing solutions to get rid of some of the technical debt, but it still used enum values to identify the checker parts within a "multi-part" checker class, which led to some ugliness. This commit introduces a new framework which takes a more direct, object-oriented approach: instead of identifying checker parts with `{parent checker object, index of part}` pairs, the parts of a multi-part checker become stand-alone objects that store their own name (and enabled/disabled status) as a data member. This is implemented by separating the functionality of `CheckerBase` into two new classes: `CheckerFrontend` and `CheckerBackend`. The name `CheckerBase` is kept (as a class derived from both `CheckerFrontend` and `CheckerBackend`), so "simple" checkers that use `CheckerBase` and `Checker<...>` continues to work without changes. However we also get first-class support for the "many frontends - one backend" situation: - The class `CheckerFamily<...>` works exactly like `Checker<...>` but inherits from `CheckerBackend` instead of `CheckerBase`, so it won't have a superfluous single `Name` member. - Classes deriving from `CheckerFamily` can freely own multiple `CheckerFrontend` data members, which are enabled within the registration methods corresponding to their name and can be used to initialize the `BugType`s that they can emit. In this scheme each `CheckerFamily` needs to override the pure virtual method `ProgramPointTag::getTagDescription()` which returns a string which represents that class for debugging purposes. (Previously this used the name of one arbitrary sub-checker, which was passable for debugging purposes, but not too elegant.) I'm planning to implement follow-up commits that convert all the "multi-part" checkers to this `CheckerFamily` framework.
@llvm/pr-subscribers-clang-static-analyzer-1 @llvm/pr-subscribers-clang Author: Donát Nagy (NagyDonat) ChangesThe checker classes (i.e. classes derived from
In a significant majority of the checkers there is 1:1 correspondence between these sides, but there are also many checker classes where several related user-facing checkers share the same backend class. Historically each of these "multi-part checker" classes had its own hacks to juggle its multiple names, which led to lots of ugliness like lazy initialization of My recent commit 2709998 tried to unify and standardize these existing solutions to get rid of some of the technical debt, but it still used enum values to identify the checker parts within a "multi-part" checker class, which led to some ugliness. This commit introduces a new framework which takes a more direct, object-oriented approach: instead of identifying checker parts with This is implemented by separating the functionality of
In this scheme each I'm planning to implement follow-up commits that convert all the "multi-part" checkers to this Patch is 32.20 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139256.diff 11 Files Affected:
diff --git a/clang/include/clang/Analysis/ProgramPoint.h b/clang/include/clang/Analysis/ProgramPoint.h
index c40aa3d8ffb72..d81b8e845cb48 100644
--- a/clang/include/clang/Analysis/ProgramPoint.h
+++ b/clang/include/clang/Analysis/ProgramPoint.h
@@ -40,8 +40,8 @@ class ProgramPointTag {
ProgramPointTag(void *tagKind = nullptr) : TagKind(tagKind) {}
virtual ~ProgramPointTag();
- /// The description of this program point which will be displayed when the
- /// ExplodedGraph is dumped in DOT format for debugging.
+ /// The description of this program point which will be dumped for debugging
+ /// purposes. Do not use in user-facing output!
virtual StringRef getTagDescription() const = 0;
/// Used to implement 'isKind' in subclasses.
diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
index 8e1d25b3eefa1..33d37febc7327 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
@@ -643,9 +643,10 @@ class BugReporter {
/// reports.
virtual void emitReport(std::unique_ptr<BugReport> R);
- void EmitBasicReport(const Decl *DeclWithIssue, const CheckerBase *Checker,
- StringRef BugName, StringRef BugCategory,
- StringRef BugStr, PathDiagnosticLocation Loc,
+ void EmitBasicReport(const Decl *DeclWithIssue,
+ const CheckerFrontend *Checker, StringRef BugName,
+ StringRef BugCategory, StringRef BugStr,
+ PathDiagnosticLocation Loc,
ArrayRef<SourceRange> Ranges = {},
ArrayRef<FixItHint> Fixits = {});
diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
index 3a635e0d0125a..b05360904f86d 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
@@ -27,11 +27,7 @@ class BugReporter;
class BugType {
private:
- struct CheckerPartRef {
- const CheckerBase *Checker;
- CheckerPartIdx Idx;
- };
- using CheckerNameInfo = std::variant<CheckerNameRef, CheckerPartRef>;
+ using CheckerNameInfo = std::variant<CheckerNameRef, const CheckerFrontend *>;
const CheckerNameInfo CheckerName;
const std::string Description;
@@ -43,7 +39,7 @@ class BugType {
public:
// Straightforward constructor where the checker name is specified directly.
// TODO: As far as I know all applications of this constructor involve ugly
- // hacks that could be avoided by switching to a different constructor.
+ // hacks that could be avoided by switching to the other constructor.
// When those are all eliminated, this constructor should be removed to
// eliminate the `variant` and simplify this class.
BugType(CheckerNameRef CheckerName, StringRef Desc,
@@ -52,18 +48,11 @@ class BugType {
SuppressOnSink(SuppressOnSink) {}
// Constructor that can be called from the constructor of a checker object.
// At that point the checker name is not yet available, but we can save a
- // pointer to the checker and later use that to query the name.
- BugType(const CheckerBase *Checker, StringRef Desc,
+ // pointer to the checker and use that to query the name.
+ BugType(const CheckerFrontend *CF, StringRef Desc,
StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
- : CheckerName(CheckerPartRef{Checker, DefaultPart}), Description(Desc),
- Category(Cat), SuppressOnSink(SuppressOnSink) {}
- // Constructor that can be called from the constructor of a checker object
- // when it has multiple parts with separate names. We save the name and the
- // part index to be able to query the name of that part later.
- BugType(const CheckerBase *Checker, CheckerPartIdx Idx, StringRef Desc,
- StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
- : CheckerName(CheckerPartRef{Checker, Idx}), Description(Desc),
- Category(Cat), SuppressOnSink(SuppressOnSink) {}
+ : CheckerName(CF), Description(Desc), Category(Cat),
+ SuppressOnSink(SuppressOnSink) {}
virtual ~BugType() = default;
StringRef getDescription() const { return Description; }
@@ -72,8 +61,7 @@ class BugType {
if (const auto *CNR = std::get_if<CheckerNameRef>(&CheckerName))
return *CNR;
- auto [Checker, Idx] = std::get<CheckerPartRef>(CheckerName);
- return Checker->getName(Idx);
+ return std::get<const CheckerFrontend *>(CheckerName)->getName();
}
/// isSuppressOnSink - Returns true if bug reports associated with this bug
@@ -82,6 +70,24 @@ class BugType {
bool isSuppressOnSink() const { return SuppressOnSink; }
};
+/// Trivial convenience class for the common case when a certain checker
+/// frontend always uses the same bug type. This way instead of writing
+/// ```
+/// CheckerFrontend LongCheckerFrontendName;
+/// BugType LongCheckerFrontendNameBug{LongCheckerFrontendName, "..."};
+/// ```
+/// we can use `CheckerFrontendWithBugType LongCheckerFrontendName{"..."}`.
+class CheckerFrontendWithBugType : public CheckerFrontend {
+ BugType BT;
+
+public:
+ CheckerFrontendWithBugType(StringRef Desc,
+ StringRef Cat = categories::LogicError,
+ bool SuppressOnSink = false)
+ : BT(this, Desc, Cat, SuppressOnSink) {}
+ const BugType &getBT() const { return BT; }
+};
+
} // namespace ento
} // end clang namespace
diff --git a/clang/include/clang/StaticAnalyzer/Core/Checker.h b/clang/include/clang/StaticAnalyzer/Core/Checker.h
index a54c5bee612f6..45b0398f3aca5 100644
--- a/clang/include/clang/StaticAnalyzer/Core/Checker.h
+++ b/clang/include/clang/StaticAnalyzer/Core/Checker.h
@@ -484,83 +484,87 @@ class Call {
} // end eval namespace
-class CheckerBase : public ProgramPointTag {
- /// A single checker class (i.e. a subclass of `CheckerBase`) can implement
- /// multiple user-facing checkers that have separate names and can be enabled
- /// separately, but are backed by the same singleton checker object.
- SmallVector<std::optional<CheckerNameRef>, 1> RegisteredNames;
-
- friend class ::clang::ento::CheckerManager;
+/// A `CheckerFrontend` instance is what the user recognizes as "one checker":
+/// it has a public canonical name (injected from the `CheckerManager`), can be
+/// enabled or disabled, can have associated checker options and can be printed
+/// as the "source" of bug reports.
+/// The singleton instance of a simple `Checker<...>` is-a `CheckerFrontend`
+/// (for historical reasons, to preserve old straightforward code), while the
+/// singleton instance of a `CheckerFamily<...>` class owns multiple
+/// `CheckerFrontend` instances as data members.
+/// Modeling checkers that are hidden from the user but can be enabled or
+/// disabled separately (as dependencies of other checkers) are also considered
+/// to be `CheckerFrontend`s.
+class CheckerFrontend {
+ /// The `Name` is nullopt if and only if the checker is disabled.
+ std::optional<CheckerNameRef> Name;
public:
- CheckerNameRef getName(CheckerPartIdx Idx = DefaultPart) const {
- assert(Idx < RegisteredNames.size() && "Checker part index is too large!");
- std::optional<CheckerNameRef> Name = RegisteredNames[Idx];
- assert(Name && "Requested checker part is not registered!");
- return *Name;
- }
-
- bool isPartEnabled(CheckerPartIdx Idx) const {
- return Idx < RegisteredNames.size() && RegisteredNames[Idx].has_value();
- }
-
- void registerCheckerPart(CheckerPartIdx Idx, CheckerNameRef Name) {
- // Paranoia: notice if e.g. UINT_MAX is passed as a checker part index.
- assert(Idx < 256 && "Checker part identifiers should be small integers.");
-
- if (Idx >= RegisteredNames.size())
- RegisteredNames.resize(Idx + 1, std::nullopt);
-
- assert(!RegisteredNames[Idx] && "Repeated registration of checker a part!");
- RegisteredNames[Idx] = Name;
- }
-
- StringRef getTagDescription() const override {
- // When the ExplodedGraph is dumped for debugging (in DOT format), this
- // method is called to attach a description to nodes created by this
- // checker _class_. Ideally this should be recognizable identifier of the
- // whole class, but for this debugging purpose it's sufficient to use the
- // name of the first registered checker part.
- for (const auto &OptName : RegisteredNames)
- if (OptName)
- return *OptName;
-
- return "Unregistered checker";
+ void enable(CheckerManager &Mgr) {
+ assert(!Name && "Checker part registered twice!");
+ Name = Mgr.getCurrentCheckerName();
}
+ bool isEnabled() const { return static_cast<bool>(Name); }
+ CheckerNameRef getName() const { return *Name; }
+};
+/// `CheckerBackend` is an abstract base class that serves as the common
+/// ancestor of all the `Checker<...>` and `CheckerFamily<...>` classes which
+/// can create `ExplodedNode`s (by acting as a `ProgramPointTag`) and can be
+/// registered to handle various checker callbacks. (Moreover the debug
+/// callback `printState` is also introduced here.)
+class CheckerBackend : public ProgramPointTag {
+public:
/// Debug state dump callback, see CheckerManager::runCheckersForPrintState.
/// Default implementation does nothing.
virtual void printState(raw_ostream &Out, ProgramStateRef State,
const char *NL, const char *Sep) const;
};
-/// Dump checker name to stream.
-raw_ostream& operator<<(raw_ostream &Out, const CheckerBase &Checker);
-
-/// Tag that can use a checker name as a message provider
-/// (see SimpleProgramPointTag).
-class CheckerProgramPointTag : public SimpleProgramPointTag {
+/// The non-templated common ancestor of all the simple `Checker<...>` classes.
+class CheckerBase : public CheckerFrontend, public CheckerBackend {
public:
- CheckerProgramPointTag(StringRef CheckerName, StringRef Msg);
- CheckerProgramPointTag(const CheckerBase *Checker, StringRef Msg);
+ /// Attached to nodes created by this checker class when the ExplodedGraph is
+ /// dumped for debugging.
+ StringRef getTagDescription() const override;
};
-template <typename CHECK1, typename... CHECKs>
-class Checker : public CHECK1, public CHECKs..., public CheckerBase {
+// Template magic to implement the static method `_register()` which registers
+// the `Checker` or `CheckerFamily` for all the implemented callbacks.
+template <typename CHECKER, typename CHECK1, typename... CHECKs>
+static void registerImpl(CHECKER *Chk, CheckerManager &Mgr) {
+ CHECK1::_register(Chk, Mgr);
+ registerImpl<CHECKER, CHECKs...>(Chk, Mgr);
+}
+
+template <typename CHECKER>
+static void registerImpl(CHECKER *Chk, CheckerManager &Mgr) {}
+
+/// Simple checker classes that implement one frontend (i.e. checker name)
+/// should derive from this template and specify all the implemented callbacks
+/// (i.e. classes like `check::PreStmt` or `eval::Call`) as template arguments
+/// of `Checker`.
+template <typename... CHECKs>
+class Checker : public CheckerBase, public CHECKs... {
public:
template <typename CHECKER>
- static void _register(CHECKER *checker, CheckerManager &mgr) {
- CHECK1::_register(checker, mgr);
- Checker<CHECKs...>::_register(checker, mgr);
+ static void _register(CHECKER *Chk, CheckerManager &Mgr) {
+ registerImpl<CHECKER, CHECKs...>(Chk, Mgr);
}
};
-template <typename CHECK1>
-class Checker<CHECK1> : public CHECK1, public CheckerBase {
+/// Checker families (where a single backend class implements multiple related
+/// frontends) should derive from this template, specify all the implemented
+/// callbacks (i.e. classes like `check::PreStmt` or `eval::Call`) as template
+/// arguments of `FamilyChecker` and implement the pure virtual method
+/// `StringRef getTagDescription()` which is inherited from `ProgramPointTag`
+/// and should return a string identifying the class for debugging purposes.
+template <typename... CHECKs>
+class CheckerFamily : public CheckerBackend, public CHECKs... {
public:
template <typename CHECKER>
- static void _register(CHECKER *checker, CheckerManager &mgr) {
- CHECK1::_register(checker, mgr);
+ static void _register(CHECKER *Chk, CheckerManager &Mgr) {
+ registerImpl<CHECKER, CHECKs...>(Chk, Mgr);
}
};
@@ -581,6 +585,20 @@ class EventDispatcher {
}
};
+/// Tag that can use a checker name as a message provider
+/// (see SimpleProgramPointTag).
+/// FIXME: This is a cargo cult class which is copied into several checkers but
+/// does not provide anything useful.
+/// The only added functionality provided by this class (compared to
+/// SimpleProgramPointTag) is that it composes the tag description string from
+/// two arguments -- but tag descriptions only appear in debug output so there
+/// is no reason to bother with this.
+class CheckerProgramPointTag : public SimpleProgramPointTag {
+public:
+ CheckerProgramPointTag(StringRef CheckerName, StringRef Msg);
+ CheckerProgramPointTag(const CheckerBase *Checker, StringRef Msg);
+};
+
/// We dereferenced a location that may be null.
struct ImplicitNullDerefEvent {
SVal Location;
diff --git a/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h b/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
index 03ffadd346d0b..8fa122f004bfe 100644
--- a/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
+++ b/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
@@ -39,7 +39,8 @@ class AnalysisManager;
class CXXAllocatorCall;
class BugReporter;
class CallEvent;
-class CheckerBase;
+class CheckerFrontend;
+class CheckerBackend;
class CheckerContext;
class CheckerRegistry;
struct CheckerRegistryData;
@@ -64,9 +65,9 @@ class CheckerFn<RET(Ps...)> {
Func Fn;
public:
- CheckerBase *Checker;
+ CheckerBackend *Checker;
- CheckerFn(CheckerBase *checker, Func fn) : Fn(fn), Checker(checker) {}
+ CheckerFn(CheckerBackend *checker, Func fn) : Fn(fn), Checker(checker) {}
RET operator()(Ps... ps) const {
return Fn(Checker, ps...);
@@ -116,19 +117,6 @@ class CheckerNameRef {
operator StringRef() const { return Name; }
};
-/// A single checker class (and its singleton instance) can act as the
-/// implementation of several (user-facing or modeling) checker parts that
-/// have shared state and logic, but have their own names and can be enabled or
-/// disabled separately.
-/// Each checker class that implement multiple parts introduces its own enum
-/// type to assign small numerical indices (0, 1, 2 ...) to their parts. The
-/// type alias 'CheckerPartIdx' is conceptually the union of these enum types.
-using CheckerPartIdx = unsigned;
-
-/// If a checker doesn't have multiple parts, then its single part is
-/// represented by this index.
-constexpr inline CheckerPartIdx DefaultPart = 0;
-
enum class ObjCMessageVisitKind {
Pre,
Post,
@@ -193,14 +181,7 @@ class CheckerManager {
/// Emits an error through a DiagnosticsEngine about an invalid user supplied
/// checker option value.
- void reportInvalidCheckerOptionValue(const CheckerBase *C,
- StringRef OptionName,
- StringRef ExpectedValueDesc) const {
- reportInvalidCheckerOptionValue(C, DefaultPart, OptionName,
- ExpectedValueDesc);
- }
-
- void reportInvalidCheckerOptionValue(const CheckerBase *C, CheckerPartIdx Idx,
+ void reportInvalidCheckerOptionValue(const CheckerFrontend *CP,
StringRef OptionName,
StringRef ExpectedValueDesc) const;
@@ -210,28 +191,15 @@ class CheckerManager {
// Checker registration.
//===--------------------------------------------------------------------===//
- /// Construct the singleton instance of a checker, register it for the
- /// supported callbacks and record its name with `registerCheckerPart()`.
- /// Arguments passed to this function are forwarded to the constructor of the
- /// checker.
- ///
- /// If `CHECKER` has multiple parts, then the constructor call and the
- /// callback registration only happen within the first `registerChecker()`
- /// call; while the subsequent calls only enable additional parts of the
- /// existing checker object (while registering their names).
- ///
- /// \returns a pointer to the checker object.
- template <typename CHECKER, CheckerPartIdx Idx = DefaultPart, typename... AT>
- CHECKER *registerChecker(AT &&...Args) {
- // This assert could be removed but then we need to make sure that calls
- // registering different parts of the same checker pass the same arguments.
- static_assert(
- Idx == DefaultPart || !sizeof...(AT),
- "Argument forwarding isn't supported with multi-part checkers!");
-
+ /// If the the singleton instance of a checker class is not yet constructed,
+ /// then construct it (with the supplied arguments), register it for the
+ /// callbacks that are supported by it, and return it. Otherwise, just return
+ /// a pointer to the existing instance.
+ template <typename CHECKER, typename... AT>
+ CHECKER *getChecker(AT &&...Args) {
CheckerTag Tag = getTag<CHECKER>();
- std::unique_ptr<CheckerBase> &Ref = CheckerTags[Tag];
+ std::unique_ptr<CheckerBackend> &Ref = CheckerTags[Tag];
if (!Ref) {
std::unique_ptr<CHECKER> Checker =
std::make_unique<CHECKER>(std::forward<AT>(Args)...);
@@ -239,18 +207,18 @@ class CheckerManager {
Ref = std::move(Checker);
}
- CHECKER *Result = static_cast<CHECKER *>(Ref.get());
- Result->registerCheckerPart(Idx, CurrentCheckerName);
- return Result;
+ return static_cast<CHECKER *>(Ref.get());
}
- template <typename CHECKER>
- CHECKER *getChecker() {
- CheckerTag Tag = getTag<CHECKER>();
- std::unique_ptr<CheckerBase> &Ref = CheckerTags[Tag];
- assert(Ref && "Requested checker is not registered! Maybe you should add it"
- " as a dependency in Checkers.td?");
- return static_cast<CHECKER *>(Ref.get());
+ /// Register a single-part checker (derived from `Checker`): construct its
+ /// singleton instance, register it for the supported callbacks and record
+ /// its name (with `CheckerFrontend::enable`). Calling this multiple times
+ /// triggers an assertion failure.
+ template <typename CHECKER, typename... AT>
+ CHECKER *registerChecker(AT &&...Args) {
+ CHECKER *Chk = getChecker<CHECKER>(std::forward<AT>(Args)...);
+ Chk->enable(*this);
+ return Chk;
}
template <typename CHECKER> bool isRegisteredChecker() {
@@ -482,7 +450,7 @@ class CheckerManager {
/// Run checkers for debug-printing a ProgramState.
///
/// Unlike most other callbacks, any checker can simply implement the virtual
- /// method CheckerBase::printState if it has custom data to print.
+ /// method CheckerBackend::printState if it has custom data to print.
///
/// \param Out The output stream
/// \param State The state being printed
@@ -651,7 +619,7 @@ class CheckerManager {
template <typename T>
static void *getTag() { static int tag; return &tag; }
- llvm::DenseMap<CheckerTag, std::unique_ptr<CheckerBase>> CheckerTags;
+ llvm::DenseMap<CheckerTag, std::unique_ptr<CheckerBackend>> CheckerTags;
struct DeclCheckerInfo {
CheckDeclFunc CheckFn;
diff --git a/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
index 3dd57732305b2..7672c63a646e4 100644
--- a/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
@@ -25,7 +25,7 @@ using namespace ento;
using namespace taint;
namespace {
-class DivZeroChecker : public Checker<check::PreStmt<BinaryOperator>> {
+class DivZeroChecker : public CheckerFamily<check::PreStmt<BinaryOperator>> {
void reportBug(StringRef Msg, ProgramStateRef StateZero,
CheckerContext &C) const;
void reportTaintBug(StringRef Msg, ProgramStateRef StateZero,
@@ -33,17 +33,15 @@ class DivZeroChecker : public Checker<check::PreStmt<BinaryOperator>> {
llvm::ArrayRef<SymbolRef> TaintedSym...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one is a really good direction. I like it.
I left a couple minor remarks inline.
I handled all the inline comments. I have one minor architectural question: we should standardize a way to assign a single tag description (that is, an identifier that can be used in debug dumps) to each checker family. The old code automatically used the name of the first checker part as the tag description (which was not very elegant...); while in the current version of this patch subclasses of |
I'll think about this, but overall I'd prefer simplicity for defining Checkers - which may make magic behind the doors slightly more involved. |
I prototyped a "get the name of template argument as string" solution which is sufficient for our use cases (where the checker family is a non-templated class type) and works on all supported versions of all three compilers (GCC, clang, MSVC) that are supported for LLVM compilation: #include <iostream>
#include <string>
class MyFancyCheckerFamily {};
template <class T>
std::string getTypeName() {
#ifndef _MSC_VER
std::string res = __PRETTY_FUNCTION__;
int start = res.find('='), end = res.find(';');
if (end == -1)
end = res.find(']');
if (start == -1 || end == -1)
return "unknown-checker-family"; // paranoia, should not happen
return res.substr(start + 2, end - start - 2);
#else
std::string res = __FUNCSIG__;
int start = res.find("<class "), end = res.find('>');
if (start == -1 || end == -1)
return "unknown-checker-family"; // paranoia, should not happen
return res.substr(start + 7, end - start - 7);
#endif
}
int main() {
std::cout << getTypeName<MyFancyCheckerFamily>() << std::endl;
return 0;
} If I invoke this machinery from the registration functions of the checker families (where each checker type is available as a template argument), then it can "magically" ensure that This is a bit ugly, because there is no standard way to stringify a type, so I need to rely on the macros (By the way, note that "simple" single-part @steakhal (or anybody else) What do you think about this? |
I don't like this. LLVM is used as a library for wide set of tools and compiled with an even wider set of compilers. As a user, I wouldn't mind a solution like this, but as a library vendor, I'd vote very much against this. |
🤔 These strings are only relevant for debugging, so this whole machinery could be limited to debug builds (or builds with an off-by-default flag). That way these debug names would be still available for the very specific situation when somebody is debugging the analyzer, but wouldn't cause problems for anybody else. However, I'm also open to other ideas and I'd be happy to accept a better solution if you can find one. I hope that I can merge the |
I think I can't push to your branch. So here is what I'd do as git format-patch: 0001-NFC-Rename-getTagDescription-to-getDebugName.patch.txt |
Thanks for the patches! I originally thought that renaming I see that your code the status quo by using the name of an arbitrary sub-checker (the one that is registered first) as the debug name of the full checker family. Previously I have tried to avoid this solution because:
However, if you think that these issues are acceptable, then I'm not opposed to using your solution because I agree that the I also thought about tweaking |
I think the order is deterministic, and due to how checker dependencies are resolved, the backend checker would be always the one that is registered first. I have not checked this. Frankly, I would have preferred if the internal checker class name would be printed there instead of the user-facing name of the checker - but given that I despise defining this for every checker as a mandatory virtual function for checker families, I opted for this solution given my very limited time frame and that I shouldn't block the progress of this PR.
I firmly believe that as this is a debugging feature, the best name to print is the class that gets registered. |
You're correct wrt the current state of the code but I'm planning to get rid of many "backend/modeling" checkers that have no real purpose 😅 -- in my opinion having a superfluous checker definition in There are a few modeling checkers that fulfill a valid role because they act as dependencies of checkers defined in other files/classes, but I want to eliminate the pattern where a checker class defines a modeling checker which is the dependency of the other "real" checkers from the class (to be registered first) and does nothing useful.
Sorry for adding time pressure...
I agree, but unfortunately there is no way to get the class name automatically in a platform-independent way without a third-party dependency like the type_name boost library. (Do I understand it correctly that we cannot just drop in an external dependency like this?) Using the name of one subchecker is probably the least bad available approach and your code is an elegant implementation for it, so I'll probably apply it to get this PR merged. If you're not opposed, I would like to add a |
I'd recommend you to look at the Just have a look at And thanks again for sticking around. I can understand if this thread already got far longer than it should have taken. |
The checker classes (i.e. classes derived from
CheckerBase
via the utility templateChecker<...>
) act as intermediates between the user and the analyzer engine, so they have two interfaces:ExplodedNode
s that are created by them. (TheseProgramPointTag
marks are internal: they appear in debug logs and can be queried by checker logic; but the user doesn't see them.)In a significant majority of the checkers there is 1:1 correspondence between these sides, but there are also many checker classes where several related user-facing checkers share the same backend class. Historically each of these "multi-part checker" classes had its own hacks to juggle its multiple names, which led to lots of ugliness like lazy initialization of
mutable std::unique_ptr<BugType>
members and redundant data members (when a checker used its customCheckNames
array and ignored the inherited singleName
).My recent commit 2709998 tried to unify and standardize these existing solutions to get rid of some of the technical debt, but it still used enum values to identify the checker parts within a "multi-part" checker class, which led to some ugliness.
This commit introduces a new framework which takes a more direct, object-oriented approach: instead of identifying checker parts with
{parent checker object, index of part}
pairs, the parts of a multi-part checker become stand-alone objects that store their own name (and enabled/disabled status) as a data member.This is implemented by separating the functionality of
CheckerBase
into two new classes:CheckerFrontend
andCheckerBackend
. The nameCheckerBase
is kept (as a class derived from bothCheckerFrontend
andCheckerBackend
), so "simple" checkers that useCheckerBase
andChecker<...>
continues to work without changes. However we also get first-class support for the "many frontends - one backend" situation:CheckerFamily<...>
works exactly likeChecker<...>
but inherits fromCheckerBackend
instead ofCheckerBase
, so it won't have a superfluous singleName
member.CheckerFamily
can freely own multipleCheckerFrontend
data members, which are enabled within the registration methods corresponding to their name and can be used to initialize theBugType
s that they can emit.In this scheme each
CheckerFamily
needs to override the pure virtual methodProgramPointTag::getTagDescription()
which returns a string which represents that class for debugging purposes. (Previously this used the name of one arbitrary sub-checker, which was passable for debugging purposes, but not too elegant.)I'm planning to implement follow-up commits that convert all the "multi-part" checkers to this
CheckerFamily
framework.