[analyzer][NFC] Introduce framework for checker families #139256

NagyDonat · 2025-05-09T12:51:25Z

The checker classes (i.e. classes derived from CheckerBase via the utility template Checker<...>) act as intermediates between the user and the analyzer engine, so they have two interfaces:

On the frontend side, they have a public name, can be enabled or disabled, can accept checker options and can be reported as the source of bug reports.
On the backend side, they can handle various checker callbacks and they "leave a mark" on the ExplodedNodes that are created by them. (These ProgramPointTag marks are internal: they appear in debug logs and can be queried by checker logic; but the user doesn't see them.)

In a significant majority of the checkers there is 1:1 correspondence between these sides, but there are also many checker classes where several related user-facing checkers share the same backend class. Historically each of these "multi-part checker" classes had its own hacks to juggle its multiple names, which led to lots of ugliness like lazy initialization of mutable std::unique_ptr<BugType> members and redundant data members (when a checker used its custom CheckNames array and ignored the inherited single Name).

My recent commit 2709998 tried to unify and standardize these existing solutions to get rid of some of the technical debt, but it still used enum values to identify the checker parts within a "multi-part" checker class, which led to some ugliness.

This commit introduces a new framework which takes a more direct, object-oriented approach: instead of identifying checker parts with {parent checker object, index of part} pairs, the parts of a multi-part checker become stand-alone objects that store their own name (and enabled/disabled status) as a data member.

This is implemented by separating the functionality of CheckerBase into two new classes: CheckerFrontend and CheckerBackend. The name CheckerBase is kept (as a class derived from both CheckerFrontend and CheckerBackend), so "simple" checkers that use CheckerBase and Checker<...> continues to work without changes. However we also get first-class support for the "many frontends - one backend" situation:

The class CheckerFamily<...> works exactly like Checker<...> but inherits from CheckerBackend instead of CheckerBase, so it won't have a superfluous single Name member.
Classes deriving from CheckerFamily can freely own multiple CheckerFrontend data members, which are enabled within the registration methods corresponding to their name and can be used to initialize the BugTypes that they can emit.

In this scheme each CheckerFamily needs to override the pure virtual method ProgramPointTag::getTagDescription() which returns a string which represents that class for debugging purposes. (Previously this used the name of one arbitrary sub-checker, which was passable for debugging purposes, but not too elegant.)

I'm planning to implement follow-up commits that convert all the "multi-part" checkers to this CheckerFamily framework.

The checker classes (i.e. classes derived from `CheckerBase` via the utility template `Checker<...>`) act as intermediates between the user and the analyzer engine, so they have two interfaces: - On the frontend side, they have a public name, can be enabled or disabled, can accept checker options and can be reported as the source of bug reports. - On the backend side, they can handle various checker callbacks and they "leave a mark" on the `ExplodedNode`s that are created by them. (These `ProgramPointTag` marks are internal: they appear in debug logs and can be queried by checker logic; but the user doesn't see them.) In a significant majority of the checkers there is 1:1 correspondence between these sides, but there are also many checker classes where several related user-facing checkers share the same backend class. Historically each of these "multi-part checker" classes had its own hacks to juggle its multiple names, which led to lots of ugliness like lazy initialization of `mutable std::unique_ptr<BugType>` members and redundant data members (when a checker used its custom `CheckNames` array and ignored the inherited single `Name`). My recent commit 2709998 tried to unify and standardize these existing solutions to get rid of some of the technical debt, but it still used enum values to identify the checker parts within a "multi-part" checker class, which led to some ugliness. This commit introduces a new framework which takes a more direct, object-oriented approach: instead of identifying checker parts with `{parent checker object, index of part}` pairs, the parts of a multi-part checker become stand-alone objects that store their own name (and enabled/disabled status) as a data member. This is implemented by separating the functionality of `CheckerBase` into two new classes: `CheckerFrontend` and `CheckerBackend`. The name `CheckerBase` is kept (as a class derived from both `CheckerFrontend` and `CheckerBackend`), so "simple" checkers that use `CheckerBase` and `Checker<...>` continues to work without changes. However we also get first-class support for the "many frontends - one backend" situation: - The class `CheckerFamily<...>` works exactly like `Checker<...>` but inherits from `CheckerBackend` instead of `CheckerBase`, so it won't have a superfluous single `Name` member. - Classes deriving from `CheckerFamily` can freely own multiple `CheckerFrontend` data members, which are enabled within the registration methods corresponding to their name and can be used to initialize the `BugType`s that they can emit. In this scheme each `CheckerFamily` needs to override the pure virtual method `ProgramPointTag::getTagDescription()` which returns a string which represents that class for debugging purposes. (Previously this used the name of one arbitrary sub-checker, which was passable for debugging purposes, but not too elegant.) I'm planning to implement follow-up commits that convert all the "multi-part" checkers to this `CheckerFamily` framework.

llvmbot · 2025-05-09T12:51:55Z

@llvm/pr-subscribers-clang-static-analyzer-1
@llvm/pr-subscribers-clang-analysis

@llvm/pr-subscribers-clang

Author: Donát Nagy (NagyDonat)

Changes

The checker classes (i.e. classes derived from CheckerBase via the utility template Checker<...>) act as intermediates between the user and the analyzer engine, so they have two interfaces:

On the frontend side, they have a public name, can be enabled or disabled, can accept checker options and can be reported as the source of bug reports.
On the backend side, they can handle various checker callbacks and they "leave a mark" on the ExplodedNodes that are created by them. (These ProgramPointTag marks are internal: they appear in debug logs and can be queried by checker logic; but the user doesn't see them.)

In a significant majority of the checkers there is 1:1 correspondence between these sides, but there are also many checker classes where several related user-facing checkers share the same backend class. Historically each of these "multi-part checker" classes had its own hacks to juggle its multiple names, which led to lots of ugliness like lazy initialization of mutable std::unique_ptr<BugType> members and redundant data members (when a checker used its custom CheckNames array and ignored the inherited single Name).

My recent commit 2709998 tried to unify and standardize these existing solutions to get rid of some of the technical debt, but it still used enum values to identify the checker parts within a "multi-part" checker class, which led to some ugliness.

This commit introduces a new framework which takes a more direct, object-oriented approach: instead of identifying checker parts with {parent checker object, index of part} pairs, the parts of a multi-part checker become stand-alone objects that store their own name (and enabled/disabled status) as a data member.

This is implemented by separating the functionality of CheckerBase into two new classes: CheckerFrontend and CheckerBackend. The name CheckerBase is kept (as a class derived from both CheckerFrontend and CheckerBackend), so "simple" checkers that use CheckerBase and Checker<...> continues to work without changes. However we also get first-class support for the "many frontends - one backend" situation:

The class CheckerFamily<...> works exactly like Checker<...> but inherits from CheckerBackend instead of CheckerBase, so it won't have a superfluous single Name member.
Classes deriving from CheckerFamily can freely own multiple CheckerFrontend data members, which are enabled within the registration methods corresponding to their name and can be used to initialize the BugTypes that they can emit.

In this scheme each CheckerFamily needs to override the pure virtual method ProgramPointTag::getTagDescription() which returns a string which represents that class for debugging purposes. (Previously this used the name of one arbitrary sub-checker, which was passable for debugging purposes, but not too elegant.)

I'm planning to implement follow-up commits that convert all the "multi-part" checkers to this CheckerFamily framework.

Patch is 32.20 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139256.diff

11 Files Affected:

(modified) clang/include/clang/Analysis/ProgramPoint.h (+2-2)
(modified) clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h (+4-3)
(modified) clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h (+25-19)
(modified) clang/include/clang/StaticAnalyzer/Core/Checker.h (+75-57)
(modified) clang/include/clang/StaticAnalyzer/Core/CheckerManager.h (+24-56)
(modified) clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp (+16-18)
(modified) clang/lib/StaticAnalyzer/Checkers/ObjCSelfInitChecker.cpp (+1-1)
(modified) clang/lib/StaticAnalyzer/Checkers/VirtualCallChecker.cpp (+15-14)
(modified) clang/lib/StaticAnalyzer/Core/BugReporter.cpp (+2-2)
(modified) clang/lib/StaticAnalyzer/Core/Checker.cpp (+5-9)
(modified) clang/lib/StaticAnalyzer/Core/CheckerManager.cpp (+11-11)

diff --git a/clang/include/clang/Analysis/ProgramPoint.h b/clang/include/clang/Analysis/ProgramPoint.h
index c40aa3d8ffb72..d81b8e845cb48 100644
--- a/clang/include/clang/Analysis/ProgramPoint.h
+++ b/clang/include/clang/Analysis/ProgramPoint.h
@@ -40,8 +40,8 @@ class ProgramPointTag {
   ProgramPointTag(void *tagKind = nullptr) : TagKind(tagKind) {}
   virtual ~ProgramPointTag();
 
-  /// The description of this program point which will be displayed when the
-  /// ExplodedGraph is dumped in DOT format for debugging.
+  /// The description of this program point which will be dumped for debugging
+  /// purposes. Do not use in user-facing output!
   virtual StringRef getTagDescription() const = 0;
 
   /// Used to implement 'isKind' in subclasses.
diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
index 8e1d25b3eefa1..33d37febc7327 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
@@ -643,9 +643,10 @@ class BugReporter {
   /// reports.
   virtual void emitReport(std::unique_ptr<BugReport> R);
 
-  void EmitBasicReport(const Decl *DeclWithIssue, const CheckerBase *Checker,
-                       StringRef BugName, StringRef BugCategory,
-                       StringRef BugStr, PathDiagnosticLocation Loc,
+  void EmitBasicReport(const Decl *DeclWithIssue,
+                       const CheckerFrontend *Checker, StringRef BugName,
+                       StringRef BugCategory, StringRef BugStr,
+                       PathDiagnosticLocation Loc,
                        ArrayRef<SourceRange> Ranges = {},
                        ArrayRef<FixItHint> Fixits = {});
 
diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
index 3a635e0d0125a..b05360904f86d 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
@@ -27,11 +27,7 @@ class BugReporter;
 
 class BugType {
 private:
-  struct CheckerPartRef {
-    const CheckerBase *Checker;
-    CheckerPartIdx Idx;
-  };
-  using CheckerNameInfo = std::variant<CheckerNameRef, CheckerPartRef>;
+  using CheckerNameInfo = std::variant<CheckerNameRef, const CheckerFrontend *>;
 
   const CheckerNameInfo CheckerName;
   const std::string Description;
@@ -43,7 +39,7 @@ class BugType {
 public:
   // Straightforward constructor where the checker name is specified directly.
   // TODO: As far as I know all applications of this constructor involve ugly
-  // hacks that could be avoided by switching to a different constructor.
+  // hacks that could be avoided by switching to the other constructor.
   // When those are all eliminated, this constructor should be removed to
   // eliminate the `variant` and simplify this class.
   BugType(CheckerNameRef CheckerName, StringRef Desc,
@@ -52,18 +48,11 @@ class BugType {
         SuppressOnSink(SuppressOnSink) {}
   // Constructor that can be called from the constructor of a checker object.
   // At that point the checker name is not yet available, but we can save a
-  // pointer to the checker and later use that to query the name.
-  BugType(const CheckerBase *Checker, StringRef Desc,
+  // pointer to the checker and use that to query the name.
+  BugType(const CheckerFrontend *CF, StringRef Desc,
           StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
-      : CheckerName(CheckerPartRef{Checker, DefaultPart}), Description(Desc),
-        Category(Cat), SuppressOnSink(SuppressOnSink) {}
-  // Constructor that can be called from the constructor of a checker object
-  // when it has multiple parts with separate names. We save the name and the
-  // part index to be able to query the name of that part later.
-  BugType(const CheckerBase *Checker, CheckerPartIdx Idx, StringRef Desc,
-          StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
-      : CheckerName(CheckerPartRef{Checker, Idx}), Description(Desc),
-        Category(Cat), SuppressOnSink(SuppressOnSink) {}
+      : CheckerName(CF), Description(Desc), Category(Cat),
+        SuppressOnSink(SuppressOnSink) {}
   virtual ~BugType() = default;
 
   StringRef getDescription() const { return Description; }
@@ -72,8 +61,7 @@ class BugType {
     if (const auto *CNR = std::get_if<CheckerNameRef>(&CheckerName))
       return *CNR;
 
-    auto [Checker, Idx] = std::get<CheckerPartRef>(CheckerName);
-    return Checker->getName(Idx);
+    return std::get<const CheckerFrontend *>(CheckerName)->getName();
   }
 
   /// isSuppressOnSink - Returns true if bug reports associated with this bug
@@ -82,6 +70,24 @@ class BugType {
   bool isSuppressOnSink() const { return SuppressOnSink; }
 };
 
+/// Trivial convenience class for the common case when a certain checker
+/// frontend always uses the same bug type. This way instead of writing
+/// ```
+///   CheckerFrontend LongCheckerFrontendName;
+///   BugType LongCheckerFrontendNameBug{LongCheckerFrontendName, "..."};
+/// ```
+/// we can use `CheckerFrontendWithBugType LongCheckerFrontendName{"..."}`.
+class CheckerFrontendWithBugType : public CheckerFrontend {
+  BugType BT;
+
+public:
+  CheckerFrontendWithBugType(StringRef Desc,
+                             StringRef Cat = categories::LogicError,
+                             bool SuppressOnSink = false)
+      : BT(this, Desc, Cat, SuppressOnSink) {}
+  const BugType &getBT() const { return BT; }
+};
+
 } // namespace ento
 
 } // end clang namespace
diff --git a/clang/include/clang/StaticAnalyzer/Core/Checker.h b/clang/include/clang/StaticAnalyzer/Core/Checker.h
index a54c5bee612f6..45b0398f3aca5 100644
--- a/clang/include/clang/StaticAnalyzer/Core/Checker.h
+++ b/clang/include/clang/StaticAnalyzer/Core/Checker.h
@@ -484,83 +484,87 @@ class Call {
 
 } // end eval namespace
 
-class CheckerBase : public ProgramPointTag {
-  /// A single checker class (i.e. a subclass of `CheckerBase`) can implement
-  /// multiple user-facing checkers that have separate names and can be enabled
-  /// separately, but are backed by the same singleton checker object.
-  SmallVector<std::optional<CheckerNameRef>, 1> RegisteredNames;
-
-  friend class ::clang::ento::CheckerManager;
+/// A `CheckerFrontend` instance is what the user recognizes as "one checker":
+/// it has a public canonical name (injected from the `CheckerManager`), can be
+/// enabled or disabled, can have associated checker options and can be printed
+/// as the "source" of bug reports.
+/// The singleton instance of a simple `Checker<...>` is-a `CheckerFrontend`
+/// (for historical reasons, to preserve old straightforward code), while the
+/// singleton instance of a `CheckerFamily<...>` class owns multiple
+/// `CheckerFrontend` instances as data members.
+/// Modeling checkers that are hidden from the user but can be enabled or
+/// disabled separately (as dependencies of other checkers) are also considered
+/// to be `CheckerFrontend`s.
+class CheckerFrontend {
+  /// The `Name` is nullopt if and only if the checker is disabled.
+  std::optional<CheckerNameRef> Name;
 
 public:
-  CheckerNameRef getName(CheckerPartIdx Idx = DefaultPart) const {
-    assert(Idx < RegisteredNames.size() && "Checker part index is too large!");
-    std::optional<CheckerNameRef> Name = RegisteredNames[Idx];
-    assert(Name && "Requested checker part is not registered!");
-    return *Name;
-  }
-
-  bool isPartEnabled(CheckerPartIdx Idx) const {
-    return Idx < RegisteredNames.size() && RegisteredNames[Idx].has_value();
-  }
-
-  void registerCheckerPart(CheckerPartIdx Idx, CheckerNameRef Name) {
-    // Paranoia: notice if e.g. UINT_MAX is passed as a checker part index.
-    assert(Idx < 256 && "Checker part identifiers should be small integers.");
-
-    if (Idx >= RegisteredNames.size())
-      RegisteredNames.resize(Idx + 1, std::nullopt);
-
-    assert(!RegisteredNames[Idx] && "Repeated registration of checker a part!");
-    RegisteredNames[Idx] = Name;
-  }
-
-  StringRef getTagDescription() const override {
-    // When the ExplodedGraph is dumped for debugging (in DOT format), this
-    // method is called to attach a description to nodes created by this
-    // checker _class_. Ideally this should be recognizable identifier of the
-    // whole class, but for this debugging purpose it's sufficient to use the
-    // name of the first registered checker part.
-    for (const auto &OptName : RegisteredNames)
-      if (OptName)
-        return *OptName;
-
-    return "Unregistered checker";
+  void enable(CheckerManager &Mgr) {
+    assert(!Name && "Checker part registered twice!");
+    Name = Mgr.getCurrentCheckerName();
   }
+  bool isEnabled() const { return static_cast<bool>(Name); }
+  CheckerNameRef getName() const { return *Name; }
+};
 
+/// `CheckerBackend` is an abstract base class that serves as the common
+/// ancestor of all the `Checker<...>` and `CheckerFamily<...>` classes which
+/// can create `ExplodedNode`s (by acting as a `ProgramPointTag`) and can be
+/// registered to handle various checker callbacks. (Moreover the debug
+/// callback `printState` is also introduced here.)
+class CheckerBackend : public ProgramPointTag {
+public:
   /// Debug state dump callback, see CheckerManager::runCheckersForPrintState.
   /// Default implementation does nothing.
   virtual void printState(raw_ostream &Out, ProgramStateRef State,
                           const char *NL, const char *Sep) const;
 };
 
-/// Dump checker name to stream.
-raw_ostream& operator<<(raw_ostream &Out, const CheckerBase &Checker);
-
-/// Tag that can use a checker name as a message provider
-/// (see SimpleProgramPointTag).
-class CheckerProgramPointTag : public SimpleProgramPointTag {
+/// The non-templated common ancestor of all the simple `Checker<...>` classes.
+class CheckerBase : public CheckerFrontend, public CheckerBackend {
 public:
-  CheckerProgramPointTag(StringRef CheckerName, StringRef Msg);
-  CheckerProgramPointTag(const CheckerBase *Checker, StringRef Msg);
+  /// Attached to nodes created by this checker class when the ExplodedGraph is
+  /// dumped for debugging.
+  StringRef getTagDescription() const override;
 };
 
-template <typename CHECK1, typename... CHECKs>
-class Checker : public CHECK1, public CHECKs..., public CheckerBase {
+// Template magic to implement the static method `_register()` which registers
+// the `Checker` or `CheckerFamily` for all the implemented callbacks.
+template <typename CHECKER, typename CHECK1, typename... CHECKs>
+static void registerImpl(CHECKER *Chk, CheckerManager &Mgr) {
+  CHECK1::_register(Chk, Mgr);
+  registerImpl<CHECKER, CHECKs...>(Chk, Mgr);
+}
+
+template <typename CHECKER>
+static void registerImpl(CHECKER *Chk, CheckerManager &Mgr) {}
+
+/// Simple checker classes that implement one frontend (i.e. checker name)
+/// should derive from this template and specify all the implemented callbacks
+/// (i.e. classes like `check::PreStmt` or `eval::Call`) as template arguments
+/// of `Checker`.
+template <typename... CHECKs>
+class Checker : public CheckerBase, public CHECKs... {
 public:
   template <typename CHECKER>
-  static void _register(CHECKER *checker, CheckerManager &mgr) {
-    CHECK1::_register(checker, mgr);
-    Checker<CHECKs...>::_register(checker, mgr);
+  static void _register(CHECKER *Chk, CheckerManager &Mgr) {
+    registerImpl<CHECKER, CHECKs...>(Chk, Mgr);
   }
 };
 
-template <typename CHECK1>
-class Checker<CHECK1> : public CHECK1, public CheckerBase {
+/// Checker families (where a single backend class implements multiple related
+/// frontends) should derive from this template, specify all the implemented
+/// callbacks (i.e. classes like `check::PreStmt` or `eval::Call`) as template
+/// arguments of `FamilyChecker` and implement the pure virtual method
+/// `StringRef getTagDescription()` which is inherited from `ProgramPointTag`
+/// and should return a string identifying the class for debugging purposes.
+template <typename... CHECKs>
+class CheckerFamily : public CheckerBackend, public CHECKs... {
 public:
   template <typename CHECKER>
-  static void _register(CHECKER *checker, CheckerManager &mgr) {
-    CHECK1::_register(checker, mgr);
+  static void _register(CHECKER *Chk, CheckerManager &Mgr) {
+    registerImpl<CHECKER, CHECKs...>(Chk, Mgr);
   }
 };
 
@@ -581,6 +585,20 @@ class EventDispatcher {
   }
 };
 
+/// Tag that can use a checker name as a message provider
+/// (see SimpleProgramPointTag).
+/// FIXME: This is a cargo cult class which is copied into several checkers but
+/// does not provide anything useful.
+/// The only added functionality provided by this class (compared to
+/// SimpleProgramPointTag) is that it composes the tag description string from
+/// two arguments -- but tag descriptions only appear in debug output so there
+/// is no reason to bother with this.
+class CheckerProgramPointTag : public SimpleProgramPointTag {
+public:
+  CheckerProgramPointTag(StringRef CheckerName, StringRef Msg);
+  CheckerProgramPointTag(const CheckerBase *Checker, StringRef Msg);
+};
+
 /// We dereferenced a location that may be null.
 struct ImplicitNullDerefEvent {
   SVal Location;
diff --git a/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h b/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
index 03ffadd346d0b..8fa122f004bfe 100644
--- a/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
+++ b/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
@@ -39,7 +39,8 @@ class AnalysisManager;
 class CXXAllocatorCall;
 class BugReporter;
 class CallEvent;
-class CheckerBase;
+class CheckerFrontend;
+class CheckerBackend;
 class CheckerContext;
 class CheckerRegistry;
 struct CheckerRegistryData;
@@ -64,9 +65,9 @@ class CheckerFn<RET(Ps...)> {
   Func Fn;
 
 public:
-  CheckerBase *Checker;
+  CheckerBackend *Checker;
 
-  CheckerFn(CheckerBase *checker, Func fn) : Fn(fn), Checker(checker) {}
+  CheckerFn(CheckerBackend *checker, Func fn) : Fn(fn), Checker(checker) {}
 
   RET operator()(Ps... ps) const {
     return Fn(Checker, ps...);
@@ -116,19 +117,6 @@ class CheckerNameRef {
   operator StringRef() const { return Name; }
 };
 
-/// A single checker class (and its singleton instance) can act as the
-/// implementation of several (user-facing or modeling) checker parts that
-/// have shared state and logic, but have their own names and can be enabled or
-/// disabled separately.
-/// Each checker class that implement multiple parts introduces its own enum
-/// type to assign small numerical indices (0, 1, 2 ...) to their parts. The
-/// type alias 'CheckerPartIdx' is conceptually the union of these enum types.
-using CheckerPartIdx = unsigned;
-
-/// If a checker doesn't have multiple parts, then its single part is
-/// represented by this index.
-constexpr inline CheckerPartIdx DefaultPart = 0;
-
 enum class ObjCMessageVisitKind {
   Pre,
   Post,
@@ -193,14 +181,7 @@ class CheckerManager {
 
   /// Emits an error through a DiagnosticsEngine about an invalid user supplied
   /// checker option value.
-  void reportInvalidCheckerOptionValue(const CheckerBase *C,
-                                       StringRef OptionName,
-                                       StringRef ExpectedValueDesc) const {
-    reportInvalidCheckerOptionValue(C, DefaultPart, OptionName,
-                                    ExpectedValueDesc);
-  }
-
-  void reportInvalidCheckerOptionValue(const CheckerBase *C, CheckerPartIdx Idx,
+  void reportInvalidCheckerOptionValue(const CheckerFrontend *CP,
                                        StringRef OptionName,
                                        StringRef ExpectedValueDesc) const;
 
@@ -210,28 +191,15 @@ class CheckerManager {
   // Checker registration.
   //===--------------------------------------------------------------------===//
 
-  /// Construct the singleton instance of a checker, register it for the
-  /// supported callbacks and record its name with `registerCheckerPart()`.
-  /// Arguments passed to this function are forwarded to the constructor of the
-  /// checker.
-  ///
-  /// If `CHECKER` has multiple parts, then the constructor call and the
-  /// callback registration only happen within the first `registerChecker()`
-  /// call; while the subsequent calls only enable additional parts of the
-  /// existing checker object (while registering their names).
-  ///
-  /// \returns a pointer to the checker object.
-  template <typename CHECKER, CheckerPartIdx Idx = DefaultPart, typename... AT>
-  CHECKER *registerChecker(AT &&...Args) {
-    // This assert could be removed but then we need to make sure that calls
-    // registering different parts of the same checker pass the same arguments.
-    static_assert(
-        Idx == DefaultPart || !sizeof...(AT),
-        "Argument forwarding isn't supported with multi-part checkers!");
-
+  /// If the the singleton instance of a checker class is not yet constructed,
+  /// then construct it (with the supplied arguments), register it for the
+  /// callbacks that are supported by it, and return it. Otherwise, just return
+  /// a pointer to the existing instance.
+  template <typename CHECKER, typename... AT>
+  CHECKER *getChecker(AT &&...Args) {
     CheckerTag Tag = getTag<CHECKER>();
 
-    std::unique_ptr<CheckerBase> &Ref = CheckerTags[Tag];
+    std::unique_ptr<CheckerBackend> &Ref = CheckerTags[Tag];
     if (!Ref) {
       std::unique_ptr<CHECKER> Checker =
           std::make_unique<CHECKER>(std::forward<AT>(Args)...);
@@ -239,18 +207,18 @@ class CheckerManager {
       Ref = std::move(Checker);
     }
 
-    CHECKER *Result = static_cast<CHECKER *>(Ref.get());
-    Result->registerCheckerPart(Idx, CurrentCheckerName);
-    return Result;
+    return static_cast<CHECKER *>(Ref.get());
   }
 
-  template <typename CHECKER>
-  CHECKER *getChecker() {
-    CheckerTag Tag = getTag<CHECKER>();
-    std::unique_ptr<CheckerBase> &Ref = CheckerTags[Tag];
-    assert(Ref && "Requested checker is not registered! Maybe you should add it"
-                  " as a dependency in Checkers.td?");
-    return static_cast<CHECKER *>(Ref.get());
+  /// Register a single-part checker (derived from `Checker`): construct its
+  /// singleton instance, register it for the supported callbacks and record
+  /// its name (with `CheckerFrontend::enable`). Calling this multiple times
+  /// triggers an assertion failure.
+  template <typename CHECKER, typename... AT>
+  CHECKER *registerChecker(AT &&...Args) {
+    CHECKER *Chk = getChecker<CHECKER>(std::forward<AT>(Args)...);
+    Chk->enable(*this);
+    return Chk;
   }
 
   template <typename CHECKER> bool isRegisteredChecker() {
@@ -482,7 +450,7 @@ class CheckerManager {
   /// Run checkers for debug-printing a ProgramState.
   ///
   /// Unlike most other callbacks, any checker can simply implement the virtual
-  /// method CheckerBase::printState if it has custom data to print.
+  /// method CheckerBackend::printState if it has custom data to print.
   ///
   /// \param Out   The output stream
   /// \param State The state being printed
@@ -651,7 +619,7 @@ class CheckerManager {
   template <typename T>
   static void *getTag() { static int tag; return &tag; }
 
-  llvm::DenseMap<CheckerTag, std::unique_ptr<CheckerBase>> CheckerTags;
+  llvm::DenseMap<CheckerTag, std::unique_ptr<CheckerBackend>> CheckerTags;
 
   struct DeclCheckerInfo {
     CheckDeclFunc CheckFn;
diff --git a/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
index 3dd57732305b2..7672c63a646e4 100644
--- a/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
@@ -25,7 +25,7 @@ using namespace ento;
 using namespace taint;
 
 namespace {
-class DivZeroChecker : public Checker<check::PreStmt<BinaryOperator>> {
+class DivZeroChecker : public CheckerFamily<check::PreStmt<BinaryOperator>> {
   void reportBug(StringRef Msg, ProgramStateRef StateZero,
                  CheckerContext &C) const;
   void reportTaintBug(StringRef Msg, ProgramStateRef StateZero,
@@ -33,17 +33,15 @@ class DivZeroChecker : public Checker<check::PreStmt<BinaryOperator>> {
                       llvm::ArrayRef<SymbolRef> TaintedSym...
[truncated]

clang/include/clang/StaticAnalyzer/Core/Checker.h

steakhal

I think this one is a really good direction. I like it.
I left a couple minor remarks inline.

clang/include/clang/StaticAnalyzer/Core/Checker.h

clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp

clang/lib/StaticAnalyzer/Checkers/VirtualCallChecker.cpp

clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp

clang/lib/StaticAnalyzer/Core/BugReporter.cpp

clang/lib/StaticAnalyzer/Core/CheckerManager.cpp

clang/include/clang/StaticAnalyzer/Core/CheckerManager.h

NagyDonat · 2025-05-13T14:07:13Z

I handled all the inline comments.

I have one minor architectural question: we should standardize a way to assign a single tag description (that is, an identifier that can be used in debug dumps) to each checker family. The old code automatically used the name of the first checker part as the tag description (which was not very elegant...); while in the current version of this patch subclasses of CheckerFamily<> override getTagDescription() and return their class name as a hardcoded string literal. I think this approach is acceptable, and if we keep it then I'll add a comment in CheckerFamily which says that subclasses must override getTagDescription() this way. However, I also feel that there might be a better solution -- @steakhal what do you think?

steakhal · 2025-05-13T16:55:01Z

I handled all the inline comments.

I have one minor architectural question: we should standardize a way to assign a single tag description (that is, an identifier that can be used in debug dumps) to each checker family. The old code automatically used the name of the first checker part as the tag description (which was not very elegant...); while in the current version of this patch subclasses of CheckerFamily<> override getTagDescription() and return their class name as a hardcoded string literal. I think this approach is acceptable, and if we keep it then I'll add a comment in CheckerFamily which says that subclasses must override getTagDescription() this way. However, I also feel that there might be a better solution -- @steakhal what do you think?

I'll think about this, but overall I'd prefer simplicity for defining Checkers - which may make magic behind the doors slightly more involved.

NagyDonat · 2025-05-15T15:27:43Z

I prototyped a "get the name of template argument as string" solution which is sufficient for our use cases (where the checker family is a non-templated class type) and works on all supported versions of all three compilers (GCC, clang, MSVC) that are supported for LLVM compilation:
https://godbolt.org/z/8b9YGfvdG

#include <iostream>
#include <string>

class MyFancyCheckerFamily {};

template <class T>
std::string getTypeName() {
#ifndef _MSC_VER
    std::string res = __PRETTY_FUNCTION__;
    int start = res.find('='), end = res.find(';');
    if (end == -1)
        end = res.find(']');
    if (start == -1 || end == -1)
        return "unknown-checker-family"; // paranoia, should not happen
    return res.substr(start + 2, end - start - 2);
#else
    std::string res = __FUNCSIG__;
    int start = res.find("<class "), end = res.find('>');
    if (start == -1 || end == -1)
        return "unknown-checker-family"; // paranoia, should not happen
    return res.substr(start + 7, end - start - 7);
#endif
}

int main() {
  std::cout << getTypeName<MyFancyCheckerFamily>() << std::endl;
  return 0;
}

If I invoke this machinery from the registration functions of the checker families (where each checker type is available as a template argument), then it can "magically" ensure that getTagDescription() (i.e. the debug name) returns the class name of the checker family as a string.

This is a bit ugly, because there is no standard way to stringify a type, so I need to rely on the macros __PRETTY_FUNCTION__ (GCC and clang with slightly different content) / __FUNCSIG__ (MSVC), but I don't think that there is a cleaner solution that doesn't burden the classes derived from CheckerFamily with any boilerplate (like overriding getTagDescription()).

(By the way, note that "simple" single-part Checkers already get their debug name automatically, because it can be equal to their user-facing name; so this question only affects the CheckerFamilyes.)

@steakhal (or anybody else) What do you think about this?

steakhal · 2025-05-15T15:34:10Z

What do you think about this?

I don't like this. LLVM is used as a library for wide set of tools and compiled with an even wider set of compilers.
I know that LLVM advertises a couple of compilers that are supported as host toolchains, but I'd say that's likely not a complete set.

As a user, I wouldn't mind a solution like this, but as a library vendor, I'd vote very much against this.
There must be some other way. I'll try to squeeze some free time exploring this to help you.

NagyDonat · 2025-05-15T15:40:19Z

🤔 These strings are only relevant for debugging, so this whole machinery could be limited to debug builds (or builds with an off-by-default flag). That way these debug names would be still available for the very specific situation when somebody is debugging the analyzer, but wouldn't cause problems for anybody else.

However, I'm also open to other ideas and I'd be happy to accept a better solution if you can find one.

I hope that I can merge the CheckerFamily framework in the foreseeable future (within a week?), because then I would be able to continue with the follow-up changes: (1) converting the existing multipart checkers to this framework and then (2) introducing checker family logic in a few other checkers (e.g. splitting off the taint-related part of security.ArrayBound into a separate part as Dániel Krupp did with the other optin.taint checkers).

steakhal · 2025-05-15T17:57:34Z

I think I can't push to your branch. So here is what I'd do as git format-patch:

0001-NFC-Rename-getTagDescription-to-getDebugName.patch.txt
0002-HACK-Smuggle-in-the-checker-name-into-the-CheckerFam.patch.txt

NagyDonat · 2025-05-16T08:39:04Z

Thanks for the patches! I originally thought that renaming getTagDescription -> getDebugName would be a separate follow-up commit but I see that it's a surprisingly small change (that function was barely used) so I'm happy to include it here.

I see that your code the status quo by using the name of an arbitrary sub-checker (the one that is registered first) as the debug name of the full checker family. Previously I have tried to avoid this solution because:

this muddies the distinction between the CheckerFamily and one of the owned CheckerFrontends (may be a bit confusing for those who are unfamiliar with the analyzer code);
with this solution the same CheckerFamily may get a different debug name if different sub-checkers are enabled from it (in that particular run).

However, if you think that these issues are acceptable, then I'm not opposed to using your solution because I agree that the __PRETTY_FUNCTION__ hack for using the class name is ugly.

I also thought about tweaking CheckerFamily::getDebugName to produce results like family-of-core.DivideZero for checker families (to highlight that it is connected to a checker family and not the particular checker frontend). What would you think about this?

steakhal · 2025-05-16T14:24:03Z

Thanks for the patches! I originally thought that renaming getTagDescription -> getDebugName would be a separate follow-up commit but I see that it's a surprisingly small change (that function was barely used) so I'm happy to include it here.

I see that your code the status quo by using the name of an arbitrary sub-checker (the one that is registered first) as the debug name of the full checker family. Previously I have tried to avoid this solution because:
* this muddies the distinction between the `CheckerFamily` and one of the owned `CheckerFrontend`s (may be a bit confusing for those who are unfamiliar with the analyzer code);

* with this solution the same `CheckerFamily` may get a different debug name if different sub-checkers are enabled from it (in that particular run).
However, if you think that these issues are acceptable, then I'm not opposed to using your solution because I agree that the __PRETTY_FUNCTION__ hack for using the class name is ugly.

I think the order is deterministic, and due to how checker dependencies are resolved, the backend checker would be always the one that is registered first. I have not checked this. Frankly, I would have preferred if the internal checker class name would be printed there instead of the user-facing name of the checker - but given that I despise defining this for every checker as a mandatory virtual function for checker families, I opted for this solution given my very limited time frame and that I shouldn't block the progress of this PR.

I also thought about tweaking CheckerFamily::getDebugName to produce results like family-of-core.DivideZero for checker families (to highlight that it is connected to a checker family and not the particular checker frontend). What would you think about this?

I firmly believe that as this is a debugging feature, the best name to print is the class that gets registered.

NagyDonat · 2025-05-16T15:16:27Z

I think the order is deterministic, and due to how checker dependencies are resolved, the backend checker would be always the one that is registered first. I have not checked this.

You're correct wrt the current state of the code but I'm planning to get rid of many "backend/modeling" checkers that have no real purpose 😅 -- in my opinion having a superfluous checker definition in Checkers.td is even worse than having a boilerplate getDebugName override.

There are a few modeling checkers that fulfill a valid role because they act as dependencies of checkers defined in other files/classes, but I want to eliminate the pattern where a checker class defines a modeling checker which is the dependency of the other "real" checkers from the class (to be registered first) and does nothing useful.

I opted for this solution given my very limited time frame and that I shouldn't block the progress of this PR.

Sorry for adding time pressure...

I firmly believe that as this is a debugging feature, the best name to print is the class that gets registered.

I agree, but unfortunately there is no way to get the class name automatically in a platform-independent way without a third-party dependency like the type_name boost library. (Do I understand it correctly that we cannot just drop in an external dependency like this?)

Using the name of one subchecker is probably the least bad available approach and your code is an elegant implementation for it, so I'll probably apply it to get this PR merged. If you're not opposed, I would like to add a family-of- prefix to the debug names, which is a bit awkward but eliminates the ambiguity and the potential for confusion (somebody who briefly looks at some debug output will immediately see that the name identifies the family).

steakhal · 2025-05-16T19:39:43Z

I agree, but unfortunately there is no way to get the class name automatically in a platform-independent way without a third-party dependency like the type_name boost library. (Do I understand it correctly that we cannot just drop in an external dependency like this?)

Using the name of one subchecker is probably the least bad available approach and your code is an elegant implementation for it, so I'll probably apply it to get this PR merged. If you're not opposed, I would like to add a family-of- prefix to the debug names, which is a bit awkward but eliminates the ambiguity and the potential for confusion (somebody who briefly looks at some debug output will immediately see that the name identifies the family).

I'd recommend you to look at the Checkers.inc file in the build folder somewhere - the file that gets generated from the Checkers.td. Notice that every checker has a registerXXX and shouldRegisterXXX function where XXX is the verbatim spelling of the checker class implementing it. We should just expose XXX just like we currently expose the similar CheckerManager::getCurrentCheckerName() (I haven't checked this, its just a strong suspicion). We could call this getCurrentCheckerClassName or something along these lines.
This should be 5-10 lines of change approximately.

Just have a look at Checkers.inc and you will see what I'm talking about.

And thanks again for sticking around. I can understand if this thread already got far longer than it should have taken.
I admit I failed to give enough attention to this thread, and now I'm pushing back because I believe the desired solution is right on the corner.

NagyDonat · 2025-05-17T10:04:13Z

I'd recommend you to look at the Checkers.inc file in the build folder somewhere - the file that gets generated from the Checkers.td. Notice that every checker has a registerXXX and shouldRegisterXXX function where XXX is the verbatim spelling of the checker class implementing it. We should just expose XXX just like we currently expose the similar CheckerManager::getCurrentCheckerName() (I haven't checked this, its just a strong suspicion). We could call this getCurrentCheckerClassName or something along these lines. This should be 5-10 lines of change approximately.

Just have a look at Checkers.inc and you will see what I'm talking about.

For a moment I was very happy to see this solution, but unfortunately XXX is not the name of the checker class (= backend = full family), but instead it is just yet another kind of "internal" name that is assigned to each individual checker (= frontend = one checker from the family).

You're right that for the simple single-part checkers this kind of name usually coincides with the checker class name, but it is still not a single name for the full checker family. For example consider the class DivZeroChecker which implements core.DivideZero and optin.taint.TaintedDiv -- this is introduced in Checkers.td as

let ParentPackage = Core in {
//...
def DivZeroChecker : Checker<"DivideZero">,
  HelpText<"Check for division by zero">,  
  Documentation<HasDocumentation>;      
//...
}
let ParentPackage = TaintOptIn in {
//...
 def TaintedDivChecker: Checker<"TaintedDiv">,                     
   HelpText<"Check for divisions where the denominator is tainted "
            "(attacker controlled) and might be 0.">,              
   Dependencies<[TaintPropagationChecker]>,                        
   Documentation<HasDocumentation>;
//...                           
}

In this particular case using "debug name = internal name of the first checker part" would work (and it is a core checker so we can say that it won't be disabled), but if we have a checker family where all the parts are non-core checkers, then this solution cannot assign a single stable name to the whole family.

And thanks again for sticking around. I can understand if this thread already got far longer than it should have taken. I admit I failed to give enough attention to this thread, and now I'm pushing back because I believe the desired solution is right on the corner.

Don't worry about this -- your feedback is valuable (and it's completely understandable that you also need to work on different things) and I agree with you that it is important to implement this framework well even if it takes a bit more time.

NagyDonat · 2025-05-17T15:00:26Z

Accommodating the modeling checkers in the checker family framework

I thought a bit about the place of the modeling checkers in the checker family framework (which is introduced in this PR). This question is a bit architectural, so in theory this post would "belong to" discourse, but I'm posting this here because I don't want to fragment the discussion.

As far as I see currently the modeling checkers can fulfill two purposes:

(1) If a checker class provides a modeling checker, it is a common dependency of the frontend/reporting checkers from the same family; so the registration function of the modeling checker can serve as the canonical place for constructing the singleton instance of the checker class.
- (This is no longer needed in the new checker family framework.)
(2) Some modeling checkers are also dependencies of "external" checkers (i.e. checkers that are not from the same family) and they provided a way to enable the backend without enabling any of the frontend/reporting checkers.
- (We will still need something like this in the new framework.)

Unfortunately the modeling checkers were introduced gradually in an ad hoc fashion, so this architecture has several drawbacks:

adding a modeling checker requires verbose boilerplate in Checkers.td and the checker source file;
there are checker families that don't have an associated modeling checker;
modeling checkers were introduced as "like a regular checker, but hidden", so they have vestigial remains of user-facing features (e.g. they have a user-friendly name, they can have checker options) that are no longer relevant for them.

"Default" plan

In my opinion this is the best and least disruptive way for handling the modeling checkers in the "checker family" framework:

Modeling checkers that are only relevant for purpose (1) are simply removed from the codebase. They are just implementation details of the old checker part registration process and the new framework makes them redundant.
The few modeling checkers that are also relevant for purpose (2) "survive" without serious changes: they become CheckerFrontends within their CheckerFamily and other checkers can specify them as dependencies (but they are hidden from the user).
The format of Checkers.td stays the same -- only specifies data about the checker frontends, it is not aware of checker families and the "from the same family" relationships.

This approach avoids disruptive changes and eliminates a significant amount of boilerplate code (all the superfluous checker families).

Alternative plan

We rename CheckerFrontend to ReportingChecker and CheckerBackend to ModelingChecker.
Extend the format of Checkers.td to support three kinds of checker entities:
- ModelingCheckers that have a debug name and can be activated as dependencies (and probably they can also have dependencies);
- ReportingCheckers that have a user-facing name, can be enabled or disabled by the user, can have checker options and depend on one or more ModelingCheckers;
- Checkers which are just a shortcut for introducing a ModelingChecker and a single ReportingChecker that depends on it.
Adding a new plain checker is mostly unchanged: it is implemented as a subclass of Checker<...> and it's specified in Checkers.td as a Checker object.
- In this architecture even these simple checkers have separate modeling and reporting "parts" (at least in theory), so they should support activating their modeling part (as a dependency) without enabling their reporting part -- but this could be implemented within the engine if we say that bug reports created by a disabled ReportingChecker are automatically suppressed (treated as invisible sink nodes).
Adding a new checker family requires a bit more boilerplate compared to the default plan: in addition to implementing it (as a subclass of CheckerFamily<...>) and describing its sub-checkers with ReportingChecker entries in Checkers.td, the developer also needs to create a ModelingChecker entry in Checkers.td (which specifies the debug name of the family and acts as a common dependency of the ReportingCheckers).
Limitation: this doesn't support having two distinct modeling checkers within the same implementation class.

This approach would be more disruptive (we'd need to change the parsing of Checkers.td and the way of handling dependencies) and and increase the amount of boilerplate (instead of the "each checker family needs a getDebugName method" requirement we'd get the requirement that "each checker family needs its own ModelingChecker entry in Checkers.td"), but would clarify the distinction between modeling and reporting checkers.

@steakhal @ anybody else What do you think about this question? How would you integrate modeling checkers into the checker family framework?

steakhal · 2025-05-18T12:00:51Z

For a moment I was very happy to see this solution, but unfortunately XXX is not the name of the checker class (= backend = full family), but instead it is just yet another kind of "internal" name that is assigned to each individual checker (= frontend = one checker from the family).

You're right that for the simple single-part checkers this kind of name usually coincides with the checker class name, but it is still not a single name for the full checker family. For example consider the class DivZeroChecker which implements core.DivideZero and optin.taint.TaintedDiv -- this is introduced in Checkers.td as
let ParentPackage = Core in {
//...
def DivZeroChecker : Checker<"DivideZero">,
  HelpText<"Check for division by zero">,  
  Documentation<HasDocumentation>;      
//...
}
let ParentPackage = TaintOptIn in {
//...
 def TaintedDivChecker: Checker<"TaintedDiv">,                     
   HelpText<"Check for divisions where the denominator is tainted "
            "(attacker controlled) and might be 0.">,              
   Dependencies<[TaintPropagationChecker]>,                        
   Documentation<HasDocumentation>;
//...                           
}
In this particular case using "debug name = internal name of the first checker part" would work (and it is a core checker so we can say that it won't be disabled), but if we have a checker family where all the parts are non-core checkers, then this solution cannot assign a single stable name to the whole family.

To me what is important with this debug name is to be able to easily connect it with an implementation file to look at.
Given that many times the checker name matches this debug name, and that is also spelled at least partially in the file name (even if this isn't quite true all the cases), its a much better choice than hard-coding something.
This is the reason why I wanted to avoid "magic" prefixing it with "family-of-XXX" because at that point we would lose the ability to directly grep for this exact string and expect a match. This is a weak argument though because at this point we can expect that devs would be able to find the right checker even with the family prefix. This is why I don't have hard feelings about having this or not having this prefix. I'm good either way.

This debug name should be the XXX part of a registerXXX fn generated for each checker (let it be modeling or reporting), and if it's a checker family, then the debug name of all the "parts" should be the name of the family. This would directly point to what implementation to look at no matter how the checker is surfaced to the users, under what aliases or what have we.
In what I proposed as a quick demonstration is the user facing checker name, so that's not quite aligned with what I'd want to ideally see.

Reflecting on your proposal:
First, let's align on why we currently have the concept of "modeling" and "reporting" checkers. I think you are probably already aware of this, but let's clarify this.

It's a great property if the exploded graph remains sort of the same no matter what checkers are enabled. It makes it easier to reason about it and when and why we have sink nodes, affecting what paths explored etc.
In the past (and also still today AFAIK in some cases) checkers checked some property and reported a fatal error node. So there was a sink node if the checker was enabled, but as soon as it was disabled, the sink node was missing, discovering different issues in other checkers down the line in the exploded graph.
To solve this, was the effort of splitting the checkers into modeling and reporting and emit the sink node regardless. This way only the user-facing diagnostics would be different, but the underlying egraph would remain the same regardless if we enable or disable a checker. (This statement isn't quite true, but you get the principle).

So the problem was that we had to hack together multiple "checker frontends" and conditionally report a fatal bug or a sink node on an error.
In theory, a "checker frontend" should only ever depend on a set of "checker backends" aka. "modeling checkers".
I wish we had some better way of doing this. I think we are now better than ever to achieve this that we tie together the BugType and a checker frontend that we can query.

I reread both outlines solutions and both seemed better than what we have today. Neither are for free, but the second resonated more with me, and was aligned with my expectations. This makes me question why we want to improve the ergonomics of defining and organizing checkers? Is this the most important pain point we want to focus on right now?

To me btw, this checker families and what we had before with raw bool flags are the same thing. It's just now a lot less ugly.
I'll be honest with you, that right now I can't see if the currently more appealing direction with option 2 would be really the better path to take (or something along that line). So I think it's a risky direction to have so many changes on the flight in the subject before consolidating a statusquo first with checker families. After that we may have a better view what's next, if we still believe that the ergonomics is the best place to invest.

NagyDonat · 2025-05-19T12:59:09Z

First, let's align on why we currently have the concept of "modeling" and "reporting" checkers. I think you are probably already aware of this, but let's clarify this.

It's a great property if the exploded graph remains sort of the same no matter what checkers are enabled. It makes it easier to reason about it and when and why we have sink nodes, affecting what paths explored etc.
In the past (and also still today AFAIK in some cases) checkers checked some property and reported a fatal error node. So there was a sink node if the checker was enabled, but as soon as it was disabled, the sink node was missing, discovering different issues in other checkers down the line in the exploded graph.
To solve this, was the effort of splitting the checkers into modeling and reporting and emit the sink node regardless. This way only the user-facing diagnostics would be different, but the underlying egraph would remain the same regardless if we enable or disable a checker. (This statement isn't quite true, but you get the principle).

I'm aware of this goal that a checker should emit sinks even if the reports are suppressed, and I agree that this would be useful as the effect of disabling a "well-behaving", trustworthy checker. However, I would guess that checkers are usually disabled when they mismodel the analyzed code and produce crazy garbage (or e.g. it's an optin checker checking that enforces a design rule which is intentionally not followed), and in those situations I think the user needs a tool that is more drastic than just replacing the reports with sinks.

So the problem was that we had to hack together multiple "checker frontends" and conditionally report a fatal bug or a sink node on an error.
In theory, a "checker frontend" should only ever depend on a set of "checker backends" aka. "modeling checkers".
I wish we had some better way of doing this.

I agree and I hope that the checker family framework will provide a way towards implementing this.

I think we are now better than ever to achieve this that we tie together the BugType and a checker frontend that we can query.

Actually I introduced the class CheckerFrontendWithBugType only as an ad-hoc, invisible helper that I intend to use to write slightly shorter code in situations where it applies. In more complex checker families where a single frontend can have multiple bug types, I intend to define multiple BugTypes that each take the same single CheckerFrontend as their constructor argument. (But if the transition to the new framework is complete, we can eliminate the BugType constructor that takes a CheckerName, so each BugType will hold a pointer to a CheckerFrontend.)

This makes me question why we want to improve the ergonomics of defining and organizing checkers? Is this the most important pain point we want to focus on right now?

No, this is not the most important pain point -- this was just annoying me personally, so I tried to fix it quickly and got carried away... I wouldn't say that this is completely useless (we will be able to build new functionality onto the new framework in a way that wouldn't be possible with the old swamp of hacks), but this isn't an urgent project.

To me btw, this checker families and what we had before with raw bool flags are the same thing. It's just now a lot less ugly.

I agree, this is after all a non-functional change -- I'm proposing it because I got fed up with ugly code.

I'll be honest with you, that right now I can't see if the currently more appealing direction with option 2 would be really the better path to take (or something along that line). So I think it's a risky direction to have so many changes on the flight in the subject before consolidating a statusquo first with checker families. After that we may have a better view what's next, if we still believe that the ergonomics is the best place to invest.

I agree here as well. When I started to describe the "Default" and "Alternative" plans I initially felt that we need to pick between them right now; but as I thought about it more, it seems that they are not opposed and in fact the best path for eventually implementing the "Alternative" plan (which is indeed more appealing intuitively) would be first implementing the "Default" plan (which cleans up lots of hacks, but leaves the modeling checkers unchanged) and then later introducing the "Alternative" plan if we feel that we need it.

Based on this I'll quickly incorporate your patches and the "use the internal classname-like checker name" logic onto the branch, and then I think we can merge it as a good enough implementation of the Default plan framework.

This fragment is often the same as the class name.

NagyDonat · 2025-05-19T16:14:26Z

In commit 13f4a30 I implemented your suggestion that the debug name of the checker name should be derived from the name fragment of the registration functions (which is called CLASS but may be different from the checker class name), but this change had to touch yet another technically indebted area – the interface provided for registering checkers from analyzer plugins – and became surprisingly ugly.

I pushed this commit mostly to show its convoluted nature, I would strongly prefer rejecting it, because it does not solve the fundamental issue (namely, that the debug name of the checker family is something "taken from" one subchecker and will change depending on which subcheckers are registered) and introduces several additional complications:

a few already complex methods become even more complicated because they need to forward yet another argument;
checkers that are registered from plugins won't have this kind of debug name;
this way the debug name of the checker will look like the class name, but can be something else, which is IMO worse than something that is consistently different.

If I sort the possible implementations of CheckerFamily::getDebugName in order of increasing ugliness, then I get something like:

(least ugly) leave it pure virtual and override it in each of the two dozen checker families.
- Provides a nice proper identifier (the class name).
- Boilerplate is boring, but not as bad as hacky or convoluted code.
- There is ample precedent for similar amounts of boilerplate (e.g. the checker registration functions) and this would be much less boilerplate than the old way of introducing multipart checkers (with all the mutable std::unique_ptr<BugType>s).
- Would just add 20-30 oneline functions, which isn't significantly more than a "general" solution.
- Can be easily replaced by a different solution later if we find a good one (e.g. after reforming the structure of Checkers.td).
Use the class name in assertion-enabled builds on GCC, LLVM (from __PRETTY_FUNCTION__) and MSVC (from __FUNCSIG__), and default to a filler ("UnknownCheckerFamily") on production builds and unknown compilers.
- Provides a nice proper identifier (the class name).
- Requires hacky magical code, but can be hidden away in a separate header and won't complicate other stuff.
- Works out of the box for practically all developers of the static analyzer.
- Won't break the build of LLVM/clang elsewhere.
Use the CLASS field of the checker specifications in Checkers.td, but ensure that it is always identical to the class name (and the system can handle multiple checkers with the same CLASS).
- Provides a nice proper identifier (the class name).
- Requires updating the information in Checkers.td.
- Requires touching the fragile "register checkers from analyzer plugins" logic (as in 13f4a30)
- Currently the naming scheme of the checker registration function relies on the fact that the CLASS field is a unique identifier that distinguishes multiple checkers from the same class -- we would need to devise a different solution for this.
Use "family-of-" + name of first registered subchecker, i.e. family-of-core.DivideZero.
- Ugly, but not misleading identifiers: it is clear that this identifies a family, and the checker name is searchable.
- Easy to implement.
- "Identifier" is not stable: depends on the set of currently registered subcheckers (but at least its semantical meaning stays the same).
- Format may be bikeshedded: I'm also ok with family:core.DivideZero or something similar.
Use name of first registered subchecker, i.e. plain core.DivideZero.
- Misleading, developers would naturally assume that it "obviously" refers to the single checker part.
- Very easy to implement, shortest code that actually provides an identifier.
- "Identifier" is not stable: depends on the set of currently registered subcheckers.
Use the CLASS field of the checker specifications in Checkers.td (as currently implemented by 13f4a30)
- Very misleading, looks like a nice proper identifier (the class name) but may be different in rare cases.
- Requires touching the fragile "register checkers from analyzer plugins" logic.
- "Identifier" is not stable: depends on the set of currently registered subcheckers.

@steakhal What do you think about this list?

steakhal

Thank you for the detailed answer to my previous reply.
I think we are completely aligned on that front.

Thank you for accepting my patches, and also for implementing exactly what I had in mind for the debug names.
It's awesome. I don't think it's too complicated or ugly TBH.
I agree it's not nice, but we can either merge this or discuss how else we could do something even better.

I'm happy as this PR looks right now, except for having that backward compatibility overload for plugins.
We can just move on with life and let them migrate once the next clang is out. To me, the upgrade path looks straight forward and this isn't the only API they will need to adjust anyway. They should brace for impact.

Now, to the alternative options you layed out.
(Thanks for thinking about this btw.)
(PS: Sorry about my seemingly unstructured response here to the list. To me, these options are not orthogonal and it makes it difficult to reflect.)

I'm fine with option 6, aka. just merge this PR as-is.

Use the CLASS field of the checker specifications in Checkers.td (as currently implemented by 13f4a30)
Very misleading, looks like a nice proper identifier (the class name) but may be different in rare cases.
Requires touching the fragile "register checkers from analyzer plugins" logic.
"Identifier" is not stable: depends on the set of currently registered subcheckers.

Could you give an example when it's misleading and not refers to a class name? I figured the tablegen construct we have ensures this.
We should be fine touching the plugin registration. This shouldn't be disruptive.
WDYM by that this identifier is not stable? We could have a tablegen property like ImplementedBy<"MyChecker"> for the few checker families we currently have, and say that by default they are implemented by CLASS.

I reject options 1 and 2, as we already discussed.

I agree with your statement in option 3 about the identifier of a class. And I think I already layed out something like this part of the previous section in this reply.

I'm fine with option 4, but I want to avoid this if possible.

I reject option 5, as I find it a buggy implementation of option 6 (or option 4).

steakhal · 2025-05-19T17:18:50Z

clang/include/clang/StaticAnalyzer/Frontend/CheckerRegistry.h

+  /// Adds a checker to the registry. This overload doesn't take a `DebugName`
+  /// (which usually looks like `DivZeroChecker`), so it uses the user-facing
+  /// `FullName` (which usually looks like ``core.DivideZero`) as a debug name.
+  /// THIS IS DEPRECATED and is only provided to preserve compatibility with
+  /// legacy plugins.
+  /// TODO: Eventually remove this from the codebase.


We can break compatibility. In fact, a clang plugin is only compatible with the same major version of clang already.
There is no such thing in Clang as API or ABI compatibility across major releases.
I'd recommend dropping this compatibility overload.

Ok, I will remove this method variant from the codebase (I'll need to update a unit test).

Unfortunately this "no DebugName, we need to pass the FullName issue" also affects the templated variant of the method which is below this one, and that one is used by a dozen unit tests, so modifying it would be bothersome (I would like to skip updating it, but even if I do it perhaps it should be in a separate commit).

Could you point me to some examples? That way we could use collective intelligence and maybe I could figure out a way forward.

It would be straightforward to update the templated variant of addChecker by adding an extra DebugName parameter and updating all the code that calls it – but it is called from 20+ locations, so I felt that it would be bothersome to update them all:

clang/include/clang/StaticAnalyzer/Frontend/CheckerRegistry.h |141 Col 8| void addChecker(StringRef FullName, StringRef Desc, StringRef DocsUri, clang/lib/Analysis/plugins/CheckerDependencyHandling/CheckerDependencyHandling.cpp |20 Col 12| registry.addChecker<Dependency>("example.Dependency", "", ""); clang/lib/Analysis/plugins/CheckerDependencyHandling/CheckerDependencyHandling.cpp |21 Col 12| registry.addChecker<DependendentChecker>("example.DependendentChecker", "", clang/lib/Analysis/plugins/SampleAnalyzer/MainCallChecker.cpp |48 Col 12| registry.addChecker<MainCallChecker>( clang/unittests/StaticAnalyzer/BugReportInterestingnessTest.cpp |122 Col 16| Registry.addChecker<InterestingnessTestChecker>("test.Interestingness", clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp |623 Col 14| Registry.addChecker<CallDescChecker>("test.CallDescChecker", "Description", clang/unittests/StaticAnalyzer/CallEventTest.cpp |58 Col 14| Registry.addChecker<CXXDeallocatorChecker>("test.CXXDeallocator", clang/unittests/StaticAnalyzer/ExprEngineVisitTest.cpp |80 Col 14| Registry.addChecker<ExprEngineVisitPreChecker>("ExprEngineVisitPreChecker", clang/unittests/StaticAnalyzer/ExprEngineVisitTest.cpp |89 Col 14| Registry.addChecker<ExprEngineVisitPostChecker>( clang/unittests/StaticAnalyzer/ExprEngineVisitTest.cpp |98 Col 14| Registry.addChecker<MemAccessChecker>("MemAccessChecker", "Desc", clang/unittests/StaticAnalyzer/ObjcBug-124477.cpp |39 Col 14| Registry.addChecker<FlipFlagOnCheckLocation>("test.FlipFlagOnCheckLocation", clang/unittests/StaticAnalyzer/RegisterCustomCheckersTest.cpp |47 Col 14| Registry.addChecker<CustomChecker>("test.CustomChecker", "Description", ""); clang/unittests/StaticAnalyzer/RegisterCustomCheckersTest.cpp |76 Col 14| Registry.addChecker<CustomChecker>("test.LocIncDecChecker", "Description", clang/unittests/StaticAnalyzer/SValSimplifyerTest.cpp |71 Col 14| Registry.addChecker<SimplifyChecker>("SimplifyChecker", "EmptyDescription", clang/unittests/StaticAnalyzer/SValTest.cpp |160 Col 1| SVAL_TEST(GetConstType, R"( clang/unittests/StaticAnalyzer/SValTest.cpp |180 Col 1| SVAL_TEST(GetLocAsIntType, R"( clang/unittests/StaticAnalyzer/SValTest.cpp |202 Col 1| SVAL_TEST(GetSymExprType, R"( clang/unittests/StaticAnalyzer/SValTest.cpp |224 Col 1| SVAL_TEST(GetPointerType, R"( clang/unittests/StaticAnalyzer/SValTest.cpp |278 Col 1| SVAL_TEST(GetCompoundType, R"( clang/unittests/StaticAnalyzer/SValTest.cpp |330 Col 1| SVAL_TEST(GetStringType, R"( clang/unittests/StaticAnalyzer/SValTest.cpp |342 Col 1| SVAL_TEST(GetThisType, R"( clang/unittests/StaticAnalyzer/SValTest.cpp |359 Col 1| SVAL_TEST(GetFunctionPtrType, R"( clang/unittests/StaticAnalyzer/SValTest.cpp |372 Col 1| SVAL_TEST(GetLabelType, R"( clang/unittests/StaticAnalyzer/TestReturnValueUnderConstruction.cpp |52 Col 16| Registry.addChecker<TestReturnValueUnderConstructionChecker>(

I see. Maybe we could expose a macro to fill the debug name with the stringified token of the class?
I'm not a fan of macros but this could be a good use of them if it makes it more ergonomic.
And btw, could we assert that when adding a checker the debug names are unique?

I see. Maybe we could expose a macro to fill the debug name with the stringified token of the class?
I'm not a fan of macros but this could be a good use of them if it makes it more ergonomic.

Most of these are mock checkers where the debug name is completely irrelevant, they won't be used for actual analysis. In fact I created a method called addMockChecker(StringRef FullName) which is sufficient for these simple situations.

And btw, could we assert that when adding a checker the debug names are unique?

Perhaps, although providing unique debug names for the mock checkers in the unit tests is pointless. Overall, the whole "debug name" concept is an extremely unimportant feature that is only used during manual investigation of bugs, so IMO it doesn't deserve complex assertions and we should stop making it the centerpiece of our architecture that demands reorganizations and constrains otherwise good plans.

I see. Maybe we could expose a macro to fill the debug name with the stringified token of the class?
I'm not a fan of macros but this could be a good use of them if it makes it more ergonomic.

Most of these are mock checkers where the debug name is completely irrelevant, they won't be used for actual analysis. In fact I created a method called addMockChecker(StringRef FullName) which is sufficient for these simple situations.

And btw, could we assert that when adding a checker the debug names are unique?

Perhaps, although providing unique debug names for the mock checkers in the unit tests is pointless. Overall, the whole "debug name" concept is an extremely unimportant feature that is only used during manual investigation of bugs, so IMO it doesn't deserve complex assertions and we should stop making it the centerpiece of our architecture that demands reorganizations and constrains otherwise good plans.

Makes sense. Lets not enforce this.

NagyDonat · 2025-05-20T16:48:39Z

I'm happy as this PR looks right now, except for having that backward compatibility overload for plugins.
We can just move on with life and let them migrate once the next clang is out. To me, the upgrade path looks straightforward and this isn't the only API they will need to adjust anyway. They should brace for impact.

I first created commit 84eb5af which eliminated the compatibility overloads and introduced addMockChecker to give a name to a pattern that appeared many times. However, as updated the doc comments, I realized that the fully general addChecker is needlessly verbose and hostile for plugin authors, so I created commit 04bfbf1 which introduces other overloads that are more suitable for plugins.

Unfortunately these changes add ±100 changed lines, which makes this commit even larger -- but at least I got to fix some minor problems (e.g. obsolete docs and one case of the mutable std::unique_ptr<BugType> disease) in the plugin test code.

Could you give an example when it's misleading and not refers to a class name? I figured the tablegen construct we have ensures this.

The second argument of the CHECKER macro is customarily called CLASS, but in fact it is a unique identifier of the particular checker fronted which is only used in the macro magic that assigns unique names to the registration methods. In each checker family there is at most one checker whose CLASS matches the actual class name, because otherwise it would be impossible to define separate registration methods for them. (I also looked at the tablegen code in clang/utils/TableGen/ClangSACheckersEmitter.cpp and nothing tries to check that it is a valid class name.) As a concrete example, the Checkers.inc lines for the divzero checkers look like

CHECKER("core.DivideZero", DivZeroChecker, "Check for division by zero", "https://...", false)
CHECKER("optin.taint.TaintedDiv", TaintedDivChecker, "Check for divisions ...", "https://...", false)

and you can see that TaintedDivChecker is not a real class name.

We should be fine touching the plugin registration. This shouldn't be disruptive.

Ok, noted -- I wasn't familiar with the plugin ecosystem.

WDYM by that this identifier is not stable?

Under this plan 6, the "identifier" = debug name of the checker family is the CLASS field of the sub-checker (=CheckerFrontend) which is enabled first (on that particular run).

As the CLASSes of the subcheckers are necessarily all different, enabling different sub-checkers always changes the debug name unless dependency relationships guarantee that the first subchecker is always enabled. However, in this "first subchecker is a common dependency of all other subcheckers" architecture the first subchecker does nothing, so it's just a very complex form boilerplate.

We could have a tablegen property like ImplementedBy<"MyChecker"> for the few checker families we currently have, and say that by default they are implemented by CLASS.

As I described above, we would need these ImplementedBy<"MyChecker"> markers in all checker families for almost all subcheckers (because only one subchecker can own the real CLASS name) -- which would be more boilerplate than just declaring one StringRef getDebugName() override {return "SomeCheckerFamily";} per checker family.

Based on these, I'm still pleading you to reconsider your rejection of option 1, because it is still the most clear solution to ensure that getDebugName() consistently returns the class name.

If the class name of the CheckerFamily is not stringified automatically via option 2 (which you also reject), then it must be written down somewhere. Directly placing it at its single point of use (getDebugName()) is much simpler than hiding it away in the highly magical Checkers.td definitions and then threading yet another "move this value from here to there" sphagetti into the already tangled checker registration code. For a novice checker developer who is familiar with C++, the task "provide a trivial implementation for this virtual method" is much easier than "put yet another field into this project-specific magical metadata file"!

Moreover, placing the class name into the current CheckerFrontend-centric Checkers.td (in a stable way that doesn't depend on the set of subcheckers enabled by the user) is not an easy task:

If each checker family has a "modeling checker" as its first subchecker which is a common dependency of all the "real" checkers, then the full modeling checker (two registration method + Checker<> specification in Checkers.td + dependency relationships) is boilerplate – lots of boilerplate – which could be avoided in all other implementations.
If Checker<> specification in Checker.td includes an ImplementedBy<"MyChecker"> to correct situations where CLASS differs from the real class name, then each checker family requires $(N-1)$ of these ImplementedBy tags where $N$ is the number of subcheckers. If $N > 2$, this is more verbose than defining just one method override.
If we update Checkers.td to ensure that the CLASS field always contains the real class name, then we need to add an extra field which can be used to distinguish the names of the register methods. Adding one RegisterNameTag<"Foo"> to each checker part (possibly excluding one) is more verbose than defining just one method override for the whole family.

Another reason for supporting plan 1 is that it is a good "placeholder" that leaves space for later improvements that could provide an actually good way of injecting the checker class name. For example in the future we could move towards my "alternative" plan, where I suggested that we should

Extend the format of Checkers.td to support three kinds of checker entities:

ModelingCheckers that have a debug name and can be activated as dependencies (and probably they can also have dependencies);

ReportingCheckers that have a user-facing name, can be enabled or disabled by the user, can have checker options and depend on one or more ModelingCheckers;

Checkers which are just a shortcut for introducing a ModelingChecker and a single ReportingChecker that depends on it.

In this system each checker family would correspond to a single lightweight ModelingChecker entry in Checkers.td from which we could easily query the class name.

NagyDonat · 2025-05-20T22:41:49Z

I have yet another implementation idea where:

boilerplate is either minimal or nonexistent;
there is no need to pass an additional string value (the debug name) through the checker registry code;
the debug name is guaranteed to be the same as the class name.

The minimal-boilerplate version looks like

template <typename T>
StringRef getCheckerFamilyName();

#define REGISTER_CHECKER_FAMILY(NAME) template <> StringRef getCheckerFamilyName<NAME>() { return #NAME; }

// in the body of CheckerFamily
  template <typename CHECKER>
  static void _register(CHECKER *Chk, CheckerManager &Mgr) {
    Chk->CheckerBackendName = getCheckerFamilyName<CHECKER>();
    (CHECKs::_register(Chk, Mgr), ...);
  }

// in each file that implements a checker family
REGISTER_CHECKER_FAMILY(MyBeautifulCheckerFamily)

If we assume that each checker family class is mentioned in Checkers.td, then we can replace REGISTER_CHECKER_FAMILY with a header that says

#define GET_CHECKERS
#define CHECKER(FULLNAME, CLASS, HELPTEXT, DOC_URI, IS_HIDDEN) template <> StringRef getCheckerFamilyName<typename CLASS>() { return #CLASS; }
#include "Checkers.inc"
#undef CHECKER
#undef GET_CHECKERS

... however this is a hack, and I would prefer explicit REGISTER_CHECKER_FAMILY macro calls – either in the checker files or in a separate header.

(Disclaimer: I wrote this at home without testing, so the code may need tweaks before it can be compiled.)

@steakhal WDYT about this one?

steakhal · 2025-05-23T14:08:10Z

I don't think I have time for this. I'm really sorry. I read the first couple paragraphs then gave up. I'm really short of time.
The two commits made sense. And resolved most if not all of my major concerns.

What would it take to merge this ASAP?

NagyDonat · 2025-05-23T14:32:11Z

I don't think I have time for this. I'm really sorry. I read the first couple paragraphs then gave up. I'm really short of time.
The two commits made sense. And resolved most if not all of my major concerns.

I see, and I understand that this review process became much longer than the ideal. Unfortunately, I have major concerns with this PR and would not like to merge it as is. Based on your comments I suspect that you are misunderstanding the situation (especially about the contents of Checkers.td), and that's what I'm trying to clarify in my long comment #139256 (comment) .

What would it take to merge this ASAP?

Just please let me return to the trivial solution where each CheckerFamily directly defines its StringRef getDebugName() override { return "NameOfTheClass"; }. I'm strongly convinced that this is the most elegant approach (as I explained this in my long comment #139256 (comment) ).

I would not be opposed to eliminate this one last piece of boilerplate in follow-up commits if there is a clear solution but the current state of this PR is NOT a clear solution.

steakhal · 2025-05-23T14:38:19Z

What would it take to merge this ASAP?

Just please let me return to the trivial solution where each CheckerFamily directly defines its StringRef getDebugName() override { return ClassName; }. I'm strongly convinced that this is the most elegant approach (as I explained this in my long comment #139256 (comment) ).

Fine, lets do that. I hope we can untangle this in the future.

NagyDonat · 2025-05-23T15:14:41Z

Thanks for your understanding – and sorry for stretching this review process for so long.

I will push and merge the reduced commit on Monday (to be able to follow up its effect in the CI). I will probably also create a spinoff PR that contains some minor NFC code quality improvements (introducing addMockChecker, adding comments) for the CodeRegistry areas.

This commit reverts the last 6 commits on this PR, restoring the revision where each class derived from `CheckerFamily` needs to override `getTagDescription()` with a trivial boilerplate implementation that returns the name of the class as a string. Obviously, boilerplate is not a good thing, but unfortunately there is no portable way to stringify the name of a type (e.g. class), so any portable implementation of `getTagDescription` would need to take the name of the class from *somewhere* where it's present as a string -- and then directly placing it in a method override is much simpler than loading it from `Checkers.td`. Note that the existing `CLASS` field in `Checkers.td` is not suitable for our goals, because instead of storing the same class name for each sub-checker, in fact each sub-checker needs to have its own unique value there (to ensure that the names of the register methods are all unique). For a more detailed explanation see my review comment llvm#139256 (comment) I'm completely open to reopening discussion about a boilerplate-free solution for `CheckerFamily::getTagDescription` in the future, but even if we find a good solution, it should be a separate commit. This commit reverts the commit llvm@7da3862 which renames `getTagDescription()` to `getDebugName()` because although I really like that change (by Balázs Benics), I think it would be better as a separate commit because it's only tangentially related to this one. This commit also reverts some comment clarifications and minor code quality improvements in the areas related to `CheckerRegistry`. I'll try to merge those later as a spinoff NFC commit.

github-actions · 2025-05-26T14:54:00Z

✅ With the latest revision this PR passed the C/C++ code formatter.

NagyDonat · 2025-05-26T14:59:07Z

I reverted this PR to an earlier stage and added a more verbose explanation for the requirement that classes derived from CheckerFamily need to override getTagDescription.

@steakhal In addition to the other changes, I also reverted commit 7da3862 where you renamed getTagDescription to getDebugName, because although I completely agree with that change, I think it would be nicer to merge it as a separate commit (because e.g. this relatively large commit would be easier to understand without the name change). Do you want to create the PR for it (after this is merged) or should I create it?

steakhal · 2025-05-26T15:05:47Z

@steakhal In addition to the other changes, I also reverted commit 7da3862 where you renamed getTagDescription to getDebugName, because although I completely agree with that change, I think it would be nicer to merge it as a separate commit (because e.g. this relatively large commit would be easier to understand without the name change). Do you want to create the PR for it (after this is merged) or should I create it?

I don't want to work on this, nor to propose a PR in this area.

The checker classes (i.e. classes derived from `CheckerBase` via the utility template `Checker<...>`) act as intermediates between the user and the analyzer engine, so they have two interfaces: - On the frontend side, they have a public name, can be enabled or disabled, can accept checker options and can be reported as the source of bug reports. - On the backend side, they can handle various checker callbacks and they "leave a mark" on the `ExplodedNode`s that are created by them. (These `ProgramPointTag` marks are internal: they appear in debug logs and can be queried by checker logic; but the user doesn't see them.) In a significant majority of the checkers there is 1:1 correspondence between these sides, but there are also many checker classes where several related user-facing checkers share the same backend class. Historically each of these "multi-part checker" classes had its own hacks to juggle its multiple names, which led to lots of ugliness like lazy initialization of `mutable std::unique_ptr<BugType>` members and redundant data members (when a checker used its custom `CheckNames` array and ignored the inherited single `Name`). My recent commit 2709998 tried to unify and standardize these existing solutions to get rid of some of the technical debt, but it still used enum values to identify the checker parts within a "multi-part" checker class, which led to some ugliness. This commit introduces a new framework which takes a more direct, object-oriented approach: instead of identifying checker parts with `{parent checker object, index of part}` pairs, the parts of a multi-part checker become stand-alone objects that store their own name (and enabled/disabled status) as a data member. This is implemented by separating the functionality of `CheckerBase` into two new classes: `CheckerFrontend` and `CheckerBackend`. The name `CheckerBase` is kept (as a class derived from both `CheckerFrontend` and `CheckerBackend`), so "simple" checkers that use `CheckerBase` and `Checker<...>` continues to work without changes. However we also get first-class support for the "many frontends - one backend" situation: - The class `CheckerFamily<...>` works exactly like `Checker<...>` but inherits from `CheckerBackend` instead of `CheckerBase`, so it won't have a superfluous single `Name` member. - Classes deriving from `CheckerFamily` can freely own multiple `CheckerFrontend` data members, which are enabled within the registration methods corresponding to their name and can be used to initialize the `BugType`s that they can emit. In this scheme each `CheckerFamily` needs to override the pure virtual method `ProgramPointTag::getTagDescription()` which returns a string which represents that class for debugging purposes. (Previously this used the name of one arbitrary sub-checker, which was passable for debugging purposes, but not too elegant.) I'm planning to implement follow-up commits that convert all the "multi-part" checkers to this `CheckerFamily` framework.

NagyDonat requested review from Xazax-hun, haoNoQ, gamesh411 and steakhal May 9, 2025 12:51

llvmbot added clang Clang issues not falling into any other category clang:static analyzer clang:analysis labels May 9, 2025

Fix abbreviation of 'CheckerFrontend'

23d128c

NagyDonat commented May 9, 2025

View reviewed changes

clang/include/clang/StaticAnalyzer/Core/Checker.h Show resolved Hide resolved

steakhal reviewed May 13, 2025

View reviewed changes

NagyDonat added 5 commits May 13, 2025 15:02

Use class name as tag description of checker family

8459b2f

Make CheckerFrontendWithBugType a subclass of BugType

76950c1

Rename CheckerFrontend *CF to Checker

1cbbc02

Declare data members in separate declarations

c7a96f2

Use fold expression over comma

4e8035f

steakhal and others added 4 commits May 19, 2025 15:21

NFC Rename getTagDescription to getDebugName

7da3862

HACK: Smuggle in the checker name into the CheckerFamily class

4becdeb

Use registration function name fragment as checker family debug name

13f4a30

This fragment is often the same as the class name.

s/FamilyChecker/CheckerFamily/ in comment

ffe6ac4

steakhal reviewed May 19, 2025

View reviewed changes

NagyDonat added 2 commits May 20, 2025 16:09

Refactor use of CheckerRegistry::addChecker()

84eb5af

Provide a simplified interface for plugins defining checkers

04bfbf1

NagyDonat mentioned this pull request May 26, 2025

[analyzer] Introduce the check::BlockEntrance checker callback #140924

Merged

Update the doc-comment of CheckerFamily

84192a7

NagyDonat requested a review from steakhal May 26, 2025 14:53

Satisfy git-clang-format

9c01aab

steakhal approved these changes May 26, 2025

View reviewed changes

NagyDonat merged commit 6833076 into llvm:main May 26, 2025
11 checks passed

[analyzer][NFC] Introduce framework for checker families #139256

[analyzer][NFC] Introduce framework for checker families #139256

Uh oh!

Conversation

NagyDonat commented May 9, 2025

Uh oh!

llvmbot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

steakhal left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

NagyDonat commented May 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steakhal commented May 13, 2025

Uh oh!

NagyDonat commented May 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steakhal commented May 15, 2025

Uh oh!

NagyDonat commented May 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steakhal commented May 15, 2025

Uh oh!

NagyDonat commented May 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steakhal commented May 16, 2025

Uh oh!

NagyDonat commented May 16, 2025

Uh oh!

steakhal commented May 16, 2025

Uh oh!

NagyDonat commented May 17, 2025

Uh oh!

NagyDonat commented May 17, 2025

Accommodating the modeling checkers in the checker family framework

"Default" plan

Alternative plan

Uh oh!

steakhal commented May 18, 2025

Uh oh!

NagyDonat commented May 19, 2025

Uh oh!

NagyDonat commented May 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steakhal left a comment

Choose a reason for hiding this comment

Uh oh!

steakhal May 19, 2025

Choose a reason for hiding this comment

Uh oh!

NagyDonat May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steakhal May 20, 2025

Choose a reason for hiding this comment

Uh oh!

NagyDonat May 20, 2025

Choose a reason for hiding this comment

Uh oh!

steakhal May 20, 2025

Choose a reason for hiding this comment

Uh oh!

NagyDonat May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

llvmbot commented May 9, 2025 •

edited

Loading

NagyDonat commented May 13, 2025 •

edited

Loading

NagyDonat commented May 15, 2025 •

edited

Loading

NagyDonat commented May 15, 2025 •

edited

Loading

NagyDonat commented May 16, 2025 •

edited

Loading

NagyDonat commented May 19, 2025 •

edited

Loading

NagyDonat May 20, 2025 •

edited

Loading

NagyDonat May 20, 2025 •

edited

Loading

NagyDonat commented May 20, 2025 •

edited

Loading

NagyDonat commented May 20, 2025 •

edited

Loading

NagyDonat commented May 23, 2025 •

edited

Loading

github-actions bot commented May 26, 2025 •

edited

Loading

NagyDonat commented May 26, 2025 •

edited

Loading