Skip to content

[analyzer][NFC] Introduce framework for checker families #139256

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

NagyDonat
Copy link
Contributor

The checker classes (i.e. classes derived from CheckerBase via the utility template Checker<...>) act as intermediates between the user and the analyzer engine, so they have two interfaces:

  • On the frontend side, they have a public name, can be enabled or disabled, can accept checker options and can be reported as the source of bug reports.
  • On the backend side, they can handle various checker callbacks and they "leave a mark" on the ExplodedNodes that are created by them. (These ProgramPointTag marks are internal: they appear in debug logs and can be queried by checker logic; but the user doesn't see them.)

In a significant majority of the checkers there is 1:1 correspondence between these sides, but there are also many checker classes where several related user-facing checkers share the same backend class. Historically each of these "multi-part checker" classes had its own hacks to juggle its multiple names, which led to lots of ugliness like lazy initialization of mutable std::unique_ptr<BugType> members and redundant data members (when a checker used its custom CheckNames array and ignored the inherited single Name).

My recent commit 2709998 tried to unify and standardize these existing solutions to get rid of some of the technical debt, but it still used enum values to identify the checker parts within a "multi-part" checker class, which led to some ugliness.

This commit introduces a new framework which takes a more direct, object-oriented approach: instead of identifying checker parts with {parent checker object, index of part} pairs, the parts of a multi-part checker become stand-alone objects that store their own name (and enabled/disabled status) as a data member.

This is implemented by separating the functionality of CheckerBase into two new classes: CheckerFrontend and CheckerBackend. The name CheckerBase is kept (as a class derived from both CheckerFrontend and CheckerBackend), so "simple" checkers that use CheckerBase and Checker<...> continues to work without changes. However we also get first-class support for the "many frontends - one backend" situation:

  • The class CheckerFamily<...> works exactly like Checker<...> but inherits from CheckerBackend instead of CheckerBase, so it won't have a superfluous single Name member.
  • Classes deriving from CheckerFamily can freely own multiple CheckerFrontend data members, which are enabled within the registration methods corresponding to their name and can be used to initialize the BugTypes that they can emit.

In this scheme each CheckerFamily needs to override the pure virtual method ProgramPointTag::getTagDescription() which returns a string which represents that class for debugging purposes. (Previously this used the name of one arbitrary sub-checker, which was passable for debugging purposes, but not too elegant.)

I'm planning to implement follow-up commits that convert all the "multi-part" checkers to this CheckerFamily framework.

The checker classes (i.e. classes derived from `CheckerBase` via the
utility template `Checker<...>`) act as intermediates between the user
and the analyzer engine, so they have two interfaces:
- On the frontend side, they have a public name, can be enabled or
  disabled, can accept checker options and can be reported as the source
  of bug reports.
- On the backend side, they can handle various checker callbacks and
  they "leave a mark" on the `ExplodedNode`s that are created by them.
  (These `ProgramPointTag` marks are internal: they appear in debug logs
  and can be queried by checker logic; but the user doesn't see them.)

In a significant majority of the checkers there is 1:1 correspondence
between these sides, but there are also many checker classes where
several related user-facing checkers share the same backend class.
Historically each of these "multi-part checker" classes had its own
hacks to juggle its multiple names, which led to lots of ugliness like
lazy initialization of `mutable std::unique_ptr<BugType>` members and
redundant data members (when a checker used its custom `CheckNames`
array and ignored the inherited single `Name`).

My recent commit 2709998 tried to
unify and standardize these existing solutions to get rid of some of the
technical debt, but it still used enum values to identify the checker
parts within a "multi-part" checker class, which led to some ugliness.

This commit introduces a new framework which takes a more direct,
object-oriented approach: instead of identifying checker parts with
`{parent checker object, index of part}` pairs, the parts of a
multi-part checker become stand-alone objects that store their own name
(and enabled/disabled status) as a data member.

This is implemented by separating the functionality of `CheckerBase`
into two new classes: `CheckerFrontend` and `CheckerBackend`. The name
`CheckerBase` is kept (as a class derived from both `CheckerFrontend`
and `CheckerBackend`), so "simple" checkers that use `CheckerBase` and
`Checker<...>` continues to work without changes. However we also get
first-class support for the "many frontends - one backend" situation:
- The class `CheckerFamily<...>` works exactly like `Checker<...>` but
  inherits from `CheckerBackend` instead of `CheckerBase`, so it won't
  have a superfluous single `Name` member.
- Classes deriving from `CheckerFamily` can freely own multiple
  `CheckerFrontend` data members, which are enabled within the
  registration methods corresponding to their name and can be used to
  initialize the `BugType`s that they can emit.

In this scheme each `CheckerFamily` needs to override the pure virtual
method `ProgramPointTag::getTagDescription()` which returns a string
which represents that class for debugging purposes. (Previously this
used the name of one arbitrary sub-checker, which was passable for
debugging purposes, but not too elegant.)

I'm planning to implement follow-up commits that convert all the
"multi-part" checkers to this `CheckerFamily` framework.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer clang:analysis labels May 9, 2025
@llvmbot
Copy link
Member

llvmbot commented May 9, 2025

@llvm/pr-subscribers-clang-static-analyzer-1
@llvm/pr-subscribers-clang-analysis

@llvm/pr-subscribers-clang

Author: Donát Nagy (NagyDonat)

Changes

The checker classes (i.e. classes derived from CheckerBase via the utility template Checker&lt;...&gt;) act as intermediates between the user and the analyzer engine, so they have two interfaces:

  • On the frontend side, they have a public name, can be enabled or disabled, can accept checker options and can be reported as the source of bug reports.
  • On the backend side, they can handle various checker callbacks and they "leave a mark" on the ExplodedNodes that are created by them. (These ProgramPointTag marks are internal: they appear in debug logs and can be queried by checker logic; but the user doesn't see them.)

In a significant majority of the checkers there is 1:1 correspondence between these sides, but there are also many checker classes where several related user-facing checkers share the same backend class. Historically each of these "multi-part checker" classes had its own hacks to juggle its multiple names, which led to lots of ugliness like lazy initialization of mutable std::unique_ptr&lt;BugType&gt; members and redundant data members (when a checker used its custom CheckNames array and ignored the inherited single Name).

My recent commit 2709998 tried to unify and standardize these existing solutions to get rid of some of the technical debt, but it still used enum values to identify the checker parts within a "multi-part" checker class, which led to some ugliness.

This commit introduces a new framework which takes a more direct, object-oriented approach: instead of identifying checker parts with {parent checker object, index of part} pairs, the parts of a multi-part checker become stand-alone objects that store their own name (and enabled/disabled status) as a data member.

This is implemented by separating the functionality of CheckerBase into two new classes: CheckerFrontend and CheckerBackend. The name CheckerBase is kept (as a class derived from both CheckerFrontend and CheckerBackend), so "simple" checkers that use CheckerBase and Checker&lt;...&gt; continues to work without changes. However we also get first-class support for the "many frontends - one backend" situation:

  • The class CheckerFamily&lt;...&gt; works exactly like Checker&lt;...&gt; but inherits from CheckerBackend instead of CheckerBase, so it won't have a superfluous single Name member.
  • Classes deriving from CheckerFamily can freely own multiple CheckerFrontend data members, which are enabled within the registration methods corresponding to their name and can be used to initialize the BugTypes that they can emit.

In this scheme each CheckerFamily needs to override the pure virtual method ProgramPointTag::getTagDescription() which returns a string which represents that class for debugging purposes. (Previously this used the name of one arbitrary sub-checker, which was passable for debugging purposes, but not too elegant.)

I'm planning to implement follow-up commits that convert all the "multi-part" checkers to this CheckerFamily framework.


Patch is 32.20 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139256.diff

11 Files Affected:

  • (modified) clang/include/clang/Analysis/ProgramPoint.h (+2-2)
  • (modified) clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h (+4-3)
  • (modified) clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h (+25-19)
  • (modified) clang/include/clang/StaticAnalyzer/Core/Checker.h (+75-57)
  • (modified) clang/include/clang/StaticAnalyzer/Core/CheckerManager.h (+24-56)
  • (modified) clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp (+16-18)
  • (modified) clang/lib/StaticAnalyzer/Checkers/ObjCSelfInitChecker.cpp (+1-1)
  • (modified) clang/lib/StaticAnalyzer/Checkers/VirtualCallChecker.cpp (+15-14)
  • (modified) clang/lib/StaticAnalyzer/Core/BugReporter.cpp (+2-2)
  • (modified) clang/lib/StaticAnalyzer/Core/Checker.cpp (+5-9)
  • (modified) clang/lib/StaticAnalyzer/Core/CheckerManager.cpp (+11-11)
diff --git a/clang/include/clang/Analysis/ProgramPoint.h b/clang/include/clang/Analysis/ProgramPoint.h
index c40aa3d8ffb72..d81b8e845cb48 100644
--- a/clang/include/clang/Analysis/ProgramPoint.h
+++ b/clang/include/clang/Analysis/ProgramPoint.h
@@ -40,8 +40,8 @@ class ProgramPointTag {
   ProgramPointTag(void *tagKind = nullptr) : TagKind(tagKind) {}
   virtual ~ProgramPointTag();
 
-  /// The description of this program point which will be displayed when the
-  /// ExplodedGraph is dumped in DOT format for debugging.
+  /// The description of this program point which will be dumped for debugging
+  /// purposes. Do not use in user-facing output!
   virtual StringRef getTagDescription() const = 0;
 
   /// Used to implement 'isKind' in subclasses.
diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
index 8e1d25b3eefa1..33d37febc7327 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
@@ -643,9 +643,10 @@ class BugReporter {
   /// reports.
   virtual void emitReport(std::unique_ptr<BugReport> R);
 
-  void EmitBasicReport(const Decl *DeclWithIssue, const CheckerBase *Checker,
-                       StringRef BugName, StringRef BugCategory,
-                       StringRef BugStr, PathDiagnosticLocation Loc,
+  void EmitBasicReport(const Decl *DeclWithIssue,
+                       const CheckerFrontend *Checker, StringRef BugName,
+                       StringRef BugCategory, StringRef BugStr,
+                       PathDiagnosticLocation Loc,
                        ArrayRef<SourceRange> Ranges = {},
                        ArrayRef<FixItHint> Fixits = {});
 
diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
index 3a635e0d0125a..b05360904f86d 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
@@ -27,11 +27,7 @@ class BugReporter;
 
 class BugType {
 private:
-  struct CheckerPartRef {
-    const CheckerBase *Checker;
-    CheckerPartIdx Idx;
-  };
-  using CheckerNameInfo = std::variant<CheckerNameRef, CheckerPartRef>;
+  using CheckerNameInfo = std::variant<CheckerNameRef, const CheckerFrontend *>;
 
   const CheckerNameInfo CheckerName;
   const std::string Description;
@@ -43,7 +39,7 @@ class BugType {
 public:
   // Straightforward constructor where the checker name is specified directly.
   // TODO: As far as I know all applications of this constructor involve ugly
-  // hacks that could be avoided by switching to a different constructor.
+  // hacks that could be avoided by switching to the other constructor.
   // When those are all eliminated, this constructor should be removed to
   // eliminate the `variant` and simplify this class.
   BugType(CheckerNameRef CheckerName, StringRef Desc,
@@ -52,18 +48,11 @@ class BugType {
         SuppressOnSink(SuppressOnSink) {}
   // Constructor that can be called from the constructor of a checker object.
   // At that point the checker name is not yet available, but we can save a
-  // pointer to the checker and later use that to query the name.
-  BugType(const CheckerBase *Checker, StringRef Desc,
+  // pointer to the checker and use that to query the name.
+  BugType(const CheckerFrontend *CF, StringRef Desc,
           StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
-      : CheckerName(CheckerPartRef{Checker, DefaultPart}), Description(Desc),
-        Category(Cat), SuppressOnSink(SuppressOnSink) {}
-  // Constructor that can be called from the constructor of a checker object
-  // when it has multiple parts with separate names. We save the name and the
-  // part index to be able to query the name of that part later.
-  BugType(const CheckerBase *Checker, CheckerPartIdx Idx, StringRef Desc,
-          StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
-      : CheckerName(CheckerPartRef{Checker, Idx}), Description(Desc),
-        Category(Cat), SuppressOnSink(SuppressOnSink) {}
+      : CheckerName(CF), Description(Desc), Category(Cat),
+        SuppressOnSink(SuppressOnSink) {}
   virtual ~BugType() = default;
 
   StringRef getDescription() const { return Description; }
@@ -72,8 +61,7 @@ class BugType {
     if (const auto *CNR = std::get_if<CheckerNameRef>(&CheckerName))
       return *CNR;
 
-    auto [Checker, Idx] = std::get<CheckerPartRef>(CheckerName);
-    return Checker->getName(Idx);
+    return std::get<const CheckerFrontend *>(CheckerName)->getName();
   }
 
   /// isSuppressOnSink - Returns true if bug reports associated with this bug
@@ -82,6 +70,24 @@ class BugType {
   bool isSuppressOnSink() const { return SuppressOnSink; }
 };
 
+/// Trivial convenience class for the common case when a certain checker
+/// frontend always uses the same bug type. This way instead of writing
+/// ```
+///   CheckerFrontend LongCheckerFrontendName;
+///   BugType LongCheckerFrontendNameBug{LongCheckerFrontendName, "..."};
+/// ```
+/// we can use `CheckerFrontendWithBugType LongCheckerFrontendName{"..."}`.
+class CheckerFrontendWithBugType : public CheckerFrontend {
+  BugType BT;
+
+public:
+  CheckerFrontendWithBugType(StringRef Desc,
+                             StringRef Cat = categories::LogicError,
+                             bool SuppressOnSink = false)
+      : BT(this, Desc, Cat, SuppressOnSink) {}
+  const BugType &getBT() const { return BT; }
+};
+
 } // namespace ento
 
 } // end clang namespace
diff --git a/clang/include/clang/StaticAnalyzer/Core/Checker.h b/clang/include/clang/StaticAnalyzer/Core/Checker.h
index a54c5bee612f6..45b0398f3aca5 100644
--- a/clang/include/clang/StaticAnalyzer/Core/Checker.h
+++ b/clang/include/clang/StaticAnalyzer/Core/Checker.h
@@ -484,83 +484,87 @@ class Call {
 
 } // end eval namespace
 
-class CheckerBase : public ProgramPointTag {
-  /// A single checker class (i.e. a subclass of `CheckerBase`) can implement
-  /// multiple user-facing checkers that have separate names and can be enabled
-  /// separately, but are backed by the same singleton checker object.
-  SmallVector<std::optional<CheckerNameRef>, 1> RegisteredNames;
-
-  friend class ::clang::ento::CheckerManager;
+/// A `CheckerFrontend` instance is what the user recognizes as "one checker":
+/// it has a public canonical name (injected from the `CheckerManager`), can be
+/// enabled or disabled, can have associated checker options and can be printed
+/// as the "source" of bug reports.
+/// The singleton instance of a simple `Checker<...>` is-a `CheckerFrontend`
+/// (for historical reasons, to preserve old straightforward code), while the
+/// singleton instance of a `CheckerFamily<...>` class owns multiple
+/// `CheckerFrontend` instances as data members.
+/// Modeling checkers that are hidden from the user but can be enabled or
+/// disabled separately (as dependencies of other checkers) are also considered
+/// to be `CheckerFrontend`s.
+class CheckerFrontend {
+  /// The `Name` is nullopt if and only if the checker is disabled.
+  std::optional<CheckerNameRef> Name;
 
 public:
-  CheckerNameRef getName(CheckerPartIdx Idx = DefaultPart) const {
-    assert(Idx < RegisteredNames.size() && "Checker part index is too large!");
-    std::optional<CheckerNameRef> Name = RegisteredNames[Idx];
-    assert(Name && "Requested checker part is not registered!");
-    return *Name;
-  }
-
-  bool isPartEnabled(CheckerPartIdx Idx) const {
-    return Idx < RegisteredNames.size() && RegisteredNames[Idx].has_value();
-  }
-
-  void registerCheckerPart(CheckerPartIdx Idx, CheckerNameRef Name) {
-    // Paranoia: notice if e.g. UINT_MAX is passed as a checker part index.
-    assert(Idx < 256 && "Checker part identifiers should be small integers.");
-
-    if (Idx >= RegisteredNames.size())
-      RegisteredNames.resize(Idx + 1, std::nullopt);
-
-    assert(!RegisteredNames[Idx] && "Repeated registration of checker a part!");
-    RegisteredNames[Idx] = Name;
-  }
-
-  StringRef getTagDescription() const override {
-    // When the ExplodedGraph is dumped for debugging (in DOT format), this
-    // method is called to attach a description to nodes created by this
-    // checker _class_. Ideally this should be recognizable identifier of the
-    // whole class, but for this debugging purpose it's sufficient to use the
-    // name of the first registered checker part.
-    for (const auto &OptName : RegisteredNames)
-      if (OptName)
-        return *OptName;
-
-    return "Unregistered checker";
+  void enable(CheckerManager &Mgr) {
+    assert(!Name && "Checker part registered twice!");
+    Name = Mgr.getCurrentCheckerName();
   }
+  bool isEnabled() const { return static_cast<bool>(Name); }
+  CheckerNameRef getName() const { return *Name; }
+};
 
+/// `CheckerBackend` is an abstract base class that serves as the common
+/// ancestor of all the `Checker<...>` and `CheckerFamily<...>` classes which
+/// can create `ExplodedNode`s (by acting as a `ProgramPointTag`) and can be
+/// registered to handle various checker callbacks. (Moreover the debug
+/// callback `printState` is also introduced here.)
+class CheckerBackend : public ProgramPointTag {
+public:
   /// Debug state dump callback, see CheckerManager::runCheckersForPrintState.
   /// Default implementation does nothing.
   virtual void printState(raw_ostream &Out, ProgramStateRef State,
                           const char *NL, const char *Sep) const;
 };
 
-/// Dump checker name to stream.
-raw_ostream& operator<<(raw_ostream &Out, const CheckerBase &Checker);
-
-/// Tag that can use a checker name as a message provider
-/// (see SimpleProgramPointTag).
-class CheckerProgramPointTag : public SimpleProgramPointTag {
+/// The non-templated common ancestor of all the simple `Checker<...>` classes.
+class CheckerBase : public CheckerFrontend, public CheckerBackend {
 public:
-  CheckerProgramPointTag(StringRef CheckerName, StringRef Msg);
-  CheckerProgramPointTag(const CheckerBase *Checker, StringRef Msg);
+  /// Attached to nodes created by this checker class when the ExplodedGraph is
+  /// dumped for debugging.
+  StringRef getTagDescription() const override;
 };
 
-template <typename CHECK1, typename... CHECKs>
-class Checker : public CHECK1, public CHECKs..., public CheckerBase {
+// Template magic to implement the static method `_register()` which registers
+// the `Checker` or `CheckerFamily` for all the implemented callbacks.
+template <typename CHECKER, typename CHECK1, typename... CHECKs>
+static void registerImpl(CHECKER *Chk, CheckerManager &Mgr) {
+  CHECK1::_register(Chk, Mgr);
+  registerImpl<CHECKER, CHECKs...>(Chk, Mgr);
+}
+
+template <typename CHECKER>
+static void registerImpl(CHECKER *Chk, CheckerManager &Mgr) {}
+
+/// Simple checker classes that implement one frontend (i.e. checker name)
+/// should derive from this template and specify all the implemented callbacks
+/// (i.e. classes like `check::PreStmt` or `eval::Call`) as template arguments
+/// of `Checker`.
+template <typename... CHECKs>
+class Checker : public CheckerBase, public CHECKs... {
 public:
   template <typename CHECKER>
-  static void _register(CHECKER *checker, CheckerManager &mgr) {
-    CHECK1::_register(checker, mgr);
-    Checker<CHECKs...>::_register(checker, mgr);
+  static void _register(CHECKER *Chk, CheckerManager &Mgr) {
+    registerImpl<CHECKER, CHECKs...>(Chk, Mgr);
   }
 };
 
-template <typename CHECK1>
-class Checker<CHECK1> : public CHECK1, public CheckerBase {
+/// Checker families (where a single backend class implements multiple related
+/// frontends) should derive from this template, specify all the implemented
+/// callbacks (i.e. classes like `check::PreStmt` or `eval::Call`) as template
+/// arguments of `FamilyChecker` and implement the pure virtual method
+/// `StringRef getTagDescription()` which is inherited from `ProgramPointTag`
+/// and should return a string identifying the class for debugging purposes.
+template <typename... CHECKs>
+class CheckerFamily : public CheckerBackend, public CHECKs... {
 public:
   template <typename CHECKER>
-  static void _register(CHECKER *checker, CheckerManager &mgr) {
-    CHECK1::_register(checker, mgr);
+  static void _register(CHECKER *Chk, CheckerManager &Mgr) {
+    registerImpl<CHECKER, CHECKs...>(Chk, Mgr);
   }
 };
 
@@ -581,6 +585,20 @@ class EventDispatcher {
   }
 };
 
+/// Tag that can use a checker name as a message provider
+/// (see SimpleProgramPointTag).
+/// FIXME: This is a cargo cult class which is copied into several checkers but
+/// does not provide anything useful.
+/// The only added functionality provided by this class (compared to
+/// SimpleProgramPointTag) is that it composes the tag description string from
+/// two arguments -- but tag descriptions only appear in debug output so there
+/// is no reason to bother with this.
+class CheckerProgramPointTag : public SimpleProgramPointTag {
+public:
+  CheckerProgramPointTag(StringRef CheckerName, StringRef Msg);
+  CheckerProgramPointTag(const CheckerBase *Checker, StringRef Msg);
+};
+
 /// We dereferenced a location that may be null.
 struct ImplicitNullDerefEvent {
   SVal Location;
diff --git a/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h b/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
index 03ffadd346d0b..8fa122f004bfe 100644
--- a/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
+++ b/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
@@ -39,7 +39,8 @@ class AnalysisManager;
 class CXXAllocatorCall;
 class BugReporter;
 class CallEvent;
-class CheckerBase;
+class CheckerFrontend;
+class CheckerBackend;
 class CheckerContext;
 class CheckerRegistry;
 struct CheckerRegistryData;
@@ -64,9 +65,9 @@ class CheckerFn<RET(Ps...)> {
   Func Fn;
 
 public:
-  CheckerBase *Checker;
+  CheckerBackend *Checker;
 
-  CheckerFn(CheckerBase *checker, Func fn) : Fn(fn), Checker(checker) {}
+  CheckerFn(CheckerBackend *checker, Func fn) : Fn(fn), Checker(checker) {}
 
   RET operator()(Ps... ps) const {
     return Fn(Checker, ps...);
@@ -116,19 +117,6 @@ class CheckerNameRef {
   operator StringRef() const { return Name; }
 };
 
-/// A single checker class (and its singleton instance) can act as the
-/// implementation of several (user-facing or modeling) checker parts that
-/// have shared state and logic, but have their own names and can be enabled or
-/// disabled separately.
-/// Each checker class that implement multiple parts introduces its own enum
-/// type to assign small numerical indices (0, 1, 2 ...) to their parts. The
-/// type alias 'CheckerPartIdx' is conceptually the union of these enum types.
-using CheckerPartIdx = unsigned;
-
-/// If a checker doesn't have multiple parts, then its single part is
-/// represented by this index.
-constexpr inline CheckerPartIdx DefaultPart = 0;
-
 enum class ObjCMessageVisitKind {
   Pre,
   Post,
@@ -193,14 +181,7 @@ class CheckerManager {
 
   /// Emits an error through a DiagnosticsEngine about an invalid user supplied
   /// checker option value.
-  void reportInvalidCheckerOptionValue(const CheckerBase *C,
-                                       StringRef OptionName,
-                                       StringRef ExpectedValueDesc) const {
-    reportInvalidCheckerOptionValue(C, DefaultPart, OptionName,
-                                    ExpectedValueDesc);
-  }
-
-  void reportInvalidCheckerOptionValue(const CheckerBase *C, CheckerPartIdx Idx,
+  void reportInvalidCheckerOptionValue(const CheckerFrontend *CP,
                                        StringRef OptionName,
                                        StringRef ExpectedValueDesc) const;
 
@@ -210,28 +191,15 @@ class CheckerManager {
   // Checker registration.
   //===--------------------------------------------------------------------===//
 
-  /// Construct the singleton instance of a checker, register it for the
-  /// supported callbacks and record its name with `registerCheckerPart()`.
-  /// Arguments passed to this function are forwarded to the constructor of the
-  /// checker.
-  ///
-  /// If `CHECKER` has multiple parts, then the constructor call and the
-  /// callback registration only happen within the first `registerChecker()`
-  /// call; while the subsequent calls only enable additional parts of the
-  /// existing checker object (while registering their names).
-  ///
-  /// \returns a pointer to the checker object.
-  template <typename CHECKER, CheckerPartIdx Idx = DefaultPart, typename... AT>
-  CHECKER *registerChecker(AT &&...Args) {
-    // This assert could be removed but then we need to make sure that calls
-    // registering different parts of the same checker pass the same arguments.
-    static_assert(
-        Idx == DefaultPart || !sizeof...(AT),
-        "Argument forwarding isn't supported with multi-part checkers!");
-
+  /// If the the singleton instance of a checker class is not yet constructed,
+  /// then construct it (with the supplied arguments), register it for the
+  /// callbacks that are supported by it, and return it. Otherwise, just return
+  /// a pointer to the existing instance.
+  template <typename CHECKER, typename... AT>
+  CHECKER *getChecker(AT &&...Args) {
     CheckerTag Tag = getTag<CHECKER>();
 
-    std::unique_ptr<CheckerBase> &Ref = CheckerTags[Tag];
+    std::unique_ptr<CheckerBackend> &Ref = CheckerTags[Tag];
     if (!Ref) {
       std::unique_ptr<CHECKER> Checker =
           std::make_unique<CHECKER>(std::forward<AT>(Args)...);
@@ -239,18 +207,18 @@ class CheckerManager {
       Ref = std::move(Checker);
     }
 
-    CHECKER *Result = static_cast<CHECKER *>(Ref.get());
-    Result->registerCheckerPart(Idx, CurrentCheckerName);
-    return Result;
+    return static_cast<CHECKER *>(Ref.get());
   }
 
-  template <typename CHECKER>
-  CHECKER *getChecker() {
-    CheckerTag Tag = getTag<CHECKER>();
-    std::unique_ptr<CheckerBase> &Ref = CheckerTags[Tag];
-    assert(Ref && "Requested checker is not registered! Maybe you should add it"
-                  " as a dependency in Checkers.td?");
-    return static_cast<CHECKER *>(Ref.get());
+  /// Register a single-part checker (derived from `Checker`): construct its
+  /// singleton instance, register it for the supported callbacks and record
+  /// its name (with `CheckerFrontend::enable`). Calling this multiple times
+  /// triggers an assertion failure.
+  template <typename CHECKER, typename... AT>
+  CHECKER *registerChecker(AT &&...Args) {
+    CHECKER *Chk = getChecker<CHECKER>(std::forward<AT>(Args)...);
+    Chk->enable(*this);
+    return Chk;
   }
 
   template <typename CHECKER> bool isRegisteredChecker() {
@@ -482,7 +450,7 @@ class CheckerManager {
   /// Run checkers for debug-printing a ProgramState.
   ///
   /// Unlike most other callbacks, any checker can simply implement the virtual
-  /// method CheckerBase::printState if it has custom data to print.
+  /// method CheckerBackend::printState if it has custom data to print.
   ///
   /// \param Out   The output stream
   /// \param State The state being printed
@@ -651,7 +619,7 @@ class CheckerManager {
   template <typename T>
   static void *getTag() { static int tag; return &tag; }
 
-  llvm::DenseMap<CheckerTag, std::unique_ptr<CheckerBase>> CheckerTags;
+  llvm::DenseMap<CheckerTag, std::unique_ptr<CheckerBackend>> CheckerTags;
 
   struct DeclCheckerInfo {
     CheckDeclFunc CheckFn;
diff --git a/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
index 3dd57732305b2..7672c63a646e4 100644
--- a/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
@@ -25,7 +25,7 @@ using namespace ento;
 using namespace taint;
 
 namespace {
-class DivZeroChecker : public Checker<check::PreStmt<BinaryOperator>> {
+class DivZeroChecker : public CheckerFamily<check::PreStmt<BinaryOperator>> {
   void reportBug(StringRef Msg, ProgramStateRef StateZero,
                  CheckerContext &C) const;
   void reportTaintBug(StringRef Msg, ProgramStateRef StateZero,
@@ -33,17 +33,15 @@ class DivZeroChecker : public Checker<check::PreStmt<BinaryOperator>> {
                       llvm::ArrayRef<SymbolRef> TaintedSym...
[truncated]

Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one is a really good direction. I like it.
I left a couple minor remarks inline.

@NagyDonat
Copy link
Contributor Author

NagyDonat commented May 13, 2025

I handled all the inline comments.

I have one minor architectural question: we should standardize a way to assign a single tag description (that is, an identifier that can be used in debug dumps) to each checker family. The old code automatically used the name of the first checker part as the tag description (which was not very elegant...); while in the current version of this patch subclasses of CheckerFamily<> override getTagDescription() and return their class name as a hardcoded string literal. I think this approach is acceptable, and if we keep it then I'll add a comment in CheckerFamily which says that subclasses must override getTagDescription() this way. However, I also feel that there might be a better solution -- @steakhal what do you think?

@steakhal
Copy link
Contributor

I handled all the inline comments.

I have one minor architectural question: we should standardize a way to assign a single tag description (that is, an identifier that can be used in debug dumps) to each checker family. The old code automatically used the name of the first checker part as the tag description (which was not very elegant...); while in the current version of this patch subclasses of CheckerFamily<> override getTagDescription() and return their class name as a hardcoded string literal. I think this approach is acceptable, and if we keep it then I'll add a comment in CheckerFamily which says that subclasses must override getTagDescription() this way. However, I also feel that there might be a better solution -- @steakhal what do you think?

I'll think about this, but overall I'd prefer simplicity for defining Checkers - which may make magic behind the doors slightly more involved.

@NagyDonat
Copy link
Contributor Author

NagyDonat commented May 15, 2025

I prototyped a "get the name of template argument as string" solution which is sufficient for our use cases (where the checker family is a non-templated class type) and works on all supported versions of all three compilers (GCC, clang, MSVC) that are supported for LLVM compilation:
https://godbolt.org/z/8b9YGfvdG

#include <iostream>
#include <string>

class MyFancyCheckerFamily {};

template <class T>
std::string getTypeName() {
#ifndef _MSC_VER
    std::string res = __PRETTY_FUNCTION__;
    int start = res.find('='), end = res.find(';');
    if (end == -1)
        end = res.find(']');
    if (start == -1 || end == -1)
        return "unknown-checker-family"; // paranoia, should not happen
    return res.substr(start + 2, end - start - 2);
#else
    std::string res = __FUNCSIG__;
    int start = res.find("<class "), end = res.find('>');
    if (start == -1 || end == -1)
        return "unknown-checker-family"; // paranoia, should not happen
    return res.substr(start + 7, end - start - 7);
#endif
}

int main() {
  std::cout << getTypeName<MyFancyCheckerFamily>() << std::endl;
  return 0;
}

If I invoke this machinery from the registration functions of the checker families (where each checker type is available as a template argument), then it can "magically" ensure that getTagDescription() (i.e. the debug name) returns the class name of the checker family as a string.

This is a bit ugly, because there is no standard way to stringify a type, so I need to rely on the macros __PRETTY_FUNCTION__ (GCC and clang with slightly different content) / __FUNCSIG__ (MSVC), but I don't think that there is a cleaner solution that doesn't burden the classes derived from CheckerFamily with any boilerplate (like overriding getTagDescription()).

(By the way, note that "simple" single-part Checkers already get their debug name automatically, because it can be equal to their user-facing name; so this question only affects the CheckerFamilyes.)

@steakhal (or anybody else) What do you think about this?

@steakhal
Copy link
Contributor

What do you think about this?

I don't like this. LLVM is used as a library for wide set of tools and compiled with an even wider set of compilers.
I know that LLVM advertises a couple of compilers that are supported as host toolchains, but I'd say that's likely not a complete set.

As a user, I wouldn't mind a solution like this, but as a library vendor, I'd vote very much against this.
There must be some other way. I'll try to squeeze some free time exploring this to help you.

@NagyDonat
Copy link
Contributor Author

NagyDonat commented May 15, 2025

🤔 These strings are only relevant for debugging, so this whole machinery could be limited to debug builds (or builds with an off-by-default flag). That way these debug names would be still available for the very specific situation when somebody is debugging the analyzer, but wouldn't cause problems for anybody else.

However, I'm also open to other ideas and I'd be happy to accept a better solution if you can find one.

I hope that I can merge the CheckerFamily framework in the foreseeable future (within a week?), because then I would be able to continue with the follow-up changes: (1) converting the existing multipart checkers to this framework and then (2) introducing checker family logic in a few other checkers (e.g. splitting off the taint-related part of security.ArrayBound into a separate part as Dániel Krupp did with the other optin.taint checkers).

@steakhal
Copy link
Contributor

I think I can't push to your branch. So here is what I'd do as git format-patch:

0001-NFC-Rename-getTagDescription-to-getDebugName.patch.txt
0002-HACK-Smuggle-in-the-checker-name-into-the-CheckerFam.patch.txt

@NagyDonat
Copy link
Contributor Author

NagyDonat commented May 16, 2025

Thanks for the patches! I originally thought that renaming getTagDescription -> getDebugName would be a separate follow-up commit but I see that it's a surprisingly small change (that function was barely used) so I'm happy to include it here.

I see that your code the status quo by using the name of an arbitrary sub-checker (the one that is registered first) as the debug name of the full checker family. Previously I have tried to avoid this solution because:

  • this muddies the distinction between the CheckerFamily and one of the owned CheckerFrontends (may be a bit confusing for those who are unfamiliar with the analyzer code);
  • with this solution the same CheckerFamily may get a different debug name if different sub-checkers are enabled from it (in that particular run).

However, if you think that these issues are acceptable, then I'm not opposed to using your solution because I agree that the __PRETTY_FUNCTION__ hack for using the class name is ugly.

I also thought about tweaking CheckerFamily::getDebugName to produce results like family-of-core.DivideZero for checker families (to highlight that it is connected to a checker family and not the particular checker frontend). What would you think about this?

@steakhal
Copy link
Contributor

Thanks for the patches! I originally thought that renaming getTagDescription -> getDebugName would be a separate follow-up commit but I see that it's a surprisingly small change (that function was barely used) so I'm happy to include it here.

I see that your code the status quo by using the name of an arbitrary sub-checker (the one that is registered first) as the debug name of the full checker family. Previously I have tried to avoid this solution because:

* this muddies the distinction between the `CheckerFamily` and one of the owned `CheckerFrontend`s (may be a bit confusing for those who are unfamiliar with the analyzer code);

* with this solution the same `CheckerFamily` may get a different debug name if different sub-checkers are enabled from it (in that particular run).

However, if you think that these issues are acceptable, then I'm not opposed to using your solution because I agree that the __PRETTY_FUNCTION__ hack for using the class name is ugly.

I think the order is deterministic, and due to how checker dependencies are resolved, the backend checker would be always the one that is registered first. I have not checked this. Frankly, I would have preferred if the internal checker class name would be printed there instead of the user-facing name of the checker - but given that I despise defining this for every checker as a mandatory virtual function for checker families, I opted for this solution given my very limited time frame and that I shouldn't block the progress of this PR.

I also thought about tweaking CheckerFamily::getDebugName to produce results like family-of-core.DivideZero for checker families (to highlight that it is connected to a checker family and not the particular checker frontend). What would you think about this?

I firmly believe that as this is a debugging feature, the best name to print is the class that gets registered.

@NagyDonat
Copy link
Contributor Author

I think the order is deterministic, and due to how checker dependencies are resolved, the backend checker would be always the one that is registered first. I have not checked this.

You're correct wrt the current state of the code but I'm planning to get rid of many "backend/modeling" checkers that have no real purpose 😅 -- in my opinion having a superfluous checker definition in Checkers.td is even worse than having a boilerplate getDebugName override.

There are a few modeling checkers that fulfill a valid role because they act as dependencies of checkers defined in other files/classes, but I want to eliminate the pattern where a checker class defines a modeling checker which is the dependency of the other "real" checkers from the class (to be registered first) and does nothing useful.

I opted for this solution given my very limited time frame and that I shouldn't block the progress of this PR.

Sorry for adding time pressure...

I firmly believe that as this is a debugging feature, the best name to print is the class that gets registered.

I agree, but unfortunately there is no way to get the class name automatically in a platform-independent way without a third-party dependency like the type_name boost library. (Do I understand it correctly that we cannot just drop in an external dependency like this?)

Using the name of one subchecker is probably the least bad available approach and your code is an elegant implementation for it, so I'll probably apply it to get this PR merged. If you're not opposed, I would like to add a family-of- prefix to the debug names, which is a bit awkward but eliminates the ambiguity and the potential for confusion (somebody who briefly looks at some debug output will immediately see that the name identifies the family).

@steakhal
Copy link
Contributor

I agree, but unfortunately there is no way to get the class name automatically in a platform-independent way without a third-party dependency like the type_name boost library. (Do I understand it correctly that we cannot just drop in an external dependency like this?)

Using the name of one subchecker is probably the least bad available approach and your code is an elegant implementation for it, so I'll probably apply it to get this PR merged. If you're not opposed, I would like to add a family-of- prefix to the debug names, which is a bit awkward but eliminates the ambiguity and the potential for confusion (somebody who briefly looks at some debug output will immediately see that the name identifies the family).

I'd recommend you to look at the Checkers.inc file in the build folder somewhere - the file that gets generated from the Checkers.td. Notice that every checker has a registerXXX and shouldRegisterXXX function where XXX is the verbatim spelling of the checker class implementing it. We should just expose XXX just like we currently expose the similar CheckerManager::getCurrentCheckerName() (I haven't checked this, its just a strong suspicion). We could call this getCurrentCheckerClassName or something along these lines.
This should be 5-10 lines of change approximately.

Just have a look at Checkers.inc and you will see what I'm talking about.

And thanks again for sticking around. I can understand if this thread already got far longer than it should have taken.
I admit I failed to give enough attention to this thread, and now I'm pushing back because I believe the desired solution is right on the corner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants