Skip to content

[clang] Fix the crash when dumping deserialized decls #133395

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hokein
Copy link
Collaborator

@hokein hokein commented Mar 28, 2025

This fixes crashes with -dump-deserialized-decls.

The DeserializedDeclsDumper implementation has two issues:

  1. The DelegatingDeserializationListener does not fully implement all interfaces. As a result, the Previous listener object misses some critical state updates, leading to inconsistencies and eventual crashes;
  2. Print declaration names in the middle of deserialization is not safe, because the decl may still be incomplete; we should do it after the deserialization is finished;

No unittest is included -- crash was discovered by @VitaNuo in an internal isolated test which is still large, depending on the stl/absl modules. The issue no longer occurs with this change.

Copy link

github-actions bot commented Mar 28, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@hokein hokein changed the title [WIP] Fix the crash when dump the deserialized decl [clang] Fix the crash when dumping deserialized decls Mar 28, 2025
@hokein hokein requested review from ilya-biryukov and VitaNuo March 28, 2025 09:44
@hokein hokein marked this pull request as ready for review March 28, 2025 09:45
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:modules C++20 modules and Clang Header Modules labels Mar 28, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 28, 2025

@llvm/pr-subscribers-clang

Author: Haojian Wu (hokein)

Changes

This fixes crashes with -dump-deserialized-decls.

The DeserializedDeclsDumper implementation has two issues:

  1. The DelegatingDeserializationListener does not fully implement all interfaces. As a result, the Previous listener object misses some critical state updates, leading to inconsistencies and eventual crashes;
  2. Print declaration names in the middle of deserialization is not safe, because the decl may still be incomplete; we should do it after the deserialization is finished;

No unittest is included -- crash was discovered by @VitaNuo in an internal isolated test which is still large, depending on the stl/absl modules. The issue no longer occurs with this change.


Full diff: https://github.com/llvm/llvm-project/pull/133395.diff

4 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+1)
  • (modified) clang/include/clang/Serialization/ASTDeserializationListener.h (+4)
  • (modified) clang/lib/Frontend/FrontendAction.cpp (+38-6)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+2)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 04ec2cfef679c..9ae29d7e58626 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -316,6 +316,7 @@ Bug Fixes in This Version
 - Fixed a modules crash where exception specifications were not propagated properly (#GH121245, relanded in #GH129982)
 - Fixed a problematic case with recursive deserialization within ``FinishedDeserializing()`` where
   ``PassInterestingDeclsToConsumer()`` was called before the declarations were safe to be passed. (#GH129982)
+- Fixed a module crash with `-dump-deserialized-decls`
 
 Bug Fixes to Compiler Builtins
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/include/clang/Serialization/ASTDeserializationListener.h b/clang/include/clang/Serialization/ASTDeserializationListener.h
index ea96faa07c191..ccbaae93f9048 100644
--- a/clang/include/clang/Serialization/ASTDeserializationListener.h
+++ b/clang/include/clang/Serialization/ASTDeserializationListener.h
@@ -27,6 +27,8 @@ class MacroInfo;
 class Module;
 class SourceLocation;
 
+// IMPORTANT: when you add a new interface to this class, please update the
+// DelegatingDeserializationListener in FrontendAction.cpp
 class ASTDeserializationListener {
 public:
   virtual ~ASTDeserializationListener();
@@ -57,6 +59,8 @@ class ASTDeserializationListener {
   /// A module import was read from the AST file.
   virtual void ModuleImportRead(serialization::SubmoduleID ID,
                                 SourceLocation ImportLoc) {}
+  /// The deserialization of the AST file was finished.
+  virtual void FinishedDeserializing() {}
 };
 }
 
diff --git a/clang/lib/Frontend/FrontendAction.cpp b/clang/lib/Frontend/FrontendAction.cpp
index 9f789f093f55d..416ee50593e74 100644
--- a/clang/lib/Frontend/FrontendAction.cpp
+++ b/clang/lib/Frontend/FrontendAction.cpp
@@ -76,6 +76,10 @@ class DelegatingDeserializationListener : public ASTDeserializationListener {
     if (Previous)
       Previous->IdentifierRead(ID, II);
   }
+  void MacroRead(serialization::MacroID ID, MacroInfo *MI) override {
+    if (Previous)
+      Previous->MacroRead(ID, MI);
+  }
   void TypeRead(serialization::TypeIdx Idx, QualType T) override {
     if (Previous)
       Previous->TypeRead(Idx, T);
@@ -93,6 +97,19 @@ class DelegatingDeserializationListener : public ASTDeserializationListener {
     if (Previous)
       Previous->MacroDefinitionRead(PPID, MD);
   }
+  void ModuleRead(serialization::SubmoduleID ID, Module *Mod) override {
+    if (Previous)
+      Previous->ModuleRead(ID, Mod);
+  }
+  void ModuleImportRead(serialization::SubmoduleID ID,
+                        SourceLocation ImportLoc) override {
+    if (Previous)
+      Previous->ModuleImportRead(ID, ImportLoc);
+  }
+  void FinishedDeserializing() override {
+    if (Previous)
+      Previous->FinishedDeserializing();
+  }
 };
 
 /// Dumps deserialized declarations.
@@ -103,15 +120,30 @@ class DeserializedDeclsDumper : public DelegatingDeserializationListener {
       : DelegatingDeserializationListener(Previous, DeletePrevious) {}
 
   void DeclRead(GlobalDeclID ID, const Decl *D) override {
-    llvm::outs() << "PCH DECL: " << D->getDeclKindName();
-    if (const NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
-      llvm::outs() << " - ";
-      ND->printQualifiedName(llvm::outs());
+    PendingDecls.push_back(D);
+    DelegatingDeserializationListener::DeclRead(ID, D);
+  }
+  void FinishedDeserializing() override {
+    auto Decls = std::move(PendingDecls);
+    for (const auto *D : Decls) {
+      llvm::outs() << "PCH DECL: " << D->getDeclKindName();
+      if (const NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
+        llvm::outs() << " - ";
+        ND->printQualifiedName(llvm::outs());
+      }
+      llvm::outs() << "\n";
     }
-    llvm::outs() << "\n";
 
-    DelegatingDeserializationListener::DeclRead(ID, D);
+    if (!PendingDecls.empty()) {
+      llvm::errs() << "Deserialized more decls while printing, total of "
+                   << PendingDecls.size() << "\n";
+      PendingDecls.clear();
+    }
+    DelegatingDeserializationListener::FinishedDeserializing();
   }
+
+private:
+  std::vector<const Decl *> PendingDecls;
 };
 
 /// Checks deserialized declarations and emits error if a name
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 0cd2cedb48dd9..9e3712226491a 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -10877,6 +10877,8 @@ void ASTReader::FinishedDeserializing() {
     // decls to the consumer.
     if (Consumer)
       PassInterestingDeclsToConsumer();
+    if (DeserializationListener)
+      DeserializationListener->FinishedDeserializing();
   }
 }
 

@llvmbot
Copy link
Member

llvmbot commented Mar 28, 2025

@llvm/pr-subscribers-clang-modules

Author: Haojian Wu (hokein)

Changes

This fixes crashes with -dump-deserialized-decls.

The DeserializedDeclsDumper implementation has two issues:

  1. The DelegatingDeserializationListener does not fully implement all interfaces. As a result, the Previous listener object misses some critical state updates, leading to inconsistencies and eventual crashes;
  2. Print declaration names in the middle of deserialization is not safe, because the decl may still be incomplete; we should do it after the deserialization is finished;

No unittest is included -- crash was discovered by @VitaNuo in an internal isolated test which is still large, depending on the stl/absl modules. The issue no longer occurs with this change.


Full diff: https://github.com/llvm/llvm-project/pull/133395.diff

4 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+1)
  • (modified) clang/include/clang/Serialization/ASTDeserializationListener.h (+4)
  • (modified) clang/lib/Frontend/FrontendAction.cpp (+38-6)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+2)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 04ec2cfef679c..9ae29d7e58626 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -316,6 +316,7 @@ Bug Fixes in This Version
 - Fixed a modules crash where exception specifications were not propagated properly (#GH121245, relanded in #GH129982)
 - Fixed a problematic case with recursive deserialization within ``FinishedDeserializing()`` where
   ``PassInterestingDeclsToConsumer()`` was called before the declarations were safe to be passed. (#GH129982)
+- Fixed a module crash with `-dump-deserialized-decls`
 
 Bug Fixes to Compiler Builtins
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/include/clang/Serialization/ASTDeserializationListener.h b/clang/include/clang/Serialization/ASTDeserializationListener.h
index ea96faa07c191..ccbaae93f9048 100644
--- a/clang/include/clang/Serialization/ASTDeserializationListener.h
+++ b/clang/include/clang/Serialization/ASTDeserializationListener.h
@@ -27,6 +27,8 @@ class MacroInfo;
 class Module;
 class SourceLocation;
 
+// IMPORTANT: when you add a new interface to this class, please update the
+// DelegatingDeserializationListener in FrontendAction.cpp
 class ASTDeserializationListener {
 public:
   virtual ~ASTDeserializationListener();
@@ -57,6 +59,8 @@ class ASTDeserializationListener {
   /// A module import was read from the AST file.
   virtual void ModuleImportRead(serialization::SubmoduleID ID,
                                 SourceLocation ImportLoc) {}
+  /// The deserialization of the AST file was finished.
+  virtual void FinishedDeserializing() {}
 };
 }
 
diff --git a/clang/lib/Frontend/FrontendAction.cpp b/clang/lib/Frontend/FrontendAction.cpp
index 9f789f093f55d..416ee50593e74 100644
--- a/clang/lib/Frontend/FrontendAction.cpp
+++ b/clang/lib/Frontend/FrontendAction.cpp
@@ -76,6 +76,10 @@ class DelegatingDeserializationListener : public ASTDeserializationListener {
     if (Previous)
       Previous->IdentifierRead(ID, II);
   }
+  void MacroRead(serialization::MacroID ID, MacroInfo *MI) override {
+    if (Previous)
+      Previous->MacroRead(ID, MI);
+  }
   void TypeRead(serialization::TypeIdx Idx, QualType T) override {
     if (Previous)
       Previous->TypeRead(Idx, T);
@@ -93,6 +97,19 @@ class DelegatingDeserializationListener : public ASTDeserializationListener {
     if (Previous)
       Previous->MacroDefinitionRead(PPID, MD);
   }
+  void ModuleRead(serialization::SubmoduleID ID, Module *Mod) override {
+    if (Previous)
+      Previous->ModuleRead(ID, Mod);
+  }
+  void ModuleImportRead(serialization::SubmoduleID ID,
+                        SourceLocation ImportLoc) override {
+    if (Previous)
+      Previous->ModuleImportRead(ID, ImportLoc);
+  }
+  void FinishedDeserializing() override {
+    if (Previous)
+      Previous->FinishedDeserializing();
+  }
 };
 
 /// Dumps deserialized declarations.
@@ -103,15 +120,30 @@ class DeserializedDeclsDumper : public DelegatingDeserializationListener {
       : DelegatingDeserializationListener(Previous, DeletePrevious) {}
 
   void DeclRead(GlobalDeclID ID, const Decl *D) override {
-    llvm::outs() << "PCH DECL: " << D->getDeclKindName();
-    if (const NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
-      llvm::outs() << " - ";
-      ND->printQualifiedName(llvm::outs());
+    PendingDecls.push_back(D);
+    DelegatingDeserializationListener::DeclRead(ID, D);
+  }
+  void FinishedDeserializing() override {
+    auto Decls = std::move(PendingDecls);
+    for (const auto *D : Decls) {
+      llvm::outs() << "PCH DECL: " << D->getDeclKindName();
+      if (const NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
+        llvm::outs() << " - ";
+        ND->printQualifiedName(llvm::outs());
+      }
+      llvm::outs() << "\n";
     }
-    llvm::outs() << "\n";
 
-    DelegatingDeserializationListener::DeclRead(ID, D);
+    if (!PendingDecls.empty()) {
+      llvm::errs() << "Deserialized more decls while printing, total of "
+                   << PendingDecls.size() << "\n";
+      PendingDecls.clear();
+    }
+    DelegatingDeserializationListener::FinishedDeserializing();
   }
+
+private:
+  std::vector<const Decl *> PendingDecls;
 };
 
 /// Checks deserialized declarations and emits error if a name
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 0cd2cedb48dd9..9e3712226491a 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -10877,6 +10877,8 @@ void ASTReader::FinishedDeserializing() {
     // decls to the consumer.
     if (Consumer)
       PassInterestingDeclsToConsumer();
+    if (DeserializationListener)
+      DeserializationListener->FinishedDeserializing();
   }
 }
 

Copy link
Contributor

@VitaNuo VitaNuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @hokein!

Copy link
Contributor

@ilya-biryukov ilya-biryukov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two parts to this patch:

  1. forwarding more methods properly
  2. updating the interface, adding new callbacks and changing the behavior of `-dump-deserialized-decls``

I think (1) is a no-brainer. I would be very eager to LGTM it right away, but could you send it as a separate PR? (Especially if it fixes the crash)

The part (2) is a bit more complicated and it would be great to explore the solution space a little more. I've left a comment about reentrancy into the callback that worries me, please take a look.

Another point that we want to be explicit about is that this would change the buffering of -dump-deserialized-decls. It's definitely better than crashing, but we should at least mention in the patch description that the we are now doing extra buffering and certain declarations might be printed later than before.

@@ -27,6 +27,8 @@ class MacroInfo;
class Module;
class SourceLocation;

// IMPORTANT: when you add a new interface to this class, please update the
// DelegatingDeserializationListener in FrontendAction.cpp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future idea: maybe we could move the DelegatingDeserializationListener here or into the Serialization folder?
It is generic and not specific to FrontendAction, so it probably makes more sense to have it here, even if it's only used in FrontendAction. That makes changes like this more localized.

@@ -57,6 +59,8 @@ class ASTDeserializationListener {
/// A module import was read from the AST file.
virtual void ModuleImportRead(serialization::SubmoduleID ID,
SourceLocation ImportLoc) {}
/// The deserialization of the AST file was finished.
virtual void FinishedDeserializing() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also have StartDeserializing in the interface or none of the two.
This should help with the reentrancy question for the implementation that I have below too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I guess the interface name might be misleading -- it doesn't fully align with ASTReader::FinishedDeserializing(). The listener is only called once when deserialization is completely finished (i.e., when NumCurrentElementsDeserializing reaches 0 and all pending actions have been processed).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I saw that. But in any case you probably want the other callback too.
Having both callbacks would allow to add assertions about the invariants implementation expect. E.g. my concerns about reentrancy could be expressed as bool IsDeserializing field and a few assertions.

@@ -57,6 +59,8 @@ class ASTDeserializationListener {
/// A module import was read from the AST file.
virtual void ModuleImportRead(serialization::SubmoduleID ID,
SourceLocation ImportLoc) {}
/// The deserialization of the AST file was finished.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we document that while causing more serialization in the callbacks may have complicated side-effects and the implementors should be careful?

I believe it's a very non-trivial details that folks reading the code should be warned about.


DelegatingDeserializationListener::DeclRead(ID, D);
if (!PendingDecls.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our theory is that printQualifiedName can start deserializing more.

At the point where the callback is called now, we have already updated the state of ASTReader such that more deserialization will cause this callback to be called again. I.e. this should be a possible stack trace:

OurListener::FinishDeserializing()
ASTReader::FinishDeserializing()
ASTReader::doSomeDeserialization()
...
Decl::printQualifiedName()
OurListener::FinishDeserializing()
ASTReader::FinishDeserializing()

We probably need to figure out an API that does not require handling situations like this.
E.g. right now it means (I believe) that this condition will never be true: the recursive serialization will just remember all other decls and print their names and clear the PendingDecls vector.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our theory is that printQualifiedName can start deserializing more.

I tested it with the crash case, and printQualifiedName does not cause further deserialization with this change, I'm not certain now.

I think a broader question is: once deserialization is finished, can we safely assume that using a loaded declaration will never trigger additional deserialization?

Currently, the contract for OurListener::FinishedDeserializing states that it is called only after deserialization is fully completed. If the implementation of OurListener::FinishedDeserializing itself triggers further deserialization, that seems to break this contract. One possible solution is to disallow this behavior, e.g we could add an assertion in ASTReader.cpp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it with the crash case, and printQualifiedName does not cause further deserialization with this change, I'm not certain now.

I think a broader question is: once deserialization is finished, can we safely assume that using a loaded declaration will never trigger additional deserialization?

No, we cannot assume that. E.g. we can load a function without a body and requesting a body at any other point in code may cause deserialization of the body itself, all declarations it references and so on.

after deserialization is fully completed.

Deserializations get started and completed throughout the program many times and it's generally fine.

One possible solution is to disallow this behavior, e.g we could add an assertion in ASTReader.cpp.

I don't think this works, actually. It's very hard to write code that does not deserialize. And it's probably not necessary to actually have that level of scrutiny. Deserializing from inside the callbacks in the deserialization itself is cheesy, but deserializing more outside of the deserialization is a perfectly valid use-case.

I would recommend a different approach and instead putting it on the author of the interface to figure out when they want to process their results.
E.g. HandleTranslationUnit or HandleTopLevelDecl from ASTConsumer are safe places. One happens once per invocation, another one happens more often (if you need more gradual results). Some users might prefer EndSourceFile.

We would require wiring up other callbacks (ASTConsumer, the whole FrontendAction, etc) in addition to deserialization listener. But that's already the case, e.g.PPCallbacks rarely live outside something else.
So maybe just remove this callback and rely on other ways to output the buffered declarations? How does that sound? And for the problem at hand...

I tested it with the crash case, and printQualifiedName does not cause further deserialization with this change, I'm not certain now.

So maybe it crashed simply because we did not propagate the other callbacks? If that's the case, a more narrow change that does not add more methods to the interface would be enough. Or is that not the case?

@@ -316,6 +316,7 @@ Bug Fixes in This Version
- Fixed a modules crash where exception specifications were not propagated properly (#GH121245, relanded in #GH129982)
- Fixed a problematic case with recursive deserialization within ``FinishedDeserializing()`` where
``PassInterestingDeclsToConsumer()`` was called before the declarations were safe to be passed. (#GH129982)
- Fixed a module crash with `-dump-deserialized-decls`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: is it -Xclang=-dump-deserialized-decls? Users might be confused if they decide to try it out simply because it's mentioned in the release notes.

@hokein
Copy link
Collaborator Author

hokein commented Mar 28, 2025

There are two parts to this patch:

  1. forwarding more methods properly
  2. updating the interface, adding new callbacks and changing the behavior of `-dump-deserialized-decls``

I think (1) is a no-brainer. I would be very eager to LGTM it right away, but could you send it as a separate PR? (Especially if it fixes the crash)

The part (2) is a bit more complicated and it would be great to explore the solution space a little more. I've left a comment about reentrancy into the callback that worries me, please take a look.

Another point that we want to be explicit about is that this would change the buffering of -dump-deserialized-decls. It's definitely better than crashing, but we should at least mention in the patch description that the we are now doing extra buffering and certain declarations might be printed later than before.

Applying only (1) will not fix the crash, as the issue with calling printQualifiedName in the middle of deserialization still remains.

I'm happy to split the patch. I agree that (2) requires more thought and careful consideration. I'm not sure I'll have time to finish it before my leave, so I might have to leave it to you. But I can take care of (1).

@ilya-biryukov
Copy link
Contributor

I'm happy to split the patch. I agree that (2) requires more thought and careful consideration. I'm not sure I'll have time to finish it before my leave, so I might have to leave it to you. But I can take care of (1).

Yeah, I feel it makes sense to abandon this until you are back or we really need it. Forwarding the callbacks is a clear improvement anyway.

hokein added a commit that referenced this pull request Mar 28, 2025
…onListener (#133424)

Split from the #133395 per the
review comment.

This patch also moves the `DelegatingDeserializationListener` close to
`ASTDeserializationListener`.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 28, 2025
…serializationListener (#133424)

Split from the llvm/llvm-project#133395 per the
review comment.

This patch also moves the `DelegatingDeserializationListener` close to
`ASTDeserializationListener`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants