-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[clang] Fix the crash when dumping deserialized decls #133395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
@llvm/pr-subscribers-clang Author: Haojian Wu (hokein) ChangesThis fixes crashes with The
No unittest is included -- crash was discovered by @VitaNuo in an internal isolated test which is still large, depending on the stl/absl modules. The issue no longer occurs with this change. Full diff: https://github.com/llvm/llvm-project/pull/133395.diff 4 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 04ec2cfef679c..9ae29d7e58626 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -316,6 +316,7 @@ Bug Fixes in This Version
- Fixed a modules crash where exception specifications were not propagated properly (#GH121245, relanded in #GH129982)
- Fixed a problematic case with recursive deserialization within ``FinishedDeserializing()`` where
``PassInterestingDeclsToConsumer()`` was called before the declarations were safe to be passed. (#GH129982)
+- Fixed a module crash with `-dump-deserialized-decls`
Bug Fixes to Compiler Builtins
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/include/clang/Serialization/ASTDeserializationListener.h b/clang/include/clang/Serialization/ASTDeserializationListener.h
index ea96faa07c191..ccbaae93f9048 100644
--- a/clang/include/clang/Serialization/ASTDeserializationListener.h
+++ b/clang/include/clang/Serialization/ASTDeserializationListener.h
@@ -27,6 +27,8 @@ class MacroInfo;
class Module;
class SourceLocation;
+// IMPORTANT: when you add a new interface to this class, please update the
+// DelegatingDeserializationListener in FrontendAction.cpp
class ASTDeserializationListener {
public:
virtual ~ASTDeserializationListener();
@@ -57,6 +59,8 @@ class ASTDeserializationListener {
/// A module import was read from the AST file.
virtual void ModuleImportRead(serialization::SubmoduleID ID,
SourceLocation ImportLoc) {}
+ /// The deserialization of the AST file was finished.
+ virtual void FinishedDeserializing() {}
};
}
diff --git a/clang/lib/Frontend/FrontendAction.cpp b/clang/lib/Frontend/FrontendAction.cpp
index 9f789f093f55d..416ee50593e74 100644
--- a/clang/lib/Frontend/FrontendAction.cpp
+++ b/clang/lib/Frontend/FrontendAction.cpp
@@ -76,6 +76,10 @@ class DelegatingDeserializationListener : public ASTDeserializationListener {
if (Previous)
Previous->IdentifierRead(ID, II);
}
+ void MacroRead(serialization::MacroID ID, MacroInfo *MI) override {
+ if (Previous)
+ Previous->MacroRead(ID, MI);
+ }
void TypeRead(serialization::TypeIdx Idx, QualType T) override {
if (Previous)
Previous->TypeRead(Idx, T);
@@ -93,6 +97,19 @@ class DelegatingDeserializationListener : public ASTDeserializationListener {
if (Previous)
Previous->MacroDefinitionRead(PPID, MD);
}
+ void ModuleRead(serialization::SubmoduleID ID, Module *Mod) override {
+ if (Previous)
+ Previous->ModuleRead(ID, Mod);
+ }
+ void ModuleImportRead(serialization::SubmoduleID ID,
+ SourceLocation ImportLoc) override {
+ if (Previous)
+ Previous->ModuleImportRead(ID, ImportLoc);
+ }
+ void FinishedDeserializing() override {
+ if (Previous)
+ Previous->FinishedDeserializing();
+ }
};
/// Dumps deserialized declarations.
@@ -103,15 +120,30 @@ class DeserializedDeclsDumper : public DelegatingDeserializationListener {
: DelegatingDeserializationListener(Previous, DeletePrevious) {}
void DeclRead(GlobalDeclID ID, const Decl *D) override {
- llvm::outs() << "PCH DECL: " << D->getDeclKindName();
- if (const NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
- llvm::outs() << " - ";
- ND->printQualifiedName(llvm::outs());
+ PendingDecls.push_back(D);
+ DelegatingDeserializationListener::DeclRead(ID, D);
+ }
+ void FinishedDeserializing() override {
+ auto Decls = std::move(PendingDecls);
+ for (const auto *D : Decls) {
+ llvm::outs() << "PCH DECL: " << D->getDeclKindName();
+ if (const NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
+ llvm::outs() << " - ";
+ ND->printQualifiedName(llvm::outs());
+ }
+ llvm::outs() << "\n";
}
- llvm::outs() << "\n";
- DelegatingDeserializationListener::DeclRead(ID, D);
+ if (!PendingDecls.empty()) {
+ llvm::errs() << "Deserialized more decls while printing, total of "
+ << PendingDecls.size() << "\n";
+ PendingDecls.clear();
+ }
+ DelegatingDeserializationListener::FinishedDeserializing();
}
+
+private:
+ std::vector<const Decl *> PendingDecls;
};
/// Checks deserialized declarations and emits error if a name
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 0cd2cedb48dd9..9e3712226491a 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -10877,6 +10877,8 @@ void ASTReader::FinishedDeserializing() {
// decls to the consumer.
if (Consumer)
PassInterestingDeclsToConsumer();
+ if (DeserializationListener)
+ DeserializationListener->FinishedDeserializing();
}
}
|
@llvm/pr-subscribers-clang-modules Author: Haojian Wu (hokein) ChangesThis fixes crashes with The
No unittest is included -- crash was discovered by @VitaNuo in an internal isolated test which is still large, depending on the stl/absl modules. The issue no longer occurs with this change. Full diff: https://github.com/llvm/llvm-project/pull/133395.diff 4 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 04ec2cfef679c..9ae29d7e58626 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -316,6 +316,7 @@ Bug Fixes in This Version
- Fixed a modules crash where exception specifications were not propagated properly (#GH121245, relanded in #GH129982)
- Fixed a problematic case with recursive deserialization within ``FinishedDeserializing()`` where
``PassInterestingDeclsToConsumer()`` was called before the declarations were safe to be passed. (#GH129982)
+- Fixed a module crash with `-dump-deserialized-decls`
Bug Fixes to Compiler Builtins
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/include/clang/Serialization/ASTDeserializationListener.h b/clang/include/clang/Serialization/ASTDeserializationListener.h
index ea96faa07c191..ccbaae93f9048 100644
--- a/clang/include/clang/Serialization/ASTDeserializationListener.h
+++ b/clang/include/clang/Serialization/ASTDeserializationListener.h
@@ -27,6 +27,8 @@ class MacroInfo;
class Module;
class SourceLocation;
+// IMPORTANT: when you add a new interface to this class, please update the
+// DelegatingDeserializationListener in FrontendAction.cpp
class ASTDeserializationListener {
public:
virtual ~ASTDeserializationListener();
@@ -57,6 +59,8 @@ class ASTDeserializationListener {
/// A module import was read from the AST file.
virtual void ModuleImportRead(serialization::SubmoduleID ID,
SourceLocation ImportLoc) {}
+ /// The deserialization of the AST file was finished.
+ virtual void FinishedDeserializing() {}
};
}
diff --git a/clang/lib/Frontend/FrontendAction.cpp b/clang/lib/Frontend/FrontendAction.cpp
index 9f789f093f55d..416ee50593e74 100644
--- a/clang/lib/Frontend/FrontendAction.cpp
+++ b/clang/lib/Frontend/FrontendAction.cpp
@@ -76,6 +76,10 @@ class DelegatingDeserializationListener : public ASTDeserializationListener {
if (Previous)
Previous->IdentifierRead(ID, II);
}
+ void MacroRead(serialization::MacroID ID, MacroInfo *MI) override {
+ if (Previous)
+ Previous->MacroRead(ID, MI);
+ }
void TypeRead(serialization::TypeIdx Idx, QualType T) override {
if (Previous)
Previous->TypeRead(Idx, T);
@@ -93,6 +97,19 @@ class DelegatingDeserializationListener : public ASTDeserializationListener {
if (Previous)
Previous->MacroDefinitionRead(PPID, MD);
}
+ void ModuleRead(serialization::SubmoduleID ID, Module *Mod) override {
+ if (Previous)
+ Previous->ModuleRead(ID, Mod);
+ }
+ void ModuleImportRead(serialization::SubmoduleID ID,
+ SourceLocation ImportLoc) override {
+ if (Previous)
+ Previous->ModuleImportRead(ID, ImportLoc);
+ }
+ void FinishedDeserializing() override {
+ if (Previous)
+ Previous->FinishedDeserializing();
+ }
};
/// Dumps deserialized declarations.
@@ -103,15 +120,30 @@ class DeserializedDeclsDumper : public DelegatingDeserializationListener {
: DelegatingDeserializationListener(Previous, DeletePrevious) {}
void DeclRead(GlobalDeclID ID, const Decl *D) override {
- llvm::outs() << "PCH DECL: " << D->getDeclKindName();
- if (const NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
- llvm::outs() << " - ";
- ND->printQualifiedName(llvm::outs());
+ PendingDecls.push_back(D);
+ DelegatingDeserializationListener::DeclRead(ID, D);
+ }
+ void FinishedDeserializing() override {
+ auto Decls = std::move(PendingDecls);
+ for (const auto *D : Decls) {
+ llvm::outs() << "PCH DECL: " << D->getDeclKindName();
+ if (const NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
+ llvm::outs() << " - ";
+ ND->printQualifiedName(llvm::outs());
+ }
+ llvm::outs() << "\n";
}
- llvm::outs() << "\n";
- DelegatingDeserializationListener::DeclRead(ID, D);
+ if (!PendingDecls.empty()) {
+ llvm::errs() << "Deserialized more decls while printing, total of "
+ << PendingDecls.size() << "\n";
+ PendingDecls.clear();
+ }
+ DelegatingDeserializationListener::FinishedDeserializing();
}
+
+private:
+ std::vector<const Decl *> PendingDecls;
};
/// Checks deserialized declarations and emits error if a name
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 0cd2cedb48dd9..9e3712226491a 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -10877,6 +10877,8 @@ void ASTReader::FinishedDeserializing() {
// decls to the consumer.
if (Consumer)
PassInterestingDeclsToConsumer();
+ if (DeserializationListener)
+ DeserializationListener->FinishedDeserializing();
}
}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @hokein!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two parts to this patch:
- forwarding more methods properly
- updating the interface, adding new callbacks and changing the behavior of `-dump-deserialized-decls``
I think (1) is a no-brainer. I would be very eager to LGTM it right away, but could you send it as a separate PR? (Especially if it fixes the crash)
The part (2) is a bit more complicated and it would be great to explore the solution space a little more. I've left a comment about reentrancy into the callback that worries me, please take a look.
Another point that we want to be explicit about is that this would change the buffering of -dump-deserialized-decls
. It's definitely better than crashing, but we should at least mention in the patch description that the we are now doing extra buffering and certain declarations might be printed later than before.
@@ -27,6 +27,8 @@ class MacroInfo; | |||
class Module; | |||
class SourceLocation; | |||
|
|||
// IMPORTANT: when you add a new interface to this class, please update the | |||
// DelegatingDeserializationListener in FrontendAction.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future idea: maybe we could move the DelegatingDeserializationListener
here or into the Serialization
folder?
It is generic and not specific to FrontendAction
, so it probably makes more sense to have it here, even if it's only used in FrontendAction
. That makes changes like this more localized.
@@ -57,6 +59,8 @@ class ASTDeserializationListener { | |||
/// A module import was read from the AST file. | |||
virtual void ModuleImportRead(serialization::SubmoduleID ID, | |||
SourceLocation ImportLoc) {} | |||
/// The deserialization of the AST file was finished. | |||
virtual void FinishedDeserializing() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also have StartDeserializing
in the interface or none of the two.
This should help with the reentrancy question for the implementation that I have below too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I guess the interface name might be misleading -- it doesn't fully align with ASTReader::FinishedDeserializing()
. The listener is only called once when deserialization is completely finished (i.e., when NumCurrentElementsDeserializing
reaches 0 and all pending actions have been processed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I saw that. But in any case you probably want the other callback too.
Having both callbacks would allow to add assertions about the invariants implementation expect. E.g. my concerns about reentrancy could be expressed as bool IsDeserializing
field and a few assertions.
@@ -57,6 +59,8 @@ class ASTDeserializationListener { | |||
/// A module import was read from the AST file. | |||
virtual void ModuleImportRead(serialization::SubmoduleID ID, | |||
SourceLocation ImportLoc) {} | |||
/// The deserialization of the AST file was finished. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we document that while causing more serialization in the callbacks may have complicated side-effects and the implementors should be careful?
I believe it's a very non-trivial details that folks reading the code should be warned about.
|
||
DelegatingDeserializationListener::DeclRead(ID, D); | ||
if (!PendingDecls.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our theory is that printQualifiedName
can start deserializing more.
At the point where the callback is called now, we have already updated the state of ASTReader
such that more deserialization will cause this callback to be called again. I.e. this should be a possible stack trace:
OurListener::FinishDeserializing()
ASTReader::FinishDeserializing()
ASTReader::doSomeDeserialization()
...
Decl::printQualifiedName()
OurListener::FinishDeserializing()
ASTReader::FinishDeserializing()
We probably need to figure out an API that does not require handling situations like this.
E.g. right now it means (I believe) that this condition will never be true: the recursive serialization will just remember all other decls and print their names and clear the PendingDecls
vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our theory is that printQualifiedName can start deserializing more.
I tested it with the crash case, and printQualifiedName
does not cause further deserialization with this change, I'm not certain now.
I think a broader question is: once deserialization is finished, can we safely assume that using a loaded declaration will never trigger additional deserialization?
Currently, the contract for OurListener::FinishedDeserializing
states that it is called only after deserialization is fully completed. If the implementation of OurListener::FinishedDeserializing
itself triggers further deserialization, that seems to break this contract. One possible solution is to disallow this behavior, e.g we could add an assertion in ASTReader.cpp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it with the crash case, and printQualifiedName does not cause further deserialization with this change, I'm not certain now.
I think a broader question is: once deserialization is finished, can we safely assume that using a loaded declaration will never trigger additional deserialization?
No, we cannot assume that. E.g. we can load a function without a body and requesting a body at any other point in code may cause deserialization of the body itself, all declarations it references and so on.
after deserialization is fully completed.
Deserializations get started and completed throughout the program many times and it's generally fine.
One possible solution is to disallow this behavior, e.g we could add an assertion in ASTReader.cpp.
I don't think this works, actually. It's very hard to write code that does not deserialize. And it's probably not necessary to actually have that level of scrutiny. Deserializing from inside the callbacks in the deserialization itself is cheesy, but deserializing more outside of the deserialization is a perfectly valid use-case.
I would recommend a different approach and instead putting it on the author of the interface to figure out when they want to process their results.
E.g. HandleTranslationUnit
or HandleTopLevelDecl
from ASTConsumer
are safe places. One happens once per invocation, another one happens more often (if you need more gradual results). Some users might prefer EndSourceFile
.
We would require wiring up other callbacks (ASTConsumer
, the whole FrontendAction
, etc) in addition to deserialization listener. But that's already the case, e.g.PPCallbacks
rarely live outside something else.
So maybe just remove this callback and rely on other ways to output the buffered declarations? How does that sound? And for the problem at hand...
I tested it with the crash case, and printQualifiedName does not cause further deserialization with this change, I'm not certain now.
So maybe it crashed simply because we did not propagate the other callbacks? If that's the case, a more narrow change that does not add more methods to the interface would be enough. Or is that not the case?
@@ -316,6 +316,7 @@ Bug Fixes in This Version | |||
- Fixed a modules crash where exception specifications were not propagated properly (#GH121245, relanded in #GH129982) | |||
- Fixed a problematic case with recursive deserialization within ``FinishedDeserializing()`` where | |||
``PassInterestingDeclsToConsumer()`` was called before the declarations were safe to be passed. (#GH129982) | |||
- Fixed a module crash with `-dump-deserialized-decls` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: is it -Xclang=-dump-deserialized-decls
? Users might be confused if they decide to try it out simply because it's mentioned in the release notes.
Applying only (1) will not fix the crash, as the issue with calling printQualifiedName in the middle of deserialization still remains. I'm happy to split the patch. I agree that (2) requires more thought and careful consideration. I'm not sure I'll have time to finish it before my leave, so I might have to leave it to you. But I can take care of (1). |
Yeah, I feel it makes sense to abandon this until you are back or we really need it. Forwarding the callbacks is a clear improvement anyway. |
…serializationListener (#133424) Split from the llvm/llvm-project#133395 per the review comment. This patch also moves the `DelegatingDeserializationListener` close to `ASTDeserializationListener`.
This fixes crashes with
-dump-deserialized-decls
.The
DeserializedDeclsDumper
implementation has two issues:DelegatingDeserializationListener
does not fully implement all interfaces. As a result, thePrevious
listener object misses some critical state updates, leading to inconsistencies and eventual crashes;No unittest is included -- crash was discovered by @VitaNuo in an internal isolated test which is still large, depending on the stl/absl modules. The issue no longer occurs with this change.