Skip to content

[clang] Fix the crash when dumping deserialized decls #133395

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions clang/docs/ReleaseNotes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -316,6 +316,7 @@ Bug Fixes in This Version
- Fixed a modules crash where exception specifications were not propagated properly (#GH121245, relanded in #GH129982)
- Fixed a problematic case with recursive deserialization within ``FinishedDeserializing()`` where
``PassInterestingDeclsToConsumer()`` was called before the declarations were safe to be passed. (#GH129982)
- Fixed a module crash with `-dump-deserialized-decls`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: is it -Xclang=-dump-deserialized-decls? Users might be confused if they decide to try it out simply because it's mentioned in the release notes.


Bug Fixes to Compiler Builtins
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@ class MacroInfo;
class Module;
class SourceLocation;

// IMPORTANT: when you add a new interface to this class, please update the
// DelegatingDeserializationListener in FrontendAction.cpp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future idea: maybe we could move the DelegatingDeserializationListener here or into the Serialization folder?
It is generic and not specific to FrontendAction, so it probably makes more sense to have it here, even if it's only used in FrontendAction. That makes changes like this more localized.

class ASTDeserializationListener {
public:
virtual ~ASTDeserializationListener();
Expand Down Expand Up @@ -57,6 +59,8 @@ class ASTDeserializationListener {
/// A module import was read from the AST file.
virtual void ModuleImportRead(serialization::SubmoduleID ID,
SourceLocation ImportLoc) {}
/// The deserialization of the AST file was finished.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we document that while causing more serialization in the callbacks may have complicated side-effects and the implementors should be careful?

I believe it's a very non-trivial details that folks reading the code should be warned about.

virtual void FinishedDeserializing() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also have StartDeserializing in the interface or none of the two.
This should help with the reentrancy question for the implementation that I have below too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I guess the interface name might be misleading -- it doesn't fully align with ASTReader::FinishedDeserializing(). The listener is only called once when deserialization is completely finished (i.e., when NumCurrentElementsDeserializing reaches 0 and all pending actions have been processed).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I saw that. But in any case you probably want the other callback too.
Having both callbacks would allow to add assertions about the invariants implementation expect. E.g. my concerns about reentrancy could be expressed as bool IsDeserializing field and a few assertions.

};
}

Expand Down
44 changes: 38 additions & 6 deletions clang/lib/Frontend/FrontendAction.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,10 @@ class DelegatingDeserializationListener : public ASTDeserializationListener {
if (Previous)
Previous->IdentifierRead(ID, II);
}
void MacroRead(serialization::MacroID ID, MacroInfo *MI) override {
if (Previous)
Previous->MacroRead(ID, MI);
}
void TypeRead(serialization::TypeIdx Idx, QualType T) override {
if (Previous)
Previous->TypeRead(Idx, T);
Expand All @@ -93,6 +97,19 @@ class DelegatingDeserializationListener : public ASTDeserializationListener {
if (Previous)
Previous->MacroDefinitionRead(PPID, MD);
}
void ModuleRead(serialization::SubmoduleID ID, Module *Mod) override {
if (Previous)
Previous->ModuleRead(ID, Mod);
}
void ModuleImportRead(serialization::SubmoduleID ID,
SourceLocation ImportLoc) override {
if (Previous)
Previous->ModuleImportRead(ID, ImportLoc);
}
void FinishedDeserializing() override {
if (Previous)
Previous->FinishedDeserializing();
}
};

/// Dumps deserialized declarations.
Expand All @@ -103,15 +120,30 @@ class DeserializedDeclsDumper : public DelegatingDeserializationListener {
: DelegatingDeserializationListener(Previous, DeletePrevious) {}

void DeclRead(GlobalDeclID ID, const Decl *D) override {
llvm::outs() << "PCH DECL: " << D->getDeclKindName();
if (const NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
llvm::outs() << " - ";
ND->printQualifiedName(llvm::outs());
PendingDecls.push_back(D);
DelegatingDeserializationListener::DeclRead(ID, D);
}
void FinishedDeserializing() override {
auto Decls = std::move(PendingDecls);
for (const auto *D : Decls) {
llvm::outs() << "PCH DECL: " << D->getDeclKindName();
if (const NamedDecl *ND = dyn_cast<NamedDecl>(D)) {
llvm::outs() << " - ";
ND->printQualifiedName(llvm::outs());
}
llvm::outs() << "\n";
}
llvm::outs() << "\n";

DelegatingDeserializationListener::DeclRead(ID, D);
if (!PendingDecls.empty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our theory is that printQualifiedName can start deserializing more.

At the point where the callback is called now, we have already updated the state of ASTReader such that more deserialization will cause this callback to be called again. I.e. this should be a possible stack trace:

OurListener::FinishDeserializing()
ASTReader::FinishDeserializing()
ASTReader::doSomeDeserialization()
...
Decl::printQualifiedName()
OurListener::FinishDeserializing()
ASTReader::FinishDeserializing()

We probably need to figure out an API that does not require handling situations like this.
E.g. right now it means (I believe) that this condition will never be true: the recursive serialization will just remember all other decls and print their names and clear the PendingDecls vector.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our theory is that printQualifiedName can start deserializing more.

I tested it with the crash case, and printQualifiedName does not cause further deserialization with this change, I'm not certain now.

I think a broader question is: once deserialization is finished, can we safely assume that using a loaded declaration will never trigger additional deserialization?

Currently, the contract for OurListener::FinishedDeserializing states that it is called only after deserialization is fully completed. If the implementation of OurListener::FinishedDeserializing itself triggers further deserialization, that seems to break this contract. One possible solution is to disallow this behavior, e.g we could add an assertion in ASTReader.cpp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it with the crash case, and printQualifiedName does not cause further deserialization with this change, I'm not certain now.

I think a broader question is: once deserialization is finished, can we safely assume that using a loaded declaration will never trigger additional deserialization?

No, we cannot assume that. E.g. we can load a function without a body and requesting a body at any other point in code may cause deserialization of the body itself, all declarations it references and so on.

after deserialization is fully completed.

Deserializations get started and completed throughout the program many times and it's generally fine.

One possible solution is to disallow this behavior, e.g we could add an assertion in ASTReader.cpp.

I don't think this works, actually. It's very hard to write code that does not deserialize. And it's probably not necessary to actually have that level of scrutiny. Deserializing from inside the callbacks in the deserialization itself is cheesy, but deserializing more outside of the deserialization is a perfectly valid use-case.

I would recommend a different approach and instead putting it on the author of the interface to figure out when they want to process their results.
E.g. HandleTranslationUnit or HandleTopLevelDecl from ASTConsumer are safe places. One happens once per invocation, another one happens more often (if you need more gradual results). Some users might prefer EndSourceFile.

We would require wiring up other callbacks (ASTConsumer, the whole FrontendAction, etc) in addition to deserialization listener. But that's already the case, e.g.PPCallbacks rarely live outside something else.
So maybe just remove this callback and rely on other ways to output the buffered declarations? How does that sound? And for the problem at hand...

I tested it with the crash case, and printQualifiedName does not cause further deserialization with this change, I'm not certain now.

So maybe it crashed simply because we did not propagate the other callbacks? If that's the case, a more narrow change that does not add more methods to the interface would be enough. Or is that not the case?

llvm::errs() << "Deserialized more decls while printing, total of "
<< PendingDecls.size() << "\n";
PendingDecls.clear();
}
DelegatingDeserializationListener::FinishedDeserializing();
}

private:
std::vector<const Decl *> PendingDecls;
};

/// Checks deserialized declarations and emits error if a name
Expand Down
2 changes: 2 additions & 0 deletions clang/lib/Serialization/ASTReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -10877,6 +10877,8 @@ void ASTReader::FinishedDeserializing() {
// decls to the consumer.
if (Consumer)
PassInterestingDeclsToConsumer();
if (DeserializationListener)
DeserializationListener->FinishedDeserializing();
}
}

Expand Down