Skip to content

[clang][modules] Lazily load by name lookups in module maps #132853

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Bigcheese
Copy link
Contributor

Instead of eagerly populating the clang::ModuleMap when looking up a module by name, this patch changes HeaderSearch to only load the modules that are actually used.

This introduces ModuleMap::findOrLoadModule which will load modules from parsed but not loaded module maps. This cannot be used anywhere that the module loading code calls into as it can create infinite recursion.

This currently just reparses module maps when looking up a module by header. This is fine as redeclarations are allowed from the same file, but future patches will also make looking up a module by header lazy.

This patch changes the shadow.m test to use explicitly built modules and #import. This test and the shadow feature are very brittle and do not work in general. The test relied on pcm files being left behind by prior failing clang invocations that were then reused by the last invocation. If you clean the cache then the last invocation will always fail. This is because the input module map and the -fmodule-map-file= module map are parsed in the same module scope, and -fmodule-map-file= is forwarded to implicit module builds. That means you are guaranteed to hit a module redeclaration error if the TU actually imports the module it is trying to shadow.

This patch changes when we load A2's module map to after the A module has been loaded, which sets the IsFromModuleFile bit on A. This means that A2's A is skipped entirely instead of creating a shadow module, and we get textual inclusion. It is possible to construct a case where this would happen before this patch too.

An upcoming patch in this series will rework shadowing to work in the general case, but that's only possible once header -> module lookup is lazy too.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules labels Mar 25, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 25, 2025

@llvm/pr-subscribers-lldb
@llvm/pr-subscribers-clang-tools-extra
@llvm/pr-subscribers-llvm-support
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Michael Spencer (Bigcheese)

Changes

Instead of eagerly populating the clang::ModuleMap when looking up a module by name, this patch changes HeaderSearch to only load the modules that are actually used.

This introduces ModuleMap::findOrLoadModule which will load modules from parsed but not loaded module maps. This cannot be used anywhere that the module loading code calls into as it can create infinite recursion.

This currently just reparses module maps when looking up a module by header. This is fine as redeclarations are allowed from the same file, but future patches will also make looking up a module by header lazy.

This patch changes the shadow.m test to use explicitly built modules and #import. This test and the shadow feature are very brittle and do not work in general. The test relied on pcm files being left behind by prior failing clang invocations that were then reused by the last invocation. If you clean the cache then the last invocation will always fail. This is because the input module map and the -fmodule-map-file= module map are parsed in the same module scope, and -fmodule-map-file= is forwarded to implicit module builds. That means you are guaranteed to hit a module redeclaration error if the TU actually imports the module it is trying to shadow.

This patch changes when we load A2's module map to after the A module has been loaded, which sets the IsFromModuleFile bit on A. This means that A2's A is skipped entirely instead of creating a shadow module, and we get textual inclusion. It is possible to construct a case where this would happen before this patch too.

An upcoming patch in this series will rework shadowing to work in the general case, but that's only possible once header -> module lookup is lazy too.


Patch is 31.01 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132853.diff

17 Files Affected:

  • (modified) clang/include/clang/Basic/DiagnosticGroups.td (+1)
  • (modified) clang/include/clang/Basic/DiagnosticLexKinds.td (+6)
  • (modified) clang/include/clang/Lex/HeaderSearch.h (+39-2)
  • (modified) clang/include/clang/Lex/ModuleMap.h (+20)
  • (modified) clang/include/clang/Lex/ModuleMapFile.h (+9)
  • (modified) clang/lib/Frontend/CompilerInstance.cpp (+2-2)
  • (modified) clang/lib/Lex/HeaderSearch.cpp (+89-14)
  • (modified) clang/lib/Lex/ModuleMap.cpp (+139-12)
  • (modified) clang/lib/Lex/ModuleMapFile.cpp (+3)
  • (modified) clang/lib/Sema/SemaModule.cpp (+1-1)
  • (added) clang/test/Modules/Inputs/shadow/A1/A1.h ()
  • (modified) clang/test/Modules/Inputs/shadow/A1/module.modulemap (+3-1)
  • (added) clang/test/Modules/Inputs/shadow/A2/A2.h ()
  • (modified) clang/test/Modules/Inputs/shadow/A2/module.modulemap (+3-1)
  • (added) clang/test/Modules/lazy-by-name-lookup.c (+31)
  • (modified) clang/test/Modules/modulemap-locations.m (+2-1)
  • (modified) clang/test/Modules/shadow.m (+6-5)
diff --git a/clang/include/clang/Basic/DiagnosticGroups.td b/clang/include/clang/Basic/DiagnosticGroups.td
index b9f08d96151c9..1abb63ba3aea6 100644
--- a/clang/include/clang/Basic/DiagnosticGroups.td
+++ b/clang/include/clang/Basic/DiagnosticGroups.td
@@ -576,6 +576,7 @@ def ModuleImport : DiagGroup<"module-import">;
 def ModuleConflict : DiagGroup<"module-conflict">;
 def ModuleFileExtension : DiagGroup<"module-file-extension">;
 def ModuleIncludeDirectiveTranslation : DiagGroup<"module-include-translation">;
+def ModuleMap : DiagGroup<"module-map">;
 def RoundTripCC1Args : DiagGroup<"round-trip-cc1-args">;
 def NewlineEOF : DiagGroup<"newline-eof">;
 def Nullability : DiagGroup<"nullability">;
diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td
index 912b8bd46e194..a6866ef868dcd 100644
--- a/clang/include/clang/Basic/DiagnosticLexKinds.td
+++ b/clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -836,6 +836,12 @@ def warn_pp_date_time : Warning<
   ShowInSystemHeader, DefaultIgnore, InGroup<DiagGroup<"date-time">>;
 
 // Module map parsing
+def remark_mmap_parse : Remark<
+  "parsing modulemap '%0'">, ShowInSystemHeader, InGroup<ModuleMap>;
+def remark_mmap_load : Remark<
+  "loading modulemap '%0'">, ShowInSystemHeader, InGroup<ModuleMap>;
+def remark_mmap_load_module : Remark<
+  "loading parsed module '%0'">, ShowInSystemHeader, InGroup<ModuleMap>;
 def err_mmap_unknown_token : Error<"skipping stray token">;
 def err_mmap_expected_module : Error<"expected module declaration">;
 def err_mmap_expected_module_name : Error<"expected module name">;
diff --git a/clang/include/clang/Lex/HeaderSearch.h b/clang/include/clang/Lex/HeaderSearch.h
index f3dac905318c6..2c1e245fbfd37 100644
--- a/clang/include/clang/Lex/HeaderSearch.h
+++ b/clang/include/clang/Lex/HeaderSearch.h
@@ -332,13 +332,24 @@ class HeaderSearch {
   /// The mapping between modules and headers.
   mutable ModuleMap ModMap;
 
+  enum ModuleMapDirectoryState {
+    MMDS_Parsed,
+    MMDS_Loaded,
+    MMDS_Invalid,
+  };
+
   /// Describes whether a given directory has a module map in it.
-  llvm::DenseMap<const DirectoryEntry *, bool> DirectoryHasModuleMap;
+  llvm::DenseMap<const DirectoryEntry *, ModuleMapDirectoryState>
+      DirectoryHasModuleMap;
 
   /// Set of module map files we've already loaded, and a flag indicating
   /// whether they were valid or not.
   llvm::DenseMap<const FileEntry *, bool> LoadedModuleMaps;
 
+  /// Set of module map files we've already pre-parsed, and a flag indicating
+  /// whether they were valid or not.
+  llvm::DenseMap<const FileEntry *, bool> PreParsedModuleMaps;
+
   // A map of discovered headers with their associated include file name.
   llvm::DenseMap<const FileEntry *, llvm::SmallString<64>> IncludeNames;
 
@@ -435,7 +446,7 @@ class HeaderSearch {
 
   /// Consider modules when including files from this directory.
   void setDirectoryHasModuleMap(const DirectoryEntry* Dir) {
-    DirectoryHasModuleMap[Dir] = true;
+    DirectoryHasModuleMap[Dir] = MMDS_Loaded;
   }
 
   /// Forget everything we know about headers so far.
@@ -717,6 +728,23 @@ class HeaderSearch {
                          unsigned *Offset = nullptr,
                          StringRef OriginalModuleMapFile = StringRef());
 
+  /// Read the contents of the given module map file.
+  ///
+  /// \param File The module map file.
+  /// \param IsSystem Whether this file is in a system header directory.
+  /// \param ID If the module map file is already mapped (perhaps as part of
+  ///        processing a preprocessed module), the ID of the file.
+  /// \param Offset [inout] An offset within ID to start parsing. On exit,
+  ///        filled by the end of the parsed contents (either EOF or the
+  ///        location of an end-of-module-map pragma).
+  /// \param OriginalModuleMapFile The original path to the module map file,
+  ///        used to resolve paths within the module (this is required when
+  ///        building the module from preprocessed source).
+  /// \returns true if an error occurred, false otherwise.
+  bool preLoadModuleMapFile(FileEntryRef File, bool IsSystem,
+                            FileID ID = FileID(),
+                            StringRef OriginalModuleMapFile = StringRef());
+
   /// Collect the set of all known, top-level modules.
   ///
   /// \param Modules Will be filled with the set of known, top-level modules.
@@ -936,6 +964,10 @@ class HeaderSearch {
                                             FileID ID = FileID(),
                                             unsigned *Offset = nullptr);
 
+  LoadModuleMapResult parseModuleMapFileImpl(FileEntryRef File, bool IsSystem,
+                                             DirectoryEntryRef Dir,
+                                             FileID ID = FileID());
+
   /// Try to load the module map file in the given directory.
   ///
   /// \param DirName The name of the directory where we will look for a module
@@ -958,6 +990,11 @@ class HeaderSearch {
   /// named directory.
   LoadModuleMapResult loadModuleMapFile(DirectoryEntryRef Dir, bool IsSystem,
                                         bool IsFramework);
+
+  LoadModuleMapResult parseModuleMapFile(StringRef DirName, bool IsSystem,
+                                         bool IsFramework);
+  LoadModuleMapResult parseModuleMapFile(DirectoryEntryRef Dir, bool IsSystem,
+                                         bool IsFramework);
 };
 
 /// Apply the header search options to get given HeaderSearch object.
diff --git a/clang/include/clang/Lex/ModuleMap.h b/clang/include/clang/Lex/ModuleMap.h
index 9de1b3b546c11..8c460778891ba 100644
--- a/clang/include/clang/Lex/ModuleMap.h
+++ b/clang/include/clang/Lex/ModuleMap.h
@@ -18,6 +18,7 @@
 #include "clang/Basic/LangOptions.h"
 #include "clang/Basic/Module.h"
 #include "clang/Basic/SourceLocation.h"
+#include "clang/Lex/ModuleMapFile.h"
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/DenseSet.h"
@@ -267,6 +268,18 @@ class ModuleMap {
   /// Describes whether we haved parsed a particular file as a module
   /// map.
   llvm::DenseMap<const FileEntry *, bool> ParsedModuleMap;
+  llvm::DenseMap<const FileEntry *, const modulemap::ModuleMapFile *>
+      PreParsedModuleMap;
+
+  std::vector<std::unique_ptr<modulemap::ModuleMapFile>> PreParsedModuleMaps;
+
+  /// Map from top level module name to a list of ModuleDecls in the order they
+  /// were discovered. This allows handling shadowing correctly and diagnosing
+  /// redefinitions.
+  llvm::StringMap<SmallVector<std::pair<const modulemap::ModuleMapFile *,
+                                        const modulemap::ModuleDecl *>,
+                              1>>
+      PreParsedModules;
 
   /// Resolve the given export declaration into an actual export
   /// declaration.
@@ -483,6 +496,8 @@ class ModuleMap {
   /// \returns The named module, if known; otherwise, returns null.
   Module *findModule(StringRef Name) const;
 
+  Module *findOrLoadModule(StringRef Name);
+
   Module *findOrInferSubmodule(Module *Parent, StringRef Name);
 
   /// Retrieve a module with the given name using lexical name lookup,
@@ -698,6 +713,11 @@ class ModuleMap {
   void addHeader(Module *Mod, Module::Header Header,
                  ModuleHeaderRole Role, bool Imported = false);
 
+  /// Parse a module map without creating `clang::Module` instances.
+  bool preParseModuleMapFile(FileEntryRef File, bool IsSystem,
+                             DirectoryEntryRef Dir, FileID ID = FileID(),
+                             SourceLocation ExternModuleLoc = SourceLocation());
+
   /// Parse the given module map file, and record any modules we
   /// encounter.
   ///
diff --git a/clang/include/clang/Lex/ModuleMapFile.h b/clang/include/clang/Lex/ModuleMapFile.h
index 1219cc2b50753..7d0e36e9ab86c 100644
--- a/clang/include/clang/Lex/ModuleMapFile.h
+++ b/clang/include/clang/Lex/ModuleMapFile.h
@@ -133,8 +133,17 @@ using TopLevelDecl = std::variant<ModuleDecl, ExternModuleDecl>;
 /// This holds many reference types (StringRef, SourceLocation, etc.) whose
 /// lifetimes are bound by the SourceManager and FileManager used.
 struct ModuleMapFile {
+  /// The FileID used to parse this module map. This is always a local ID.
+  FileID ID;
+
+  /// The directory in which the module map was discovered. Declarations in
+  /// the module map are relative to this directory.
+  OptionalDirectoryEntryRef Dir;
+
   /// Beginning of the file, used for moduleMapFileRead callback.
   SourceLocation Start;
+
+  bool IsSystem;
   std::vector<TopLevelDecl> Decls;
 
   void dump(llvm::raw_ostream &out) const;
diff --git a/clang/lib/Frontend/CompilerInstance.cpp b/clang/lib/Frontend/CompilerInstance.cpp
index bff5326e89973..162961bd84065 100644
--- a/clang/lib/Frontend/CompilerInstance.cpp
+++ b/clang/lib/Frontend/CompilerInstance.cpp
@@ -579,13 +579,13 @@ struct ReadModuleNames : ASTReaderListener {
     ModuleMap &MM = PP.getHeaderSearchInfo().getModuleMap();
     for (const std::string &LoadedModule : LoadedModules)
       MM.cacheModuleLoad(*PP.getIdentifierInfo(LoadedModule),
-                         MM.findModule(LoadedModule));
+                         MM.findOrLoadModule(LoadedModule));
     LoadedModules.clear();
   }
 
   void markAllUnavailable() {
     for (const std::string &LoadedModule : LoadedModules) {
-      if (Module *M = PP.getHeaderSearchInfo().getModuleMap().findModule(
+      if (Module *M = PP.getHeaderSearchInfo().getModuleMap().findOrLoadModule(
               LoadedModule)) {
         M->HasIncompatibleModuleFile = true;
 
diff --git a/clang/lib/Lex/HeaderSearch.cpp b/clang/lib/Lex/HeaderSearch.cpp
index ad9263f2994f2..90ab460e98b39 100644
--- a/clang/lib/Lex/HeaderSearch.cpp
+++ b/clang/lib/Lex/HeaderSearch.cpp
@@ -300,7 +300,7 @@ Module *HeaderSearch::lookupModule(StringRef ModuleName,
                                    SourceLocation ImportLoc, bool AllowSearch,
                                    bool AllowExtraModuleMapSearch) {
   // Look in the module map to determine if there is a module by this name.
-  Module *Module = ModMap.findModule(ModuleName);
+  Module *Module = ModMap.findOrLoadModule(ModuleName);
   if (Module || !AllowSearch || !HSOpts->ImplicitModuleMaps)
     return Module;
 
@@ -360,11 +360,11 @@ Module *HeaderSearch::lookupModule(StringRef ModuleName, StringRef SearchName,
     // checked
     DirectoryEntryRef NormalDir = *Dir.getDirRef();
     // Search for a module map file in this directory.
-    if (loadModuleMapFile(NormalDir, IsSystem,
-                          /*IsFramework*/false) == LMM_NewlyLoaded) {
+    if (parseModuleMapFile(NormalDir, IsSystem,
+                           /*IsFramework*/ false) == LMM_NewlyLoaded) {
       // We just loaded a module map file; check whether the module is
       // available now.
-      Module = ModMap.findModule(ModuleName);
+      Module = ModMap.findOrLoadModule(ModuleName);
       if (Module)
         break;
     }
@@ -374,10 +374,10 @@ Module *HeaderSearch::lookupModule(StringRef ModuleName, StringRef SearchName,
     SmallString<128> NestedModuleMapDirName;
     NestedModuleMapDirName = Dir.getDirRef()->getName();
     llvm::sys::path::append(NestedModuleMapDirName, ModuleName);
-    if (loadModuleMapFile(NestedModuleMapDirName, IsSystem,
-                          /*IsFramework*/false) == LMM_NewlyLoaded){
+    if (parseModuleMapFile(NestedModuleMapDirName, IsSystem,
+                           /*IsFramework*/ false) == LMM_NewlyLoaded) {
       // If we just loaded a module map file, look for the module again.
-      Module = ModMap.findModule(ModuleName);
+      Module = ModMap.findOrLoadModule(ModuleName);
       if (Module)
         break;
     }
@@ -394,7 +394,7 @@ Module *HeaderSearch::lookupModule(StringRef ModuleName, StringRef SearchName,
         loadSubdirectoryModuleMaps(Dir);
 
       // Look again for the module.
-      Module = ModMap.findModule(ModuleName);
+      Module = ModMap.findOrLoadModule(ModuleName);
       if (Module)
         break;
     }
@@ -1583,7 +1583,7 @@ bool HeaderSearch::hasModuleMap(StringRef FileName,
       // Success. All of the directories we stepped through inherit this module
       // map file.
       for (unsigned I = 0, N = FixUpDirectories.size(); I != N; ++I)
-        DirectoryHasModuleMap[FixUpDirectories[I]] = true;
+        DirectoryHasModuleMap[FixUpDirectories[I]] = MMDS_Loaded;
       return true;
 
     case LMM_NoDirectory:
@@ -1801,6 +1801,33 @@ HeaderSearch::loadModuleMapFileImpl(FileEntryRef File, bool IsSystem,
   return LMM_NewlyLoaded;
 }
 
+HeaderSearch::LoadModuleMapResult
+HeaderSearch::parseModuleMapFileImpl(FileEntryRef File, bool IsSystem,
+                                     DirectoryEntryRef Dir, FileID ID) {
+  // Check whether we've already parsed this module map, and mark it as being
+  // parsed in case we recursively try to parse it from itself.
+  auto AddResult = PreParsedModuleMaps.insert(std::make_pair(File, true));
+  if (!AddResult.second)
+    return AddResult.first->second ? LMM_AlreadyLoaded : LMM_InvalidModuleMap;
+
+  if (ModMap.preParseModuleMapFile(File, IsSystem, Dir, ID)) {
+    PreParsedModuleMaps[File] = false;
+    return LMM_InvalidModuleMap;
+  }
+
+  // Try to load a corresponding private module map.
+  if (OptionalFileEntryRef PMMFile =
+          getPrivateModuleMap(File, FileMgr, Diags)) {
+    if (ModMap.preParseModuleMapFile(*PMMFile, IsSystem, Dir)) {
+      PreParsedModuleMaps[File] = false;
+      return LMM_InvalidModuleMap;
+    }
+  }
+
+  // This directory has a module map.
+  return LMM_NewlyLoaded;
+}
+
 OptionalFileEntryRef
 HeaderSearch::lookupModuleMapFile(DirectoryEntryRef Dir, bool IsFramework) {
   if (!HSOpts->ImplicitModuleMaps)
@@ -1853,7 +1880,7 @@ Module *HeaderSearch::loadFrameworkModule(StringRef Name, DirectoryEntryRef Dir,
     break;
   }
 
-  return ModMap.findModule(Name);
+  return ModMap.findOrLoadModule(Name);
 }
 
 HeaderSearch::LoadModuleMapResult
@@ -1869,8 +1896,16 @@ HeaderSearch::LoadModuleMapResult
 HeaderSearch::loadModuleMapFile(DirectoryEntryRef Dir, bool IsSystem,
                                 bool IsFramework) {
   auto KnownDir = DirectoryHasModuleMap.find(Dir);
-  if (KnownDir != DirectoryHasModuleMap.end())
-    return KnownDir->second ? LMM_AlreadyLoaded : LMM_InvalidModuleMap;
+  if (KnownDir != DirectoryHasModuleMap.end()) {
+    switch (KnownDir->second) {
+    case MMDS_Parsed:
+      break;
+    case MMDS_Loaded:
+      return LMM_AlreadyLoaded;
+    case MMDS_Invalid:
+      return LMM_InvalidModuleMap;
+    };
+  }
 
   if (OptionalFileEntryRef ModuleMapFile =
           lookupModuleMapFile(Dir, IsFramework)) {
@@ -1880,9 +1915,49 @@ HeaderSearch::loadModuleMapFile(DirectoryEntryRef Dir, bool IsSystem,
     // E.g. Foo.framework/Modules/module.modulemap
     //      ^Dir                  ^ModuleMapFile
     if (Result == LMM_NewlyLoaded)
-      DirectoryHasModuleMap[Dir] = true;
+      DirectoryHasModuleMap[Dir] = MMDS_Loaded;
+    else if (Result == LMM_InvalidModuleMap)
+      DirectoryHasModuleMap[Dir] = MMDS_Invalid;
+    return Result;
+  }
+  return LMM_InvalidModuleMap;
+}
+
+HeaderSearch::LoadModuleMapResult
+HeaderSearch::parseModuleMapFile(StringRef DirName, bool IsSystem,
+                                 bool IsFramework) {
+  if (auto Dir = FileMgr.getOptionalDirectoryRef(DirName))
+    return parseModuleMapFile(*Dir, IsSystem, IsFramework);
+
+  return LMM_NoDirectory;
+}
+
+HeaderSearch::LoadModuleMapResult
+HeaderSearch::parseModuleMapFile(DirectoryEntryRef Dir, bool IsSystem,
+                                 bool IsFramework) {
+  // Check if this modulemap has already been fully loaded, if so skip it.
+  auto KnownDir = DirectoryHasModuleMap.find(Dir);
+  if (KnownDir != DirectoryHasModuleMap.end()) {
+    switch (KnownDir->second) {
+    case MMDS_Parsed:
+    case MMDS_Loaded:
+      return LMM_AlreadyLoaded;
+    case MMDS_Invalid:
+      return LMM_InvalidModuleMap;
+    };
+  }
+
+  if (OptionalFileEntryRef ModuleMapFile =
+          lookupModuleMapFile(Dir, IsFramework)) {
+    LoadModuleMapResult Result =
+        parseModuleMapFileImpl(*ModuleMapFile, IsSystem, Dir);
+    // Add Dir explicitly in case ModuleMapFile is in a subdirectory.
+    // E.g. Foo.framework/Modules/module.modulemap
+    //      ^Dir                  ^ModuleMapFile
+    if (Result == LMM_NewlyLoaded)
+      DirectoryHasModuleMap[Dir] = MMDS_Parsed;
     else if (Result == LMM_InvalidModuleMap)
-      DirectoryHasModuleMap[Dir] = false;
+      DirectoryHasModuleMap[Dir] = MMDS_Invalid;
     return Result;
   }
   return LMM_InvalidModuleMap;
diff --git a/clang/lib/Lex/ModuleMap.cpp b/clang/lib/Lex/ModuleMap.cpp
index e6985a40433ec..f44faa648ac47 100644
--- a/clang/lib/Lex/ModuleMap.cpp
+++ b/clang/lib/Lex/ModuleMap.cpp
@@ -1052,6 +1052,9 @@ Module *ModuleMap::inferFrameworkModule(DirectoryEntryRef FrameworkDir,
           bool IsFrameworkDir = Parent.ends_with(".framework");
           if (OptionalFileEntryRef ModMapFile =
                   HeaderInfo.lookupModuleMapFile(*ParentDir, IsFrameworkDir)) {
+            // TODO: Pre-parsing a module map should populate
+            //       `InferredDirectories` so we don't need to do a full load
+            //       here.
             parseModuleMapFile(*ModMapFile, Attrs.IsSystem, *ParentDir);
             inferred = InferredDirectories.find(*ParentDir);
           }
@@ -1321,6 +1324,84 @@ void ModuleMap::addHeader(Module *Mod, Module::Header Header,
     Cb->moduleMapAddHeader(HeaderEntry.getName());
 }
 
+bool ModuleMap::preParseModuleMapFile(clang::FileEntryRef File, bool IsSystem,
+                                      clang::DirectoryEntryRef Dir,
+                                      clang::FileID ID,
+                                      SourceLocation ExternModuleLoc) {
+  llvm::DenseMap<const FileEntry *, const modulemap::ModuleMapFile *>::iterator
+      Known = PreParsedModuleMap.find(File);
+  if (Known != PreParsedModuleMap.end())
+    return Known->second == nullptr;
+
+  // If the module map file wasn't already entered, do so now.
+  if (ID.isInvalid()) {
+    ID = SourceMgr.translateFile(File);
+    if (ID.isInvalid() || SourceMgr.isLoadedFileID(ID)) {
+      auto FileCharacter =
+          IsSystem ? SrcMgr::C_System_ModuleMap : SrcMgr::C_User_ModuleMap;
+      ID = SourceMgr.createFileID(File, ExternModuleLoc, FileCharacter);
+    }
+  }
+
+  std::optional<llvm::MemoryBufferRef> Buffer = SourceMgr.getBufferOrNone(ID);
+  if (!Buffer) {
+    PreParsedModuleMap[File] = nullptr;
+    return true;
+  }
+
+  Diags.Report(diag::remark_mmap_parse) << File.getName();
+  std::optional<modulemap::ModuleMapFile> MaybeMMF =
+      modulemap::parseModuleMap(ID, Dir, SourceMgr, Diags, IsSystem, nullptr);
+
+  if (!MaybeMMF) {
+    PreParsedModuleMap[File] = nullptr;
+    return true;
+  }
+
+  PreParsedModuleMaps.push_back(
+      std::make_unique<modulemap::ModuleMapFile>(std::move(*MaybeMMF)));
+  const modulemap::ModuleMapFile &MMF = *PreParsedModuleMaps.back();
+  std::vector<const modulemap::ExternModuleDecl *> PendingExternalModuleMaps;
+  for (const auto &Decl : MMF.Decls) {
+    std::visit(llvm::makeVisitor(
+                   [&](const modulemap::ModuleDecl &MD) {
+                     // Only use the first part of the name even for submodules.
+                     // This will correctly load the submodule declarations when
+                     // the module is loaded.
+                     auto &ModuleDecls =
+                         PreParsedModules[StringRef(MD.Id.front().first)];
+                     ModuleDecls.push_back(std::pair(&MMF, &MD));
+                   },
+                   [&](const modulemap::ExternModuleDecl &EMD) {
+                     PendingExternalModuleMaps.push_back(&EMD);
+                   }),
+               Decl);
+  }
+
+  for (const modulemap::ExternModuleDecl *EMD : PendingExternalModuleMaps) {
+    StringRef FileNameRef = EMD->Path;
+    SmallString<128> ModuleMapFileName;
+    if (llvm::sys::path::is_relative(FileNameRef)) {
+      ModuleMapFileName += Dir.getName();
+      llvm::sys::path::append(ModuleMapFileName, EMD->Path);
+      FileNameRef = ModuleMapFileName;
+    }
+
+    if (auto EFile =
+            SourceMgr.getFileManager().getOptionalFileRef(FileNameRef)) {
+      pr...
[truncated]

@Bigcheese
Copy link
Contributor Author

Linux failure is due to issues with case sensitivity differences.

@Bigcheese
Copy link
Contributor Author

Everything should now consistently use the parse/load terminology. I also fixed the Linux failure which was due to module_map being found twice in different ways due to subdirectory search.

Copy link

github-actions bot commented Mar 25, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now makes much more sense after the renames from the prep-commit. There are still some naming inconsistencies, though. For example ModuleMap::loadModuleMapFile() both parses and loads a module map file, but ModuleMap::findOrLoadModule() expects the module map file to be already parsed and only loads the module. I think there's value in making this distinction more explicit. Maybe we could say that loading a module/module map file consists of parsing and materializing? No we can consistently name functions that do one, the other, or both.

@Bigcheese
Copy link
Contributor Author

This now makes much more sense after the renames from the prep-commit. There are still some naming inconsistencies, though. For example ModuleMap::loadModuleMapFile() both parses and loads a module map file, but ModuleMap::findOrLoadModule() expects the module map file to be already parsed and only loads the module. I think there's value in making this distinction more explicit. Maybe we could say that loading a module/module map file consists of parsing and materializing? No we can consistently name functions that do one, the other, or both.

I don't think I want all 3 concepts as that's more mental complexity without much benefit. I view it as loadModuleMapFile takes a path and thus necessarily parses before loading. Could call it parseAndLoadModuleMapFile, but that's pretty long.

@jansvoboda11
Copy link
Contributor

This now makes much more sense after the renames from the prep-commit. There are still some naming inconsistencies, though. For example ModuleMap::loadModuleMapFile() both parses and loads a module map file, but ModuleMap::findOrLoadModule() expects the module map file to be already parsed and only loads the module. I think there's value in making this distinction more explicit. Maybe we could say that loading a module/module map file consists of parsing and materializing? No we can consistently name functions that do one, the other, or both.

I don't think I want all 3 concepts as that's more mental complexity without much benefit. I view it as loadModuleMapFile takes a path and thus necessarily parses before loading. Could call it parseAndLoadModuleMapFile, but that's pretty long.

I prefer all 3 concepts because they result in terse function names. But if you're opposed, parseAndLoadModuleMapFile() is still much better than the current name IMO. 👍

Instead of eagerly populating the `clang::ModuleMap` when looking up
a module by name, this patch changes `HeaderSearch` to only load the
modules that are actually used.

This introduces `ModuleMap::findOrLoadModule` which will load modules
from parsed but not loaded module maps. This cannot be used anywhere
that the module loading code calls into as it can create infinite
recursion.

This currently just reparses module maps when looking up a module by
header. This is fine as redeclarations are allowed from the same file,
but future patches will also make looking up a module by header lazy.

This patch changes the shadow.m test to use explicitly built modules
and `#import`. This test and the shadow feature are very brittle and
do not work in general. The test relied on pcm files being left behind
by prior failing clang invocations that were then reused by the last
invocation. If you clean the cache then the last invocation will
always fail. This is because the input module map and the
`-fmodule-map-file=` module map are parsed in the same module scope,
and `-fmodule-map-file=` is forwarded to implicit module builds. That
means you are guaranteed to hit a module redeclaration error if the TU
actually imports the module it is trying to shadow.

This patch changes when we load A2's module map to after the `A`
module has been loaded, which sets the `IsFromModuleFile` bit on `A`.
This means that A2's `A` is skipped entirely instead of creating a
shadow module, and we get textual inclusion. It is possible to
construct a case where this would happen before this patch too.

An upcoming patch in this series will rework shadowing to work in the
general case, but that's only possible once header -> module lookup is
lazy too.
Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category clang-tools-extra lldb
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants