[FEA] Create a mutable URLClassLoader in the ShimLoader if no URLClassLoader is available #14506
Description
Credit goes to @eordentlich for the idea.
[gerashegalov] Context: this idea enables us to resolve a long-standing issue in spark-rapids that dates back to when ShimLoader with parallel worlds was introduced. Our ScalaTests in the test module run against a plain single-shim aggregator jar instead of depending on the production dist jar, which is the main idea of having a separate test module.
Is your feature request related to a problem? Please describe.
Certain JVMs (e.g., a forked JVM created by the Maven Surefire plugin for testing) have only AppClassLoader in the classloader chain. Since JDK 9, AppClassLoader is no longer a URLClassLoader (JEP 261).
In these cases ShimLoader.findURLClassLoader() walks the entire chain, finds nothing mutable, and cannot inject shim directory URLs via addURL(). This causes ClassNotFoundException for shim classes like AwsStoragePlugin during plugin initialization:
java.lang.ExceptionInInitializerError
    at com.nvidia.spark.rapids.RapidsDriverPlugin.init(Plugin.scala:475)
    at org.apache.spark.internal.plugin.DriverPluginContainer.$anonfun$driverPlugins$1(PluginContainer.scala:53)
    ...
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1093)
    at com.udf.CudfComparisonTest.setUp(CudfComparisonTest.java:36)
    ...
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:495)
Caused by: java.lang.ClassNotFoundException: com.nvidia.spark.rapids.AwsStoragePlugin
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:469)
    at org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:41)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:94)
    at org.apache.spark.sql.rapids.execution.TrampolineUtil$.classForName(TrampolineUtil.scala:181)
    at com.nvidia.spark.rapids.RapidsPluginUtils$.$anonfun$loadExtensions$1(Plugin.scala:321)
    at com.nvidia.spark.rapids.RapidsPluginUtils$.loadExtensions(Plugin.scala:319)
    at com.nvidia.spark.rapids.RapidsPluginUtils$.$anonfun$getExtraPlugins$3(Plugin.scala:379)
    ...
    at com.nvidia.spark.rapids.RapidsPluginUtils$.<clinit>(Plugin.scala)
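The chain walk that comes up empty here can be pictured with a short Java sketch. The names `LoaderChainSketch` and `findUrlClassLoader` are hypothetical and this is not the actual `ShimLoader.findURLClassLoader()` code; it only assumes the behavior described above, namely following parent links until a `URLClassLoader` turns up or the chain ends.

```java
import java.net.URLClassLoader;
import java.util.Optional;

final class LoaderChainSketch {
    // Walk from `start` up through getParent() and return the first
    // URLClassLoader encountered, or empty if the chain contains none.
    static Optional<URLClassLoader> findUrlClassLoader(ClassLoader start) {
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            if (cl instanceof URLClassLoader) {
                return Optional.of((URLClassLoader) cl);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println("context loader: " + context);
        System.out.println("URLClassLoader in chain: "
                + findUrlClassLoader(context).isPresent());
    }
}
```

Run in a forked JVM on JDK 9 or later with a plain application classpath, a walk like this would be expected to find nothing, which matches the situation in the stack trace.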
There appear to be two code paths for loading shim classes, and only one of them handles the missing URLClassLoader case.
ShimReflectionUtils (works)
ShimReflectionUtils.loadClass() calls ShimLoader.getShimClassLoader(). When findURLClassLoader() returns None, there is a fallback that creates a MutableURLClassLoader pre-populated with shim URLs. All internal RAPIDS class loading goes through this path and works fine.
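A rough Java sketch of that kind of fallback follows. `ShimFallbackSketch`, `MutableLoader`, and `shimClassLoader` are hypothetical names chosen for illustration, not the spark-rapids implementation, and the step of adding shim URLs to an already existing loader is deliberately left out.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

final class ShimFallbackSketch {
    // URLClassLoader keeps addURL protected; this hypothetical subclass
    // widens it to public so URLs can be appended after construction.
    static final class MutableLoader extends URLClassLoader {
        MutableLoader(URL[] urls, ClassLoader parent) {
            super(urls, parent);
        }

        @Override
        public void addURL(URL url) {
            super.addURL(url);
        }
    }

    // Return the first URLClassLoader found in the chain starting at `start`.
    // If there is none, create a mutable child of `start` that already
    // contains the shim URLs.
    static URLClassLoader shimClassLoader(ClassLoader start, List<URL> shimUrls) {
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            if (cl instanceof URLClassLoader) {
                return (URLClassLoader) cl;
            }
        }
        return new MutableLoader(shimUrls.toArray(new URL[0]), start);
    }
}
```

Because the new loader is a child of `start`, everything already on the application classpath stays visible through normal parent delegation, and only the shim directories are added on top.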
loadExtensions (breaks)
loadExtensions() is called from getExtraPlugins() to load the extension classes listed in spark-rapids-extra-plugins. It calls TrampolineUtil.classForName(), which in turn calls Spark's classForName. The lookup resolves against the thread context classloader, which in the affected environments is AppClassLoader and not a URLClassLoader, so the class lookup fails.
Describe the solution you'd like
loadExtensions can use the same workaround and create a mutable URLClassLoader when none is found. This can be achieved either by routing it through getShimClassLoader(), which already has the fallback, or by modifying updateSparkClassLoader() to implement the same fallback mechanism (creating a URLClassLoader with the sparkCL as the parent).
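The first option amounts to resolving extension class names against an explicitly chosen loader rather than whatever the thread context classloader happens to be. The Java sketch below illustrates only that general point; `ExtensionLoadingSketch` and `tryLoad` are hypothetical names, and the demo uses a loader that cannot see application classes in place of the real shim setup.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Optional;

final class ExtensionLoadingSketch {
    // Resolve `className` against the given loader and report a missing
    // class as an empty Optional instead of an exception.
    static Optional<Class<?>> tryLoad(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class without running its static initializers
            return Optional.<Class<?>>of(Class.forName(className, false, loader));
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String name = ExtensionLoadingSketch.class.getName();
        // The loader that defined this class can resolve it by name.
        ClassLoader sees = ExtensionLoadingSketch.class.getClassLoader();
        // A loader with no URLs and only the bootstrap loader as parent cannot.
        ClassLoader blind = new URLClassLoader(new URL[0], null);
        System.out.println("via defining loader: " + tryLoad(name, sees).isPresent());
        System.out.println("via unrelated loader: " + tryLoad(name, blind).isPresent());
    }
}
```

The same class name succeeds or fails depending on which loader is asked, which is why passing the loader returned by the existing fallback would likely be enough for the extension lookup.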
Describe alternatives you've considered
Currently, to work around this issue, the end user has the following options:
- Manually create a child URLClassLoader of AppClassLoader and set it as the thread context classloader before SparkSession.getOrCreate().
- Use a JVM where a URLClassLoader is present. For example, for Java Surefire tests, disable forking so that the Surefire IsolatedClassLoader is created in the Maven JVM and can be used by the ShimLoader. Disabling forking is not always desirable, however, as it can limit test throughput.
- Manually go through these steps to build a single-shim jar.
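For reference, the first workaround could look roughly like the Java sketch below. `ContextLoaderWorkaround` and `installUrlClassLoader` are hypothetical names, and the sketch assumes that an empty `URLClassLoader` child is sufficient for the ShimLoader to pick up; whether the ShimLoader can then add its URLs to it on a given JDK is not verified here.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class ContextLoaderWorkaround {
    // Install an empty URLClassLoader as a child of the current thread context
    // classloader and return the previous loader so it can be restored later.
    static ClassLoader installUrlClassLoader() {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(new URLClassLoader(new URL[0], previous));
        return previous;
    }

    public static void main(String[] args) {
        ClassLoader previous = installUrlClassLoader();
        try {
            // In a real test setUp, the SparkSession.getOrCreate() call would go here.
            System.out.println(Thread.currentThread().getContextClassLoader());
        } finally {
            // Put the original context classloader back once the session is done.
            Thread.currentThread().setContextClassLoader(previous);
        }
    }
}
```

Every test suite that starts a session has to repeat this setup, which is part of why handling the fallback inside the plugin is preferable.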