SynapseML version
1.0.12
System information
- Language version (e.g. python 3.8, scala 2.12): Python 3.13
- Spark Version (e.g. 3.2.3): 4.0.0
- Spark Platform (e.g. Synapse, Databricks): local
Describe the problem
Hello, I'm new to Spark and PySpark. I've been using PySpark locally to experiment and wanted to try LightGBM to train a model. I used data that had already worked with an XGBoost model, but after following the install instructions from the main page and the training example here, I ran into the error below.
I have lightgbm installed in Python and set up SynapseML when creating the Spark session.
I'm using Ubuntu 22.04.
Please let me know if you need any more information about my setup or settings.
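In case it is useful, here is a small check I can run locally (a sketch only; `_jvm` is an internal PySpark handle, not an official API) to confirm which Scala version my PySpark build targets, since the `scala/Serializable` error in the logs below looks like a Scala 2.12 vs 2.13 binary mismatch (Spark 4.0 is built on Scala 2.13, while the `synapseml_2.12` artifact targets Scala 2.12):
# Hypothetical diagnostic, not part of the original repro:
# print the Spark version and the Scala version the running JVM was built with.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print("Spark version:", spark.version)
# scala.util.Properties.versionString() is reachable through the Py4J gateway
print("Scala version:", spark.sparkContext._jvm.scala.util.Properties.versionString())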
Code to reproduce issue
from pyspark.sql import SparkSession
from synapse.ml.lightgbm import LightGBMClassifier
spark = SparkSession.builder \
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.12") \
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
    .getOrCreate()
from pyspark.ml.linalg import Vectors
train_df = spark.createDataFrame([
    (Vectors.dense([1.0, 2.0]), 1.0),
    (Vectors.dense([3.0, 4.0]), 0.0)
], ["features", "label"])
lgbm_model = LightGBMClassifier(objective="binary", featuresCol="features", labelCol="label")
lgbm_model.fit(train_df)
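For comparison, here is a minimal sketch of the configuration I would try next (my assumption, not yet tested): as far as I can tell, SynapseML 1.0.12 is published only as a Scala 2.12 artifact, so it would need a Spark 3.x build (for example pyspark==3.5.1 from PyPI) rather than Spark 4.0.0.
# Hypothetical configuration sketch (assumes pyspark 3.5.x, which bundles Scala 2.12):
#   pip install "pyspark==3.5.1"
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.12")
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
    .getOrCreate()
)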
Other info / logs
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/07/27 21:13:19 WARN Utils: Your hostname, Corei9-13900K-64GB, resolves to a loopback address: 127.0.1.1; using XXX.XXX.XXX.XX instead (on interface enp3s0)
25/07/27 21:13:19 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
https://mmlspark.azureedge.net/maven added as a remote repository with the name: repo-1
:: loading settings :: url = jar:file:/home/matheus/Documents/Projects/PersonalProjects/fraud-detection/.venv/lib/python3.13/site-packages/pyspark/jars/ivy-2.5.3.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/matheus/.ivy2.5.2/cache
The jars for the packages stored in: /home/matheus/.ivy2.5.2/jars
com.microsoft.azure#synapseml_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-cca6a506-7f7b-4133-bcd6-70314f13f7a3;1.0
confs: [default]
found com.microsoft.azure#synapseml_2.12;1.0.12 in central
found com.microsoft.azure#synapseml-core_2.12;1.0.12 in central
found org.apache.spark#spark-avro_2.12;3.4.1 in central
found org.tukaani#xz;1.9 in central
found commons-lang#commons-lang;2.6 in central
found org.scalactic#scalactic_2.12;3.2.14 in central
found org.scala-lang#scala-reflect;2.12.15 in central
found io.spray#spray-json_2.12;1.3.5 in central
found com.jcraft#jsch;0.1.54 in central
found org.apache.httpcomponents.client5#httpclient5;5.1.3 in central
found org.apache.httpcomponents.core5#httpcore5;5.1.3 in central
found org.apache.httpcomponents.core5#httpcore5-h2;5.1.3 in central
found org.slf4j#slf4j-api;1.7.25 in central
found commons-codec#commons-codec;1.15 in central
found org.apache.httpcomponents#httpmime;4.5.13 in central
found org.apache.httpcomponents#httpclient;4.5.13 in central
found org.apache.httpcomponents#httpcore;4.4.13 in central
found commons-logging#commons-logging;1.2 in central
found com.linkedin.isolation-forest#isolation-forest_3.4.2_2.12;3.0.4 in central
found com.chuusai#shapeless_2.12;2.3.10 in central
found org.testng#testng;6.8.8 in central
found org.beanshell#bsh;2.0b4 in central
found com.beust#jcommander;1.27 in central
found org.scalanlp#breeze_2.12;2.1.0 in central
found org.scalanlp#breeze-macros_2.12;2.1.0 in central
found org.typelevel#spire_2.12;0.17.0 in central
found org.typelevel#spire-macros_2.12;0.17.0 in central
found org.typelevel#algebra_2.12;2.0.1 in central
found org.typelevel#cats-kernel_2.12;2.1.1 in central
found org.typelevel#spire-platform_2.12;0.17.0 in central
found org.typelevel#spire-util_2.12;0.17.0 in central
found dev.ludovic.netlib#blas;3.0.1 in central
found net.sourceforge.f2j#arpack_combined_all;0.1 in central
found dev.ludovic.netlib#lapack;3.0.1 in central
found dev.ludovic.netlib#arpack;3.0.1 in central
found net.sf.opencsv#opencsv;2.3 in central
found com.github.wendykierp#JTransforms;3.1 in central
found pl.edu.icm#JLargeArrays;1.5 in central
found org.apache.commons#commons-math3;3.2 in central
found org.scala-lang.modules#scala-collection-compat_2.12;2.7.0 in central
found com.microsoft.azure#synapseml-deep-learning_2.12;1.0.12 in central
found com.microsoft.azure#synapseml-opencv_2.12;1.0.12 in central
found org.openpnp#opencv;3.2.0-1 in central
found com.microsoft.azure#onnx-protobuf_2.12;0.9.3 in central
found com.microsoft.onnxruntime#onnxruntime_gpu;1.8.1 in central
found com.microsoft.azure#synapseml-cognitive_2.12;1.0.12 in central
found com.microsoft.cognitiveservices.speech#client-sdk;1.24.1 in central
found com.microsoft.azure#synapseml-vw_2.12;1.0.12 in central
found com.github.vowpalwabbit#vw-jni;9.3.0 in central
found com.microsoft.azure#synapseml-lightgbm_2.12;1.0.12 in central
found com.microsoft.ml.lightgbm#lightgbmlib;3.3.510 in central
:: resolution report :: resolve 283ms :: artifacts dl 7ms
:: modules in use:
com.beust#jcommander;1.27 from central in [default]
com.chuusai#shapeless_2.12;2.3.10 from central in [default]
com.github.vowpalwabbit#vw-jni;9.3.0 from central in [default]
com.github.wendykierp#JTransforms;3.1 from central in [default]
com.jcraft#jsch;0.1.54 from central in [default]
com.linkedin.isolation-forest#isolation-forest_3.4.2_2.12;3.0.4 from central in [default]
com.microsoft.azure#onnx-protobuf_2.12;0.9.3 from central in [default]
com.microsoft.azure#synapseml-cognitive_2.12;1.0.12 from central in [default]
com.microsoft.azure#synapseml-core_2.12;1.0.12 from central in [default]
com.microsoft.azure#synapseml-deep-learning_2.12;1.0.12 from central in [default]
com.microsoft.azure#synapseml-lightgbm_2.12;1.0.12 from central in [default]
com.microsoft.azure#synapseml-opencv_2.12;1.0.12 from central in [default]
com.microsoft.azure#synapseml-vw_2.12;1.0.12 from central in [default]
com.microsoft.azure#synapseml_2.12;1.0.12 from central in [default]
com.microsoft.cognitiveservices.speech#client-sdk;1.24.1 from central in [default]
com.microsoft.ml.lightgbm#lightgbmlib;3.3.510 from central in [default]
com.microsoft.onnxruntime#onnxruntime_gpu;1.8.1 from central in [default]
commons-codec#commons-codec;1.15 from central in [default]
commons-lang#commons-lang;2.6 from central in [default]
commons-logging#commons-logging;1.2 from central in [default]
dev.ludovic.netlib#arpack;3.0.1 from central in [default]
dev.ludovic.netlib#blas;3.0.1 from central in [default]
dev.ludovic.netlib#lapack;3.0.1 from central in [default]
io.spray#spray-json_2.12;1.3.5 from central in [default]
net.sf.opencsv#opencsv;2.3 from central in [default]
net.sourceforge.f2j#arpack_combined_all;0.1 from central in [default]
org.apache.commons#commons-math3;3.2 from central in [default]
org.apache.httpcomponents#httpclient;4.5.13 from central in [default]
org.apache.httpcomponents#httpcore;4.4.13 from central in [default]
org.apache.httpcomponents#httpmime;4.5.13 from central in [default]
org.apache.httpcomponents.client5#httpclient5;5.1.3 from central in [default]
org.apache.httpcomponents.core5#httpcore5;5.1.3 from central in [default]
org.apache.httpcomponents.core5#httpcore5-h2;5.1.3 from central in [default]
org.apache.spark#spark-avro_2.12;3.4.1 from central in [default]
org.beanshell#bsh;2.0b4 from central in [default]
org.openpnp#opencv;3.2.0-1 from central in [default]
org.scala-lang#scala-reflect;2.12.15 from central in [default]
org.scala-lang.modules#scala-collection-compat_2.12;2.7.0 from central in [default]
org.scalactic#scalactic_2.12;3.2.14 from central in [default]
org.scalanlp#breeze-macros_2.12;2.1.0 from central in [default]
org.scalanlp#breeze_2.12;2.1.0 from central in [default]
org.slf4j#slf4j-api;1.7.25 from central in [default]
org.testng#testng;6.8.8 from central in [default]
org.tukaani#xz;1.9 from central in [default]
org.typelevel#algebra_2.12;2.0.1 from central in [default]
org.typelevel#cats-kernel_2.12;2.1.1 from central in [default]
org.typelevel#spire-macros_2.12;0.17.0 from central in [default]
org.typelevel#spire-platform_2.12;0.17.0 from central in [default]
org.typelevel#spire-util_2.12;0.17.0 from central in [default]
org.typelevel#spire_2.12;0.17.0 from central in [default]
pl.edu.icm#JLargeArrays;1.5 from central in [default]
:: evicted modules:
commons-codec#commons-codec;1.11 by [commons-codec#commons-codec;1.15] in [default]
org.scala-lang.modules#scala-collection-compat_2.12;2.2.0 by [org.scala-lang.modules#scala-collection-compat_2.12;2.7.0] in [default]
org.apache.commons#commons-math3;3.5 by [org.apache.commons#commons-math3;3.2] in [default]
org.slf4j#slf4j-api;1.7.5 by [org.slf4j#slf4j-api;1.7.25] in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 55 | 0 | 0 | 4 || 51 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-cca6a506-7f7b-4133-bcd6-70314f13f7a3
confs: [default]
0 artifacts copied, 51 already retrieved (0kB/5ms)
25/07/27 21:13:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Exception in thread "Thread-4" java.lang.NoClassDefFoundError: scala/Serializable
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:420)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:420)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:420)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:467)
at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(CurrentThreadClassLoadingStrategy.java:40)
at py4j.reflection.ReflectionUtil.classForName(ReflectionUtil.java:51)
at py4j.reflection.TypeUtil.forName(TypeUtil.java:243)
at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:175)
at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:87)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:184)
at py4j.ClientServerConnection.run(ClientServerConnection.java:108)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.ClassNotFoundException: scala.Serializable
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
... 40 more
Exception in thread "Thread-177" java.lang.NoClassDefFoundError: scala/Serializable
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:420)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/home/matheus/Documents/Projects/PersonalProjects/fraud-detection/.venv/lib/python3.13/site-packages/py4j/java_gateway.py", line 1038, in send_command
response = connection.send_command(command)
File "/home/matheus/Documents/Projects/PersonalProjects/fraud-detection/.venv/lib/python3.13/site-packages/py4j/clientserver.py", line 540, in send_command
raise Py4JNetworkError(
"Answer from Java side is empty", when=proto.EMPTY_RESPONSE)
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/matheus/Documents/Projects/PersonalProjects/fraud-detection/.venv/lib/python3.13/site-packages/py4j/java_gateway.py", line 1038, in send_command
response = connection.send_command(command)
File "/home/matheus/Documents/Projects/PersonalProjects/fraud-detection/.venv/lib/python3.13/site-packages/py4j/clientserver.py", line 540, in send_command
raise Py4JNetworkError(
"Answer from Java side is empty", when=proto.EMPTY_RESPONSE)
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:420)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:420)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:467)
at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(CurrentThreadClassLoadingStrategy.java:40)
at py4j.reflection.ReflectionUtil.classForName(ReflectionUtil.java:51)
at py4j.reflection.TypeUtil.forName(TypeUtil.java:243)
at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:175)
at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:87)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:184)
at py4j.ClientServerConnection.run(ClientServerConnection.java:108)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.ClassNotFoundException: scala.Serializable
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
... 40 more
---------------------------------------------------------------------------
Py4JError Traceback (most recent call last)
Cell In[1], line 16
10 from pyspark.ml.linalg import Vectors
11 train_df = spark.createDataFrame([
12 (Vectors.dense([1.0, 2.0]), 1.0),
13 (Vectors.dense([3.0, 4.0]), 0.0)
14 ], ["features", "label"])
---> 16 lgbm_model = LightGBMClassifier(objective="binary", featuresCol="features", labelCol="label")
17 lgbm_model.fit(train_df)
File ~/Documents/Projects/PersonalProjects/fraud-detection/.venv/lib/python3.13/site-packages/pyspark/__init__.py:115, in keyword_only.<locals>.wrapper(self, *args, **kwargs)
113 raise TypeError("Method %s forces keyword arguments." % func.__name__)
114 self._input_kwargs = kwargs
--> 115 return func(self, **kwargs)
File ~/Documents/Projects/PersonalProjects/fraud-detection/.venv/lib/python3.13/site-packages/synapse/ml/lightgbm/LightGBMClassifier.py:414, in LightGBMClassifier.__init__(self, java_obj, baggingFraction, baggingFreq, baggingSeed, binSampleCount, boostFromAverage, boostingType, catSmooth, categoricalSlotIndexes, categoricalSlotNames, catl2, chunkSize, dataRandomSeed, dataTransferMode, defaultListenPort, deterministic, driverListenPort, dropRate, dropSeed, earlyStoppingRound, executionMode, extraSeed, featureFraction, featureFractionByNode, featureFractionSeed, featuresCol, featuresShapCol, fobj, improvementTolerance, initScoreCol, isEnableSparse, isProvideTrainingMetric, isUnbalance, labelCol, lambdaL1, lambdaL2, leafPredictionCol, learningRate, matrixType, maxBin, maxBinByFeature, maxCatThreshold, maxCatToOnehot, maxDeltaStep, maxDepth, maxDrop, maxNumClasses, maxStreamingOMPThreads, metric, microBatchSize, minDataInLeaf, minDataPerBin, minDataPerGroup, minGainToSplit, minSumHessianInLeaf, modelString, monotoneConstraints, monotoneConstraintsMethod, monotonePenalty, negBaggingFraction, numBatches, numIterations, numLeaves, numTasks, numThreads, objective, objectiveSeed, otherRate, parallelism, passThroughArgs, posBaggingFraction, predictDisableShapeCheck, predictionCol, probabilityCol, rawPredictionCol, referenceDataset, repartitionByGroupingColumn, samplingMode, samplingSubsetSize, seed, skipDrop, slotNames, thresholds, timeout, topK, topRate, uniformDrop, useBarrierExecutionMode, useMissing, useSingleDatasetMode, validationIndicatorCol, verbosity, weightCol, xGBoostDartMode, zeroAsMissing)
412 super(LightGBMClassifier, self).__init__()
413 if java_obj is None:
--> 414 self._java_obj = self._new_java_obj("com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassifier", self.uid)
415 else:
416 self._java_obj = java_obj
File ~/Documents/Projects/PersonalProjects/fraud-detection/.venv/lib/python3.13/site-packages/pyspark/ml/util.py:313, in try_remote_return_java_class.<locals>.wrapped(java_class, *args)
311 return java_class
312 else:
--> 313 return f(java_class, *args)
File ~/Documents/Projects/PersonalProjects/fraud-detection/.venv/lib/python3.13/site-packages/pyspark/ml/wrapper.py:102, in JavaWrapper._new_java_obj(java_class, *args)
100 java_obj = _jvm()
101 for name in java_class.split("."):
--> 102 java_obj = getattr(java_obj, name)
103 java_args = [_py2java(sc, arg) for arg in args]
104 return java_obj(*java_args)
File ~/Documents/Projects/PersonalProjects/fraud-detection/.venv/lib/python3.13/site-packages/py4j/java_gateway.py:1704, in JavaPackage.__getattr__(self, name)
1701 return JavaClass(
1702 answer[proto.CLASS_FQN_START:], self._gateway_client)
1703 else:
-> 1704 raise Py4JError("{0} does not exist in the JVM".format(new_fqn))
Py4JError: com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassifier does not exist in the JVM
What component(s) does this bug affect?
- area/cognitive: Cognitive project
- area/core: Core project
- area/deep-learning: DeepLearning project
- area/lightgbm: Lightgbm project
- area/opencv: Opencv project
- area/vw: VW project
- area/website: Website
- area/build: Project build system
- area/notebooks: Samples under notebooks folder
- area/docker: Docker usage
- area/models: models related issue
What language(s) does this bug affect?
- language/scala: Scala source code
- language/python: Pyspark APIs
- language/r: R APIs
- language/csharp: .NET APIs
- language/new: Proposals for new client languages
What integration(s) does this bug affect?
- integrations/synapse: Azure Synapse integrations
- integrations/azureml: Azure ML integrations
- integrations/databricks: Databricks integrations