Description
Hi,
I was wondering if you could help with a problem I am getting when running geoscan on a DBR 10.4 LTS cluster. After creating a dataframe with latitude and longitude columns and running a personalized geoscan, the job gets stuck in a pending stage (62 tasks in my case; stack trace below).
Are there any dependencies that could cause this? Unfortunately, there is no logging on the cluster that can help me track down the root cause.
Thanks,
Ivan
Code for creating the model:
```python
from geoscan import GeoscanPersonalized
import mlflow

with mlflow.start_run(run_name='GEOSCAN') as run:

    geoscan = GeoscanPersonalized() \
        .setLatitudeCol('lat') \
        .setLongitudeCol('lon') \
        .setPredictionCol('cluster') \
        .setGroupedCol("type") \
        .setEpsilon(20) \
        .setMinPts(3)

    mlflow.log_param('epsilon', 20)
    mlflow.log_param('minPts', 3)

    model = geoscan.fit(df)
    mlflow.spark.log_model(model, "geoscan")
    run_id = run.info.run_id
```
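For reference, this is the shape of dataframe that `geoscan.fit(df)` is given — the `lat`, `lon`, and `type` column names match the setters above; the coordinates and group values here are purely illustrative:

```python
# Illustrative records only: a few GPS points for two hypothetical groups.
records = [
    ("user_a", 48.8566, 2.3522),
    ("user_a", 48.8567, 2.3523),
    ("user_a", 48.8568, 2.3521),
    ("user_b", 40.7128, -74.0060),
    ("user_b", 40.7129, -74.0061),
]
columns = ["type", "lat", "lon"]

# On the cluster this becomes the input dataframe:
# df = spark.createDataFrame(records, schema=columns)
```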
Stack trace of the task pending execution:
```
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:250)
com.databricks.labs.gis.ml.GeoscanPersonalizedModel$GeoscanPersonalizedModelWriter.saveData(GeoscanPersonalizedModel.scala:139)
com.databricks.labs.gis.ml.GeoscanPersonalizedModel$GeoscanPersonalizedModelWriter.saveImpl(GeoscanPersonalizedModel.scala:129)
org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$5(Pipeline.scala:257)
org.apache.spark.ml.MLEvents.withSaveInstanceEvent(events.scala:175)
org.apache.spark.ml.MLEvents.withSaveInstanceEvent$(events.scala:170)
org.apache.spark.ml.util.Instrumentation.withSaveInstanceEvent(Instrumentation.scala:43)
org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$4(Pipeline.scala:257)
org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$4$adapted(Pipeline.scala:254)
scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$1(Pipeline.scala:254)
org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$1$adapted(Pipeline.scala:247)
org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:284)
scala.util.Try$.apply(Try.scala:213)
org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:284)
org.apache.spark.ml.Pipeline$SharedReadWrite$.saveImpl(Pipeline.scala:247)
org.apache.spark.ml.PipelineModel$PipelineModelWriter.saveImpl(Pipeline.scala:346)
```