Meeting a problem in Gaussian Mixture clustering part #8

@Xiao-Zhanpeng

```python
# Gaussian Mixture clustering
from pyspark.ml.clustering import GaussianMixture

t0 = time()
gm = GaussianMixture(k=8, maxIter=150, seed=seed, featuresCol="pca_features",
                     predictionCol="cluster", probabilityCol="gm_prob")

gm_pipeline = Pipeline(stages=[pca_slicer, pca, gm])
gm_model = gm_pipeline.fit(scaled_train_df)

gm_train_df = gm_model.transform(scaled_train_df).cache()
gm_cv_df = gm_model.transform(scaled_cv_df).cache()
gm_test_df = gm_model.transform(scaled_test_df).cache()

gm_params = (gm_model.stages[2].gaussiansDF.rdd
             .map(lambda row: [row['mean'].toArray(), row['cov'].toArray()])
             .collect())
gm_weights = gm_model.stages[2].weights

print(gm_train_df.count())
print(gm_cv_df.count())
print(gm_test_df.count())
print(time() - t0)
```

When I run this part in a Jupyter notebook, an error appears:
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
in
     14
     15 gm_params = (gm_model.stages[2].gaussiansDF.rdd
---> 16 .map(lambda row: [row['mean'].toArray(), row['cov'].toArray()])
     17 .collect())
     18 gm_weights = gm_model.stages[2].weights

C:\Spark\python\pyspark\rdd.py in collect(self)
    813         to be small, as all the data is loaded into the driver's memory.
    814         """
--> 815         with SCCallSiteSync(self.context) as css:
    816             sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
    817         return list(_load_from_socket(sock_info, self._jrdd_deserializer))

C:\Spark\python\pyspark\traceback_utils.py in __enter__(self)
     70     def __enter__(self):
     71         if SCCallSiteSync._spark_stack_depth == 0:
---> 72             self._context._jsc.setCallSite(self._call_site)
     73         SCCallSiteSync._spark_stack_depth += 1
     74

AttributeError: 'NoneType' object has no attribute 'setCallSite'
```

I did some research but found few answers; some people say it is a bug in Spark itself. By the way, I didn't use the Docker image; I built an "Anaconda 3.7.6 + pyspark 2.4.5" environment to run this code.

Can you please help me solve the problem? Thank you very much!
