Is your feature request related to a problem? Please describe.
I love snorkel.labeling.filter_unlabeled_dataframe(). I want a PySpark equivalent: snorkel.labeling.filter_unlabeled_spark_rdd or snorkel.labeling.filter_unlabeled_spark_dataframe.
Describe the solution you'd like
Implement the same filtering for pyspark.sql.DataFrame or pyspark.RDD objects.
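To make the request concrete, here is a rough sketch of what the RDD variant might look like. It assumes each RDD element is a tuple of (data point, label-model output, label row), which is my own layout and not something Snorkel defines, and it assumes -1 is the abstain value. The function name is the one proposed above; it does not exist in Snorkel today. The only Spark call used is RDD.filter.

```python
ABSTAIN = -1  # assumption: labeling functions abstain with -1


def has_label(label_row):
    """Return True if at least one labeling function did not abstain."""
    return any(int(vote) != ABSTAIN for vote in label_row)


def filter_unlabeled_spark_rdd(rdd):
    """Hypothetical RDD counterpart to filter_unlabeled_dataframe.

    Assumes each element is a tuple (x, y, label_row), where label_row
    holds the labeling-function votes for that data point.
    """
    # Keep only elements whose label row contains a non-abstain vote.
    return rdd.filter(lambda record: has_label(record[2]))
```

This mirrors the row mask the pandas version applies (keep a row if any labeling function voted), just evaluated per element instead of on the whole matrix.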
Describe alternatives you've considered
I am just implementing this myself at the moment. I don't see an alternative to this function.
Additional context
The numpy.ndarray returned by SparkLFApplier (for example, L_train) may have to be serialized into something else so Spark can use it. SparkLFApplier could then optionally return this format, if that makes it easier.
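One possible shape for that serialized format, as a sketch only: turn the label matrix into a list of (row index, list of plain ints) pairs, which Spark can distribute without knowing about NumPy. The helper name label_matrix_to_rows is made up for illustration, and this assumes row i of L_train corresponds to the i-th data point in the original RDD.

```python
def label_matrix_to_rows(L):
    """Convert a label matrix into (row_index, [int, ...]) pairs.

    Works for a 2-D numpy.ndarray or any iterable of rows. Casting to
    int turns NumPy scalars into plain Python ints.
    """
    return [(i, [int(vote) for vote in row]) for i, row in enumerate(L)]
```

The result could then be passed to SparkContext.parallelize and joined with the data keyed by the same index (RDD.zipWithIndex yields (element, index) pairs, so the key would need to be swapped to the front first).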