
INFO:distributed.protocol.pickle:Failed to serialize #8090

Open
@afalamna

Description

import pandas as pd
import dask.dataframe as dd
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

df2 = dd.from_pandas(pd.DataFrame({'ngram': ngrams_list}), npartitions=4)
csv_dask = dd.read_csv(csv_file_path)

csv_dask = csv_dask.repartition(npartitions=1).reset_index()


def emb_skill(skill):
    # Embed a single string and return its vector as a NumPy array.
    return embed([skill])[0].numpy()


df2 = df2.assign(embeddings=df2['ngram'].map_partitions(lambda series: series.apply(emb_skill), meta=('embeddings', 'object')))
csv_dask = csv_dask.assign(embeddings=csv_dask['hcms_skills'].map_partitions(lambda series: series.apply(emb_skill), meta=('embeddings', 'object')))

df2 = df2.compute()
csv_dask = csv_dask.compute()

This is a snippet of my code. I want to use dask.distributed to make it faster, but it gives me the error below whenever I try. Can you please help and tell me what I am doing wrong?

INFO:distributed.protocol.pickle:Failed to serialize (<function map_chunk at 0x7e8804b995a0>, Delayed('emb_skill-4a6ac665-a176-4bea-9f6c-c278abbf6062'), ["('from_sequence-956324e59717b7e4a6b95e21dce2c68b', 0)"], None, {}). Exception: can't pickle repeated message fields, convert to list first
2023-08-09 16:30:40,955 - distributed.protocol.core - CRITICAL - Failed to Serialize
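
The traceback suggests the task graph itself fails to pickle: `emb_skill` closes over `embed`, the TF-Hub model object, and distributed has to serialize everything a task references; the model's internal protobuf state is what raises "can't pickle repeated message fields". A common workaround is to send only the URL string to the workers and load the model lazily inside the task, cached once per worker process. Below is a minimal sketch of that pattern, assuming workers have tensorflow_hub installed; MODEL_URL, get_embed, and embed_partition are illustrative names, not dask or TF-Hub API:

import pandas as pd
import dask.dataframe as dd

MODEL_URL = "https://tfhub.dev/google/universal-sentence-encoder-multilingual/3"
_embed = None  # per-process cache: each worker loads the model at most once


def get_embed():
    global _embed
    if _embed is None:
        import tensorflow_hub as hub  # import on the worker, not the client
        _embed = hub.load(MODEL_URL)
    return _embed


def embed_partition(series):
    # Only plain strings cross the wire; the model object is never pickled.
    embed = get_embed()
    vectors = embed(series.tolist()).numpy()  # embed the whole partition in one call
    return pd.Series(list(vectors), index=series.index)


df2 = df2.assign(embeddings=df2['ngram'].map_partitions(embed_partition, meta=('embeddings', 'object')))

Embedding a whole partition per call also replaces one Python-level model invocation per row with one per partition, which is usually much faster on top of fixing the serialization error.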
