[Bug] cannot import name 'FdedupRayTransformConfiguration' from 'fdedup_transform_ray' #898

Open
@MFahadShahid

Description

Search before asking

  • I searched the issues and found no similar issues.

Component

Transforms/universal/fdedup

What happened + What you expected to happen

I set up a virtual environment and followed the documented steps for installing data-prep-kit. While testing the end-to-end pipeline examples (the sample notebook and demo-with-launcher), I get the following error:
cannot import name 'FdedupRayTransformConfiguration' from 'fdedup_transform_ray' (/opt/conda/envs/data-prep-kit/lib/python3.11/site-packages/fdedup_transform_ray.py)
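
One way to narrow this down is to check which names the installed module actually exposes. The following is a minimal sketch using only the standard library; the helper name `names_matching` is hypothetical, and the module name and search fragment in the usage comment are simply taken from the error above. The result depends on the installed package version, so no output is shown.

```python
import importlib


def names_matching(module_name: str, fragment: str) -> list[str]:
    """Return the public names in a module that contain `fragment` (case-insensitive)."""
    module = importlib.import_module(module_name)
    fragment = fragment.lower()
    return sorted(
        name
        for name in dir(module)
        if not name.startswith("_") and fragment in name.lower()
    )


# Hypothetical usage for this report; run it inside the affected environment:
# print(names_matching("fdedup_transform_ray", "Configuration"))
```

If the list does not contain the expected class name, the installed package and the notebook are likely from different versions.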

Reproduction script

input_folder = "sample_data/docid_out"
output_folder = "sample_data/fdedup_out"

import os
import sys

from data_processing.utils import ParamsUtils
from fdedup_transform_ray import FdedupRayTransformConfiguration

local_conf = {
    "input_folder": input_folder,
    "output_folder": output_folder,
}
worker_options = {"num_cpus": 0.8}
code_location = {"github": "github", "commit_hash": "12345", "path": "path"}
fdedup_params = {
    # columns used
    "fdedup_doc_column": "contents",
    "fdedup_id_column": "int_id_column",
    "fdedup_cluster_column": "hash_column",
    "data_local_config": ParamsUtils.convert_to_ast(local_conf),
}

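# Note: common_config_params and RayTransformLauncher are not defined in this
# snippet; they come from earlier cells of the notebook.
params = common_config_params | fdedup_params

# Pass commandline params
sys.argv = ParamsUtils.dict_to_req(d=params)

# launch
fdedup_launcher = RayTransformLauncher(FdedupRayTransformConfiguration())
fdedup_launcher.launch()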

Anything else

No response

OS

Red Hat Enterprise Linux (RHEL)

Python

3.11.x

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
