Description
Search before asking
- I searched the issues and found no similar issues.
Component
Transforms/universal/fdedup
What happened + What you expected to happen
I set up a virtual environment and followed the documented installation steps for data-prep-kit. While testing the end-to-end pipeline examples (the sample notebook and demo-with-launcher), I get the following error:
cannot import name 'FdedupRayTransformConfiguration' from 'fdedup_transform_ray' (/opt/conda/envs/data-prep-kit/lib/python3.11/site-packages/fdedup_transform_ray.py)
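One way to narrow this down is to check which names the installed module actually exports, since the error says the module was found but the class was not. The snippet below is a minimal diagnostic sketch, not part of data-prep-kit: `find_exported_names` is a hypothetical helper, and searching for the fragment "Configuration" assumes that a renamed class would still contain that word.

```python
import importlib


def find_exported_names(module_name, fragment):
    """Return the sorted public names in `module_name` that contain `fragment`.

    Returns None if the module itself cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return sorted(
        name for name in dir(module)
        if fragment in name and not name.startswith("_")
    )


if __name__ == "__main__":
    # Module name taken from the error message above. The "Configuration"
    # fragment is an assumption about how a renamed class might be called.
    names = find_exported_names("fdedup_transform_ray", "Configuration")
    if names is None:
        print("fdedup_transform_ray is not importable in this environment")
    else:
        print("Candidate configuration classes:", names)
```

If the list comes back without `FdedupRayTransformConfiguration`, the installed package and the example notebook are likely from different releases.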
Reproduction script
input_folder = "sample_data/docid_out"
output_folder = "sample_data/fdedup_out"
import os
import sys
from data_processing.utils import ParamsUtils
from fdedup_transform_ray import FdedupRayTransformConfiguration
local_conf = {
    "input_folder": input_folder,
    "output_folder": output_folder,
}
worker_options = {"num_cpus": 0.8}
code_location = {"github": "github", "commit_hash": "12345", "path": "path"}
fdedup_params = {
    # columns used
    "fdedup_doc_column": "contents",
    "fdedup_id_column": "int_id_column",
    "fdedup_cluster_column": "hash_column",
    "data_local_config": ParamsUtils.convert_to_ast(local_conf),
}
params = common_config_params | fdedup_params
# Pass commandline params
sys.argv = ParamsUtils.dict_to_req(d=params)
# launch
fdedup_launcher = RayTransformLauncher(FdedupRayTransformConfiguration())
fdedup_launcher.launch()
Anything else
No response
OS
Red Hat Enterprise Linux (RHEL)
Python
3.11.x
Are you willing to submit a PR?
- Yes I am willing to submit a PR!