
[Bug] Errors running the lang_id and code_quality KFP pipelines #1038

Open
@revit13

Description

Search before asking

  • I searched the issues and found no similar issues.

Component

KFP V1 workflows

What happened + What you expected to happen

Running make workflow-test under language/lang_id fails with the following error:

(orchestrate pid=273, ip=10.244.2.16) 10:35:49 INFO - Cluster resources: {'cpus': 4, 'gpus': 0, 'memory': 12.0, 'object_store': 3.1541931135579944}
(orchestrate pid=273, ip=10.244.2.16) 10:35:49 INFO - Number of workers - 2 with {'num_cpus': 0.8, 'max_restarts': -1} each
(RayTransformFileProcessor pid=273, ip=10.244.1.19) 10:35:50 ERROR - Exception creating transform  401 Client Error. (Request ID: Root=1-67aa4706-032e880645c8d1a22105e459;60b4b4c8-3d63-4ea1-97bb-42a7c98c2028)
(RayTransformFileProcessor pid=273, ip=10.244.1.19)
(RayTransformFileProcessor pid=273, ip=10.244.1.19) Repository Not Found for url: https://huggingface.co/facebook/fasttext-language-identification/resolve/main/model.bin.
(RayTransformFileProcessor pid=273, ip=10.244.1.19) Please make sure you specified the correct `repo_id` and `repo_type`.
(RayTransformFileProcessor pid=273, ip=10.244.1.19) If you are trying to access a private or gated repo, make sure you are authenticated.
(RayTransformFileProcessor pid=273, ip=10.244.1.19) Invalid credentials in Authorization header
(RayTransformFileProcessor pid=273, ip=10.244.1.19) Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RayTransformFileProcessor.__init__() (pid=273, ip=10.244.1.19, actor_id=777db29ec2b2a2ac5d1202fd02000000, repr=<data_processing_ray.runtime.ray.transform_file_processor.RayTransformFileProcessor object at 0x7f45b4d5af20>)
(RayTransformFileProcessor pid=273, ip=10.244.1.19)   File "/home/ray/anaconda3/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
(RayTransformFileProcessor pid=273, ip=10.244.1.19)     raise HTTPError(http_error_msg, response=self)
(RayTransformFileProcessor pid=273, ip=10.244.1.19) requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/facebook/fasttext-language-identification/resolve/main/model.bin
(RayTransformFileProcessor pid=273, ip=10.244.1.19)
(RayTransformFileProcessor pid=273, ip=10.244.1.19) The above exception was the direct cause of the following exception:
(RayTransformFileProcessor pid=273, ip=10.244.1.19)
(RayTransformFileProcessor pid=273, ip=10.244.1.19) ray::RayTransformFileProcessor.__init__() (pid=273, ip=10.244.1.19, actor_id=777db29ec2b2a2ac5d1202fd02000000, repr=<data_processing_ray.runtime.ray.transform_file_processor.RayTransformFileProcessor object at 0x7f45b4d5af20>)
(RayTransformFileProcessor pid=273, ip=10.244.1.19)   File "/home/ray/anaconda3/lib/python3.10/site-packages/data_processing_ray/runtime/ray/transform_file_processor.py", line 48, in __init__
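The lang_id trace ends in "401 Client Error: Unauthorized" with "Invalid credentials in Authorization header" for facebook/fasttext-language-identification, which appears to be a public repository. That suggests, without confirming it, that the worker sends a token Hugging Face rejects (for example an empty or placeholder value) rather than that the model is missing. The sketch below assumes the transform receives its token as a plain string parameter. resolve_hf_token and fetch_lang_id_model are hypothetical helpers, not part of the project. They fall back to token=False, which huggingface_hub documents as disabling authentication for hf_hub_download.

```python
from typing import Optional, Union

# Hypothetical values a pipeline parameter might hold when no real token is configured.
_NO_TOKEN_VALUES = {"", "none", "null"}


def resolve_hf_token(raw: Optional[str]) -> Union[str, bool]:
    """Return a usable token string, or False for anonymous access.

    huggingface_hub treats token=False as "do not authenticate", so a missing
    or placeholder value is never sent to huggingface.co as a bad credential.
    """
    if raw is None:
        return False
    token = raw.strip()
    if token.lower() in _NO_TOKEN_VALUES:
        return False
    return token


def fetch_lang_id_model(raw_token: Optional[str]) -> str:
    """Download the fastText model named in the log and return its local path."""
    # Imported lazily so resolve_hf_token can be exercised without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="facebook/fasttext-language-identification",
        filename="model.bin",
        token=resolve_hf_token(raw_token),
    )
```

Calling fetch_lang_id_model(None) on the same Ray image, outside the pipeline, would show whether an anonymous download succeeds. If it does while the pipeline run fails, the token handed to the workers is the likely culprit. This fallback does not help with a stale but well-formed token, and gated or private models still need a valid one.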

Running make workflow-test under code/code_quality fails with the following error:

(orchestrate pid=273, ip=10.244.1.16) 10:25:52 INFO - Number of workers - 2 with {'num_cpus': 0.8, 'max_restarts': -1} each
(RayTransformFileProcessor pid=272, ip=10.244.2.12) None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
(RayTransformFileProcessor pid=273, ip=10.244.2.12) None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
(RayTransformFileProcessor pid=272, ip=10.244.2.12) 10:25:55 ERROR - Exception creating transform  codeparrot/codeparrot is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
(RayTransformFileProcessor pid=272, ip=10.244.2.12) If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
(RayTransformFileProcessor pid=272, ip=10.244.2.12) Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RayTransformFileProcessor.__init__() (pid=272, ip=10.244.2.12, actor_id=748e3fe7abff912a43a93f8c02000000, repr=<data_processing_ray.runtime.ray.transform_file_processor.RayTransformFileProcessor object at 0x7f363d10f1f0>)
(RayTransformFileProcessor pid=272, ip=10.244.2.12)   File "/home/ray/anaconda3/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
(RayTransformFileProcessor pid=272, ip=10.244.2.12)     raise HTTPError(http_error_msg, response=self)
(RayTransformFileProcessor pid=272, ip=10.244.2.12) requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/codeparrot/codeparrot/resolve/main/tokenizer_config.json
(RayTransformFileProcessor pid=272, ip=10.244.2.12)
(RayTransformFileProcessor pid=272, ip=10.244.2.12) The above exception was the direct cause of the following exception:
(RayTransformFileProcessor pid=272, ip=10.244.2.12)
(RayTransformFileProcessor pid=272, ip=10.244.2.12) ray::RayTransformFileProcessor.__init__() (pid=272, ip=10.244.2.12, actor_id=748e3fe7abff912a43a93f8c02000000, repr=<data_processing_ray.runtime.ray.transform_file_processor.RayTransformFileProcessor object at 0x7f363d10f1f0>)
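Both traces above bottom out in the same requests.exceptions.HTTPError line: a 401 from huggingface.co, even though the higher-level messages read "Repository Not Found" and "not a valid model identifier". A small, hypothetical triage helper (not part of the project) that pulls the status, reason and repository out of Ray worker log lines makes that shared root cause easy to confirm across pipelines. It assumes the log format shown in this issue.

```python
import re
from typing import Iterable, List, NamedTuple


class HttpFailure(NamedTuple):
    status: int
    reason: str
    url: str


# Matches the requests.exceptions.HTTPError line Ray prints for a failed download.
_HTTP_ERROR = re.compile(
    r"requests\.exceptions\.HTTPError: (\d{3}) Client Error: (.+?) for url: (\S+)"
)


def find_http_failures(lines: Iterable[str]) -> List[HttpFailure]:
    """Collect (status, reason, url) for every HTTPError line in a worker log."""
    failures = []
    for line in lines:
        match = _HTTP_ERROR.search(line)
        if match:
            status, reason, url = match.groups()
            failures.append(HttpFailure(int(status), reason, url))
    return failures


def repo_id_from_url(url: str) -> str:
    """Extract <org>/<name> from https://huggingface.co/<org>/<name>/resolve/<rev>/<file>."""
    path = url.split("huggingface.co/", 1)[1]
    return "/".join(path.split("/")[:2])
```

Run over the two logs in this issue, it should report one 401 "Unauthorized" each for facebook/fasttext-language-identification and codeparrot/codeparrot, pointing at authentication rather than missing models.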

Reproduction script

Run make workflow-test under language/lang_id and under code/code_quality.

Anything else

No response

OS

Ubuntu

Python

3.11.x

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
