Compatibility issues of dropbox destination connector on Windows #75

Open
@WHALEEYE

When uploading a file to Dropbox, I noticed that https://github.com/Unstructured-IO/unstructured/blob/4e61acc1c6b924a4ff643a1cc145a3f59980d18c/unstructured/ingest/connector/fsspec/fsspec.py#L338 uses PurePath to build the output path. On Windows, PurePath behaves as PureWindowsPath, so every / in the path is rendered as \, and the resulting path fails path validation in the Dropbox API.
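The behaviour can be reproduced on any OS by instantiating the Windows flavour explicitly. This is only a minimal sketch: `dir_path` and `filename` are illustrative values modelled on the log below, not the connector's actual variables, and the `PurePosixPath` / `as_posix()` variants are shown as possible workarounds rather than as the project's chosen fix.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Illustrative values modelled on the log in this issue (assumptions,
# not the connector's real variable names).
dir_path = "/unstructured"
filename = "local-output-to-dropbox/options.ini.json.json"

# On Windows, PurePath(...) creates a PureWindowsPath, so str() uses backslashes.
windows_style = str(PureWindowsPath(dir_path) / filename)

# Possible workaround 1: always build remote paths with the POSIX flavour.
posix_style = str(PurePosixPath(dir_path) / filename)

# Possible workaround 2: keep the path object but render it with forward slashes.
also_posix = (PureWindowsPath(dir_path) / filename).as_posix()

print(windows_style)
print(posix_style)
print(also_posix)
```

Since remote object-store paths are not local filesystem paths, building them with `PurePosixPath` (or rendering with `as_posix()`) keeps the separator independent of the host OS.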

Here is the log output and traceback for this problem.

2024-07-25 18:45:39,658 SpawnPoolWorker-9 INFO     writing embeddings content to C:\Users\Isaac\.cache\unstructured\ingest\pipeline\embedded\95a2071561d93868fb84512ba24f9eb6.json
2024-07-25 18:45:39,701 MainProcess INFO     Calling Copier with 1 docs
2024-07-25 18:45:39,701 MainProcess INFO     Running copy node to move content to desired output location
2024-07-25 18:45:41,580 SpawnPoolWorker-11 INFO     Copying C:\Users\Isaac\.cache\unstructured\ingest\pipeline\embedded\95a2071561d93868fb84512ba24f9eb6.json -> local-output-to-dropbox\options.ini.json.json
2024-07-25 18:45:41,598 MainProcess INFO     uploading elements from 1 document(s) to the destination
2024-07-25 18:45:41,598 MainProcess INFO     Calling Writer with 1 docs
2024-07-25 18:45:41,598 MainProcess INFO     Running write node to upload content. Destination connector: {"write_config": {"write_text_config": null}, "connector_config": {"remote_url": "dropbox:///unstructured/", "uncompress": false, "recursive": false, "file_glob": null, "access_config": {"token": "*******"}, "protocol": "dropbox", "path_without_protocol": "/unstructured/", "dir_path": "unstructured", "file_path": ""}}]
2024-07-25 18:45:41,677 MainProcess INFO     checking connection for destination /unstructured
2024-07-25 18:45:41,879 MainProcess DEBUG    uploading content from local-output-to-dropbox\options.ini.json.json
2024-07-25 18:45:41,879 MainProcess INFO     Writing content using filesystem: DropboxDriveFileSystem
2024-07-25 18:45:41,879 MainProcess DEBUG    uploading content to dropbox://\unstructured\local-output-to-dropbox\options.ini.json.json
Traceback (most recent call last):
  File "D:\Documents\camel\camel\loaders\dropbox_upload.py", line 54, in <module>
    runner.run()
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\runner\base_runner.py", line 41, in run
    self.process_documents(
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\runner\base_runner.py", line 80, in process_documents
    process_documents(
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\processor.py", line 93, in process_documents
    pipeline.run()
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\pipeline\pipeline.py", line 114, in run
    self.write_node(iterable=partitioned_jsons)
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\pipeline\interfaces.py", line 65, in __call__
    self.result = self.run(iterable)
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\pipeline\write.py", line 18, in run
    self.dest_doc_connector.write(docs=ingest_docs)
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\connector\fsspec\fsspec.py", line 359, in write
    self.write_dict(elements_dict=json_list, filename=filename)
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\connector\fsspec\fsspec.py", line 342, in write_dict
    fs.write_text(
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\fsspec\spec.py", line 750, in write_text
    with self.open(
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\fsspec\spec.py", line 2035, in close
    self.flush(force=True)
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\fsspec\spec.py", line 1894, in flush
    self._initiate_upload()
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\dropboxdrivefs\core.py", line 285, in _initiate_upload
    self.commit = dropbox.files.CommitInfo(
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\dropbox\files.py", line 376, in __init__
    self.path = path
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\stone\backends\python_rsrc\stone_base.py", line 81, in __set__
    value = self.validator.validate(value)
  File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\stone\backends\python_rsrc\stone_validators.py", line 345, in validate
    raise ValidationError("'%s' did not match pattern '%s'"
stone.backends.python_rsrc.stone_validators.ValidationError: '\unstructured\local-output-to-dropbox\options.ini.json.json' did not match pattern '(/(.|[\r\n])*)|(ns:[0-9]+(/.*)?)|(id:.*)'
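To make the failure concrete, the pattern from the error message can be checked directly against both forms of the path. This is a rough sketch: `looks_like_valid_dropbox_path` is a hypothetical helper, and using `re.fullmatch` is an assumption that approximates how the validator applies the pattern, not a copy of the SDK's code.

```python
import re

# Pattern copied from the ValidationError message above.
DROPBOX_PATH_PATTERN = r"(/(.|[\r\n])*)|(ns:[0-9]+(/.*)?)|(id:.*)"


def looks_like_valid_dropbox_path(path: str) -> bool:
    """Hypothetical helper: approximate the validator with a full-string match."""
    return re.fullmatch(DROPBOX_PATH_PATTERN, path) is not None


# The path from the traceback, and the same path with forward slashes.
rejected = r"\unstructured\local-output-to-dropbox\options.ini.json.json"
accepted = rejected.replace("\\", "/")

print(looks_like_valid_dropbox_path(rejected))
print(looks_like_valid_dropbox_path(accepted))
```

The backslash form matches none of the three alternatives (a leading `/`, an `ns:` prefix, or an `id:` prefix), which is consistent with the error above.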

Labels: bug