Description
When uploading a file to Dropbox, I noticed that https://github.com/Unstructured-IO/unstructured/blob/4e61acc1c6b924a4ff643a1cc145a3f59980d18c/unstructured/ingest/connector/fsspec/fsspec.py#L338 uses `PurePath` to build the output path. On Windows, `PurePath` resolves to `PureWindowsPath`, so every `/` in the remote path is replaced with `\`, and the resulting path fails validation in the Dropbox API.
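The behavior can be reproduced on any OS with a minimal sketch that uses `PureWindowsPath` explicitly to stand in for what `PurePath` becomes on Windows. The variable names are illustrative and are not taken from the connector code; the path values are copied from the log in this report.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Values taken from the log output in this report.
remote_dir = "/unstructured/"
filename = "local-output-to-dropbox/options.ini.json.json"

# On Windows, PurePath(...) is a PureWindowsPath, so str() uses backslashes.
windows_style = str(PureWindowsPath(remote_dir) / filename)

# PurePosixPath keeps forward slashes regardless of the host OS.
posix_style = str(PurePosixPath(remote_dir) / filename)

print(windows_style)
print(posix_style)
```

The first string is the one that ends up in the upload path on Windows; the second is the form a remote filesystem path would be expected to have.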
Here is a traceback of this problem.
2024-07-25 18:45:39,658 SpawnPoolWorker-9 INFO writing embeddings content to C:\Users\Isaac\.cache\unstructured\ingest\pipeline\embedded\95a2071561d93868fb84512ba24f9eb6.json
2024-07-25 18:45:39,701 MainProcess INFO Calling Copier with 1 docs
2024-07-25 18:45:39,701 MainProcess INFO Running copy node to move content to desired output location
2024-07-25 18:45:41,580 SpawnPoolWorker-11 INFO Copying C:\Users\Isaac\.cache\unstructured\ingest\pipeline\embedded\95a2071561d93868fb84512ba24f9eb6.json -> local-output-to-dropbox\options.ini.json.json
2024-07-25 18:45:41,598 MainProcess INFO uploading elements from 1 document(s) to the destination
2024-07-25 18:45:41,598 MainProcess INFO Calling Writer with 1 docs
2024-07-25 18:45:41,598 MainProcess INFO Running write node to upload content. Destination connector: {"write_config": {"write_text_config": null}, "connector_config": {"remote_url": "dropbox:///unstructured/", "uncompress": false, "recursive": false, "file_glob": null, "access_config": {"token": "*******"}, "protocol": "dropbox", "path_without_protocol": "/unstructured/", "dir_path": "unstructured", "file_path": ""}}]
2024-07-25 18:45:41,677 MainProcess INFO checking connection for destination /unstructured
2024-07-25 18:45:41,879 MainProcess DEBUG uploading content from local-output-to-dropbox\options.ini.json.json
2024-07-25 18:45:41,879 MainProcess INFO Writing content using filesystem: DropboxDriveFileSystem
2024-07-25 18:45:41,879 MainProcess DEBUG uploading content to dropbox://\unstructured\local-output-to-dropbox\options.ini.json.json
Traceback (most recent call last):
File "D:\Documents\camel\camel\loaders\dropbox_upload.py", line 54, in <module>
runner.run()
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\runner\base_runner.py", line 41, in run
self.process_documents(
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\runner\base_runner.py", line 80, in process_documents
process_documents(
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\processor.py", line 93, in process_documents
pipeline.run()
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\pipeline\pipeline.py", line 114, in run
self.write_node(iterable=partitioned_jsons)
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\pipeline\interfaces.py", line 65, in __call__
self.result = self.run(iterable)
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\pipeline\write.py", line 18, in run
self.dest_doc_connector.write(docs=ingest_docs)
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\connector\fsspec\fsspec.py", line 359, in write
self.write_dict(elements_dict=json_list, filename=filename)
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\unstructured\ingest\connector\fsspec\fsspec.py", line 342, in write_dict
fs.write_text(
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\fsspec\spec.py", line 750, in write_text
with self.open(
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\fsspec\spec.py", line 2035, in close
self.flush(force=True)
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\fsspec\spec.py", line 1894, in flush
self._initiate_upload()
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\dropboxdrivefs\core.py", line 285, in _initiate_upload
self.commit = dropbox.files.CommitInfo(
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\dropbox\files.py", line 376, in __init__
self.path = path
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\stone\backends\python_rsrc\stone_base.py", line 81, in __set__
value = self.validator.validate(value)
File "D:\Cache\pypoetry\virtualenvs\camel-ai-C9nw6eu8-py3.10\lib\site-packages\stone\backends\python_rsrc\stone_validators.py", line 345, in validate
raise ValidationError("'%s' did not match pattern '%s'"
stone.backends.python_rsrc.stone_validators.ValidationError: '\unstructured\local-output-to-dropbox\options.ini.json.json' did not match pattern '(/(.|[\r\n])*)|(ns:[0-9]+(/.*)?)|(id:.*)'
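As a rough illustration of why validation fails, the sketch below checks both path forms against the pattern quoted in the `ValidationError`. It assumes the validator requires the whole string to match, which `re.fullmatch` approximates here. The helper names `is_valid_dropbox_path` and `to_remote_path` are hypothetical, and the normalization shown is only one possible workaround, not the project's actual fix.

```python
import re
from pathlib import PureWindowsPath

# Pattern copied from the ValidationError message above.
DROPBOX_PATH_PATTERN = re.compile(r"(/(.|[\r\n])*)|(ns:[0-9]+(/.*)?)|(id:.*)")


def is_valid_dropbox_path(path: str) -> bool:
    """Hypothetical helper: True if the whole path matches the pattern."""
    return DROPBOX_PATH_PATTERN.fullmatch(path) is not None


def to_remote_path(path: str) -> str:
    """Hypothetical helper: convert backslash separators to forward slashes."""
    return PureWindowsPath(path).as_posix()


rejected = r"\unstructured\local-output-to-dropbox\options.ini.json.json"
normalized = to_remote_path(rejected)

print(is_valid_dropbox_path(rejected))
print(normalized, is_valid_dropbox_path(normalized))
```

Building the remote path with `PurePosixPath`, or calling `as_posix()` before handing the path to the filesystem, would keep the separators independent of the host OS.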