Commit 2e97494

fix: fsspec connectors returning data source version as integer (#2427)
Connector data source versions should always be string values; however, we were using the integer checksum value as the version for fsspec connectors. This casts that value to a string.

## Changes

* Cast the checksum value to a string when assigning the version value for fsspec connectors.
* Add a test to validate that these connectors assign a string value when an integer checksum is fetched.

## Testing

Unit test added.
1 parent 7378a37 commit 2e97494

3 files changed

+27
-1
lines changed

Diff for: CHANGELOG.md

+1
@@ -31,6 +31,7 @@
 * **Fix documentation and sample code for Chroma.** Was pointing to wrong examples..
 * **Fix flatten_dict to be able to flatten tuples inside dicts** Update flatten_dict function to support flattening tuples inside dicts. This is necessary for objects like Coordinates, when the object is not written to the disk, therefore not being converted to a list before getting flattened (still being a tuple).
 * **Fix the serialization of the Chroma destination connector.** Presence of the ChromaCollection object breaks serialization due to TypeError: cannot pickle 'module' object. This removes that object before serialization.
+* **Fix fsspec connectors returning version as integer.** Connector data source versions should always be string values, however we were using the integer checksum value for the version for fsspec connectors. This casts that value to a string.
 
 ## 0.12.0
 

Diff for: test_unstructured_ingest/unit/test_fsspec.py

+25
@@ -0,0 +1,25 @@
+from unittest.mock import MagicMock, patch
+
+from fsspec import AbstractFileSystem
+
+from unstructured.ingest.connector.fsspec.fsspec import FsspecIngestDoc, SimpleFsspecConfig
+from unstructured.ingest.interfaces import ProcessorConfig, ReadConfig
+
+
+@patch("fsspec.get_filesystem_class")
+def test_version_is_string(mock_get_filesystem_class):
+    """
+    Test that the version is a string even when the filesystem checksum is an integer.
+    """
+    mock_fs = MagicMock(spec=AbstractFileSystem)
+    mock_fs.checksum.return_value = 1234567890
+    mock_fs.info.return_value = {"etag": ""}
+    mock_get_filesystem_class.return_value = lambda **kwargs: mock_fs
+    config = SimpleFsspecConfig("s3://my-bucket", access_config={})
+    doc = FsspecIngestDoc(
+        processor_config=ProcessorConfig(),
+        read_config=ReadConfig(),
+        connector_config=config,
+        remote_file_path="test.txt",
+    )
+    assert isinstance(doc.source_metadata.version, str)
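
The mocking pattern in this test can be tried without `unstructured` or `fsspec` installed. The sketch below is a stand-alone illustration only: `FakeFileSystem` and `read_version` are hypothetical stand-ins, not names from the repository. They mirror the shape of the test, namely a spec'd `MagicMock` whose `checksum` returns an integer.

```python
from unittest.mock import MagicMock


class FakeFileSystem:
    """Hypothetical stand-in for a filesystem class; not part of fsspec."""

    def checksum(self, path):
        raise NotImplementedError


def read_version(fs, path):
    """Hypothetical helper: return the file's checksum as a string version."""
    return str(fs.checksum(path))


# spec= restricts the mock to attributes that exist on FakeFileSystem,
# so a misspelled attribute raises AttributeError instead of silently passing.
mock_fs = MagicMock(spec=FakeFileSystem)
mock_fs.checksum.return_value = 1234567890

version = read_version(mock_fs, "test.txt")
assert isinstance(version, str)
mock_fs.checksum.assert_called_once_with("test.txt")
```

Passing `spec=` is what keeps the mock honest here: a bare `MagicMock()` would accept any attribute name, so a test could keep passing after the real method was renamed.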

Diff for: unstructured/ingest/connector/fsspec/fsspec.py

+1-1
@@ -134,7 +134,7 @@ def update_source_metadata(self):
         self.source_metadata = SourceMetadata(
             date_created=date_created,
             date_modified=date_modified,
-            version=version,
+            version=str(version),
             source_url=f"{self.connector_config.protocol}://{self.remote_file_path}",
             exists=file_exists,
         )
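
Why the cast is needed can be shown in isolation. The sketch below assumes a dataclass-style metadata container. `Metadata` is a hypothetical stand-in for `SourceMetadata`, whose real definition is not shown in this commit. Python dataclasses do not enforce type annotations at runtime, so an integer checksum is stored unchanged unless the caller casts it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metadata:
    """Hypothetical stand-in for SourceMetadata; only the version field is modeled."""

    version: Optional[str] = None


checksum = 1234567890  # integer, matching the mocked checksum in the unit test

# Annotations are not checked at runtime, so the int is stored as-is.
before = Metadata(version=checksum)

# Casting at the assignment site guarantees a string.
after = Metadata(version=str(checksum))

print(type(before.version).__name__, type(after.version).__name__)
```

One caveat with an unconditional cast: `str(None)` yields the string `"None"`, so code that can receive a missing version may want to guard for that. Whether that case arises in this connector is not visible from the diff.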
