|
| 1 | +## 0.2.2-dev0 |
| 2 | + |
| 3 | +### Fixes |
| 4 | + |
| 5 | +- **Fix Notion Pagination** Iterate on Notion paginated results using the `next_cursor` and `start_cursor` properties. |
| 6 | + |
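The cursor-based iteration described in the Notion pagination fix can be sketched as follows. This is a minimal illustration, not the connector's actual code: `fetch_page` is a hypothetical callable standing in for whatever client call accepts a `start_cursor` and returns a response containing `results`, `has_more`, and `next_cursor`.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_paginated(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
) -> Iterator[Any]:
    """Yield every result across all pages of a cursor-paginated endpoint.

    `fetch_page` is a hypothetical stand-in: it receives the current
    `start_cursor` (None for the first page) and returns one response dict.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("results", [])
        # The next request starts where this response left off.
        cursor = page.get("next_cursor")
        if not page.get("has_more") or cursor is None:
            break
```

Passing the request in as a callable keeps the loop independent of any particular client, so the same sketch would apply to any endpoint that reports `has_more` and `next_cursor`.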
1 | 7 | ## 0.2.1
|
2 | 8 |
|
3 | 9 | ### Enhancements
|
4 | 10 |
|
5 |
| -* **File system based indexers return a record display name** |
6 |
| -* **Add singlestore source connector** |
7 |
| -* **Astra DB V2 Source Connector** Create a v2 version of the Astra DB Source Connector. |
| 11 | +- **File system based indexers return a record display name** |
| 12 | +- **Add singlestore source connector** |
| 13 | +- **Astra DB V2 Source Connector** Create a v2 version of the Astra DB Source Connector. |
8 | 14 |
|
9 | 15 | ### Fixes
|
10 | 16 |
|
11 |
| -* **Fix Databricks Volumes file naming** Add .json to end of upload file. |
| 17 | +- **Fix Databricks Volumes file naming** Add .json to end of upload file. |
12 | 18 |
|
13 | 19 | ## 0.2.0
|
14 | 20 |
|
15 | 21 | ### Enhancements
|
16 | 22 |
|
17 |
| -* **Add snowflake source and destination connectors** |
18 |
| -* **Migrate Slack Source Connector to V2** |
19 |
| -* **Migrate Slack Source Connector to V2** |
20 |
| -* **Add Delta Table destination to v2** |
21 |
| -* **Migrate Slack Source Connector to V2** |
| 23 | +- **Add snowflake source and destination connectors** |
| 24 | +- **Migrate Slack Source Connector to V2** |
| 25 | +- **Migrate Slack Source Connector to V2** |
| 26 | +- **Add Delta Table destination to v2** |
| 27 | +- **Migrate Slack Source Connector to V2** |
22 | 28 |
|
23 | 29 | ## 0.1.1
|
24 | 30 |
|
25 | 31 | ### Enhancements
|
26 | 32 |
|
27 |
| -* **Update KDB.AI vectorstore integration to 1.4** |
28 |
| -* **Add sqlite and postgres source connectors** |
29 |
| -* **Add sampling functionality for indexers in fsspec connectors** |
| 33 | +- **Update KDB.AI vectorstore integration to 1.4** |
| 34 | +- **Add sqlite and postgres source connectors** |
| 35 | +- **Add sampling functionality for indexers in fsspec connectors** |
30 | 36 |
|
31 | 37 | ### Fixes
|
32 | 38 |
|
33 |
| -* **Fix Databricks Volumes destination** Fix for filenames to not be hashes. |
| 39 | +- **Fix Databricks Volumes destination** Fix for filenames to not be hashes. |
34 | 40 |
|
35 | 41 | ## 0.1.0
|
36 | 42 |
|
37 | 43 | ### Enhancements
|
38 | 44 |
|
39 |
| -* **Move default API URL parameter value to serverless API** |
40 |
| -* **Add check that access config always wrapped in Secret** |
41 |
| -* **Add togetherai embedder support** |
42 |
| -* **Refactor sqlite and postgres to be distinct connectors to support better input validation** |
43 |
| -* **Added MongoDB source V2 connector** |
44 |
| -* **Support optional access configs on connection configs** |
45 |
| -* **Refactor databricks into distinct connectors based on auth type** |
| 45 | +- **Move default API URL parameter value to serverless API** |
| 46 | +- **Add check that access config always wrapped in Secret** |
| 47 | +- **Add togetherai embedder support** |
| 48 | +- **Refactor sqlite and postgres to be distinct connectors to support better input validation** |
| 49 | +- **Added MongoDB source V2 connector** |
| 50 | +- **Support optional access configs on connection configs** |
| 51 | +- **Refactor databricks into distinct connectors based on auth type** |
46 | 52 |
|
47 | 53 | ### Fixes
|
48 | 54 |
|
|
52 | 58 |
|
53 | 59 | ### Enhancements
|
54 | 60 |
|
55 |
| -* **Support pinecone namespace on upload** |
56 |
| -* **Migrate Outlook Source Connector to V2** |
57 |
| -* **Support for Databricks Volumes source connector** |
| 61 | +- **Support pinecone namespace on upload** |
| 62 | +- **Migrate Outlook Source Connector to V2** |
| 63 | +- **Support for Databricks Volumes source connector** |
58 | 64 |
|
59 | 65 | ### Fixes
|
60 | 66 |
|
61 |
| -* **Update Sharepoint Creds and Expected docs** |
| 67 | +- **Update Sharepoint Creds and Expected docs** |
62 | 68 |
|
63 | 69 | ## 0.0.24
|
64 | 70 |
|
65 | 71 | ### Enhancements
|
66 | 72 |
|
67 |
| -* **Support dynamic metadata mapping in Pinecone uploader** |
| 73 | +- **Support dynamic metadata mapping in Pinecone uploader** |
68 | 74 |
|
69 | 75 | ## 0.0.23
|
70 | 76 |
|
71 | 77 | ### Fixes
|
72 | 78 |
|
73 |
| -* **Remove check for langchain dependency in embedders** |
| 79 | +- **Remove check for langchain dependency in embedders** |
74 | 80 |
|
75 | 81 | ## 0.0.22
|
76 | 82 |
|
77 | 83 | ### Enhancements
|
78 | 84 |
|
79 |
| -* **Add documentation for developing sources/destinations** |
| 85 | +- **Add documentation for developing sources/destinations** |
80 | 86 |
|
81 |
| -* **Leverage `uv` for pip compile** |
| 87 | +- **Leverage `uv` for pip compile** |
82 | 88 |
|
83 |
| -* **Use incoming fsspec data to populate metadata** Rather than make additional calls to collect metadata after initial file list, use connector-specific data to populate the metadata. |
| 89 | +- **Use incoming fsspec data to populate metadata** Rather than make additional calls to collect metadata after initial file list, use connector-specific data to populate the metadata. |
84 | 90 |
|
85 |
| -* **Drop langchain as dependency for embedders** |
| 91 | +- **Drop langchain as dependency for embedders** |
86 | 92 |
|
87 | 93 | ## 0.0.21
|
88 | 94 |
|
89 | 95 | ### Fixes
|
90 | 96 |
|
91 |
| -* **Fix forward compatibility issues with `unstructured-client==0.26.0`.** Update syntax and create a new SDK util file for reuse in the Partitioner and Chunker |
| 97 | +- **Fix forward compatibility issues with `unstructured-client==0.26.0`.** Update syntax and create a new SDK util file for reuse in the Partitioner and Chunker |
92 | 98 |
|
93 |
| -* **Update Databricks CI Test** Update to use client_id and client_secret auth. Also return files.upload method to one from open source. |
| 99 | +- **Update Databricks CI Test** Update to use client_id and client_secret auth. Also return files.upload method to one from open source. |
94 | 100 |
|
95 |
| -* **Fix astra src bug** V1 source connector was updated to work with astrapy 1.5.0 |
| 101 | +- **Fix astra src bug** V1 source connector was updated to work with astrapy 1.5.0 |
96 | 102 |
|
97 | 103 | ## 0.0.20
|
98 | 104 |
|
99 | 105 | ### Enhancements
|
100 | 106 |
|
101 |
| -* **Support for latest AstraPy API** Add support for the modern AstraPy client interface for the Astra DB Connector. |
| 107 | +- **Support for latest AstraPy API** Add support for the modern AstraPy client interface for the Astra DB Connector. |
102 | 108 |
|
103 | 109 | ## 0.0.19
|
104 | 110 |
|
105 | 111 | ### Fixes
|
106 | 112 |
|
107 |
| -* **Use validate_default to instantiate default pydantic secrets** |
| 113 | +- **Use validate_default to instantiate default pydantic secrets** |
108 | 114 |
|
109 | 115 | ## 0.0.18
|
110 | 116 |
|
111 | 117 | ### Enhancements
|
112 | 118 |
|
113 |
| -* **Better destination precheck for blob storage** Write an empty file to the destination location when running fsspec-based precheck |
| 119 | +- **Better destination precheck for blob storage** Write an empty file to the destination location when running fsspec-based precheck |
114 | 120 |
|
115 | 121 | ## 0.0.17
|
116 | 122 |
|
117 | 123 | ### Fixes
|
118 | 124 |
|
119 |
| -* **Drop use of unstructued in embed** Remove remnant import from unstructured dependency in embed implementations. |
120 |
| - |
| 125 | +- **Drop use of unstructued in embed** Remove remnant import from unstructured dependency in embed implementations. |
121 | 126 |
|
122 | 127 | ## 0.0.16
|
123 | 128 |
|
124 | 129 | ### Fixes
|
125 | 130 |
|
126 |
| -* **Add constraint on pydantic** Make sure the version of pydantic being used with this repo pulls in the earliest version that introduces generic Secret, since this is used heavily. |
| 131 | +- **Add constraint on pydantic** Make sure the version of pydantic being used with this repo pulls in the earliest version that introduces generic Secret, since this is used heavily. |
127 | 132 |
|
128 | 133 | ## 0.0.15
|
129 | 134 |
|
130 | 135 | ### Fixes
|
131 | 136 |
|
132 |
| -* **Model serialization with nested models** Logic updated to properly handle serializing pydantic models that have nested configs with secret values. |
133 |
| -* **Sharepoint permission config requirement** The sharepoint connector was expecting the permission config, even though it should have been optional. |
134 |
| -* **Sharepoint CLI permission params made optional |
| 137 | +- **Model serialization with nested models** Logic updated to properly handle serializing pydantic models that have nested configs with secret values. |
| 138 | +- **Sharepoint permission config requirement** The sharepoint connector was expecting the permission config, even though it should have been optional. |
| 139 | +- \*\*Sharepoint CLI permission params made optional |
135 | 140 |
|
136 | 141 | ### Enhancements
|
137 | 142 |
|
138 |
| -* **Migrate airtable connector to v2** |
139 |
| -* **Support iteratively deleting cached content** Add a flag to delete cached content once it's no longer needed for systems that are limited in memory. |
| 143 | +- **Migrate airtable connector to v2** |
| 144 | +- **Support iteratively deleting cached content** Add a flag to delete cached content once it's no longer needed for systems that are limited in memory. |
140 | 145 |
|
141 | 146 | ## 0.0.14
|
142 | 147 |
|
143 | 148 | ### Enhancements
|
144 | 149 |
|
145 |
| -* **Support async batch uploads for pinecone connector** |
146 |
| -* **Migrate embedders** Move embedder implementations from the open source unstructured repo into this one. |
| 150 | +- **Support async batch uploads for pinecone connector** |
| 151 | +- **Migrate embedders** Move embedder implementations from the open source unstructured repo into this one. |
147 | 152 |
|
148 | 153 | ### Fixes
|
149 | 154 |
|
150 |
| -* **Misc. Onedrive connector fixes** |
| 155 | +- **Misc. Onedrive connector fixes** |
151 | 156 |
|
152 | 157 | ## 0.0.13
|
153 | 158 |
|
154 | 159 | ### Fixes
|
155 | 160 |
|
156 |
| -* **Pinecone payload size fixes** Pinecone destination now has a limited set of properties it will publish as well as dynamically handles batch size to stay under 2MB pinecone payload limit. |
| 161 | +- **Pinecone payload size fixes** Pinecone destination now has a limited set of properties it will publish as well as dynamically handles batch size to stay under 2MB pinecone payload limit. |
157 | 162 |
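One way to picture the dynamic batch sizing mentioned in the Pinecone payload fix is a generator that closes a batch before it would exceed a byte budget. This is only a sketch under stated assumptions: `batch_by_size` is a hypothetical helper, and measuring each item by its JSON-encoded length is an approximation that ignores the small overhead of the enclosing list; the connector may measure payloads differently.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# 2MB budget taken from the changelog entry; the exact accounting is assumed.
MAX_PAYLOAD_BYTES = 2 * 1024 * 1024


def batch_by_size(
    items: Iterable[Dict[str, Any]],
    max_bytes: int = MAX_PAYLOAD_BYTES,
) -> Iterator[List[Dict[str, Any]]]:
    """Group items into batches whose summed JSON size stays within max_bytes."""
    batch: List[Dict[str, Any]] = []
    size = 0
    for item in items:
        item_size = len(json.dumps(item).encode("utf-8"))
        # Close the current batch if adding this item would exceed the budget.
        if batch and size + item_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(item)
        size += item_size
    if batch:
        yield batch
```

An item that is larger than the budget on its own is still emitted as a single-item batch here; a real uploader would need to decide whether to reject or trim such records.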
|
158 | 163 | ## 0.0.12
|
159 | 164 |
|
160 | 165 | ### Enhancements
|
161 | 166 |
|
162 | 167 | ### Fixes
|
163 | 168 |
|
164 |
| -* **Fix invalid `replace()` calls in uncompress** - `replace()` calls meant to be on `str` versions of the path were instead called on `Path` causing errors with parameters. |
| 169 | +- **Fix invalid `replace()` calls in uncompress** - `replace()` calls meant to be on `str` versions of the path were instead called on `Path` causing errors with parameters. |
165 | 170 |
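The `replace()` mix-up in the uncompress fix comes from two unrelated methods sharing a name: `str.replace(old, new)` substitutes text, while `Path.replace(target)` renames a file on disk and accepts a single argument. A minimal sketch of the string-based form, using a hypothetical helper name and suffix:

```python
from pathlib import Path


def strip_archive_suffix(path: Path, suffix: str = ".tar.gz") -> Path:
    """Return the path with the archive suffix removed (hypothetical helper).

    The substitution is done on the string form of the path. Calling
    path.replace(suffix, "") directly would invoke Path.replace, which
    renames files and takes only a target argument.
    """
    return Path(str(path).replace(suffix, ""))
```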
|
166 | 171 | ## 0.0.11
|
167 | 172 |
|
168 | 173 | ### Enhancements
|
169 | 174 |
|
170 |
| -* **Fix OpenSearch connector** OpenSearch connector did not work when `http_auth` was not provided |
| 175 | +- **Fix OpenSearch connector** OpenSearch connector did not work when `http_auth` was not provided |
171 | 176 |
|
172 | 177 | ## 0.0.10
|
173 | 178 |
|
174 | 179 | ### Enhancements
|
175 | 180 |
|
176 |
| -* "Fix tar extraction" - tar extraction function assumed archive was gzip compressed which isn't true for supported `.tar` archives. Updated to work for both compressed and uncompressed tar archives. |
| 181 | +- "Fix tar extraction" - tar extraction function assumed archive was gzip compressed which isn't true for supported `.tar` archives. Updated to work for both compressed and uncompressed tar archives. |
177 | 182 |
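The tar extraction fix can be illustrated with the standard library's transparent-compression mode. This is a sketch rather than the project's implementation: `extract_tar` is a hypothetical helper, and it relies on `tarfile.open` with mode `"r:*"`, which detects whether the archive is compressed instead of assuming gzip.

```python
import tarfile
from pathlib import Path


def extract_tar(archive: Path, dest: Path) -> None:
    """Extract a tar archive whether or not it is compressed (hypothetical helper)."""
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz),
    # whereas "r:gz" would fail on a plain .tar archive.
    with tarfile.open(archive, mode="r:*") as tar:
        # Archives from untrusted sources need extra path validation first.
        tar.extractall(dest)
```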
|
178 | 183 | ## 0.0.9
|
179 | 184 |
|
180 | 185 | ### Enhancements
|
181 | 186 |
|
182 |
| -* **Chroma dict settings should allow string inputs** |
183 |
| -* **Move opensearch non-secret fields out of access config** |
184 |
| -* **Support string inputs for dict type model fields** Use the `BeforeValidator` support from pydantic to map a string value to a dict if that's provided. |
185 |
| -* **Move opensearch non-secret fields out of access config |
| 187 | +- **Chroma dict settings should allow string inputs** |
| 188 | +- **Move opensearch non-secret fields out of access config** |
| 189 | +- **Support string inputs for dict type model fields** Use the `BeforeValidator` support from pydantic to map a string value to a dict if that's provided. |
| 190 | +- \*\*Move opensearch non-secret fields out of access config |
186 | 191 |
|
187 | 192 | ### Fixes
|
188 | 193 |
|
189 |
| -**Fix uncompress logic** Use of the uncompress process wasn't being leveraged in the pipeline correctly. Updated to use the new loca download path for where the partitioned looks for the new file. |
190 |
| - |
| 194 | +**Fix uncompress logic** Use of the uncompress process wasn't being leveraged in the pipeline correctly. Updated to use the new loca download path for where the partitioned looks for the new file. |
191 | 195 |
|
192 | 196 | ## 0.0.8
|
193 | 197 |
|
194 | 198 | ### Enhancements
|
195 | 199 |
|
196 |
| -* **Add fields_to_include option for Milvus Stager** Adds support for filtering which fields will remain in the document so user can align document structure to collection schema. |
197 |
| -* **Add flatten_metadata option for Milvus Stager** Flattening metadata is now optional (enabled by default) step in processing the document. |
| 200 | +- **Add fields_to_include option for Milvus Stager** Adds support for filtering which fields will remain in the document so user can align document structure to collection schema. |
| 201 | +- **Add flatten_metadata option for Milvus Stager** Flattening metadata is now optional (enabled by default) step in processing the document. |
198 | 202 |
|
199 | 203 | ## 0.0.7
|
200 | 204 |
|
201 | 205 | ### Enhancements
|
202 | 206 |
|
203 |
| -* **support sharing parent multiprocessing for uploaders** If an uploader needs to fan out it's process using multiprocessing, support that using the parent pipeline approach rather than handling it explicitly by the connector logic. |
204 |
| -* **OTEL support** If endpoint supplied, publish all traces to an otel collector. |
| 207 | +- **support sharing parent multiprocessing for uploaders** If an uploader needs to fan out it's process using multiprocessing, support that using the parent pipeline approach rather than handling it explicitly by the connector logic. |
| 208 | +- **OTEL support** If endpoint supplied, publish all traces to an otel collector. |
205 | 209 |
|
206 | 210 | ### Fixes
|
207 | 211 |
|
208 |
| -* **Weaviate access configs access** Weaviate access config uses pydantic Secret and it needs to be resolved to the secret value when being used. This was fixed. |
209 |
| -* **unstructured-client compatibility fix** Fix an error when accessing the fields on `PartitionParameters` in the new 0.26.0 Python client. |
| 212 | +- **Weaviate access configs access** Weaviate access config uses pydantic Secret and it needs to be resolved to the secret value when being used. This was fixed. |
| 213 | +- **unstructured-client compatibility fix** Fix an error when accessing the fields on `PartitionParameters` in the new 0.26.0 Python client. |
210 | 214 |
|
211 | 215 | ## 0.0.6
|
212 | 216 |
|
213 | 217 | ### Fixes
|
214 | 218 |
|
215 |
| -* **unstructured-client compatibility fix** Update the calls to `unstructured_client.general.partition` to avoid a breaking change in the newest version. |
| 219 | +- **unstructured-client compatibility fix** Update the calls to `unstructured_client.general.partition` to avoid a breaking change in the newest version. |
216 | 220 |
|
217 | 221 | ## 0.0.5
|
218 | 222 |
|
219 | 223 | ### Enhancements
|
220 | 224 |
|
221 |
| -* **Add Couchbase Source Connector** Adds support for reading artifacts from Couchbase DB for processing in unstructured |
222 |
| -* **Drop environment from pinecone as part of v2 migration** environment is no longer required by the pinecone SDK, so that field has been removed from the ingest CLI/SDK/ |
223 |
| -* **Add KDBAI Destination Connector** Adds support for writing elements and their embeddings to KDBAI DB. |
| 225 | +- **Add Couchbase Source Connector** Adds support for reading artifacts from Couchbase DB for processing in unstructured |
| 226 | +- **Drop environment from pinecone as part of v2 migration** environment is no longer required by the pinecone SDK, so that field has been removed from the ingest CLI/SDK/ |
| 227 | +- **Add KDBAI Destination Connector** Adds support for writing elements and their embeddings to KDBAI DB. |
224 | 228 |
|
225 | 229 | ### Fixes
|
226 | 230 |
|
227 |
| -* **AstraDB connector configs** Configs had dataclass annotation removed since they're now pydantic data models. |
228 |
| -* **Local indexer recursive behavior** Local indexer was indexing directories as well as files. This was filtered out. |
| 231 | +- **AstraDB connector configs** Configs had dataclass annotation removed since they're now pydantic data models. |
| 232 | +- **Local indexer recursive behavior** Local indexer was indexing directories as well as files. This was filtered out. |
229 | 233 |
|
230 | 234 | ## 0.0.4
|
231 | 235 |
|
232 | 236 | ### Enhancements
|
233 | 237 |
|
234 |
| -* **Add Couchbase Destination Connector** Adds support for storing artifacts in Couchbase DB for Vector Search |
235 |
| -* **Leverage pydantic base models** All user-supplied configs are now derived from pydantic base models to leverage better type checking and add built in support for sensitive fields. |
236 |
| -* **Autogenerate click options from base models** Leverage the pydantic base models for all configs to autogenerate the cli options exposed when running ingest as a CLI. |
237 |
| -* **Drop required Unstructured dependency** Unstructured was moved to an extra dependency to only be imported when needed for functionality such as local partitioning/chunking. |
238 |
| -* **Rebrand Astra to Astra DB** The Astra DB integration was re-branded to be consistent with DataStax standard branding. |
| 238 | +- **Add Couchbase Destination Connector** Adds support for storing artifacts in Couchbase DB for Vector Search |
| 239 | +- **Leverage pydantic base models** All user-supplied configs are now derived from pydantic base models to leverage better type checking and add built in support for sensitive fields. |
| 240 | +- **Autogenerate click options from base models** Leverage the pydantic base models for all configs to autogenerate the cli options exposed when running ingest as a CLI. |
| 241 | +- **Drop required Unstructured dependency** Unstructured was moved to an extra dependency to only be imported when needed for functionality such as local partitioning/chunking. |
| 242 | +- **Rebrand Astra to Astra DB** The Astra DB integration was re-branded to be consistent with DataStax standard branding. |
239 | 243 |
|
240 | 244 | ## 0.0.3
|
241 | 245 |
|
242 | 246 | ### Enhancements
|
243 | 247 |
|
244 |
| -* **Improve documentation** Update the README's. |
245 |
| -* **Explicit Opensearch classes** For the connector registry entries for opensearch, use only opensearch specific classes rather than any elasticsearch ones. |
246 |
| -* **Add missing fsspec destination precheck** check connection in precheck for all fsspec-based destination connectors |
| 248 | +- **Improve documentation** Update the README's. |
| 249 | +- **Explicit Opensearch classes** For the connector registry entries for opensearch, use only opensearch specific classes rather than any elasticsearch ones. |
| 250 | +- **Add missing fsspec destination precheck** check connection in precheck for all fsspec-based destination connectors |
247 | 251 |
|
248 | 252 | ## 0.0.2
|
249 | 253 |
|
250 | 254 | ### Enhancements
|
251 | 255 |
|
252 |
| -* **Use uuid for s3 identifiers** Update unique id to use uuid derived from file path rather than the filepath itself. |
253 |
| -* **V2 connectors precheck support** All steps in the v2 pipeline support an optional precheck call, which encompasses the previous check connection functionality. |
254 |
| -* **Filter Step** Support dedicated step as part of the pipeline to filter documents. |
| 256 | +- **Use uuid for s3 identifiers** Update unique id to use uuid derived from file path rather than the filepath itself. |
| 257 | +- **V2 connectors precheck support** All steps in the v2 pipeline support an optional precheck call, which encompasses the previous check connection functionality. |
| 258 | +- **Filter Step** Support dedicated step as part of the pipeline to filter documents. |
255 | 259 |
|
256 | 260 | ## 0.0.1
|
257 | 261 |
|
258 | 262 | ### Enhancements
|
259 | 263 |
|
260 | 264 | ### Features
|
261 | 265 |
|
262 |
| -* **Add Milvus destination connector** Adds support storing artifacts in Milvus vector database. |
| 266 | +- **Add Milvus destination connector** Adds support storing artifacts in Milvus vector database. |
263 | 267 |
|
264 | 268 | ### Fixes
|
265 | 269 |
|
266 |
| -* **Remove old repo references** Any mention of the repo this project came from was removed. |
| 270 | +- **Remove old repo references** Any mention of the repo this project came from was removed. |
267 | 271 |
|
268 | 272 | ## 0.0.0
|
269 | 273 |
|
270 | 274 | ### Features
|
271 | 275 |
|
272 |
| -* **Initial Migration** Create the structure of this repo from the original code in the [Unstructured](https://github.com/Unstructured-IO/unstructured) project. |
| 276 | +- **Initial Migration** Create the structure of this repo from the original code in the [Unstructured](https://github.com/Unstructured-IO/unstructured) project. |
273 | 277 |
|
274 | 278 | ### Fixes
|