Commit 80c0fab

build: new release (#249)
Cut a release that has the unstructured-ingest command line included in the unstructured package. Bonus tweak to the Ingest checklist.
1 parent 60abac2 commit 80c0fab

3 files changed, +6 -2 lines changed
Diff for: CHANGELOG.md

+1 -1
@@ -1,4 +1,4 @@
-## 0.4.12-dev2
+## 0.4.12
 
 * Adds console_entrypoint for unstructured-ingest, other structure/doc updates related to ingest.
 * Add `parser` parameter to `partition_html`.

Diff for: Ingest.md

+4
@@ -61,6 +61,10 @@ In checklist form, the above steps are summarized as:
 - [ ] Add a script test_unstructured_ingest/test-ingest-\<the-new-data-source\>.sh. It's json output files should have a total of no more than 100K.
 - [ ] Git add the expected outputs under test_unstructured_ingest/expected-structured-output/\<folder-name-relevant-to-your-dataset\> so the above test passes in CI.
 - [ ] Add a line to [test_unstructured_ingest/test-ingest.sh](test_unstructured_ingest/test-ingest.sh) invoking the new test script.
+- [ ] If additional python dependencies are needed for the new connector:
+  - [ ] Add them as an extra to [setup.py](unstructured/setup.py).
+  - [ ] Update the Makefile, adding a target for `install-ingest-<name>` and adding another `pip-compile` line to the `pip-compile` make target. See [this commit](https://github.com/Unstructured-IO/unstructured/commit/ab542ca3c6274f96b431142262d47d727f309e37) for a reference.
+  - [ ] The added dependencies should be imported at runtime when the new connector is invoked, rather than as top-level imports.
 - [ ] Honors the conventions of `BaseConnectorConfig` defined in [unstructured/ingest/interfaces.py](unstructured/ingest/interfaces.py) which is passed through [the CLI](unstructured/ingest/main.py):
   - [ ] If running with an `.output_dir` where structured outputs already exists for a given file, the file content is not re-downloaded from the data source nor is it reprocessed. This is made possible by implementing the call to `MyIngestDoc.has_output()` which is invoked in [MainProcess._filter_docs_with_outputs](ingest-prep-for-many/unstructured/ingest/main.py).
   - [ ] Unless `.reprocess` is `True`, then documents are always reprocessed.
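
Reviewer note: the new checklist item about importing connector dependencies at runtime could look roughly like the sketch below. This is an illustration only, not code from this commit or from the repository. `require_dependency` and `HypotheticalConnector` are made-up names, and the extra name passed in is a placeholder for whatever extra gets added to setup.py.

```python
import importlib


def require_dependency(module_name: str, extra: str):
    """Import an optional dependency only at the point it is needed.

    `extra` is the (hypothetical) name of the pip extra that provides it.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Point the user at the extra instead of failing with a bare ImportError.
        raise ImportError(
            f"{module_name} is required for this connector. "
            f"Install it with: pip install 'unstructured[{extra}]'"
        ) from exc


class HypotheticalConnector:
    """Illustrative connector; not part of unstructured."""

    def __init__(self, module_name: str, extra: str):
        self.module_name = module_name
        self.extra = extra
        self.sdk = None

    def initialize(self):
        # Deferred import: runs only when this connector is invoked, so users
        # of other connectors never need this dependency installed.
        self.sdk = require_dependency(self.module_name, self.extra)
        return self.sdk
```

Because nothing is imported at module top level, importing the connector module itself stays cheap and cannot fail on a missing optional package.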

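Reviewer note: the `has_output()` convention in the checklist can be sketched as below. This is a simplified stand-in under stated assumptions, not the actual `MainProcess._filter_docs_with_outputs` implementation. `SketchIngestDoc`, `output_path`, and the `<name>.json` layout are hypothetical, and the sketch assumes that `reprocess=True` forces every document to be processed again.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class SketchIngestDoc:
    """Hypothetical stand-in for a connector's ingest doc class."""

    remote_name: str
    output_dir: Path

    def output_path(self) -> Path:
        # Assumed layout: one JSON file per document in the output directory.
        return self.output_dir / f"{self.remote_name}.json"

    def has_output(self) -> bool:
        # True when a non-empty structured output already exists for this doc.
        path = self.output_path()
        return path.is_file() and path.stat().st_size > 0


def filter_docs_with_outputs(
    docs: List[SketchIngestDoc], reprocess: bool
) -> List[SketchIngestDoc]:
    """Return the docs that still need to be downloaded and processed."""
    if reprocess:
        # Assumption: reprocess means "ignore existing outputs".
        return list(docs)
    return [doc for doc in docs if not doc.has_output()]
```

Filtering before any download happens is what keeps a rerun against the same output directory from fetching or processing documents twice.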
Diff for: unstructured/__version__.py

+1 -1
@@ -1 +1 @@
-__version__ = "0.4.12-dev2" # pragma: no cover
+__version__ = "0.4.12" # pragma: no cover