Commit c82bad1

build(deps): avoid version conflicts (#636)
Addresses #631.

* Uses constraints to keep dependency versions more consistent.
* Moves all dependencies to .in files which are then ingested by setup.py.
* Adds script to check consistency of all extras.
* Adds consistency check to CI.

I should note that while it shouldn't be possible to cause a conflict between base.txt and any of the extras (because base.txt constrains all the extras), it is possible to get a conflict between two of the extras files. There are ways of trying to avoid that (like constraining each file by all the files that have already been processed before it in the order given in the make pip-compile target), but the ones I could think of seemed a little overwrought and come with problems of their own.

If a conflict arises, it should be flagged by CI or locally with make check-deps. When/if that happens, you can resolve the conflict by adding appropriate global constraints in requirements/constraints.in.

Also note that if fileA.in is constrained by fileB.txt, then fileB.in should be compiled before fileA.in in the make pip-compile target. Otherwise fileA.in will be compiled with the old version of fileB.txt, which can cause conflicts or keep dependencies from being updated properly.
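
The ordering rule at the end of the commit message (compile fileB.in before any fileA.in that is constrained by fileB.txt) can be derived mechanically from the `-c` lines. The following is a minimal sketch and not part of this commit: the function names are illustrative, the contents of `test.in` are a hypothetical stand-in (only `dev.in` and `base.in` appear in the diff), and `graphlib` requires Python 3.9 or later.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Matches constraint lines such as `-c base.txt` or `-c "constraints.in"`.
CONSTRAINT_RE = re.compile(r'^\s*-c\s+"?([^"\s]+)"?\s*$')


def constraint_targets(in_text):
    """Return the file names referenced by `-c` lines in one .in file."""
    targets = []
    for line in in_text.splitlines():
        match = CONSTRAINT_RE.match(line)
        if match:
            targets.append(match.group(1))
    return targets


def compile_order(in_files):
    """Order .in files so every .txt used as a constraint is compiled first.

    `in_files` maps a file name (e.g. "dev.in") to its text.
    """
    sorter = TopologicalSorter()
    for name, text in in_files.items():
        prerequisites = []
        for target in constraint_targets(text):
            # Only compiled outputs (.txt) impose an ordering;
            # constraints.in is hand-written and never compiled.
            if target.endswith(".txt"):
                source = target[:-4] + ".in"
                if source in in_files:
                    prerequisites.append(source)
        sorter.add(name, *prerequisites)
    return list(sorter.static_order())


example = {
    # dev.in as it appears in this commit (package lines shortened).
    "dev.in": "-c constraints.in\n-c base.txt\n-c test.txt\njupyter\n",
    # Hypothetical test.in: its real contents are not shown in the diff.
    "test.in": "-c constraints.in\n-c base.txt\npytest\n",
    "base.in": '-c "constraints.in"\nrequests\n',
}
order = compile_order(example)
print(order)  # ['base.in', 'test.in', 'dev.in']
```

Under these assumptions the result matches the reordering in the Makefile, where `test.in` is now compiled before `dev.in`.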
1 parent a1fed6d commit c82bad1

39 files changed, +557 -2108 lines changed

Diff for: .github/workflows/ci.yml

+30
@@ -38,6 +38,36 @@ jobs:
         source .venv/bin/activate
         make install-ci
 
+  check-deps:
+    strategy:
+      matrix:
+        python-version: ["3.8","3.9","3.10"]
+    runs-on: ubuntu-latest
+    needs: setup
+    steps:
+    - uses: actions/checkout@v3
+    - uses: actions/cache@v3
+      id: virtualenv-cache
+      with:
+        path: .venv
+        key: unstructured-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('requirements/*.txt') }}
+    # NOTE(robinson) - This is a fallback in case the lint job does not find the cache.
+    # We can take this out when we implement the fix in CORE-99
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v4
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Setup virtual environment (no cache hit)
+      if: steps.virtualenv-cache.outputs.cache-hit != 'true'
+      run: |
+        python${{ matrix.python-version }} -m venv .venv
+        source .venv/bin/activate
+        make install-base-pip-packages
+    - name: Check for dependency conflicts
+      run: |
+        source .venv/bin/activate
+        make check-deps
+
   lint:
     strategy:
       matrix:

Diff for: CHANGELOG.md

+2-1
@@ -1,8 +1,9 @@
-## 0.6.9-dev2
+## 0.6.9
 
 ### Enhancements
 
 * fast strategy for pdf now keeps element bounding box data
+* setup.py refactor
 
 ### Features
 

Diff for: MANIFEST.in

+12
@@ -0,0 +1,12 @@
+include requirements/base.in
+include requirements/huggingface.in
+include requirements/local-inference.in
+include requirements/ingest-s3.in
+include requirements/ingest-azure.in
+include requirements/ingest-discord.in
+include requirements/ingest-github.in
+include requirements/ingest-gitlab.in
+include requirements/ingest-reddit.in
+include requirements/ingest-slack.in
+include requirements/ingest-wikipedia.in
+include requirements/ingest-google-drive.in
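
The commit message says the .in files are "ingested by setup.py", and MANIFEST.in presumably ships them so they are available when the package is built from a source distribution. The setup.py change is not part of this excerpt, so the following is only a sketch of how such ingestion could look. `load_requirements` is a hypothetical helper name, and the comment handling mirrors pip's convention that `#` starts a comment only at the beginning of a line or after whitespace.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# `#` begins a comment at the start of a line or after whitespace.
COMMENT_RE = re.compile(r"(^|\s+)#.*$")


def load_requirements(path: str) -> List[str]:
    """Read requirement specifiers from a .in file (hypothetical helper).

    Blank lines, comments, and pip options such as `-c constraints.in`
    are skipped, because setuptools only accepts requirement strings.
    """
    requirements = []
    for raw in Path(path).read_text().splitlines():
        line = COMMENT_RE.sub("", raw).strip()
        if not line or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements


# Demonstration against a temporary file shaped like requirements/base.in.
with tempfile.TemporaryDirectory() as tmp:
    in_file = Path(tmp) / "base.in"
    in_file.write_text(
        '-c "constraints.in"\nargilla\nchardet  # encoding detection\n\nlxml\n'
    )
    base_requirements = load_requirements(str(in_file))

print(base_requirements)  # ['argilla', 'chardet', 'lxml']
```

In a setup.py, a list produced this way could be passed as `install_requires`, with one such list per extra in `extras_require`. Whether the actual setup.py does exactly this is an assumption.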

Diff for: Makefile

+18-13
@@ -108,28 +108,28 @@ install-local-inference: install install-unstructured-inference install-detectro
 ## pip-compile: compiles all base/dev/test requirements
 .PHONY: pip-compile
 pip-compile:
-	pip-compile --upgrade -o requirements/base.txt
+	pip-compile --upgrade requirements/base.in
 	# Extra requirements for huggingface staging functions
-	pip-compile --upgrade --extra huggingface -o requirements/huggingface.txt
+	pip-compile --upgrade requirements/huggingface.in
 	# NOTE(robinson) - We want the dependencies for detectron2 in the requirements.txt, but not
 	# the detectron2 repo itself. If detectron2 is in the requirements.txt file, an order of
 	# operations issue related to the torch library causes the install to fail
-	pip-compile --upgrade requirements/dev.in
 	pip-compile --upgrade requirements/test.in
+	pip-compile --upgrade requirements/dev.in
 	pip-compile --upgrade requirements/build.in
-	pip-compile --upgrade --extra local-inference -o requirements/local-inference.txt
+	pip-compile --upgrade requirements/local-inference.in
 	# NOTE(robinson) - doc/requirements.txt is where the GitHub action for building
 	# sphinx docs looks for additional requirements
 	cp requirements/build.txt docs/requirements.txt
-	pip-compile --upgrade --extra=s3 --output-file=requirements/ingest-s3.txt requirements/base.txt setup.py
-	pip-compile --upgrade --extra=azure --output-file=requirements/ingest-azure.txt requirements/base.txt setup.py
-	pip-compile --upgrade --extra=discord --output-file=requirements/ingest-azure.txt requirements/base.txt setup.py
-	pip-compile --upgrade --extra=reddit --output-file=requirements/ingest-reddit.txt requirements/base.txt setup.py
-	pip-compile --upgrade --extra=github --output-file=requirements/ingest-github.txt requirements/base.txt setup.py
-	pip-compile --upgrade --extra=gitlab --output-file=requirements/ingest-gitlab.txt requirements/base.txt setup.py
-	pip-compile --upgrade --extra=slack --output-file=requirements/ingest-slack.txt requirements/base.txt setup.py
-	pip-compile --upgrade --extra=wikipedia --output-file=requirements/ingest-wikipedia.txt requirements/base.txt setup.py
-	pip-compile --upgrade --extra=google-drive --output-file=requirements/ingest-google-drive.txt requirements/base.txt setup.py
+	pip-compile --upgrade requirements/ingest-s3.in
+	pip-compile --upgrade requirements/ingest-azure.in
+	pip-compile --upgrade requirements/ingest-discord.in
+	pip-compile --upgrade requirements/ingest-reddit.in
+	pip-compile --upgrade requirements/ingest-github.in
+	pip-compile --upgrade requirements/ingest-gitlab.in
+	pip-compile --upgrade requirements/ingest-slack.in
+	pip-compile --upgrade requirements/ingest-wikipedia.in
+	pip-compile --upgrade requirements/ingest-google-drive.in
 
 ## install-project-local: install unstructured into your local python environment
 .PHONY: install-project-local
@@ -198,6 +198,11 @@ version-sync:
 check-coverage:
 	coverage report --fail-under=95
 
+## check-deps: check consistency of dependencies
+.PHONY: check-deps
+check-deps:
+	scripts/consistent-deps.sh
+
 ##########
 # Docker #
 ##########
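
The `check-deps` target delegates to scripts/consistent-deps.sh, whose contents are not shown in this excerpt. As an illustration of the kind of conflict the commit message describes (two extras files pinning the same package differently), here is a minimal sketch in Python. It is not the script from this commit: the file names `extra-a.txt` and `extra-b.txt` and all function names are hypothetical, and it only compares exact `==` pins in already-compiled files.

```python
import re
from collections import defaultdict

# Matches compiled pins such as `urllib3==1.26.16` or `rfc3986[idna2008]==1.5.0`.
PIN_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?==([^\s;]+)")


def normalize(name):
    """Normalize a project name so `msg_parser` and `msg-parser` compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text):
    """Return {package: version} for every `name==version` line in a .txt file."""
    pins = {}
    for line in text.splitlines():
        match = PIN_RE.match(line.strip())
        if match:
            pins[normalize(match.group(1))] = match.group(2)
    return pins


def find_conflicts(files):
    """Return packages pinned to more than one version across the given files.

    `files` maps a file name to its text; the result maps each conflicting
    package to {version: [file names that pin it]}.
    """
    seen = defaultdict(dict)
    for file_name, text in files.items():
        for package, version in parse_pins(text).items():
            seen[package].setdefault(version, []).append(file_name)
    return {pkg: versions for pkg, versions in seen.items() if len(versions) > 1}


# Two hypothetical extras files that disagree about urllib3.
EXTRA_A = "requests==2.31.0\n    # via -r requirements/extra-a.in\nurllib3==1.26.16\n    # via requests\n"
EXTRA_B = "requests==2.31.0\nurllib3==2.0.2\n    # via requests\n"

conflicts = find_conflicts({"extra-a.txt": EXTRA_A, "extra-b.txt": EXTRA_B})
print(conflicts)  # {'urllib3': {'1.26.16': ['extra-a.txt'], '2.0.2': ['extra-b.txt']}}
```

A conflict reported this way would be resolved as the commit message suggests, by adding a global constraint and recompiling.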

Diff for: docs/requirements.txt

+3-3
@@ -10,7 +10,7 @@ babel==2.12.1
     # via sphinx
 beautifulsoup4==4.12.2
     # via furo
-certifi==2022.12.7
+certifi==2023.5.7
     # via
     #   -r requirements/build.in
     #   requests
@@ -20,7 +20,7 @@ docutils==0.18.1
     # via
     #   sphinx
     #   sphinx-rtd-theme
-furo==2023.3.27
+furo==2023.5.20
     # via -r requirements/build.in
 idna==3.4
     # via requests
@@ -40,7 +40,7 @@ pygments==2.15.1
     #   sphinx
 pytz==2023.3
     # via babel
-requests==2.30.0
+requests==2.31.0
     # via sphinx
 snowballstemmer==2.2.0
     # via sphinx

Diff for: requirements/base.in

+16
@@ -0,0 +1,16 @@
+-c "constraints.in"
+argilla
+chardet
+lxml
+msg_parser
+nltk
+openpyxl
+pandas
+pdfminer.six
+pillow
+pypandoc
+python-docx
+python-pptx
+python-magic
+markdown
+requests

Diff for: requirements/base.txt

+32-25
@@ -2,20 +2,20 @@
 # This file is autogenerated by pip-compile with Python 3.8
 # by the following command:
 #
-#    pip-compile --output-file=requirements/base.txt
+#    pip-compile requirements/base.in
 #
 anyio==3.6.2
     # via httpcore
-argilla==1.6.0
-    # via unstructured (setup.py)
+argilla==1.7.0
+    # via -r requirements/base.in
 backoff==2.2.1
     # via argilla
-certifi==2022.12.7
+certifi==2023.5.7
     # via
+    #   -c requirements/constraints.in
     #   httpcore
     #   httpx
     #   requests
-    #   unstructured (setup.py)
 cffi==1.15.1
     # via cryptography
 chardet==5.1.0
@@ -25,7 +25,9 @@ charset-normalizer==3.1.0
     #   pdfminer-six
     #   requests
 click==8.1.3
-    # via nltk
+    # via
+    #   nltk
+    #   typer
 commonmark==0.9.1
     # via rich
 cryptography==40.0.2
@@ -51,59 +53,59 @@ joblib==1.2.0
     # via nltk
 lxml==4.9.2
     # via
+    #   -r requirements/base.in
     #   python-docx
     #   python-pptx
-    #   unstructured (setup.py)
 markdown==3.4.3
-    # via unstructured (setup.py)
+    # via -r requirements/base.in
 monotonic==1.6
     # via argilla
 msg-parser==1.2.0
-    # via unstructured (setup.py)
+    # via -r requirements/base.in
 nltk==3.8.1
-    # via unstructured (setup.py)
+    # via -r requirements/base.in
 numpy==1.23.5
     # via
     #   argilla
     #   pandas
 olefile==0.46
     # via msg-parser
 openpyxl==3.1.2
-    # via unstructured (setup.py)
+    # via -r requirements/base.in
 packaging==23.1
     # via argilla
 pandas==1.5.3
     # via
+    #   -r requirements/base.in
     #   argilla
-    #   unstructured (setup.py)
 pdfminer-six==20221105
-    # via unstructured (setup.py)
+    # via -r requirements/base.in
 pillow==9.5.0
     # via
+    #   -r requirements/base.in
     #   python-pptx
-    #   unstructured (setup.py)
 pycparser==2.21
     # via cffi
-pydantic==1.10.7
+pydantic==1.10.8
     # via argilla
 pygments==2.15.1
     # via rich
 pypandoc==1.11
-    # via unstructured (setup.py)
+    # via -r requirements/base.in
 python-dateutil==2.8.2
     # via pandas
 python-docx==0.8.11
-    # via unstructured (setup.py)
+    # via -r requirements/base.in
 python-magic==0.4.27
-    # via unstructured (setup.py)
+    # via -r requirements/base.in
 python-pptx==0.6.21
-    # via unstructured (setup.py)
+    # via -r requirements/base.in
 pytz==2023.3
     # via pandas
 regex==2023.5.5
     # via nltk
-requests==2.30.0
-    # via unstructured (setup.py)
+requests==2.31.0
+    # via -r requirements/base.in
 rfc3986[idna2008]==1.5.0
     # via httpx
 rich==13.0.1
@@ -119,17 +121,22 @@ tqdm==4.65.0
     # via
     #   argilla
     #   nltk
-typing-extensions==4.5.0
+typer==0.9.0
+    # via argilla
+typing-extensions==4.6.0
     # via
     #   pydantic
     #   rich
-urllib3==2.0.2
-    # via requests
+    #   typer
+urllib3==1.26.16
+    # via
+    #   -c requirements/constraints.in
+    #   requests
 wrapt==1.14.1
     # via
     #   argilla
     #   deprecated
-xlsxwriter==3.1.0
+xlsxwriter==3.1.1
     # via python-pptx
 zipp==3.15.0
     # via importlib-metadata

Diff for: requirements/build.txt

+3-3
@@ -10,7 +10,7 @@ babel==2.12.1
     # via sphinx
 beautifulsoup4==4.12.2
     # via furo
-certifi==2022.12.7
+certifi==2023.5.7
     # via
     #   -r requirements/build.in
     #   requests
@@ -20,7 +20,7 @@ docutils==0.18.1
     # via
     #   sphinx
     #   sphinx-rtd-theme
-furo==2023.3.27
+furo==2023.5.20
     # via -r requirements/build.in
 idna==3.4
     # via requests
@@ -40,7 +40,7 @@ pygments==2.15.1
     #   sphinx
 pytz==2023.3
     # via babel
-requests==2.30.0
+requests==2.31.0
     # via sphinx
 snowballstemmer==2.2.0
     # via sphinx

Diff for: requirements/cache.txt

+1-1
@@ -1 +1 @@
-a
+# a

Diff for: requirements/constraints.in

+15
@@ -0,0 +1,15 @@
+####################################################################################################
+# This file can house global constraints that aren't *direct* requirements of the package or any
+# extras. Putting a dependency here will only affect dependency sets that contain them -- in other
+# words, if something does not require a constraint, it will not be installed.
+####################################################################################################
+# NOTE(alan): Pinning to avoid conflicts with downstream ingest-s3
+urllib3<1.27, >=1.25.4
+# consistency with local-inference-pin
+protobuf<3.21
+# NOTE(robinson) - Required pins for security scans
+jupyter-core>=4.11.2
+wheel>=0.38.1
+# NOTE(robinson) - The following pins are to address
+# vulnerabilities in dependency scans
+certifi>=2022.12.07
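
The effect of these constraints is visible in requirements/base.txt, where urllib3 moves from 2.0.2 to 1.26.16 and gains a `-c requirements/constraints.in` annotation. The sketch below shows why, using a deliberately simplified version comparison. It is an illustration only: pip's resolver applies full PEP 440 semantics, while this code handles just purely numeric dotted versions and the six comparison operators, and the function names are hypothetical.

```python
import operator
import re

OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}

# One clause of a specifier, e.g. `<1.27` or `>=1.25.4` (numeric versions only).
CLAUSE_RE = re.compile(r"^(<=|>=|==|!=|<|>)\s*([0-9]+(?:\.[0-9]+)*)$")


def version_key(version, width=4):
    """Turn `1.26.16` into (1, 26, 16, 0) so versions compare as tuples."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause."""
    for clause in specifier.split(","):
        match = CLAUSE_RE.match(clause.strip())
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not OPERATORS[op](version_key(version), version_key(bound)):
            return False
    return True


URLLIB3_CONSTRAINT = "<1.27, >=1.25.4"  # from requirements/constraints.in

print(satisfies("2.0.2", URLLIB3_CONSTRAINT))    # False: the old pin is excluded
print(satisfies("1.26.16", URLLIB3_CONSTRAINT))  # True: the new pin is allowed
print(satisfies("2023.5.7", ">=2022.12.07"))     # True: the certifi pin is allowed
```

As the header comment in constraints.in notes, a constraint only narrows the versions of packages that something already requires. It never adds a package to the resolved set.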

Diff for: requirements/dev.in

+3-3
@@ -1,7 +1,7 @@
+-c constraints.in
+-c base.txt
+-c test.txt
 jupyter
 ipython
 pip-tools
 pre-commit
-# NOTE(robinson) - Required pins for security scans
-jupyter-core>=4.11.2
-wheel>=0.38.1
