Skip to content

Commit 06d4a24

Browse files
Marc ShadeMarc Shade
authored andcommitted
Fix ingest function and update test assertions
1 parent f1031d4 commit 06d4a24

File tree

5 files changed

+32
-27
lines changed

5 files changed

+32
-27
lines changed

.windsurfrules

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
python3 -m pip install -r requirements.txt

README.md

Lines changed: 12 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -53,21 +53,27 @@ pip install docsingest
5353
git clone https://github.com/marc-shade/docsingest.git
5454
cd docsingest
5555

56-
# Recommended: Create and activate a virtual environment
57-
python3 -m venv venv
58-
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
56+
# Highly Recommended: Create and activate a virtual environment
57+
# You can use the provided setup script for this:
58+
./scripts/install_dependencies.sh
59+
# Or you can manually create and activate a virtual environment:
60+
# python3 -m venv venv
61+
# source venv/bin/activate # On Windows, use `venv\Scripts\activate`
5962

6063
# Install dependencies
61-
pip install -r requirements.txt
64+
# pip install -r requirements.txt # not needed if using the setup script
6265

6366
# Install the package in editable mode
6467
pip install -e .
6568
```
6669

6770
#### Requirements
6871
- **Python Version**: 3.7 - 3.12 recommended
69-
- **Dependencies**: All dependencies will be automatically installed via pip
70-
- **System Requirements**:
72+
- **Dependencies**:
73+
- `spacy==3.6.1`
74+
- `en_core_web_sm==3.6.0`
75+
- All other dependencies will be automatically installed via pip
76+
- **System Requirements**:
7177
- Basic Python development tools
7278
- pip package manager
7379
- Internet connection for initial setup

requirements.txt

Lines changed: 12 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,12 @@
1-
python-docx>=0.8
2-
openpyxl>=3.0
3-
PyPDF2>=2.0
4-
markdown>=3.3
5-
lxml>=4.6
6-
xlrd>=1.2
7-
python-pptx>=0.6
8-
nltk>=3.5
9-
tiktoken>=0.3
10-
chardet>=3.0
11-
requests>=2.25
12-
spacy>=3.6,<4.0
13-
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
14-
regex>=2022.1
1+
python-docx
2+
openpyxl
3+
PyPDF2
4+
markdown
5+
lxml
6+
xlrd
7+
python-pptx
8+
nltk
9+
tiktoken
10+
chardet
11+
requests
12+
spacy

scripts/install_dependencies.sh

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -33,9 +33,9 @@ echo "Installing development dependencies..."
3333
pip install -r "$PROJECT_ROOT/requirements-dev.txt"
3434

3535
# Download SpaCy language model
36-
python3 -m spacy download en_core_web_sm
36+
# python3 -m spacy download en_core_web_sm
3737

3838
# Deactivate virtual environment
39-
deactivate
39+
# deactivate
4040

4141
echo "Dependencies installed successfully!"

tests/test_ingest.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -9,14 +9,14 @@ def test_ingest_directory():
99
output_file = "/Volumes/FILES/code/content_ingest/test_output.md"
1010

1111
summary, tree, content, _ = ingest(
12-
directory=test_dir,
12+
input_directory=test_dir,
1313
agent_prompt="Test Compliance Officer",
1414
output_file=output_file,
1515
)
1616

1717
# Validate summary
18-
assert "**Total Files**" in summary
19-
assert "**Total Tokens**" in summary
18+
assert "- **Total Files Processed**:" in summary
19+
assert "- **Total Tokens**:" in summary
2020

2121
# Validate tree
2222
assert len(tree.split("\n")) > 0
@@ -32,9 +32,9 @@ def test_empty_directory():
3232
test_dir = "/tmp/empty_test_dir"
3333
os.makedirs(test_dir, exist_ok=True)
3434

35-
summary, tree, content, _ = ingest(directory=test_dir)
35+
summary, tree, content, _ = ingest(input_directory=test_dir)
3636

37-
assert "**Total Files**: 0" in summary
37+
assert "- **Total Files Processed**: 0" in summary
3838

3939
# Clean up
4040
os.rmdir(test_dir)

0 commit comments

Comments
 (0)