Commit d441ba4

Updated README.md with instructions for Windows setup.
1 parent 6602c14 commit d441ba4

1 file changed: +51 -0 lines changed

README.md

Lines changed: 51 additions & 0 deletions
@@ -37,6 +37,57 @@ Install ExtractThinker using pip:
pip install extract_thinker
```

## 🪟 Windows Setup Guide

If you're using Windows, follow these additional steps to set up the required dependencies:

### 1. Install Tesseract OCR
1. Download the Tesseract installer from [UB Mannheim's GitHub repository](https://github.com/UB-Mannheim/tesseract/wiki).
2. Choose the appropriate installer:
   - For 64-bit Windows: `tesseract-ocr-w64-setup-xxx.exe`
   - For 32-bit Windows: `tesseract-ocr-w32-setup-xxx.exe`
3. During installation:
   - Choose the default installation path (`C:\Program Files\Tesseract-OCR`)
   - **Important**: Check the box for "Add to system PATH". If your installer version does not offer this option, add the installation directory to your PATH manually after installation.
   - Complete the installation
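
To confirm that the installation directory actually ended up on PATH, a small check along the following lines may help. This is only a sketch: the helper name `dir_on_path` is hypothetical, and the directory is the default installation path named above, so adjust it if you installed Tesseract elsewhere.

```python
import ntpath
import os

# Default install directory from the steps above (change it if you chose another path).
DEFAULT_TESSERACT_DIR = r"C:\Program Files\Tesseract-OCR"


def dir_on_path(directory, path_value, sep=";"):
    """Return True if `directory` is one of the entries of a Windows-style PATH string."""

    def normalize(entry):
        # Drop surrounding quotes and trailing backslashes, and compare
        # case-insensitively, since Windows paths are not case-sensitive.
        return ntpath.normcase(entry.strip().strip('"')).rstrip("\\")

    wanted = normalize(directory)
    return any(
        normalize(entry) == wanted
        for entry in path_value.split(sep)
        if entry.strip()
    )


if __name__ == "__main__":
    on_path = dir_on_path(DEFAULT_TESSERACT_DIR, os.environ.get("PATH", ""))
    print("Tesseract directory on PATH:", on_path)
```

Run it from a new terminal, since a terminal that was already open will not see PATH changes made by the installer.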

### 2. Set Up Environment Variables
1. Create a `.env` file in your project's root directory (same level as `pyproject.toml`).
2. Add the following lines to your `.env` file:
   ```
   TESSERACT_PATH='C:\Program Files\Tesseract-OCR\tesseract.exe'
   OPENAI_API_KEY="your-openai-api-key-here"
   ```
   The path is single-quoted so that the backslashes stay literal: some `.env` loaders, such as python-dotenv, expand escape sequences like `\t` inside double quotes.
3. Replace `your-openai-api-key-here` with your actual OpenAI API key.
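
Once the `.env` values are loaded, a quick sanity check can catch the most common mistakes before running anything that calls OCR or the OpenAI API. The sketch below assumes the two variables are already present in the process environment (how they get there depends on the loader your project uses); `check_env` is a hypothetical helper, not part of ExtractThinker.

```python
import os


def check_env(env):
    """Return a list of problems found in the two variables from the `.env` file."""
    problems = []

    tesseract_path = env.get("TESSERACT_PATH", "")
    if not tesseract_path:
        problems.append("TESSERACT_PATH is not set")
    elif "\t" in tesseract_path:
        # A tab usually means the "\t" in "\tesseract.exe" was read as an escape sequence.
        problems.append("TESSERACT_PATH contains a tab character; check the quoting in .env")
    elif not os.path.isfile(tesseract_path):
        problems.append(f"TESSERACT_PATH does not point to a file: {tesseract_path}")

    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key or api_key == "your-openai-api-key-here":
        problems.append("OPENAI_API_KEY is missing or still set to the placeholder")

    return problems


if __name__ == "__main__":
    # Assumes the .env values were already loaded into the process environment.
    for problem in check_env(os.environ) or ["No problems found"]:
        print(problem)
```

Passing a mapping instead of reading `os.environ` inside the function keeps the check easy to try with a plain dictionary.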

### 3. Verify Installation
1. Open a new PowerShell window (PATH changes only take effect in a new terminal session).
2. Verify the Tesseract installation:
   ```powershell
   where.exe tesseract
   ```
3. You should see the path to your Tesseract executable.
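
The same check can be done from Python, which is handy because it runs in the interpreter your project will actually use. This is a minimal sketch using only the standard library; the function names are illustrative.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the full path of the `tesseract` executable found on PATH, or None."""
    return shutil.which("tesseract")


def tesseract_version(executable):
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    # Some builds print the version banner to stderr instead of stdout.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH")
    else:
        print(path)
        print(tesseract_version(path))
```

If `find_tesseract()` returns `None` while `where.exe tesseract` succeeds, the Python process was probably started from a terminal opened before PATH was updated.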

### 4. Running Examples
1. Make sure you're in your project's virtual environment:
   ```powershell
   .\.venv\Scripts\Activate.ps1
   ```
2. If you get a security error about running scripts, allow them for the current PowerShell session and then activate again:
   ```powershell
   Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process
   ```
3. Run your example script:
   ```powershell
   python examples/extractor_basic.py
   ```

### Common Issues and Solutions
- **Tesseract not found**: Make sure Tesseract is installed and its installation directory is on your PATH, then open a new terminal.
- **Environment variables not loading**: Ensure your `.env` file is in the project's root directory and uses one `KEY=value` pair per line.
- **Script execution policy**: If you get a security error, use the `Set-ExecutionPolicy` command shown above.
- **Virtual environment issues**: Make sure you're using the correct Python version (3.9+) and have activated the virtual environment.
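
For the last item, the interpreter itself can report whether it meets the version requirement and whether it is running inside a virtual environment. The sketch below assumes the environment was created with the standard `venv` module (as the `.venv` path above suggests); the function names are hypothetical.

```python
import sys

MIN_PYTHON = (3, 9)  # minimum version named in the list above


def python_ok(version_info=sys.version_info):
    """Return True if the given version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def in_virtualenv():
    """Return True when running inside a virtual environment created by `venv`."""
    # Inside a venv, sys.prefix points at the environment and
    # sys.base_prefix at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter:", sys.executable)
```

Printing `sys.executable` shows which Python is really being run, which helps when several installations are present on the same machine.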

## 🛠️ Usage

### Basic Extraction Example
