Commit 55fb7ad

feat: Add ocr command to extract text from images (#13)
1 parent a8dd129 commit 55fb7ad

6 files changed: +99 -878 lines changed

README.md

Lines changed: 27 additions & 1 deletion
@@ -31,6 +31,7 @@
 
 #### Currently Available
 - **Images** → PDF (supports PNG, JPG)
+- **Images** → Text (TXT) - Extract text from images using OCR
 - **PDF** → Word (DOCX)
 - **PDF** → PowerPoint (PPTX)
 - **PDF** → Images (JPG)
@@ -46,6 +47,15 @@
 - PDF text extraction
 
 ## Installation
+
+### Prerequisites
+
+#### Tesseract OCR Installation
+Filto uses Tesseract OCR for text extraction from images. You'll need to install Tesseract on your system first.
+
+> For more detailed installation instructions and additional language packs, please refer to the [Tesseract documentation](https://github.com/tesseract-ocr/tessdoc/blob/main/Installation.md).
+
+---
 ### Quick Install
 Install Filto using npm:
 
@@ -159,7 +169,23 @@ filto extract input.pdf -o extracted.txt
 filto extract input.pdf --page 1 --output page1.txt
 ```
 ---
-### 5. Merge PDF Files
+
+### 5. Extract Text from Images (OCR)
+Extract text from images using OCR. Supports multiple languages.
+
+```bash
+# Basic usage
+filto ocr image.png -l eng -o output.txt
+```
+#### Specify language (e.g., for Arabic text)
+```
+filto ocr documento.png -l ara -o texto_extraido.txt
+```
+
+**Note:** The `-l` or `--language` flag is required. Common language codes: `eng` (English), `ara` (Arabic), `fra` (French), `deu` (German), etc.
+
+---
+### 6. Merge PDF Files
 Merge multiple PDF files into a single PDF
 
 **Basic usage:**