Commit 3cab674

SITE-5518 update tika 3 documentation (#9890)
* SITE-5518 update tika 3 documentation
* modifying language
1 parent 9fd998b commit 3cab674

2 files changed, 9 additions and 0 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -61,3 +61,4 @@ out
 /public/current-sha.txt

 pnpm-lock.yaml
+.idea

src/source/content/external-libraries.md

Lines changed: 8 additions & 0 deletions
@@ -61,6 +61,14 @@ tika_version: 3

 Valid values are `3` or `none` (to disable Tika).

+#### OCR in Tika 3
+
+Tika 3.x defaults to `AUTO` OCR mode, which can significantly increase PDF processing times when Tesseract is used for OCR. A `tika-config.xml` that disables OCR is available at `/opt/pantheon/tika/tika-config.xml` and can be passed to Tika using the `--config` flag:
+
+```bash
+/opt/pantheon/tika/tika.jar --config=/opt/pantheon/tika/tika-config.xml
+```
+
 </Tab>
 <Tab title="PHP Runtime Generation 1" id="tab-2-id">

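
As a rough illustration of the `--config` usage documented in this change, the sketch below builds the Tika command line and only appends the flag when the config file is readable. The helper name `tika_command` is hypothetical and not part of the commit. The two paths and the command form are copied from the added documentation, and how the jar is actually launched in a given environment is an assumption.

```bash
#!/usr/bin/env bash
# Paths taken from the documentation added in this commit.
TIKA_JAR="/opt/pantheon/tika/tika.jar"
TIKA_CONFIG="/opt/pantheon/tika/tika-config.xml"

# Hypothetical helper: print the Tika command line for a jar and a config file.
# --config is appended only when the config file exists and is readable, so
# the caller falls back to Tika's defaults (including AUTO OCR) otherwise.
tika_command() {
  local jar="$1" config="$2"
  if [ -r "$config" ]; then
    printf '%s --config=%s\n' "$jar" "$config"
  else
    printf '%s\n' "$jar"
  fi
}

# Show the command that would be used with the documented paths.
tika_command "$TIKA_JAR" "$TIKA_CONFIG"
```

Printing the command rather than executing it keeps the sketch safe to run anywhere, and the output can be pasted into whatever setting expects the Tika invocation.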
