# `ocr-api`
A microservice that extracts text from images. It uses Tess4J, a small Java Native Access (JNA) wrapper around Tesseract. Along with the extracted text, the service returns some metadata relating to the request (see the [data returned](src/main/java/uk/gov/companieshouse/ocr/api/image/extracttext/ExtractTextResultDto.java)).

The `ocr-api` has a single thread pool, backed by a blocking queue and implemented with a `ThreadPoolTaskExecutor`, that protects the system from being overloaded. In normal running, this queue should hold very few entries.
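The overload protection can be sketched with the JDK's `ThreadPoolExecutor`, which Spring's `ThreadPoolTaskExecutor` builds on. This is an illustration under stated assumptions, not the service's actual configuration: the `BoundedOcrExecutor` class, the pool size and the queue capacity are all hypothetical. Once every worker thread is busy and the bounded queue is full, new work is rejected with a `RejectedExecutionException` instead of being queued indefinitely.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not part of ocr-api): a fixed-size pool with a bounded
 * queue that rejects work once the queue is full.
 */
final class BoundedOcrExecutor {

    private final ThreadPoolExecutor executor;

    BoundedOcrExecutor(int poolSize, int queueCapacity) {
        this.executor = new ThreadPoolExecutor(
                poolSize, poolSize,                       // fixed number of worker threads
                0L, TimeUnit.MILLISECONDS,                // core threads never time out
                new ArrayBlockingQueue<>(queueCapacity),  // bounded blocking queue
                new ThreadPoolExecutor.AbortPolicy());    // throw RejectedExecutionException when saturated
    }

    /** Submits a task; throws RejectedExecutionException if all threads are busy and the queue is full. */
    <T> Future<T> submit(Callable<T> task) {
        return executor.submit(task);
    }

    /** Number of tasks waiting for a free thread; expected to stay close to zero. */
    int queuedTasks() {
        return executor.getQueue().size();
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

`queuedTasks()` corresponds to the queue length that, in normal running, should stay very small. A caller might translate a rejection into an HTTP 503 response, although how `ocr-api` itself handles rejected requests is not described in this README.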

Supported image types: TIFF
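Because TIFF is the only supported type, a client could reject other formats before calling the service by checking the file header. This is a minimal sketch assuming such a pre-check is wanted; the `TiffSignature` class is hypothetical and not part of `ocr-api`. A classic TIFF file starts with the byte-order mark `II` (little-endian) or `MM` (big-endian), followed by the number 42 in that byte order.

```java
/**
 * Hypothetical helper (not part of ocr-api): recognises the 4-byte header of
 * a classic TIFF file.
 */
final class TiffSignature {

    private TiffSignature() {
    }

    static boolean isTiff(byte[] data) {
        if (data == null || data.length < 4) {
            return false;
        }
        // "II" then 42 as a little-endian 16-bit value: 0x49 0x49 0x2A 0x00
        boolean littleEndian = data[0] == 0x49 && data[1] == 0x49
                && data[2] == 0x2A && data[3] == 0x00;
        // "MM" then 42 as a big-endian 16-bit value: 0x4D 0x4D 0x00 0x2A
        boolean bigEndian = data[0] == 0x4D && data[1] == 0x4D
                && data[2] == 0x00 && data[3] == 0x2A;
        return littleEndian || bigEndian;
    }
}
```

This only recognises classic TIFF; BigTIFF files, which use 43 instead of 42, would be rejected by this check.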
## TL;DR: updating dependencies

This project has had no significant changes since its release in 2021, but its dependencies need updating for security fixes. These updates need testing within a Docker volume.

We are also changing how the service is deployed (see the second Confluence document below). Until that work is complete, run the OCR conversion tests locally against a newly downloaded Docker image or one that you have built yourself.
## Confluence Documentation
- [System overview for live running](https://companieshouse.atlassian.net/wiki/spaces/IncVal/pages/2699755729/OCR+Service+Live)
- [Migration from EC2 to Fargate (WIP)](https://companieshouse.atlassian.net/wiki/spaces/IncVal/pages/3067346945/Automated+builds+of+the+ocr-api+to+staging+and+live): **MUST READ until the migration is complete**
- [Testing in AWS](https://companieshouse.atlassian.net/wiki/spaces/IncVal/pages/3396206692/Environments+and+Testing)

To activate this project in development mode, run the following command before r…
- Run `chs-dev development enable ocr-api`

The `ocr-api` should be accessible via http://api.chs.local/ocr-api/
## Tesseract Training data
The Tesseract engine uses this data to help with text recognition. We store the training data currently in use within configuration management, for consistency and to speed up the Docker build.
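Tesseract loads training data from files named `<lang>.traineddata` (for example `eng.traineddata` for English) inside its tessdata directory. A hypothetical start-up check like the one below could catch a file missing from configuration management before any OCR request fails. The `TrainingDataCheck` class, the directory path and the language list are assumptions for illustration, not part of `ocr-api`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical start-up check (not part of ocr-api): reports languages whose
 * "<lang>.traineddata" file is absent from the tessdata directory.
 */
final class TrainingDataCheck {

    private TrainingDataCheck() {
    }

    /** Returns the languages, in input order, with no training data file. */
    static List<String> missingLanguages(Path tessdataDir, List<String> languages) {
        List<String> missing = new ArrayList<>();
        for (String lang : languages) {
            // Tesseract expects one file per language, e.g. tessdata/eng.traineddata
            Path trainedData = tessdataDir.resolve(lang + ".traineddata");
            if (!Files.isRegularFile(trainedData)) {
                missing.add(lang);
            }
        }
        return missing;
    }
}
```

Failing fast at start-up on a non-empty result keeps a misconfigured container from accepting requests it cannot process.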