Commit 1fcb7d8

Revert "Merge pull request #852 from project-anuvaad/develop"
This reverts commit 845297e, reversing changes made to 4a0660c.
1 parent 456d721 commit 1fcb7d8

2 files changed, +25 -55 lines changed
  • anuvaad-etl/anuvaad-extractor/document-processor/ocr/tesseract_ulca_v2

@@ -1,59 +1,29 @@
 
-# Anuvaad OCR
 
-Open source OCR models for Indic Languages (Printed), developed and used as part of project Anuvaad.
-Repo contains tesseract service with REST interface, which is ULCA compliant:
 
+A tesseract service with rest interface:
 input : image url
 ouput : [sentences]
 
 Hindi and Tamil use custom weights
 
-detection of language and downloads tess-best weights if not already avilable
+detection of language and downloading tess-best weights if not already avilable
 
-**Sample curl** :
+sample curl :
 
-
-
-curl --location 'http://localhost:5000/anuvaad/ocr/v0/ulca-ocr' \
---header 'Content-Type: application/json' \
---data '{
-"image" : [
-{
-"imageUri": "https://anuvaad-raw-datasets.s3-us-west-2.amazonaws.com/anuvaad_ocr_hindi.jpg"
-}
-],
-"config": {
-"languages": [{
-"sourceLanguage" : "hi"
-}]
-}
-}'
-'
-
-**Sample Response:**
-```json
-{
-"output" : [
-{
-"source" : "बिपिन रावत का एक माचिस की डिबिया के कारण हुआ था"
+curl --location --request POST 'http://0.0.0.0:5000/anuvaad/ocr/v0/ulca-ocr' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+"config": {
+"language": {
+"sourceLanguage": "en"
 }
-],
-"status" : {
-"statusCode" : 200 ,
-"message" : "success"
+},
+"imageUri": ["https://anuvaad-raw-datasets.s3-us-west-2.amazonaws.com/anuvaad_ocr_english.jpg"
+
+]
 }
-}
-
-```
-**Deployment**
-## **Deployment**
-
-
-```shell
+'
 
-docker build -t anuvaad_ocr_ulca_v2 .
-docker run --name anuvaad_ocr_ulca_v2 -d --network host anuvaad_ocr_ulca_v2
-```
 
 

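The sample curl restored by this revert sends a JSON body with `config.language.sourceLanguage` and a top-level `imageUri` list. A minimal Python sketch of the same request is shown here. The function names `build_ocr_payload` and `post_ocr_request` are hypothetical, the `localhost` host is taken from the removed README text, and the sketch assumes the service replies with JSON; none of this is confirmed by the diff beyond the request shape.

```python
import json
import urllib.request


def build_ocr_payload(image_urls, source_language):
    """Build a request body in the shape used by the restored sample curl."""
    return {
        "config": {"language": {"sourceLanguage": source_language}},
        "imageUri": list(image_urls),
    }


def post_ocr_request(endpoint, payload, timeout=30):
    """POST the payload as JSON and decode the reply (assumed to be JSON)."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    payload = build_ocr_payload(
        ["https://anuvaad-raw-datasets.s3-us-west-2.amazonaws.com/anuvaad_ocr_english.jpg"],
        "en",
    )
    print(json.dumps(payload, indent=2))
    # With the service running locally, the call would look like:
    # post_ocr_request("http://localhost:5000/anuvaad/ocr/v0/ulca-ocr", payload)
```

Keeping payload construction separate from the network call makes the request shape easy to check without a running service.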
anuvaad-etl/anuvaad-extractor/document-processor/ocr/tesseract_ulca_v2/start.sh

+11 -11
@@ -13,32 +13,32 @@ curl -L -o /usr/share/tesseract-ocr/4.00/tessdata/Gujarati.traineddata https://g
 curl -L -o /usr/share/tesseract-ocr/4.00/tessdata/Oriya.traineddata https://github.com/tesseract-ocr/tessdata_best/blob/main/script/Oriya.traineddata?raw=true
 
 tam_modelpath='/usr/share/tesseract-ocr/4.00/tessdata/anuvaad_tam.traineddata'
-#url_tam='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_tam.traineddata'
-url_tam='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvad_tam_scene_text_real.traineddata'
+#url_tam='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_tam.traineddata?AWSAccessKeyId=AKIAXX2AMEIRJY2GNYVZ&Signature=X6%2BwKdeOyOUFlOFs%2B7eRmzhziZ0%3D&Expires=1693557258'
+url_tam='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvad_tam_scene_text_real.traineddata?AWSAccessKeyId=AKIAXX2AMEIRJY2GNYVZ&Signature=J1NEp22bhsW7dO3kd8iN1VX7XtI%3D&Expires=1711538482'
 hin_modelpath='/usr/share/tesseract-ocr/4.00/tessdata/anuvaad_hin.traineddata'
 hin_scene_modelpath='/usr/share/tesseract-ocr/4.00/tessdata/anuvad_hin_scene_text_real.traineddata'
-url_hin='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_hin.traineddata'
-url_hin_scene='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvad_hin_scene_text_real.traineddata'
+url_hin='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_hin.traineddata?AWSAccessKeyId=AKIAXX2AMEIRJY2GNYVZ&Signature=2l%2F0OwWQrD%2FIvogfijATPufjMLA%3D&Expires=1693557740'
+url_hin_scene='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvad_hin_scene_text_real.traineddata?AWSAccessKeyId=AKIAXX2AMEIRJY2GNYVZ&Signature=FZ6Whiiv8uTYDkPGUvMzqoOKPOI%3D&Expires=1709212126'
 kan_modelpath='/usr/share/tesseract-ocr/4.00/tessdata/anuvaad_kan.traineddata'
-url_kan='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_kan.traineddata'
+url_kan='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_kan.traineddata?AWSAccessKeyId=AKIAXX2AMEIRJY2GNYVZ&Signature=gDiNsqrV0n2%2BWZSMwesyqkLOYZ8%3D&Expires=1694149503'
 
 ben_modelpath='/usr/share/tesseract-ocr/4.00/tessdata/anuvaad_ben.traineddata'
-url_ben='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_ben.traineddata'
+url_ben='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_ben.traineddata?AWSAccessKeyId=AKIAXX2AMEIRJY2GNYVZ&Signature=ku%2FdynTtJVvaf55dwYC%2FMt3pKqo%3D&Expires=1698743313'
 
 mal_modelpath='/usr/share/tesseract-ocr/4.00/tessdata/anuvaad_mal.traineddata'
-url_mal='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_mal.traineddata'
+url_mal='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_mal.traineddata?AWSAccessKeyId=AKIAXX2AMEIRJY2GNYVZ&Signature=hX%2Bo%2BTTvwoN7IBcX%2FIgFTwMHoGs%3D&Expires=1698743610'
 
 mar_modelpath='/usr/share/tesseract-ocr/4.00/tessdata/anuvaad_mar.traineddata'
-url_mar='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_mar.traineddata'
+url_mar='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_mar.traineddata?AWSAccessKeyId=AKIAXX2AMEIRJY2GNYVZ&Signature=aTu5Ps9hL90clfPMZIVOEPx5%2Fl0%3D&Expires=1698743699'
 
 ori_modelpath='/usr/share/tesseract-ocr/4.00/tessdata/anuvaad_ori.traineddata'
-url_ori='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_ori.traineddata'
+url_ori='https://anuvaad-pubnet-weights.s3.amazonaws.com/anuvaad_ori.traineddata?AWSAccessKeyId=AKIAXX2AMEIRJY2GNYVZ&Signature=5aqEjjOryEhE4ElV2i8oHgVY%2F7I%3D&Expires=1698743792'
 
 scene_text_line_detection_modelpath='./src/utilities/primalinenet/scene_text_judgement_line_detection_v1_model.pth'
-url_scene_text_line_detection_modelpath='https://anuvaad-pubnet-weights.s3.amazonaws.com/scene_text_judgement_line_detection_v1_model.pth'
+url_scene_text_line_detection_modelpath='https://anuvaad-pubnet-weights.s3.amazonaws.com/scene_text_judgement_line_detection_v1_model.pth?AWSAccessKeyId=AKIAXX2AMEIRJY2GNYVZ&Signature=zTv5bP4Pt6NoLN%2FLUC7JrLBBrxs%3D&Expires=1705824951'
 
 scene_text_east_angle_detection_modelpath='./src/utilities/east/east-model.ckpt-49491.data-00000-of-00001'
-url_scene_text_east_angle_detection_modelpath='https://anuvaad-pubnet-weights.s3.amazonaws.com/east-model.ckpt-49491.data-00000-of-00001'
+url_scene_text_east_angle_detection_modelpath='https://anuvaad-pubnet-weights.s3.amazonaws.com/east-model.ckpt-49491.data-00000-of-00001?AWSAccessKeyId=AKIAXX2AMEIRJY2GNYVZ&Signature=XbR8OnEhYISllPYYuYkzFhmovUY%3D&Expires=1707278033'
 
 
 

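The `start.sh` change swaps plain S3 object URLs for presigned ones, each carrying `AWSAccessKeyId`, `Signature`, and `Expires` query parameters, where `Expires` is a Unix timestamp in seconds. A rough Python sketch for inspecting such a URL is given here. The helper names and the example bucket and key values are hypothetical, and only the `Expires` value is borrowed from the diff. The script itself does not contain this logic.

```python
import time
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(url):
    """Return the Expires timestamp (Unix seconds) from a URL, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("Expires")
    return int(values[0]) if values else None


def is_expired(url, now=None):
    """Report whether the URL's Expires timestamp has passed.

    URLs without an Expires parameter are treated as never expiring.
    """
    expiry = presigned_expiry(url)
    if expiry is None:
        return False
    current = time.time() if now is None else now
    return current >= expiry


def strip_query(url):
    """Drop the query string, leaving the plain object URL."""
    return urlsplit(url)._replace(query="").geturl()


if __name__ == "__main__":
    # Placeholder URL; the bucket, key id, and signature are made up.
    example = (
        "https://example-bucket.s3.amazonaws.com/weights.traineddata"
        "?AWSAccessKeyId=EXAMPLEKEY&Signature=abc%3D&Expires=1693557258"
    )
    print(presigned_expiry(example), is_expired(example), strip_query(example))
```

A check like this could run before each download so that an expired link is reported up front instead of failing partway through container startup.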