README.md
+20 -6
@@ -19,7 +19,8 @@ This endpoint returns 20 results matching the `<query>` within a specific countr
19
19
20
20
# Input data.tsv format
21
21
22
-
This service accepts only TSV file named `data.tsv` with the specific order of columns:
22
+
This service accepts only a TSV file named `data.tsv` (or a gzipped version named `data.tsv.gz`)
23
+
with the specific order of columns:
23
24
24
25
```
25
26
name
@@ -47,6 +48,7 @@ wikidata
47
48
wikipedia
48
49
```
49
50
51
+
The source data should be **sorted by the column `importance`**.
50
52
For the description of these columns, see [data format in geometalab/OSMNames repository](https://github.com/geometalab/OSMNames#data-format-of-tsv-export-of-osmnames).
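
A minimal sketch of putting a file into that order, under stated assumptions: the first row is a header containing a column literally named `importance`, and rows with the highest importance come first. The text above does not state the sort direction, so adjust the `r` flag if ascending order is expected. The function name `sort_by_importance` is hypothetical.

```shell
# Hypothetical helper: print a TSV sorted by its `importance` column.
# Assumes a header row and a descending (highest first) order.
sort_by_importance() {
  local input="$1"
  local col
  # Look up the 1-based index of the `importance` column in the header row.
  col=$(head -n 1 "$input" | tr '\t' '\n' | grep -n -x 'importance' | cut -d: -f1)
  # Emit the header unchanged, then the data rows sorted numerically, reversed.
  head -n 1 "$input"
  tail -n +2 "$input" | LC_ALL=C sort -t "$(printf '\t')" -k "${col},${col}nr"
}
```

For example, `sort_by_importance raw.tsv > data.tsv` writes the sorted file, and `gzip -c data.tsv > data.tsv.gz` produces the gzipped variant (`raw.tsv` is a placeholder name).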
51
53
52
54
@@ -56,24 +58,24 @@ This docker image consists from internaly connected and setup OSMNames Websearch
56
58
57
59
The whole service can be run from the command line with one command:
58
60
59
-
Run with demo data (10 lines) only
61
+
Run with demo data only (a sample of 100k lines from [geometalab/OSMNames](https://github.com/OSMNames/OSMNames/releases/tag/v1.1)):
60
62
61
63
```
62
64
docker run -d --name klokantech-osmnames-sphinxsearch -p 80:80 klokantech/osmnames-sphinxsearch
63
65
```
64
66
65
-
You can attach your file `data.tsv`, which has to be located in the internal path `/data/input/data.tsv`:
67
+
You can attach your own file `data.tsv` (or `data.tsv.gz`), which must be available at the internal path `/data/input/data.tsv` (or `/data/input/data.tsv.gz`):
66
68
67
69
```
68
70
docker run -d --name klokantech-osmnames-sphinxsearch \
69
-
-v /path/to/folder/data.tsv:/data/input/ \
71
+
-v /path/to/folder/:/data/input/ \
70
72
-p 80:80 \
71
73
klokantech/osmnames-sphinxsearch
72
74
```
73
75
74
76
This file will be indexed on the first run or if index files are missing.
75
77
76
-
You can specify path for index folder as well:
78
+
You can specify a path for the index folder as well:
77
79
78
80
```
79
81
docker run -d --name klokantech-osmnames-sphinxsearch \
@@ -88,7 +90,7 @@ You can attach your path with the following folder structure:
88
90
```
89
91
/path/to/folder/
90
92
- input/
91
-
- data.tsv
93
+
- data.tsv (or data.tsv.gz)
92
94
- index/
93
95
```
94
96
@@ -97,3 +99,15 @@ directly with simple command:
97
99
```
98
100
docker run -d -v /path/to/folder/:/data/ -p 80:80 klokantech/osmnames-sphinxsearch
99
101
```
102
+
103
+
# Index storage space
104
+
105
+
SphinxSearch, the full-text search service, requires the source data to be indexed.
106
+
The index operation is required only on the first run or when the source data has changed.
107
+
This operation takes longer on a source with more lines, and requires more storage space as well.
108
+
109
+
The demo sample with 100 000 lines has a **9.3 MiB gzipped** source data file and requires storage space of **133.8 MiB for the index** folder. The operation took 15 seconds on average.
110
+
111
+
The [full planet source data](https://github.com/OSMNames/OSMNames/releases/download/v1.1/planet-latest.tsv.gz) with 21 million lines requires storage space of **27.4 GiB for the index** folder. The operation took 48 minutes on average.
112
+
113
+
The index operation is run automatically by the prepared script `sphinx-reindex.sh` when an index file is missing. You can also use this script to force the index operation: `$ time bash sphinx-reindex.sh force`.
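
The two measurements in this section suggest that index size and indexing time grow roughly linearly with the number of lines. The sketch below extrapolates from the demo sample figures (133.8 MiB and 15 seconds per 100 000 lines). It is a back-of-the-envelope estimate only: the function name `estimate_index` is hypothetical, and actual numbers depend on the hardware and the data.

```shell
# Hypothetical helper: rough linear estimate of index size and indexing time,
# scaled from the demo sample (133.8 MiB and 15 s per 100 000 lines).
estimate_index() {
  local lines="$1"
  LC_ALL=C awk -v n="$lines" 'BEGIN {
    kib_per_line = 133.8 * 1024 / 100000   # index size per source line, in KiB
    sec_per_line = 15 / 100000             # indexing time per source line, in seconds
    printf "%.1f GiB, %.1f min\n", n * kib_per_line / 1024 / 1024, n * sec_per_line / 60
  }'
}

estimate_index 21000000   # prints "27.4 GiB, 52.5 min"
```

For the full planet file this gives 27.4 GiB, matching the reported index size, and 52.5 minutes, close to the reported 48 minutes.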