Commit 374bab5

Updated README - added section for index storage space
1 parent 48f348c commit 374bab5

1 file changed: +20 -6 lines changed
README.md

+20-6
@@ -19,7 +19,8 @@ This endpoint returns 20 results matching the `<query>` within a specific countr
 
 # Input data.tsv format
 
-This service accepts only TSV file named `data.tsv` with the specific order of columns:
+This service accepts only TSV file named `data.tsv` (or gzip-ed version named `data.tsv.gz`)
+with the specific order of columns:
 
 ```
 name
@@ -47,6 +48,7 @@ wikidata
 wikipedia
 ```
 
+The source data should be **sorted by the column `importance`**.
 For the description of these columns, see [data format in geometalab/OSMNames repository](https://github.com/geometalab/OSMNames#data-format-of-tsv-export-of-osmnames).
 
 
@@ -56,24 +58,24 @@ This docker image consists from internaly connected and setup OSMNames Websearch
 
 Whole service can be run from command-line with one command:
 
-Run with demo data (10 lines) only
+Run with demo data (sample of 100k lines from [geometalab/OSMNames](https://github.com/OSMNames/OSMNames/releases/tag/v1.1)) only
 
 ```
 docker run -d --name klokantech-osmnames-sphinxsearch -p 80:80 klokantech/osmnames-sphinxsearch
 ```
 
-You can attach your file `data.tsv`, which has to be located in the internal path `/data/input/data.tsv`:
+You can attach your file `data.tsv` (or `data.tsv.gz`), which has to be located in the internal path `/data/input/data.tsv` (or `/data/input/data.tsv.gz`):
 
 ```
 docker run -d --name klokantech-osmnames-sphinxsearch \
-  -v /path/to/folder/data.tsv:/data/input/ \
+  -v /path/to/folder/:/data/input/ \
   -p 80:80 \
   klokantech/osmnames-sphinxsearch
 ```
 
 This file will be indexed on the first run or if index files are missing.
 
-You can specify path for index folder as well:
+You can specify a path for index folder as well:
 
 ```
 docker run -d --name klokantech-osmnames-sphinxsearch \
@@ -88,7 +90,7 @@ You can attach your path with the following folder structure:
 ```
 /path/to/folder/
 - input/
-  - data.tsv
+  - data.tsv (or data.tsv.gz)
 - index/
 ```
 
@@ -97,3 +99,15 @@ directly with simple command:
 ```
 docker run -d -v /path/to/folder/:/data/ -p 80:80 klokantech/osmnames-sphinxsearch
 ```
+
+# Index storage space
+
+The full-text search service SphinxSearch requires indexing the source data.
+The index operation is required only on the first run or when the source data has changed.
+This operation takes longer on a source with more lines, and requires more storage space as well.
+
+A demo data sample with 100 000 lines has a **9.3 MiB gzip**-ed source data file and requires storage space of **133.8 MiB for the index** folder. The operation took 15 seconds on average.
+
+The [full planet source data](https://github.com/OSMNames/OSMNames/releases/download/v1.1/planet-latest.tsv.gz) with 21 million lines requires storage space of **27.4 GiB for the index** folder. The operation took 48 minutes on average.
+
+The index operation is done automatically by the prepared script `sphinx-reindex.sh` if the index file is missing. You can use this script to force the index operation as well: `$ time bash sphinx-reindex.sh force`.
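
The two measurements quoted in the new README section both work out to roughly 1.4 kB of index and about 0.14 ms of indexing time per source line, so a rough estimate for a custom `data.tsv` can be sketched by scaling linearly with the line count. The script below is only an illustration: the function names `estimate_index` and `estimate_for_file` are hypothetical, the per-line constants are derived from the full-planet figures (27.4 GiB and 48 minutes for 21 million lines), and linear scaling is an assumption that real data and hardware may not follow.

```shell
#!/usr/bin/env bash
# Rough linear estimate of index size and indexing time from a line count.
# Assumption: cost per line matches the full-planet figures quoted in the
# README (27.4 GiB and 48 minutes for 21 million lines).

estimate_index() {
  local lines="$1"
  awk -v n="$lines" 'BEGIN {
    bytes_per_line = 27.4 * 1024 * 1024 * 1024 / 21000000   # about 1401 bytes
    secs_per_line  = 48 * 60 / 21000000                      # about 0.14 ms
    printf "%.1f MiB, %.0f s\n", n * bytes_per_line / (1024 * 1024), n * secs_per_line
  }'
}

# Count the lines of a plain or gzip-ed TSV file and print the estimate.
estimate_for_file() {
  local file="$1" lines
  case "$file" in
    *.gz) lines=$(gzip -dc "$file" | wc -l) ;;
    *)    lines=$(wc -l < "$file") ;;
  esac
  estimate_index "$lines"
}
```

For example, `estimate_index 100000` prints `133.6 MiB, 14 s`, which is close to the 133.8 MiB and 15 seconds reported for the demo sample. For a file mounted as described in the README, `estimate_for_file /path/to/folder/input/data.tsv.gz` gives the same kind of estimate before the first indexing run.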
