Commit c8b04b1

update readme
1 parent 6e3dc1d commit c8b04b1

1 file changed

Lines changed: 12 additions & 9 deletions

README.md

@@ -14,7 +14,7 @@ Running from Docker:
 ```
 docker run -it -e ES=locahost -e INDEX=my-index -e FIELD=id deric/es-dedupe:latest
 ```
-You can either override Docker commad or use ENV variable to pass arguments.
+You can either override Docker command or use ENV variable to pass arguments.
 
 ## Usage
 Use `-h/--help` to see supported options:
@@ -23,12 +23,12 @@ docker run --rm deric/es-dedupe:latest dedupe --help
 ```
 
 ```
-python -u dedupe.py -H localhost -P 9200 -i exact-index-name -f Uuid > es_dedupe.log
+docker run --rm deric/es-dedupe:latest dedupe -H localhost -P 9200 -i exact-index-name -f Uuid > es_dedupe.log 2>&1
 ```
 will try to find duplicated documents in an index called 'exact-index-name' where documents are grouped by `Uuid` field.
 
 ```
-python -u dedupe.py -H localhost -P 9200 --all --prefix 'esindexprefix' --prefixseparator '-' --indexexclude '^excludedindex.*' -f fingerprint > es_dedupe.log
+docker run --rm deric/es-dedupe:latest dedupe -H localhost -P 9200 --all --prefix 'esindexprefix' --prefixseparator '-' --indexexclude '^excludedindex.*' -f fingerprint > es_dedupe.log 2>&1
 ```
 will try to find duplicated documents in all indices known to the ES instance on localhost:9200, that look akin to 'esindexprefix-\*' while excluding all indices starting with 'excludedindex', where documents are grouped by `fingerprint` field.
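The second command in this hunk selects indices by prefix and then drops those matching an exclusion pattern. A minimal Python sketch of that selection logic follows. `select_indices` and its parameters are hypothetical names used only for illustration, and es-dedupe's own implementation may differ (for example in how it matches the prefix).

```python
import re


def select_indices(names, prefix, separator="-", exclude=None):
    """Return names that start with prefix + separator and do not match exclude.

    Hypothetical helper: it mirrors the README description of --prefix,
    --prefixseparator and --indexexclude, not es-dedupe's actual code.
    """
    wanted = prefix + separator
    # re.match anchors at the start of the string, so a pattern such as
    # '^excludedindex.*' rejects every name beginning with 'excludedindex'.
    excluded = re.compile(exclude) if exclude else None
    selected = []
    for name in names:
        if not name.startswith(wanted):
            continue
        if excluded is not None and excluded.match(name):
            continue
        selected.append(name)
    return selected


# Made-up index names for the example.
indices = ["logs-2019.01", "logs-2019.02", "logs-debug-2019.01", "metrics-2019.01"]
print(select_indices(indices, "logs", "-", r"^logs-debug.*"))
```

Applying the prefix test first keeps the exclusion regex focused on the few indices that would otherwise be processed.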

@@ -49,8 +49,11 @@ A log file containing documents with unique fields is written into `/tmp/es_dedu
 By design ES aggregate queries are not necessarily precise. Depending on your cluster setup, some documents won't be deleted due to
 inaccurate shard statistics.
 
-Running `$ python dedupe.py --check_log /tmp/es_dedupe.log --noop` will query for documents found by aggregate and queries check whether were actually
-deleted.
+`--check_log` will query for documents found by aggregate and queries check whether were actually deleted.
+```
+docker run --rm deric/es-dedupe:latest dedupe --check_log /tmp/es_dedupe.log --noop
+```
+
 ```
 == Starting ES deduplicator....
 PRETENDING to delete:
@@ -80,17 +83,17 @@ Deleted 276673 duplicates, in total 609802. Batch processed in 0:00:08.487847, r
 For the installation use the tools provided by your operating system.
 
 On Linux this can be one of the following: yum, dnf, apt, yast, emerge, ..
-```
+
 * Install python (2 or 3, both will work)
 * Install python*ujson and python*requests for the fitting python version
-```
+
 
 On Windows you are pretty much on your own, but fear not, you can do the following ;-)
-```
+
 * Download and install a python version from https://www.python.org/ .
 * Open a console terminal and head to the repository copy of es-deduplicator, then run:
 pip install -r requirements.txt
-```
+
 
 ## History
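The README text touched by this commit relies on ES aggregate queries to find duplicates and warns that they are not necessarily precise. As a hedged illustration, the sketch below builds a `terms` aggregation body with `min_doc_count` set to 2 and reads the duplicate keys from a response. The function names and the sample response are made up for this example, and the query es-dedupe actually sends may differ.

```python
def build_duplicates_query(field, size=1000):
    """Build a search body whose terms aggregation returns values of `field`
    that occur in at least two documents (hypothetical helper)."""
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "aggs": {
            "duplicates": {
                "terms": {"field": field, "min_doc_count": 2, "size": size}
            }
        },
    }


def duplicate_keys(response):
    """Extract (key, doc_count) pairs from a terms aggregation response."""
    buckets = response["aggregations"]["duplicates"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


# Hand-written sample, shaped like an ES terms aggregation result.
sample = {
    "aggregations": {
        "duplicates": {
            "doc_count_error_upper_bound": 3,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "a1", "doc_count": 4},
                {"key": "b2", "doc_count": 2},
            ],
        }
    }
}

print(build_duplicates_query("Uuid"))
print(duplicate_keys(sample))
```

Terms aggregations are computed per shard and then merged, so bucket counts can be approximate. The `doc_count_error_upper_bound` field in the response reports the worst-case error, which is one reason a follow-up pass such as `--check_log` is useful.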
