will try to find duplicated documents in all indices known to the ES instance on localhost:9200 whose names match 'esindexprefix-\*', excluding all indices starting with 'excludedindex'. Documents are grouped by the `fingerprint` field.
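
The selection and grouping described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the helper names `select_indices` and `duplicates_query` are hypothetical, and the request body is simply a standard Elasticsearch terms aggregation that keeps only values seen at least twice.

```python
from fnmatch import fnmatchcase


def select_indices(names, pattern, exclude_prefix=None):
    """Keep index names matching `pattern`, dropping those with the excluded prefix.

    Hypothetical helper: the real tool may filter indices differently.
    """
    return sorted(
        name
        for name in names
        if fnmatchcase(name, pattern)
        and not (exclude_prefix and name.startswith(exclude_prefix))
    )


def duplicates_query(field, size=10):
    """Build a terms aggregation returning values of `field` that occur at least twice."""
    return {
        "size": 0,  # no hits wanted, only the aggregation buckets
        "aggs": {
            "duplicates": {
                "terms": {"field": field, "min_doc_count": 2, "size": size}
            }
        },
    }


known = ["esindexprefix-2018.01", "esindexprefix-2018.02", "excludedindex-old", "other"]
indices = select_indices(known, "esindexprefix-*", exclude_prefix="excludedindex")
body = duplicates_query("fingerprint")
```

Each bucket returned for such a query would correspond to one `fingerprint` value shared by two or more documents, which are the candidates for deletion.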
@@ -49,8 +49,11 @@ A log file containing documents with unique fields is written into `/tmp/es_dedu
By design ES aggregate queries are not necessarily precise. Depending on your cluster setup, some documents won't be deleted due to inaccurate shard statistics.
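
To see why counts can be off, here is a simplified model of how a terms aggregation is merged across shards. It is only a sketch under the assumption that each shard reports its own top terms and the coordinating node sums what it receives; the function name `merged_top_terms` and the sample data are made up for illustration.

```python
from collections import Counter


def merged_top_terms(shards, shard_size):
    """Sum the per-shard top-`shard_size` term counts, ignoring everything below the cut."""
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged


# Two shards holding fingerprint values; each shard only reports its top 2 terms.
shards = [
    ["a", "a", "a", "b", "b", "c"],
    ["c", "c", "c", "b", "b", "a"],
]
approx = merged_top_terms(shards, shard_size=2)
exact = Counter(term for shard in shards for term in shard)

print("approximate:", dict(approx))
print("exact:      ", dict(exact))
```

In this toy data the merged count for `a` is lower than the true count, because the second shard did not report it. A duplicate group can be undercounted, or missed entirely, in the same way.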
-
Running `$ python dedupe.py --check_log /tmp/es_dedupe.log --noop` will query for documents found by aggregate and queries check whether were actually
-
deleted.
+
`--check_log` will query for the documents found by the aggregate queries and check whether they were actually deleted.
+
```
+
docker run --rm deric/es-dedupe:latest dedupe --check_log /tmp/es_dedupe.log --noop
+
```
+
```
== Starting ES deduplicator....
PRETENDING to delete:
@@ -80,17 +83,17 @@ Deleted 276673 duplicates, in total 609802. Batch processed in 0:00:08.487847, r
For the installation use the tools provided by your operating system.
On Linux this can be one of the following: yum, dnf, apt, yast, emerge, ..
-
```
+
* Install python (2 or 3, both will work)
* Install python*ujson and python*requests for the fitting python version
-
```
+
On Windows you are pretty much on your own, but fear not, you can do the following ;-)
-
```
+
* Download and install a python version from https://www.python.org/ .
* Open a console terminal and head to the repository copy of es-deduplicator, then run: