not working with over 800Gb index

I have a large index, over 800 GB in size.
It contains billions of records, and most of them (over 95%) are duplicates generated by log resends.

ES limits the search batch size to at most 10,000 records, and this index is huge. I tried es-dedupe on just 1 minute of records, and the search phase alone took 1 hour (I checked the ES server: 4G of I/O per second).

Maybe there is another way to deal with it.
Read the original index: if a record is unique, write it to a new index; if it is a duplicate, skip it. If the new index is still huge, cap its size, and once the cap is exceeded, write to another new index named xxx-001.
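
To make the idea concrete, here is a minimal sketch of that copy-and-skip loop in Python. It is not how es-dedupe works; it only illustrates the proposal. All names (`record_key`, `index_name`, `dedupe_with_rollover`, `logs-dedup`) are hypothetical. It assumes a record counts as a duplicate when its whole JSON body is identical, that the cap is a document count rather than a byte size, and that the set of fingerprints for the unique records fits in memory.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, Iterator, Set, Tuple


def record_key(record: Dict[str, Any]) -> bytes:
    """Fingerprint a record by hashing its canonical JSON form.

    Assumption: two records are duplicates when all fields are equal.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).digest()


def index_name(base: str, part: int) -> str:
    """First target index is `base`; later ones are base-001, base-002, ..."""
    return base if part == 0 else f"{base}-{part:03d}"


def dedupe_with_rollover(
    records: Iterable[Dict[str, Any]],
    base: str,
    max_docs: int,
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (target_index, record) for each unique record, in input order.

    Rolls over to the next target index once `max_docs` unique records
    have been assigned to the current one.
    """
    if max_docs < 1:
        raise ValueError("max_docs must be at least 1")

    seen: Set[bytes] = set()  # fingerprints of records already written
    part = 0                  # current target index number
    written = 0               # unique records assigned to the current index

    for record in records:
        key = record_key(record)
        if key in seen:
            continue  # duplicate: skip it
        seen.add(key)

        if written >= max_docs:
            part += 1  # cap reached: move on to the next index
            written = 0

        yield index_name(base, part), record
        written += 1


if __name__ == "__main__":
    sample = [{"msg": "a"}, {"msg": "b"}, {"msg": "a"}, {"msg": "c"}]
    for target, doc in dedupe_with_rollover(sample, "logs-dedup", max_docs=2):
        print(target, doc)
```

In a real run, `records` would presumably come from a single streaming pass over the source index (for example `elasticsearch.helpers.scan`) and the yielded pairs would be fed to bulk writes (for example `elasticsearch.helpers.bulk`), so each source document is read once instead of being searched repeatedly. With billions of records, the in-memory `seen` set may still be too large even at 5% unique, in which case an on-disk key store would be needed.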




not working with over 800Gb index #16
