Description
When indexing a big dataset, it is common to trigger a "too many open files" error. The error is thrown during indexing and is most likely produced by grenad and the milli extractors, which generate a lot of files. The dataset I was using is a 33M-line JSON file of about 14 GiB, sent in a single batch. The error can also be triggered by changing the settings after having indexed a lot of documents, since this forces a re-indexation of the full dataset.
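For context, this failure mode is easy to reproduce outside of milli: any process that keeps enough descriptors open at once hits the OS limit. Below is a minimal, hypothetical sketch in plain Rust (not milli's actual extraction code) that opens files until the kernel refuses:

```rust
use std::env;
use std::fs::File;

fn main() {
    let dir = env::temp_dir();
    let mut handles = Vec::new();
    for i in 0.. {
        match File::create(dir.join(format!("emfile-repro-{i}"))) {
            Ok(f) => handles.push(f),
            Err(e) => {
                // On Linux this typically fails with EMFILE once the
                // process exhausts its RLIMIT_NOFILE soft limit
                // (commonly 1024 by default).
                eprintln!("gave up after {i} open files: {e}");
                break;
            }
        }
    }
    // `handles` keeps every descriptor alive until here, mimicking an
    // indexer that holds many intermediate files open at the same time.
    drop(handles);
}
```

Since the default soft limit is often only 1024 descriptors, an indexation that presumably keeps thousands of intermediate grenad files open at once fails long before the dataset is exhausted.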
We designed this new indexation system with @ManyTheFish to reduce the amount of RAM the engine was using and therefore reduce the number of crashes (processes killed by the OS) witnessed by our users. We did a good job, even if we can do better (see #3037).
I want to explore the new extractor design proposed by @loiclec in meilisearch/milli#656. This refactoring should make the extractors use RAM more efficiently, speeding up the indexation process and reducing the number of files created.
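One direction such a refactoring can take (a sketch under my own assumptions, not the actual design in meilisearch/milli#656): give each extractor a fixed RAM budget, and flush every full buffer as a single sorted chunk file. The budget then bounds resident memory and, indirectly, the number of files created:

```rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::path::PathBuf;

/// Hypothetical extractor buffer: accumulate (key, value) pairs up to a
/// fixed RAM budget, then flush the whole batch as one sorted file.
/// A larger budget means fewer intermediate files and descriptors,
/// at the cost of more resident memory.
struct SortedFlusher {
    budget: usize,                    // max bytes buffered before flushing
    buffered: usize,                  // bytes currently buffered
    entries: Vec<(Vec<u8>, Vec<u8>)>, // unsorted (key, value) pairs
    dir: PathBuf,
    chunk_id: usize,
}

impl SortedFlusher {
    fn new(budget: usize, dir: PathBuf) -> Self {
        Self { budget, buffered: 0, entries: Vec::new(), dir, chunk_id: 0 }
    }

    fn insert(&mut self, key: Vec<u8>, value: Vec<u8>) -> std::io::Result<()> {
        self.buffered += key.len() + value.len();
        self.entries.push((key, value));
        if self.buffered >= self.budget {
            self.flush()?;
        }
        Ok(())
    }

    fn flush(&mut self) -> std::io::Result<()> {
        if self.entries.is_empty() {
            return Ok(());
        }
        self.entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));
        let path = self.dir.join(format!("chunk-{}.bin", self.chunk_id));
        self.chunk_id += 1;
        let mut out = BufWriter::new(File::create(path)?);
        for (key, value) in self.entries.drain(..) {
            // Length-prefixed records; a real implementation (grenad)
            // uses its own on-disk format with optional compression.
            out.write_all(&(key.len() as u32).to_le_bytes())?;
            out.write_all(&key)?;
            out.write_all(&(value.len() as u32).to_le_bytes())?;
            out.write_all(&value)?;
        }
        out.flush()?;
        self.buffered = 0;
        Ok(())
    }
}
```

Back-of-the-envelope: with a 256 MiB budget and roughly 14 GiB of extracted data, such a scheme would produce on the order of 56 chunk files to merge rather than thousands of small ones, keeping the descriptor count far below typical OS limits.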