Description
When indexing a big dataset, it is common to trigger a "too many open files" error. The error is thrown during indexing and is most likely produced by grenad and the milli extractors, which generate a lot of files. The dataset I was using is a 33M-line JSON file of about 14 GiB, sent in a single batch. The error can also be triggered by changing the settings after having indexed a lot of documents, since this forces a re-indexation of the full dataset.
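For context, this failure mode is easy to reproduce outside of milli: any process that keeps enough descriptors open at once hits the OS limit. Below is a minimal, hypothetical sketch in plain Rust (not milli's actual extraction code) that opens files until the kernel refuses:

```rust
use std::env;
use std::fs::File;

fn main() {
    let dir = env::temp_dir();
    let mut handles = Vec::new();
    for i in 0.. {
        match File::create(dir.join(format!("emfile-repro-{i}"))) {
            Ok(f) => handles.push(f),
            Err(e) => {
                // On Linux this typically fails with EMFILE once the
                // process exhausts its RLIMIT_NOFILE soft limit
                // (commonly 1024 by default).
                eprintln!("gave up after {i} open files: {e}");
                break;
            }
        }
    }
    // `handles` keeps every descriptor alive until here, mimicking an
    // indexer that holds many intermediate files open at the same time.
    drop(handles);
}
```

Since the default soft limit is often only 1024 descriptors, an indexation that presumably keeps thousands of intermediate grenad files open at once fails long before the dataset is exhausted.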
We designed this new indexation system with @ManyTheFish to reduce the amount of RAM the engine was using and therefore reduce the number of crashes (processes killed by the OS) witnessed by our users. We did a good job, even if we can do better (see #3037).
I want to explore the new extractor design proposed by @loiclec in meilisearch/milli#656. This refactoring should make the extractors use RAM more efficiently, speeding up the indexation process and reducing the number of files created.
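One direction such a refactoring can take (a sketch under my own assumptions, not the actual design in meilisearch/milli#656): give each extractor a fixed RAM budget, and flush every full buffer as a single sorted chunk file. The budget then bounds resident memory and, indirectly, the number of files created:

```rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::path::PathBuf;

/// Hypothetical extractor buffer: accumulate (key, value) pairs up to a
/// fixed RAM budget, then flush the whole batch as one sorted file.
/// A larger budget means fewer intermediate files and descriptors,
/// at the cost of more resident memory.
struct SortedFlusher {
    budget: usize,                    // max bytes buffered before flushing
    buffered: usize,                  // bytes currently buffered
    entries: Vec<(Vec<u8>, Vec<u8>)>, // unsorted (key, value) pairs
    dir: PathBuf,
    chunk_id: usize,
}

impl SortedFlusher {
    fn new(budget: usize, dir: PathBuf) -> Self {
        Self { budget, buffered: 0, entries: Vec::new(), dir, chunk_id: 0 }
    }

    fn insert(&mut self, key: Vec<u8>, value: Vec<u8>) -> std::io::Result<()> {
        self.buffered += key.len() + value.len();
        self.entries.push((key, value));
        if self.buffered >= self.budget {
            self.flush()?;
        }
        Ok(())
    }

    fn flush(&mut self) -> std::io::Result<()> {
        if self.entries.is_empty() {
            return Ok(());
        }
        self.entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));
        let path = self.dir.join(format!("chunk-{}.bin", self.chunk_id));
        self.chunk_id += 1;
        let mut out = BufWriter::new(File::create(path)?);
        for (key, value) in self.entries.drain(..) {
            // Length-prefixed records; a real implementation (grenad)
            // uses its own on-disk format with optional compression.
            out.write_all(&(key.len() as u32).to_le_bytes())?;
            out.write_all(&key)?;
            out.write_all(&(value.len() as u32).to_le_bytes())?;
            out.write_all(&value)?;
        }
        out.flush()?;
        self.buffered = 0;
        Ok(())
    }
}
```

Back-of-the-envelope: with a 256 MiB budget and roughly 14 GiB of extracted data, such a scheme would produce on the order of 56 chunk files to merge rather than thousands of small ones, keeping the descriptor count far below typical OS limits.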