[ML] Overview of reindex issues with NLP #113948

Open
@maxhniebergall

Background

Reindex allows users to create new indices from data that is already in Elasticsearch. This is especially useful when moving to semantic search, because users have often already implemented text search and want to embed their existing data in a new index. Unfortunately, reindex has some flaws that make it difficult or impossible to use for larger datasets and when using machine learning models to produce embeddings.

Problems

Resiliency - Issues with failures and errors

Issues with size

Issues with performance

Issues with scroll

  • It's possible to hit the scroll limit if you have a lot of shards (see "Empty scroll contexts don't count", #86407).
  • Scroll stores results in memory for a specific amount of time that isn't tied to the completion of the reindex.
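
To make the scroll points concrete, here is a minimal sketch of how a reindex request that runs documents through an embedding pipeline might be assembled. It only builds the request and does not contact a cluster. The helper name `build_reindex_request`, the index names, and the pipeline name are hypothetical; `source.size`, `dest.pipeline`, `scroll`, and `wait_for_completion` are existing reindex options. Note that the scroll keep-alive is a fixed duration chosen up front, which is the mismatch described above when inference makes each batch slow.

```python
from typing import Any, Dict, Optional, Tuple


def build_reindex_request(
    source_index: str,
    dest_index: str,
    pipeline: Optional[str] = None,
    batch_size: int = 100,
    scroll_keepalive: str = "10m",
) -> Tuple[str, Dict[str, str], Dict[str, Any]]:
    """Return (path, query_params, body) for a POST to the reindex endpoint.

    Hypothetical helper for illustration only; it sends nothing.
    """
    body: Dict[str, Any] = {
        # source.size is the number of documents fetched per scroll batch.
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }
    if pipeline is not None:
        # Ingest pipeline (e.g. one with an inference processor) applied
        # to every document as it is written to the destination index.
        body["dest"]["pipeline"] = pipeline

    params = {
        # How long the scroll search context is kept alive between batches.
        # This is a fixed time, not tied to how long the reindex takes.
        "scroll": scroll_keepalive,
        # Run as a background task instead of blocking the HTTP request.
        "wait_for_completion": "false",
    }
    return "/_reindex", params, body


if __name__ == "__main__":
    # Assumed names: "docs", "docs-embedded", "embed-pipeline".
    path, params, body = build_reindex_request(
        "docs", "docs-embedded", pipeline="embed-pipeline", batch_size=50
    )
    print(path, params, body)
```

A smaller `batch_size` shortens the time each batch spends in inference, so a given keep-alive is less likely to expire, at the cost of more scroll round trips.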

Possible solutions in the works?

#27724 (comment)

Labels: :ml, >bug, >feature, Feature:NLP, Team:ML
