Add more information to IOContext #14422

Open
@thecoop

Description

Currently Lucene uses ReadAdvice.RANDOM quite liberally, in various hard-coded locations. This can cause problems with recent kernel versions, such as #14408. Several other aspects of opening files, such as which file access implementation to use or how to map memory, also depend on the context in which a file is opened and cannot be determined solely by the codec in use. For example, you may want to access a flat vector file differently if it’s being used for an exhaustive scan than if it’s being used to access specific vectors.

To solve both issues, Directory implementations need information about the context in which a particular file is being opened, to help them decide how, and with what options, to open it. However, IOContext currently provides no such information to Directory classes, and it has no room to carry such information in an extensible way.

So there needs to be a way to pass additional context-specific information in an IOContext, to help Directory implementations decide which ReadAdvice to use, and how and with what options to open a particular file. I can see two main ways of doing this:

  1. Make IOContext an interface. This allows custom implementations to be created. A custom Directory implementation can then check for the specific types it knows about, cast, and pull out the required information (prototype can be found here).
  2. Add a Map<String, Object> or Map<String, String> payload to IOContext, allowing arbitrary information to be included with an IOContext and then checked by the Directory implementation.
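
To make the two options concrete, here is a rough, self-contained sketch of how each might look from the Directory side. It does not use Lucene's real classes: `IOContextSketch`, `VectorReadContext`, `PayloadContext`, the `"access"` key, and the string results are all hypothetical names chosen for illustration, and the returned strings only stand in for whatever decision a real Directory would make.

```java
import java.util.Map;

// Hypothetical sketch; none of these types are Lucene's actual API.
final class IOContextSketch {

  /** Option 1: the context is an interface, so callers can supply their own types. */
  interface IOContext {}

  /** Hypothetical custom context describing how a flat vector file will be read. */
  static final class VectorReadContext implements IOContext {
    final boolean exhaustiveScan;

    VectorReadContext(boolean exhaustiveScan) {
      this.exhaustiveScan = exhaustiveScan;
    }
  }

  /** A custom Directory would check for the context types it knows and cast. */
  static String accessByType(IOContext context) {
    if (context instanceof VectorReadContext) {
      return ((VectorReadContext) context).exhaustiveScan ? "sequential" : "random";
    }
    return "default"; // unknown context types fall back to the default behaviour
  }

  /** Option 2: the context carries an arbitrary payload ("access" is a made-up key). */
  static final class PayloadContext {
    final Map<String, String> payload;

    PayloadContext(Map<String, String> payload) {
      this.payload = Map.copyOf(payload); // defensive, immutable copy
    }
  }

  /** A Directory would look up the keys it understands and ignore the rest. */
  static String accessByPayload(PayloadContext context) {
    return context.payload.getOrDefault("access", "default");
  }

  public static void main(String[] args) {
    System.out.println(accessByType(new VectorReadContext(true)));
    System.out.println(accessByPayload(new PayloadContext(Map.of("access", "random"))));
  }
}
```

The trade-off this sketch suggests: the interface approach gives type-safe access to the extra information but requires the Directory to know the concrete context classes, while the map approach needs only agreed key names but pushes validation of keys and values onto the Directory.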

As part of this, some standard patterns or map keys could be defined for the existing Lucene directories to use in place of the current ReadAdvice logic, giving them more information on how files should be opened. A later piece of work could remove ReadAdvice from IOContext and provide some other way for Directory implementations to determine the likely access pattern.
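
As one possible shape for such standard keys, the sketch below derives the advice from a hint map and falls back to the caller-supplied advice when no known hint is present. Everything here is an assumption made for illustration: the `ReadAdviceHints` class, the `"usage"` key, and its two values are not defined by Lucene, and the `ReadAdvice` enum is a local stand-in rather than Lucene's own.

```java
import java.util.Map;

// Hypothetical sketch; the key names and this class are not part of Lucene.
final class ReadAdviceHints {

  /** Local stand-in for Lucene's ReadAdvice, limited to the values used here. */
  enum ReadAdvice { NORMAL, SEQUENTIAL, RANDOM }

  /** Made-up well-known key and values that callers and directories would agree on. */
  static final String USAGE_KEY = "usage";
  static final String USAGE_EXHAUSTIVE_SCAN = "exhaustive_scan";
  static final String USAGE_POINT_LOOKUP = "point_lookup";

  /** Derives advice from the hints; unknown or missing hints keep the fallback. */
  static ReadAdvice resolve(Map<String, String> hints, ReadAdvice fallback) {
    String usage = hints.get(USAGE_KEY);
    if (USAGE_EXHAUSTIVE_SCAN.equals(usage)) {
      return ReadAdvice.SEQUENTIAL;
    }
    if (USAGE_POINT_LOOKUP.equals(usage)) {
      return ReadAdvice.RANDOM;
    }
    return fallback;
  }

  public static void main(String[] args) {
    Map<String, String> hints = Map.of(USAGE_KEY, USAGE_EXHAUSTIVE_SCAN);
    System.out.println(resolve(hints, ReadAdvice.RANDOM));
  }
}
```

Keeping a fallback parameter would let the existing ReadAdvice value and the new hints coexist during a transition, before ReadAdvice is removed from IOContext.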
