Description
What is the bug?
The client throws an exception when attempting to parse the settings of an index that include a `simple_pattern_split` or `simple_pattern` tokenizer. `IndexSettings` cannot be deserialized from settings using either of these tokenizers, which prevents them from being used in a `CreateIndexRequest`. Using the client to make a `GetIndexRequest` for an index created with these settings throws the same exception.
Exception thrown:

```
org.opensearch.client.util.MissingRequiredPropertyException: Missing required property 'Builder.<variant kind>'
```
How can one reproduce the bug?
Reproduce the bug by deserializing from JSON:
```java
String JSON = """
    {
      "analysis": {
        "tokenizer": {
          "my_pattern_split_tokenizer": {
            "type": "simple_pattern_split",
            "pattern": "-"
          }
        },
        "analyzer": {
          "my_pattern_split_analyzer": {
            "type": "custom",
            "tokenizer": "my_pattern_split_tokenizer"
          }
        }
      }
    }
    """;

JsonpMapper mapper = client._transport().jsonpMapper();
JsonParser parser = mapper.jsonProvider().createParser(new StringReader(JSON));
IndexSettings settings = IndexSettings._DESERIALIZER.deserialize(parser, mapper);
```
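For comparison (my own observation, not from the documentation): the same structure with a tokenizer type that the client does appear to model, e.g. `pattern`, deserializes without error, which suggests the failure is specific to the missing variants rather than to custom tokenizers in general:

```json
{
  "analysis": {
    "tokenizer": {
      "my_pattern_tokenizer": {
        "type": "pattern",
        "pattern": "-"
      }
    }
  }
}
```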
Reproduce the bug by getting an index which was created using these settings:
```java
GetIndexRequest req = new GetIndexRequest.Builder()
    .index("test-index")
    .build();
GetIndexResponse resp = client.indices().get(req);
```
What is the expected behavior?
`IndexSettings` should deserialize from these settings, since the documentation lists both tokenizers as still supported. The client should likewise be able to get data for an index that uses them.
What is your host/environment?
macOS Sequoia 15.3
Do you have any additional context?
These settings work when calling OpenSearch directly and appear to be supported by the High Level REST Client; I'm encountering this issue while migrating to the Java client. These tokenizer types aren't present in `TokenizerDefinition`.
OpenSearch DSL:
```
PUT test-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_pattern_split_tokenizer": {
          "type": "simple_pattern_split",
          "pattern": "-"
        }
      },
      "analyzer": {
        "my_pattern_split_analyzer": {
          "type": "custom",
          "tokenizer": "my_pattern_split_tokenizer"
        }
      }
    }
  }
}
```