Fix regex query to work with field alias #18215

Merged 12 commits on Jun 30, 2025
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -33,11 +33,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
### Fixed
- Add task cancellation checks in aggregators ([#18426](https://github.com/opensearch-project/OpenSearch/pull/18426))
- Fix concurrent timings in profiler ([#18540](https://github.com/opensearch-project/OpenSearch/pull/18540))
- Fix regex query from query string query to work with field alias ([#18215](https://github.com/opensearch-project/OpenSearch/issues/18215))
- [Autotagging] Fix delete rule event consumption in InMemoryRuleProcessingService ([#18628](https://github.com/opensearch-project/OpenSearch/pull/18628))
- Cannot communicate with HTTP/2 when reactor-netty is enabled ([#18599](https://github.com/opensearch-project/OpenSearch/pull/18599))
- Fix the visit of sub queries for HasParentQuery and HasChildQuery ([#18621](https://github.com/opensearch-project/OpenSearch/pull/18621))


### Security

[Unreleased 3.x]: https://github.com/opensearch-project/OpenSearch/compare/3.1...main
@@ -0,0 +1,56 @@
+setup:
+  - skip:
+      version: " - 3.1.99"
+      reason: "regex query over field alias support starts 3.2"
+
+  - do:
+      indices.create:
+        index: test_index
+        body:
+          settings:
+            number_of_shards: 1
+            number_of_replicas: 0
+          mappings:
+            properties:
+              test:
+                type: text
+              test_alias:
+                type: alias
+                path: test
+
+  - do:
+      bulk:
+        refresh: true
+        body: |
+          {"index":{"_index":"test_index","_id":"1"}}
+          {"test":"hello"}
+          {"index":{"_index":"test_index","_id":"2"}}
+          {"test":"world"}
+
+---
+"regex search on normal field":
+  - do:
+      search:
+        rest_total_hits_as_int: true
+        index: test_index
+        body:
+          query:
+            query_string:
+              query: "test: /h[a-z].*/"
+
+  - match: {hits.total: 1}
+  - match: {hits.hits.0._id: "1"}
+
+---
+"regex search on alias field":
+  - do:
+      search:
+        rest_total_hits_as_int: true
+        index: test_index
+        body:
+          query:
+            query_string:
+              query: "test_alias: /h[a-z].*/"
+
+  - match: {hits.total: 1}
+  - match: {hits.hits.0._id: "1"}
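
Reviewer note: the expectation in both REST tests above (one hit, `_id` "1") can be checked by hand with a small standalone sketch. It assumes each document's `test` value is indexed as a single term and that the regex must match the whole term. It uses `java.util.regex`, not Lucene's regexp syntax, which happens to agree for this simple pattern. The class and method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Standalone illustration only; this is not OpenSearch or Lucene code. */
public class RegexpExpectationSketch {

    /** The two documents from the bulk request: _id -> value of the "test" field. */
    static Map<String, String> testDocs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "hello");
        docs.put("2", "world");
        return docs;
    }

    /** Returns the ids whose value matches the regex in full (anchored match). */
    static List<String> matchingIds(Map<String, String> docs, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : docs.entrySet()) {
            if (pattern.matcher(entry.getValue()).matches()) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Same pattern as the query_string queries in the YAML test.
        System.out.println(matchingIds(testDocs(), "h[a-z].*"));
    }
}
```

Only "hello" starts with `h`, so the alias query should return the same single hit as the query on the concrete field once the alias is resolved.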
@@ -56,6 +56,7 @@
 import org.apache.lucene.search.SynonymQuery;
 import org.apache.lucene.search.WildcardQuery;
 import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.automaton.RegExp;
 import org.opensearch.common.lucene.search.Queries;
 import org.opensearch.common.regex.Regex;
 import org.opensearch.common.unit.Fuzziness;
@@ -787,8 +788,12 @@ private Query getRegexpQuerySingle(String field, String termStr) throws ParseExc
             if (currentFieldType == null) {
                 return newUnmappedFieldQuery(field);
             }
-            setAnalyzer(getSearchAnalyzer(currentFieldType));
-            return super.getRegexpQuery(field, termStr);
+            if (forceAnalyzer != null) {
+                setAnalyzer(forceAnalyzer);
+            }
+            // query string query normalizes search value
+            termStr = getAnalyzer().normalize(currentFieldType.name(), termStr).utf8ToString();
+            return currentFieldType.regexpQuery(termStr, RegExp.ALL, 0, getDeterminizeWorkLimit(), getMultiTermRewriteMethod(), context);
         } catch (RuntimeException e) {
             if (lenient) {
                 return newLenientFieldQuery(field, e);
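
Reviewer note: a toy model of why this hunk appears to fix alias handling. It assumes that the previous `super.getRegexpQuery(field, termStr)` call built the query against the name the user typed (the alias), while `currentFieldType.regexpQuery(...)` targets the resolved field type, whose `name()` is the concrete path. That reading is inferred from the diff, not confirmed by it. Everything below (class, map, method names, and the lowercase stand-in for analyzer normalization) is hypothetical and is not OpenSearch code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy model of alias resolution; all names here are hypothetical. */
public class AliasRegexpSketch {

    // Hypothetical stand-in for the index mapping: alias name -> concrete field path.
    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("test_alias", "test");
    }

    /** Mimics a field type lookup: an alias resolves to the concrete field's name. */
    static String resolveFieldName(String requested) {
        return ALIASES.getOrDefault(requested, requested);
    }

    /** Old behaviour, as modelled here: the query targets the name the user typed. */
    static String oldQuery(String requested, String regex) {
        return requested + ":/" + regex + "/";
    }

    /** New behaviour, as modelled here: normalize the term, then target the resolved field. */
    static String newQuery(String requested, String regex) {
        // Lowercasing is only a stand-in for whatever the configured analyzer's normalize() does.
        String normalized = regex.toLowerCase(Locale.ROOT);
        return resolveFieldName(requested) + ":/" + normalized + "/";
    }

    public static void main(String[] args) {
        System.out.println(oldQuery("test_alias", "h[a-z].*")); // targets the alias name
        System.out.println(newQuery("test_alias", "h[a-z].*")); // targets the concrete field
    }
}
```

In this model the old query is keyed on `test_alias`, which has no indexed terms of its own, while the new query is keyed on `test`, matching the behaviour the YAML test expects.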
@@ -786,6 +786,16 @@ public void testToQueryRegExpQuery() throws Exception {
         assertTrue(regexpQuery.toString().contains("/foo*bar/"));
     }
 
+    public void testRegexpQueryParserWithForceAnalyzer() throws Exception {
+        QueryStringQueryParser queryParser = new QueryStringQueryParser(createShardContext(), TEXT_FIELD_NAME);
+        queryParser.setForceAnalyzer(new org.apache.lucene.analysis.standard.StandardAnalyzer());
+        Query query = queryParser.parse("/aBc.*/");
+        assertThat(query, instanceOf(RegexpQuery.class));
+        RegexpQuery regexpQuery = (RegexpQuery) query;
+        // Standard analyzer normalizes to lowercase, verifying the normalization path with currentFieldType.name() is hit
+        assertTrue(regexpQuery.toString().contains("abc.*"));
+    }
+
     public void testToQueryRegExpQueryTooComplex() throws Exception {
         QueryStringQueryBuilder queryBuilder = queryStringQuery("/[ac]*a[ac]{50,200}/").defaultField(TEXT_FIELD_NAME);
 