-
Notifications
You must be signed in to change notification settings - Fork 175
Description
Query Information
PPL Command/Query:
source=test-cast-bug | fields _id | eval id_int = cast(_id as int) | stats median(id_int)
Expected Result:
The query should successfully compute the median of the cast integer values from the _id
field (expected result: 3).
Actual Result:
{
"error": {
"reason": "Error occurred in OpenSearch engine: all shards failed",
"details": "Shard[0]: java.lang.NullPointerException\n\nFor more details, please send request for Json format to see the raw response from OpenSearch engine.",
"type": "SearchPhaseExecutionException"
},
"status": 500
}
Stack Trace:
Caused by: java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:233)
at org.opensearch.sql.opensearch.storage.script.CalciteScriptEngine$ScriptInputGetter.tryConvertDocValue(CalciteScriptEngine.java:214)
at org.opensearch.sql.opensearch.storage.script.CalciteScriptEngine$ScriptInputGetter.field(CalciteScriptEngine.java:207)
at org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitInputRef(RexToLixTranslator.java:1175)
Dataset Information
Dataset/Schema Type
- OpenTelemetry (OTEL)
- Simple Schema for Observability (SS4O)
- Open Cybersecurity Schema Framework (OCSF)
- Custom (details below)
Index Mapping
{
"mappings": {
"properties": {
"name": { "type": "keyword" },
"value": { "type": "long" },
"category": { "type": "keyword" }
}
}
}
Sample Data
{"_id": "1", "name": "item1", "value": 100, "category": "A"}
{"_id": "2", "name": "item2", "value": 200, "category": "B"}
{"_id": "3", "name": "item3", "value": 300, "category": "A"}
{"_id": "4", "name": "item4", "value": 400, "category": "B"}
{"_id": "5", "name": "item5", "value": 500, "category": "A"}
Bug Description
Issue Summary:
Using cast()
on OpenSearch metadata fields (_id
, _index
) within aggregation functions causes a NullPointerException
. The issue occurs when the cast field is used in any aggregation operation (median, avg, sum, count).
Steps to Reproduce:
- Create test index and insert data:
curl -X PUT "localhost:9200/test-cast-bug" -H 'Content-Type: application/json' -d '{
"mappings": {
"properties": {
"name": { "type": "keyword" },
"value": { "type": "long" },
"category": { "type": "keyword" }
}
}
}'
curl -X POST "localhost:9200/test-cast-bug/_bulk" -H 'Content-Type: application/json' -d '
{"index":{"_id":"1"}}
{"name":"item1","value":100,"category":"A"}
{"index":{"_id":"2"}}
{"name":"item2","value":200,"category":"B"}
{"index":{"_id":"3"}}
{"name":"item3","value":300,"category":"A"}
{"index":{"_id":"4"}}
{"name":"item4","value":400,"category":"B"}
{"index":{"_id":"5"}}
{"name":"item5","value":500,"category":"A"}
'
- Test working cases:
# Case 1: cast on _id without aggregation - ✅ WORKS
source=test-cast-bug | fields _id | eval id_int = cast(_id as int) | head 3
# Returns: [{"_id":"1","id_int":1},{"_id":"2","id_int":2},{"_id":"3","id_int":3}]
# Case 2: cast on regular field with aggregation - ✅ WORKS
source=test-cast-bug | eval value_int = cast(value as int) | stats median(value_int)
# Returns: {"median(value_int)": 300}
# Case 3: eval without cast with aggregation - ✅ WORKS
source=test-cast-bug | eval doubled = value * 2 | stats avg(doubled)
# Returns: {"avg(doubled)": 600.0}
# Case 4: regular field aggregation - ✅ WORKS
source=test-cast-bug | stats median(value)
# Returns: {"median(value)": 300}
- Test failing cases:
# Case 5: cast on _id with median - ❌ FAILS
source=test-cast-bug | fields _id | eval id_int = cast(_id as int) | stats median(id_int)
# Error: NullPointerException
# Case 6: cast on _id with avg - ❌ FAILS
source=test-cast-bug | fields _id | eval id_int = cast(_id as int) | stats avg(id_int)
# Error: NullPointerException
# Case 7: cast on _id with count - ❌ FAILS
source=test-cast-bug | fields _id | eval id_int = cast(_id as int) | stats count(id_int)
# Error: NullPointerException
# Case 8: cast on _index with aggregation - ❌ FAILS
source=test-cast-bug | eval idx_len = cast(length(_index) as int) | stats median(idx_len)
# Error: NullPointerException
Impact:
Users cannot perform type conversions on metadata fields when using aggregations, limiting the ability to analyze document IDs or other metadata in statistical operations.
Environment Information
OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT
Root Cause Analysis
This is a preliminary analysis and requires further investigation.
The NullPointerException occurs at:
CalciteScriptEngine$ScriptInputGetter.tryConvertDocValue(CalciteScriptEngine.java:214)
Detailed Analysis:
-
Missing Metadata Fields in fieldTypes Map: When aggregations are pushed down in
CalciteLogicalIndexScan.pushDownAggregate()
(line 287), the code retrieves field types using:Map<String, ExprType> fieldTypes = this.osIndex.getFieldTypes();
-
getFieldTypes() Excludes Metadata: The
getFieldTypes()
method (OpenSearchIndex.java:134-148) only returns fields from the index mapping. It does NOT include metadata fields like_id
,_index
,_score
, etc. -
Metadata Fields Are Separate: Metadata fields are stored separately in
getReservedFieldTypes()
(OpenSearchIndex.java:151-153) which returnsMETADATAFIELD_TYPE_MAP
. -
Script Execution Failure: When the aggregation script executes and tries to access
_id
:- Line 190:
ExprType exprType = fieldTypes.get("_id")
returnsnull
- Line 207: This null is passed to
tryConvertDocValue(fieldValueExpr, exprType)
- Line 214: The switch statement cannot handle null, causing
NullPointerException
- Line 190:
Code Locations:
- Bug Location:
/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
(line 287) - Error Location:
/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java
(lines 190, 207, 214)
Proposed Fix
This is a preliminary analysis and requires further investigation.
Modify CalciteLogicalIndexScan.java
at line 287 to include reserved/metadata fields:
// Current (buggy) code:
Map<String, ExprType> fieldTypes = this.osIndex.getFieldTypes();
// Proposed fix:
Map<String, ExprType> fieldTypes = new LinkedHashMap<>(this.osIndex.getFieldTypes());
fieldTypes.putAll(this.osIndex.getReservedFieldTypes());
This pattern already exists in OpenSearchTypeFactory.convertSchema()
(lines 313-314) and ensures metadata fields are included in the fieldTypes map used for script generation.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status