Skip to content

[BUG] NullPointerException when using cast() on metadata fields (_id, _index) in aggregations #4513

@alexey-temnikov

Description

@alexey-temnikov

Query Information

PPL Command/Query:

source=test-cast-bug | fields _id | eval id_int = cast(_id as int) | stats median(id_int)

Expected Result:
The query should successfully compute the median of the cast integer values from the _id field (expected result: 3).

Actual Result:

{
  "error": {
    "reason": "Error occurred in OpenSearch engine: all shards failed",
    "details": "Shard[0]: java.lang.NullPointerException\n\nFor more details, please send request for Json format to see the raw response from OpenSearch engine.",
    "type": "SearchPhaseExecutionException"
  },
  "status": 500
}

Stack Trace:

Caused by: java.lang.NullPointerException
	at java.base/java.util.Objects.requireNonNull(Objects.java:233)
	at org.opensearch.sql.opensearch.storage.script.CalciteScriptEngine$ScriptInputGetter.tryConvertDocValue(CalciteScriptEngine.java:214)
	at org.opensearch.sql.opensearch.storage.script.CalciteScriptEngine$ScriptInputGetter.field(CalciteScriptEngine.java:207)
	at org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitInputRef(RexToLixTranslator.java:1175)

Dataset Information

Dataset/Schema Type

  • OpenTelemetry (OTEL)
  • Simple Schema for Observability (SS4O)
  • Open Cybersecurity Schema Framework (OCSF)
  • Custom (details below)

Index Mapping

{
  "mappings": {
    "properties": {
      "name": { "type": "keyword" },
      "value": { "type": "long" },
      "category": { "type": "keyword" }
    }
  }
}

Sample Data

{"_id": "1", "name": "item1", "value": 100, "category": "A"}
{"_id": "2", "name": "item2", "value": 200, "category": "B"}
{"_id": "3", "name": "item3", "value": 300, "category": "A"}
{"_id": "4", "name": "item4", "value": 400, "category": "B"}
{"_id": "5", "name": "item5", "value": 500, "category": "A"}

Bug Description

Issue Summary:
Using cast() on OpenSearch metadata fields (_id, _index) within aggregation functions causes a NullPointerException. The issue occurs when the cast field is used in any aggregation operation (median, avg, sum, count).

Steps to Reproduce:

  1. Create test index and insert data:
curl -X PUT "localhost:9200/test-cast-bug" -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "name": { "type": "keyword" },
      "value": { "type": "long" },
      "category": { "type": "keyword" }
    }
  }
}'

curl -X POST "localhost:9200/test-cast-bug/_bulk" -H 'Content-Type: application/json' -d '
{"index":{"_id":"1"}}
{"name":"item1","value":100,"category":"A"}
{"index":{"_id":"2"}}
{"name":"item2","value":200,"category":"B"}
{"index":{"_id":"3"}}
{"name":"item3","value":300,"category":"A"}
{"index":{"_id":"4"}}
{"name":"item4","value":400,"category":"B"}
{"index":{"_id":"5"}}
{"name":"item5","value":500,"category":"A"}
'
  1. Test working cases:
# Case 1: cast on _id without aggregation - ✅ WORKS
source=test-cast-bug | fields _id | eval id_int = cast(_id as int) | head 3
# Returns: [{"_id":"1","id_int":1},{"_id":"2","id_int":2},{"_id":"3","id_int":3}]

# Case 2: cast on regular field with aggregation - ✅ WORKS
source=test-cast-bug | eval value_int = cast(value as int) | stats median(value_int)
# Returns: {"median(value_int)": 300}

# Case 3: eval without cast with aggregation - ✅ WORKS
source=test-cast-bug | eval doubled = value * 2 | stats avg(doubled)
# Returns: {"avg(doubled)": 600.0}

# Case 4: regular field aggregation - ✅ WORKS
source=test-cast-bug | stats median(value)
# Returns: {"median(value)": 300}
  1. Test failing cases:
# Case 5: cast on _id with median - ❌ FAILS
source=test-cast-bug | fields _id | eval id_int = cast(_id as int) | stats median(id_int)
# Error: NullPointerException

# Case 6: cast on _id with avg - ❌ FAILS
source=test-cast-bug | fields _id | eval id_int = cast(_id as int) | stats avg(id_int)
# Error: NullPointerException

# Case 7: cast on _id with count - ❌ FAILS
source=test-cast-bug | fields _id | eval id_int = cast(_id as int) | stats count(id_int)
# Error: NullPointerException

# Case 8: cast on _index with aggregation - ❌ FAILS
source=test-cast-bug | eval idx_len = cast(length(_index) as int) | stats median(idx_len)
# Error: NullPointerException

Impact:
Users cannot perform type conversions on metadata fields when using aggregations, limiting the ability to analyze document IDs or other metadata in statistical operations.

Environment Information

OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT

Root Cause Analysis

This is a preliminary analysis and requires further investigation.

The NullPointerException occurs at:

CalciteScriptEngine$ScriptInputGetter.tryConvertDocValue(CalciteScriptEngine.java:214)

Detailed Analysis:

  1. Missing Metadata Fields in fieldTypes Map: When aggregations are pushed down in CalciteLogicalIndexScan.pushDownAggregate() (line 287), the code retrieves field types using:

    Map<String, ExprType> fieldTypes = this.osIndex.getFieldTypes();
  2. getFieldTypes() Excludes Metadata: The getFieldTypes() method (OpenSearchIndex.java:134-148) only returns fields from the index mapping. It does NOT include metadata fields like _id, _index, _score, etc.

  3. Metadata Fields Are Separate: Metadata fields are stored separately in getReservedFieldTypes() (OpenSearchIndex.java:151-153) which returns METADATAFIELD_TYPE_MAP.

  4. Script Execution Failure: When the aggregation script executes and tries to access _id:

    • Line 190: ExprType exprType = fieldTypes.get("_id") returns null
    • Line 207: This null is passed to tryConvertDocValue(fieldValueExpr, exprType)
    • Line 214: The switch statement cannot handle null, causing NullPointerException

Code Locations:

  • Bug Location: /opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (line 287)
  • Error Location: /opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java (lines 190, 207, 214)

Proposed Fix

This is a preliminary analysis and requires further investigation.

Modify CalciteLogicalIndexScan.java at line 287 to include reserved/metadata fields:

// Current (buggy) code:
Map<String, ExprType> fieldTypes = this.osIndex.getFieldTypes();

// Proposed fix:
Map<String, ExprType> fieldTypes = new LinkedHashMap<>(this.osIndex.getFieldTypes());
fieldTypes.putAll(this.osIndex.getReservedFieldTypes());

This pattern already exists in OpenSearchTypeFactory.convertSchema() (lines 313-314) and ensures metadata fields are included in the fieldTypes map used for script generation.

Metadata

Metadata

Assignees

No one assigned

    Labels

    PPLPiped processing languagebugSomething isn't workingpushdownpushdown related issues

    Type

    No type

    Projects

    Status

    Not Started

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions