-
Notifications
You must be signed in to change notification settings - Fork 175
Description
Query Information
PPL Command/Query:
# This works - returns 3 results with computed NameLen field
source=test-nested-eval-filter | eval NameLen=LENGTH(items.name) | fields id, items.name, NameLen
# This fails - returns 0 results (should return 2: banana and orange with length > 5)
source=test-nested-eval-filter | eval NameLen=LENGTH(items.name) | fields id, items.name, NameLen | where NameLen > 5
Expected Result:
The query should return 2 documents where the computed NameLen
field (length of items.name
) is greater than 5:
- order2 with banana (length 6)
- order3 with orange (length 6)
Actual Result:
The query returns 0 results:
{
"schema": [...],
"datarows": [],
"total": 0,
"size": 0
}
Dataset Information
Dataset/Schema Type
- OpenTelemetry (OTEL)
- Simple Schema for Observability (SS4O)
- Open Cybersecurity Schema Framework (OCSF)
- Custom (details below)
Index Mapping
{
"mappings": {
"properties": {
"id": {"type": "keyword"},
"items": {
"type": "nested",
"properties": {
"name": {"type": "keyword"}
}
}
}
}
}
Sample Data
{"id":"order1","items":[{"name":"apple"}]}
{"id":"order2","items":[{"name":"banana"}]}
{"id":"order3","items":[{"name":"orange"}]}
Bug Description
Issue Summary:
Filtering on computed fields (created with eval
) that reference nested field paths returns zero results, even though the computed field is correctly calculated when not filtered. The issue is specific to nested fields - the same pattern works correctly with regular (non-nested) fields.
Steps to Reproduce:
- Create an index with a nested field mapping
- Insert documents with nested objects
- Run
eval
to create a computed field based on a nested field path (e.g.,LENGTH(items.name)
) - Add a
where
clause to filter on the computed field - Observe that the query returns 0 results instead of the expected filtered results
Minimal Reproduction:
# Create test index
curl -X PUT "localhost:9200/test-nested-eval-filter" -H 'Content-Type: application/json' -d '{
"mappings": {
"properties": {
"id": {"type": "keyword"},
"items": {
"type": "nested",
"properties": {
"name": {"type": "keyword"}
}
}
}
}
}'
# Insert test data
curl -X POST "localhost:9200/test-nested-eval-filter/_bulk" -H 'Content-Type: application/json' -d '
{"index":{"_id":"1"}}
{"id":"order1","items":[{"name":"apple"}]}
{"index":{"_id":"2"}}
{"id":"order2","items":[{"name":"banana"}]}
{"index":{"_id":"3"}}
{"id":"order3","items":[{"name":"orange"}]}
'
curl -X POST "localhost:9200/test-nested-eval-filter/_refresh"
# This works - returns 3 results
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{
"query": "source=test-nested-eval-filter | eval NameLen=LENGTH(items.name) | fields id, items.name, NameLen"
}'
# This fails - returns 0 results (should return 2)
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{
"query": "source=test-nested-eval-filter | eval NameLen=LENGTH(items.name) | fields id, items.name, NameLen | where NameLen > 5"
}'
Verification with Regular Fields:
The same pattern works correctly with regular (non-nested) fields:
# Create index with regular field
curl -X PUT "localhost:9200/test-regular-eval-filter" -H 'Content-Type: application/json' -d '{
"mappings": {
"properties": {
"id": {"type": "keyword"},
"name": {"type": "keyword"}
}
}
}'
# Insert data
curl -X POST "localhost:9200/test-regular-eval-filter/_bulk" -H 'Content-Type: application/json' -d '
{"index":{"_id":"1"}}
{"id":"order1","name":"apple"}
{"index":{"_id":"2"}}
{"id":"order2","name":"banana"}
{"index":{"_id":"3"}}
{"id":"order3","name":"orange"}
'
curl -X POST "localhost:9200/test-regular-eval-filter/_refresh"
# This works correctly - returns 2 results
curl -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{
"query": "source=test-regular-eval-filter | eval NameLen=LENGTH(name) | fields id, name, NameLen | where NameLen > 5"
}'
Impact:
This bug prevents users from filtering on computed fields that are based on nested field values, which is a common use case in observability data (OTEL, Jaeger) and other structured data with nested objects.
Environment Information
OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT
Additional Details:
- Affects PPL queries using Calcite engine (V3)
- The issue is specific to nested fields; regular fields work correctly
- The
eval
command successfully computes values from nested fields when no filter is applied
Tentative Root Cause Analysis
This is a preliminary analysis and requires further investigation.
Core Issue
When a where
clause filters on a computed field that references a nested field, the filter is pushed down to OpenSearch as a script query. The script attempts to access the nested field value using doc values (via docProvider.get()
), but nested fields in OpenSearch do not have doc values. This causes the script to return null for all documents, making the filter evaluate to false for every document.
Code References
- Script Execution:
/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java
(lines 226-275)ScriptDataContext.get(String name)
method (line 263)- Attempts to retrieve field values from
docProvider.get(name)
- Returns null when doc values are empty or missing (line 267)
@Override
public Object get(String name) {
// ...
ScriptDocValues<?> docValue = this.docProvider.get(name);
if (docValue == null || docValue.isEmpty()) {
return null; // No way to differentiate null and missing from doc value
}
// ...
}
- Alternative Access Method: The same class has a
getFromSource(String name)
method that could access nested fields from_source
, but it's not used by the generated Calcite code.
Why This Happens
- Doc Values Limitation: Nested fields in OpenSearch are stored differently and don't have doc values accessible via the standard
doc[]
API - Script Generation: The Calcite-generated code always uses the
DataContext.get()
method, which relies on doc values - No Nested Field Detection: The code doesn't detect when a field path references a nested field and switch to
_source
access - Silent Failure: When doc values are unavailable, the method returns null rather than falling back to
_source
access
Direct OpenSearch Query Verification
Testing the underlying OpenSearch script query directly confirms the issue:
curl -X POST "localhost:9200/test-nested-eval-filter/_search" -H 'Content-Type: application/json' -d '{
"query": {
"script": {
"script": {
"source": "doc[\"items.name\"].value.length() > 5",
"lang": "painless"
}
}
}
}'
Returns error:
"caused_by": {
"type": "illegal_state_exception",
"reason": "A document doesn't have a value for a field! Use doc[<field>].size()==0 to check if a document is missing a field!"
}
Tentative Proposed Fix
This is a preliminary analysis and requires further investigation.
Possible Approach (Tentative - Needs Investiagtion)
The fix requires modifying how nested fields are accessed in filter scripts. Several approaches are possible:
Option 1: Fallback to Source Access
Modify ScriptDataContext.get()
to detect nested fields and fall back to _source
access:
- Detection: Check if the field path corresponds to a nested field in the index mapping
- Fallback: If doc values are unavailable and the field is nested, use
sourceLookup.get(name)
instead - Implementation Location:
/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java
(ScriptDataContext class)
@Override
public Object get(String name) {
// ... existing UTC_TIMESTAMP handling ...
ScriptDocValues<?> docValue = this.docProvider.get(name);
if (docValue == null || docValue.isEmpty()) {
// Try to get from source for nested fields
Object sourceValue = this.sourceLookup.get(name);
if (sourceValue != null) {
return sourceValue;
}
return null;
}
// ... existing value processing ...
}
Option 2: Nested Field Detection in Code Generation
Modify the Calcite code generation to use getFromSource()
for nested fields:
- Detect nested fields during query planning
- Generate code that calls
getFromSource()
instead ofget()
for nested field paths - More complex but potentially more efficient
Option 3: Nested Query Wrapper
Instead of using script queries for nested fields, generate proper nested queries:
- Detect when filter expressions reference nested fields
- Generate OpenSearch nested query DSL instead of script queries
- Most efficient but requires significant refactoring
Implementation Considerations
- Performance:
_source
access is slower than doc values, but it's the only option for nested fields - Mapping Access: Need access to index mapping to detect nested fields
- Array Handling: Nested fields are arrays; need to handle multiple values correctly
- Backward Compatibility: Ensure the fix doesn't break existing functionality for regular fields
Related Issues
- [BUG] PPL dedup command returns zero results on nested field paths #4487 - PPL dedup command returns zero results on nested field paths (different manifestation of nested field handling issues)
- [FEATURE] Support nested (object) fields with Calcite #3452 - Support nested (object) fields with Calcite
Workaround
No easy workaround available. The issue is fundamental to how nested fields are accessed in filter scripts. Possible alternatives:
- Restructure Data: Flatten nested structures into regular fields (not always feasible)
- Pre-filter Before Eval: Apply filters before computing fields (doesn't work when filter depends on computed value)
- Use Regular Fields: If possible, use non-nested field mappings
None of these are practical workarounds for most use cases involving nested data structures.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status