-
Notifications
You must be signed in to change notification settings - Fork 175
Description
Query Information
PPL Command/Query:
Example 1: Struct field with backticks (fails)
source=test_orders | lookup test_products product_id append `info`
Example 2: Nested field with spaces and backticks (fails)
source=test_orders | lookup test_products product_id append `info.Brand Name`
Example 3: Nested field with spaces without backticks (syntax error)
source=test_orders | lookup test_products product_id append info.Brand Name
Expected Result:
- Example 1 should append the entire
info
struct to results - Example 2 should append only the
Brand Name
field from theinfo
struct - Example 3 may provide an error message indicating backticks are required. At a minimum, it should be more clearly articulated in the documentation.
Actual Result:
Example 1 Error:
{
"error": {
"reason": "Invalid Query",
"details": "field [`info`] not found; input fields are: [name, info, info.Brand Name, info.Price USD, info.Product Name, product_id, _id, _index, _score, _maxscore, _sort, _routing]",
"type": "IllegalArgumentException"
},
"status": 400
}
Example 2 Error:
{
"error": {
"reason": "Invalid Query",
"details": "field [`info.Brand Name`] not found; input fields are: [name, info, info.Brand Name, info.Price USD, info.Product Name, product_id, _id, _index, _score, _maxscore, _sort, _routing]",
"type": "IllegalArgumentException"
},
"status": 400
}
Example 3 Error:
{
"error": {
"reason": "Invalid Query",
"details": "[Name] is not a valid term at this part of the query: '...d append info.Brand Name' <-- HERE. extraneous input 'Name' expecting <EOF>",
"type": "SyntaxCheckException"
},
"status": 400
}
Note: The error messages show that the field "info.Brand Name" exists in the available fields list, but when referenced with backticks as `info.Brand Name`
, the system searches for a field literally named `info.Brand Name`
(with backticks included).
Dataset Information
Dataset/Schema Type
- OpenTelemetry (OTEL)
- Simple Schema for Observability (SS4O)
- Open Cybersecurity Schema Framework (OCSF)
- Custom (details below)
Index Mapping
test_orders:
{
"mappings": {
"properties": {
"order_id": { "type": "keyword" },
"product_id": { "type": "keyword" },
"quantity": { "type": "integer" }
}
}
}
test_products:
{
"mappings": {
"properties": {
"product_id": { "type": "keyword" },
"name": { "type": "text" },
"info": {
"properties": {
"Product Name": { "type": "text" },
"Brand Name": { "type": "text" },
"Price USD": { "type": "float" }
}
}
}
}
}
Sample Data
test_orders:
{
"order_id": "O1",
"product_id": "P1",
"quantity": 2
}
test_products:
{
"product_id": "P1",
"name": "Widget",
"info": {
"Product Name": "Super Widget",
"Brand Name": "WidgetCo",
"Price USD": 19.99
}
}
Bug Description
Issue Summary:
Backticks in field identifiers are not properly handled in the PPL lookup
command's append
and replace
clauses. When a field name is enclosed in backticks, the system treats the backticks as part of the field name rather than as identifier delimiters, causing field resolution to fail. This creates an impossible situation for fields containing spaces or special characters, as they require backticks for syntax but fail when backticks are used.
Steps to Reproduce:
- Create two indices with a common join key and a struct field containing nested fields with spaces
- Execute:
source=test_orders | lookup test_products product_id append info
(works correctly) - Execute:
source=test_orders | lookup test_products product_id append
info`` (fails - field not found) - Execute:
source=test_orders | lookup test_products product_id append
info.Brand Name`` (fails - field not found) - Execute:
source=test_orders | lookup test_products product_id append info.Brand Name
(fails - syntax error due to space) - Observe that nested fields with spaces cannot be accessed at all in lookup commands
Impact:
- Critical: Users cannot access nested struct fields with spaces in their names via lookup commands
- Creates a catch-22: fields with spaces require backticks for valid syntax, but backticks cause field resolution to fail
- Inconsistent behavior compared to other PPL commands (fields, stats, eval, etc.) that properly handle backticked identifiers
- No workaround exists for accessing nested fields with spaces in lookup operations
- Affects real-world data schemas where field names commonly contain spaces (e.g., OpenTelemetry attributes, product catalogs, metadata fields)
Environment Information
OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT
Additional Details:
This issue affects the PPL lookup command specifically. Other PPL commands (fields, stats, eval, etc.) properly handle backticked identifiers. For comparison, this query works correctly:
source=test_products | fields `info.Brand Name`
Tentative Root Cause
This is a preliminary analysis and requires further investigation.
The root cause is in the buildFieldAliasMap
method in /ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
at lines 820-829:
private java.util.Map<String, String> buildFieldAliasMap(
List<LookupPairContext> lookupPairContext) {
return lookupPairContext.stream()
.collect(
Collectors.toMap(
pair -> pair.inputField.getText(), // Line 825 - BUG: doesn't strip backticks
pair -> pair.AS() != null ? pair.outputField.getText() : pair.inputField.getText(), // Line 826 - BUG
(x, y) -> y,
LinkedHashMap::new));
}
The method uses getText()
directly on the field expression context, which returns the raw text including backticks. Other parts of the codebase (e.g., lines 393, 442, 470 in the same file) properly use StringUtils.unquoteIdentifier()
to strip quotes and backticks from identifiers.
The StringUtils.unquoteIdentifier()
method (in /common/src/main/java/org/opensearch/sql/common/utils/StringUtils.java
lines 72-78) is specifically designed to remove backticks:
public static String unquoteIdentifier(String identifier) {
if (isQuoted(identifier, "`")) {
return identifier.substring(1, identifier.length() - 1);
} else {
return identifier;
}
}
Tentative Proposed Fix
This is a preliminary analysis and requires further investigation.
Modify the buildFieldAliasMap
method in AstBuilder.java
to use StringUtils.unquoteIdentifier()
:
private java.util.Map<String, String> buildFieldAliasMap(
List<LookupPairContext> lookupPairContext) {
return lookupPairContext.stream()
.collect(
Collectors.toMap(
pair -> StringUtils.unquoteIdentifier(pair.inputField.getText()),
pair -> pair.AS() != null
? StringUtils.unquoteIdentifier(pair.outputField.getText())
: StringUtils.unquoteIdentifier(pair.inputField.getText()),
(x, y) -> y,
LinkedHashMap::new));
}
This change aligns with how other commands in the same file handle identifiers and ensures backticks are properly stripped before field resolution.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status