Skip to content

[BUG] Backticks not properly handled in PPL lookup command append/replace clauses #4516

@alexey-temnikov

Description

@alexey-temnikov

Query Information

PPL Command/Query:

Example 1: Struct field with backticks (fails)

source=test_orders | lookup test_products product_id append `info`

Example 2: Nested field with spaces and backticks (fails)

source=test_orders | lookup test_products product_id append `info.Brand Name`

Example 3: Nested field with spaces without backticks (syntax error)

source=test_orders | lookup test_products product_id append info.Brand Name

Expected Result:

  • Example 1 should append the entire info struct to results
  • Example 2 should append only the Brand Name field from the info struct
  • Example 3 may provide an error message indicating backticks are required. At a minimum, it should be more clearly articulated in the documentation.

Actual Result:

Example 1 Error:

{
  "error": {
    "reason": "Invalid Query",
    "details": "field [`info`] not found; input fields are: [name, info, info.Brand Name, info.Price USD, info.Product Name, product_id, _id, _index, _score, _maxscore, _sort, _routing]",
    "type": "IllegalArgumentException"
  },
  "status": 400
}

Example 2 Error:

{
  "error": {
    "reason": "Invalid Query",
    "details": "field [`info.Brand Name`] not found; input fields are: [name, info, info.Brand Name, info.Price USD, info.Product Name, product_id, _id, _index, _score, _maxscore, _sort, _routing]",
    "type": "IllegalArgumentException"
  },
  "status": 400
}

Example 3 Error:

{
  "error": {
    "reason": "Invalid Query",
    "details": "[Name] is not a valid term at this part of the query: '...d append info.Brand Name' <-- HERE. extraneous input 'Name' expecting <EOF>",
    "type": "SyntaxCheckException"
  },
  "status": 400
}

Note: The error messages show that the field "info.Brand Name" exists in the available fields list, but when referenced with backticks as `info.Brand Name`, the system searches for a field literally named `info.Brand Name` (with backticks included).

Dataset Information

Dataset/Schema Type

  • OpenTelemetry (OTEL)
  • Simple Schema for Observability (SS4O)
  • Open Cybersecurity Schema Framework (OCSF)
  • Custom (details below)

Index Mapping

test_orders:

{
  "mappings": {
    "properties": {
      "order_id": { "type": "keyword" },
      "product_id": { "type": "keyword" },
      "quantity": { "type": "integer" }
    }
  }
}

test_products:

{
  "mappings": {
    "properties": {
      "product_id": { "type": "keyword" },
      "name": { "type": "text" },
      "info": {
        "properties": {
          "Product Name": { "type": "text" },
          "Brand Name": { "type": "text" },
          "Price USD": { "type": "float" }
        }
      }
    }
  }
}

Sample Data

test_orders:

{
  "order_id": "O1",
  "product_id": "P1",
  "quantity": 2
}

test_products:

{
  "product_id": "P1",
  "name": "Widget",
  "info": {
    "Product Name": "Super Widget",
    "Brand Name": "WidgetCo",
    "Price USD": 19.99
  }
}

Bug Description

Issue Summary:
Backticks in field identifiers are not properly handled in the PPL lookup command's append and replace clauses. When a field name is enclosed in backticks, the system treats the backticks as part of the field name rather than as identifier delimiters, causing field resolution to fail. This creates an impossible situation for fields containing spaces or special characters, as they require backticks for syntax but fail when backticks are used.

Steps to Reproduce:

  1. Create two indices with a common join key and a struct field containing nested fields with spaces
  2. Execute: source=test_orders | lookup test_products product_id append info (works correctly)
  3. Execute: source=test_orders | lookup test_products product_id append info`` (fails - field not found)
  4. Execute: source=test_orders | lookup test_products product_id append info.Brand Name`` (fails - field not found)
  5. Execute: source=test_orders | lookup test_products product_id append info.Brand Name (fails - syntax error due to space)
  6. Observe that nested fields with spaces cannot be accessed at all in lookup commands

Impact:

  • Critical: Users cannot access nested struct fields with spaces in their names via lookup commands
  • Creates a catch-22: fields with spaces require backticks for valid syntax, but backticks cause field resolution to fail
  • Inconsistent behavior compared to other PPL commands (fields, stats, eval, etc.) that properly handle backticked identifiers
  • No workaround exists for accessing nested fields with spaces in lookup operations
  • Affects real-world data schemas where field names commonly contain spaces (e.g., OpenTelemetry attributes, product catalogs, metadata fields)

Environment Information

OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT

Additional Details:
This issue affects the PPL lookup command specifically. Other PPL commands (fields, stats, eval, etc.) properly handle backticked identifiers. For comparison, this query works correctly:

source=test_products | fields `info.Brand Name`

Tentative Root Cause

This is a preliminary analysis and requires further investigation.

The root cause is in the buildFieldAliasMap method in /ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java at lines 820-829:

private java.util.Map<String, String> buildFieldAliasMap(
    List<LookupPairContext> lookupPairContext) {
  return lookupPairContext.stream()
      .collect(
          Collectors.toMap(
              pair -> pair.inputField.getText(),  // Line 825 - BUG: doesn't strip backticks
              pair -> pair.AS() != null ? pair.outputField.getText() : pair.inputField.getText(),  // Line 826 - BUG
              (x, y) -> y,
              LinkedHashMap::new));
}

The method uses getText() directly on the field expression context, which returns the raw text including backticks. Other parts of the codebase (e.g., lines 393, 442, 470 in the same file) properly use StringUtils.unquoteIdentifier() to strip quotes and backticks from identifiers.

The StringUtils.unquoteIdentifier() method (in /common/src/main/java/org/opensearch/sql/common/utils/StringUtils.java lines 72-78) is specifically designed to remove backticks:

public static String unquoteIdentifier(String identifier) {
  if (isQuoted(identifier, "`")) {
    return identifier.substring(1, identifier.length() - 1);
  } else {
    return identifier;
  }
}

Tentative Proposed Fix

This is a preliminary analysis and requires further investigation.

Modify the buildFieldAliasMap method in AstBuilder.java to use StringUtils.unquoteIdentifier():

private java.util.Map<String, String> buildFieldAliasMap(
    List<LookupPairContext> lookupPairContext) {
  return lookupPairContext.stream()
      .collect(
          Collectors.toMap(
              pair -> StringUtils.unquoteIdentifier(pair.inputField.getText()),
              pair -> pair.AS() != null 
                  ? StringUtils.unquoteIdentifier(pair.outputField.getText()) 
                  : StringUtils.unquoteIdentifier(pair.inputField.getText()),
              (x, y) -> y,
              LinkedHashMap::new));
}

This change aligns with how other commands in the same file handle identifiers and ensures backticks are properly stripped before field resolution.

Metadata

Metadata

Assignees

No one assigned

    Labels

    PPLPiped processing language

    Type

    No type

    Projects

    Status

    Not Started

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions