[BUG] JSON arrays can not be queried as expected (return NULL or unsupported data type: JString(nested)) #750

Open
@salyh

What is the bug?
While working on #669 I noticed that JSON arrays indexed into OpenSearch always return NULL when queried via Spark. JSON objects and primitives work as expected.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Index the following documents into OpenSearch with default dynamic mapping:
curl localhost:9200/json/_bulk -H 'content-type: application/json' -d '
{"index":{"_id":"1"}}
{"id":3,"name":"Bob Smith","title":null,"projects":[{"name":"SQL Spectrum querying","started_year":1990},{"name":"SQL security","started_year":1999},{"name":"OpenSearch security","started_year":2015}]}
{"index":{"_id":"2"}}
{"id":4,"name":"Susan Smith","title":"Dev Mgr","projects":[]}
{"index":{"_id":"3"}}
{"id":6,"name":"Jane Smith","title":"Software Eng 2","projects":[{"name":"SQL security","started_year":1998},{"name":"Hello security","started_year":2015,"address":[{"city":"Dallas","state":"TX"}]}]}
{"index":{"_id":"4"}}
{"id":7,"name":"Jane Smith2","title":"Software Eng 22","projectsasobject":{"name":"SQL security","started_year":1998}}
'
  2. Connect Spark to OpenSearch
  3. Open a Spark shell
./spark-bin/bin/spark-shell -c spark.sql.catalog.dev=org.apache.spark.opensearch.catalog.OpenSearchCatalog \
  --packages "org.opensearch:opensearch-spark-standalone_2.12:0.6.0-SNAPSHOT,org.opensearch:opensearch-spark-ppl_2.12:0.6.0-SNAPSHOT" \
  --conf "spark.sql.extensions=org.opensearch.flint.spark.FlintPPLSparkExtensions,org.opensearch.flint.spark.FlintSparkExtensions"
  4. Run val dfp = spark.sql("source=dev.default.json"); dfp.show()
+-----------+--------------------+---+--------+---------------+                 
|       name|    projectsasobject| id|projects|          title|
+-----------+--------------------+---+--------+---------------+
|  Bob Smith|                NULL|  3|    NULL|           NULL|
|Susan Smith|                NULL|  4|    NULL|        Dev Mgr|
| Jane Smith|                NULL|  6|    NULL| Software Eng 2|
|Jane Smith2|{SQL security, 1998}|  7|    NULL|Software Eng 22|
+-----------+--------------------+---+--------+---------------+

In the first three rows, the projects column is not expected to be NULL: it should contain the indexed arrays (an empty array for Susan Smith).
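To help narrow this down, it may be useful to look at the mapping that dynamic mapping generated for the index. The sketch below assumes the same host and index name as the bulk request above; `INDEX_URL` is just an illustrative variable name.

```shell
# Assumed endpoint: same host and index name as the bulk request in step 1.
INDEX_URL="localhost:9200/json"

# Print the mapping that dynamic mapping produced for the index.
curl -s "$INDEX_URL/_mapping?pretty"

# Print the stored documents to confirm the projects arrays were indexed.
curl -s "$INDEX_URL/_search?pretty"
```

OpenSearch has no dedicated array type, so the mapping describes projects by its sub-fields only and does not record that the field holds an array.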

If the projects field is mapped to the "nested" type, like

{
  "mappings" : {
    "properties": {
      "projects": {
        "type" : "nested"
      }
    }
  }
}

then an error is thrown:

scala> val dfp = spark.sql("source=dev.default.json"); dfp.show()
java.lang.IllegalStateException: unsupported data type: JString(nested)
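For anyone reproducing the nested case: the explicit mapping has to exist before the documents are indexed. A minimal sketch of one way to set that up, assuming the same host and index name as above and that it is acceptable to delete the existing index (`INDEX_URL` and `MAPPING` are illustrative variable names):

```shell
# Assumed endpoint: same host and index name as the bulk request in step 1.
INDEX_URL="localhost:9200/json"

# Mapping body from this report: declare projects as a nested field.
MAPPING='{
  "mappings": {
    "properties": {
      "projects": { "type": "nested" }
    }
  }
}'

# Drop the dynamically mapped index (this also deletes its documents).
curl -s -X DELETE "$INDEX_URL"

# Recreate the index with the explicit mapping.
curl -s -X PUT "$INDEX_URL" -H 'content-type: application/json' -d "$MAPPING"
```

After this, re-run the _bulk request from step 1 and the Spark query from step 4.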


Labels: 0.6; Lang:PPL (Pipe Processing Language support); bug (Something isn't working)
