Description
What is the bug?
While working on #669 I noticed that JSON arrays indexed into OpenSearch always return NULL when queried via Spark. JSON objects and primitives work as expected.
How can one reproduce the bug?
Steps to reproduce the behavior:
- Index the following documents into OpenSearch with default dynamic mapping:
curl localhost:9200/json/_bulk -H 'content-type: application/json' -d '
{"index":{"_id":"1"}}
{"id":3,"name":"Bob Smith","title":null,"projects":[{"name":"SQL Spectrum querying","started_year":1990},{"name":"SQL security","started_year":1999},{"name":"OpenSearch security","started_year":2015}]}
{"index":{"_id":"2"}}
{"id":4,"name":"Susan Smith","title":"Dev Mgr","projects":[]}
{"index":{"_id":"3"}}
{"id":6,"name":"Jane Smith","title":"Software Eng 2","projects":[{"name":"SQL security","started_year":1998},{"name":"Hello security","started_year":2015,"address":[{"city":"Dallas","state":"TX"}]}]}
{"index":{"_id":"4"}}
{"id":7,"name":"Jane Smith2","title":"Software Eng 22","projectsasobject":{"name":"SQL security","started_year":1998}}
'
- Connect Spark to OpenSearch
- Open a Spark Shell
./spark-bin/bin/spark-shell -c spark.sql.catalog.dev=org.apache.spark.opensearch.catalog.OpenSearchCatalog \
--packages "org.opensearch:opensearch-spark-standalone_2.12:0.6.0-SNAPSHOT,org.opensearch:opensearch-spark-ppl_2.12:0.6.0-SNAPSHOT" \
--conf "spark.sql.extensions=org.opensearch.flint.spark.FlintPPLSparkExtensions,org.opensearch.flint.spark.FlintSparkExtensions"
- Run
val dfp = spark.sql("source=dev.default.json"); dfp.show()
+-----------+--------------------+---+--------+---------------+
| name| projectsasobject| id|projects| title|
+-----------+--------------------+---+--------+---------------+
| Bob Smith| NULL| 3| NULL| NULL|
|Susan Smith| NULL| 4| NULL| Dev Mgr|
| Jane Smith| NULL| 6| NULL| Software Eng 2|
|Jane Smith2|{SQL security, 1998}| 7| NULL|Software Eng 22|
+-----------+--------------------+---+--------+---------------+
Here no NULL is expected in the projects column of the first three rows.
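A possible explanation for the NULLs (an assumption about the mechanism, not verified against the connector source): the OpenSearch mapping describes `projects` as an object, and a reader that only knows how to convert objects would fall back to null when the actual document value is an array. A minimal self-contained sketch of that failure mode:

```scala
// Toy JSON model; names are illustrative, not the connector's actual types.
sealed trait JValue
case class JObj(fields: Map[String, JValue]) extends JValue
case class JArr(items: List[JValue]) extends JValue
case class JStr(value: String) extends JValue

// A converter that only handles objects: any array value yields None,
// which would surface as NULL in the query result.
def readAsStruct(v: JValue): Option[Map[String, JValue]] = v match {
  case JObj(fields) => Some(fields)
  case _            => None // array (or other) value -> NULL
}

val asObject = JObj(Map("name" -> JStr("SQL security")))
val asArray  = JArr(List(asObject))
// readAsStruct(asObject) succeeds; readAsStruct(asArray) yields None
```

This matches the observed behavior: the object-valued `projectsasobject` field is returned, while the array-valued `projects` field comes back NULL.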
If the projects field is mapped to the "nested" type like
{
  "mappings": {
    "properties": {
      "projects": {
        "type": "nested"
      }
    }
  }
}
then an error is thrown:
scala> val dfp = spark.sql("source=dev.default.json"); dfp.show()
java.lang.IllegalStateException: unsupported data type: JString(nested)
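The error message suggests the mapping parser pattern-matches on the OpenSearch type name and has no case for "nested". An illustrative sketch (an assumption, not the connector's actual code) of how such a match falls through to the reported exception:

```scala
// Hypothetical mapper from OpenSearch field-type names to Spark SQL type
// names; the real connector code may differ.
object TypeMapper {
  def toSparkType(osType: String): String = osType match {
    case "text" | "keyword" => "StringType"
    case "long" | "integer" => "LongType"
    case "object"           => "StructType"
    // No case for "nested": the default branch throws, mirroring the
    // IllegalStateException reported above.
    case other =>
      throw new IllegalStateException(s"unsupported data type: JString($other)")
  }
}
```

So even an explicit "nested" mapping does not work around the array bug; it only changes the failure from silent NULLs to a hard error.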