
RumbleDB 1.23.0 "Mountain ash" beta

@mschoeb mschoeb released this 26 Mar 14:33
· 1018 commits to master since this release
9fcf2e0

Update (July 3, 2025): Spark 4.0 support is available.

Use RumbleDB to query data with JSONiq, even data that does not fit in DataFrames.

Try-it-out sandbox: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb

Instructions to get started: https://rumble.readthedocs.io/en/latest/Getting%20started/

Supported versions
RumbleDB 1.23 supports Spark 3.5 with Scala 2.13 as well as Spark 4.0.
The jars are compatible with Java 11 and 17. Because our efforts are increasingly focused on the Spark 4 release and on stability and conformance improvements, and because Spark 4 is based on Scala 2.13, RumbleDB 1.23 drops support for Spark 3.4 and for Scala 2.12. If you use Spark 3.4, or Spark 3.5 with Scala 2.12, please use RumbleDB 1.22, which is stable.

The standalone jar bundles Spark 3.5 with Scala 2.13, so it works out of the box.

General

  • Dropped support for Scala 2.12.
  • Dropped support for Spark 3.4.
  • Renamed json-file() to json-lines(). The old name can still be used for now but is marked as deprecated.
  • Added support for single-quoted strings. Strings in single quotes may contain double quotes ", but single quotes inside must be escaped as \'. Analogously, strings in double quotes may contain single quotes, but double quotes inside must be escaped as \".
  • Added support for some popular features of the pandas/numpy libraries.
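
The quoting rule above can be illustrated with a minimal JSONiq sketch. It only uses the two escapes named in the list, and the sample sentence is made up for illustration. Under the stated rule, both literals should denote the same string, so the comparison is expected to evaluate to true.

```jsoniq
(: Both literals are meant to denote the string: He said "it's fine" :)
(: Left: single-quoted, so the inner single quote is escaped as \'   :)
(: Right: double-quoted, so the inner double quotes are escaped as \" :)
'He said "it\'s fine"' eq "He said \"it's fine\""
```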

JSONiq 3.1

Added an option to use JSONiq 3.1, which changes the JSONiq 1.0 spec to align it more closely with XQuery 3.1. Enabling the option results in the following changes:

  • Objects and arrays no longer have an effective boolean value; testing one raises an error
  • Keys for objects must be quoted
  • atomic is replaced by anyAtomicType
  • Error JNDY0003 is removed and replaced by XQDY0137
  • Both the JSONiq and XQuery parsers are available. The parser to use can be selected on the command line or with a language declaration in the query file.
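
As a rough sketch of what a query written for the JSONiq 3.1 option might look like: the language declaration on the first line is an assumption, modeled on the xquery version "3.1"; prefix described in these notes, and the object content is invented for illustration. The object keys are quoted, the object is tested with exists() rather than through its effective boolean value, and the type test uses anyAtomicType instead of atomic.

```jsoniq
(: Assumed declaration syntax for selecting JSONiq 3.1 in the query file :)
jsoniq version "3.1";

(: Keys must be quoted under JSONiq 3.1 :)
let $o := { "product": "broiler", "quantity": 20 }
return {
  (: exists() avoids taking the effective boolean value of an object :)
  "has-object": exists($o),
  (: anyAtomicType replaces the JSONiq 1.0 type name atomic :)
  "is-atomic": 20 instance of anyAtomicType
}
```

Writing if ($o) then ... instead of exists($o) would ask for the effective boolean value of an object, which, per the list above, raises an error when the option is enabled.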

Basic XML/XQuery support for both parsers

  • Add doc() function for reading an XML document
  • Add a new xml-files() function that allows for reading and processing of multiple .xml files in parallel
  • Add XPath steps for navigating XML documents. We are able to navigate through 32+ GB of XML data spread over many documents in just a few minutes on an Amazon EMR cluster.
  • Add data() function for atomization of nodes

Experimental XQuery Parser

Updated the option to use the XQuery parser instead of the JSONiq parser. To use it, prefix your query with xquery version "3.1"; (including the trailing semicolon). Note: this is at a very early stage and many features are still missing.

  • Context item is "." as opposed to "$$" from JSONiq
  • No JSONiq ObjectLookups with "."
  • No JSONiq ArrayLookup and ArrayUnboxing
  • Support for XQuery Map constructor and curly Array constructor
  • Support for String Lookup on Maps and Integer lookup on arrays with the ? operator

Minor Improvements and Bug fixes

  • subsequence and sequence lookups now use Spark pagination for large positions
  • Rumble shell now keeps history of previous sessions
  • Implements compare() with arities 2 and 3
  • Implements trace() arity 2
  • Implements xs:numeric
  • Adds support for setting base-uri in query and as CLI option
  • Implements FOAR0002, FOAY0001, FOTY0013, FODT0001, FODT0002, XPTY0018, XPTY0019, XQST0032
  • Increases decimal multiplication precision to 18 digits
  • Fixes index lookup throwing an error for an index >= 1,000,000 and behaving incorrectly with a non-integer index
  • Fixes calling parallelize on an already parallelized structure throwing an error
  • Fixes index lookup with decimal not adhering to spec
  • Fixes unnecessary warning shown when
  • Fixes effective boolean value of NaN and decimals equal to 0
  • Fixes stringToCodepoints() on multibyte ranges
  • Fixes indexof() so that it no longer finds NaN
  • Fixes some base64 errors
  • Fixes some edge cases in pow, log10, exp10, atan
  • Fixes resolveUri with empty baseUri
  • Fixes some incorrect exceptions of matches()
  • Fixes sum() with zeroElement not behaving correctly if sequence is non-empty
  • Fixes idiv and imult handling of inf and NaN
  • Fixes inner focus sometimes missing in simpleMap
  • Fixes bug allowing missing commas between function arguments