Performance: inefficient usage of JSON providers/builders #12362

@poikilotherm

Description

What steps does it take to reproduce the issue?

Profiling JsonPrinter for conversion of tabular data revealed the inefficiencies of Jakarta JSON-P and our naive usage of it:

  1. Create a unit test to export a lot of entities with the printer
  2. Run the test with enabled profiling
  3. Watch how every time a JsonObjectBuilder or JsonArrayBuilder is created, Jakarta JSON-P triggers a classpath scan, as it tries to reload the implementation via ServiceLoader.
  • When does this issue occur?

Every time you create massive amounts of JSON data with JSON-P.

  • Which page(s) does it occur on?

Mostly API.

  • What happens?

Every time we use Json.createX() in our codebase, a ServiceLoader will be triggered, trying to load a Jakarta JSON-P implementation. In addition, any frameworks using that (like JAX-RS) may be affected by the same problem.
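The lookup cost described above can be illustrated with the JDK alone. This is a minimal sketch, not the Jakarta code path itself: it uses `java.nio.file.spi.FileSystemProvider` as a stand-in service type (an assumption made only so the example runs without JSON-P on the classpath), and the class and method names are hypothetical. It shows that every `ServiceLoader.load(...)` call hands back a new loader that has to discover its providers again.

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.ServiceLoader;

/** Hypothetical demo class; FileSystemProvider is only a stand-in service type. */
final class ServiceLoaderLookupDemo {

    /** Each call creates a new ServiceLoader with an empty provider cache. */
    static ServiceLoader<FileSystemProvider> freshLookup() {
        return ServiceLoader.load(FileSystemProvider.class);
    }

    /** Iterating forces the lazy loader to locate and instantiate providers. */
    static int countProviders(ServiceLoader<FileSystemProvider> loader) {
        int count = 0;
        for (FileSystemProvider ignored : loader) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Two lookups, two independent discovery passes over the same providers.
        ServiceLoader<FileSystemProvider> first = freshLookup();
        ServiceLoader<FileSystemProvider> second = freshLookup();
        System.out.println("same loader instance: " + (first == second));
        System.out.println("providers found: " + countProviders(first)
                + " and " + countProviders(second));
    }
}
```

Repeating that discovery once per builder is what adds up when thousands of builders are created in a row.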

Most often, the performance hit is negligible. It only becomes a problem when we make many of these calls in a row or in parallel. Instances serving a lot of users will therefore suffer, as will any operation that deals with larger quantities of things, such as exports of tabular data.

This is a well-known problem and we should apply countermeasures.

  • To whom does it occur (all users, curators, superusers)?

Anyone using any of the API functionality.

  • What did you expect to happen?

Factories should be cached to avoid the extra ServiceLoader round trip.
The simplest fix for the example smoke test shown in the screenshots below, adding a static factory cache to NullSafeJsonBuilder, cuts the conversion time in half!
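A minimal sketch of the static-cache idea, modeled with JDK classes only so it runs without a JSON-P implementation. All class names here are hypothetical stand-ins, and the counter replaces the real provider lookup. In actual code the cached object would presumably be a `JsonBuilderFactory` obtained once via `Json.createBuilderFactory(null)`, with builders then created from that factory instead of through `Json.createX()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for a builder factory (assumption: plays the role of JsonBuilderFactory). */
final class DemoBuilderFactory {
    Map<String, Object> createObjectBuilder() {
        return new LinkedHashMap<>();
    }
}

/** Stand-in for the provider lookup that happens on each uncached call. */
final class DemoProviderLookup {
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    static DemoBuilderFactory lookupFactory() {
        LOOKUPS.incrementAndGet(); // the real lookup would go through ServiceLoader here
        return new DemoBuilderFactory();
    }
}

/** Hypothetical caching holder: the factory is resolved once, at class initialization. */
final class CachedFactoryHolder {
    private static final DemoBuilderFactory FACTORY = DemoProviderLookup.lookupFactory();

    private CachedFactoryHolder() {
    }

    static Map<String, Object> objectBuilder() {
        return FACTORY.createObjectBuilder();
    }
}

final class FactoryCacheDemo {

    /** Naive path: one lookup per builder. Returns the number of lookups performed. */
    static int naiveLookups(int builders) {
        int before = DemoProviderLookup.LOOKUPS.get();
        for (int i = 0; i < builders; i++) {
            DemoProviderLookup.lookupFactory().createObjectBuilder();
        }
        return DemoProviderLookup.LOOKUPS.get() - before;
    }

    /** Cached path: at most one lookup, no matter how many builders are created. */
    static int cachedLookups(int builders) {
        int before = DemoProviderLookup.LOOKUPS.get();
        for (int i = 0; i < builders; i++) {
            CachedFactoryHolder.objectBuilder();
        }
        return DemoProviderLookup.LOOKUPS.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("naive lookups:  " + naiveLookups(1000));
        System.out.println("cached lookups: " + cachedLookups(1000));
    }
}
```

A static final field keeps the holder simple and relies on class initialization for thread safety, which matters because the API serves requests in parallel.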

Which version of Dataverse are you using?

v6.10.1 / develop, but this is a long-standing issue, as it is a shortcoming of the Jakarta API we use. It's up to us to optimize.

Any related open or closed issues to this bug report?

#11405 was the reason to look into this, as we discovered problems exporting datasets with larger amounts of tabular data.

Screenshots:

Here's a screenshot of converting a dataset with 100 files with 1000 data variables each. Building all POJOs takes 500 ms, conversion to the in-memory JSON model takes 10,000 ms, and converting the result to a String takes 148 ms.

Image

The culprit clearly is our naive usage of Json.createX(). Here's what this looks like after adding a factory cache to our NullSafeJsonBuilder, cutting the conversion time in half:

Image

Note how the JSON array builder problem persists, as it's not optimized yet. This is especially bad because, even if an installation has no "Variable Metadata" at all, the JSON array builder is still created every time and simply stays empty!

Are you thinking about creating a pull request for this issue?
Absolutely, this is an easy one.

  • Create a caching facade in our utils
  • Replace any direct Json.createX() call
  • Document in developer guide
  • Optional: use https://github.com/policeman-tools/forbidden-apis to disallow any direct usage of Json.createX() in our code
  • Optional: optimize JAX-RS usage of JSON-P by leveraging MessageBodyReader and MessageBodyWriter.
