Performance of referencing library vs deprecated jsonschema.RefResolver is very bad when there are a lot of references in schema #178

Open
@nathan-stender

Hello!

I have a library that converts scientific data into JSON conforming to a schema called the Allotrope Simple Model (ASM).

The validation schemas are fairly large and complicated compared to other schemas I've seen in discussion boards, and are very modular, meaning there are a lot of references. In allotropy we store the ASM schemas directly, and remove all remote references, replacing them with local references under $defs.
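For readers unfamiliar with the setup, the localization step described above can be sketched roughly as follows. This is only an illustration, not the actual allotropy code: the function name `localize_refs`, the `remote_N` key naming, and the assumption that each remote `$ref` is a whole-document URL with no fragment are all hypothetical.

```python
import copy


def localize_refs(schema, remote_schemas):
    """Return a copy of `schema` with remote $refs replaced by local #/$defs refs.

    `remote_schemas` maps a remote URL to its already-loaded schema (a dict).
    Assumption: every remote $ref is exactly one of those URLs, with no fragment.
    Note: a property literally named "$ref" would be misread by this sketch.
    """
    result = copy.deepcopy(schema)
    defs = result.setdefault("$defs", {})
    names = {}  # remote URL -> key under $defs

    def local_name(url):
        if url not in names:
            names[url] = f"remote_{len(names)}"
        return names[url]

    def rewrite(node):
        # Walk the schema tree and repoint every known remote $ref.
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref in remote_schemas:
                node["$ref"] = f"#/$defs/{local_name(ref)}"
            for value in node.values():
                rewrite(value)
        elif isinstance(node, list):
            for item in node:
                rewrite(item)

    rewrite(result)

    # Copy each referenced remote schema under $defs. A remote schema may
    # reference further remotes, so repeat until nothing new is discovered.
    added = set()
    while len(added) < len(names):
        for url in [u for u in names if u not in added]:
            sub = copy.deepcopy(remote_schemas[url])
            rewrite(sub)
            defs[names[url]] = sub
            added.add(url)
    return result
```

The point is only that after this step every `$ref` resolves inside the same document, so no network or registry lookup should be needed during validation.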

We are finding that validating against the schemas using jsonschema version 4.18.0 takes ~20x longer than 4.17.0.

As a concrete example:

Validating this data: https://raw.githubusercontent.com/Benchling-Open-Source/allotropy/refs/heads/main/tests/parsers/moldev_softmax_pro/testdata/MD_SMP_luminescence_endpoint_example08.json

Against this schema: https://github.com/Benchling-Open-Source/allotropy/blob/main/src/allotropy/allotrope/schemas/adm/plate-reader/REC/2024/06/plate-reader.schema.json

takes ~3.5s on 4.17.0 and ~55s on 4.18.0

For all 26 tests in tests/parsers/moldev_softmax_pro, this translates to a total runtime going from ~30s on 4.17.0 to ~6m on 4.18.0.
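A minimal way to reproduce the timing, assuming the data and schema files linked above have been downloaded locally. The helper names `best_time` and `benchmark` and the file paths are placeholders of mine, not part of allotropy or jsonschema. Run the same script once with each jsonschema version installed and compare.

```python
import json
import time
from importlib.metadata import version


def best_time(fn, *args, repeats=3):
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn(*args)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def benchmark(data_path, schema_path, repeats=3):
    """Time validation of one instance against one schema with the installed jsonschema."""
    # Imported here so that best_time() stays usable without jsonschema installed.
    from jsonschema.validators import validator_for

    with open(data_path) as f:
        instance = json.load(f)
    with open(schema_path) as f:
        schema = json.load(f)

    # Pick the validator class matching the schema's $schema keyword.
    validator = validator_for(schema)(schema)
    seconds = best_time(validator.validate, instance, repeats=repeats)
    print(f"jsonschema {version('jsonschema')}: {seconds:.2f}s (best of {repeats})")
    return seconds
```

Example call (paths are placeholders): `benchmark("MD_SMP_luminescence_endpoint_example08.json", "plate-reader.schema.json")`.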
