Suggestion for a speed up using ODatabaseImport (and OETLJsonExtractor) #10546

@sblommers

Hi there!

First of all thank you for OrientDB :-)

We are having trouble importing a large amount of data exported from an older 2.2.36 database into 3.2.44, so we started profiling. We found that the import reads the full export file through a BufferedReader. This is an obvious choice and it is thread-safe, but it also does a lot of locking and unlocking, which slows down the import process.

We created a partial export (800 MB gzipped, 5M records) and imported it on 3.2.44, first with JDK 8 and later with JDK 21. That gave a jump in speed, but not enough to make us happy. Then, in OJSONReader, we replaced
this.in = new BufferedReader(iIn);
with
this.in = new UnsynchronizedBufferedReader(iIn);

from Apache Commons IO:
https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/UnsynchronizedBufferedReader.html

With the slice of 5M records, a single run (including some indexes) took about 17.5% less time, going from
Database import completed in 1329061 ms
to
Database import completed in 1096512 ms

These timings include the indexes, which we somehow cannot skip even when using -rebuildIndexes=false.

The suggestion: since the import does not need a thread-safe BufferedReader (see the OJSONReader constructor), switching to an unsynchronized reader shaves off that waiting time. This will most likely also have a positive impact on OETLJsonExtractor.
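
For anyone who wants to reproduce the effect without building OrientDB, here is a minimal, stdlib-only sketch of the idea. java.io.BufferedReader takes a lock on every read() call, and a JSON tokenizer calls read() once per character. The PlainBufferedReader class below is a hypothetical stand-in written for this illustration. It is not the Commons IO UnsynchronizedBufferedReader and not OrientDB code, and the sample record is made up. The timing loop in main is only a rough indication, and a proper harness such as JMH would be needed for reliable numbers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReaderLockSketch {

  /** Hypothetical minimal buffered reader that does no locking at all. */
  static final class PlainBufferedReader extends Reader {
    private final Reader in;
    private final char[] buf = new char[8192];
    private int pos = 0; // next char to hand out
    private int end = 0; // number of valid chars in buf

    PlainBufferedReader(Reader in) {
      this.in = in;
    }

    /** Refills the buffer; returns false at end of stream. */
    private boolean fill() throws IOException {
      int n;
      do {
        n = in.read(buf, 0, buf.length);
      } while (n == 0);
      if (n < 0) {
        return false;
      }
      pos = 0;
      end = n;
      return true;
    }

    @Override
    public int read() throws IOException {
      if (pos >= end && !fill()) {
        return -1;
      }
      return buf[pos++];
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
      if (len == 0) {
        return 0;
      }
      if (pos >= end && !fill()) {
        return -1;
      }
      int n = Math.min(len, end - pos);
      System.arraycopy(buf, pos, cbuf, off, n);
      pos += n;
      return n;
    }

    @Override
    public void close() throws IOException {
      in.close();
    }
  }

  /** Reads one char at a time, like a JSON tokenizer, and counts '{'. */
  static long countOpenBraces(Reader reader) throws IOException {
    long count = 0;
    int c;
    while ((c = reader.read()) != -1) {
      if (c == '{') {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    // Made-up sample record, repeated to get a few megabytes of input.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 200_000; i++) {
      sb.append("{\"@rid\":\"#1:0\",\"name\":\"x\"}\n");
    }
    String doc = sb.toString();

    long t0 = System.nanoTime();
    long a = countOpenBraces(new BufferedReader(new StringReader(doc)));
    long t1 = System.nanoTime();
    long b = countOpenBraces(new PlainBufferedReader(new StringReader(doc)));
    long t2 = System.nanoTime();

    System.out.println("BufferedReader:      " + a + " objects, " + (t1 - t0) / 1_000_000 + " ms");
    System.out.println("PlainBufferedReader: " + b + " objects, " + (t2 - t1) / 1_000_000 + " ms");
  }
}
```

Both readers return the same data. Only the per-call locking differs, so any gap in the timings comes from that. The size of the gap depends on the JDK version and JIT warm-up, so treat a single run as indicative only.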
