Bulk Loading Highly Connected Data #3434
amaster507
started this conversation in
Ideas
I have a few million "rows" of data coming from Dgraph (originally from MySQL) that I will need to bulk import into EdgeDB once I figure out (or wait for) GBAC ACL in the current RFC.
Dgraph has both bulk and live loading. The difference is that bulk loading is for a brand new cluster, while live loading targets a cluster that is already running. The data format was simple RDF. Dgraph has the concept of blank nodes, which let me reference the same object later in the RDF file without needing to know what identifier the cluster would assign it.
The movie database sample might look like this in RDF:
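Something along these lines, where the predicate names are only illustrative:

```
_:starwars <title> "Star Wars: Episode IV" .
_:lucas <name> "George Lucas" .
_:starwars <director> _:lucas .
_:luke <name> "Luke Skywalker" .
_:starwars <starring> _:luke .
```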
The `_:foo` syntax is their "blank node", where `foo` is any `[a-zA-Z0-9_-]` string. These blank nodes could also be stored in the database, allowing later live loads to reference the same object again without needing to know the actual identifier assigned during the previous load. Or they could be discarded, in which case the next live load would insert a whole new set of interconnected data, not connected to the first set.
This might be a little challenging in EdgeDB because it is strongly typed throughout, whereas Dgraph did both: strong types in its GraphQL API, and loose or even no predefined types in its underlying DQL syntax. It was able to do this because it used a key-value store underneath instead of an RDBMS with fixed tables and columns.
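For comparison, a rough EdgeQL sketch of inserting one of those connected objects (assuming hypothetical `Movie` and `Person` types already exist in the schema):

```
insert Movie {
  title := "Star Wars: Episode IV",
  director := (insert Person { name := "George Lucas" })
};
```

The nested insert links the two objects in one statement, but there's no built-in equivalent of a persisted blank-node map for referencing the same `Person` again from a later load without looking it up.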