Bulk Loading Highly Connected Data #3434
amaster507
started this conversation in
Ideas
I have a few million "rows" of data coming from Dgraph (originally from MySQL) that I will need to bulk import into EdgeDB once I figure out (or wait for) GBAC ACL in the current RFC.
Dgraph has both bulk and live loading. The difference is that bulk loading is for a brand new cluster, while live loading targets a cluster that is already running. The data format was simple RDF. Dgraph has the concept of blank nodes, which let me reference the same object later in the RDF file without needing to know what identifier the cluster would assign it.
The movie database sample might look like this in RDF:
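Something along these lines, where the predicate names are only illustrative:

```
_:starwars <title> "Star Wars: Episode IV" .
_:lucas <name> "George Lucas" .
_:starwars <director> _:lucas .
_:luke <name> "Luke Skywalker" .
_:starwars <starring> _:luke .
```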
The `_:foo` syntax is their "blank node", where `foo` is any `[a-zA-Z0-9_-]` string. These blank nodes could also be stored in the database, allowing later live loads to reference the same object again without needing to know the actual identifier assigned during the previous load. Or they could be discarded, in which case the next live load would insert a whole new set of interconnected data, not connected to the first set.
This might be a little challenging in EdgeDB because it is strongly typed throughout, whereas Dgraph did both: strong types in its GraphQL API, and loose or even no predefined types in its underlying DQL syntax. It was able to do this because it used a key-value store underneath instead of an RDBMS with fixed tables and columns.
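For comparison, a rough EdgeQL sketch of inserting one of those connected objects (assuming hypothetical `Movie` and `Person` types already exist in the schema):

```
insert Movie {
  title := "Star Wars: Episode IV",
  director := (insert Person { name := "George Lucas" })
};
```

The nested insert links the two objects in one statement, but there's no built-in equivalent of a persisted blank-node map for referencing the same `Person` again from a later load without looking it up.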