Description
There are currently several ways to serialize SmSn graphs to the file system, but the most important for everyday use is the so-called VCS serialization. Each atom corresponds to one file, stored in a directory that corresponds to the atom's logical data source. Because every representation in SmSn is a set of atoms, this produces a huge number of files. A bigger problem, however, is that the entire graph must be synced with the file system at once. This takes significant time (several minutes) and requires the user to stop what they are doing and consciously attend to the synchronization process. It is a major barrier to adoption, especially compared with solutions like Org-mode, which sync to the file system directly. It is also something of a liability to use Neo4j as the source of truth for SmSn data between sync operations. I have personally experienced major data loss and corruption when Neo4j silently failed for some reason and too much time had elapsed between syncs.
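To make the cost of the bulk sync concrete, here is a minimal sketch of a one-file-per-atom layout. The `Atom` fields, the `.smsn` extension, the `@title` line, and the helper names are all assumptions made for illustration; they are not SmSn's actual file format or API.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Atom:
    # Hypothetical fields, chosen for illustration only.
    id: str
    source: str  # logical data source, e.g. "private" or "public"
    title: str


def atom_path(root: Path, atom: Atom) -> Path:
    """Map an atom to <root>/<source>/<id>.smsn (assumed layout and extension)."""
    return root / atom.source / f"{atom.id}.smsn"


def export_all(root: Path, atoms: list[Atom]) -> int:
    """Bulk sync: rewrite the file of every atom and return how many were written."""
    for atom in atoms:
        path = atom_path(root, atom)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(f"@title {atom.title}\n", encoding="utf-8")
    return len(atoms)
```

The point of the sketch is that `export_all` touches every atom regardless of how few have changed, which is why the cost grows with the size of the whole graph rather than with the size of the edit.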
Fixing this problem should actually result in a much simpler solution than the current one. Going forward, there will be a configurable source of truth (SoT), which could be Neo4j or another TinkerPop-enabled graph database, but could also be another data store such as the file system. The file system will be the default, and the graph database option might be added back later (SmSn is not up to date with recent versions of Neo4j). No bulk sync operations will be necessary when the file system is the SoT, and the user will be free to place the data directory under version control using a solution of their choice. Much as we do now, we will provide a starter kit that uses Git as the version control solution.
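One way to picture the configurable source of truth is as a small store interface with the file system as the default backend. This is only a sketch under stated assumptions: `AtomStore`, `FileSystemStore`, `open_store`, and the `.smsn` extension are hypothetical names, not part of SmSn.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional


class AtomStore(ABC):
    """Hypothetical interface for a pluggable source of truth."""

    @abstractmethod
    def put(self, source: str, atom_id: str, text: str) -> None: ...

    @abstractmethod
    def get(self, source: str, atom_id: str) -> Optional[str]: ...


class FileSystemStore(AtomStore):
    """Writes each atom straight to disk, so no bulk sync step is needed."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _path(self, source: str, atom_id: str) -> Path:
        # Assumed layout: <root>/<source>/<id>.smsn
        return self.root / source / f"{atom_id}.smsn"

    def put(self, source: str, atom_id: str, text: str) -> None:
        path = self._path(source, atom_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary file, then rename over the target, so a
        # crash mid-write does not leave a half-written atom file.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(text, encoding="utf-8")
        tmp.replace(path)

    def get(self, source: str, atom_id: str) -> Optional[str]:
        path = self._path(source, atom_id)
        return path.read_text(encoding="utf-8") if path.exists() else None


def open_store(kind: str, root: Path) -> AtomStore:
    """Pick a backend by name; only the file system is sketched here."""
    if kind == "filesystem":
        return FileSystemStore(root)
    raise ValueError(f"unsupported store kind: {kind}")
```

Because every `put` lands on disk immediately, the data directory is always current, and the user can commit it with Git or any other version control tool whenever they like.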
cc @jmatsushita