Revision support - Possibly move to rdf_entity 2.x #83

After a discussion today with @sandervd and @brummbar, we came up with the following template/proposal, so that remarks can be added and a complete solution can be worked out.

Purpose of the issue: **Support revisions**

# Current state and problematic pieces
## A bit of history
Rdf entity provides a layer for storing entities directly in the triplestore. The idea behind it is that every property of every field is mapped to a predicate URI, which is used as its storage identifier in the database. Properties without a mapped URI are not stored in the database and are simply skipped.
The other major factor of the module is that it uses graphs to store entities separated by bundle. That means each bundle has its own graph, rather than each entity type. This approach, while it seemed nice at first, is not ideal, as one cannot enforce that all objects (entities) with a specific semantic meaning will live in graphs split by their type.
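
To make the storage model concrete, here is a minimal Python sketch of how field properties could be turned into quads under the graph-per-bundle approach. The mapping, bundle names, graph URIs and the `entity_to_quads` helper are hypothetical; only the idea (one predicate per property, unmapped properties skipped, one graph per bundle) comes from the description above.

```python
from typing import Dict, List, Optional, Tuple

# A quad is (subject, predicate, object, graph).
Quad = Tuple[str, str, str, str]

# Hypothetical mapping of (field, property) to a predicate URI.
# Properties mapped to None, or missing from the mapping, are never stored.
FIELD_MAPPING: Dict[Tuple[str, str], Optional[str]] = {
    ("label", "value"): "http://purl.org/dc/terms/title",
    ("description", "value"): "http://purl.org/dc/terms/description",
    ("description", "format"): None,
}

# Hypothetical graph-per-bundle configuration.
BUNDLE_GRAPHS: Dict[str, str] = {
    "collection": "http://example.com/graph/collection",
    "solution": "http://example.com/graph/solution",
}


def entity_to_quads(entity_id: str, bundle: str,
                    values: Dict[Tuple[str, str], List[str]]) -> List[Quad]:
    """Convert field values into quads stored in the bundle's own graph."""
    graph = BUNDLE_GRAPHS[bundle]
    quads: List[Quad] = []
    for (field, prop), items in values.items():
        predicate = FIELD_MAPPING.get((field, prop))
        if predicate is None:
            continue  # no predicate URI mapped: the property is skipped
        for item in items:
            quads.append((entity_id, predicate, item, graph))
    return quads


quads = entity_to_quads(
    "http://example.com/rdf/1",
    "collection",
    {("label", "value"): ["Example"], ("description", "format"): ["full_html"]},
)
print(quads)
```

Only the label survives; the unmapped text format is dropped, and the quad lands in the collection bundle's graph rather than in a graph for the entity type.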
## Where the problems start
### The last major factor - Triples
The last major factor in our decisions is the triplestore (or quadstore, as we use it) itself. In a triplestore, everything is described using triples (subject, predicate, object). This has advantages and disadvantages, but what matters most for us is that it is less flexible than SQL, since a statement cannot have more than three "columns" (four in a quadstore, where the fourth is the graph).
For example, in SQL storage each field has its own table, where each row stores the entity_id, the revision_id, the langcode, the delta, the value and any other properties the field requires. In other words, a structure is created and one row is stored for each delta of each revision of each entity.
In a triplestore, however, you need to express this with triples without breaking the structure of the entity, while still allowing proper queries (so no serialized cheats) and without breaking the ontology (one predicate per property). In practice, you would end up with something like:
```
entity1 field1 en-uk
entity1 field1 0
entity1 field1 test_value
entity1 field1 en-us
entity1 field1 2
entity1 field1 test_value2
```
Just by looking at the above, the problem is already visible: if you query field1 of entity1, you cannot tell which value belongs to which delta or language. The order in which the data was stored cannot be used to tell them apart either, since an RDF graph is a set of triples and a triplestore neither stores nor returns them in the order they were given.
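
The ambiguity can be demonstrated with a small Python sketch that models the triplestore as a plain set of triples (an RDF graph is a set, so neither grouping nor insertion order survives). The `flatten` helper is hypothetical and stores each item the naive way shown above.

```python
from typing import FrozenSet, List, Tuple

Triple = Tuple[str, str, str]


def flatten(entity: str, field: str,
            items: List[Tuple[str, str, str]]) -> FrozenSet[Triple]:
    """Store each (langcode, delta, value) item as three bare triples."""
    triples = set()
    for langcode, delta, value in items:
        triples.add((entity, field, langcode))
        triples.add((entity, field, delta))
        triples.add((entity, field, value))
    return frozenset(triples)


# Two different field values (the deltas are swapped between languages)...
a = [("en-uk", "0", "test_value"), ("en-us", "2", "test_value2")]
b = [("en-uk", "2", "test_value"), ("en-us", "0", "test_value2")]

# ...produce exactly the same set of triples, so they cannot be told apart.
print(flatten("entity1", "field1", a) == flatten("entity1", "field1", b))  # True
```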

The delta-specific problem belongs in a separate issue, but it is kept here because the following sections build on it.
### The string ID
One of the things that makes the semantic web so appealing is the identification of its objects (or entities) by a unique URI. For us, this means that, unlike nodes, which use integer IDs, our entities are identified by URIs. This caused many issues in the past, as many modules did not yet support string IDs, but we got past that. Why does it matter here, though?
About a year and a half ago, we also had to support multiple versions of entities in the Joinup project. Naturally, one of the ideas was to support revisions the way core does. However, there were a few issues:
* Because we use a graph per bundle, all properties of every version of an entity would reside in the same graph.
* The semantic web is a way of describing entities through statements. In the triplestore (or quadstore) that we use, each property value is stored as a triple (subject, predicate, object). As described in the previous section, since all versions of the entity would share the same URI as their subject, there would be no way to distinguish which property value belongs to which version.
* An entity cannot have more than one ID, so an entity with ID `http://example.com/rdf/1` cannot simply get `http://example.com/rdf/1/version/2` for one of its versions; the implications are many. The derived ID could be the ID of another, unrelated entity (not a revision), and queries would be a nightmare if we had to concatenate IDs.
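
A tiny Python sketch of the collision problem, using a hypothetical `naive_revision_uri` helper that concatenates a version suffix onto the entity URI. Because URIs are opaque identifiers, nothing prevents the derived URI from already belonging to an unrelated entity.

```python
def naive_revision_uri(entity_uri: str, revision: int) -> str:
    """Hypothetical scheme: derive a revision ID by concatenation."""
    return f"{entity_uri}/version/{revision}"


# Entities already in the store; the second one is NOT a revision of the first.
existing_entities = {
    "http://example.com/rdf/1",
    "http://example.com/rdf/1/version/2",
}

derived = naive_revision_uri("http://example.com/rdf/1", 2)
print(derived in existing_entities)  # True: the derived ID collides
```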
### Rdf Draft
That is when the rdf_draft module came into play. Joinup needed at most two versions of an entity to exist at any given time: a published and an unpublished one. Since a history of changes was not a requirement, the solution came from the graphs themselves.
For each bundle, a second graph was created, separating the two versions and effectively giving the entity a publication status. For us, those graphs took the form `http://joinup.eu/<bundle>/[published|draft]`.
This already covers many of the needs that might come up, but it also comes with some limitations:
* Since the bundles of an entity type were already split into separate graphs, we have to take many parameters into account when performing CRUD operations on entities in specific states (and we do not always query entities in a single state).
* Workarounds also had to be implemented for other cases, such as supporting search_api natively.
* Limited states per entity. While it covers the case where two versions exist at the same time, it took a lot of work to support even that one extra state. Each subsequent state would take less effort, but it would still require manual development work and a good understanding of the module as a whole. Even enabling the draft version requires some manual work and understanding.
* The number of versions is finite. Since a new graph has to be added manually for each new state, only a fixed number of versions can exist, and that number does not scale automatically. Revisions are a no-go for this module.
* Split of the entity. Querying all versions requires manual intervention: every graph must be added to the query. This is the case even when a field, such as the moderation status field, already records the state.
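
The query overhead can be sketched in Python. The helpers below are hypothetical; they assume the `http://joinup.eu/<bundle>/<state>` naming described above and build a SPARQL query with `FROM NAMED` and `GRAPH ?g`, so that each result also reports which graph (and therefore which version) it came from. Every bundle and every state adds another graph the query has to list.

```python
from typing import Iterable, List


def draft_graphs(bundles: Iterable[str], states: Iterable[str]) -> List[str]:
    """One graph per (bundle, state) pair, as rdf_draft requires."""
    state_list = list(states)
    return [f"http://joinup.eu/{bundle}/{state}"
            for bundle in bundles for state in state_list]


def select_all_versions(entity_uri: str, graphs: List[str]) -> str:
    """Build a SPARQL query that has to name every graph to see every version."""
    from_named = "\n".join(f"FROM NAMED <{g}>" for g in graphs)
    return (
        "SELECT ?g ?p ?o\n"
        f"{from_named}\n"
        f"WHERE {{ GRAPH ?g {{ <{entity_uri}> ?p ?o }} }}"
    )


graphs = draft_graphs(["collection", "solution"], ["published", "draft"])
query = select_all_versions("http://example.com/rdf/1", graphs)
print(query)
```

With two bundles and two states the query already names four graphs; the count grows with every bundle and every state added.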

# Revisions
Supporting revisions involves a few ideas, parameters and a couple of compromises:
* Since this is such a major feature, it has to be implemented separately from the main module, so that the main module can keep working independently.
* We will try to mimic the node revision system as closely as we can, both to follow best practices and to make it easier for users to understand.
* We will still split entities into specific graphs, but in a simpler way.
## Drop of the rdf_draft module
The rdf_draft module is a nice implementation but should become a thing of the past. Apart from the many issues caused by the multiple graphs it requires, it would directly conflict with the implementation of revisions.
## Graph structure
Given the problems above, we will drop support for a graph per bundle and follow the Drupal 8 entity model, where each entity type has a base table storing its base fields. Unlike Drupal, however, where other fields live in tables of their own, we will keep everything within a single graph per entity type.
Leaving revisions aside for now, this means that every entity type has exactly one graph to look in for anything.
## Revisions
Revisions, like rdf_draft, will live in a separate module. This would require the new module to override the storage class (or entity class) in order to support new methods such as `::allRevisions()` (the counterpart of `NodeStorage::revisionIds()`).
However, since we are already splitting up the rdf_entity module, this module can simply provide an interface and a revision trait for any entity type that wants to use the revision system.
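
A minimal Python sketch of the interface-plus-trait split, with a mixin class standing in for a PHP trait. `RevisionableStorageInterface`, `RevisionStorageTrait`, `all_revisions` and the in-memory `revision_index` are hypothetical names; a real implementation would query the revision graph instead.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class RevisionableStorageInterface(ABC):
    """Hypothetical interface the revisions submodule could provide."""

    @abstractmethod
    def all_revisions(self, entity_id: str) -> List[str]:
        """Return the revision IDs of an entity, oldest first."""


class RevisionStorageTrait:
    """Hypothetical reusable implementation (a trait, in PHP terms).

    Assumes the using class sets `revision_index`: entity ID -> revision IDs.
    """

    revision_index: Dict[str, List[str]]

    def all_revisions(self, entity_id: str) -> List[str]:
        # Return a copy so callers cannot mutate the stored index.
        return list(self.revision_index.get(entity_id, []))


class ExampleStorage(RevisionStorageTrait, RevisionableStorageInterface):
    """Storage of an entity type that opts in to revisions."""

    def __init__(self) -> None:
        self.revision_index = {}


storage = ExampleStorage()
storage.revision_index["http://example.com/rdf/1"] = ["1", "4", "9"]
print(storage.all_revisions("http://example.com/rdf/1"))
```

Entity types that do not opt in simply do not use the trait, so the main module stays unaware of revisions.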
## How issues will be addressed
To address the issues above, we agreed on the following structural details during the discussion.
### Revision graph
Each revision graph will also belong to a specific entity type. It can be defined either in the entity type annotation or in the mapping entity that we currently support. The revision graph of each entity type is for internal use only and should not be exposed, even if the site exposes an endpoint.
The name of the graph is user-defined.
### Identification of the entities
The revision graph will be a pool holding the data of all revisions of the entities. As with nodes, the current revision will also exist in this graph.
Since revision graphs are for internal use only, the IDs of the revisions can be arbitrary and different from the original IDs. This lets us create IDs like `http://<random alphanumeric string>.com/<entity type>/revision/<revision serial id>`. The serial number can be global or per entity. If it is global, then, since a triplestore has no serial numbering, it has to be either stored in Drupal or determined on the fly when a new revision is stored. The latter is the better solution, solely because migrating data will not break the structure.
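
The on-the-fly option can be sketched as follows, assuming a global serial and the URI pattern above. `next_revision_id`, `revision_uri` and the host `http://x7k2p9.com` are hypothetical; in practice the maximum would more likely be computed by the triplestore (e.g. with a SPARQL `MAX` aggregate over the revision ID property), and concurrent saves would need protection so that two revisions do not receive the same number.

```python
import re
from typing import Iterable


def next_revision_id(existing_uris: Iterable[str], entity_type: str) -> int:
    """Derive the next global serial from the revision URIs already stored."""
    pattern = re.compile(rf"/{re.escape(entity_type)}/revision/(\d+)$")
    serials = []
    for uri in existing_uris:
        match = pattern.search(uri)
        if match:
            serials.append(int(match.group(1)))
    # Start at 1 when no revision exists yet.
    return max(serials, default=0) + 1


def revision_uri(base: str, entity_type: str, serial: int) -> str:
    """Build an internal revision URI: <base>/<entity type>/revision/<serial>."""
    return f"{base}/{entity_type}/revision/{serial}"


stored = [
    "http://x7k2p9.com/rdf_entity/revision/1",
    "http://x7k2p9.com/rdf_entity/revision/7",
]
serial = next_revision_id(stored, "rdf_entity")
print(revision_uri("http://x7k2p9.com", "rdf_entity", serial))
# http://x7k2p9.com/rdf_entity/revision/8
```

Because the serial is read back from the data itself, migrated revisions keep their numbering without any counter having to be copied over from Drupal.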
### Connection with original entity
The idea is that revision entities will use a property like `dcat:isVersionOf` to link to the original content. A possible complication is that the original entity might already have a property mapped to the `dcat:isVersionOf` predicate, so a different property should probably be used, something like `<base_url>/drupalIsVersionOf`.
Additionally, the revisions submodule can add, to every entity type that has a revision graph defined, an extra base field mapped to something like `<base_url>/drupalRevisionId`, which points back to the current revision ID.
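
To illustrate how the two links could work together, here is a Python helper that builds a SPARQL query resolving an entity's current revision: the entity's `drupalRevisionId` in the main graph is matched against the revision that points back to it through `drupalIsVersionOf`. The graph URIs, `base_url`, and the assumption that revisions store their own ID under the same `drupalRevisionId` predicate are all hypothetical.

```python
def current_revision_query(entity_uri: str, entity_graph: str,
                           revision_graph: str, base_url: str) -> str:
    """Build a SPARQL query returning the current revision of an entity."""
    is_version_of = f"<{base_url}/drupalIsVersionOf>"
    revision_id = f"<{base_url}/drupalRevisionId>"
    return (
        "SELECT ?revision\n"
        "WHERE {\n"
        # The original entity records which revision is current...
        f"  GRAPH <{entity_graph}> {{ <{entity_uri}> {revision_id} ?id }}\n"
        # ...and the matching revision links back to the original entity.
        f"  GRAPH <{revision_graph}> {{\n"
        f"    ?revision {is_version_of} <{entity_uri}> ;\n"
        f"              {revision_id} ?id .\n"
        "  }\n"
        "}"
    )


query = current_revision_query(
    "http://example.com/rdf/1",
    "http://example.com/graph/rdf_entity",
    "http://example.com/graph/rdf_entity_revision",
    "http://example.com",
)
print(query)
```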
### Additional properties
Apart from `drupalIsVersionOf`, every revision should include the following properties:
* Revision ID. This is the serial number that will also be used to construct the revision URI.
* Revision timestamp. This will be used to determine when the revision was created and the order of the revisions.
* Revision updated. As with node revisions, constraints can use this to determine whether the revision can be edited or whether another revision has been updated more recently.
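
A minimal sketch of the triples a single revision could carry, written as N-Triples-style strings in Python. The predicate names `drupalRevisionTimestamp` and `drupalRevisionChanged`, the reuse of `drupalRevisionId` for the revision ID, the `base_url`, and the XSD datatypes are assumptions for illustration; only `drupalIsVersionOf` and the list of properties come from this proposal.

```python
from datetime import datetime, timezone
from typing import List, Tuple

Triple = Tuple[str, str, str]
XSD = "http://www.w3.org/2001/XMLSchema#"


def typed_literal(value: str, datatype: str) -> str:
    """Format an N-Triples typed literal, e.g. "8"^^<...#integer>."""
    return f'"{value}"^^<{XSD}{datatype}>'


def revision_triples(revision_uri: str, entity_uri: str, revision_id: int,
                     created: datetime, changed: datetime,
                     base_url: str) -> List[Triple]:
    """Build the triples describing one revision in the revision graph."""
    subject = f"<{revision_uri}>"
    return [
        (subject, f"<{base_url}/drupalIsVersionOf>", f"<{entity_uri}>"),
        (subject, f"<{base_url}/drupalRevisionId>",
         typed_literal(str(revision_id), "integer")),
        (subject, f"<{base_url}/drupalRevisionTimestamp>",
         typed_literal(created.isoformat(), "dateTime")),
        (subject, f"<{base_url}/drupalRevisionChanged>",
         typed_literal(changed.isoformat(), "dateTime")),
    ]


now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
triples = revision_triples("http://x7k2p9.com/rdf_entity/revision/8",
                           "http://example.com/rdf/1", 8, now, now,
                           "http://example.com")
for s, p, o in triples:
    print(s, p, o, ".")
```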
### Drop support of states
RDF Draft enforces the notion of states within the rdf_entity module. However, a revision is not necessarily the entity in a different state, but rather a point in its history. States can still be supported, but that will be irrelevant to the rdf_entity structure and only relevant to the corresponding status field.
# Conclusion and Compromises
* While restricting objects of a certain meaning to specific graphs was already described as not ideal, we can still decide to do so at an organizational level. Keeping entities of the same ontology within one graph will not always be possible, but it keeps content more organized and makes it more flexible to query and use.
* These changes might require a new major version. If so, providing an upgrade path from version 1 to version 2 might be tough, and Joinup might be stuck on version 1 at least for a while.
* The above description is certainly far from what we currently have in Joinup, but it would also solve all the issues we had to work around in Joinup in the past:
  * All queries can be supported by default.
  * Entities in the published graph can simply be moved to the main entity type graph, which completes the update path.
  * Enabling revisions only requires copying the published version over to the revisions graph.
  * Upgrading from rdf_draft only requires creating a new revision in the revisions graph.
  * Federation support is easily achievable by adding a new `federation` value to the status field (the other values are published and unpublished; this is not the state_machine state field we use). Entities with the `federation` status are simply candidates to become the new revision of the entity.
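
Under the stated assumptions, the step of moving published graphs into the entity type graph could be expressed as SPARQL 1.1 Update operations, generated here by a hypothetical Python helper. `ADD` is used rather than `MOVE` or `COPY` because those two clear the destination graph first, which would lose data when several bundle graphs are merged into one entity type graph. All graph URIs are illustrative.

```python
from typing import Iterable, List


def merge_published_graphs(published_graphs: Iterable[str],
                           entity_type_graph: str) -> List[str]:
    """SPARQL 1.1 Update operations merging per-bundle graphs into one graph."""
    operations: List[str] = []
    for graph in published_graphs:
        # ADD keeps the destination's existing triples; MOVE/COPY would clear them.
        operations.append(f"ADD <{graph}> TO <{entity_type_graph}>")
        operations.append(f"DROP GRAPH <{graph}>")
    return operations


ops = merge_published_graphs(
    ["http://joinup.eu/collection/published", "http://joinup.eu/solution/published"],
    "http://joinup.eu/rdf_entity",
)
# Operations within one SPARQL Update request are separated by ";".
print(" ;\n".join(ops))
```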
