This repository was archived by the owner on Aug 25, 2022. It is now read-only.

Revision support - Possibly move to rdf_entity 2.x #83

Open
@idimopoulos

After a discussion today with @sandervd and @brummbar we came up with the following template/suggestion so that remarks can be added and a complete solution can be created.

Purpose of the issue: Support revisions

Current state and problematic pieces

A bit of history

Rdf entity provides a layer to support storing entities directly in the triplestore. The idea behind it is that every property of every field has a predicate URI mapped to it and this is used as a storage identifier for the database. Properties without a mapped URI do not get stored in the database and are simply skipped.
The other major factor of the module is that it uses graphs in order to store the entities separated by bundle. That means that each bundle has its own graph, rather than each entity type. This approach, while it seemed nice at first, is not ideal, as one cannot enforce that all objects (entities) that have a "specific semantic meaning" will live under graphs split by their type.

Where the problems start

The last major factor - Triplets

The last major factor in our decisions is the triplestore (or quadstore, as we use it) itself. A triplestore means that everything is described using triplets. This has advantages and disadvantages, but what really matters for us is that it is less flexible than SQL: you cannot have more than 3 "columns" to describe something.
For example, in SQL, for each field, you have a table where each entry stores the entity_id, the revision_id, the delta, the value and other properties required by each field. That means that a structure is created and you store one entry for each delta of each revision of each entity.
In SPARQL, however, you need to find a way to do this in triplets without breaking the structure of the entity, while still allowing proper queries (so no serialized cheats) and without breaking the ontology (keeping one predicate per property). That means that you would have to do something like:

entity1 field1 en-uk
entity1 field1 0
entity1 field1 test_value
entity1 field1 en-us
entity1 field1 2
entity1 field1 test_value2

Just by looking at the above, the problem is already visible: if you query for the field1 of entity1, you cannot tell which property belongs to which delta. The order in which the data was stored does not help either, as a triplestore neither stores nor returns results in the order you supplied them.
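The ambiguity can be reproduced with a small sketch. It models the store as an unordered Python set of tuples, which is only an assumption for illustration and not how rdf_entity talks to the triplestore; the helper name objects_of is hypothetical.

```python
# Hypothetical in-memory stand-in for a triplestore: an unordered set of
# (subject, predicate, object) tuples, using the example values above.
triples = {
    ("entity1", "field1", "en-uk"),
    ("entity1", "field1", "0"),
    ("entity1", "field1", "test_value"),
    ("entity1", "field1", "en-us"),
    ("entity1", "field1", "2"),
    ("entity1", "field1", "test_value2"),
}


def objects_of(store, subject, predicate):
    """Return every object stored for a subject/predicate pair."""
    return {o for (s, p, o) in store if s == subject and p == predicate}


values = objects_of(triples, "entity1", "field1")
# Six loose values come back; nothing ties "en-uk" to "0" or "test_value".
print(sorted(values))
```

Because a set has no insertion order, the sketch also mirrors the point that the sequence in which values were written cannot be used to group them.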

While the delta-specific problem is for another issue, it is kept here because the following sections build on it.

The string ID

One of the things that makes the semantic web so appealing is the identification of its objects (or entities) by a unique URI. For us, that means that, unlike nodes, our entities are identified by URIs. This brought up many issues in the past, as many modules did not yet support string IDs, but we got over that. Why is that important here, though?
A year and a half ago, we also had to somehow support multiple versions in the Joinup project. Naturally, supporting revisions just like core does was one of the ideas. However, there were a few issues here:

  • Because we are using a graph per bundle, all properties would reside under the same graph.
  • The semantic web is a way of describing entities. For the triplestore (or quadstore) that we use, that means that each property is described by a triplet. As described in the previous section, since all versions of the entity would share the same URI, there would be no way of distinguishing which property belongs to which version.
  • Different IDs cannot exist for the same entity, so an entity with ID http://example.com/rdf/1 cannot automatically have http://example.com/rdf/1/version/2, as this has multiple implications. That ID could already belong to another entity (not a revision), and the queries would be a nightmare if we had to concatenate IDs.

Rdf Draft

That is when the rdf_draft module came into play. The need in Joinup was that only up to two revisions can exist at any given time, a published and an unpublished one. Since the need for a history of changes was not a requirement, the solution came with the graphs themselves.
For each bundle, a second graph was created, separating the two entities and giving the option for a publication status on the entity. For us, those graphs took the form of http://joinup.eu/<bundle>/[published|draft].
This already solves many of the needs that might come up, but it also came with some limitations:

  • Because we had already split the bundles of an entity type into graphs, we have to take care of many parameters when we try to perform CRUD operations on entities residing in specific states (i.e. we don't only query on entities in specific states).
  • Workarounds also had to be implemented for other cases, like supporting search_api natively.
  • Limitations of states of an entity. While it covered the case where 2 versions exist at the same time, it took a lot of work to support just one more. Any subsequent version needs less effort, but it still requires manual development work and quite a good understanding of the overall module. Even enabling the draft version requires a bit of manual work and understanding.
  • The number of versions is finite. Because we are manually adding a new graph for each new state that we want our entity to have, we can only create a specific number of them, and this number does not scale automatically. Revisions are a no-go for this module.
  • Split of the entity. Manual intervention is needed in order to query all versions: all graphs must be added to the query. And that is regardless of whether a field such as the moderation status field is used.
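To make the last limitation concrete, here is a minimal sketch of what "adding all graphs to the query" amounts to. It only builds a SPARQL string from the graph URI pattern quoted above; the function names and the bundle names in the usage example are hypothetical, and this is not the query builder the module actually uses.

```python
def graph_uri(bundle, state):
    """Build a graph URI following the http://joinup.eu/<bundle>/<state> pattern."""
    return f"http://joinup.eu/{bundle}/{state}"


def select_all_versions(entity_uri, bundles, states=("published", "draft")):
    """Build a SPARQL query that looks for one entity in every bundle/state graph."""
    from_clauses = "\n".join(
        f"FROM NAMED <{graph_uri(bundle, state)}>"
        for bundle in bundles
        for state in states
    )
    return (
        "SELECT ?graph ?predicate ?object\n"
        f"{from_clauses}\n"
        "WHERE {\n"
        f"  GRAPH ?graph {{ <{entity_uri}> ?predicate ?object }}\n"
        "}"
    )


# Hypothetical bundle names, used only for illustration.
query = select_all_versions("http://example.com/rdf/1", ["collection", "solution"])
print(query)
```

Every new bundle or state adds another FROM NAMED line, which is the manual bookkeeping the list above complains about.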

Revisions

The idea behind supporting revisions involves a few ideas, parameters and a couple of compromises:

  • Since we are going to support such a major feature, it has to be done in a way that is split from the main module, so that the main module can work independently.
  • We are going to try and mimic the node revisions system as much as we can so that we can follow best practices as well as make it a bit more understandable to users.
  • We are still going to split entities under specific graphs; however, we are going to make it simpler.

Drop of the rdf_draft module

The rdf_draft module is a nice implementation but should be a legacy of the past. Apart from the fact that a lot of issues have come up due to the multiple graphs we need to support, it will directly conflict with the implementation of revisions.

Graph structure

Following the problems above, we are going to drop the support of a graph per bundle and follow the notion of the Drupal entities in version 8. Each entity type has a table which stores the base fields of the entity. Unlike Drupal, however, we are going to have everything within a specific graph.
Without addressing the revisions yet, that means that every entity type will only have one graph to look into for anything.

Revisions

Revisions, like rdf_draft, will reside in a separate module. That would require the new module to override the storage class (or entity class) in order to support new methods like the ::allRevisions() (corresponding to the ::allRevisions() from the NodeStorage class).
However, since we are trying to split the rdf_entity module already, we can use this module to simply include an interface and a revision trait for each entity type that is defined and wants to use the revision system.

How issues will be addressed

For all the above issues, during the discussion we came up with the following structure details.

Revision graph

The revision graph will also belong to a specific entity type. It can be defined in the annotation of the entity type or in the mapping entity that we currently support. Each entity type's revision graph will be solely for internal use and should not be exposed if there is an exposed endpoint.
The name of the graph is user defined.

Identification of the entities

The revision graph will be a pool of data from all revisions of the entities. As nodes do, even the current revision will exist in the graph.
Since we define that revision graphs are solely for internal usage, the IDs of the entities can be arbitrary and different from the original IDs. This gives us the ability to create IDs like http://<random alphanumeric string>.com/<entity type>/revision/<revision serial id>. The serial number can be global or per entity. If global, since the triplestore does not have serial numbering, it has to be stored in Drupal or be determined on the fly when a new revision is stored. The latter is a better solution solely because migrating data will not break the structure.
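As a rough sketch of the "determined on the fly" option, the next serial could be derived from the revision IDs that already exist. The base host, the function names and the in-memory list of URIs are all assumptions for illustration; in practice the existing IDs would come from a query against the revision graph.

```python
import re

# Hypothetical base; the issue only says the host part can be arbitrary.
REVISION_BASE = "http://revisions.example.com"


def revision_uri(entity_type, serial):
    """Build a revision ID of the form <base>/<entity type>/revision/<serial>."""
    return f"{REVISION_BASE}/{entity_type}/revision/{serial}"


def next_serial(existing_uris, entity_type):
    """Derive the next serial from the revision URIs already stored."""
    pattern = re.compile(
        re.escape(f"{REVISION_BASE}/{entity_type}/revision/") + r"(\d+)$"
    )
    serials = [int(m.group(1)) for uri in existing_uris if (m := pattern.match(uri))]
    # An empty store starts numbering at 1.
    return max(serials, default=0) + 1


existing = [
    revision_uri("rdf_entity", 1),
    revision_uri("rdf_entity", 7),
    revision_uri("other_type", 12),
]
print(revision_uri("rdf_entity", next_serial(existing, "rdf_entity")))
```

Since the serial is recomputed from the stored data, migrated revisions keep their numbering without any counter having to be carried over in Drupal.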

Connection with original entity

The idea is that the revision entities will use a property like the dcat:isVersionOf to link to the original content. A possible complication is that the original entity might already have a property mapped to the dcat:isVersionOf predicate, so another property will probably be used, something like <base_url>/drupalIsVersionOf.
Additionally, for all entity types that have a revision graph defined, the revisions submodule can define an additional base field mapped to something like <base_url>/drupalRevisionId, which maps back to the current revision ID.

Additional properties

Every revision should include the following properties apart from the drupalIsVersionOf:

  • Revision ID. The idea is that this is the serial number that will also be used to construct the revision url.
  • Revision timestamp. This will be used to determine when the revision was created and the order of the revisions.
  • Revision updated. As with node revisions, this can be used by constraints to determine if the revision can be edited or another revision has been updated more recently.
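The properties listed above could translate into stored data roughly as follows. This is a sketch under assumptions: quads are modelled as plain tuples, http://example.com stands in for <base_url>, reusing drupalRevisionId on the revision itself is a guess, and the timestamp and updated predicate names are invented because the issue does not name them.

```python
from datetime import datetime, timezone

BASE_URL = "http://example.com"  # stands in for <base_url>

IS_VERSION_OF = f"{BASE_URL}/drupalIsVersionOf"
REVISION_ID = f"{BASE_URL}/drupalRevisionId"
# The two predicates below are hypothetical; the issue does not name them.
REVISION_TIMESTAMP = f"{BASE_URL}/drupalRevisionTimestamp"
REVISION_UPDATED = f"{BASE_URL}/drupalRevisionUpdated"


def revision_quads(graph, revision_uri, entity_uri, serial, fields, created):
    """Return (graph, subject, predicate, object) tuples for one revision."""
    stamp = created.isoformat()
    # Field values are copied under the revision's own URI, not the entity's.
    quads = [(graph, revision_uri, predicate, value) for predicate, value in fields]
    quads += [
        (graph, revision_uri, IS_VERSION_OF, entity_uri),
        (graph, revision_uri, REVISION_ID, str(serial)),
        (graph, revision_uri, REVISION_TIMESTAMP, stamp),
        (graph, revision_uri, REVISION_UPDATED, stamp),
    ]
    return quads


quads = revision_quads(
    graph="http://example.com/graph/rdf_entity/revisions",  # hypothetical graph name
    revision_uri="http://revisions.example.com/rdf_entity/revision/8",
    entity_uri="http://example.com/rdf/1",
    serial=8,
    fields=[("http://purl.org/dc/terms/title", "test_value")],
    created=datetime(2018, 1, 1, tzinfo=timezone.utc),
)
```

Because every revision has its own subject URI, the "which property belongs to which version" problem from the earlier sections does not arise inside the revision graph.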

Drop support of states

RDF Draft enforces the idea of states within the rdf_entity module. However, a revision is not necessarily the entity in a different state, but rather a part of its history. Further support for states can be achieved, but this will be irrelevant to the rdf entity structure and only relevant to the corresponding status field.

Conclusion and Compromises

  • While the idea of limiting objects of a certain meaning to a specific graph is already marked as not ideal, we can decide to do so at an organizational level. Keeping entities of the same ontology within a certain graph may not always be possible, but it keeps content a bit more organized and makes it more flexible to query and use.
  • With these changes, we might need a new version. If so, it might be tough to have an upgrade path from version 1 to version 2 and Joinup might be stuck with version 1 at least for a while.
  • The above description is surely far from what we currently have in Joinup, but it would also solve all the issues that we had to work around in the past in Joinup:
    • All queries can be supported by default.
    • Entities existing in the published graph can simply be moved over to the main entity type graph and the update path is complete.
    • Enabling revisions only requires copying the version in the published graph over to the revisions graph.
    • Upgrading from rdf_draft only requires creating a new revision in the revisions graph.
    • Support for federation is easily achievable by adding a new state, 'federation', to the status field (other statuses are published or unpublished; this is not about the state_machine state field we are using). Entities with the federation status are simply candidates for becoming the new revision of the entity.
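The "move over to the main entity type graph" step could be sketched with the SPARQL 1.1 Update ADD operation, which copies the contents of one graph into another without clearing the destination. The entity type graph URI and the bundle names below are hypothetical, and the sketch only builds the update strings; it does not send them to an endpoint.

```python
def merge_published_graphs(bundles, entity_type_graph):
    """Build one SPARQL 1.1 Update ADD operation per published bundle graph."""
    return [
        f"ADD GRAPH <http://joinup.eu/{bundle}/published> TO GRAPH <{entity_type_graph}>"
        for bundle in bundles
    ]


updates = merge_published_graphs(
    ["collection", "solution"],  # hypothetical bundle names
    "http://joinup.eu/rdf_entity",  # hypothetical entity type graph
)
# Operations in a single update request are separated by ";".
print(" ;\n".join(updates))
```

ADD is used here rather than MOVE because MOVE replaces whatever is already in the destination graph, which would discard the bundles merged earlier; the old per-bundle graphs would have to be dropped separately once the merge is verified.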
