From 6ac755292d9a53b65b531225c6efb7f3e44f3366 Mon Sep 17 00:00:00 2001 From: Christoph Weber Date: Fri, 12 May 2023 23:03:12 +0200 Subject: [PATCH 1/9] temp --- docs/counters/index.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/counters/index.rst b/docs/counters/index.rst index e2093699..2006a2ea 100644 --- a/docs/counters/index.rst +++ b/docs/counters/index.rst @@ -22,3 +22,5 @@ Morango instances use **record-max counters** to keep track of the maximum versi The **database-max counter** table tracks a mapping of scope filter strings to lists of (instance ID, counter) pairs. These (instance ID, counter) pairs reflect different Morango instances that have been previously synced at some counter value. Morango sends **filter-max counters** to determine what data is already shared before syncing to efficiently determine the difference in data. Filter-max counters are the highest counters associated with every instance ID for both a filter and its supersets. + +Examples: From 592015498ccad8bce65b494ba0a3744b518e67a1 Mon Sep 17 00:00:00 2001 From: Christoph Weber Date: Fri, 12 May 2023 23:27:01 +0200 Subject: [PATCH 2/9] adding documentation sections --- docs/devsetup/index.rst | 30 ++++++++++++++++++++++++++++++ docs/gettingstarted/index.rst | 8 ++++++++ docs/index.rst | 2 ++ docs/syncing/index.rst | 5 +++++ 4 files changed, 45 insertions(+) create mode 100644 docs/devsetup/index.rst create mode 100644 docs/gettingstarted/index.rst diff --git a/docs/devsetup/index.rst b/docs/devsetup/index.rst new file mode 100644 index 00000000..ff81554d --- /dev/null +++ b/docs/devsetup/index.rst @@ -0,0 +1,30 @@ +Dev Setup +======== + +Installation +Dependencies + +Tests +SQLite +Postgres +Integrationtest->Kolibri + +Soft-deletion +------------- + +Typically, deletion merely hides records, rather than actually erasing data. + +When a record for a subclass of ``SyncableModel`` is deleted, its ID is added to the ``DeletedModels`` table. 
When a subsequent serialization occurs, this information is used to turn on the ``deleted`` flag in the store for that record. When syncing with other Morango instances, the soft deletion will propagate to the store record of other instances. + +This is considered a "soft-delete" in the store because the data is not actually cleared. + + +Hard-deletion +------------- + +There are times, such as GDPR removal requests, when it's necessary to actually to erase data. + +This is handled using a ``HardDeletedModels`` table. Subclasses of ``SyncableModel`` should override the ``delete`` method to take a ``hard_delete`` boolean, and add the record to the ``HardDeletedModels`` table when this is passed. + +On serialization, Morango clears the ``serialized`` field entry in the store for records in ``HardDeletedModels`` and turns on the ``hard_deleted`` flag. Upon syncing with other Morango instances, the hard deletion will propagate to the store record of other instances. + diff --git a/docs/gettingstarted/index.rst b/docs/gettingstarted/index.rst new file mode 100644 index 00000000..74d30f38 --- /dev/null +++ b/docs/gettingstarted/index.rst @@ -0,0 +1,8 @@ +Getting Started +======== + +Syncable Models + +Actions + +Hooks diff --git a/docs/index.rst b/docs/index.rst index 322c0b4c..0710bb8a 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -10,6 +10,8 @@ Morango is a Django database replication engine written in pure Python. It is de overview/index architecture/index + devsetup/index + gettingstarted/index syncing/index counters/index merging/index diff --git a/docs/syncing/index.rst b/docs/syncing/index.rst index cab70fed..a8bc8fe5 100644 --- a/docs/syncing/index.rst +++ b/docs/syncing/index.rst @@ -25,6 +25,11 @@ In the illustration below, the application layer (on the right) is where app dat .. image:: ./sync_process.png +Store vs. 
Buffer + +Dirty-Bit + +Example: Orchestration ------------- From 4b76927be0c2a5285dd121567514c5bc2302b4c4 Mon Sep 17 00:00:00 2001 From: Jakob Beckmann Date: Fri, 12 May 2023 23:57:48 +0200 Subject: [PATCH 3/9] docs: add dev setup documentation --- docs/devsetup/index.rst | 35 +++++++++++++++++------------------ 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/docs/devsetup/index.rst b/docs/devsetup/index.rst index ff81554d..c74c417b 100644 --- a/docs/devsetup/index.rst +++ b/docs/devsetup/index.rst @@ -1,30 +1,29 @@ Dev Setup ======== -Installation -Dependencies +Before getting started, ensure you have the following dependencies installed: -Tests -SQLite -Postgres -Integrationtest->Kolibri +- ``python`` (2.7 or 3.6 - 3.11) +- ``swig`` +- ``openssl`` +- ``docker-compose`` (for testing against postgres backends) -Soft-deletion -------------- +Optionally create a virtual environment with your Python setup for this project, then run the following commands:: -Typically, deletion merely hides records, rather than actually erasing data. + pip install -r requirements/dev.txt + pip install -r requirements/test.txt + # for testing with postgres: this might require a local install of a postgres package + pip install -r requirements/postgres.txt -When a record for a subclass of ``SyncableModel`` is deleted, its ID is added to the ``DeletedModels`` table. When a subsequent serialization occurs, this information is used to turn on the ``deleted`` flag in the store for that record. When syncing with other Morango instances, the soft deletion will propagate to the store record of other instances. -This is considered a "soft-delete" in the store because the data is not actually cleared. +Testing +------- +Tests can be launched as follows:: -Hard-deletion -------------- + make test + # launch against a postgres backend + make test-with-postgres -There are times, such as GDPR removal requests, when it's necessary to actually to erase data. 
- -This is handled using a ``HardDeletedModels`` table. Subclasses of ``SyncableModel`` should override the ``delete`` method to take a ``hard_delete`` boolean, and add the record to the ``HardDeletedModels`` table when this is passed. - -On serialization, Morango clears the ``serialized`` field entry in the store for records in ``HardDeletedModels`` and turns on the ``hard_deleted`` flag. Upon syncing with other Morango instances, the hard deletion will propagate to the store record of other instances. +The integration tests can be found in the `Kolibri repository `_. From 6c5bfca16bd7e601f5e95b270dbd580517daf99f Mon Sep 17 00:00:00 2001 From: Christoph Weber Date: Sat, 13 May 2023 00:03:29 +0200 Subject: [PATCH 4/9] adding documentation --- docs/counters/index.rst | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/docs/counters/index.rst b/docs/counters/index.rst index 2006a2ea..40a94be9 100644 --- a/docs/counters/index.rst +++ b/docs/counters/index.rst @@ -23,4 +23,11 @@ The **database-max counter** table tracks a mapping of scope filter strings to l Morango sends **filter-max counters** to determine what data is already shared before syncing to efficiently determine the difference in data. Filter-max counters are the highest counters associated with every instance ID for both a filter and its supersets. -Examples: +**Example** (in pseudocode) + +#. Instance A creates a model, e.g. exam_x. It registers it in its store: ``{ "model" : "exam_x", "counter" : 1 }`` +#. It then syncs this exam to instance B and registers it in its store: ``{ "model" : "exam_x", "counter" : 1, "max_counters": { "B" : 1 }}`` +#. After some time, instance A updates the model because the exam changed. It registers this in the store: ``{ "model" : "exam_x", "counter" : 2, "max_counters": { "B" : 1 }}`` +#. The next time instance A syncs with instance B, it registers that the counter of ``exam_x`` is bigger than the ``max_counter`` of instance B. +#. 
This triggers a transfer_session in which the model ``exam_x`` is transferred to instance B and then updated in the store: ``{ "model" : "exam_x", "counter" : 2, "max_counters": { "B" : 2 }}`` + From 395270e2641417580cb1191a4ec12a3d9fcef4c3 Mon Sep 17 00:00:00 2001 From: Christoph Weber Date: Sat, 13 May 2023 00:50:08 +0200 Subject: [PATCH 5/9] adding documentation --- docs/counters/index.rst | 1 + docs/syncing/index.rst | 8 +++++--- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/counters/index.rst b/docs/counters/index.rst index 40a94be9..07ae3038 100644 --- a/docs/counters/index.rst +++ b/docs/counters/index.rst @@ -10,6 +10,7 @@ The **database ID** identifies the actual database being used by a Morango insta Each syncable model instance within the database is identified by a unique **model source ID**. This is calculated randomly by default and takes the calculated partition and Morango model name into account. Models can also define their own behavior by overriding ``calculate_source_id``. +.. _counters: Counters -------- diff --git a/docs/syncing/index.rst b/docs/syncing/index.rst index a8bc8fe5..65454a3f 100644 --- a/docs/syncing/index.rst +++ b/docs/syncing/index.rst @@ -25,11 +25,13 @@ In the illustration below, the application layer (on the right) is where app dat .. image:: ./sync_process.png -Store vs. Buffer +**Store, Buffer \& Dirty Bit** -Dirty-Bit +Both store and buffer are tables in the backend database (generally either SQLite or Postgres). Check `Counters <../counters#counters>`__ for the update logic. -Example: +* **Store**: Holds all Serializable Models of the instance and of synced instances, including their counters and max counters. +* **Buffer**: Holds Serializable Models marked for transfer (sending or receiving) during a sync session. +* **Dirty Bit**: Flag in the store that is set when a Serializable Model was updated during a dequeue from the Buffer. 
Gets unset as soon as the Django Model gets updated and is consistent with the store again. Orchestration ------------- From 288f2f387583a06d1cac658a55dd10f818d25553 Mon Sep 17 00:00:00 2001 From: Jakob Beckmann Date: Sun, 14 May 2023 00:18:48 +0200 Subject: [PATCH 6/9] docs: update sync process documentation --- docs/syncing/index.rst | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/docs/syncing/index.rst b/docs/syncing/index.rst index 65454a3f..b22bf9e5 100644 --- a/docs/syncing/index.rst +++ b/docs/syncing/index.rst @@ -5,7 +5,7 @@ Syncing Concepts -------- -The **store** holds serialized versions of syncable models. This includes both data that is on the current device and data synced from other devices. +The **store** holds serialized versions of syncable models. This includes both data that is on the current device and data synced from other devices. The store is represented as a standard Django model, containing syncable models as JSON. The **outgoing buffer** and **incoming buffer** mirror the schema of the store. They also include a transfer session ID which used to identify sets of data that are being synced as a coherent group to other Morango instances. @@ -15,11 +15,14 @@ Process Syncing is the actual exchange of data in a sync session. The general steps for syncing data are: -1. **Serialization** - serializing data that is associated with Django models in the Application layer, and storing it in JSON format in a record in the Store -2. **Queuing/Buffering** - storing serialized records and their modification history to a separate Buffers data structure -3. **Transfer/chunking of data** - the actual transfer of data over a request/response cycle in chunks of 500 records at a time -4. **Dequeuing** - merging the data received in the receiving buffers to the receiving store and record-max counter -5. **Deserialization** - merging data from the receiving Store into the Django models in the Application layer +1. 
**Serialization** - serializing data that is associated with Django models in the Application layer, and storing it in JSON format in a record in the Store. The serialized data in the store is versioned via a counter (described in `Counters <../counters#counters>`__). +2. **Queuing/Buffering** - storing serialized records and their modification history to a separate Buffers data structure. This Django model only contains the changes to be synced with the other Morango instance. This is in contrast to the Store, which contains all data, regardless of what is getting transferred in this sync session. +3. **Transfer/chunking of data** - the actual transfer of data over a request/response cycle in a set of chunked records. If both sides support it, the chunked records are compressed before being sent over the network. The actual transfer is done over HTTP. +4. **Dequeuing** - merging the data received in the receiving buffers to the receiving store and record-max counter. During this step, the data from the incoming buffer is merged into the store on the receiving side. Merge conflicts in case of version splits can be solved automatically. As new data is written into the store, the dirty bit on that object is set to indicate that the data needs to be deserialized and pushed to the Application Layer. +5. **Deserialization** - merging data from the receiving Store into the Django models in the Application layer. Data marked as stale in the Application Layer (where a newer version is available in the Store, on a record with the dirty bit set), the data in the store is deserialized from JSON into a Django model and integrated into the Application Layer. + +The individual steps of the syncing process are implemented in `morango/sync/operations.py `_. They are implemented as operations that are registered for every process step described above. 
A project using Morango can define their own operations and register them to be executed as part of an arbitrary step in the process via configuration options such as ``MORANGO_INITIALIZE_OPERATIONS``. + In the illustration below, the application layer (on the right) is where app data resides as Django models, and the Morango layer (on the left) is where the Morango stores, counters, and buffers reside. *Instance A* (on the top) is sending data to *Instance B* (on the bottom). Application Django models in *Instance A* are serialized in JSON format and saved to the store. Data is queued in the buffers on *Instance A*, and then transmitted to the corresponding buffers on *Instance B*. The data is then integrated into the store and Django app models on *Instance B*. @@ -53,6 +56,8 @@ Signals During the sync process, Morango fires a few different signals from ``signals`` in ``PullClient`` and ``PushClient``. These can be used to track the progress of the sync. +The operations described in the previous section are triggered via such a signal, which has the operations attached to it. The ``SyncSignal`` definition can be found under `morango/sync/utils.py `_. + There are four signal groups: - ``session`` @@ -66,6 +71,8 @@ Each signal group has 3 stages that can be fired: - ``in_progress`` - ``completed`` +The ``SessionController`` is responsible for registering the configured operations with the corresponding signal, and it triggers the individual steps when its ``proceed_to`` function is called. + For a push or pull sync lifecycle, the order of the fired signals would be as follows: 1) Session started From da447ba20403b0d5df42f46e593e03f78e4306d3 Mon Sep 17 00:00:00 2001 From: ovanov Date: Sun, 14 May 2023 10:57:07 +0200 Subject: [PATCH 7/9] extend Syncing documentation. 
--- docs/syncing/index.rst | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/syncing/index.rst b/docs/syncing/index.rst index b22bf9e5..baffa7c7 100644 --- a/docs/syncing/index.rst +++ b/docs/syncing/index.rst @@ -13,6 +13,11 @@ The **outgoing buffer** and **incoming buffer** mirror the schema of the store. Process ------- +By default, Kolibri instances are listening for other Kolibri instances in the same network, while at the same time, exposing an URL to which other instances can request a connection. The connection is established via a REST call to the endpoint. For the exact request flow, see the `documentation `_. +After a connection request the two instances exchange certificates, which are used to authenticate the other instance. If the certificates are valid, the sync session is started. One instance is the **client** (i.e. Student) and the other is the **server** (i. +e. Teacher). The server instance verifies that the client has the proper permissions to sync with it, and then the client and server exchange exactly the data, for which the client has the permissions to sync. The certificate verification takes place in `morango/api/permissions.py `_ + + Syncing is the actual exchange of data in a sync session. The general steps for syncing data are: 1. **Serialization** - serializing data that is associated with Django models in the Application layer, and storing it in JSON format in a record in the Store. The serialized data in the store is versioned via a counter (described in `Counters <../counters#counters>`__). From dcd36a5ddbb4bee0ca569fcde5053fce7127d9a7 Mon Sep 17 00:00:00 2001 From: ovanov Date: Sun, 14 May 2023 11:23:46 +0200 Subject: [PATCH 8/9] move Syncing documentation. 
--- docs/gettingstarted/index.rst | 9 +++++++++ docs/syncing/index.rst | 5 ----- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/docs/gettingstarted/index.rst b/docs/gettingstarted/index.rst index 74d30f38..d42ebecd 100644 --- a/docs/gettingstarted/index.rst +++ b/docs/gettingstarted/index.rst @@ -1,5 +1,14 @@ Getting Started ======== +This document is intended to provide a high-level overview of how Morango internals work and how Kolibri interacts with it. + +Syncing Process +--------------- + +By default, Kolibri instances are listening for other Kolibri instances in the same network, while at the same time exposing a URL to which other instances can request a connection. The connection is established via a REST call to the endpoint. For the exact request flow, see the `documentation `_. +After a connection request, the two instances exchange certificates, which are used to authenticate the other instance. If the certificates are valid, the sync session is started. One instance is the **client** (i.e. Student) and the other is the **server** (i.e. +Teacher). The server instance uses Morango to verify that the client has the proper permissions to sync with it. Then the client and server exchange exactly the data for which the client has the permissions to sync. The certificate verification takes place in `morango/api/permissions.py `_. + Syncable Models diff --git a/docs/syncing/index.rst b/docs/syncing/index.rst index baffa7c7..b22bf9e5 100644 --- a/docs/syncing/index.rst +++ b/docs/syncing/index.rst @@ -13,11 +13,6 @@ The **outgoing buffer** and **incoming buffer** mirror the schema of the store. Process ------- -By default, Kolibri instances are listening for other Kolibri instances in the same network, while at the same time, exposing an URL to which other instances can request a connection. The connection is established via a REST call to the endpoint. For the exact request flow, see the `documentation `_. 
-After a connection request the two instances exchange certificates, which are used to authenticate the other instance. If the certificates are valid, the sync session is started. One instance is the **client** (i.e. Student) and the other is the **server** (i. -e. Teacher). The server instance verifies that the client has the proper permissions to sync with it, and then the client and server exchange exactly the data, for which the client has the permissions to sync. The certificate verification takes place in `morango/api/permissions.py `_ - - Syncing is the actual exchange of data in a sync session. The general steps for syncing data are: 1. **Serialization** - serializing data that is associated with Django models in the Application layer, and storing it in JSON format in a record in the Store. The serialized data in the store is versioned via a counter (described in `Counters <../counters#counters>`__). From 2230d78e634588706b87b3d3cc3d30afb64dd6dc Mon Sep 17 00:00:00 2001 From: Jakob Beckmann Date: Sun, 14 May 2023 11:34:41 +0200 Subject: [PATCH 9/9] docs: add link between sync docs and related architecture docs --- docs/architecture/index.rst | 2 ++ docs/syncing/index.rst | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/architecture/index.rst b/docs/architecture/index.rst index 37220651..ba407b30 100644 --- a/docs/architecture/index.rst +++ b/docs/architecture/index.rst @@ -132,6 +132,8 @@ In Kolibri, on the ``FacilityDataset`` model, we generate the certificate as a f There's flexibility in the application layer for determining the validity of a root certificate, and it's specified on a per-profile basis. For the ``facilitydata`` profile, Kolibri leverages its ``auth`` models for this. +.. 
_operations: + Session controller, contexts, and operations -------------------------------------------- diff --git a/docs/syncing/index.rst b/docs/syncing/index.rst index b22bf9e5..7b9066ef 100644 --- a/docs/syncing/index.rst +++ b/docs/syncing/index.rst @@ -21,7 +21,7 @@ Syncing is the actual exchange of data in a sync session. The general steps for 4. **Dequeuing** - merging the data received in the receiving buffers to the receiving store and record-max counter. During this step, the data from the incoming buffer is merged into the store on the receiving side. Merge conflicts in case of version splits can be solved automatically. As new data is written into the store, the dirty bit on that object is set to indicate that the data needs to be deserialized and pushed to the Application Layer. 5. **Deserialization** - merging data from the receiving Store into the Django models in the Application layer. Data marked as stale in the Application Layer (where a newer version is available in the Store, on a record with the dirty bit set), the data in the store is deserialized from JSON into a Django model and integrated into the Application Layer. -The individual steps of the syncing process are implemented in `morango/sync/operations.py `_. They are implemented as operations that are registered for every process step described above. A project using Morango can define their own operations and register them to be executed as part of an arbitrary step in the process via configuration options such as ``MORANGO_INITIALIZE_OPERATIONS``. +The individual steps of the syncing process are implemented in `morango/sync/operations.py `_. They are implemented as operations that are registered for every process step described above. A project using Morango can define their own operations and register them to be executed as part of an arbitrary step in the process via configuration options such as ``MORANGO_INITIALIZE_OPERATIONS``. 
Details on these operations can be found under `Session controller, contexts, and operations <../architecture#operations>`__. In the illustration below, the application layer (on the right) is where app data resides as Django models, and the Morango layer (on the left) is where the Morango stores, counters, and buffers reside. *Instance A* (on the top) is sending data to *Instance B* (on the bottom). Application Django models in *Instance A* are serialized in JSON format and saved to the store. Data is queued in the buffers on *Instance A*, and then transmitted to the corresponding buffers on *Instance B*. The data is then integrated into the store and Django app models on *Instance B*.