Extend morango documentation #179

Open · wants to merge 9 commits into base: release-v0.7.x

2 changes: 2 additions & 0 deletions docs/architecture/index.rst
@@ -132,6 +132,8 @@
There's flexibility in the application layer for determining the validity of a root certificate, and it's specified on a per-profile basis. For the ``facilitydata`` profile, Kolibri leverages its ``auth`` models for this.


.. _operations:

Session controller, contexts, and operations
--------------------------------------------

10 changes: 10 additions & 0 deletions docs/counters/index.rst
@@ -10,6 +10,7 @@

Each syncable model instance within the database is identified by a unique **model source ID**. This is calculated randomly by default and takes the calculated partition and Morango model name into account. Models can also define their own behavior by overriding ``calculate_source_id``.
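
For example, a minimal sketch of a model overriding ``calculate_source_id`` might look like the following (the import path, field names, and partition scheme are illustrative assumptions rather than Morango's exact API)::

    from django.db import models

    from morango.models import SyncableModel  # assumed import path


    class Bookmark(SyncableModel):
        """Hypothetical syncable model with a deterministic source ID."""

        morango_model_name = "bookmark"

        user_id = models.UUIDField()
        content_id = models.UUIDField()

        def calculate_source_id(self):
            # Derive a stable source ID from a natural key instead of relying
            # on the random default.
            return "{}:{}".format(self.user_id, self.content_id)

        def calculate_partition(self):
            # Illustrative partition string; a real project would derive this
            # from its own partitioning scheme.
            return "{}:user".format(self.user_id)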

.. _counters:

Counters
--------

@@ -22,3 +23,12 @@
The **database-max counter** table tracks a mapping of scope filter strings to lists of (instance ID, counter) pairs. These (instance ID, counter) pairs reflect different Morango instances that have been previously synced at some counter value.

Before syncing, Morango sends **filter-max counters** to determine what data is already shared, so that only the difference in data needs to be transferred. Filter-max counters are the highest counters associated with every instance ID for both a filter and its supersets.

**Example** (in pseudocode)

#. Instance A creates a model, e.g. ``exam_x``, and registers it in its store: ``{ "model" : "exam_x", "counter" : 1 }``
#. It then syncs this exam to instance B and records this in its store: ``{ "model" : "exam_x", "counter" : 1, "max_counters": { "B" : 1 }}``
#. After some time, instance A updates the model because the exam changed, and registers this in the store: ``{ "model" : "exam_x", "counter" : 2, "max_counters": { "B" : 1 }}``
#. The next time instance A syncs with instance B, it notices that the counter of ``exam_x`` (2) is higher than the ``max_counter`` it has recorded for instance B (1).
#. This triggers a transfer session in which the model ``exam_x`` is transferred to instance B, and the store is then updated: ``{ "model" : "exam_x", "counter" : 2, "max_counters": { "B" : 2 }}`` (see the sketch below).
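
The same bookkeeping, written as a plain-Python sketch (the dictionaries are simplified stand-ins for Morango's store and counter records, not its actual schema)::

    # Store record on instance A after the exam was updated (step 3 above).
    store_record = {
        "model": "exam_x",
        "last_saved_instance": "A",
        "last_saved_counter": 2,
    }

    # Highest counter instance B is known to have already seen per instance ID.
    max_counters_for_b = {"A": 1}


    def needs_transfer(record, remote_max_counters):
        """Return True if the remote side has not yet seen this version."""
        seen = remote_max_counters.get(record["last_saved_instance"], 0)
        return record["last_saved_counter"] > seen


    if needs_transfer(store_record, max_counters_for_b):
        # Queue the record for the transfer session; after a successful sync,
        # B's max counter for instance A is bumped to 2 (step 5 above).
        max_counters_for_b["A"] = store_record["last_saved_counter"]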

29 changes: 29 additions & 0 deletions docs/devsetup/index.rst
@@ -0,0 +1,29 @@
Dev Setup
=========

Before getting started, ensure you have the following dependencies installed:

- ``python`` (2.7 or 3.6 - 3.11)
- ``swig``
- ``openssl``
- ``docker-compose`` (for testing against postgres backends)

Optionally create a virtual environment for this project with your preferred Python version, then run the following commands::

pip install -r requirements/dev.txt
pip install -r requirements/test.txt
# for testing with postgres: this might require a local install of a postgres package
pip install -r requirements/postgres.txt


Testing
-------

Tests can be launched as follows::

make test
# launch against a postgres backend
make test-with-postgres

The integration tests can be found in the `Kolibri repository <https://github.com/learningequality/kolibri/blob/develop/kolibri/core/auth/test/test_morango_integration.py>`_.

17 changes: 17 additions & 0 deletions docs/gettingstarted/index.rst
@@ -0,0 +1,17 @@
Getting Started
===============

This document provides a high-level overview of how Morango works internally and how Kolibri interacts with it.

Syncing Process
---------------

By default, Kolibri instances listen for other Kolibri instances on the same network, while at the same time exposing a URL to which other instances can request a connection. The connection is established via a REST call to that endpoint. For the exact request flow, see the `Kolibri developer documentation <https://kolibri-dev.readthedocs.io/en/develop/dataflow/index.html#data-flow>`_.
After a connection request, the two instances exchange certificates, which are used to authenticate one another. If the certificates are valid, the sync session is started. One instance acts as the **client** (e.g. a Student device) and the other as the **server** (e.g. a Teacher device). The server instance uses Morango to verify that the client has the proper permissions to sync with it. The client and server then exchange exactly the data that the client has permission to sync. The certificate verification takes place in `morango/api/permissions.py <https://github.com/learningequality/morango/blob/release-v0.6.x/morango/api/permissions.py>`_.
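
As a rough sketch of how a client instance drives this from the application side, the flow could look as follows (class and method names follow Kolibri's usage of Morango's ``MorangoProfileController``, but treat the exact import paths and signatures as assumptions and check the Kolibri and Morango source for the real API)::

    from morango.sync.controller import MorangoProfileController  # assumed path

    # Placeholders standing in for values obtained out of band: the server's
    # URL and the certificates exchanged and verified during the handshake.
    SERVER_URL = "http://server-device:8080/"
    client_cert = ...
    server_cert = ...

    controller = MorangoProfileController("facilitydata")

    # Open an HTTP connection to the server instance's sync endpoint.
    connection = controller.create_network_connection(SERVER_URL)

    # Create a sync session scoped to the verified certificates, then obtain a
    # client that pulls the data this instance is permitted to sync
    # (illustrative call names).
    session_client = connection.create_sync_session(client_cert, server_cert)
    pull_client = session_client.get_pull_client()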


Syncable Models
---------------

Actions
-------

Hooks
-----
2 changes: 2 additions & 0 deletions docs/index.rst
@@ -10,6 +10,8 @@

overview/index
architecture/index
devsetup/index
gettingstarted/index
syncing/index
counters/index
merging/index
26 changes: 20 additions & 6 deletions docs/syncing/index.rst
@@ -5,7 +5,7 @@ Syncing
Concepts
--------

The **store** holds serialized versions of syncable models. This includes both data that is on the current device and data synced from other devices. The store is represented as a standard Django model, containing syncable models as JSON.

The **outgoing buffer** and **incoming buffer** mirror the schema of the store. They also include a transfer session ID, which is used to identify sets of data that are being synced as a coherent group to other Morango instances.

@@ -15,16 +15,26 @@ Process

Syncing is the actual exchange of data in a sync session. The general steps for syncing data are:

1. **Serialization** - serializing data that is associated with Django models in the Application layer, and storing it in JSON format in a record in the Store. The serialized data in the store is versioned via a counter (described in `Counters <../counters#counters>`__).
2. **Queuing/Buffering** - storing serialized records and their modification history into a separate Buffers data structure. This Django model contains only the changes to be synced with the other Morango instance, in contrast to the Store, which contains all data regardless of what is getting transferred in this sync session.
3. **Transfer/chunking of data** - the actual transfer of data over a request/response cycle in a set of chunked records. If both sides support it, the chunked records are compressed before being sent over the network. The actual transfer is done over HTTP.
4. **Dequeuing** - merging the data received in the receiving buffers into the receiving store and record-max counter. During this step, the data from the incoming buffer is merged into the store on the receiving side. Merge conflicts arising from version splits can be resolved automatically. As new data is written into the store, the dirty bit on that record is set to indicate that the data needs to be deserialized and pushed to the Application layer.
5. **Deserialization** - merging data from the receiving Store into the Django models in the Application layer. For data marked as stale in the Application layer (i.e. where a newer version is available in the Store, on a record with the dirty bit set), the record in the Store is deserialized from JSON into a Django model and integrated into the Application layer.

The individual steps of the syncing process are implemented in `morango/sync/operations.py <https://github.com/learningequality/morango/blob/HEAD/morango/sync/operations.py>`_. They are implemented as operations that are registered for each process step described above. A project using Morango can define its own operations and register them to be executed as part of an arbitrary step in the process via configuration options such as ``MORANGO_INITIALIZE_OPERATIONS``. Details on these operations can be found under `Session controller, contexts, and operations <../architecture#operations>`__.
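
A hedged sketch of what a custom operation and its registration could look like (the ``BaseOperation`` base class, its ``handle`` hook, the meaning of its return value, and the dotted-path settings format are assumptions to be verified against ``morango/sync/operations.py`` and Morango's configuration handling)::

    # myapp/sync_operations.py -- hypothetical custom operation
    from morango.sync.operations import BaseOperation  # assumed base class


    class LogInitializeOperation(BaseOperation):
        """Example operation that only inspects the transfer context."""

        def handle(self, context):
            # `context` carries the state of the current sync/transfer session.
            print("initializing transfer for filter:", context.filter)
            return False  # "not handled": let the remaining operations run


    # settings.py -- register it for the initialize step; whether custom entries
    # replace or extend Morango's defaults should be double-checked.
    MORANGO_INITIALIZE_OPERATIONS = ["myapp.sync_operations.LogInitializeOperation"]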


In the illustration below, the application layer (on the right) is where app data resides as Django models, and the Morango layer (on the left) is where the Morango stores, counters, and buffers reside. *Instance A* (on the top) is sending data to *Instance B* (on the bottom). Application Django models in *Instance A* are serialized in JSON format and saved to the store. Data is queued in the buffers on *Instance A*, and then transmitted to the corresponding buffers on *Instance B*. The data is then integrated into the store and Django app models on *Instance B*.

.. image:: ./sync_process.png

**Store, Buffer & Dirty Bit**

Both store and buffer are tables in the backend database (generally either SQLite or Postgres). Check `Counters <../counters#counters>`__ for the update logic.

* **Store**: Holds every syncable model in this instance and in synced instances, including counters/max counters.
* **Buffer**: Holds syncable models marked for transfer (sending or receiving) during a sync session.
* **Dirty Bit**: Flag on a store record that is set when the record is updated during dequeuing from the buffer. It is unset as soon as the corresponding Django model has been updated and is consistent with the store again (see the query sketch below).
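
For example, to inspect which store records still need to be deserialized into the application layer, a query along these lines could be used (the model import paths and field names are taken from the description above but should be checked against Morango's actual ``Store`` and ``Buffer`` models; this is only a sketch)::

    from morango.models import Buffer, Store  # assumed import paths

    # Store records whose newer serialized data has not yet been pushed back
    # into the application-layer Django models.
    pending = Store.objects.filter(dirty_bit=True)

    # Buffer records staged for one particular transfer session (placeholder ID).
    staged = Buffer.objects.filter(transfer_session_id="<transfer-session-id>")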

Orchestration
-------------
@@ -46,6 +56,8 @@ Signals

During the sync process, Morango fires a few different signals from ``signals`` in ``PullClient`` and ``PushClient``. These can be used to track the progress of the sync.

The operations described in the previous section are triggered via these signals, which have the corresponding operations attached to them. The ``SyncSignal`` definition can be found in `morango/sync/utils.py <https://github.com/learningequality/morango/blob/HEAD/morango/sync/utils.py>`_.

There are four signal groups:

- ``session``
@@ -59,6 +71,8 @@ Each signal group has 3 stages that can be fired:
- ``in_progress``
- ``completed``

The ``SessionController`` is responsible for registering the configured operations with the corresponding signals, and it triggers the individual steps when its ``proceed_to`` function is called.
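
For instance, a progress handler might be attached like this (the attribute chain mirrors the groups and stages listed above, but the handler's keyword arguments are an assumption)::

    def on_transferring_progress(transfer_session=None, **kwargs):
        # Called repeatedly while records are sent or received for this session.
        if transfer_session is not None:
            print("records transferred so far:", transfer_session.records_transferred)


    # `client` is a PullClient or PushClient obtained from the sync session.
    client.signals.transferring.in_progress.connect(on_transferring_progress)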

For a push or pull sync lifecycle, the order of the fired signals would be as follows:

1) Session started