Merge pull request #8 from smartcomputer-ai/dist

lukebuehler · web-flow · commit 614f48f30d0b · 2024-04-14T16:01:52.000+02:00
Dist
diff --git a/src/grit/object_model_v2.py b/src/grit/object_model_v2.py
@@ -21,12 +21,20 @@
 MessageId = ObjectId
 Message = NamedTuple("Message", 
     [('previous', MessageId | None), #if none, it's a signal, otherwise, a queue
+     ('prune', MessageId | None), 
+        #NEW: if set, 'previous' must not be set; instead, the previous message id is set here,
+        #     and that message might be pruned by grit (i.e., not available anymore)
      ('headers', Headers | None),
-     #('type', str), #NEW aka, "message_type"/"mt" -- is this a good idea, or should it remain part of the headers? 
-                     # the pro is that the message types could be made more explicit in the object model here since the runtime inspects the message types substiantly (e.g., "genesis", "update", and, in the future "gc/garbage/disconnect")
+     ('type', str),
+        #NEW aka, "message_type"/"mt" -- is this a good idea, or should it remain part of the headers? 
+        #    the pro is that the message types could be made more explicit in the object model here, since the runtime inspects the message types substantially (e.g., "genesis", "update", and, in the future, "gc/garbage/disconnect")
      ('content', BlobId | TreeId | ListId | None)]) #NEW with None option, because many messages are just a signal or a ping, and have no content
 MailboxId = ObjectId
-Mailbox = dict[ActorId, MessageId]
+
+Mailbox = dict[tuple[ActorId, str | None], MessageId]
+        #NEW: Channel name (str), to allow sending on multiple channels to an actor
+        #     if channel name is None, then it is the "default channel"
+        # ActorId can be either sender or receiver
 
 StepId = ObjectId
 Step = NamedTuple("Step",
@@ -39,4 +47,4 @@
 Object = Blob | Tree | List | Message | Mailbox | Step
 
 
-
+# TODO: in serialization, add grit/object model version header
diff --git a/tests/perf/perf_grid.py b/tests/perf/perf_grid.py
@@ -17,7 +17,7 @@
 VERBOSE = False
 N_COLUMNS = 50
 N_ROWS = 50
-N_TEST_MESSAGES = 50
+N_TEST_MESSAGES = 200
 
 count_init_messages = 0
 count_grid_messages = 0
@@ -150,6 +150,7 @@ async def perf_grid_run(store:ObjectStore, refs:References) -> None:
 
     print(f"count_init_messages: {count_init_messages}")
     print(f"count_grid_messages: {count_grid_messages}")
-    print(f"total processed in the grid: {N_COLUMNS * N_ROWS * N_TEST_MESSAGES}")
+    total_grid_messages = N_COLUMNS * N_ROWS * N_TEST_MESSAGES
+    print(f"total processed in the grid: {total_grid_messages}")
     assert count_init_messages == N_TEST_MESSAGES
     assert count_grid_messages == N_COLUMNS * N_TEST_MESSAGES
diff --git a/thinking/distributed.md b/thinking/distributed.md
@@ -0,0 +1,63 @@
+
+## Grit
+
+ - send objects
+ - retrieve object
+ - set refs (should that be here?, maybe just for runtime use)
+ - multi-tenant support
+ - stream larger objects
+
+
+## Worker
+ 
+ - figure out what manifests are supported on this worker
+ - request work from orch. -> which actors to run
+ - run (some) OS-level actors (or should they only exist once? That would mean they don't have an actor id, which would be ok)
+ - timers (should the worker do that?)
+ - send messages to orch.; ideally, send messages p2p straight to the worker where the actor runs, or route to local ones (could be future work)
+ - route messages internally
+ - back off delivering messages if actor does not accept (stop trying at some point? but then how to persist that mismatch)
+ - listen to messages for its own actors
+ - set step head for actors (when messages are sent off? or before?)
+ - keep track of performance of actors and general resource usage, to message orch when re-balancing is needed
+ - keep a local grit cache
+ - run queries
+ - ask the orch. for which Grit to use (if Grit is sharded)
+
+
+## Orchestrator
+
+ - aware of all actors
+ - aware of manifests of all actors
+ - decide which actor to run on which worker
+ - route messages between workers, as long as there is no p2p solution yet
+ - re-route if worker goes down/offline
+ - warn if no worker exists for manifest
+ - restart actor messaging after complete shutdown (tradeoff: either allow the worker to set the step head before sending all messages, or make it wait. With the former, we can persist undelivered messages in the orch. and then start from there without re-analyzing the entire message state.)
+ - initiate pruning
+ - snapshot all heads for an agent (before updates), revert to certain snapshots
+ - initiate updates
+ - host web server (could be different service) and route queries and messages to workers
+
+
+## Structure
+
+We'll implement the first version as a monolith that can be started with different settings.
+
+All of it will be in python.
+
+- protos
+- src
+  - shared
+    - protos
+    - grit (interfaces, object model, serialization)
+    - wit (interfaces, inner wit runner)
+    - runtime (?, interfaces)
+    - web
+  - grit (grit server)
+  - apex (orchestrator)
+  - worker (runs actors)
+  - inproc (in process runtime) (or "play", "reference", "inproc")
+  - web (webserver)
+  - cli (connects to apex and grit, or simple runtime)
+
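Review note: the orchestrator duties listed in `thinking/distributed.md` (decide which actor runs on which worker, warn if no worker exists for a manifest, re-route if a worker goes down) could be sketched roughly as below. This is only an illustration under stated assumptions: manifests are reduced to plain strings, load is counted as number of actors per worker, and `assign_actors` is a hypothetical name.

```python
from collections import defaultdict


def assign_actors(
    actor_manifests: dict[str, str],
    worker_manifests: dict[str, set[str]],
) -> tuple[dict[str, str], list[str]]:
    """Assign each actor to the least-loaded worker that supports its manifest.

    Returns (assignments, unassigned). Unassigned actors have no capable
    worker, which is where the orchestrator would emit a warning.
    """
    load: dict[str, int] = defaultdict(int)
    assignments: dict[str, str] = {}
    unassigned: list[str] = []
    for actor_id, manifest in sorted(actor_manifests.items()):
        capable = sorted(
            w for w, supported in worker_manifests.items() if manifest in supported
        )
        if not capable:
            unassigned.append(actor_id)
            continue
        worker = min(capable, key=lambda w: load[w])
        assignments[actor_id] = worker
        load[worker] += 1
    return assignments, unassigned


actors = {"a1": "python-3.12", "a2": "python-3.12", "a3": "node-20"}
workers = {"w1": {"python-3.12"}, "w2": {"python-3.12"}}
assignments, unassigned = assign_actors(actors, workers)

# Re-route after worker "w2" goes offline by re-running the assignment.
remaining = {w: m for w, m in workers.items() if w != "w2"}
rerouted, _ = assign_actors(actors, remaining)
```

Re-running the whole assignment on worker loss is the simplest possible policy; a real orchestrator would probably move only the affected actors.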
diff --git a/thinking/manifests.md b/thinking/manifests.md
@@ -0,0 +1,15 @@
+# Manifests
+
+## How to know what a wit needs to run?
+
+We need a way to indicate to the runtime or worker what the wit/actor needs to run.
+
+## Indicators
+ - Language or language runtime
+ - Library versions (sem versioned)
+
+I think that's it.
+
+## Structure
+
+Where to save this information? We could re-use the wit structure for this. But probably should be a separate file.
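Review note: one way the two indicators from `thinking/manifests.md` (language runtime and semantically versioned libraries) could be represented and checked by a worker. All names here (`Manifest`, `worker_supports`) are hypothetical, and the compatibility rule (same major version, installed version at least the required one) is an assumption, not something the notes specify.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    runtime: str                                              # e.g. "python"
    libraries: dict[str, str] = field(default_factory=dict)   # name -> "major.minor.patch"


def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse a plain 'major.minor.patch' string (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(required: str, installed: str) -> bool:
    """Assumed rule: same major version, and installed >= required."""
    req, inst = parse_semver(required), parse_semver(installed)
    return inst[0] == req[0] and inst >= req


def worker_supports(manifest: Manifest, runtime: str, installed: dict[str, str]) -> bool:
    """Check whether a worker with the given runtime and libraries can run the wit."""
    if manifest.runtime != runtime:
        return False
    for name, required in manifest.libraries.items():
        have = installed.get(name)
        if have is None or not is_compatible(required, have):
            return False
    return True


manifest = Manifest(runtime="python", libraries={"pydantic": "2.1.0"})
ok = worker_supports(manifest, "python", {"pydantic": "2.6.4"})
```

Keeping the manifest as a small, separate record (rather than folding it into the wit itself) matches the note's leaning towards a separate file.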
diff --git a/thinking/pruning.md b/thinking/pruning.md
@@ -0,0 +1,65 @@
+# How to Prune (GC) Grit?
+
+How do we prune the Grit DAG? This requires a garbage-collection-style system.
+
+The main idea is pretty simple: grit needs to be extended so that the "previous" fields in messages or steps can be set to "None" while maintaining a "soft-link" to the history.
+
+Here is what this could look like:
+```
+Message = NamedTuple("Message", 
+    [('previous', MessageId | None),
+     ('previous_pruned', MessageId | None), # only one of the two "previous..." fields may be set
+     ('headers', Headers | None),
+     ('type', str),
+     ('content', BlobId | TreeId | None)])
+```
+
+If `previous_pruned` is set, `previous` must not be set. This maintains a historical link to the object ids that came before, even though the garbage collector may discard those objects.
+
+## Messages
+Maintaining a link to the history is important for the message object type because it allows an actor to send a pruned message list, giving the recipient a chance to also process previous, now pruned messages before it accepts the message with the prune marker.
+
+## Lifecycle
+The runtime would probably send "prune" signals to each actor when it is time to prune. But actors could also decide to prune messages or step histories on their own initiative. The mechanics would be the same.
+
+ 1) Runtime sends "prune signal" via normal message
+ 2) Actor sends a pruned message to all or most of its outbox
+ 3) Actor also incorporates a prune marker in the new step
+ 4) (later and independently) Receiver accepts the pruned messages in its inbox, completing the cycle for that message channel
+ 5) Grit can now garbage collect the messages and steps that are not needed anymore
+
+## Maintaining Some History
+
+It would be nice if an actor could retain *some* history of what happened to it. That is, a prune request would not prune all the way to the present moment, but only up to a point a little further back.
+
+How far back could be configurable (or part of the prune request signal).
+
+How to do this?
+```
+Message = NamedTuple("Message", 
+    [('previous', MessageId | None),
+     ('prune_from', MessageId | None), # if set, prunes back from the message id specified here
+     ('headers', Headers | None),
+     ('type', str),
+     ('content', BlobId | TreeId | None)])
+```
+In this case, `previous` would still always be set (if it is a message queue), and `prune_from` would point to the message id in the history of previous messages from which to prune...
+
+However, it is not certain that this is the best design. It demands a lot of the actor. Alternatively, pruning could happen often, which would result in many pruning markers throughout the history, *and then the runtime or Grit decides what to actually prune.*
+
+In the second design, the prune messages could also contain some sort of timestamp which allows grit to decide, but grit could maintain that timestamp too.
+
+In the sleekest design, the message would just have a flag indicating whether pruning previous messages is allowed; everything else would stay the same:
+
+```
+Message = NamedTuple("Message", 
+    [('previous', MessageId | None),
+     ('prune', bool), # if set, the messages before this one may be pruned
+     ('headers', Headers | None),
+     ('type', str),
+     ('content', BlobId | TreeId | None)])
+```
+
+I'm not sure if there is a better way to indicate the prune marker... I think somehow the first design at the very top is better, because it makes the prune action much more explicit than a flag (branching mechanisms have to be introduced anyhow).
+
+Finally, we could simply have Grit track the time and prune without any markers or involvement of the actors. But that would make it difficult to guess whether data or history is available, especially if the wit logic relies on the history (such as comparing how two objects changed over time). The actor would have no way to know why data is not available in Grit (we could return a "pruned" object if it doesn't exist anymore, but that would work like an additional null, which is bad; better to make it explicit).