diff --git a/README.md b/README.md index 621096c..c19e1cb 100644 --- a/README.md +++ b/README.md @@ -30,7 +30,7 @@ If so, the owner of the issue will: 4. **Get approval**: - When approvers are satisfied with the design, they will approve the PR on github. - - When all approvers have signed off, merge the PR with the design + - When all approvers have signed off, merge the PR with the design. 5. **Implement the design** based on the finalized design document. - Any changes or decisions made during the implementation phase should be captured in user docs or protocol spec *and flagged in PRs*. diff --git a/in-progress/0003-native-merkle-trees.md b/in-progress/0003-native-merkle-trees.md index afe5e7d..2bf53d7 100644 --- a/in-progress/0003-native-merkle-trees.md +++ b/in-progress/0003-native-merkle-trees.md @@ -12,7 +12,7 @@ This design attempts to solve the problem of slow sync and merkle tree insertion ## Introduction -We require high performance merkle tree implementations both to ensure nodes can stay synched to the network and sequencers/provers can advance the state as required to build blocks. Our cuirrent TS implementations are limited in their single-threaded nature and the unavoidable constraint of have to repeatedly call into WASM to perform a hash operation. +We require high performance merkle tree implementations both to ensure nodes can stay synced to the network and sequencers/provers can advance the state as required to build blocks. Our current TS implementations are limited in their single-threaded nature and the unavoidable constraint of having to repeatedly call into WASM to perform a hash operation. Some analysis of the quantity of hashing and the time required can be found [here](https://hackmd.io/@aztec-network/HyfTK9U5a?type=view).
@@ -20,13 +20,13 @@ This design proposes the creation of a set of multi-threaded merkle tree impleme ## Implementation -There are many parts to this design, we will walk through them individiually and discuss the choices made at each stage. +There are many parts to this design, we will walk through them individually and discuss the choices made at each stage. ### Overall Architecture -A new C++ binary, World State, will be created that will be started by the node software. It will be configured with the location in which Merkle Tree data should be stored. It will then accept and respond with msgpack-ed messages over one or more streams. The initial implementation will simply used stdio, but this will be absrtacted such that this could be replaced by other stream-based mechanisms. +A new C++ binary, World State, will be created that will be started by the node software. It will be configured with the location in which Merkle Tree data should be stored. It will then accept and respond with msgpack-ed messages over one or more streams. The initial implementation will simply use stdio, but this will be abstracted such that this could be replaced by other stream-based mechanisms. -To interface with the World State, an abstraction will be created at the `MerkleTreeDb` level. This accurately models the scope of functionality provided by the binary as owner of all the trees. It was considered that the abstraction could sit at the level of individual trees, but this creates difficulty whan we want to send an entire block to the World State to be inserted. This is an important use case as synching entire blocks is where signifcant performance optimisations can be made. +To interface with the World State, an abstraction will be created at the `MerkleTreeDb` level. This accurately models the scope of functionality provided by the binary as owner of all the trees.
It was considered that the abstraction could sit at the level of individual trees, but this creates difficulty when we want to send an entire block to the World State to be inserted. This is an important use case as syncing entire blocks is where significant performance optimisations can be made. ``` TS @@ -47,7 +47,7 @@ An abstract factory will then be created to construct the appropriate concrete t ### Interface -The interface will be an asynchronous message based communication protocol. Each message is provided with meta data uniquely identiying it and is responded to inidividually. It is not necessary to wait for a response to a message before sending a subsequent message. A simple message specification will be created, some examples of which are shown here: +The interface will be an asynchronous message-based communication protocol. Each message is provided with metadata uniquely identifying it and is responded to individually. It is not necessary to wait for a response to a message before sending a subsequent message. A simple message specification will be created, some examples of which are shown here: ``` C++ enum WorldStateMsgTypes { @@ -144,7 +144,7 @@ As a sequencer/prover inserts transaction side-effects, the resulting new state #### Commits -When a block settles, the node performs a commit. It verifies any uncommitted state it may have against that published on chain to determine if that state is canonical. If it is not, the `uncommitted` state is dicarded and the node perform an `Update` operation using the newly published side effects. +When a block settles, the node performs a commit. It verifies any uncommitted state it may have against that published on chain to determine if that state is canonical. If it is not, the `uncommitted` state is discarded and the node performs an `Update` operation using the newly published side effects. Once the node has the correct `uncommitted` state, it commits that state to disk.
This is the only time that a write transaction is required against the database. @@ -163,7 +163,7 @@ Indexed Trees require significantly more hashing than append only trees. In fact For each leaf being inserted: 1. Identify the location of the leaf whose value immediately precedes that being inserted. -2. Retrieve the sibling path of the preceeding leaf before any modification. +2. Retrieve the sibling path of the preceding leaf before any modification. 3. Set the 'next' value and index to point to the leaf being inserted. 4. Set the 'next' value and index of the leaf being inserted to the leaf previously pointed to by the leaf just updated. 5. Re-hash the updated leaf and update the leaf with this hash, requiring the tree to be re-hashed up to the root. @@ -209,4 +209,4 @@ As the World State is used heavily in all operations, we will gain confidence th ## Prototypes -Areas of this work have been prototyped already. The latest being [here](https://github.com/AztecProtocol/aztec-packages/pull/7037). \ No newline at end of file +Areas of this work have been prototyped already. The latest being [here](https://github.com/AztecProtocol/aztec-packages/pull/7037). diff --git a/in-progress/5040-native-merkle-trees-napi.md b/in-progress/5040-native-merkle-trees-napi.md index cee6f1c..029a5fa 100644 --- a/in-progress/5040-native-merkle-trees-napi.md +++ b/in-progress/5040-native-merkle-trees-napi.md @@ -11,7 +11,7 @@ This document proposes integrating the [Native Merkle Trees database](https://gi ## Introduction -The original native Merkle tree spec proposed building a `MerkleTreesDb` native binary in C++. The TypScript code would use message passing over streams to communicate with the database. A long lived process would be started once and accept messages over an input stream (e.g. stdin or a socket), process the messages and return the result over another stream (e.g. stdout). +The original native Merkle tree spec proposed building a `MerkleTreesDb` native binary in C++. 
The TypeScript code would use message passing over streams to communicate with the database. A long lived process would be started once and accept messages over an input stream (e.g. stdin or a socket), process the messages and return the result over another stream (e.g. stdout). [Node-API](https://nodejs.org/docs/latest-v18.x/api/n-api.html) is an API for building native addons that integrate seamlessly into NodeJS. @@ -21,7 +21,7 @@ This approach would simplify deployment and maintenance (no new binaries need to A new module would be written in C++ that would adapt the existing Native Merkle Trees database to Node-API semantics. This module could sit alongside the stream-based message passing implementation detailed in the [original spec](https://github.com/AztecProtocol/engineering-designs/blob/f9d1a897303c1481c790cecc4616961e1c183622/in-progress/0003-native-merkle-trees.md#interface) -This module would be build with CMake normally as the rest of the C++ code, with the exception that its build artifact would be a shared library (with a custom extension `.node` instead of `.so`). The TypeScript project would use [`bindings`](https://www.npmjs.com/package/bindings) to load the native module and re-export the functions and classes from C++. +This module would be built normally with CMake, like the rest of the C++ code, with the exception that its build artifact would be a shared library (with a custom extension `.node` instead of `.so`). The TypeScript project would use [`bindings`](https://www.npmjs.com/package/bindings) to load the native module and re-export the functions and classes from C++. > [!NOTE] > TypeScript definitions would have to be written from the C++ code. Ideally these would be generated from existing code, but if that doesn't work then they would have to be written and maintained manually.
diff --git a/in-progress/7025-instrumenting-the-node-with-open-telemetry.md b/in-progress/7025-instrumenting-the-node-with-open-telemetry.md index 4ccf4b4..43b37a9 100644 --- a/in-progress/7025-instrumenting-the-node-with-open-telemetry.md +++ b/in-progress/7025-instrumenting-the-node-with-open-telemetry.md @@ -11,7 +11,7 @@ The node should emit useful stats about the way its running so that node operato ## Introduction -In order to confidently deploy and maintain a node in production it needs to provide basic information about how it's operating. These metrics need to be emitted in portable manner so that monitoring tools can easily ingest them. These metrics should be optional such that running a node does not require running any other infrastructure to ingest the metrics. +In order to confidently deploy and maintain a node in production it needs to provide basic information about how it's operating. These metrics need to be emitted in a portable manner so that monitoring tools can easily ingest them. These metrics should be optional such that running a node does not require running any other infrastructure to ingest the metrics. OpenTelemetry is a framework for capturing instrumentation data from applications and encoding them into a standard format that's vendor neutral. In the past we've used Prometheus and Grafana to capture metrics from services, OpenTelemetry would enable us to continue running that stack while also giving the community the chance to ingest data into different systems (e.g. Clickhouse, DataDog).
diff --git a/in-progress/7346-batch-proving-circuits-and-l1.md b/in-progress/7346-batch-proving-circuits-and-l1.md index ea6c0c0..a45a8e1 100644 --- a/in-progress/7346-batch-proving-circuits-and-l1.md +++ b/in-progress/7346-batch-proving-circuits-and-l1.md @@ -30,7 +30,7 @@ The circuit then performs the following checks and computations: - Recursively verifies the left and right inputs - Checks that the tree is greedily filled from left to right -- Checks that constants from left and right match +- Checks that constants from the left and right match - Checks that the end state from left matches the start state from right (ie they follow from each other) - Outputs the start of left and end of right - Hashes together or sums up any accumulated fields (tx count, effects hashes, accumulated fees, etc) @@ -201,7 +201,7 @@ contract Rollup { ### State -To track both the proven and unproven chains, we add the following state variables to the contract: +To track both proven and unproven chains, we add the following state variables to the contract: ```diff contract Rollup { diff --git a/in-progress/7482-sequencer-prover-test-net.md b/in-progress/7482-sequencer-prover-test-net.md index 7123891..e5eee94 100644 --- a/in-progress/7482-sequencer-prover-test-net.md +++ b/in-progress/7482-sequencer-prover-test-net.md @@ -100,7 +100,7 @@ In Spartan v1 the committee will solely be responsible for building the Pending The validator set will be selected by the multisig. -At the beginning of each epoch, we will assign proposers from the the validator set to slots. +At the beginning of each epoch, we will assign proposers from the validator set to slots. The exact number of validators will be determined via stress tests, modeling, and feedback from the community. 
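The per-epoch proposer assignment described above could be sketched as follows. This is only an illustrative sketch: the function name, the plain rotation scheme, and seeding by epoch number are assumptions, not the final design (which will run on L1 and would likely use protocol randomness rather than a fixed offset).

```typescript
// Hypothetical sketch: assign proposers from the validator set to the slots
// of an epoch via a deterministic rotation. A real implementation would be
// seeded by a randomness beacon instead of the epoch number.
function assignProposers(validators: string[], slotsPerEpoch: number, epoch: number): string[] {
  if (validators.length === 0) throw new Error('empty validator set');
  const slots: string[] = [];
  for (let slot = 0; slot < slotsPerEpoch; slot++) {
    // Every validator is assigned in turn, wrapping around the set.
    slots.push(validators[(epoch + slot) % validators.length]);
  }
  return slots;
}
```

Note that with this rotation every slot in an epoch has a known proposer up front, which is what allows proposers to build ahead for their slots.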
@@ -133,7 +133,7 @@ sequenceDiagram P->>L1R: Watches for new L2PendingBlockProcessed events P->>L1R: Download TxEffects - P->>P: Apply TxEffects to world state + P->>P: Apply TxEffects to the world state PXE->>L2N: Submit TxObject+Proof L2N->>MP: Add TxObject+Proof MP->>P: Broadcast TxObjects+Proofs diff --git a/in-progress/7520-testnet-overview.md b/in-progress/7520-testnet-overview.md index cebe50c..4c513d8 100644 --- a/in-progress/7520-testnet-overview.md +++ b/in-progress/7520-testnet-overview.md @@ -43,7 +43,7 @@ A deployment of the Aztec Network includes several contracts running on L1. TST will be an ERC20 asset that will be used to pay for transaction fees on the Aztec Network. -It will also used on L1 as part of the validator selection process. +It will also be used on L1 as part of the validator selection process. Protocol incentives are paid out in TST. @@ -322,12 +322,12 @@ There will be penalties for proposers and provers who do not fulfill their dutie - committee members - proposers - provers -- What is require to convince L1 that the conditions are met? -- What is the "cost" of an enforcement action? e.g., if tiny penalty it might not be worth to enforce it. +- What is required to convince L1 that the conditions are met? +- What is the "cost" of an enforcement action? e.g., if the penalty is tiny it might not be worth enforcing. - What are the penalties for proposers and provers? - How can it be ensured that the penalties are fair? - What should be burned, and what should be distributed? -- Expected annual return for validators (mean, median)? +- What is the expected annual return for validators (mean, median)?
## Disclaimer diff --git a/in-progress/7588-spartan-clusters.md b/in-progress/7588-spartan-clusters.md index 3331428..f528ed2 100644 --- a/in-progress/7588-spartan-clusters.md +++ b/in-progress/7588-spartan-clusters.md @@ -20,7 +20,7 @@ We will: To properly test a decentralized system, we need the ability to spin up networks with different topologies such as the number of nodes, validators, provers, pxes. -Additionally, we need to test under simulated stress/attack conditions. +Additionally, we need to test under simulated stress and attack conditions. Further, we need to be able to deploy these networks in a repeatable and automated way. @@ -33,7 +33,7 @@ This allows us to define a network configuration in a helm chart and deploy it t K8s is also easy to use in CI via [kind](https://kind.sigs.k8s.io/). > kind is a tool for running local Kubernetes clusters using Docker container “nodes”. -> kind was primarily designed for testing Kubernetes itself, but may be used for local development or CI. +> kind was primarily designed for testing Kubernetes itself, but it may be used for local development or CI. Further, we can use [chaos mesh](https://chaos-mesh.org/) to simulate network conditions such as node failures, latency, etc. diff --git a/in-progress/8131-forced-inclusion.md b/in-progress/8131-forced-inclusion.md index d0d56b0..17bfe7c 100644 --- a/in-progress/8131-forced-inclusion.md +++ b/in-progress/8131-forced-inclusion.md @@ -11,7 +11,7 @@ We propose a method to provide the Aztec network with similar **inclusion** censorship resistance of the base-layer. -The mechanisms uses a delayed queue on L1, and require the ability to include valid but failing transactions (many of these changes overlaps with tx-objects). +The mechanism uses a delayed queue on L1, and requires the ability to include valid but failing transactions (many of these changes overlap with tx-objects).
After some delay, the following blocks are compelled to include transactions from the queue, and failing to do so will reject the blocks. The length of the delay should take into account the expected delays of shared mutable, as it could be impossible to force a transaction that uses shared mutable if the queue delay is too large. @@ -36,11 +36,11 @@ This case should be easily solved, pay up you cheapskate! But let's assume that this is not the case you were in, you paid a sufficient fee and they keep excluding it. Rollups such as Arbitrum and Optimism both have a mechanism that allows the user to take their transactions directly to the base layer and insert them into a "delayed" queue. -After some delay have passed, the elements of the delayed queue can be forced into the ordering, and the sequencer is required to include it, or he will enter a game of fraud or not where he already lost. +After some delay has passed, the elements of the delayed queue can be forced into the ordering, and the sequencer is required to include them, or they will enter a fraud game they have already lost. The delay is introduced into the system to ensure that the forced inclusions cannot be used as a way to censor the rollup itself. -Curtesy of [The Hand-off Problem](https://blog.init4.technology/p/the-hand-off-problem), we borrow this great figure: +Courtesy of [The Hand-off Problem](https://blog.init4.technology/p/the-hand-off-problem), we borrow this great figure: ![There must be a hand-off from unforced to forced inclusion.](https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35f3bee4-0a8d-4c40-9c8d-0a145be64d87_3216x1243.png) The hand-off here is the point where we go from the transactions that the sequencer/proposer is freely ordering to the transactions that they are forced to include in a specific order. @@ -53,9 +53,9 @@ However, nothing is really that easy.
Note the emphasis on **inclusion** censorship resistance. While this provides a method to include the transaction, the transaction might itself revert, and so it is prevented from achieving its goal! -This is particularly an issue in build-ahead models, as the delay ensure that the sequencer have plenty of time to include transactions, before the hand-off, that will alter the state of the chain, potentially making the transaction revert. +This is particularly an issue in build-ahead models, as the delay ensures that the sequencer has plenty of time to include transactions, before the hand-off, that will alter the state of the chain, potentially making the transaction revert. -Again, [The Hand-off Problem](https://blog.init4.technology/p/the-hand-off-problem) have a wonderful example and image: +Again, [The Hand-off Problem](https://blog.init4.technology/p/the-hand-off-problem) has a wonderful example and image: Consider that you have forced a transaction that uses a trading pair or lending market: the sequencer could include your transaction right after it has emptied the market, or pushed it just enough for your tx to fail, then undo the move right after. ![The sequencer or builder can manipulate the state at the hand-off.](https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b449e22-66d0-4fe1-bc25-28aea3329e72_2799x1428.png) @@ -68,7 +68,7 @@ A different attack could be to make the fees spike such that your forced transac This can be avoided by not requiring forced transactions to pay the fee, as long as the transaction has a bounded amount of gas. A reason that this could be acceptable is that the cost to go to L1 SHOULD be far greater than going to the L2, so you cannot use it as a mechanism to DoS as it is still much more expensive than sending the same transaction and paying L2 gas for it.
-However, it opens a annoying problem for us, the circuits would need to know that it is a forced transaction to give us that "special" treatment. +However, it opens an annoying problem for us: the circuits will need to know that it is a forced transaction to give us that "special" treatment. And how will we let it learn this fact? We essentially have a list of to-be-forced transactions at the contract, and we need to know if the block includes any of these. @@ -118,11 +118,11 @@ For the sake of simplicity, we will briefly assume that the above changes are ma The idea is fairly simple: - As part of the `Header` include a `txs_hash` which is the root of a SHA256 merkle tree, whose leafs are the first nullifiers (the transaction hashes) of the transactions in the block. -- The rollup circuits have to ensure that this `txs_hash` is build correctly, e.g., are from the transaction hashes of the same transactions published to DA. +- The rollup circuits have to ensure that this `txs_hash` is built correctly, i.e., that it is built from the transaction hashes of the same transactions published to DA. - We take the idea of a delayed queue, that after some delay, forces the blocks to order transactions based on the queue. When inserting into the delayed queue, we can check the private kernel proof (fairly expensive 💸) and we store the transaction hash along with a bit of metadata, e.g., the time at which it must be included etc. - At the time of epoch proof inclusion, the `txs_hash` roots can be used to prove the inclusion of members from the queue. - If specific transactions were required and are not proven to be included, the ethereum transaction simple reverts. + If specific transactions were required and are not proven to be included, the Ethereum transaction simply reverts. Beware that we only really address the forced inclusion needs when the proof is proposed. Nevertheless, the criteria for inclusion can be based on the time of the proposal.
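The `txs_hash` construction above can be sketched as a plain SHA256 merkle root over the first nullifiers of the block. This is a hedged illustration only: the pairwise SHA256 hashing and the zero-leaf padding rule are assumptions here, and the circuits would fix the exact tree shape.

```typescript
import { createHash } from 'crypto';

// Illustrative sketch of the proposed `txs_hash`: a SHA256 merkle root whose
// leaves are the first nullifiers (tx hashes) of the block's transactions.
// Padding to a power of two with zero leaves is an assumption, not the spec.
function sha256(data: Buffer): Buffer {
  return createHash('sha256').update(data).digest();
}

function txsHashRoot(txHashes: Buffer[]): Buffer {
  if (txHashes.length === 0) return Buffer.alloc(32);
  let level = [...txHashes];
  // Pad to a power of two so the tree is complete.
  while ((level.length & (level.length - 1)) !== 0) level.push(Buffer.alloc(32));
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // Hash each pair of siblings to form the next level up.
      next.push(sha256(Buffer.concat([level[i], level[i + 1]])));
    }
    level = next;
  }
  return level[0];
}
```

SHA256 is the natural choice here because the same root must be cheap to recompute on L1, where SHA256 is a precompile, when proving inclusion of members from the queue.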
diff --git a/in-progress/8404-based-fallback.md b/in-progress/8404-based-fallback.md index bf341eb..d73168b 100644 --- a/in-progress/8404-based-fallback.md +++ b/in-progress/8404-based-fallback.md @@ -19,7 +19,7 @@ Since the consensus layer is only there for the ability to have fast ledger grow Therefore, we provide a fallback mechanism to handle the case where the committee fail to perform their duties - allowing anyone to grow the ledger. The nature in which they could fail to perform their duties varies widely. -It could be an attack to try and censor the network, or it might be that a majority of the network is running a node that corrupts its state the 13 of August every year as an homage to the Aztec empire. +It could be an attack to try and censor the network, or it might be that a majority of the network is running a node that corrupts its state on the 13th of August every year as an homage to the Aztec empire. Nevertheless, we need a mechanism to ensure the liveness of the chain, even if slower, in these events. @@ -49,7 +49,7 @@ The based fallback is expected to have a significantly worse user experience sin Because of this, we really do not want to enter the fallback too often, but we need to make it happen often enough that it is usable. For applications that are dependent on users acting based on external data or oracle data, a low bar to enter based mode can be desirable since it will mean that they would be able to keep running more easily. -Lending and trading falls into these catagories as they usually depend on external data sources, e.g., prices of CEX'es influence how people use a trading platform and lending platforms mostly use oracles to get prices. +Lending and trading fall into these categories as they usually depend on external data sources, e.g., prices of CEX'es influence how people use a trading platform and lending platforms mostly use oracles to get prices.
We suggest defining the time where Based fallback can be entered to be $T_{\textsf{fallback}, \textsf{enter}}$ after the last proven block. The minimum acceptable value for $T_{\textsf{fallback}, \textsf{enter}}$ should therefore be if a committee fails to perform its proving duties as specified as a full epoch $E$ in https://github.com/AztecProtocol/engineering-designs/pull/22 * 2. @@ -114,7 +114,7 @@ Furthermore, if they need to build an epoch proof without there being blocks on A separate direction that I would like to entertain, is that we have two ways to "get out" of the based fallback, both based on time, similarly to how we ended up in here. 1. If the last proven block is older than $T_{\textsf{fallback}, \textsf{exit}, \textsf{activity}}$ we allow exiting the based fallback. -2. After $T_{\textsf{fallback}, \textsf{exit}}$ we allow exiting the fallback regardless of when the last proven block were. +2. After $T_{\textsf{fallback}, \textsf{exit}}$ we allow exiting the fallback regardless of when the last proven block was. Option 1 ensures that we can quickly leave the based fallback if there is no activity, allowing a good experience for the happy path. Option 2 ensures that we will not be stuck forever in based fallback even if there is a malicious entity pushing blocks once in a while.
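The two exit conditions above can be sketched as a simple predicate. Names, the flat timestamp representation, and the state shape are illustrative assumptions, not contract code.

```typescript
// Illustrative sketch of the two proposed exit conditions for based fallback.
// tExitActivity and tExit mirror T_{fallback,exit,activity} and
// T_{fallback,exit}; concrete values are assumptions.
interface FallbackState {
  enteredAt: number; // timestamp when based fallback was entered
  lastProvenBlockAt: number; // timestamp of the last proven block
}

function canExitFallback(s: FallbackState, now: number, tExitActivity: number, tExit: number): boolean {
  // Option 1: no proving activity for a while, allow a quick exit.
  if (now - s.lastProvenBlockAt >= tExitActivity) return true;
  // Option 2: hard time cap, so a trickle of malicious blocks cannot keep us in fallback forever.
  return now - s.enteredAt >= tExit;
}
```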
diff --git a/in-progress/9101-blob-integration/design.md b/in-progress/9101-blob-integration/design.md index 6b6fee3..abb7e58 100644 --- a/in-progress/9101-blob-integration/design.md +++ b/in-progress/9101-blob-integration/design.md @@ -292,10 +292,10 @@ The information set out herein is for discussion purposes only and does not repr - Begin reading TxEffects data from the consensus layer - https://github.com/AztecProtocol/aztec-packages/issues/10056 - Provide a way for nodes to sync the chain without a dependence on calldata - https://github.com/AztecProtocol/aztec-packages/issues/10057 - Request Response for block data - - Sync from `snap` sync service - a publically hosted database (this can be checked against the calldata) + - Sync from `snap` sync service - a publicly hosted database (this can be checked against the calldata) - Remove `data` blob from the rollup - https://github.com/AztecProtocol/aztec-packages/issues/10058 ## QUESTIONS - Does the PXE read from L1, or does it just interface with the node itself?? -- Investigate the use of Kurtosis to set up execution layer + consensus layer nodes \ No newline at end of file +- Investigate the use of Kurtosis to set up execution layer + consensus layer nodes diff --git a/in-progress/proving-queue/0005-proving-queue.md b/in-progress/proving-queue/0005-proving-queue.md index 411e02c..ac76a8a 100644 --- a/in-progress/proving-queue/0005-proving-queue.md +++ b/in-progress/proving-queue/0005-proving-queue.md @@ -17,9 +17,9 @@ We need to prove blocks of transactions in a timely fashion. These blocks will l ## Overall Architecture -The overall architecture of the proving subsystem is given in the following diagram. The orchestrator understands the process of proving a full block and distills the proving process into a stream of individual proof requests. These requests can be thought of as 'jobs' pushed to a queueing abstraction. 
Then, once a proof has been produced a callback is inovked notifying the orchestrator that the proof is available. The dotted line represents this abstraction, an interface behind which we want to encourage the development of alternative methods of distributing these jobs. In this diagram the 'Prover' is an entity responsible for taking the requests and further distributing them to a set of individual proving agents. +The overall architecture of the proving subsystem is given in the following diagram. The orchestrator understands the process of proving a full block and distills the proving process into a stream of individual proof requests. These requests can be thought of as 'jobs' pushed to a queueing abstraction. Then, once a proof has been produced a callback is invoked notifying the orchestrator that the proof is available. The dotted line represents this abstraction, an interface behind which we want to encourage the development of alternative methods of distributing these jobs. In this diagram the 'Prover' is an entity responsible for taking the requests and further distributing them to a set of individual proving agents. -In this architecture it is important to understand the seperation of concerns around state. The orchestrator must maintain a persisted store describing the overall proving state of the block such that if it needs to restart it can continue where it left off and doesn't need to restart the block from scratch. This state however will not include the in-progress state of every outstanding proving job. This is the responsibility of the state managed by the prover. This necessitates that once the `Proof Request` has been accepted by the prover, the information backing that request has been persisted. Likewise, the `Proof Result` acknowldgement must be persisted before completion of the callback. This ensures no proving jobs can be lost in the cracks. 
It is always possible that a crash can occur for example after the `Proof Reault` is accepted and before it has been removed from the prover's store. For this reason, duplicate requests in either direction must be idempotent. +In this architecture it is important to understand the separation of concerns around state. The orchestrator must maintain a persisted store describing the overall proving state of the block such that if it needs to restart it can continue where it left off and doesn't need to restart the block from scratch. This state however will not include the in-progress state of every outstanding proving job. This is the responsibility of the state managed by the prover. This necessitates that once the `Proof Request` has been accepted by the prover, the information backing that request has been persisted. Likewise, the `Proof Result` acknowledgment must be persisted before completion of the callback. This ensures no proving jobs can be lost in the cracks. It is always possible that a crash can occur for example after the `Proof Result` is accepted and before it has been removed from the prover's store. For this reason, duplicate requests in either direction must be idempotent. ![Proving Architecture](./proving-arch.png) @@ -52,7 +52,7 @@ type Metadata = { ``` -Proving request data, such as input witness and recursive proofs are stored in a directory labelled with the job's id residing on an NFS share/S3 bucket or similar. This is an optimsation, as prover agents can access the data independently, without requiring the broker to transfer large amounts of data to them. If it turns out that this is not required then the proof requests will reside on a disk local to the broker and will be transferred over the network from the broker. Maybe this should be a configurable aspect of the system. +Proving request data, such as input witness and recursive proofs are stored in a directory labelled with the job's id residing on an NFS share/S3 bucket or similar. 
This is an optimization, as prover agents can access the data independently, without requiring the broker to transfer large amounts of data to them. If it turns out that this is not required then the proof requests will reside on a disk local to the broker and will be transferred over the network from the broker. Maybe this should be a configurable aspect of the system. ![alt text](./broker.png) @@ -94,4 +94,4 @@ Finally, the broker periodically checks all current jobs to see if their `Last U The described interactions should mean that we maintain a queue of jobs, prioritised in whatever way we need, queryable by however we require whilst only using a simple LMDB store and directory structure. By doing all of this in memory we drastically reduce the amount of DB access required at the expense of potentially some duplicated effort and negotiation upon broker restart (something we hope is a rare occurrence). Even if we consider a worst case scenario of ~200,000 outstanding proof requests, this should not require more than a few tens of MB of memory to cache. One potential concern is performance. There will be a large number of prover agents querying for work and these queries will need to be very efficient, but this will be the case with any system. -The last step is that the broker pushes all completed jobs back to the orchestrator, shortly after they have been completed but asynchronously to the completion message from the agent. The job is removed from both the directory listing and the index DB. \ No newline at end of file +The last step is that the broker pushes all completed jobs back to the orchestrator, shortly after they have been completed but asynchronously to the completion message from the agent. The job is removed from both the directory listing and the index DB.
When the queue is empty, a check is performed that the proof request directory is empty. Any remaining data is deleted.
diff --git a/in-progress/world-state/0004-world-state.md b/in-progress/world-state/0004-world-state.md
index 4d5c5f0..5fe2d72 100644
--- a/in-progress/world-state/0004-world-state.md
+++ b/in-progress/world-state/0004-world-state.md
@@ -33,7 +33,7 @@ Every tree instance is represented by 3 stores of state,
 2. Uncommitted state - an in-memory cache of updated nodes overlaying the committed store, also referenced by node index.
 3. Snapshotted state - the historical values of every populated node, structured to minimize space consumption at the expense of node retrieval speed. Reading this tree requires 'walking' down from the root at a given block number.

-The idea behind this structure is that sequencers are the only actor interested in uncommitted state. It represents the state of their pending block and they update the uncommitted state as part of building that block. Once a block has been published to L1, its state is committed and the uncommmitted state is destroyed. After each block is committed, the historical tree is traversed in a BFS manner checking for differences in each node. If a node is the same as previously, the search does not continue to its children. Modified nodes are updated in the snapshot tree.
+The idea behind this structure is that sequencers are the only actors interested in uncommitted state. It represents the state of their pending block, and they update the uncommitted state as part of building that block. Once a block has been published to L1, its state is committed and the uncommitted state is destroyed. After each block is committed, the historical tree is traversed in a BFS manner, checking for differences in each node. If a node is the same as previously, the search does not continue to its children. Modified nodes are updated in the snapshot tree.
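This post-commit propagation can be sketched minimally as follows (an in-memory map keyed by `level:index` stands in for the real node stores; all names are illustrative, not the actual implementation):

```typescript
// Illustrative sketch: walk the committed tree breadth-first from the root.
// If a node's value already matches the snapshot, its entire subtree is
// unchanged and is pruned from the search; otherwise the snapshot is updated
// and the search continues into the children.
type NodeMap = Map<string, number>;

function updateSnapshot(depth: number, committed: NodeMap, snapshot: NodeMap): number {
  let updated = 0;
  const queue: Array<[number, number]> = [[0, 0]]; // [level, index], root first
  while (queue.length > 0) {
    const [level, index] = queue.shift()!;
    const key = `${level}:${index}`;
    const value = committed.get(key);
    if (value === undefined || snapshot.get(key) === value) {
      continue; // node unchanged (or absent): skip its whole subtree
    }
    snapshot.set(key, value);
    updated++;
    if (level < depth) {
      queue.push([level + 1, 2 * index], [level + 1, 2 * index + 1]);
    }
  }
  return updated;
}
```

Re-running the walk after a commit with no changes touches only the root before terminating, which is what keeps the propagation cheap.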
 Clients only read committed or snapshotted state; they have no need to read uncommitted state.
@@ -57,7 +57,7 @@ Reading the sibling path for leaf 3 at block 2 is performed by traversing the tr

 ![Historic hash path](./historic-hash-path.png)

-This system of content addressing is used for snapshotting both append-only and indexed trees. Block 3 updated leaf 0, which could only happen in an indexed tree and it should be clear that the same method of hash path retrievel works in this case. It enables us to serve merkle membership requests for any block in time whilst only storing the changes that occur with each block.
+This system of content addressing is used for snapshotting both append-only and indexed trees. Block 3 updated leaf 0, which could only happen in an indexed tree, and it should be clear that the same method of hash path retrieval works in this case. It enables us to serve merkle membership requests for any block in time whilst only storing the changes that occur with each block.

 Despite this method of only storing changes with each block, historical trees will still require a significant amount of data, and as it stands there is no ability to prune the history, meaning nodes either store all history or no history.
@@ -82,7 +82,7 @@ The reasoning behind this structure is:

 3. The block updates cache could be either in memory or persisted in a database, based on the configuration of the node and what it is used for.
 4. The 'view' of the world state provided by an image is exactly as it was for that block, meaning historic state requests can be served against blocks on the pending chain.
 5. Block building participants can request multiple images for the purpose of e.g. simulating multiple different block permutations or proving multiple blocks concurrently, if the chosen proving coordination protocol permits.
-6. For images extending beyond the previous epoch, the referencing of the tip of the previous epoch is to ensure that the block updates database for any image does not grow larger than 1 epoch and it is effecitvely 'reset' upon finalization of an epoch.
+6. For images extending beyond the previous epoch, referencing the tip of the previous epoch ensures that the block updates database for any image does not grow larger than 1 epoch and is effectively 'reset' upon finalization of an epoch.
 7. Re-orgs become trivial. The world state simply destroys the current set of images of the pending chain.

 ## The Commit Process
@@ -108,7 +108,7 @@ Indexed trees are more complicated than append-only as they support updates anyw

 ![reference counting](./reference-counting.png)

-We now have sufficient information to prune snapshots beyond a configured historical window. We will demonstrate with our 3 block example by only keeping a 2 block history (priuning block 1) and adding a further block. After the update to the tree for block 3, the tree is traversed from the root of block 1 and the following rules applied to each node:
+We now have sufficient information to prune snapshots beyond a configured historical window. We will demonstrate with our 3 block example by only keeping a 2 block history (pruning block 1) and adding a further block. After the update to the tree for block 3, the tree is traversed from the root of block 1 and the following rules are applied to each node:

 1. Reduce the reference count by 1.
 2. If the count is now 0, remove the node and move onto the node's children.
@@ -129,7 +129,7 @@ We now add block 4, which updates leaf 0 again. Whilst this might not be likely

 ### Snapshotting Append Only Trees

-We have seperated the snapshotting of append only trees into its own section as we propose a completely different approach. We won't snapshot them at all!
By their very nature, simply storing an index of the size of the tree after every additional block, it is possible to reconstruct the state of the tree at any point in its history. We will demonstrate how this works. Consider the following append only tree after 3 blocks of leaves have been added. The index at the bottom shows that the tree was at size 2 after 1 block, size 3 after 2 blocks and size 5 after 3 blocks. We want to query the tree as it was at block 2 so only considering the green leaves, not those coloured yellow.
+We have separated the snapshotting of append only trees into its own section as we propose a completely different approach. We won't snapshot them at all! By their very nature, simply by storing an index of the tree's size after every block, it is possible to reconstruct the state of the tree at any point in its history. We will demonstrate how this works. Consider the following append only tree after 3 blocks of leaves have been added. The index at the bottom shows that the tree was at size 2 after 1 block, size 3 after 2 blocks and size 5 after 3 blocks. We want to query the tree as it was at block 2, so only considering the green leaves, not those coloured yellow.

 ![append only tree](./append-only-tree.png)
@@ -164,7 +164,7 @@ fr getNodeValue(uint32_t level, index_t index, uint32_t blockHeight) {
 }
 ```

-Now that we have a way of computing the value of any node at any previous block height we can serve requests for historic state directly from the `current` state. We are swapping a significant reduction in storage for an increase in compute to regenerate point-in-time state. This trade-off seems benficial however now that the underlying native merkle tree design affords us the ability to perform these operations concurrently across multiple cores.
+Now that we have a way of computing the value of any node at any previous block height, we can serve requests for historic state directly from the `current` state.
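To illustrate the idea (a TypeScript sketch with hypothetical names, in the spirit of `getNodeValue` above; it recomputes from leaves, whereas a real implementation would read stored nodes): a node whose leftmost leaf index is at or beyond the tree's size at the target block was still an empty subtree at that block, so it takes the precomputed empty-subtree value for its level; otherwise we recurse into the children and hash.

```typescript
// Illustrative sketch only: serve a historic read from current state using
// the per-block size index. `emptyAt[level]` holds the value of an all-empty
// subtree rooted at that level.
type Hash = (left: number, right: number) => number;

function nodeValueAt(
  leaves: number[],
  level: number,
  index: number,
  depth: number,
  sizeAtBlock: number, // tree size recorded for the target block
  hash: Hash,
  emptyAt: number[],
): number {
  const firstLeaf = index * (1 << (depth - level)); // leftmost leaf under this node
  if (firstLeaf >= sizeAtBlock) return emptyAt[level]; // subtree was empty at that block
  if (level === depth) return leaves[index];
  return hash(
    nodeValueAt(leaves, level + 1, 2 * index, depth, sizeAtBlock, hash, emptyAt),
    nodeValueAt(leaves, level + 1, 2 * index + 1, depth, sizeAtBlock, hash, emptyAt),
  );
}
```

A sibling path at block N is then just a set of such calls with `sizeAtBlock` taken from the size index entry for block N.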
We are swapping a significant reduction in storage for an increase in the compute needed to regenerate point-in-time state. This trade-off seems beneficial, however, now that the underlying native merkle tree design affords us the ability to perform these operations concurrently across multiple cores.

 ## Change Set
@@ -190,4 +190,4 @@ As the World State is used heavily in all operations, we will gain confidence through the

 1. Unit tests within the C++ section of the repo.
 2. Further sets of unit tests in TS, comparing the output of the native trees to that of the TS trees.
-3. All end to end tests will inherently test the operation of the World State.
\ No newline at end of file
+3. All end to end tests will inherently test the operation of the World State.