
Decentralized Message Queue (DMQ) Implementation Overview

Armando Santos edited this page Apr 7, 2025 · 8 revisions


Overview

This document serves as an extension to CIP-137, providing technical insights and implementation guidance for the proposed Decentralized Message Queue (DMQ), primarily supporting Mithril protocol operations.

Significant progress toward enabling Mithril has already been achieved: an initial implementation step was merged into the ouroboros-network repository, and a document highlighting decisions and goals is available. Current development is actively tracked in this issue.

Document Structure

This document is divided into two parts. The first section outlines the network team's interpretation of CIP-137 requirements, discussing both implementation considerations and expected business logic. This will form the basis for collaboration with the Mithril team. The second section will provide a more detailed implementation plan.

Technical Architecture and Protocol Requirements

The objective is to leverage Cardano's Network Diffusion layer to distribute various types of information—in this case, signatures—by creating multiple overlay networks. Achieving this goal involves:

  1. Ensuring Cardano's Network Diffusion layer is reusable;
  2. Implementing CIP-137.

The first requirement has already been met, with more information available here. The second requirement necessitates implementing three mini-protocols:

  • Message Submission Protocol: Node-to-node communication employing a pull-based model for efficient and secure diffusion.
  • Local Message Submission Protocol: Enables local clients (e.g., Mithril Signers) to submit messages securely.
  • Local Message Notification Protocol: Allows clients to receive notifications of new messages via the network node.

The following diagram captures the flow of signatures through the Mithril network architecture.

High-Level Overview

flowchart RL
    subgraph MPA[Mithril Processes A]
        MSA1[Mithril Signer A1]
        MSA2[Mithril Signer A2]
        MAA1[Mithril Aggregator A1]
        MAA2[Mithril Aggregator A2]
    end
    subgraph MN[Mithril Diffusion Network]
        DMQN1[DMQ Node 1]
        DMQN2[DMQ Node 2]
        DMQN3[DMQ Node 3]
        DMQN1 <-- "N2N Signature Submission Protocol" --> DMQN2
        DMQN1 <-- "N2N Signature Submission Protocol" --> DMQN3
        DMQN2 <-- "N2N Signature Submission Protocol" --> DMQN3
    end
    subgraph MPB[Mithril Processes B]
        MSB1[Mithril Signer B1]
        MSB2[Mithril Signer B2]
        MAB1[Mithril Aggregator B1]
        MAB2[Mithril Aggregator B2]
    end
    MSA1 -- "N2C Local Message Submission Protocol" --> DMQN1
    MSA2 -- "N2C Local Message Submission Protocol" --> DMQN1
    MAA1 <-- "N2C Local Message Notification Protocol" --> DMQN1
    MAA2 <-- "N2C Local Message Notification Protocol" --> DMQN1

    MSB1 -- "N2C Local Message Submission Protocol" --> DMQN3
    MSB2 -- "N2C Local Message Submission Protocol" --> DMQN3
    DMQN3 <-- "N2C Local Message Notification Protocol" --> MAB1
    DMQN3 <-- "N2C Local Message Notification Protocol" --> MAB2

In the diagram above, signatures flow from Signers to Aggregators. Mithril Signers submit messages to their local DMQ nodes via the N2C Local Message Submission Protocol; these messages are then diffused to other nodes through the N2N Message Submission Protocol. Mithril Aggregators, connected via the N2C Local Message Notification Protocol, then receive notifications of new signatures.

Node-to-Client Local Message Submission Protocol

The Local Message Submission mini-protocol allows local clients, such as Mithril Signers, to submit messages directly to network nodes. Due to mutual trust between local processes, risks like malformed messages, excessive message sizes, or invalid contents are minimal.

Connections are short-lived, reducing complexity in resource management and connection handling.

Node-to-Client Local Message Notification Protocol

This protocol enables local clients (e.g., Mithril Aggregators) to receive timely notifications from network nodes regarding newly received messages.

As currently specified, the protocol involves short-lived connections where an Aggregator queries the DMQ node for a single message at a time. This design can be inefficient when multiple messages are available, forcing the Aggregator to continuously poll the DMQ node. To reduce unnecessary polling, the protocol could be enhanced by allowing the DMQ node to indicate in its response whether additional messages are available and, if so, how many. This adjustment would streamline communication, reduce redundant connection attempts, and improve overall efficiency with minimal added protocol complexity.

However, a burst of messages is expected shortly after the opening of each signing round, making roughly 3,000 messages available at a time. Given this, it makes more sense to design the protocol to avoid polling altogether and to make connections long-lived: an Aggregator would simply keep a connection open to the DMQ node and receive messages as soon as they become available. This change also makes the protocol more flexible and general, so it can serve a wider range of applications (e.g., Peras and Leios).
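To illustrate the intended interaction pattern, here is a minimal sketch (in Python, purely illustrative) of an Aggregator-side loop over a long-lived connection. The `request_messages(blocking, acked)` helper is a hypothetical stand-in for the real protocol messages; with blocking requests the client never busy-polls, it simply acknowledges what it received and waits for more:

```python
def notification_loop(conn, handle_message, max_rounds=None):
    """Drive a long-lived notification connection.

    conn is assumed to expose request_messages(blocking, acked)
    returning (messages, has_more). Purely a sketch of the design
    discussed above, not the real protocol API.
    """
    acked = []          # message ids received in the previous reply
    blocking = True     # start by waiting for the first batch
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        msgs, has_more = conn.request_messages(blocking, acked)
        for m in msgs:
            handle_message(m)
        acked = [m["id"] for m in msgs]
        # If the server still has queued messages, drain them with
        # non-blocking requests; otherwise block until new ones arrive.
        blocking = not has_more
        rounds += 1
```

Note how the loop switches between blocking and non-blocking requests based on whether the server reports more pending messages, which is exactly what avoids redundant polling during a post-round burst.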

Node-to-Node Message Submission Protocol

This protocol facilitates message diffusion between DMQ nodes, utilizing a pull-based strategy. The inbound side explicitly requests new messages from peers, thereby efficiently managing resource consumption and safeguarding against potential DoS attacks.

This protocol closely mirrors the Cardano Transaction Submission Protocol, leveraging existing logic from the ouroboros-network library.

Validation of Signatures

Before being diffused to other peers, an incoming message must be verified by the receiving node. The message itself contains almost all the information required for validation (as explained in the CIP); the only external piece of information needed is a snapshot of the stake distribution, which changes infrequently and can therefore be cached and refreshed once every couple of epochs.

It is possible to perform concurrent/parallel verification of multiple messages since there's no dependency between individual messages.
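A sketch of what such parallel validation could look like, with `verify` as a placeholder for the real KES-signature and stake-eligibility checks described in the CIP:

```python
from concurrent.futures import ThreadPoolExecutor

def verify(msg):
    # Placeholder check only: a real node would verify the KES
    # signature, the operational certificate, and stake eligibility.
    return msg.get("kes_signature") is not None

def validate_batch(messages, workers=4):
    """Validate independent messages in parallel; keep the valid ones."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(verify, messages))
    return [m for m, ok in zip(messages, results) if ok]
```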

Invalidation of Signatures

Messages can be garbage collected after their expected TTL has expired. The TTL should at least cover the duration of a signature round (e.g., 10 minutes). However, DMQ nodes are unaware of Mithril-specific details like signing round durations. As a result, they cannot independently determine an appropriate TTL for messages in their mempool — but Signer nodes can.

If Signer nodes include a TTL with each message submitted via the N2C Message Submission protocol, DMQ nodes can use and propagate that value accordingly.

To prevent adversarial nodes from abusing this by assigning excessively long TTLs (potentially leading to DoS attacks), a maximum allowed TTL should be configurable at the protocol level. This lets peers reject connections or messages that exceed the acceptable TTL threshold, protecting against misuse.
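A minimal sketch of the proposed guard, assuming a locally configured maximum (the value below is an example, not taken from the CIP); the rejection reason mirrors the `ttlTooLarge` case that appears later in the CDDL:

```python
MAX_TTL_SECONDS = 1200  # example configuration value, not from the CIP

def check_ttl(ttl, max_ttl=MAX_TTL_SECONDS):
    """Return (accepted, reason) for a submitted message's TTL."""
    if ttl > max_ttl:
        return False, "ttlTooLarge"
    return True, None
```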

Network Handshake

Currently, no specialized network parameters are required beyond the standard NodeToNodeVersionData.

Protocol Limits

Protocol message size limits as defined by CIP-137 are:

| Message part | Lower bound | Upper bound |
| --- | --- | --- |
| messageId | 32 B | 32 B |
| messageBody | 360 B | 2,000 B |
| blockNumber | 4 B | 4 B |
| ttl (min 0, max 65,535 s) | 2 B | 2 B |
| kesSignature | 448 B | 448 B |
| operationalCertificate | 304 B | 304 B |
| **Total** | **1,150 B** | **2,790 B** |

These limits should guide the sizing of protocol tokens.
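As a quick arithmetic check of the quoted totals: the fixed parts plus a 32-byte messageId (an assumption consistent with a hash-based identifier; without it the parts sum to 1,118 B and 2,758 B) add up exactly to the 1,150 B and 2,790 B bounds:

```python
# Size bounds per message part, in bytes; messageId = 32 B is assumed.
lower = {
    "messageId": 32, "messageBody": 360, "blockNumber": 4,
    "ttl": 2, "kesSignature": 448, "operationalCertificate": 304,
}
upper = dict(lower, messageBody=2000)  # only messageBody varies

assert sum(lower.values()) == 1150
assert sum(upper.values()) == 2790
```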

Implementation Details

Node-to-Client Message Submission Protocol

The Node-to-Client Message Submission Protocol implementation adheres to the CIP mini-protocol specification. The protocol uses the following concrete CDDL specification:

localMessageSubmissionMessage
  = msgSubmitMessage
  / msgAcceptMessage
  / msgRejectMessage
  / msgDone

msgSubmitMessage = [0, message]
msgAcceptMessage = [1]
msgRejectMessage = [2, reason]
msgDone          = [3]

reason = invalid
       / alreadyReceived
       / ttlTooLarge
       / other

invalid         = [0, tstr]
alreadyReceived = [1]
ttlTooLarge     = [2]
other           = [3, tstr]

messageId    = bstr
messageBody  = bstr
blockNumber  = word32
ttl          = word16
kesSignature = bstr
operationalCertificate = bstr

message = [
  messageId,
  messageBody,
  blockNumber,
  ttl,
  kesSignature,
  operationalCertificate
]
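To make the wire format concrete, here is a hand-rolled sketch of the CBOR encoding implied by the CDDL above, covering only the major types this message needs (unsigned int, byte string, array). It is illustrative only, not a full CBOR implementation:

```python
def _hdr(major, n):
    """Encode a CBOR header byte (major type + length/value)."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 0x100:
        return bytes([(major << 5) | 24, n])
    if n < 0x10000:
        return bytes([(major << 5) | 25]) + n.to_bytes(2, "big")
    return bytes([(major << 5) | 26]) + n.to_bytes(4, "big")

def enc_uint(n):      return _hdr(0, n)                       # major 0
def enc_bstr(b):      return _hdr(2, len(b)) + b              # major 2
def enc_array(items): return _hdr(4, len(items)) + b"".join(items)  # major 4

def encode_msg_submit(message_id, body, block_number, ttl, kes_sig, op_cert):
    """Encode msgSubmitMessage = [0, message] per the CDDL above."""
    message = enc_array([
        enc_bstr(message_id), enc_bstr(body), enc_uint(block_number),
        enc_uint(ttl), enc_bstr(kes_sig), enc_bstr(op_cert),
    ])
    return enc_array([enc_uint(0), message])
```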

Only the server side of this protocol is relevant for this implementation. Upon establishing a new connection via the local socket, the server processes inbound messages according to this protocol. When receiving a MsgSubmitMessage, the server must validate the message prior to adding it to the internal mempool. If the message is invalid or has already been received, the server must reject it by sending a MsgRejectMessage with the relevant reason and immediately close the connection. If the message is valid, the server adds it to the mempool and responds with MsgAcceptMessage. After the response, the connection is closed when the protocol concludes with a MsgDone.
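The server-side decision just described can be sketched as a single function over an in-memory mempool keyed by message id. Names and the TTL bound are illustrative; the rejection reasons match the CDDL above:

```python
def handle_submit(mempool, msg, is_valid, max_ttl=1200):
    """Decide the reply to a MsgSubmitMessage; mutate mempool on accept."""
    mid = msg["id"]
    if mid in mempool:
        return ("MsgRejectMessage", "alreadyReceived")
    if msg["ttl"] > max_ttl:
        return ("MsgRejectMessage", "ttlTooLarge")
    if not is_valid(msg):
        return ("MsgRejectMessage", "invalid")
    mempool[mid] = msg
    return ("MsgAcceptMessage", None)
```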

Node-to-Client Message Notification Protocol

The Node-to-Client Message Notification Protocol also follows the CIP mini-protocol specification (with a small modification) and uses the following CDDL specification:

localMessageNotificationMessage
  =
  ; corresponds to either MsgRequestMessagesBlocking or
  ; MsgRequestMessagesNonBlocking in the spec
    msgRequestMessages
  / msgReplyMessages
  / msgClientDone
  / msgServerDone

msgRequestMessages = [0, isBlocking, ackedMessages]
msgReplyMessages   = [1, messages]
msgClientDone      = [2]
msgServerDone      = [3]

messageId    = bstr
messageBody  = bstr
blockNumber  = word32
ttl          = word16
kesSignature = bstr
operationalCertificate = bstr

message = [
  messageId,
  messageBody,
  blockNumber,
  kesSignature,
  operationalCertificate
]

isBlocking = false / true
ackedMessages = * messageId
messages = [* message]

Again, this protocol is concerned primarily with the server side. Upon establishing a connection via the local socket, the server awaits a MsgRequestMessages from the client, which can be blocking or non-blocking. For a non-blocking request, the server should reply promptly (i.e., within a small timeout) with a possibly empty reply. This reply contains a flag indicating whether the server still has messages to send. If the flag is False, the client's next request must be blocking; otherwise it must be non-blocking. A blocking request waits until at least one message is available. The client acknowledges the messages it has received whenever it issues a new request, so that the server can maintain its internal FIFO for resource management.
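The server-side bookkeeping can be sketched as a per-client session holding a cursor into the shared mempool and the set of sent-but-unacknowledged message ids (all names here are hypothetical):

```python
class NotificationSession:
    """Tracks one client's position in the mempool and its unacked messages."""

    def __init__(self, mempool):
        self.mempool = mempool   # shared list of (messageId, message) pairs
        self.cursor = 0          # next mempool index for this client
        self.unacked = set()

    def handle_request(self, acked_ids, batch=100):
        """Serve a MsgRequestMessages: prune acks, send next batch."""
        self.unacked.difference_update(acked_ids)   # keep the FIFO bounded
        batch_msgs = self.mempool[self.cursor:self.cursor + batch]
        self.cursor += len(batch_msgs)
        self.unacked.update(mid for mid, _ in batch_msgs)
        has_more = self.cursor < len(self.mempool)  # the reply's flag
        return batch_msgs, has_more
```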

This protocol supports one-to-many client-server connections, where multiple Aggregator nodes concurrently request messages. The server must track, for the lifetime of each connection, which messages have been sent to which clients and which have been acknowledged.

Node-to-Node Signature Submission Protocol

The Node-to-Node Message Submission Protocol adheres to the CIP mini-protocol specification and CDDL specification.

This protocol involves both outbound and inbound sides. Upon establishing a connection, the Handshake protocol runs using NodeToNodeVersionData. Currently, there are no specific protocol parameters to negotiate, so handshake data remains standard and does not directly impact the Signature Submission Protocol.

After the handshake, the Signature Submission Protocol is initialized exclusively for peers identified as hot by the diffusion layer, indicating high activity or network value.

To facilitate the protocol operations, a shared internal state (the mempool) is maintained. This mempool stores all messages and tracks their status to determine readiness for Aggregator consumption.

Messages enter the mempool through two paths:

  • Local Message Submission Protocol (from Signer nodes)
  • Message Submission Protocol (from peer DMQ nodes)

Every message entering the mempool undergoes validation as defined by the CIP (see Message Authentication Mechanism). Each message is independent, allowing parallel validation.

Messages are removed from the mempool upon exceeding their TTL. The TTL for each message will either come from the Signer node for locally received messages or from other DMQ nodes, assuming that they respect the max TTL value allowed.
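TTL-based removal can be sketched as a periodic sweep over the mempool, assuming each entry records its expiry time when it is inserted (field names are illustrative):

```python
import time

def garbage_collect(mempool, now=None):
    """Remove expired entries from mempool (dict: messageId -> entry).

    Each entry is assumed to carry an "expires_at" timestamp computed
    from its TTL at insertion time. Returns the number of removals.
    """
    now = time.monotonic() if now is None else now
    expired = [mid for mid, m in mempool.items() if m["expires_at"] <= now]
    for mid in expired:
        del mempool[mid]
    return len(expired)
```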

The current design allows for the possibility that some Aggregators might not receive all diffused messages. This is considered acceptable, given that Aggregators do not require this invariant to hold in order to function properly.

Work is ongoing to develop a reusable version of the transaction submission protocol (utilized by cardano-node), which is anticipated to simplify implementation efforts for this protocol.

Peer Stake Distribution

Message validation requires access to the most recent Peer Stake Distribution data, obtainable from existing mechanisms such as cardano-cli. A dedicated mechanism is required to periodically fetch and refresh the stake distribution snapshot every few epochs. This can be efficiently managed via a lightweight background thread periodically updating a shared mutable state variable.
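Such a background refresher could be sketched as follows, with `fetch_snapshot` standing in for whatever mechanism (e.g., a cardano-cli query) actually retrieves the stake distribution; the shared state is guarded by a lock:

```python
import threading

class StakeDistributionCache:
    """Periodically refresh a shared stake-distribution snapshot."""

    def __init__(self, fetch_snapshot, refresh_seconds):
        self._fetch = fetch_snapshot
        self._interval = refresh_seconds   # in practice, a few epochs
        self._lock = threading.Lock()
        self._snapshot = fetch_snapshot()  # initial synchronous fetch
        self._stop = threading.Event()

    def current(self):
        with self._lock:
            return self._snapshot

    def _run(self):
        # Event.wait doubles as an interruptible sleep.
        while not self._stop.wait(self._interval):
            snap = self._fetch()
            with self._lock:
                self._snapshot = snap

    def start(self):
        t = threading.Thread(target=self._run, daemon=True)
        t.start()
        return t

    def stop(self):
        self._stop.set()
```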