
[Design] hathor-core: Add --enable-event-queue to facilitate reliable integrations #103

@msbrogli

Summary

Add an event-safe mode to the full node, which brings important features to facilitate reliable integrations.

Motivation

During the development of the wallet service, we spent a long time developing a sync algorithm between the state of the wallet and the state of the full node. The motivation of this feature is to simplify that integration and future ones with the full node.

Without this feature, developers must understand Hathor's architecture to build a reliable integration. With this feature, they only have to react accordingly to a small number of cases.

Guide-level explanation

In this document, integrations and applications have the same meaning and will be used interchangeably. Blocks and transactions are collectively called vertices. Transactions refer to all non-block vertices (e.g., regular transactions and token creation transactions).

The full node can be executed with --enable-event-queue, which means all events will be persisted and can be accessed and modified through an HTTP API. Hence, when an application is stopped or restarted, it can resume processing events from the last processed event.

Applications should use the full node as the single source of truth, which means there is no need to store vertices or their metadata. In fact, applications should always fetch information from the full node, which is fast and reliable.

Applications can also remove old events at their will to save space. For instance, an application can have the policy to persist only the last month of events. In this case, the application must periodically call the API to clean up old events.
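A sketch of such a cleanup policy in Python, assuming a local full node; the "before" filter parameter of POST /events/flush/ is hypothetical, since the actual filter fields are not specified in this design:

import time

import requests

FULLNODE = "http://localhost:8080"  # hypothetical base URL

def cleanup_old_events(max_age_days: int = 30) -> None:
    # Delete all events older than max_age_days from the full node.
    cutoff = int(time.time()) - max_age_days * 24 * 3600
    # "before" is a hypothetical filter parameter; see the APIs section.
    requests.post(f"{FULLNODE}/events/flush/", json={"before": cutoff})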

The full node ensures that no events will be missed even in case of system crash or power failure. The full node also stores the creation date and last modified date of vertices. The full node can optionally keep track of which events were correctly processed by the application.

Applications will commonly have the following initial workflow:

  1. Stop the full node's sync algorithm, so no updates will be accepted. The application can optionally clear the events in this step.
  2. Process all vertices in topological order. The previous step ensures that nothing will be modified while this initial processing is executed. The topological order ensures that all dependencies will be fulfilled.
  3. Start the full node's sync algorithm and process all events in real time.
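
A minimal sketch of this workflow in Python, assuming a local full node; iterate_vertices and process_vertex are placeholders for application-specific logic, and the endpoint paths are the ones listed in the APIs section below:

import requests

FULLNODE = "http://localhost:8080"  # hypothetical base URL

def initial_sync(iterate_vertices, process_vertex) -> None:
    # 1. Stop the sync algorithm so nothing changes during the initial load.
    requests.post(f"{FULLNODE}/sync/pause")
    requests.post(f"{FULLNODE}/events/flush/")  # optional: clear old events

    # 2. Process all vertices in topological order, so every dependency of a
    #    vertex is handled before the vertex itself.
    for vertex in iterate_vertices():
        process_vertex(vertex)

    # 3. Resume the sync algorithm and switch to real-time event processing.
    requests.post(f"{FULLNODE}/sync/resume")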

After the application is fully initialized, it just has to process the new events. For this, the application can receive the ordered events in real time or just poll the events as needed.

Applications must process events in the same order they arrive. So, after an event is successfully handled, applications must mark it as done to get the next event. If the queue of events gets full, the full node stops syncing and signals an error.
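
A sketch of the steady-state loop, using the GET /events/next/ and mark-as-done endpoints described in the APIs section; the assumption that an empty response body means an empty queue is ours, since the response format is not specified here:

import time

import requests

FULLNODE = "http://localhost:8080"  # hypothetical base URL

def event_loop(handle_event) -> None:
    while True:
        resp = requests.get(f"{FULLNODE}/events/next/")
        if not resp.content:  # assumption: empty body means empty queue
            time.sleep(1)
            continue
        event = resp.json()
        # Until mark-as-done is called, /events/next/ keeps returning this
        # same event, so a crash here just replays it after a restart.
        handle_event(event)
        requests.post(f"{FULLNODE}/events/{event['id']}/mark-as-done/")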

The sync between the full node and an application will consist of handling the following events:

  1. Mempool: New executed tx.
  2. Mempool: New voided tx.
  3. Mempool: Tx changed to voided.
  4. Mempool: Tx changed to executed.
  5. Blockchain: New best block found.
  6. Blockchain: New orphan block found.
  7. Reorg: Tx is back to the mempool and is executed.
  8. Reorg: Tx is back to the mempool and is voided.
  9. Reorg: Tx is confirmed by another block.
  10. Reorg: Block became orphan.
  11. Reorg: Orphan block became part of the best blockchain.

If the full node dies while the application keeps running, the full node will remember which events have already been processed and which have not. Applications must also keep track of which events have been processed on their end, so they can properly mark them as done after the full node is back.
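
One way to keep that record on the application side, as a sketch: store the id of the last processed event in the same transaction as the event's effects, so replays after a crash are detected and skipped (db and apply_event are hypothetical application components):

def process_exactly_once(event, db, apply_event) -> None:
    # db is a hypothetical transactional store that holds the application
    # state plus a last_event_id marker, updated atomically together.
    with db.transaction():
        if event["id"] <= db.get_last_event_id():
            return  # already processed before the crash; safe to skip
        apply_event(event)
        db.set_last_event_id(event["id"])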

In case of a non-graceful stop, the full node should start in pause mode and wait for the application to resume the sync.

APIs

GET /events/[:event_id]

Get the details of a specific event.

GET /events/list/

Get the list of events according to the filters.

POST /events/flush/

Flush the event list, deleting all that matches the filters.

GET /events/next/

Return the event at the front of the queue. If this API is called multiple times without marking the event as done, it will return the same event every time.

POST /events/[:event_id]/mark-as-done/

Mark the event as successfully processed, and return the next event in the queue. The caller can optionally delete the event.

GET /sync/status

Get the current status of the full node.

POST /sync/pause

Pause the full node from syncing and receiving new transactions. It might be used during the application's maintenance.

POST /sync/resume

Resume syncing and receiving new transactions.

Events

Events are stored in JSON format with the following required fields:

{
  "id": int (incremental),
  "parent_id": int (parent event),
  "timestamp": int,
  "type": str,
}

Each event type can extend the JSON with extra fields.
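
For illustration, the required fields map to a small Python structure; treating parent_id as nullable for top-level events is an assumption of this sketch:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    id: int                    # incremental
    parent_id: Optional[int]   # assumed null for top-level events
    timestamp: int
    type: str                  # e.g., "NEW_TX_ACCEPTED", "METADATA_UPDATED"

def parse_event(data: dict) -> Event:
    return Event(
        id=data["id"],
        parent_id=data.get("parent_id"),
        timestamp=data["timestamp"],
        type=data["type"],
    )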

Daemon events

NODE_STARTED

This event is generated when the full node is started.

NODE_STOPPED

This event is generated when the full node is gracefully stopped.

Sync events

SYNC_RESUMED

This event is generated when the full node resumes the sync algorithm.

SYNC_PAUSED

This event is generated when the full node pauses the sync algorithm.

Consensus events

NEW_TX_ACCEPTED

This event is generated when a new transaction is accepted and placed into the mempool.

NEW_BEST_BLOCK

This event is generated when a block is appended to the best blockchain. This event generates a sequence of METADATA_UPDATED events (updating the first_block field) in topological order.

NEW_ORPHAN_BLOCK

This event is generated when a new orphan block is received.

TX_DECLINED

This event is generated when the full node declines a transaction. The transaction will not be stored in the database and its data will only be available in this event.

REORG

This event is generated when a reorg occurs, i.e., a new blockchain with higher proof-of-work is found. This event generates a sequence of other events.

METADATA_UPDATED

This event is generated when a vertex's metadata is updated. It might be caused by different sources.

The extra fields are:

{
  "field": str,
  "operation": str (e.g., set, append, remove, clear),
  "value": any,
  "old_value": any | null (only for set operations),
}
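
For instance, a transaction voided during a reorg could produce an event like the one below; all values are made up for illustration, and how the affected vertex is referenced is not specified by the schema above:

{
  "id": 1042,
  "parent_id": 1040,
  "timestamp": 1650000000,
  "type": "METADATA_UPDATED",
  "field": "voided_by",
  "operation": "set",
  "value": ["00000000000000000000000000000000"],
  "old_value": []
}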

Example: Normal flow

The expected normal flow of events would be like this:

  1. NEW_TX_ACCEPTED
  2. NEW_TX_ACCEPTED
  3. NEW_BEST_BLOCK
  4. NEW_ORPHAN_BLOCK
  5. NEW_TX_ACCEPTED
  6. NEW_TX_ACCEPTED
  7. NEW_TX_ACCEPTED
  8. NEW_BEST_BLOCK
  9. NEW_TX_ACCEPTED
  10. NEW_TX_ACCEPTED

Example: Small Reorg

  1. NEW_TX_ACCEPTED
  2. NEW_TX_ACCEPTED
  3. NEW_BEST_BLOCK
  4. NEW_ORPHAN_BLOCK
  5. REORG_BEGIN
  6. METADATA_UPDATED (child of REORG)
  7. METADATA_UPDATED (child of REORG)
  8. METADATA_UPDATED (child of REORG)
  9. METADATA_UPDATED (child of REORG)
  10. METADATA_UPDATED (child of REORG)
  11. METADATA_UPDATED (child of REORG)
  12. METADATA_UPDATED (child of REORG)
  13. METADATA_UPDATED (child of REORG)
  14. METADATA_UPDATED (child of REORG)
  15. METADATA_UPDATED (child of REORG)
  16. REORG_END
  17. NEW_TX_ACCEPTED
  18. NEW_TX_ACCEPTED
  19. NEW_BEST_BLOCK

Example: Wallet Service

Using these events, the wallet service would no longer need to run a complex sync algorithm. It would only have to update its database according to each type of event. As events are generated in topological order, it can handle each event independently and safely assume that all dependencies have already been processed.
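
A sketch of that per-event handling, dispatching on the event types defined above; the db calls are hypothetical, shown only to illustrate the shape of the dispatch:

def handle_event(event: dict, db) -> None:
    etype = event["type"]
    if etype == "NEW_TX_ACCEPTED":
        db.insert_mempool_tx(event)      # hypothetical: store the new tx
    elif etype == "NEW_BEST_BLOCK":
        db.insert_block(event)           # hypothetical: extend the best chain
    elif etype == "METADATA_UPDATED":
        db.apply_metadata_change(event)  # e.g., first_block or voided_by set
    # NEW_ORPHAN_BLOCK, TX_DECLINED, and REORG are handled similarly; events
    # generated by a reorg carry the REORG event's id as their parent_id.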

Reference-level explanation

Persistence of events in a storage

As the number of events can grow fast, the full node must persist these events in storage. Here we must add the details of how it will be implemented.
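
As a sketch only of one possible direction (the storage details are intentionally left open here): an append-only log keyed by the incremental event id supports both sequential reads and cheap deletion of old events.

import json

class EventStorage:
    # Minimal append-only event log over a key-value store. kv is any
    # mapping-like store; in production this would likely be the full node's
    # existing storage backend. This is a sketch, not a design decision.

    def __init__(self, kv: dict):
        self.kv = kv
        self.next_id = 0

    def append(self, event: dict) -> int:
        event["id"] = self.next_id
        self.kv[self.next_id] = json.dumps(event)
        self.next_id += 1
        return event["id"]

    def get(self, event_id: int) -> dict:
        return json.loads(self.kv[event_id])

    def flush_before(self, event_id: int) -> None:
        # Delete all events with id < event_id (old-events cleanup).
        for key in [k for k in self.kv if k < event_id]:
            del self.kv[key]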

Drawbacks

Rationale and alternatives

Instead of implementing the event queue in the full node, we could use one of the full-featured queue systems available (e.g., RabbitMQ or ZeroMQ). The queue system would run as a separate service, and the full node would have to pause syncing whenever new events could not be successfully put into the queue. Even though this alternative seems good, it would require applications to deploy, maintain, and monitor another service. So, it seems better to have a built-in queue system inside hathor-core.

Prior art

Bitcoin has an integration with ZeroMQ to handle similar situations. But, as ZeroMQ is a separate service, some notifications may be lost. This is a complex situation for most use cases to handle, and that's the main reason we are implementing a simple event system inside the full node. For further information, see Block and Transaction Broadcasting with ZeroMQ.

Unresolved questions

Future possibilities

Add events for networking.
