
NATS-based cluster #127

@lalinsky

Description

The key idea: use NATS JetStream as a replicated oplog, plus a KV-style stream for status tracking.

We have one stream for tracking index statuses, similar to NATS KV but implemented using JetStream primitives directly. The stream is called fpindex-status, messages use subjects fpindex.status.{index_name} and contain msgpack-encoded payloads. The stream is configured to keep 3 messages per subject. We use direct get to read the latest value.
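
Something like this, sketched here with the Go JetStream client purely for illustration (the stream name, subjects and per-subject limit come from the description above; everything else, including the Go client itself, is just to make the idea concrete):

package fpindex

import (
  "fmt"

  "github.com/nats-io/nats.go"
)

// ensureStatusStream creates the status stream: one subject per index,
// the last 3 messages kept per subject, and direct get enabled so the
// latest value can be read without creating a consumer.
func ensureStatusStream(js nats.JetStreamContext) error {
  _, err := js.AddStream(&nats.StreamConfig{
    Name:              "fpindex-status",
    Subjects:          []string{"fpindex.status.*"},
    MaxMsgsPerSubject: 3,
    Storage:           nats.FileStorage,
    AllowDirect:       true, // allows direct get of the latest value
  })
  return err
}

// getStatusMsg reads the most recent status message for one index via a
// direct get on the last message of its subject.
func getStatusMsg(js nats.JetStreamContext, index string) (*nats.RawStreamMsg, error) {
  subject := fmt.Sprintf("fpindex.status.%s", index)
  return js.GetLastMsg("fpindex-status", subject, nats.DirectGet())
}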

An index can be in one of these states:

  • creating (stream does not exist yet, reads/writes not allowed)
  • created (stream exists, reads/writes allowed)
  • deleting (stream still exists, reads/writes disallowed, waiting for consumers to stop, index can't be recreated)
  • deleted (stream does not exist, safe to recreate index)

The messages in the status stream are msgpack encoded:


pub const IndexState = enum(u8) {
  creating,
  created,
  deleting,
  deleted,
};

pub const IndexStatus = struct {
  state: IndexState,
};

All status operations use NATS optimistic locking via the expected_last_subject_seq option.
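
Continuing the Go sketch, a compare-and-swap status write could look like this (the msgpack field layout mirrors the Zig structs above but is still an assumption about the wire format):

package fpindex

import (
  "fmt"

  "github.com/nats-io/nats.go"
  "github.com/vmihailenco/msgpack/v5"
)

// Mirror of the Zig IndexState / IndexStatus definitions.
type IndexState uint8

const (
  StateCreating IndexState = iota
  StateCreated
  StateDeleting
  StateDeleted
)

type IndexStatus struct {
  State IndexState `msgpack:"state"`
}

func decodeStatus(data []byte) (IndexStatus, error) {
  var status IndexStatus
  err := msgpack.Unmarshal(data, &status)
  return status, err
}

// setStatus publishes a new status, but only if nothing else was written to
// the subject since lastSeq (the stream sequence of the status message we
// read before deciding to change it; 0 means the subject must be empty).
// It returns the sequence of the new message for use in a follow-up CAS.
func setStatus(js nats.JetStreamContext, index string, status IndexStatus, lastSeq uint64) (uint64, error) {
  data, err := msgpack.Marshal(status)
  if err != nil {
    return 0, err
  }
  subject := fmt.Sprintf("fpindex.status.%s", index)
  ack, err := js.Publish(subject, data, nats.ExpectLastSequencePerSubject(lastSeq))
  if err != nil {
    return 0, err
  }
  return ack.Sequence, nil
}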

For each index we have an updates stream storing the oplog: fpindex-updates-{index} is the stream name and fpindex.updates.{index} is the subject feeding it. The stream is configured to store all messages.
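
A possible shape for creating it, with limits disabled so nothing is ever dropped:

package fpindex

import (
  "fmt"

  "github.com/nats-io/nats.go"
)

// ensureUpdatesStream creates the oplog stream for one index. Limits are
// disabled so the full update history is retained.
func ensureUpdatesStream(js nats.JetStreamContext, index string) error {
  _, err := js.AddStream(&nats.StreamConfig{
    Name:      fmt.Sprintf("fpindex-updates-%s", index),
    Subjects:  []string{fmt.Sprintf("fpindex.updates.%s", index)},
    Retention: nats.LimitsPolicy,
    MaxMsgs:   -1,
    MaxBytes:  -1,
    Storage:   nats.FileStorage,
  })
  return err
}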

Each instance subscribes to the status stream with an ephemeral consumer that always reads from the beginning; this is used for starting and stopping the per-index consumers. We should also keep track of pending states (creating/deleting) and use this information in the cleanup process, see below.
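
For example, as an ephemeral push subscription bound to the status stream:

package fpindex

import (
  "github.com/nats-io/nats.go"
)

// watchStatus starts an ephemeral consumer on the status stream, replaying
// it from the beginning. The handler decides whether to start or stop the
// per-index consumer and remembers pending creating/deleting states for the
// cleanup process.
func watchStatus(js nats.JetStreamContext, handler nats.MsgHandler) (*nats.Subscription, error) {
  return js.Subscribe("fpindex.status.*", handler,
    nats.BindStream("fpindex-status"),
    nats.DeliverAll())
}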

The per-index consumers are durable. When starting the subscription, we should first check if the consumer exists and whether its last delivered message matches what we have in the index (using the last_seq metadata stored in the index). If there is a mismatch, it needs to be reconciled: that means deleting the index, deleting the consumer and starting from scratch.
If everything is OK, we can resume the existing subscription and keep updating the index.
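
The check could look roughly like this, comparing the consumer's last delivered stream sequence with the last_seq the local index has stored (names are hypothetical):

package fpindex

import (
  "errors"

  "github.com/nats-io/nats.go"
)

// needsRebuild reports whether the local index and its durable consumer are
// out of sync and have to be thrown away and rebuilt from the start of the
// updates stream.
func needsRebuild(js nats.JetStreamContext, stream, durable string, localLastSeq uint64) (bool, error) {
  info, err := js.ConsumerInfo(stream, durable)
  if errors.Is(err, nats.ErrConsumerNotFound) {
    // No consumer, but the local index claims to have data: start over.
    return localLastSeq != 0, nil
  }
  if err != nil {
    return false, err
  }
  // Delivered.Stream is the stream sequence of the last delivered message;
  // it must match what the local index believes it has applied.
  return info.Delivered.Stream != localLastSeq, nil
}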

For completeness, the updates stream should also include create_index and delete_index messages. When the consumer receives delete_index, it will delete the local index, unsubscribe and delete the consumer.

There should be a cleanup process watching for indexes in the deleting state: once such a stream no longer has any consumers, it's safe to delete it. We can use the NATS advisory subject $JS.EVENT.ADVISORY.CONSUMER.DELETED.*.* to get notified automatically about consumer deletions.
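
A sketch of the cleanup step, reusing the setStatus helper from above; it would be invoked both from the advisory subscription and when the status watcher sees an index already in the deleting state:

package fpindex

import (
  "fmt"

  "github.com/nats-io/nats.go"
)

// maybeFinishDelete is called for an index in the deleting state, e.g. after
// a consumer-deleted advisory fires. Once the updates stream has no consumers
// left, the stream is deleted and the status can be flipped to deleted.
// statusSeq is the sequence of the "deleting" status message, used for CAS.
func maybeFinishDelete(js nats.JetStreamContext, index string, statusSeq uint64) error {
  stream := fmt.Sprintf("fpindex-updates-%s", index)
  info, err := js.StreamInfo(stream)
  if err != nil {
    return err
  }
  if info.State.Consumers > 0 {
    return nil // still draining, try again on the next advisory
  }
  if err := js.DeleteStream(stream); err != nil {
    return err
  }
  _, err = setStatus(js, index, IndexStatus{State: StateDeleted}, statusSeq)
  return err
}

// subscribeConsumerDeleted wires the advisory subject to the cleanup process.
func subscribeConsumerDeleted(nc *nats.Conn, handler nats.MsgHandler) (*nats.Subscription, error) {
  return nc.Subscribe("$JS.EVENT.ADVISORY.CONSUMER.DELETED.*.*", handler)
}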

Operations

Create Index

  • check status
  • if created, return
  • if deleting, return error
  • if creating, and more than 10 seconds old, we can try to take it over
  • mark the status as creating
  • create stream
  • post create_index (expected_seq=0)
  • mark the status as created
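
Putting the helpers from the sketches above together, the create flow could look roughly like this (error messages, the takeover handling and the create_index payload are placeholders):

package fpindex

import (
  "errors"
  "fmt"
  "time"

  "github.com/nats-io/nats.go"
)

func createIndex(js nats.JetStreamContext, name string) error {
  // 1. Check the current status.
  var lastSeq uint64
  msg, err := getStatusMsg(js, name)
  switch {
  case err == nil:
    status, err := decodeStatus(msg.Data)
    if err != nil {
      return err
    }
    lastSeq = msg.Sequence
    switch status.State {
    case StateCreated:
      return nil // already exists, nothing to do
    case StateDeleting:
      return errors.New("index is being deleted")
    case StateCreating:
      if time.Since(msg.Time) < 10*time.Second {
        return errors.New("index is being created")
      }
      // Stale creating state, take the creation attempt over.
    }
  case errors.Is(err, nats.ErrMsgNotFound):
    // No status yet, lastSeq stays 0.
  default:
    return err
  }

  // 2. Mark the status as creating (CAS against the status we just read).
  seq, err := setStatus(js, name, IndexStatus{State: StateCreating}, lastSeq)
  if err != nil {
    return err
  }

  // 3. Create the per-index updates stream.
  if err := ensureUpdatesStream(js, name); err != nil {
    return err
  }

  // 4. Post create_index as the very first oplog entry (stream must be empty).
  if _, err := js.Publish(fmt.Sprintf("fpindex.updates.%s", name),
    []byte("create_index"), nats.ExpectLastSequence(0)); err != nil {
    return err
  }

  // 5. Mark the status as created.
  _, err = setStatus(js, name, IndexStatus{State: StateCreated}, seq)
  return err
}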

Delete Index

  • check status
  • if deleting, return
  • if creating, return error
  • mark status as deleting
  • trigger the cleanup process
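
The delete flow in the same style, again building on the helpers sketched above:

package fpindex

import (
  "errors"

  "github.com/nats-io/nats.go"
)

func deleteIndex(js nats.JetStreamContext, name string) error {
  msg, err := getStatusMsg(js, name)
  if err != nil {
    return err // includes the case where no status exists yet
  }
  status, err := decodeStatus(msg.Data)
  if err != nil {
    return err
  }
  if status.State == StateDeleting {
    return nil // delete already in progress
  }
  if status.State == StateCreating {
    return errors.New("index is being created")
  }

  // Mark as deleting; consumers shut down and delete themselves, and the
  // cleanup process removes the stream once no consumers are left.
  seq, err := setStatus(js, name, IndexStatus{State: StateDeleting}, msg.Sequence)
  if err != nil {
    return err
  }
  return maybeFinishDelete(js, name, seq)
}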

Update

  • check status
  • if not created, return error
  • post to the stream
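
The write path itself stays thin; the per-index consumers do the actual indexing work:

package fpindex

import (
  "errors"
  "fmt"

  "github.com/nats-io/nats.go"
)

// publishUpdate appends one encoded update to the index oplog.
func publishUpdate(js nats.JetStreamContext, name string, payload []byte) error {
  msg, err := getStatusMsg(js, name)
  if err != nil {
    return err
  }
  status, err := decodeStatus(msg.Data)
  if err != nil {
    return err
  }
  if status.State != StateCreated {
    return errors.New("index is not ready for writes")
  }
  _, err = js.Publish(fmt.Sprintf("fpindex.updates.%s", name), payload)
  return err
}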

Get Index Info

  • check status
  • check status of the updates stream
  • build info based on that
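
Info can be assembled from the status message plus the state of the updates stream; IndexInfo here is just a hypothetical shape:

package fpindex

import (
  "fmt"

  "github.com/nats-io/nats.go"
)

// IndexInfo is a hypothetical shape for the info response.
type IndexInfo struct {
  State   IndexState
  LastSeq uint64 // last sequence in the updates stream
}

func getIndexInfo(js nats.JetStreamContext, name string) (*IndexInfo, error) {
  msg, err := getStatusMsg(js, name)
  if err != nil {
    return nil, err
  }
  status, err := decodeStatus(msg.Data)
  if err != nil {
    return nil, err
  }
  info := &IndexInfo{State: status.State}
  // Only query the updates stream when the index is usable.
  if status.State == StateCreated {
    si, err := js.StreamInfo(fmt.Sprintf("fpindex-updates-%s", name))
    if err != nil {
      return nil, err
    }
    info.LastSeq = si.State.LastSeq
  }
  return info, nil
}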

Check Index Exists

  • check status
  • return based on that
