This repository was archived by the owner on Aug 23, 2023. It is now read-only.

NMT doesn't properly handle out of order data #41

@Dieterbe

due to the way nsqd currently spills traffic over from the in-memory channel to the diskqueue-backed channel (by selecting on them), it can arbitrarily reorder your data. ideally, if the in-memory channel is always empty, this shouldn't happen, but it may still happen due to minor hiccups. we can see if increasing the size of the memory buffer helps, though obviously we would then incur more data loss if nsqd crashes.
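
to make the reordering concrete, here is a minimal sketch of the mechanism. it is not nsqd's actual code: `spillAndDrain`, `mem` and `disk` are hypothetical names, and the disk queue is simulated with a second buffered channel. the only fact it relies on is that a Go `select` with several ready cases picks one pseudo-randomly.

```go
package main

import "fmt"

// spillAndDrain publishes the integers 0..n-1 in order and returns them in the
// order a single consumer receives them. All names here are hypothetical.
func spillAndDrain(n, memSize int) []int {
	mem := make(chan int, memSize) // stand-in for the in-memory channel
	disk := make(chan int, n)      // stand-in for the diskqueue-backed channel

	// Producer: try the in-memory channel first, spill to "disk" when it is full.
	for i := 0; i < n; i++ {
		select {
		case mem <- i:
		default:
			disk <- i
		}
	}

	// Consumer: select on both channels. When both have data ready, Go picks
	// one case pseudo-randomly, so newer spilled messages can overtake older
	// in-memory ones.
	out := make([]int, 0, n)
	for len(out) < n {
		select {
		case m := <-mem:
			out = append(out, m)
		case m := <-disk:
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Each channel keeps its own order, but the interleaving between them varies.
	fmt.Println(spillAndDrain(8, 4))
}
```

nothing is lost and each channel is still FIFO on its own, but the merged stream is only in order when the overflow channel never holds data at the same time as the in-memory one.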

@woodsaj confirmed this by feeding data into nsqd in order and consuming it with an NMT consumer with 1 concurrent handler: the data still arrived out of order.

I've had conversations with Matt Reiferson (of nsq) about how feasible it would be to add simple ordering guarantees to nsqd, even if merely per-topic per nsqd instance. but even that seems quite complex/tricky: it would require a different model for requeues, defers, msg timeouts etc. and would be drastically different nsqd behavior, even with nsqio/nsq#625

his recommendation was to use an ephemeral channel to always read the latest data and serve it to users from RAM, just dropping what we can't handle, and additionally use a diskqueue-backed channel which you read from and store into something like HDFS, so that you can then use hadoop to properly compute the chunks to store in archival storage (i.e. go-tsz chunks in cassandra), even on out-of-order data.
this seems like far more complexity than we want, though i do like the idea of separating in-mem data and archival storage, since that seems to let us simplify things. but using hadoop to work around poor ordering after the fact...

what we can also do:

  • current approach, but keep a window of messages which we sort, let's say 10 seconds long. after 10s we can assume we have a good order, decode the messages and commit their metrics to go-tsz chunks, and there isn't much risk of data arriving >10s late. but of course it will then also take 10s for data to start showing up when NMT responds to queries. hmm, well, i guess the query handler could also look through the messages in the window and pull data from there.
  • related idea: don't explicitly keep a window of messages to sort, but keep simple, non-go-tsz-optimized data structures of points (like simple arrays of uint32-float64 pairs) so that the metrics of all new messages can be added immediately and are available for querying. only when the data is getting old enough to move to cassandra do we generate the chunks, at which point the data should be very stable.
    however, this means that for update operations we might commit the wrong values if the 2 writes for the same slot happen in the wrong order (though we're not currently doing any updates), and it would also be less RAM efficient to keep the data in such arrays.
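
a rough sketch of the first option (the sort window), to show how little machinery it needs. everything here is an assumption for illustration: the `Msg` type, its `Ts` field (unix seconds) and the `Window` API are hypothetical and not taken from NMT, and it assumes each message carries a single timestamp to sort on.

```go
package main

import (
	"fmt"
	"sort"
)

// Msg is a hypothetical decoded message with one timestamp in unix seconds.
type Msg struct {
	Ts      int64
	Payload string
}

// Window buffers messages until they are at least span seconds old.
type Window struct {
	span int64
	buf  []Msg
}

func NewWindow(span int64) *Window { return &Window{span: span} }

// Add buffers a message in arrival order. A query handler could also scan
// the buffer to serve data that has not been committed yet.
func (w *Window) Add(m Msg) { w.buf = append(w.buf, m) }

// Flush removes and returns, in timestamp order, every buffered message whose
// timestamp is at most now-span. Newer messages stay in the window.
func (w *Window) Flush(now int64) []Msg {
	sort.SliceStable(w.buf, func(i, j int) bool { return w.buf[i].Ts < w.buf[j].Ts })
	cutoff := now - w.span
	// Index of the first message that is still too new to commit.
	n := sort.Search(len(w.buf), func(i int) bool { return w.buf[i].Ts > cutoff })
	ready := append([]Msg(nil), w.buf[:n]...)
	w.buf = append(w.buf[:0], w.buf[n:]...)
	return ready
}

func main() {
	w := NewWindow(10)
	for _, m := range []Msg{{103, "a"}, {101, "b"}, {102, "c"}, {111, "d"}} {
		w.Add(m)
	}
	// At now=112 only messages with Ts <= 102 are old enough to commit.
	fmt.Println(w.Flush(112))
}
```

anything that shows up later than the window span would still be out of order by the time it is committed, so the span is a trade-off between query latency and tolerance for late data.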

note that in both approaches above we assume ordering of messages is all we need.
in reality, messages from the collectors can contain points for different timestamps (and this is hard to address in the collectors, per AJ), so in NMT we would have to order the actual points, not just the messages.
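
ordering at the point level could look roughly like the sketch below, which combines it with the plain uint32-float64 arrays idea. `Point`, `Series` and `Seal` are hypothetical names, not NMT code. it assumes one buffer per metric and resolves duplicate timestamps with "last arrival wins", which is exactly the behavior that can commit the wrong value if two writes for the same slot arrive in the wrong order.

```go
package main

import (
	"fmt"
	"sort"
)

// Point is a plain timestamp/value pair, as opposed to a go-tsz encoded chunk.
type Point struct {
	Ts  uint32
	Val float64
}

// Series is a hypothetical per-metric buffer of points in arrival order.
type Series struct {
	pts []Point
}

// Add appends the points of one message; they may carry different timestamps.
func (s *Series) Add(pts ...Point) { s.pts = append(s.pts, pts...) }

// Seal removes and returns all points with Ts <= cutoff, sorted by timestamp.
// For duplicate timestamps the point that arrived last is kept (assumption).
func (s *Series) Seal(cutoff uint32) []Point {
	// Stable sort keeps arrival order among points with equal timestamps.
	sort.SliceStable(s.pts, func(i, j int) bool { return s.pts[i].Ts < s.pts[j].Ts })
	n := sort.Search(len(s.pts), func(i int) bool { return s.pts[i].Ts > cutoff })
	var out []Point
	for _, p := range s.pts[:n] {
		if len(out) > 0 && out[len(out)-1].Ts == p.Ts {
			out[len(out)-1] = p // later arrival overwrites the earlier one
			continue
		}
		out = append(out, p)
	}
	s.pts = append(s.pts[:0], s.pts[n:]...)
	return out
}

func main() {
	var s Series
	s.Add(Point{30, 3}, Point{10, 1})               // first message
	s.Add(Point{20, 2}, Point{10, 9}, Point{40, 4}) // second message
	// Points up to ts=30 are considered stable and ready to be chunked.
	fmt.Println(s.Seal(30))
}
```

the sealed slice is what would be fed to the chunk encoder, while the points newer than the cutoff stay queryable in the buffer.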
