Skip to content

Conversation

@cheme
Copy link

@cheme cheme commented Jun 30, 2023

Very nice documentation (I wish I had it when reviewing the prs :)

I open this pr with my reading notes, since the doc is a bit long, something may be a bit off (in later part), but may indicate a reference may be good.
Also it is from my personal scope, so I focus on some points that I find relevant but may not be.

Only thing that may really need a double check: my reading of the clamping function in convention-and-definition (and subsequently in the actual code). I may just have misunderstood the spec though (or was especially tired yesterday evening).


// TODO this implies that we need to manually register nodes: should be good to define how it is
// done (offchain worker putting a extrinsic): this needs to be known by implementors (even if offchain
should be called by those correctly there may be some quirks).
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What bit of this do you think implies we need to manually register nodes?

As you probably saw on the other PR, Basti has asked me to change how the registration stuff is done... I'll push that update and a spec update soon. As the registrations will be more actively driven by the node, I'll need to say a bit more in this spec...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes +1 for registration mechanism in the spec (if it is not already, I am not the best reader, just mentioning what was lacking for my comprehension, but sometime I just read badly).

)

It targets a normal distribution using [ziggurat](https://www.doornik.com/research/ziggurat.pdf) method.
Note that implementation can diverge here. (TODO meaning using a different random function would be hard to detect and we don't attempt to).
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementations are not allowed to diverge here. These random numbers must match between implementations or the delays computed by message senders will not match the real delays. This is why I've given a number of examples. Of course a malicious implementation can do whatever it wants, but that goes beyond just using a different random number generator.

I don't really want to go into detail on how the random number generation works here; I figured just saying it must match specific versions of the rand crates would be good enough? The rand library authors promise to provide deterministic portable random number generation where possible; see here. If they change the algorithm in a new version and we are forced to use the new version, we may need to copy the old algorithm into the mixnet crate.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementations are not allowed to diverge here. These random numbers must match between implementations or the delays computed by message senders will not match the real delays.

If nothing prevent this, does it need to be part of the spec? maybe something like "is expected to".

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing forces a node to do anything that's written in this spec really. I don't think this bit is special in any way?

all non-mixnodes in a session; see the [parameters chapter](./parameters.md). Note that the rates
should be halved in some phases; see the [sessions chapter](./sessions.md#phases).

TODO maybe be explicit if `Node` is `mixnet-node` or if it includes `non-mixnet-nodes`.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try rewording this. I thought it was clear enough that it meant both given that it mentions both mixnodes and non-mixnodes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what to do if I am no t a mixnet node, problably I should generate cover traffic, but if all non mixnet node generate cover traffic, then I think we got a problem (I am thinking of non mixnet node as an unbound resource but if we consider them as bounded by the number of accepted node by validator it can be ok, but then we got a problem of node blocking access to others).
So I was more thinking that the sentence did apply only for the mixnet node.
My understanding was that non mixnet node are allowed to do cover and should probably do if they are going to send a real message, but no rules could really apply for them.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All nodes that are connected to the mixnet are expected to generate cover traffic. Ideally we want some nodes that aren't intending to send messages to connect to the mixnet anyway, but we almost certainly don't want all nodes to connect to the mixnet as this would overload it. I don't really have a solution to this at the moment. Perhaps nodes not intending to send messages could connect with some probability determined by the runtime. This would probably be quite slow to react to changes in the total number of nodes, but if that number is fairly stable it could work.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the initial version, simply enabling the mixnet by default for nodes run with --validator could work.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the cover traffic rate for non-mixnodes is currently 10x lower than for mixnodes. N mixnodes should comfortably support 10*N non-mixnodes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, makes sense, seems important to have those numbers on doc (even if nobody is forced to emit cover messages it can give an idea of the possible behavior).

| 20..21 | `data_size` | Little-endian number of bytes of message data in this fragment |
| 22 | `num_surbs` | Number of SURBs in this fragment |
| 23..2047 | `payload` | Message data and SURBs |
// TODO explicitely state if surbs are before or after message.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The paragraph below is explicit about this I think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I did understand after reading the whole spec, (not the paragraph below but the later parts), having it stated before would have facilitate my reading, but really not needed indeed.

// TODO should common parameter like this be part of the runtime api?
- Multiple fragments with the same `message_id` and `fragment_index` may be received. It is
recommended that only one be kept.
// TODO maybe also acceptable to drop all if they differs in content.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yes, though I'm not sure if it's worth adding the code to detect this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having it in spec is free (just mentioning it is ok, even if not implemented).

- Multiple fragments with the same `message_id` and `fragment_index` may be received. It is
recommended that only one be kept.
// TODO maybe also acceptable to drop all if they differs in content.
// Explicitelly this means that a packet can be send twice? Or we store only per message_id (in this
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, packets may be sent multiple times (although as noted never with the same SURBs). See the stuff about retransmission in requests-and-replies.md. The message ID is reused when retransmitting to increase the chance that we get a complete message (if we were only missing a single fragment the first time, we only care about this fragment making it the second time).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It think it should be stated somewhere explicitly (I mean it is in the spec, but could be good to get the info without reading the whole spec, like at the start of some part).
Sending packet multiple time is a choice of the user, but probably in some circumstances he do not want to do so (this leaks info).

to the [mixnet topology](./topology.md). If no external addresses have been published for a
mixnode, or none of them work, nodes should attempt to discover addresses using the libp2p DHT.

// Here I am not sure, do we have the Dht? or if we are talking about discovery dht, we should name
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just talking about the libp2p peer ID --> address DHT. At least I thought that was a thing, and that Substrate would use it if you just asked to connect to a peer ID without providing any addresses? Maybe I'm wrong about this though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just talking about the libp2p peer ID --> address DHT

that is the one I am calling the discovery dht, I think it is a substrate specific one.

## MAC verification

`mac` should equal the BLAKE2b hash of `actions` computed with the key `mac_key`.
// TODO is it the first 16 byte?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean sorry? mac and actions are as defined in the table at the top, mac_key is derived as in the "Secret derivation" section.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mac is only 16 byte long, but blake2b output 32bytes.

Copy link
Owner

@zdave-parity zdave-parity Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BLAKE2b actually outputs 64 bytes. The MAC key (16 bytes), actions encryption key (32 bytes), and delay seed (16 bytes) are generated by a single hash:

mac_key . actions_encryption_key . delay_seed =
    blake2b("sphinx-small-d-s", 0, kx_shared_secret)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would make sense to indicate it is only the first 16 bytes just to lift any misunderstanding.


| Action | Description | Additional data |
|----------|--------------------------------------------|------------------------------|
// TODO a bit unclear why we would have a Mac when there is on define outside in packet structure.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah. This is the MAC for the next hop, so maybe I could say "MAC for next hop" or something.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would find "next hop mac" good.


The artificial delay should be calculated as `exp_random(delay_seed) * mean_delay`, where
`mean_delay` is the mean forwarding delay (see the [parameters chapter](./parameters.md)).
// TODO state if something prevent not following this rule.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean -- it should always be possible to do this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually "should" means that nothing enforce this.
maybe we can use uper case SHOULD that indicate expected behavior that implementors should follow but that nothing enforce.
oposed to a MUST that should make the node appears like a bad actor.

Copy link
Owner

@zdave-parity zdave-parity Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used "may" for optional behaviour. I'm not particularly convinced that distinguishing between must and should would be that useful. At some point there will hopefully be an incentivisation scheme, at which point such a distinction might make sense.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting all MAY SHOULD MUST uppercase could make change of behavior (eg when incentivisation required or defined) easier and it is pretty common in specs I think.

- From the node to itself. For loop cover traffic.

// TODO here make it explicit if "non-mixnode" are "node".
// if so also it implies there is no mixnode to mixnode or cover between mixnode (this should maybe
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this is implied? Mixnodes should generate cover traffic in pretty much the same way as non-mixnodes, they just don't need to worry about the gateway stuff.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as in a previous reply, I really think that non mixnode cannot generate cover traffic as mixnode (exept if their number is bounded, but then we need to define how their priority queue works).

a non-mixnode is generating a loop route, if there is only a single connected gateway mixnode,
the gateway must necessarily appear twice.
// TODO in this case I would find it better to return an error on route generation and put back
// the mixnet in a synching state.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think reusing a gateway node was a particularly big deal. Allowing it means that non-mixnodes only need to successfully connect to a single mixnode for things to work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if there is not enough mixnode to do a N hop route, then the N hop security asumption does not hold, it become N-(number of more than one peer access) security, but when sending the query we only did accept sending for a N security, it would need a message like "not enough node in network, do you still want to send your message Y/N"?

I mean my issue here is "unless this is unavoidable" part of the sentence.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime enforces a minimum bound on the number of mixnodes; if there are too few registered, the get-mixnodes function will simply return an error. There are only really two cases where you should end up with a route containing the same mixnode twice:

  • If you are a mixnode and you're generating a loop route (obviously you will be in the route twice).
  • If you are not a mixnode, you're generating a loop route, and there is only one connected gateway mixnode (this is the example given, the gateway mixnode needs to appear twice).

I figured the simplest way to word this was "unless this is unavoidable". But maybe this is misleading.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally misunderstood the sentence here, I like the detail explanation of case better.
"except for cover loop where first mixnet node is the same as last mixnet node"
(when reading unavoidable I did think of a contextual temporary reason)

If you are not a mixnode, you're generating a loop route, and there is only one connected gateway mixnode (this is the example given, the gateway mixnode needs to appear twice).

When reading I was thinking of route as either the query route or the surb route, so this case is not that relevant to me.
Actually do we consider that surb should not use mixnode from the query route (I think not)?

// the mixnet in a synching state.
// and state that this rule cannot be check by relaying node (only when creating route).
- No node should ever appear twice consecutively in a route.
// TODO maybe state that this rule can be enforced by relaying node by simply dropping the packet
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These rules are really just to avoid poor anonymity of particular packets due to poor route choice. I don't think there's any need for mixnodes to attempt to enforce them; if a node wants to compromise the anonymity of the packets it sends then there are many ways for it to do so.

FWIW this rule is actually almost redundant given the "no mixnode should appear in a route more than once" rule.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this rule is actually almost redundant given the "no mixnode should appear in a route more than once" rule.

it is (but the "unless it is unavoidable" makes things awkward i think)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants