
Conversation

@DavidM-D
Contributor

@DavidM-D DavidM-D commented Dec 6, 2023

@DavidM-D DavidM-D requested a review from a team as a code owner December 6, 2023 08:02
@render

render bot commented Dec 6, 2023

@DavidM-D DavidM-D changed the title Stateless Transactions NEP-522: Stateless Transactions Dec 6, 2023
@DavidM-D DavidM-D added A-NEP A NEAR Enhancement Proposal (NEP). WG-tools Tools Working Group should be accountable. WG-protocol Protocol Standards Work Group should be accountable S-draft/needs-implementation A NEP in the DRAFT stage that needs implementation. S-review/needs-wg-to-assign-sme A NEP that needs working group to assign two SMEs. labels Dec 7, 2023
@walnut-the-cat
Contributor

@DavidM-D, is this NEP ready to be reviewed?

Contributor

@ewiner ewiner left a comment

All of these problems are solvable with sufficiently smart clients, enough (100s-1,000s) of keys per account and rock solid infrastructure, but we haven't managed that so far and it's probably a lot easier to just simplify how we call the network.

But it's also a thoroughly solved problem in the blockchain world at this point. That doesn't mean we shouldn't make it easier for users, but it raises the bar to do so.

However, IMO this proposal is more compelling for DelegateAction than for Transaction. Nonce management is all about having a single party controlling each nonce, but once you're using a metatransaction, there's already two parties juggling a single nonce (the original user and the relayer).


Far more perniciously, since no client I'm aware of verifies the RPC response using a light client before retrying, RPC nodes can prompt the client to provide them with transactions with different nonces. This allows rogue RPC nodes to perform replay attacks.

We have found that users are often managing a single key from multiple clients, or running multiple servers that control the same key. As such, clients fetch a nonce from an indexer every time they need to make a transaction. Indexers tend to fall behind or fail under load, and this causes our wallets, relayers and other services to fail.
Contributor

They fetch a nonce from an indexer? Why is that? Fetching nonces from RPC (view_access_key) should be less likely to fall behind. That wouldn't prevent the nonce collision issues you're talking about, but should help for when a key is rarely used but the network is under high load.

Contributor Author

I'm not sure. @MaximusHaximus, why don't we fetch directly from an RPC node, or do we?

Contributor

@DavidM-D Actually, we do use RPC calls to fetch access key details (including the nonce) - here's where it happens inside of near-api-js just FYI: https://github.com/near/near-api-js/blob/master/packages/accounts/src/account.ts#L590-L598

We could have the expiry time measured in seconds since Unix epoch and have it be a u32. This saves some space, but it's [inconsistent with our contract standards](https://github.com/near/NEPs/blob/random-nonces/neps/nep-0393.md) and [2038](https://en.wikipedia.org/wiki/Year_2038_problem) isn't that far away in the great span of things.

Why not do this without the expiry time?
We could, but it would mean that there'd still be a dependency on an indexer. The current block height is somewhat predictable, but it's not like you can set your watch to it, especially under load.
Contributor

Similar to my last question, why use an indexer for this instead of a status RPC call?


We also add an optional `expires_at` field to the `Transaction` and `DelegateAction` messages. The protocol guarantees that this message will not be run in any block with a block timestamp later than the one contained in `expires_at`.

`expires_at` and `random_nonce` exist as alternatives to `max_block_height` and `nonce` respectively. Providing one causes its counterpart to be ignored and the mechanisms connected to that counterpart to go unused. We put limits on the validity period of messages using `random_nonce` to avoid a large cost of checking their uniqueness.
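A hypothetical sketch of the field layout and "alternatives" rule described above (names and types are assumed from the NEP text, not taken from nearcore):

```rust
/// Hypothetical sketch of the proposed message fields; names and types
/// are assumptions based on the NEP text, not real nearcore definitions.
#[derive(Debug)]
pub struct TransactionSketch {
    /// Classic ordered nonce; ignored when `random_nonce` is set.
    pub nonce: u64,
    /// Unordered alternative to `nonce`; need not be random or unique.
    pub random_nonce: Option<u64>,
    /// Existing height-based expiry; ignored when `expires_at` is set.
    pub max_block_height: u64,
    /// Timestamp alternative (nanoseconds since the Unix epoch).
    pub expires_at: Option<u64>,
}

#[derive(Debug, PartialEq)]
pub enum Expiry {
    AtTimestamp(u64),
    AtHeight(u64),
}

/// Which expiry mechanism applies: `expires_at`, when present, takes
/// precedence over `max_block_height`, mirroring the "alternatives" rule.
pub fn effective_expiry(tx: &TransactionSketch) -> Expiry {
    match tx.expires_at {
        Some(ts) => Expiry::AtTimestamp(ts),
        None => Expiry::AtHeight(tx.max_block_height),
    }
}
```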
Contributor

Does max_block_height already exist on Transaction? It's not in the docs.

Contributor Author

It appears to, and it's mandatory.

Contributor

I mean for Transaction, not DelegateAction. For Transaction, it appears that expires_at would be the only expiration option, not an alternative to max_block_height.

Contributor

I'm a bit late here but in case this is still relevant...

Today, transactions don't have an explicit max_block_height, as @ewiner correctly observed. Instead, they expire at the height of the given block_hash + transaction_validity_period. In mainnet, this is currently at 86400 blocks, which is 1 day if block time is 1s.

In theory, this means by picking a block hash at a specific height, one can define an expiration block height. But at the same time, the block hash is used to pick the canonical hash in case of forks. So I would argue, you really want to use the most recent block_hash you know, to ensure your changes build on top of the blockchain state that is known to you. But then you always get the fixed validity of 86400 blocks.

So yes, in my understanding, the proposal is the first proper way to limit a transaction validity period by the user.

Here is the check in the code:

https://github.com/near/nearcore/blob/7bad46aa5a9fe12b6d2b6e28800c7623365dd92f/chain/chain/src/store.rs#L707-L711
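The implicit-expiry rule described above can be paraphrased as a sketch (illustrative only; the real nearcore check also walks the chain to confirm the base block is an ancestor of the tip):

```rust
/// Sketch of the implicit transaction-expiry rule: a transaction anchored
/// at `base_block_height` (the height of its `block_hash`) is valid only
/// while the chain tip is within `transaction_validity_period` blocks of
/// that anchor. The constant is the mainnet value quoted in the comment
/// above (~1 day at 1s block time).
const TRANSACTION_VALIDITY_PERIOD: u64 = 86_400;

fn is_transaction_valid(base_block_height: u64, tip_height: u64) -> bool {
    tip_height.saturating_sub(base_block_height) <= TRANSACTION_VALIDITY_PERIOD
}
```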


There is no requirement that the `random_nonce` field is unique or random. A client could (inadvisably) decide to always provide a `random_nonce` with a value of 0 and it would work as expected until they tried to send two completely identical transactions.

When `random_nonce` is present the protocol **may** reject transactions with an `expires_at` more than 120 seconds after the most recent block timestamp or a `max_block_height` more than 100 blocks after the most recent block height, in which case the error `ExpiryTooLate` is thrown.
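The quoted bound could be sketched as follows (constants from the text above; the function and type names are assumptions, not the real nearcore API, and the caller is assumed to gate this check on `random_nonce` being present):

```rust
/// Sketch of the proposed `ExpiryTooLate` bound for unordered transactions.
/// Constants come from the NEP text; names are hypothetical.
const MAX_EXPIRES_AT_AHEAD_NS: u64 = 120 * 1_000_000_000; // 120 seconds
const MAX_BLOCK_HEIGHT_AHEAD: u64 = 100;

#[derive(Debug, PartialEq)]
enum TxCheck {
    Ok,
    ExpiryTooLate,
}

fn check_unordered_expiry(
    expires_at: Option<u64>,      // ns since the Unix epoch
    max_block_height: Option<u64>,
    tip_timestamp_ns: u64,
    tip_height: u64,
) -> TxCheck {
    // Only meaningful when `random_nonce` is present; the caller gates on that.
    let ts_ok = expires_at.map_or(true, |t| t <= tip_timestamp_ns + MAX_EXPIRES_AT_AHEAD_NS);
    let height_ok = max_block_height.map_or(true, |h| h <= tip_height + MAX_BLOCK_HEIGHT_AHEAD);
    if ts_ok && height_ok { TxCheck::Ok } else { TxCheck::ExpiryTooLate }
}
```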
Contributor

For non-delegate transactions, I believe this also implies that if you specify random_nonce, you must also specify either expires_at or max_block_height.

Contributor Author

That is correct; I will make that clearer.

Contributor

I think the "may reject" is too weak, given the cost to all nodes if a long-lived transaction is accepted. Overly long-lived transactions should be rejected as a chunk validity condition.


### Solution

Our new solution is simple from the client side: generate a random nonce, set an `expires_at` 100 seconds in the future and send the transaction to the network. If it fails spuriously, retry with the same nonce; if it fails consistently, prompt the user to decide whether to resend the transaction.
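The client flow above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `send_once` stands in for a real RPC submission, and the retry policy is reduced to a fixed attempt count, with the "consistent failure" branch surfaced as an error for the caller to show the user:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

fn now_ns() -> u64 {
    SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_nanos() as u64
}

/// Retry the same payload (crucially, the SAME nonce) on failures;
/// after `attempts` consistent failures, hand the decision to the user
/// instead of silently re-signing with a fresh nonce.
fn send_with_retries<F>(mut send_once: F, attempts: u32) -> Result<(), &'static str>
where
    F: FnMut(u64, u64) -> bool, // (random_nonce, expires_at) -> success?
{
    // Any value works as a random_nonce; it need not be cryptographically secure.
    let random_nonce = now_ns() ^ 0x9e37_79b9_7f4a_7c15;
    let expires_at = now_ns() + 100 * 1_000_000_000; // 100 seconds ahead
    for _ in 0..attempts {
        if send_once(random_nonce, expires_at) {
            return Ok(());
        }
    }
    Err("consistent failure: prompt the user before resending")
}
```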
Contributor

Many systems that use a random nonce recommend using a timestamp, or a timestamp concatenated with a random number, like the first 64 bits of a UUIDv7. Would you recommend or discourage that approach, and why? I think that's worth calling out here either way.

Contributor Author

@DavidM-D DavidM-D Jan 1, 2024

I'd recommend a random number and it doesn't need to be cryptographically secure.

random_nonce is only used to disambiguate identical short-lived transactions with identical expires_at/max_block_height. The transactions requiring disambiguation will therefore be created at roughly the same time, making the timestamp a poor additional source of entropy beyond the expiry.

We don't require a cryptographically secure number because we don't require unpredictability; the client could reasonably increment a single value per transaction and it would remain secure.
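Following the reasoning above, a per-process counter is sufficient, a minimal sketch:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// `random_nonce` only needs to differ between otherwise-identical
/// in-flight transactions, so an incrementing counter is enough;
/// no CSPRNG (or any RNG) is required.
static NEXT_NONCE: AtomicU64 = AtomicU64::new(1);

fn next_random_nonce() -> u64 {
    NEXT_NONCE.fetch_add(1, Ordering::Relaxed)
}
```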


If a client attempts to send an identical transaction with an identical `random_nonce`, we preserve the existing behavior for responding to an already-sent transaction: provided it still exists, they will receive the previous transaction's receipt and no action will be taken on chain.

There is no requirement that the `random_nonce` field is unique or random. A client could (inadvisably) decide to always provide a `random_nonce` with a value of 0 and it would work as expected until they tried to send two completely identical transactions.
Contributor

We'll need to make sure this doesn't conflict with plans around transaction priority / gas wars. On many other chains at least, there's an explicit feature to allow the sender to overwrite a pending transaction at a given key+nonce with a new one, either to attempt to proactively 'cancel' a transaction that isn't likely to be included in a block or to increase its priority so it'll get included faster.

That first use case isn't as important here, because a pending transaction with a given random_nonce isn't preventing other transactions with different random_nonces like it would for the incrementing nonces. But transaction priority features might require that we change this part of the spec.

Contributor Author

This doesn't prevent any of these mechanisms when not using a random_nonce.

Also, if we want people to be able to cancel/re-submit outstanding transactions, wouldn't it be better to do it for individual transactions rather than for all outstanding transactions via nonce contention? It would lead to a nicer API and fewer re-submissions during peak load.


Great care must be taken to ensure that the `CryptoHash` of any message that has been executed exists anywhere that message may still be valid and could be executed. If this is not the case then it's possible to launch replay attacks on transactions.

Since these transactions store the `CryptoHash` in the working trie for the duration of their validity, the validity period of these transactions must always be small enough to prevent slow lookups or excessive space use.
Contributor

Seems like this trie is unbounded. Even though no single entry will last in the trie for more than 2 minutes or so, I'm worried about the attack surface here.

An obvious possible solution is to limit based on the number of working trie entries for a given key.

I don't know much about the current mechanics of NEAR under high load – for normal incrementing nonces, is there a point where the network will reject transactions because there's too many pending for that key? If so, we could use the same limit here.

Contributor Author

It's bounded by the number of transactions that can be executed by a shard per expiry window (~100 seconds). Assuming a shard can one day do 10,000 transactions a second (I've only observed up to ~300), that'd be ~32 MB in the trie.

I don't see much point in having a per key limit since an attacker could easily create an account with 100,000 keys. It would also require us to store the keys that each cryptohash belongs to which would double the amount of data stored in the trie under certain circumstances.
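The back-of-the-envelope bound above is just: hashes retained = transactions per second × expiry window, at 32 bytes per `CryptoHash`:

```rust
/// Worst-case bytes of transaction hashes retained in the trie:
/// every transaction in the expiry window contributes one 32-byte hash.
fn trie_bytes(tx_per_second: u64, window_seconds: u64) -> u64 {
    tx_per_second * window_seconds * 32
}
```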

@bowenwang1996
Collaborator

During the protocol working group meeting today, @mm-near came up with an idea to address the nonce issue for parallel transaction submission without modifying the state:

  • Transaction nonces are a 32-byte hash of block height concatenated with a salt. For each height, there are 2048 predefined valid salts (could be 1,2,...,2048).
  • Transactions are valid for a short period of time (100 blocks for example)
  • Validators precompute the hashes for each height and store them in memory. Assuming transactions are valid for 100 blocks, validators need to keep 204,800 hashes for each shard in memory, which is roughly 6.5MB
  • When validators process a chunk, they check whether each transaction is valid by checking whether its nonce belongs to the list of valid nonces.
  • Forks do not matter since the hash computation is based on block height.

This scheme does not require changing anything in the state and is therefore simpler to implement. However, it is not compatible with the existing design. To unify both mechanisms, we can make the transaction nonce 256-bit: if the highest bit is set to 0 then the current mechanism is used (i.e., it is treated as a 64-bit number); otherwise the new mechanism is used.
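The salt-based scheme sketched above could look roughly like this. `DefaultHasher` stands in for the real 32-byte hash, and the parameters come from the comment (2048 salts, 100-block validity; 100 × 2048 = 204,800 hashes, which at 32 bytes each is the ~6.5MB quoted):

```rust
use std::collections::HashSet;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const SALTS_PER_HEIGHT: u64 = 2048;
const VALIDITY_BLOCKS: u64 = 100;

/// Stand-in for the proposed hash of (block height, salt).
fn nonce_hash(height: u64, salt: u64) -> u64 {
    let mut h = DefaultHasher::new();
    (height, salt).hash(&mut h);
    h.finish()
}

/// All nonce hashes a validator would accept at `tip_height`: every
/// predefined salt for each of the last VALIDITY_BLOCKS heights.
/// Validators would precompute and incrementally update this set.
fn valid_nonces(tip_height: u64) -> HashSet<u64> {
    let start = tip_height.saturating_sub(VALIDITY_BLOCKS - 1);
    (start..=tip_height)
        .flat_map(|h| (1..=SALTS_PER_HEIGHT).map(move |s| nonce_hash(h, s)))
        .collect()
}
```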

@ewiner
Contributor

ewiner commented Jan 8, 2024

I'm sorry, I don't fully understand your proposal yet or the benefits of it.

Is this correct?

  1. The transaction author picks two numbers: one between currentBlockHeight and currentBlockHeight-99, and another between 1 and 2048.
  2. The nonce is determined by some hash function with those two numbers as inputs.
  3. Validators keep track of those 204,800 total allowed hashes and ensure that the given nonce is in the list.

How does this protect against transaction replay?

Also, for offline signers there's a delay between when a transaction is signed and when it can be broadcast. If I understand this proposal correctly, someone using this new nonce scheme would have to anticipate and estimate the block height for a short window when they broadcast the transaction.
There's already a 24-hour time limit between tx creation and broadcast, enforced by the blockHash parameter. I'd be worried about restricting that even further.

@DavidM-D
Contributor Author

DavidM-D commented Jan 8, 2024

2048 possible salts per block seems too small. If you're selecting those salts randomly and you want a >99% chance of not having a collision, you can only submit 6 transactions per block.

What is the hashing being used for? The small set of possible inputs makes it trivial to recover the input (the validators do it every block), and the hash output is larger than the data it's hashing.
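The "6 transactions per block" figure is a birthday-bound calculation, which can be checked directly:

```rust
/// Probability that `picks` uniformly random choices from `slots`
/// distinct values are all different (the birthday bound).
fn p_no_collision(slots: u64, picks: u64) -> f64 {
    (0..picks).map(|i| (slots - i) as f64 / slots as f64).product()
}

/// Largest number of picks keeping the no-collision probability
/// strictly above `threshold`.
fn max_picks_above(slots: u64, threshold: f64) -> u64 {
    (1..=slots)
        .take_while(|&k| p_no_collision(slots, k) > threshold)
        .count() as u64
}
```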

Contributor

@MaximusHaximus MaximusHaximus left a comment

I'd just like to put my voice behind this NEP also -- nonce behaviour has been a regular frustration point in cases where it is desirable to use a single account to execute many transactions concurrently, which is a very common pattern.

A real-world example of how this impacts throughput is how we provide new account creation on the testnet chain. This functionality is provided by a helper service which used to hold a single access key for the .testnet account. I discovered that even with eager nonce incrementation, the highest throughput that I could accomplish with a single key was only ~6 new account transactions per second. To work around this and provide a higher throughput for testnet new account creation, I added 50 keys to the .testnet top-level account, and made the service use those keys in a round-robin fashion, which is what is currently live on testnet -- 300 new accounts per second is the effective limit (6 per key * 50 keys) of new account creation on testnet today.

While it is true that we can scale this higher by adding more access keys to the account and spreading the transactions across more keys, it's a 'gotcha' that catches many developers and not a great development experience or efficient use of development time. Putting hundreds of keys on a single account to scale up throughput makes it harder to manage things like key rotation and makes secret management frustrating.

@DavidM-D DavidM-D changed the title NEP-522: Stateless Transactions NEP-522: Optional Transaction Ordering Jan 9, 2024
@bowenwang1996
Collaborator

As @ewiner pointed out, only allowing 2048 salts per block is too small when multiple users want to submit transactions at the same time. While we can increase that number, it may create other issues, such as the number of hashes that validators need to compute. Instead, we can do the following:

  • Transaction nonce remains a u64.
  • Nonces are only valid for a small number of blocks N (100 for example). More specifically, when a transaction is submitted, it is invalid if the difference in height between base_block_hash and the tip of the chain is larger than N.
  • Validators locally keep track of all the nonces used in the past N blocks and keep updating them. This can be done by storing a hashset of nonces and, separately, a deque of nonces. Assuming that each block has 100,000 transactions, the total memory usage is roughly 1,600,000 * N bytes. If N is 100, this is roughly 160MB, which is reasonable. When a node restarts, it needs to read the past N blocks to rebuild the in-memory indices. Forks are also fine since all operations here are in memory.
  • When validators receive a transaction of this form, after checking that its base block is within the last N blocks, they then check that the nonce is unique, i.e., not in the set of nonces already used. Note that it is okay for nonces to repeat after N blocks because it is guaranteed that the base block hash will not be the same and therefore the transactions cannot be the same.
  • For a user to send a transaction, it is sufficient to generate a random nonce. The probability of collision given N*100,000 transactions is extremely low when N is small (100 for example). The requirement that the base block hash has to be quite recent is not really an additional burden, since client-side APIs (near-api-js for example) already query the chain for the latest block when creating a transaction.

This mechanism, however, is again not compatible with the current format. One idea to unify them is to check the highest bit of the nonce and only use the new mechanism if the highest bit is set to 1. However, due to a protocol feature introduced a long time ago to prevent transaction duplication, there are already access keys with nonces larger than 2^32. As a result, we may need to introduce a new version of Transaction or play with serialization to allow this new type of nonce.
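The hashset-plus-deque bookkeeping in the proposal above could be sketched as follows (a minimal illustration; names are hypothetical, and real code would key the deque entries by block height rather than relying on call order):

```rust
use std::collections::{HashSet, VecDeque};

/// In-memory nonce tracking for the last `window` (N) blocks: a set for
/// O(1) uniqueness checks, plus a deque of per-block nonce lists so a
/// whole block's nonces can be evicted once it leaves the window.
struct RecentNonces {
    window: usize, // N
    seen: HashSet<u64>,
    by_block: VecDeque<Vec<u64>>,
}

impl RecentNonces {
    fn new(window: usize) -> Self {
        Self { window, seen: HashSet::new(), by_block: VecDeque::new() }
    }

    /// Returns false (reject) if the nonce was already used in the window.
    fn try_insert(&mut self, nonce: u64) -> bool {
        if !self.seen.insert(nonce) {
            return false;
        }
        self.by_block.back_mut().expect("call on_new_block first").push(nonce);
        true
    }

    /// Advance the tip by one block, evicting the oldest block's nonces.
    fn on_new_block(&mut self) {
        self.by_block.push_back(Vec::new());
        while self.by_block.len() > self.window {
            for n in self.by_block.pop_front().unwrap() {
                self.seen.remove(&n);
            }
        }
    }
}
```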

@DavidM-D
Contributor Author

DavidM-D commented Jan 9, 2024

So this is relatively similar to the current proposal with some differences.

Rather than storing the nonces, we store hashes of the transactions which contain the nonces. This prevents other entities from frontrunning a transaction, copying the nonce and then preventing the original transaction from being accepted. It also allows us to have a smaller space of nonces, since we only need unique nonces for otherwise identical transactions; in a pinch it could perhaps be as low as 2^16.

We also use an expiry rather than a block hash and check that it isn't too far in the future. This requires a short validity period for a transaction but doesn't require that the signature occurred during that validity period, unlike requiring a block hash. That's somewhat useful for some of the colder wallet setups, although a little on the margins.

Our implementation uses a Trie of Expiry => Hash which makes removal simple, but I'm sure a deque works just fine too.

I'm glad to see that it appears we're converging to some extent!

@birchmd
Contributor

birchmd commented Jan 9, 2024

Thanks for the active work on this @DavidM-D ! At Aurora we also feel the pain of nonce contention, and a protocol change to address this is a good idea!

Some comments:

  • To prevent replay, a transaction hash must be remembered until its expiry time elapses. Therefore, I think the protocol should enforce a maximum expiry time. In addition to the may reject rule you have in the proposal, I think there should be a new block validity rule that says for all transactions included in a chunk, their expiry time must be in the range [chunk.timestamp, chunk.timestamp + 100s) (the upper bound would be a new protocol-level parameter). This prevents lax nodes that accept all transactions with all expiries from putting a long-lived transaction into a chunk that all nodes tracking the shard then need to remember.
  • Separately, I like @bowenwang1996 's suggestion of using the already present block_hash field to give an implicit expiry (by enforcing a fixed maximum number of blocks between the specified hash and transaction inclusion). I think this is nice because it opens an alternative to the trie for tracking which transaction hashes have already been included. For each block you create a Bloom filter containing the included transactions and then store those in a deque. This is a fixed-sized data structure regardless of the number of transactions, and it is smaller than the worst case size of other storage options (e.g. using 256 kiB Bloom filters and 128 block expiry is only 32 MiB). During times of high transaction volume there will be an increased chance of a false positive rejection, but with 256 kiB filters the probability of rejection is less than 50% even for 1 million transactions.
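The Bloom-filter alternative in the second bullet can be sketched minimally. Assumptions: m = 2^21 bits ≈ 256 KiB matches the size quoted above, but only k = 1 hash function is used for simplicity (with n = 1,000,000 insertions that already gives a false-positive rate of 1 − e^(−n/m) ≈ 38%, under the 50% cited); `DefaultHasher` stands in for a real hash:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const BITS: usize = 1 << 21; // 2,097,152 bits = 256 KiB

/// Minimal single-hash Bloom filter; a validator would keep one per
/// block in a deque and probe all of them to check for replays.
struct Bloom {
    bits: Vec<u64>,
}

impl Bloom {
    fn new() -> Self {
        Self { bits: vec![0; BITS / 64] }
    }

    fn index(item: &[u8]) -> usize {
        let mut h = DefaultHasher::new();
        item.hash(&mut h);
        (h.finish() as usize) % BITS
    }

    fn insert(&mut self, item: &[u8]) {
        let i = Self::index(item);
        self.bits[i / 64] |= 1u64 << (i % 64);
    }

    /// May return true for items never inserted (a false positive, i.e. a
    /// spurious rejection), but never false for items that were inserted.
    fn maybe_contains(&self, item: &[u8]) -> bool {
        let i = Self::index(item);
        self.bits[i / 64] & (1u64 << (i % 64)) != 0
    }
}
```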

@DavidM-D
Contributor Author

@bowenwang1996 it seems like we've converged on a design and there appears to be a need. How do we move this into review?

@walnut-the-cat
Contributor

walnut-the-cat commented Jan 24, 2024

Thank you @DavidM-D for submitting this NEP.

As a moderator, I reviewed that this NEP meets the proposed template guidelines, so I am moving it to the REVIEW stage. Having said that, I believe we need to make sure the NEP reflects all the discussions that happened among stakeholders, so if you can make the appropriate changes, that would be great.

Meanwhile, I would like to ask the @near/wg-protocol working group members to assign 2 Technical Reviewers to complete a technical review (see expectations below).

Just for clarity, Technical Reviewers play a crucial role in scaling the NEAR ecosystem, as they provide in-depth expertise in niche topics while working group members stay on guard of the broader ecosystem. The discussions may get too deep, and it would be inefficient for each WG member to dive into every single comment, so NEAR Developer Governance designed this process, which includes subject matter experts helping us scale by writing a summary of the raised concerns and how they were addressed.

Technical Review Guidelines
  • First, review the proposal within one week. If you have any suggestions that could be fixed, leave them as comments to the author. It may take a couple of iterations to resolve any open comments.

  • Second, once all the suggestions are addressed, produce a Technical Summary, which helps the working group members make a weighted decision faster. Without the summary, the working group will have to read the whole discussion and potentially miss some details.

Technical Summary guidelines:

  • A recommendation for the working group if the NEP is ready for voting (it could be approving or rejecting recommendation). Please note that this is the reviewer's personal recommendation.

  • A summary of benefits that surfaced in previous discussions. This should include a concise list of all the benefits that others raised, not just the ones that the reviewer personally agrees with.

  • A summary of concerns or blockers, along with their current status and resolution. Again, this should reflect the collective view of all commenters, not just the reviewer's perspective.

Here is a nice example and a template for your convenience:


### Recommendation

Add recommendation

### Benefits

* Benefit

* Benefit

### Concerns

| # | Concern | Resolution | Status |
| - | - | - | - |
| 1 | Concern | Resolution | Status |
| 2 | Concern | Resolution | Status |

Please tag the @near/nep-moderators once you are done, so we can move this NEP to the voting stage. Thanks again.

@DavidM-D
Contributor Author

DavidM-D commented Feb 7, 2024

@walnut-the-cat anything I need to do to keep this moving?

@walnut-the-cat
Contributor

walnut-the-cat commented Feb 21, 2024

Hey @DavidM-D, as long as you made sure points discussed as comments are reflected (as needed) in the latest NEP, no. Once @near/wg-protocol nominates SMEs for the NEP review, we will move forward with the review process.

@marcelo-gonzalez
Contributor

This looks like a cool proposal! I have had some annoying experiences with nonces in the past, so I would definitely welcome something like this. One thing I'm wondering about is what kind of assumptions we would have to make about the guarantees you get from the expires_at field. As far as I know, in practice block timestamps do look roughly like Unix timestamps separated by about one second or so, but that isn't enforced by the protocol. If you make the below change to a validator, its blocks will continue to be accepted as valid:

diff --git a/core/primitives/src/block.rs b/core/primitives/src/block.rs
index 49f1c9503..fb8b01125 100644
--- a/core/primitives/src/block.rs
+++ b/core/primitives/src/block.rs
@@ -12,9 +12,7 @@ use crate::sharding::{
     ChunkHashHeight, EncodedShardChunk, ReedSolomonWrapper, ShardChunk, ShardChunkHeader,
     ShardChunkHeaderV1,
 };
-use crate::static_clock::StaticClock;
 use crate::types::{Balance, BlockHeight, EpochId, Gas, NumBlocks, StateRoot};
-use crate::utils::to_timestamp;
 use crate::validator_signer::{EmptyValidatorSigner, ValidatorSigner};
 use crate::version::{ProtocolVersion, SHARD_CHUNK_HEADER_UPGRADE_VERSION};
 use borsh::{BorshDeserialize, BorshSerialize};
@@ -240,7 +238,7 @@ impl Block {
         signer: &dyn ValidatorSigner,
         next_bp_hash: CryptoHash,
         block_merkle_root: CryptoHash,
-        timestamp_override: Option<DateTime<chrono::Utc>>,
+        _timestamp_override: Option<DateTime<chrono::Utc>>,
     ) -> Self {
         // Collect aggregate of validators and gas usage/limits from chunks.
         let mut prev_validator_proposals = vec![];
@@ -270,9 +268,7 @@ impl Block {
         );
 
         let new_total_supply = prev.total_supply() + minted_amount.unwrap_or(0) - balance_burnt;
-        let now = to_timestamp(timestamp_override.unwrap_or_else(StaticClock::utc));
-        let time = if now <= prev.raw_timestamp() { prev.raw_timestamp() + 1 } else { now };
-
+        let time = prev.raw_timestamp()+1;
         let (vrf_value, vrf_proof) = signer.compute_vrf_with_proof(prev.random_value().as_ref());
         let random_value = hash(vrf_value.0.as_ref());

So here we are just incrementing the Unix timestamp by one nanosecond every time instead of taking a "real" timestamp. That would imply that the expires_at field isn't guaranteed by the protocol to match up with anything like what you would expect. But maybe it's okay, since in practice validators aren't going to do that, and nothing seriously bad will happen if they do?

@bowenwang1996
Collaborator

As a working group member, I'd like to nominate @jakmeier and @birchmd as subject matter experts to review this NEP.

@bowenwang1996
Collaborator

@birchmd do I understand correctly that in #522 (comment), there can be at most W transactions in flight at the same time?

@ilblackdragon
Member

ilblackdragon commented Mar 31, 2024

There are two sets of problems that users have as far as I see:

  1. Submitting a batch of transactions with strict ordering. Currently solved by waiting for a transaction to be fully submitted before sending the next one.
  2. Submitting a set of transactions without ordering, as fast as possible. Currently solved by using separate keys, but limited in usability in wallet use cases which don't have a lot of keys.

For (1), wallets already send one transaction at a time. Submitting a batch of transactions at the same time will not work in many cases (e.g. ft_transfer_call + unwrap submitted at the same time: there is nothing to unwrap yet).

For this to work effectively, there needs to be a new approach to attaching more payload after one transaction has executed to kick off another one, even if the method doesn't support call forwarding.

Given the above, why would we not switch all access keys to use the "nonce set" approach described in #522 (comment)? If there is a need to submit transactions in order, one can add window_size to the previous nonce to always force the window to move.

We also need a way to tell wallet if transactions need to be sent in order or as a set.
Probably some format of [(tx1, tx2), (tx3, tx4)] is required where (tx1, tx2) sent in parallel and two groups are sent sequentially.

@nujabes403
Contributor

nujabes403 commented Apr 1, 2024

We also need a way to tell wallet if transactions need to be sent in order or as a set.
Probably some format of [(tx1, tx2), (tx3, tx4)] is required where (tx1, tx2) sent in parallel and two groups are sent sequentially.

Yes, it would be good to let developers choose the transaction ordering.
So if the order of transactions from the dapp is not critical, they can provide a much better UX than now.

@birchmd
Contributor

birchmd commented Apr 1, 2024

@birchmd do I understand correctly that in #522 (comment), there can be at most W transactions in flight at the same time?

@bowenwang1996 , yes that is correct.

@birchmd
Contributor

birchmd commented Apr 1, 2024

why would we not switch all access keys to use the "nonce set" approach

It would be a large state migration because all access keys would take up 256 bits more space (if we choose a window size of 256). This might be ok, but generally the core team tends to shy away from these kinds of state migrations because they are risky.

@victorchimakanu

victorchimakanu commented May 7, 2024

NEP Status (Updated by NEP Moderators)

Status: Not eligible for Voting

SME reviews:

Protocol Work Group voting indications (❔ | 👍 | 👎 ):

Based on the reviews above, this proposal did not pass the SME reviews, therefore it cannot enter the final voting stage.

@DavidM-D
Contributor Author

DavidM-D commented May 8, 2024

@birchmd that's an interesting proposal, but it seems like there'd be the same number of IO ops (1 per transaction), more storage use I think (32 bytes per key, ~2 million keys per shard vs 8 bytes per transaction in the last 100 seconds, ~33,000 txs per shard) and a slightly more finicky interface (clients remembering what nonces they've used).

Is the main advantage that it's easier to implement?

@birchmd
Copy link
Contributor

birchmd commented May 15, 2024

Is the main advantage that it's easier to implement?

Yes, being easier to implement is a key advantage of the "nonce set" proposal. Specifically, since it addresses the problem at the access key level instead of the transaction level, there is no need to add new logic for tracking expiry. Notably, this means the cost of increased storage usage is already accounted for without any additional change.

the same number of IO ops (1 per transaction)

I think the nonce set proposal is better in terms of IO ops because it is bounded by the amount of gas that can be spent in a single chunk (since state changes happen eagerly when the transaction is converted to a receipt). For the random nonce proposal the bound is more complicated due to the future expiry. Suppose that for 100 blocks in a row a malicious user submits transactions with the same expires_at value; once that time passes all transactions from all of those blocks suddenly need to be removed from the trie at the same time. As Jakob pointed out above, needing to do many IO ops all at once has been a performance bottleneck in the past so this presents an issue. It's possible this could be avoided by lazily removing elements from the expiry mapping in a background task but that increases the complexity of the random nonce solution.

more storage

I don't think the total amount of storage used is a problem as long as it's properly costed and only small parts of the total storage need to be used at a single time. As I discussed above, the nonce set proposal fits into Near's current cost model and the amount needed at a given time is bounded.

slightly more finicky interface

I agree the nonce set interface is not as nice as the random nonce one, but I still think it is a big improvement over the current status quo.

@flmel flmel added S-retracted A NEP that was retracted by the author or had no activity for over two months. and removed S-draft/needs-implementation A NEP in the DRAFT stage that needs implementation. S-review/needs-wg-to-assign-sme A NEP that needs working group to assign two SMEs. labels Feb 20, 2025
@flmel
Copy link
Collaborator

flmel commented Feb 20, 2025

I will assume this NEP is inactive and label as retracted for now. If anyone picks the NEP up again, please ping NEP moderators to make it active.

@DavidM-D
Copy link
Contributor Author

RIP this NEP.

Here's some software written by Near One (the protocol team) which is double-spending the signature response. Not a security vuln, but certainly embarrassing.

If even the protocol team can't use their own nonces correctly (read: securely), how can mere mortals be expected to!

@r-near
Copy link

r-near commented May 16, 2025

I like the sliding bit-window concept from @birchmd. I wanted to propose a tiny refinement that I think addresses some of the remaining state bloat/resilience issues...

Rather than introducing a brand-new "windowed" access key variant, we simply augment the existing AccessKey struct with two u64 fields:

 pub struct AccessKey {
     pub nonce: u64,               // unchanged "high-water" field
+    pub first_tracked_nonce: u64, // the window's starting index
+    pub nonce_bitmap: u64,        // W=64 bitmask
     pub permission: AccessKeyPermission,
 }

Everything else (the bitset logic, window eviction at n >= start+W, order-agnostic acceptance) stays exactly the same. By folding the bitmap into AccessKey, we avoid a new key variant (so no extra ACL/tooling changes), get automatic replay protection across restarts and forks, zero manual GC, and just 16 bytes of extra storage per key.
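For illustration, the acceptance logic these fields would back might look roughly like this — a sketch of the sliding bit-window idea, not actual nearcore code; `NonceWindow` and `try_accept` are made-up names:

```rust
// Sliding 64-bit nonce window: bit i of `bitmap` records whether nonce
// `first_tracked + i` has been used. Illustrative only.
const W: u64 = 64;

struct NonceWindow {
    first_tracked: u64, // lowest nonce the window still remembers
    bitmap: u64,        // usage bits for [first_tracked, first_tracked + W)
}

impl NonceWindow {
    fn new() -> Self {
        NonceWindow { first_tracked: 0, bitmap: 0 }
    }

    /// Returns true if the nonce is fresh (and marks it used); false on
    /// replay or if the nonce is older than the window can vouch for.
    fn try_accept(&mut self, nonce: u64) -> bool {
        if nonce < self.first_tracked {
            return false; // too old: we can no longer prove it wasn't used
        }
        if nonce >= self.first_tracked + W {
            // Slide the window forward so `nonce` becomes its highest slot;
            // bits falling off the left edge are evicted (those nonces expire).
            let shift = nonce - (self.first_tracked + W - 1);
            self.bitmap = if shift >= W { 0 } else { self.bitmap >> shift };
            self.first_tracked += shift;
        }
        let bit = 1u64 << (nonce - self.first_tracked);
        if self.bitmap & bit != 0 {
            return false; // replay
        }
        self.bitmap |= bit;
        true
    }
}

fn main() {
    let mut w = NonceWindow::new();
    assert!(w.try_accept(5));   // out-of-order arrival is fine
    assert!(w.try_accept(3));
    assert!(!w.try_accept(5));  // replay rejected
    assert!(w.try_accept(130)); // slides the window to [67, 131)
    assert!(!w.try_accept(3));  // now older than the window: rejected
}
```

Note the trade-off the comments above discuss: nonces more than W behind the high-water mark are rejected even if never used, which is why clients without coordination still benefit from picking nonces close together.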

@birchmd would you be open to reviving this NEP? Or @DavidM-D?

@birchmd
Copy link
Contributor

birchmd commented May 16, 2025

@r-near thanks for continuing the discussion here. Could you elaborate more on how access keys will work after adding the two new fields? Would all users automatically be migrated to unordered transaction acceptance?

I also worry about the migration process. 16 bytes per key is not a huge amount, but could still cause problems. For example, what if an account has close to all its $NEAR locked in storage staking such that adding 16 more bytes of storage usage renders the account completely locked?

@garikbesson
Copy link
Contributor

The proposal has already been retracted here. And since there has been no activity for four months, I'm closing the PR as well.

Please feel free to open it again if someone would like to continue working on it.

@github-project-automation github-project-automation bot moved this from NEW❗ to Shipped 🚀 in DevRel Sep 22, 2025
@DavidM-D
Copy link
Contributor Author

To be clear, I wasn't retracting it; I was lamenting its rejection. I'm still very much in favour of this proposal.

@ewiner
Copy link
Contributor

ewiner commented Sep 22, 2025

I'll join you in the lament. There are a few ways to solve this problem, and none of them are simple and easy, but it's still a huge scale-limiting problem. So many NEAR devs I've talked to have hit nonce-ordering issues and had to build a workaround in their app, sometimes using >100 access keys where one would've been sufficient if we could fix nonce issues.

@gagdiez
Copy link
Contributor

gagdiez commented Sep 23, 2025

Following the comments from @DavidM-D and @ewiner, I am reopening this NEP.

@DavidM-D would you still be open to championing this NEP? I see that @r-near and @birchmd commented but the discussion stalled there; otherwise, I can help find someone to tag along and help.

@gagdiez gagdiez reopened this Sep 23, 2025
@github-project-automation github-project-automation bot moved this from Shipped 🚀 to NEW❗ in DevRel Sep 23, 2025
@bucanero bucanero temporarily deployed to random-nonces - nomicon PR #522 September 23, 2025 14:25 — with Render Destroyed
@DavidM-D
Copy link
Contributor Author

I'm happy to keep pushing, but I don't have a crazy amount of time at the moment, so some help would be much appreciated!

@mfornet
Copy link
Member

mfornet commented Nov 8, 2025

The sliding bitset idea (#522 (comment)) is very elegant, but I don't think it solves the main problem this NEP is trying to address, which is the ability to craft and send concurrent txs from multiple clients that use the same key, without coordination. The sliding bitset idea requires coordination for maximal efficiency. The idea proposed by @mm-near (#522 (comment)) has the same problem.

The original idea from this NEP, using random nonces, is the most effective at dealing with this problem, at the cost of using too much memory. I think the idea from @birchmd (#522 (comment)) of using Bloom filters to store all txs would address this problem. Removals were a concern, but they can be handled via sliding Bloom filters that get dropped from memory altogether.
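A two-generation ("sliding") Bloom filter of the kind described could look roughly like this — a sketch under the assumption that transactions carry an expiry and the rotation period matches the expiry window; `SlidingBloom`, `seen_or_insert`, and the sizing constants are all illustrative:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const BITS: usize = 1 << 16; // filter size (illustrative)
const HASHES: u64 = 4;       // number of hash functions (illustrative)

struct Bloom {
    bits: Vec<u64>,
}

impl Bloom {
    fn new() -> Self {
        Bloom { bits: vec![0; BITS / 64] }
    }
    // Derive k indices by hashing (seed, item) with the stdlib hasher.
    fn index(item: &[u8], seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        item.hash(&mut h);
        (h.finish() as usize) % BITS
    }
    fn insert(&mut self, item: &[u8]) {
        for s in 0..HASHES {
            let i = Self::index(item, s);
            self.bits[i / 64] |= 1 << (i % 64);
        }
    }
    fn contains(&self, item: &[u8]) -> bool {
        (0..HASHES).all(|s| {
            let i = Self::index(item, s);
            self.bits[i / 64] & (1 << (i % 64)) != 0
        })
    }
}

/// Two generations: an entry survives at most two rotation windows,
/// after which the whole old filter is dropped in O(1) — no per-entry removal.
struct SlidingBloom {
    current: Bloom,
    previous: Bloom,
}

impl SlidingBloom {
    fn new() -> Self {
        SlidingBloom { current: Bloom::new(), previous: Bloom::new() }
    }
    /// Returns true if the tx hash was (probably) seen before;
    /// otherwise records it and returns false.
    fn seen_or_insert(&mut self, tx_hash: &[u8]) -> bool {
        if self.current.contains(tx_hash) || self.previous.contains(tx_hash) {
            return true;
        }
        self.current.insert(tx_hash);
        false
    }
    /// Called once per expiry window: drop the oldest generation wholesale.
    fn rotate(&mut self) {
        self.previous = std::mem::replace(&mut self.current, Bloom::new());
    }
}

fn main() {
    let mut f = SlidingBloom::new();
    assert!(!f.seen_or_insert(b"tx-1")); // first submission: fresh
    assert!(f.seen_or_insert(b"tx-1"));  // immediate replay: caught
    f.rotate();                          // one expiry window passes
    assert!(f.seen_or_insert(b"tx-1"));  // still remembered one window back
    f.rotate();                          // second window: old filter dropped
    // By now the tx's expiry has passed, so forgetting it is safe:
    assert!(!f.seen_or_insert(b"tx-1"));
}
```

Bloom filters admit false positives (a fresh tx very occasionally rejected as a replay) but never false negatives, which is the right failure direction for replay protection.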


Labels

A-NEP A NEAR Enhancement Proposal (NEP). S-retracted A NEP that was retracted by the author or had no activity for over two months. WG-protocol Protocol Standards Work Group should be accountable WG-tools Tools Working Group should be accountable.
