Skip to content

Dangling wantlists #1064

@LeeTeng2001

Description

@LeeTeng2001

We have a small cluster with approximately 30 machines, we notice bitswap stat reports non-zero wantlist length even though there is no task running. Something like this

Bitswap Stat
  WantlistLength: 31
  Peers:
  BlocksReceived: 8178
  DataReceived: 2088954888
  DupBlksReceived: 40
  DupDataReceived: 28360
  MessagesReceived: 24028
  BlocksSent: 0
  DataSent: 0

After running ipfs log to investigate, seems like there's a race condition between CANCEL_WANT_BLOCK & WANT_HAVE after receiving valid block from any peer. Here's a relevant log for cid bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke

{"level":"debug","ts":1761898062.2018778,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWMsPSSxRjTyMuBk3FSfdjhitAvYCuFpM8ZMStLQCPZCy3","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2019646,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWSnJZfW3Be9jxXSQdrNiZNHvBfDn4XVUaEKsvyo1G2qm9","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2021472,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:735","msg":"sent message","type":"WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo"}
{"level":"debug","ts":1761898062.2020547,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWQgvqtKy8g4kHcBz7ky7gcN6FDLUoHq6mPa9HYjYaVMc3","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.202884,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWJksuwoouCGnk21on4ffc13KrQSF92J1ruo37bCmxcS7c","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2080243,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWCStHbtFFkWsZ6McYoeWJbeacT9thscVG2JAQe49sEqe4","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2114546,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWPQDZ6FJNJiAdQizsoX6brop1HXifUoYnd6smPjnVBgqd","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2118468,"logger":"bitswap/client","caller":"client/client.go:586","msg":"[recv] block; cid=bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke, peer=12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo"}
{"level":"debug","ts":1761898062.2118742,"logger":"bitswap/session","caller":"session/session.go:234","msg":"Bitswap <- block","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2118876,"logger":"bitswap/server/decision","caller":"decision/engine.go:936","msg":"Bitswap engine <- block","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","size":262144}
{"level":"debug","ts":1761898062.2137315,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWE1SnSRTx2aQxp6Fmz3xAnEHBX7T3Bu9k3eem3xKyvkqT"}
{"level":"debug","ts":1761898062.2141423,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWEQNG2vZBxhJjtQt9eAmYqL2UhiRU5iCpU2W5CPvEm4ut"}
{"level":"debug","ts":1761898062.214188,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWFUrnAeYWHMtrYBPUq666dpgWa9KGXwYKwZmkqVMWvcSH"}
{"level":"debug","ts":1761898062.2145784,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWJWeM6eA9hUtgw75aBmZKokRU6oinH5o2JPXZSkrAQuHi"}
{"level":"debug","ts":1761898062.2148564,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWP1af4GXUjTvFNGfXh8e1vyCLC4gJkoznkGGmmTeyxT5L"}
{"level":"debug","ts":1761898062.2149634,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWLtCDdQDMSqp5ggXd7DDdgCyV5Cuy6nQwSqn4S5k2ge6q"}
{"level":"debug","ts":1761898062.214888,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWRykdrWrTa5LayXuA4khk6EFbxPQGZyH7thh1XFoKnpJ1"}
{"level":"debug","ts":1761898062.215218,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWBGQWvUHrdh5BwAqbDE214toNf85mVhaPomyvgyLYeZPd"}
{"level":"debug","ts":1761898062.2172887,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWDtgqQnGyn8tK9nqUYVSm8EhV4AY1DUXpS6V9W3xnW4W4"}
{"level":"debug","ts":1761898062.217481,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWJLKnTVEKYEEX6vksGwVMbcxZhAYwcNLSkVgXWJghxxCV"}
{"level":"debug","ts":1761898062.2184074,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWACHwpXfeUyrzUbFoHoma4TrkuiobYnRXCVeLshxaaZzq"}
{"level":"debug","ts":1761898062.2201893,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWN1XZcjdHPh3C7uWFa7cquB1KYMr8xZGa9E1z5X9bSPAP"}
{"level":"debug","ts":1761898062.220416,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWK7onE4x7b1pnfYfQFWeGPG141Zc45H43ZbsNpJdy5FBT"}
{"level":"debug","ts":1761898062.2210016,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWQgvqtKy8g4kHcBz7ky7gcN6FDLUoHq6mPa9HYjYaVMc3"}
{"level":"debug","ts":1761898062.2212029,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWSnJZfW3Be9jxXSQdrNiZNHvBfDn4XVUaEKsvyo1G2qm9"}
{"level":"debug","ts":1761898062.2212284,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWMsPSSxRjTyMuBk3FSfdjhitAvYCuFpM8ZMStLQCPZCy3"}
{"level":"debug","ts":1761898062.2213898,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:728","msg":"sent message","type":"WANT_HAVE","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWPQDZ6FJNJiAdQizsoX6brop1HXifUoYnd6smPjnVBgqd"}

Notice how bitswap still send WANT_HAVE to one of the peers even though it already has a valid response.

From what I understand, there might be two goroutines running receive message loop and send loop, a race condition might happen in the following scenario:

  1. sendloop is looping through all of the session's peers to send WANT_HAVE
  2. receiveloop received a valid block, it loop through all session's peers, check if peer has already sent WANT_HAVE to send CANCEL_WANT_BLOCK, let's say an arbitrary peer P is in a state of hasn't send WANT_HAVE, so it'll be ignored.
  3. sendloop is currently looping peer P, it notice that it hasn't send WANT_HAVE to this peer so it'll send a request and record it in peermanager
  4. receiveloop exit, session will cleanup all resources & existing wantlist, when peer manager receive HAVE response from peer P, it cannot find interested session to process this event, thus this cid's wantlist will never be cleanup

I can help with code if anybody is kind enough to provide insight into the loop mechanism in ipfs.

Metadata

Metadata

Assignees

Labels

P2Medium: Good to have, but can wait until someone steps upneed/triageNeeds initial labeling and prioritization

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions