-
Notifications
You must be signed in to change notification settings - Fork 146
Description
We have a small cluster with approximately 30 machines, we notice bitswap stat reports non-zero wantlist length even though there is no task running. Something like this
Bitswap Stat
WantlistLength: 31
Peers:
BlocksReceived: 8178
DataReceived: 2088954888
DupBlksReceived: 40
DupDataReceived: 28360
MessagesReceived: 24028
BlocksSent: 0
DataSent: 0
After running ipfs log to investigate, seems like there's a race condition between CANCEL_WANT_BLOCK & WANT_HAVE after receiving valid block from any peer. Here's a relevant log for cid bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke
{"level":"debug","ts":1761898062.2018778,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWMsPSSxRjTyMuBk3FSfdjhitAvYCuFpM8ZMStLQCPZCy3","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2019646,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWSnJZfW3Be9jxXSQdrNiZNHvBfDn4XVUaEKsvyo1G2qm9","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2021472,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:735","msg":"sent message","type":"WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo"}
{"level":"debug","ts":1761898062.2020547,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWQgvqtKy8g4kHcBz7ky7gcN6FDLUoHq6mPa9HYjYaVMc3","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.202884,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWJksuwoouCGnk21on4ffc13KrQSF92J1ruo37bCmxcS7c","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2080243,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWCStHbtFFkWsZ6McYoeWJbeacT9thscVG2JAQe49sEqe4","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2114546,"logger":"bitswap/session","caller":"session/session.go:237","msg":"Bitswap <- HAVE","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWPQDZ6FJNJiAdQizsoX6brop1HXifUoYnd6smPjnVBgqd","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2118468,"logger":"bitswap/client","caller":"client/client.go:586","msg":"[recv] block; cid=bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke, peer=12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo"}
{"level":"debug","ts":1761898062.2118742,"logger":"bitswap/session","caller":"session/session.go:234","msg":"Bitswap <- block","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","session":758}
{"level":"debug","ts":1761898062.2118876,"logger":"bitswap/server/decision","caller":"decision/engine.go:936","msg":"Bitswap engine <- block","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","from":"12D3KooWCqktkwMQ8tH3mkVsnt5SNTtH1GqK37N6mMSxX6NCQszo","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","size":262144}
{"level":"debug","ts":1761898062.2137315,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWE1SnSRTx2aQxp6Fmz3xAnEHBX7T3Bu9k3eem3xKyvkqT"}
{"level":"debug","ts":1761898062.2141423,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWEQNG2vZBxhJjtQt9eAmYqL2UhiRU5iCpU2W5CPvEm4ut"}
{"level":"debug","ts":1761898062.214188,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWFUrnAeYWHMtrYBPUq666dpgWa9KGXwYKwZmkqVMWvcSH"}
{"level":"debug","ts":1761898062.2145784,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWJWeM6eA9hUtgw75aBmZKokRU6oinH5o2JPXZSkrAQuHi"}
{"level":"debug","ts":1761898062.2148564,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWP1af4GXUjTvFNGfXh8e1vyCLC4gJkoznkGGmmTeyxT5L"}
{"level":"debug","ts":1761898062.2149634,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWLtCDdQDMSqp5ggXd7DDdgCyV5Cuy6nQwSqn4S5k2ge6q"}
{"level":"debug","ts":1761898062.214888,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWRykdrWrTa5LayXuA4khk6EFbxPQGZyH7thh1XFoKnpJ1"}
{"level":"debug","ts":1761898062.215218,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWBGQWvUHrdh5BwAqbDE214toNf85mVhaPomyvgyLYeZPd"}
{"level":"debug","ts":1761898062.2172887,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWDtgqQnGyn8tK9nqUYVSm8EhV4AY1DUXpS6V9W3xnW4W4"}
{"level":"debug","ts":1761898062.217481,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWJLKnTVEKYEEX6vksGwVMbcxZhAYwcNLSkVgXWJghxxCV"}
{"level":"debug","ts":1761898062.2184074,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWACHwpXfeUyrzUbFoHoma4TrkuiobYnRXCVeLshxaaZzq"}
{"level":"debug","ts":1761898062.2201893,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWN1XZcjdHPh3C7uWFa7cquB1KYMr8xZGa9E1z5X9bSPAP"}
{"level":"debug","ts":1761898062.220416,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWK7onE4x7b1pnfYfQFWeGPG141Zc45H43ZbsNpJdy5FBT"}
{"level":"debug","ts":1761898062.2210016,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWQgvqtKy8g4kHcBz7ky7gcN6FDLUoHq6mPa9HYjYaVMc3"}
{"level":"debug","ts":1761898062.2212029,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWSnJZfW3Be9jxXSQdrNiZNHvBfDn4XVUaEKsvyo1G2qm9"}
{"level":"debug","ts":1761898062.2212284,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:719","msg":"sent message","type":"CANCEL_WANT_BLOCK","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWMsPSSxRjTyMuBk3FSfdjhitAvYCuFpM8ZMStLQCPZCy3"}
{"level":"debug","ts":1761898062.2213898,"logger":"bitswap/client/msgq","caller":"messagequeue/messagequeue.go:728","msg":"sent message","type":"WANT_HAVE","cid":"bafkreig4xh7u5iveim7sxhbngvr4daggxxtsdatl5wejk4rea6saeyx5ke","local":"12D3KooW9za2vSSD1M23FGoonoUigTXoAKCrpAvnU3oJMUPtUzhq","to":"12D3KooWPQDZ6FJNJiAdQizsoX6brop1HXifUoYnd6smPjnVBgqd"}Notice how bitswap still send WANT_HAVE to one of the peers even though it already has a valid response.
From what I understand, there might be two goroutines running receive message loop and send loop, a race condition might happen in the following scenario:
- sendloop is looping through all of the session's peers to send
WANT_HAVE - receiveloop received a valid block, it loop through all session's peers, check if peer has already sent
WANT_HAVEto sendCANCEL_WANT_BLOCK, let's say an arbitrary peer P is in a state of hasn't sendWANT_HAVE, so it'll be ignored. - sendloop is currently looping peer P, it notice that it hasn't send
WANT_HAVEto this peer so it'll send a request and record it in peermanager - receiveloop exit, session will cleanup all resources & existing wantlist, when peer manager receive
HAVEresponse from peer P, it cannot find interested session to process this event, thus this cid's wantlist will never be cleanup
I can help with code if anybody is kind enough to provide insight into the loop mechanism in ipfs.