Recover bids for unresolved auction on restart #3109
base: master
The Bid Validator sends validated bids to the Autonomous Auctioneer by means of Redis streams, using our pubsub package. This commit changes the Auctioneer to mark bid messages as fully consumed only after the auction they were part of is resolved, or once the auction resolution window has closed and the auction can no longer be resolved. This means that if the Auctioneer is restarted mid-round, before auction resolution, it will re-fetch any bids from earlier in the round from Redis.
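A minimal sketch of the deferred-acknowledgement idea described above, under stated assumptions: `pendingBids`, `track`, and `acknowledgeUpTo` are hypothetical names, and the `ack` callback stands in for whatever the pubsub consumer uses to mark a message as consumed. Keeping a message tracked when its ack fails is one possible policy, not necessarily what this PR does.

```go
package main

import "fmt"

// pendingBids maps a stream message ID to the auction round of the bid it carried.
// Hypothetical type for illustration; not the Auctioneer's actual data structure.
type pendingBids map[string]uint64

// track remembers a bid message without acknowledging it yet.
func (p pendingBids) track(msgID string, round uint64) {
	p[msgID] = round
}

// acknowledgeUpTo acks every tracked message whose round is <= round.
// An entry is dropped only after a successful ack, so a failed ack stays
// tracked and can be retried later. It returns the number of messages acked.
func (p pendingBids) acknowledgeUpTo(round uint64, ack func(msgID string) error) int {
	acked := 0
	for msgID, r := range p {
		if r > round {
			continue // bid belongs to a later round, keep it pending
		}
		if err := ack(msgID); err != nil {
			fmt.Println("ack failed, keeping message:", msgID, err)
			continue
		}
		delete(p, msgID) // deleting during range is safe in Go
		acked++
	}
	return acked
}

func main() {
	p := pendingBids{}
	p.track("1-0", 7)
	p.track("2-0", 7)
	p.track("3-0", 8) // bid for a later round, must survive
	n := p.acknowledgeUpTo(7, func(string) error { return nil })
	fmt.Println(n, len(p)) // 2 acked, 1 still pending
}
```

Because nothing is acknowledged until `acknowledgeUpTo` runs, a process that crashes before that point leaves the messages pending in the stream, which is what makes recovery after a restart possible.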
// Once resolveAuction returns, we acknowledge all bids to remove them from redis.
// We remove them unconditionally, since resolveAuction retries until the round ends,
// and there is no way to use them after the round ends.
defer a.acknowledgeAllBids(ctx, upcomingRound)
Is it fine if this function exits early because of an error and we still ack all bids?
Yes it is fine because resolveAuction is only called once per auction and not retried.
timeboost/auctioneer.go
Outdated
if uint64(bid.Round) <= round {
	if err := a.consumer.SetResult(ctx, msgID, nil); err != nil {
		log.Error("Error acknowledging bid after auction resolution", "msgID", msgID, "error", err)
		continue
If we continue because of an error here, is it OK that we are left with an unacked bid that is not cleared from the map?
Fixed it in the latest version.
I am reworking this because I had misunderstood how our pubsub abstraction worked.

Problem with my first attempt

(For tests the heartbeat is every 3ms and the idle timeout is 30ms.)

The problem with this is that the messages will be continuously replayed because they will have timed out (no heartbeat), and the bid cache uses whatever message it got last. Even without the following issue with reading the pending messages manually, the auction may end in the middle of replaying the messages, which isn't good.

The consumer picks one of the 50 oldest messages at random, which breaks the chronological ordering of bids. Bids need to be re-inserted into the bid cache in order, since newer bids from the same account overwrite older bids.

Other problems

The production default autoclaim time is too long for the rebooted-consumer case we're trying to handle: if it's set to 5 minutes, messages can never be reclaimed within a given 1-minute round.
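To illustrate why replay order matters, here is a small sketch assuming a last-write-wins cache keyed by bidder. The `bid` type and `replay` function are hypothetical simplifications, not the Auctioneer's bid cache.

```go
package main

import "fmt"

// bid is a hypothetical, simplified validated bid.
type bid struct {
	bidder string
	amount int
}

// replay inserts bids into a last-write-wins cache keyed by bidder,
// mimicking how a newer bid from the same account overwrites an older one.
func replay(bids []bid) map[string]int {
	cache := make(map[string]int)
	for _, b := range bids {
		cache[b.bidder] = b.amount
	}
	return cache
}

func main() {
	older := bid{"alice", 10}
	newer := bid{"alice", 5} // alice's latest bid, submitted after older

	inOrder := replay([]bid{older, newer})
	outOfOrder := replay([]bid{newer, older})

	fmt.Println(inOrder["alice"])    // 5: the latest bid wins
	fmt.Println(outOfOrder["alice"]) // 10: a stale bid overwrote the latest one
}
```

The cache itself cannot tell which of two bids is newer, so the only protection against the second outcome is delivering the messages in their original order.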
For this, on the consumer side (i.e. the auctioneer), why don't we just Ack One way to do this is for the bid cache to store the ack function for the last bid from each controller address along with the validated bid; upon receiving another bid from the same address, you can then call that ack function from the cache before calling
It's a good idea but so far I'm unconvinced that it's worth the extra complexity, especially since we're limiting bids per account to 5.
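For reference, the suggestion in this thread could be sketched roughly as follows. All names (`cachedBid`, `bidCache`, `add`) are hypothetical, and the `ack` closure stands in for the real acknowledgement call. This is an illustration of the proposal, not code from the PR.

```go
package main

import "fmt"

// cachedBid pairs a bid amount with the function that acknowledges its
// stream message. Both names are hypothetical.
type cachedBid struct {
	amount int
	ack    func()
}

// bidCache keeps only the latest bid per controller address.
type bidCache map[string]cachedBid

// add stores the new bid and first acks the message of the bid it replaces,
// so at most one unacknowledged message per address remains in the stream.
func (c bidCache) add(addr string, amount int, ack func()) {
	if prev, ok := c[addr]; ok {
		prev.ack()
	}
	c[addr] = cachedBid{amount: amount, ack: ack}
}

func main() {
	var acked []string
	ackFor := func(id string) func() {
		return func() { acked = append(acked, id) }
	}
	c := bidCache{}
	c.add("alice", 10, ackFor("1-0"))
	c.add("alice", 12, ackFor("2-0")) // supersedes 1-0, which gets acked
	c.add("bob", 7, ackFor("3-0"))
	fmt.Println(acked, c["alice"].amount) // [1-0] 12
}
```

The trade-off raised in the reply still applies: the cache now owns acknowledgement state, which adds complexity for a saving that is bounded by the per-account bid limit.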
Great test! Requesting minor changes required for production
@@ -119,7 +131,7 @@ func (c *Consumer[Request, Response]) Consume(ctx context.Context) (*Message[Req
 	Group: c.redisGroup,
 	Start: "-",
 	End:   "+",
-	Count: 50,
+	Count: c.claimAmongOldestIdleN,
We need IdletimeToAutoclaim to be much lower than the default, which is currently 5 minutes for the Redis streams here. Something around 3-5 seconds should be good enough, so that when the auctioneer restarts, the old bids have been idle long enough to be picked up by the xpendingext scan and xautoclaimed.
Thanks, I had even pointed this out in my comments but forgot about it. Will include it.
@@ -159,6 +159,185 @@ func TestBidValidatorAuctioneerRedisStream(t *testing.T) {
	require.Equal(t, bobAddr, result.secondPlace.Bidder)
}

func TestAuctioneerRecoversBidsOnRestart(t *testing.T) {
It would be great if we could add more than one bid from each address before the restart, in order to confirm the claimAmongOldestIdleN logic. It should work, but better to be safe.
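As a rough model of what such a test would exercise, the sketch below assumes the consumer picks one message at random from the N oldest idle entries, and that an order-sensitive consumer would set N to 1. `pickAmongOldest` is a hypothetical stand-in for the selection step, not the pubsub package's implementation.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickAmongOldest chooses one message ID at random from the n oldest entries
// of pending, which is assumed to be sorted oldest first. Hypothetical helper
// modelling the consumer's selection step. It returns "" if nothing is pending.
func pickAmongOldest(pending []string, n int) string {
	if len(pending) == 0 || n <= 0 {
		return ""
	}
	if n > len(pending) {
		n = len(pending)
	}
	return pending[rand.Intn(n)]
}

func main() {
	// Two bids from alice followed by two from bob, oldest first.
	pending := []string{"alice-1", "alice-2", "bob-1", "bob-2"}
	// With n = 1 the oldest message is always chosen, so draining the
	// list preserves chronological order.
	for len(pending) > 0 {
		fmt.Println(pickAmongOldest(pending, 1))
		pending = pending[1:]
	}
}
```

With a larger n, `alice-2` could be delivered before `alice-1`, which is the case that several bids per address in the test would catch.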