Recover bids for unresolved auction on restart #3109

Open: wants to merge 6 commits into base: master

Conversation

Tristan-Wilson
Member

The Bid Validator sends validated bids to the Autonomous Auctioneer by means of Redis streams, using our pubsub package.

This commit changes the Auctioneer to mark bid messages as fully consumed only after the auction they were part of is resolved, or once the auction resolution window has closed and the auction can no longer be resolved. This means that if the Auctioneer is restarted mid-round, before auction resolution, it will fetch from Redis any bids received earlier in the round.
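
As an illustration of the deferred acknowledgement described above, here is a minimal Go sketch. The names `trackedBid`, `ackFunc`, and `acknowledgeUpTo` are hypothetical, and `ackFunc` merely stands in for whatever call marks a message as consumed (the diff in this PR uses `a.consumer.SetResult`). Dropping an entry even when its ack fails is a choice made for this sketch, not necessarily what the PR does.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// trackedBid is a hypothetical record of a bid that has been read from the
// stream but not yet acknowledged.
type trackedBid struct {
	Round uint64
}

// ackFunc is a stand-in for the call that marks a message as fully consumed.
type ackFunc func(msgID string) error

// acknowledgeUpTo acks every pending bid whose round is at or before round and
// removes it from the map. It returns the sorted IDs whose ack failed.
// Assumption: entries are dropped even on failure, since bids from a finished
// round can no longer be used.
func acknowledgeUpTo(pending map[string]trackedBid, round uint64, ack ackFunc) []string {
	var failed []string
	for msgID, bid := range pending {
		if bid.Round > round {
			continue // belongs to a later round, keep it pending
		}
		if err := ack(msgID); err != nil {
			failed = append(failed, msgID)
		}
		delete(pending, msgID) // deleting during range is safe in Go
	}
	sort.Strings(failed)
	return failed
}

func main() {
	pending := map[string]trackedBid{
		"1-0": {Round: 7},
		"2-0": {Round: 7},
		"3-0": {Round: 8},
	}
	ack := func(msgID string) error {
		if msgID == "2-0" {
			return errors.New("simulated redis error")
		}
		return nil
	}
	failed := acknowledgeUpTo(pending, 7, ack)
	fmt.Println(failed, len(pending)) // [2-0] 1
}
```

Until `acknowledgeUpTo` runs, the messages stay unacknowledged in the stream, which is what would let a restarted consumer fetch them again.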

// Once resolveAuction returns, we acknowledge all bids to remove them from redis.
// We remove them unconditionally, since resolveAuction retries until the round ends,
// and there is no way to use them after the round ends.
defer a.acknowledgeAllBids(ctx, upcomingRound)
Contributor

Is it fine if this function exits early because of an error and we still ack all bids?

Member Author

Yes it is fine because resolveAuction is only called once per auction and not retried.

if uint64(bid.Round) <= round {
	if err := a.consumer.SetResult(ctx, msgID, nil); err != nil {
		log.Error("Error acknowledging bid after auction resolution", "msgID", msgID, "error", err)
		continue
Contributor

If we continue because of an error here, is it OK that we have an unacked bid that is not cleared from the map?

Member Author

Fixed it in the latest version.

@Tristan-Wilson
Member Author

Tristan-Wilson commented Apr 10, 2025

I am reworking this because I had misunderstood how our pubsub abstraction worked.

Problem with my first attempt

(For tests the heartbeat is every 3ms, idle timeout is 30ms.)
- msg.Ack() as soon as they are consumed -> this turns off the heartbeat
- Messages are put into list to be SetResult-ed at end of round

The problem with this is that the messages will be continuously replayed because they will have timed out (no heartbeat), and the bid cache uses whatever message it got last. Even if there wasn't the following issue with reading the pending messages manually, the auction may end in the middle of replaying the messages and this isn't good.

The behavior of the consumer picking 1 out of 50 of the oldest messages randomly will break chronological ordering of bids. Bids need to be re-inserted into the bid cache in order since newer bids from the same account overwrite older bids.
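
To make the ordering requirement concrete, here is a small Go sketch that replays recovered bids in chronological order, assuming each bid carries its Redis stream ID of the form `<milliseconds>-<sequence>`. The types and function names (`recoveredBid`, `parseStreamID`, `replayInOrder`) are hypothetical and are not taken from this PR; the map simply models a cache in which a newer bid from the same bidder overwrites the older one.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// recoveredBid is a hypothetical bid re-read from the stream after a restart.
type recoveredBid struct {
	MsgID  string // Redis stream ID, "<milliseconds>-<sequence>"
	Bidder string
	Amount uint64
}

// parseStreamID splits a stream ID into its millisecond and sequence parts.
func parseStreamID(id string) (ms, seq uint64, err error) {
	left, right, ok := strings.Cut(id, "-")
	if !ok {
		return 0, 0, fmt.Errorf("malformed stream ID %q", id)
	}
	if ms, err = strconv.ParseUint(left, 10, 64); err != nil {
		return 0, 0, err
	}
	if seq, err = strconv.ParseUint(right, 10, 64); err != nil {
		return 0, 0, err
	}
	return ms, seq, nil
}

// replayInOrder sorts bids by stream ID and inserts them oldest first, so the
// newest bid per bidder is the one left in the returned cache.
func replayInOrder(bids []recoveredBid) (map[string]recoveredBid, error) {
	type keyed struct {
		ms, seq uint64
		bid     recoveredBid
	}
	ordered := make([]keyed, 0, len(bids))
	for _, b := range bids {
		ms, seq, err := parseStreamID(b.MsgID)
		if err != nil {
			return nil, err
		}
		ordered = append(ordered, keyed{ms: ms, seq: seq, bid: b})
	}
	sort.Slice(ordered, func(i, j int) bool {
		if ordered[i].ms != ordered[j].ms {
			return ordered[i].ms < ordered[j].ms
		}
		return ordered[i].seq < ordered[j].seq
	})
	cache := make(map[string]recoveredBid)
	for _, k := range ordered {
		cache[k.bid.Bidder] = k.bid // later bid overwrites earlier one
	}
	return cache, nil
}

func main() {
	// Bids arrive out of order, as they might when claimed at random.
	bids := []recoveredBid{
		{MsgID: "1000-1", Bidder: "alice", Amount: 5},
		{MsgID: "999-0", Bidder: "bob", Amount: 4},
		{MsgID: "1000-0", Bidder: "alice", Amount: 3},
	}
	cache, err := replayInOrder(bids)
	if err != nil {
		panic(err)
	}
	fmt.Println(cache["alice"].Amount) // 5
}
```

Without the sort, inserting the bids in the order given would leave alice's older bid of 3 in the cache.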

Other problems

The production default autoclaim time is too short for the rebooted consumer case we're trying to handle -> if it's set to 5 minutes then messages will never be able to be reclaimed in a given 1 minute round
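
The timing constraint can be checked with simple arithmetic. The sketch below is only an illustration: `reclaimableInRound` is a hypothetical helper, all times are offsets from the start of the round, and using the round end as the deadline is an assumption (the actual resolution deadline may be earlier).

```go
package main

import (
	"fmt"
	"time"
)

// reclaimableInRound reports whether a message whose last heartbeat happened
// at lastHeartbeat becomes idle enough to reclaim before the deadline.
func reclaimableInRound(lastHeartbeat, idleTimeout, deadline time.Duration) bool {
	return lastHeartbeat+idleTimeout < deadline
}

func main() {
	round := time.Minute       // 1 minute round, as in the comment above
	crash := 20 * time.Second  // assumed moment of the last heartbeat

	fmt.Println(reclaimableInRound(crash, 5*time.Minute, round)) // false
	fmt.Println(reclaimableInRound(crash, 4*time.Second, round)) // true
}
```

With a 5 minute idle timeout the check fails for any crash time within the round, while a timeout of a few seconds leaves most of the round available for recovery.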

@ganeshvanahalli
Contributor

> I am reworking this because I had misunderstood how our pubsub abstraction worked.
>
> Problem with my first attempt
>
> (For tests the heartbeat is every 3ms, idle timeout is 30ms.)
> - msg.Ack() as soon as they are consumed -> this turns off the heartbeat
> - Messages are put into list to be SetResult-ed at end of round
>
> The problem with this is that the messages will be continuously replayed because they will have timed out (no heartbeat), and the bid cache uses whatever message it got last. Even if there wasn't the following issue with reading the pending messages manually, the auction may end in the middle of replaying the messages and this isn't good.
>
> The behavior of the consumer picking 1 out of 50 of the oldest messages randomly will break chronological ordering of bids. Bids need to be re-inserted into the bid cache in order since newer bids from the same account overwrite older bids.

For this, on the consumer side (i.e. the auctioneer), why don't we just Ack all but the last bid from an address? Newer bids should replace the older ones anyway, so there's no point in not ack-ing old bids. This way, even if the auctioneer (consumer) restarts, it will try to reclaim bids from the PEL that are already from unique addresses!

One way to do this is for the bidcache to store the ack function for the last bid from a controller address along with the validated bid. Upon receiving another bid from the same address, you can call that ack function from the cache before calling bidcache.add(). This way the acknowledgeAllBids function will just iterate over the bidcache and call the ack functions after successful auction resolution or at round end.
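
The suggestion above could look roughly like the following Go sketch. Everything here (`cachedBid`, `bidCache`, `add`, `acknowledgeAll`) is a hypothetical illustration of the idea, not code from this PR, and the ack callbacks are plain closures standing in for the real stream acknowledgement.

```go
package main

import "fmt"

// cachedBid holds the latest bid from one bidder together with the function
// that acknowledges the stream message it arrived in.
type cachedBid struct {
	Amount uint64
	ack    func()
}

// bidCache keeps at most one unacknowledged bid per bidder.
type bidCache struct {
	byBidder map[string]cachedBid
}

func newBidCache() *bidCache {
	return &bidCache{byBidder: make(map[string]cachedBid)}
}

// add acknowledges any older bid from the same bidder, then stores the new one.
func (c *bidCache) add(bidder string, amount uint64, ack func()) {
	if prev, ok := c.byBidder[bidder]; ok {
		prev.ack() // the superseded bid is no longer needed
	}
	c.byBidder[bidder] = cachedBid{Amount: amount, ack: ack}
}

// acknowledgeAll acks every remaining bid and empties the cache, as would
// happen after auction resolution or at round end.
func (c *bidCache) acknowledgeAll() {
	for bidder, bid := range c.byBidder {
		bid.ack()
		delete(c.byBidder, bidder)
	}
}

func main() {
	var acked []string
	ackFor := func(msgID string) func() {
		return func() { acked = append(acked, msgID) }
	}

	cache := newBidCache()
	cache.add("alice", 3, ackFor("1-0"))
	cache.add("alice", 5, ackFor("2-0")) // acks "1-0" immediately
	cache.add("bob", 4, ackFor("3-0"))
	fmt.Println(acked) // [1-0]

	cache.acknowledgeAll()
	fmt.Println(len(acked)) // 3
}
```

The trade-off is that the cache now owns acknowledgement state in addition to bids, which is the extra complexity weighed in the reply that follows.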

validateBidTemporal is a great idea, and it also probably highlights a possible bug where old bids could've been processed in the next round, though that is very, very unlikely!

@Tristan-Wilson Tristan-Wilson marked this pull request as ready for review April 16, 2025 20:24
@Tristan-Wilson
Member Author

> For this, on the consumer side (i.e. the auctioneer), why don't we just Ack all but the last bid from an address? Newer bids should replace the older ones anyway, so there's no point in not ack-ing old bids. This way, even if the auctioneer (consumer) restarts, it will try to reclaim bids from the PEL that are already from unique addresses!
>
> One way to do this is for the bidcache to store the ack function for the last bid from a controller address along with the validated bid. Upon receiving another bid from the same address, you can call that ack function from the cache before calling bidcache.add(). This way the acknowledgeAllBids function will just iterate over the bidcache and call the ack functions after successful auction resolution or at round end.

It's a good idea but so far I'm unconvinced that it's worth the extra complexity, especially since we're limiting bids per account to 5.

Contributor

@ganeshvanahalli ganeshvanahalli left a comment

Great test! Requesting minor changes required for production

@@ -119,7 +131,7 @@ func (c *Consumer[Request, Response]) Consume(ctx context.Context) (*Message[Req
 		Group: c.redisGroup,
 		Start: "-",
 		End:   "+",
-		Count: 50,
+		Count: c.claimAmongOldestIdleN,
Contributor

We need IdletimeToAutoclaim to be much lower than the default, which is currently 5 minutes, for the Redis streams here! Something around 3-5 seconds should be good enough, so that when the auctioneer restarts the old bids have been idle long enough to be picked up by the xpendingext scan and xautoclaimed.

Member Author

Thanks, I had even pointed this out in my comments but forgot about it. Will include it.

@@ -159,6 +159,185 @@ func TestBidValidatorAuctioneerRedisStream(t *testing.T) {
require.Equal(t, bobAddr, result.secondPlace.Bidder)
}

func TestAuctioneerRecoversBidsOnRestart(t *testing.T) {
Contributor

It would be great if we could add more than one bid from each address before the restart, in order to confirm the claimAmongOldestIdleN logic. It should work, but better to be safe.

3 participants