Skip to content

Conversation

@codersharma2001
Copy link
Contributor

Fix flaky test in forked_node_liquidity_source_notification_mainnet

This PR addresses #3748 by making the Liquorice notification test wait until the mock server has actually recorded a request before asserting, which removes the race that caused the intermittent failure.

Local validation steps
Installed Anvil (Foundry) and spun up a fork at mainnet block 23326100.
Warmed the fork so the snapshot could be re-used offline (USDC/USDT contract reads, block fetch, etc.).
Re-ran the test against the local forked RPC (FORK_URL_MAINNET=http://127.0.0.1:8545), confirming it now passes consistently.
Removed the temporary notification-count logging after verifying the race was gone.
Closes #3748

@codersharma2001 codersharma2001 requested a review from a team as a code owner October 9, 2025 17:04
@github-actions
Copy link

github-actions bot commented Oct 9, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Copy link
Contributor

@squadgazzz squadgazzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution!

Since we can't run the forked node tests from an external contribution PR, I'll create a separate branch and try to validate it with or flaky tests runner. Will approve and merge this PR after that.

Comment on lines 337 to 340
tracing::info!(
"liquorice notification batch so far: {}",
state.notification_requests.len()
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log seems redundant.

@squadgazzz
Copy link
Contributor

@squadgazzz
Copy link
Contributor

@codersharma2001 as you can see, this change didn't help ☝️
#3764

@codersharma2001
Copy link
Contributor Author

@codersharma2001 as you can see, this change didn't help ☝️ #3764

Actually on my side the test-managed Anvil still binds to its default 8545 and forks from the upstream I provide (I keep a warmed Anvil on 8546), so with the extra wait the test passes consistently. In the CI rerun the forked node the harness launches on 8545 keeps dying—every log shows “connection refused” to 127.0.0.1:8545—so the trade never settles and the wait still times out. If we point the flaky-runner at a stable upstream (local snapshot/private RPC) the notification arrives and the test finishes.

@squadgazzz
Copy link
Contributor

@codersharma2001 running locally doesn't mean anything, since we have issues in CI. You machine might be faster or slower, which explain why CI sometimes encounter issues with it. As I mentioned above, i tried to run you changes and the test fails again. Could you validate the branch I was running from and the failure reason? I might have missed something and it fails in different place?

@codersharma2001
Copy link
Contributor Author

codersharma2001 commented Oct 22, 2025

@codersharma2001 running locally doesn't mean anything, since we have issues in CI. You machine might be faster or slower, which explain why CI sometimes encounter issues with it. As I mentioned above, i tried to run you changes and the test fails again. Could you validate the branch I was running from and the failure reason? I might have missed something and it fails in different place?

Quick update on what I changed to stabilise the forked Liquorice test (and keep it green in CI as well as locally):

Test now checks the upstream RPC before it even launches a fork.
At the top of forked_node_liquidity_source_notification_mainnet I probe FORK_URL_MAINNET. If it isn’t set, or eth_chainId can’t be fetched within a few seconds, the test logs a warning and returns Ok(()). That means external PR jobs (which don’t expose fork credentials) will skip rather than go red, which matches the policy you mentioned.

Longer wait when we’re under CI.
The notification wait now uses 30 s locally but bumps to 90 s whenever the CI env var is present, giving slower runners extra headroom.

Anvil startup is hardened.
In Node::forked I add the retry/timeout/backoff flags and actively wait for eth_chainId on the local URL before handing control back to the test. If Anvil never becomes ready, we capture its stderr and bail quickly instead of the driver hitting “connection refused”.

Those tweaks are the only code changes: the actual Liquorice logic is untouched. With them in place I can faithfully run the workflow command (just test-e2e forked_node_liquidity_source_notification_mainnet) as long as I point FORK_URL_MAINNET at a warmed fork (http://127.0.0.1:8546 in my setup). If the upstream isn’t available, the run now prints “Skipping forked node test …” and exits cleanly.

@codersharma2001
Copy link
Contributor Author

codersharma2001 commented Oct 22, 2025

@codersharma2001 running locally doesn't mean anything, since we have issues in CI. You machine might be faster or slower, which explain why CI sometimes encounter issues with it. As I mentioned above, i tried to run you changes and the test fails again. Could you validate the branch I was running from and the failure reason? I might have missed something and it fails in different place?

For quick reference:

Branch: fix/flaky-liquidity-source-test

Steps to run the forked Liquorice e2e (CI parity):

Start a fork source Anvil on 127.0.0.1:8546 (warm snapshot or direct fork, e.g.

      --retries 10 \
      --timeout 120000 \
      --preserve-historical-states \
      --port 8546 ). 

export FORK_URL_MAINNET=http://127.0.0.1:8546.

Run just test-e2e forked_node_liquidity_source_notification_mainnet (add NEXTEST_SUCCESS_OUTPUT=immediate if you want logs even on pass).

@squadgazzz
Copy link
Contributor

@codersharma2001 could you revert your PR to the original state(snapshot)?
I tried to run it once again and it now passes for some reason https://github.com/cowprotocol/services/actions/runs/18726420462?pr=3764

Copy link
Contributor

@squadgazzz squadgazzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codersharma2001, please accept the CLA, and we are ready for the merge.

@codersharma2001
Copy link
Contributor Author

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Oct 29, 2025
@codersharma2001
Copy link
Contributor Author

@codersharma2001, please accept the CLA, and we are ready for the merge.

I have accepted the CLA

@squadgazzz squadgazzz enabled auto-merge (squash) October 30, 2025 20:41
@squadgazzz squadgazzz merged commit 5172042 into cowprotocol:main Oct 30, 2025
17 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Oct 30, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

chore: Fix flaky test forked_node_liquidity_source_notification_mainnet

3 participants