Skip to content

tests: fix flaky test caused by rapid IO reading and writing #2221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

WilliamBergamin
Copy link
Contributor

@WilliamBergamin WilliamBergamin commented Apr 2, 2025

Summary

All operating systems mask the fact that storage devices are slow by caching reads and writes. When you write to a file, it doesn't immediately write to the actual storage medium; it'll capture it in a cache, tell your program that the write has completed, and go and write the contents to the storage device in the background instead.

It also seems like Node.js can close a file and then try to interact with it before all data has been written to disk. In the context of the test (should close all files after writing them), the FileStateStore is asked to perform a burst of I/O disk writing followed by a read. This leads me to believe that a race condition is created where Node.js tries to interact with a file before the OS has completed data persistence to storage.

This PR updates the behavior of the flaky unit test by

  1. reducing the number files written to disk
  2. waiting between the write operation and read operation

Code cov test results

Requirements (place an x in each [ ])

@WilliamBergamin WilliamBergamin added tests M-T: Testing work only pkg:oauth applies to `@slack/oauth` labels Apr 2, 2025
@WilliamBergamin WilliamBergamin self-assigned this Apr 2, 2025
@WilliamBergamin
Copy link
Contributor Author

  1. Attempt 1 🟢
  2. Attempt 2 🟢
  3. Attempt 3 🟢
  4. Attempt 4 🟢
  5. Attempt 5 🟢
  6. Attempt 6 🔴
  7. Attempt 7 🟢

This did not solve the flakiness but did improve it

@WilliamBergamin WilliamBergamin marked this pull request as ready for review April 2, 2025 21:31
Copy link

codecov bot commented Apr 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.65%. Comparing base (31c60f8) to head (0d83a32).

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2221   +/-   ##
=======================================
  Coverage   92.65%   92.65%           
=======================================
  Files          38       38           
  Lines       10527    10527           
  Branches      677      677           
=======================================
  Hits         9754     9754           
  Misses        761      761           
  Partials       12       12           
Flag Coverage Δ
cli-hooks 95.23% <ø> (ø)
cli-test 94.76% <ø> (ø)
oauth 77.39% <ø> (ø)
socket-mode 61.82% <ø> (ø)
web-api 97.94% <ø> (ø)
webhook 96.65% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@zimeg zimeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WilliamBergamin This is a super interesting deep dive 🙏 ✨

I'm now curious about the fs operations happening since AFAICT these should by synchronous? Some behind the scenes magic might not be out of the question...

Approving this now since I think it's great continuations of the epic #2159 but I'm also wondering if you noticed certain failures during development? I left comments about logging that might be related!

Overall too I'm a fan of experimenting on these tests in main after a few tests like you've done. The current failure rate is 8.97% in the last 7 days and I'm hoping these changes decrease that 🏁

@@ -53,11 +53,15 @@ export class StateStoreChaiTestRunner {
it('should detect multiple consumption', async () => {
const { stateStore } = this;
const installUrlOptions = { scopes: ['channels:read'] };
for (let i = 0; i < 200; i++) {
for (let i = 0; i < 100; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is reducing the limit by 100 an arbitrary choice for testing or is there a reason behind it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg:oauth applies to `@slack/oauth` tests M-T: Testing work only
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants