fix: add events_checker() fixture, use for tests #1151

nascheme · 2026-01-06T02:08:33Z

This makes the unit tests more reliable by using a helper class to handle the checking of received events. On the MacOS platform, events are allowed to be received in different orders and additional (non-expected) events are also allowed. On other platforms (Linux, Windows, BSD), the checker confirms that exactly the events expected are received and confirms the ordering.

There is a new pytest fixture events_checker(). Typical usage is as follows:

with events_checker() as ec:
    ec.add(FileOpenedEvent, "a")
    ec.add(FileClosedNoWriteEvent, "a")

I did not change all unit tests to use this fixture, mostly just the ones in test_emitter. This change removes most of the pytest.mark.flaky() markers since the tests are more reliable.
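For readers unfamiliar with the pattern, here is a minimal sketch of how a checker used as a context manager could verify events when the block exits. It is not the code from this PR: `SketchChecker`, its `received` argument, and the tuple representation of events are assumptions made for illustration, whereas the real fixture reads events from the emitter's queue.

```python
class SketchChecker:
    """Hypothetical stand-in for the checker; events are (type_name, path) tuples."""

    def __init__(self, received):
        # Assumption: 'received' is a list that the test fills in while the
        # 'with' block runs. The real fixture would read a queue instead.
        self.received = received
        self.expected = []

    def add(self, type_name, path):
        # Record one expected event; order of calls is the expected order.
        self.expected.append((type_name, path))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Only compare when the block itself did not raise, so a real
        # error is not hidden behind an event mismatch.
        if exc_type is None:
            assert self.received == self.expected, (self.expected, self.received)
        return False


received = []
with SketchChecker(received) as ec:
    # These two appends stand in for events delivered by the emitter.
    received.append(("FileOpenedEvent", "a"))
    received.append(("FileClosedNoWriteEvent", "a"))
    ec.add("FileOpenedEvent", "a")
    ec.add("FileClosedNoWriteEvent", "a")
```

Doing the comparison in `__exit__` is what removes the need for an explicit check call at the end of each test.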

BoboTiG · 2026-01-06T18:58:06Z

Hmm this seems like a critical change: events order is actually relevant to test. We need to be sure that the library will emit events in order, else we will have weird issues. And not testing that, we will move that implicit check to downstream projects: this is not OK.

I get that the tests are quite flaky, and this is a problem for the community.

The proposed change reduces the amount of code to write; that's cool and more readable.

Honestly, I do not know what to do with that right now. Let's move forward on the rest of the stuff, and look back to this one later.

Maybe others have opinions too.

Anyway, thanks a lot for the effort you've put in here. I am not throwing away everything but delaying for now.

ngoldbaum

It looks like flaky is only used in three places once this PR is applied:

goldbaum at Nathans-MBP in ~/Documents/watchdog on events-checker!
± rg flaky
requirements-tests.txt
2:flaky==3.8.1

tests/utils.py
94:        Provides some robustness for the otherwise flaky nature of asynchronous notifications.

tests/test_0_watchmedo.py
176:@pytest.mark.flaky(max_runs=5, min_passes=1)

tests/test_delayed_queue.py
10:@pytest.mark.flaky(max_runs=5, min_passes=1)
21:@pytest.mark.flaky(max_runs=5, min_passes=1)

Are those still needed?

It's also not clear to me when one would use the new EventChecker vs the old EventQueue. Should all the tests that still use EventQueue get migrated?

If not, maybe it's worth adding some text to docs/source/hacking.rst that explains when you'd want to use one or the other and also just generally explains how to work with the test suite.

Reading over the PR, it occurs to me that you could save a bit of boilerplate by making the events_checker a context manager. Then there's no need to explicitly call check_events: exiting the context manager does that automatically.

I only got about halfway through test_emitter.py before running out of steam. If the missing checks weren't intentional it may be a good idea to go over test_emitter.py one more time to make sure all the old checks are still there.

ngoldbaum · 2026-01-06T18:54:29Z

tests/utils.py

+
+class _EventsChecker:
+    # If True, output verbose debugging to stderr.
+    DEBUG = False


it's a personal style thing but I'd tend to leave out disabled debug code once I have the code working. IMO code that is disabled statically and not tested is destined to inevitably bitrot.

I would remove it but when you have tests failing, it's extremely useful to have this debugging output to see what events are expected and what events are generated. Maybe it should just always be turned on. I'd like to keep the debug logic in any case since I think you'd likely re-write it down the road when adding a new test or trying to debug a failing one.

I tried changing the code to be DEBUG = os.environ.get('EVENTS_CHECKER_DEBUG') == '1' but it doesn't work with tox. It seems tox will sanitize the process environment before running tests.
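For reference, the environment check described above can be written as a small helper. This is only a sketch: `EVENTS_CHECKER_DEBUG` is the variable name from the comment, while `debug_enabled` and its `environ` parameter are hypothetical names added here so the logic can be exercised without touching the real environment. As far as I know, tox only forwards variables listed in its `passenv` setting (`pass_env` in tox 4), so the variable would need to be listed there for the check to see it.

```python
import os


def debug_enabled(environ=None):
    """Return True when EVENTS_CHECKER_DEBUG is set to '1'."""
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if environ is None else environ
    return env.get("EVENTS_CHECKER_DEBUG") == "1"
```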

ngoldbaum · 2026-01-06T20:54:16Z

tests/test_emitter.py

-    expect_event(FileCreatedEvent(p("a")))
-
-    if not platform.is_windows():
-        expect_event(DirModifiedEvent(p()))


did this check get intentionally deleted?

Yup. I can put it back though. I think that event doesn't appear on other platforms as well so the condition might need to be if platform.is_linux() instead. I'll check. My logic was that you really care about the FileCreatedEvent event being generated.

ngoldbaum · 2026-01-06T20:58:03Z

tests/test_emitter.py

+    checker.add(FileCreatedEvent, "a_\udce4")
    if not platform.is_windows():
-        event = event_queue.get(timeout=5)[0]
-        assert os.path.normpath(event.src_path) == os.path.normpath(p(""))


Is this assert preserved?

Yes, it's checked by checker.add(DirModifiedEvent, ".")

ngoldbaum · 2026-01-06T20:59:22Z

tests/test_emitter.py

-
-    event = event_queue.get(timeout=5)[0]
-    assert event.src_path in [p("dir1"), p("dir2")]
-    assert isinstance(event, DirModifiedEvent)


was this check intentionally skipped?

The new checking code is as follows and I think it matches the spirit of the original test. The order of the two DirModifiedEvents is not specified by the original test logic.

checker = events_checker()
if not platform.is_windows():
    checker.add(FileMovedEvent, "dir1/a", dest_path="dir2/b")
    checker.add(DirModifiedEvent, "dir1")
    checker.add(DirModifiedEvent, "dir2")
else:
    checker.add(FileDeletedEvent, "dir1/a")
    checker.add(FileCreatedEvent, "dir2/b")
checker.check_events()

ngoldbaum · 2026-01-06T21:00:17Z

tests/test_emitter.py

-        assert event.src_path in [p("dir1"), p("dir2")]
-        assert isinstance(event, DirModifiedEvent)
+        checker.add(DirModifiedEvent, "dir1")
+        checker.add(DirModifiedEvent, "dir2")


This is only getting checked on Windows now, but there were checks outside this if statement that are missing

New code is as follows. Again I think that matches the spirit of the original test:

checker = events_checker()
if not platform.is_windows():
    checker.add(FileMovedEvent, "dir1/file", dest_path="dir2/FILE")
    checker.add(DirModifiedEvent, "dir1")
    checker.add(DirModifiedEvent, "dir2")
else:
    checker.add(FileDeletedEvent, "dir1/file")
    checker.add(FileCreatedEvent, "dir2/FILE")
checker.check_events()

ngoldbaum · 2026-01-06T21:01:33Z

tests/test_emitter.py

-    if platform.is_windows():
-        expected_events = [a_deleted, d_created]
-
-    if platform.is_bsd():


this isn't needed anymore because we no longer care about ordering, right?

ngoldbaum · 2026-01-06T21:02:19Z

tests/test_emitter.py

-@pytest.mark.flaky(max_runs=5, min_passes=1, rerun_filter=rerun_filter)
+
+    checker = events_checker()
+    for _ in range(times):


the old code used times * 4

As above, using times is correct. We add 4 expected events each time through the loop. The old code consumed events one-by-one and so needed the 4x.

ngoldbaum · 2026-01-06T21:02:53Z

tests/test_emitter.py

-        DirCreatedEvent: times,
-        DirModifiedEvent: times * 2,
-        DirDeletedEvent: times,
-    }


Any way to get these count checks back using EventChecker?

The checker is already doing that. New code is as follows:

checker = events_checker()
for _ in range(times):
    checker.add(DirCreatedEvent, "dir1/sub_dir1")
    checker.add(DirModifiedEvent, "dir1")
    checker.add(DirDeletedEvent, "dir1/sub_dir1")
    checker.add(DirModifiedEvent, "dir1")

The checker uses a list not a set for the expected events. So there will be times*2 occurrences of DirModifiedEvent expected. Using times rather than times*4 in the loop is correct.
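The counting argument can be checked with a short standalone sketch. It does not use the PR's checker: events are represented as plain `(type_name, path)` tuples, and `times = 3` is an arbitrary value chosen for illustration.

```python
from collections import Counter

times = 3  # arbitrary illustrative value
expected = []
for _ in range(times):
    # Four expected events are appended per iteration, as in the test.
    expected.append(("DirCreatedEvent", "dir1/sub_dir1"))
    expected.append(("DirModifiedEvent", "dir1"))
    expected.append(("DirDeletedEvent", "dir1/sub_dir1"))
    expected.append(("DirModifiedEvent", "dir1"))

# A list keeps duplicates, so per-type counts are preserved.
counts = Counter(type_name for type_name, _ in expected)
```

Because the list keeps every duplicate, `counts` ends up with `times` created events, `times` deleted events and `times * 2` modified events, which are the same totals the old dictionary checked.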

nascheme · 2026-01-06T21:45:59Z

Hmm this seems like a critical change: events order is actually relevant to test. We need to be sure that the library will emit events in order, else we will have weird issues. And not testing that, we will move that implicit check to downstream projects: this is not OK.

I'm not intimately familiar with the OS APIs being used but, based on my readings, we cannot assume they give you file change events in some specific order. It depends on the OS and even the underlying filesystem being used. So checking for event ordering in tests doesn't seem like a good idea. Also, I think telling users that events are returned in some defined order when the underlying API can't assure that seems bad. The users will build unreliable software as well.

I can understand that watchdog is doing a "best effort" in trying to return events in some order. So we want the unit tests to try to check that we haven't broken that.

I can modify the PR to keep checking order; that's not a hard change. I'll modify the events_checker() function to check ordering of events by default. An additional enhancement would be to check ordering based on the underlying OS API. E.g. if Linux inotify does return events in a deterministic order, turn on the ordering check, otherwise don't care about order.

Anyway, thanks a lot for the effort you've put in here. I am not throwing away everything but delaying for now.

That's fine and I appreciate the prompt feedback. It's a large PR and so I know that's a problem for reviewing. Also, the change in what we are actually checking makes it more difficult in determining if it's okay. What I could do to make progress is to turn on the ordering checking by default and then see which tests become unreliable and on what platforms. Then we can potentially deal with these on a case-by-case basis.

Here is an example where I think it doesn't make sense to check the order. I haven't tested this but I would assume those two events can be swapped, depending on OS and filesystem.

rm("dir1/a")
expect_event(FileDeletedEvent("dir1/a"))
expect_event(DirModifiedEvent("dir1"))

nascheme · 2026-01-07T03:53:02Z

Some progress on improving this. It seems that on Linux (with inotify?) the ordering of events is deterministic. On Windows (at least on my Windows 10 VM) it is also deterministic. On my MacBook, it is not (I get events out of order and sometimes unexpected events). I'm working on an improved version of this patch.

nascheme · 2026-01-07T05:05:00Z

Revised version, passing on MacOS, Linux and Windows for me. I didn't test BSD. The events_checker() now confirms that the expected events exactly match the received events, if you are not on MacOS (Linux, Windows, BSD). On MacOS, it uses the more relaxed check, ensuring that at least the expected events are present but doesn't care about order. I think that's the best we can do when using the fsevents API.
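The two checking modes described here can be sketched as a single comparison function. This is an illustration under stated assumptions, not the PR's implementation: `events_match` is a hypothetical name, and events are hashable `(type_name, path)` tuples rather than watchdog event objects.

```python
from collections import Counter


def events_match(expected, received, strict):
    """Compare expected and received events in strict or relaxed mode."""
    if strict:
        # Strict: same events, same order, nothing extra.
        return list(received) == list(expected)
    # Relaxed: every expected event must be present at least as often as
    # expected; order and additional events are ignored. Counter
    # subtraction keeps only positive counts, i.e. what is missing.
    missing = Counter(expected) - Counter(received)
    return not missing
```

A caller could choose the mode from the platform, for example `strict = sys.platform != "darwin"`, which mirrors the macOS exception described in the comment.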

I switched to using it as a context manager, as suggested by Nathan. It only saves one line of code but it seems slightly cleaner to me. I also removed the DEBUG class variable and made it so you can call events_checker(verbose=True). I've found that's useful when developing or debugging a single test. Generally the verbose output is too much to have it always enabled, IMHO.

This branch has been rebased on top of the double-buffer winapi change. That's required to make the Windows tests reliable, based on my testing. Once GH-1152 is passing CI and is merged, I can rebase this again if needed or merge with "master".

nascheme · 2026-01-07T06:10:26Z

Are those [flaky markers] still needed?

I suspect they are not but I was going to leave them in until we can confirm, via CI runs, that they can be removed.

It's also not clear to me when one would use the new EventChecker vs the old EventQueue. Should all the tests that still use EventQueue get migrated?

It would probably be cleanest to migrate all tests. However, it takes some work since some test files don't use the same fixtures. In order to use events_checker(), the test needs to use the utils fixtures.

If not, maybe it's worth adding some text to docs/source/hacking.rst that explains when you'd want to use one or the other and also just generally explains how to work with the test suite.

I think hacking.rst should be updated in any case, suggesting to use the new fixture and why you would want to.

Reading over the PR, it occurs to me that you could save a bit of boilerplate by making the events_checker a context manager.

Good idea, changed.

I only got about halfway through test_emitter.py before running out of steam. If the missing checks weren't intentional it may be a good idea to go over test_emitter.py one more time to make sure all the old checks are still there.

Now that the fixture is checking for exactly the expected list of events, I think it's unlikely things were missed (otherwise the test would be failing). My only concern now is that different OS versions or filesystems might generate different sequences of events. I tested on MacOS Sonoma, Windows 10, and Debian Linux on an XFS filesystem. Perhaps if CI passes we can consider it good enough?

BoboTiG · 2026-01-07T09:05:18Z

That's way better with ordering preserved, thank you!


nascheme · 2026-01-07T20:57:44Z

CI testing will hopefully be fixed now. There were some MacOS failures because the CI runs return different events from what I get on my MacBook. E.g. for test_create my MacBook consistently gives:

            ec.add(FileCreatedEvent, "a")
            ec.add(DirModifiedEvent, ".")
            ec.add(FileModifiedEvent, "a")

But in CI it only gives the FileCreatedEvent.

On Windows, the test_create test was returning no events in CI. I think that's a kind of race condition with the ReadDirectoryChangesW() actually starting inside the worker thread. I added some synchronization that makes the race window smaller. The real fix would be overlapped IO or completion ports.
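The kind of synchronization mentioned here, waiting until the reader thread has started, can be sketched with `threading.Event`. This is the general pattern only and not the change made to watchdog's Windows code: `ReaderThread`, `started_reading`, `start_and_wait` and the `blocking_read` callable are hypothetical names.

```python
import threading


class ReaderThread(threading.Thread):
    """Hypothetical worker; 'blocking_read' stands in for the blocking OS call."""

    def __init__(self, blocking_read):
        super().__init__(daemon=True)
        self._blocking_read = blocking_read
        self.started_reading = threading.Event()

    def run(self):
        # Signal just before entering the blocking call. A small window
        # remains between set() and the call actually being active, so
        # this narrows the race rather than removing it.
        self.started_reading.set()
        self._blocking_read()

    def start_and_wait(self, timeout=5.0):
        # Start the thread and block until it reports that it is about
        # to read, or until the timeout expires.
        self.start()
        return self.started_reading.wait(timeout)
```

As the comment in `run()` notes, the signal is sent before the blocking call begins, which is consistent with the description above of making the race window smaller rather than closing it.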

The Windows test_move_nested_subdirectories_on_windows test was being unreliable, sometimes giving a DirModifiedEvent, "dir2/dir3" event and sometimes not. I changed the test to make that optional, by calling ec.allow_extra_events(). The checker is still requiring the expected events in the defined order and so the test is equivalent to what was being tested before this PR.
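Requiring the expected events in order while tolerating extras amounts to a subsequence check. The sketch below shows one way to express that; it is not taken from the PR, `in_order_with_extras` is a hypothetical name, and how `allow_extra_events()` is actually implemented is not shown in this thread.

```python
def in_order_with_extras(expected, received):
    """True if 'expected' occurs within 'received' in order; extras are allowed."""
    remaining = iter(received)
    # 'event in remaining' consumes the iterator up to the first match,
    # so each expected event must be found after the previous one.
    return all(event in remaining for event in expected)
```

With this check, an optional event such as the `DirModifiedEvent` for `dir2/dir3` can appear between two expected events without failing the test, while a reordering of the expected events still fails.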

Note that the events_checker() fixture added by this PR is generally more strict than what was previously being tested. Previously the unit tests would typically get events one-by-one from the queue and then check they match what is expected and in the order. This PR will get all available events from the queue. I think as long as this doesn't make the tests flaky, it's better to be strict.
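The difference between consuming events one-by-one and collecting everything available can be illustrated with a small drain helper. This is a sketch, not the fixture's code: `drain` and its `timeout` default are assumptions, and the real checker may decide when to stop reading differently.

```python
import queue


def drain(event_queue, timeout=0.1):
    """Collect events until the queue stays empty for 'timeout' seconds."""
    events = []
    while True:
        try:
            events.append(event_queue.get(timeout=timeout))
        except queue.Empty:
            # Nothing arrived within the timeout: treat the stream as finished.
            return events
```

Comparing the full drained list against the expected list is what makes the check stricter: an unexpected trailing event is noticed, whereas reading a fixed number of events one at a time would leave it in the queue.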

I still see one failure on Linux 3.13t but I think that's unrelated to this change. It is test_late_double_deletion() and that doesn't use the new fixture.

ngoldbaum · 2026-01-07T21:04:18Z

If 3.13t is annoying to support, it's ok to skip IMO. Anyone newly working with the free-threaded build will probably be using 3.14t.

nascheme mentioned this pull request Jan 6, 2026

Improve test reliability and fix multi-threading issues #1150

Draft

ngoldbaum reviewed Jan 6, 2026

nascheme force-pushed the events-checker branch from 5c16703 to 7cbc5ec Compare January 7, 2026 04:56

nascheme added 7 commits January 7, 2026 12:16

Wait for directory reader thread to start.

4b21b2f

fix: add events_checker() fixture, use for tests

09607de

This makes the unit tests more reliable by allowing events to be received in different orders and by allowing additional (non-expected) events.

Check for DirModifiedEvent on file create.

b158ea6

This matches the original test logic, before addition of the events_checker() fixture.

Make events_checker() validate order by default.

5fc59e3

Rather than allow events in any order, match the order of the events as added. On MacOS, fsevents does not produce events in a deterministic order so allow any ordering on that platform.

Use events_checker() as a context manager.

51603b3

This reduces the code a little. Use `ec` as the short local name of the checker instance. Add `verbose` argument to `events_checker()` that turns on verbose debugging output.

Fix lint warnings.

2221da6

Unit test fixes for MacOS.

93b8d86

Remove events that are sometimes not generated.

nascheme force-pushed the events-checker branch from 7cbc5ec to 93b8d86 Compare January 7, 2026 20:19

Fix unreliable test for Windows.

fd7d7b3

Fix lint warning, improve comment.

24aab60

nascheme marked this pull request as ready for review January 7, 2026 22:21

BoboTiG merged commit 3f8a12f into gorakhargosh:master Jan 8, 2026
30 checks passed
