Skip to content

Conversation

@nascheme
Copy link
Contributor

@nascheme nascheme commented Jan 6, 2026

This makes the unit tests more reliable by using a helper class to handle the checking of received events. On the MacOS platform, events are allowed to be received in different orders and additional (non-expected) events are also allowed. On other platforms (Linux, Windows, BSD), the checker confirms that exactly the events expected are received and confirms the ordering.

There is a new pytest fixture events_checker(). Typical usage is as follows:

with events_checker() as ec:
    ec.add(FileOpenedEvent, "a")
    ec.add(FileClosedNoWriteEvent, "a")

I did not change all unit tests to use this fixture, mostly just the ones in test_emitter. This change removes most of the pytest.mark.flaky() markers since the tests are more reliable.

@BoboTiG
Copy link
Collaborator

BoboTiG commented Jan 6, 2026

Hmm this seems like a critical change: events order is actually relevant to test. We need to be sure that the library will emit events in order, else we will have weird issues. And not testing that, we will move that implicit check to downstream projects: this is not OK.

I get it that tests are quite flaky, and this is a problem for the comminity.

The proposed change reduce the code to write, that's cool and more readable.

Honestly, I do not know what to do with that right now. Lets move forward on the rest of the stuff, and look back to this one later.

Maybe others have opinions too.

Anyway, thanks a lot the for effort you've put here. I am not throwing away everything but delaying for now.

Copy link
Contributor

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like flaky is only used in three places once this PR is applied:

goldbaum at Nathans-MBP in ~/Documents/watchdog on events-checker!
± rg flaky
requirements-tests.txt
2:flaky==3.8.1

tests/utils.py
94:        Provides some robustness for the otherwise flaky nature of asynchronous notifications.

tests/test_0_watchmedo.py
176:@pytest.mark.flaky(max_runs=5, min_passes=1)

tests/test_delayed_queue.py
10:@pytest.mark.flaky(max_runs=5, min_passes=1)
21:@pytest.mark.flaky(max_runs=5, min_passes=1)

Are those still needed?

It's also not clear to me when one would use the new EventChecker vs the old EventQueue. Should all the tests that still use EventQueue get migrated?

If not, maybe it's worth adding some text to docs/source/hacking.rst that explains when you'd want to use one or the other and also just generally explains how to work with the test suite.

Reading over the PR, it occurs to me that you could make save a bit of boilerplate by making the events_checker a context manager. Then there's no need to explicitly call check_events - exiting the context manager does that automatically.

I only got about halfway through test_mitter.py before running out of steam. If the missing checks weren't intentional it may be a good idea to go over test_emitter.py one more time to make sure all the old checks are still there.

tests/utils.py Outdated

class _EventsChecker:
# If True, output verbose debugging to stderr.
DEBUG = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a personal style thing but I'd tend to leave out disabled debug code once I have the code working. IMO code that is disabled statically and not tested is destined to inevitably bitrot.

Copy link
Contributor Author

@nascheme nascheme Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove it but when you have tests failing, it's extremely useful to have this debugging output to see what events are expected and what events are generated. Maybe it should just always be turned on. I'd like to keep the debug logic in any case since I think you'd likely re-write it down the road when adding a new test or trying to debug a failing one.

I tried changing the code to be DEBUG = os.environ.get('EVENTS_CHECKER_DEBUG') == '1' but it doesn't work with tox. It seems tox will sanitize the process environment before running tests.

expect_event(FileCreatedEvent(p("a")))

if not platform.is_windows():
expect_event(DirModifiedEvent(p()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did this check get intentionally deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I can put it back though. I think that event doesn't appear on other platforms as well so the condition might need to be if platform.is_linux() instead. I'll check. My logic was that you really care about the FileCreatedEvent event being generated.

checker.add(FileCreatedEvent, "a_\udce4")
if not platform.is_windows():
event = event_queue.get(timeout=5)[0]
assert os.path.normpath(event.src_path) == os.path.normpath(p(""))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this assert preserved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's checked by checker.add(DirModifiedEvent, ".")


event = event_queue.get(timeout=5)[0]
assert event.src_path in [p("dir1"), p("dir2")]
assert isinstance(event, DirModifiedEvent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this check intentionally skipped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new checking code is as follows and I think it matches the spirit of the original test. The order of the two DirModifiedEvents is not specified by the original test logic.

    checker = events_checker()
    if not platform.is_windows():
        checker.add(FileMovedEvent, "dir1/a", dest_path="dir2/b")
        checker.add(DirModifiedEvent, "dir1")
        checker.add(DirModifiedEvent, "dir2")
    else:
        checker.add(FileDeletedEvent, "dir1/a")
        checker.add(FileCreatedEvent, "dir2/b")
    checker.check_events()

assert event.src_path in [p("dir1"), p("dir2")]
assert isinstance(event, DirModifiedEvent)
checker.add(DirModifiedEvent, "dir1")
checker.add(DirModifiedEvent, "dir2")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only getting checked on Windows now, but there were checks outside this if statement that are missing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New code is as follows. Again I think that matches the spirit of the original test:

    checker = events_checker()
    if not platform.is_windows():
        checker.add(FileMovedEvent, "dir1/file", dest_path="dir2/FILE")
        checker.add(DirModifiedEvent, "dir1")
        checker.add(DirModifiedEvent, "dir2")
    else:
        checker.add(FileDeletedEvent, "dir1/file")
        checker.add(FileCreatedEvent, "dir2/FILE")
    checker.check_events()

if platform.is_windows():
expected_events = [a_deleted, d_created]

if platform.is_bsd():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't needed anymore because we no longer care about ordering, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup.

@pytest.mark.flaky(max_runs=5, min_passes=1, rerun_filter=rerun_filter)

checker = events_checker()
for _ in range(times):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the old code used times * 4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, using times is correct. We add 4 expected events each time through loop. Old code consumed events one-by-one and so needed the 4x.

DirCreatedEvent: times,
DirModifiedEvent: times * 2,
DirDeletedEvent: times,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way to get these count checks back using EventChecker?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checker is already doing that. New code is as follows:

    checker = events_checker()
    for _ in range(times):
        checker.add(DirCreatedEvent, "dir1/sub_dir1")
        checker.add(DirModifiedEvent, "dir1")
        checker.add(DirDeletedEvent, "dir1/sub_dir1")
        checker.add(DirModifiedEvent, "dir1")

The checker uses a list not a set for the expected events. So there will be times*2 occurrences of DirModifiedEvent expected. Using times rather than times*4 in the loop is correct.

@nascheme
Copy link
Contributor Author

nascheme commented Jan 6, 2026

Hmm this seems like a critical change: events order is actually relevant to test. We need to be sure that the library will emit events in order, else we will have weird issues. And not testing that, we will move that implicit check to downstream projects: this is not OK.

I'm not intimately familiar with the OS APIs being used but, based on my readings, we cannot assume they give you file change events in some specific order. It depends on the OS and even the underlying filesystem being used. So checking for event ordering in tests doesn't seem like a good idea. Also, I think telling users that events are returned in some defined order when the underlying API can't assure that seems bad. The users will build unreliable software as well.

I can understand that watchdog is doing a "best effort" in trying to return events in some order. So we want the unit tests to try to check that we haven't broken that.

I can modify the PR to keep checking order, that's not a hard change. I'll modify the events_checker() function to check ordering of events by default. An additional enhancement would be to check ordering based on the underlaying OS API. E.g. if Linux inotify does return events in a deterministic order, turn on the ordering check, otherwise don't care about order.

Anyway, thanks a lot the for effort you've put here. I am not throwing away everything but delaying for now.

That's fine and I appreciate the prompt feedback. It's a large PR and so I know that's a problem for reviewing. Also, the change in what we are actually checking makes it more difficult in determining if it's okay. What I could do to make progress is to turn on the ordering checking by default and then see which tests become unreliable and on what platforms. Then we can potentially deal with these on a case-by-case basis.

An example where I think it doesn't make sense to check the order. I haven't tested this but I would assume those two events can be swapped, depending on OS and filesystem.

rm("dir1/a")
expect_event(FileDeletedEvent("dir1/a"))
expect_event(DirModifiedEvent("dir1"))

@nascheme
Copy link
Contributor Author

nascheme commented Jan 7, 2026

Some progress on improving this. It seems that on Linux (with inotify?) the ordering of events is deterministic. For Windows (at least on my Windows 10 VM) they are also deterministic. On my MacBook, they are not (get both events out of order and sometimes unexpected events). I'm working on an improved version of this patch.

@nascheme
Copy link
Contributor Author

nascheme commented Jan 7, 2026

Revised version, passing on MacOS, Linux and Windows for me. I didn't test BSD. The events_checker() now confirms that the expected events exactly match the received events, if you are not on MacOS (Linux, Windows, BSD). On MacOS, it uses the more relaxed check, ensuring that at least the excepted events are present but doesn't care about order. I think that's the best with can do when using the fsevents API.

I switching to using it as a context manager, as suggested by Nathan. It only saves one line of code but it seems slightly cleaner to me. I also removed the DEBUG class variable and made it so you can call events_checker(verbose=True). I've found that's useful when developing or debugging a single test. Generally the verbose output is too much to have it always enabled, IMHO.

This branch has been rebased on top of the double-buffer winapi change. That's required to make the Windows tests reliable, based on my testing. Once GH-1152 is passing CI and is merged, I can rebase this again if needed or merge with "master".

@nascheme
Copy link
Contributor Author

nascheme commented Jan 7, 2026

Are those [flaky markers] still needed?

I suspect they are not but I was going to leave them in until we can confirm, via CI runs, that they can be removed.

It's also not clear to me when one would use the new EventChecker vs the old EventQueue. Should all the tests that still use EventQueue get migrated?

It would probably be cleanest to migrate all tests. However, it takes some work since some test files don't use the same fixtures. In order to use events_checker(), the test needs to use the utils fixtures.

If not, maybe it's worth adding some text to docs/source/hacking.rst that explains when you'd want to use one or the other and also just generally explains how to work with the test suite.

I think hacking.rst should be updated in any case, suggesting to use the new fixture and why you would want to.

Reading over the PR, it occurs to me that you could make save a bit of boilerplate by making the events_checker a context manager.

Good idea, changed.

I only got about halfway through test_mitter.py before running out of steam. If the missing checks weren't intentional it may be a good idea to go over test_emitter.py one more time to make sure all the old checks are still there.

Now that the fixture is checking for exactly the expected list of events, I think it's unlikely things were missed (otherwise the test would be failing). My only concern now is that different OS versions or filesystems might generate different sequences of events. I tested on MacOS Sonoma, Windows 10, Debian Linux on a XFS filesystem. Perhaps if CI passes we can consider it good enough?

@BoboTiG
Copy link
Collaborator

BoboTiG commented Jan 7, 2026

That's way better with ordering preserved, thank you!

This makes the unit tests more reliable by allowing events to be
received in different orders and by allowing additional (non-expected)
events.
This matches the original test logic, before addition of the
events_checker() fixture.
Rather than allow events in any order, match the order of the events as
added.  On MacOS, fsevents does not produce events in a determistic
order so allow any ordering on that platform.
This reduces the code a little.  Use `ec` as the short local name of
the checker instance.

Add `verbose` argument to `events_checker()` that turns on verbose
debugging output.
Remove events that are sometimes not generated.
@nascheme
Copy link
Contributor Author

nascheme commented Jan 7, 2026

CI testing will hopefully be fixed now. There was some MacOS failures because the CI runs return different events vs what I get on my MacBook. E.g. for test_create my MacBook consistently gives:

            ec.add(FileCreatedEvent, "a")
            ec.add(DirModifiedEvent, ".")
            ec.add(FileModifiedEvent, "a")

But in CI it only gives the FileCreatedEvent.

On Windows, the test_create test was returning no events in CI. I think that's a kind of race condition with the ReadDirectoryChangesW() actually starting inside the worker thread. I added some synchronization that makes the race window smaller. The real fix would be overlapped IO or completion ports.

The Windows test_move_nested_subdirectories_on_windows test was being unreliable, sometimes giving a DirModifiedEvent, "dir2/dir3" event and sometimes not. I changed the test to make that optional, by calling ec.allow_extra_events(). The checker is still requiring the expected events in the defined order and so the test is equivalent to what was being tested before this PR.

Note that the events_checker() fixture added by this PR is generally more strict than what was previously being tested. Previously the unit tests would typically get events one-by-one from the queue and then check they match what is expected and in the order. This PR will get all available events from the queue. I think as long as this doesn't make the tests flaky, it's better to be strict.

I see one failure yet on Linux 3.13t but I think that's unrelated to this change. It is test_late_double_deletion() and that doesn't use the new fixture.

@ngoldbaum
Copy link
Contributor

If 3.13t is annoying to support, it's ok to skip IMO. Anyone newly working with the free-threaded will probably be using 3.14t.

@nascheme nascheme marked this pull request as ready for review January 7, 2026 22:21
@BoboTiG BoboTiG merged commit 3f8a12f into gorakhargosh:master Jan 8, 2026
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants