Refactor FileWatchSensor to remove logshipper #5096


Draft: blag wants to merge 44 commits into master from the remove-logshipper branch
Conversation

@blag (Contributor) commented Nov 29, 2020

This PR refactors the FileWatchSensor in the linux pack to remove its dependency on logshipper and use watchdog instead.

Upstream logshipper hasn't been touched in years, so we switched to our own fork. That hasn't been well maintained either, and the Tail implementation, which is all we use from it, is a bit of a mess on top of depending on pyinotify, which has itself been unmaintained for even longer.

This PR borrows some code from the previous sensor, but breaks it up into well-structured classes that each have a single focus, possibly allowing this code to be reused elsewhere. Additionally, the sensor Python script can be run directly, which should help it serve as an example of how to write a "good" sensor.

The watchdog project is actively maintained, supports Linux, macOS, BSD, and Windows, and includes a polling fallback implementation. The pyinotify package refuses to even install on macOS (not being able to run there is understandable, but preventing installation is too limiting IMO).
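
For reference, the new sensor drives watchdog roughly like this (a minimal sketch, not the actual sensor code; the LineHandler name and the watched path are illustrative):

import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class LineHandler(FileSystemEventHandler):
    """Illustrative handler; the real sensor wraps this logic in its own classes."""

    def on_modified(self, event):
        # Called whenever the watched file (or its parent directory) changes;
        # the sensor reads any newly appended lines at this point.
        print(f"modified: {event.src_path}")


observer = Observer()  # Observer picks a platform-appropriate backend, with a polling fallback
observer.schedule(LineHandler(), path="/var/log/syslog", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()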

I realize that it's a bit odd to make the linux pack depend less on Linux-specific features, but since pyinotify refuses to even install on macOS, I couldn't do simple things like run make requirements without running Linux in a full Docker or VM environment.

This PR also fixes a bug carried over from logshipper (it has to do with lines that do not end in a newline: the previous code only kept the last character, while the new code keeps the whole line), and also presents a workaround for another possible bug in the comments.
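
To illustrate the newline bug (a sketch of the idea only, not the PR's actual code): when the appended data does not end in a newline, the old logic kept only the last character of the unterminated line, while the new logic carries the whole partial line forward until the rest of it arrives.

def read_new_lines(fp, partial):
    """Return complete new lines from fp, carrying a trailing partial line forward."""
    data = partial + fp.read()
    lines = data.split("\n")
    # split() leaves "" at the end when data ends with a newline, otherwise it
    # leaves the unterminated partial line; either way, keep the whole thing
    # (not just its last character) as the carry-over for the next read.
    partial = lines.pop()
    return [line for line in lines if line], partial

Each call would be used as new_lines, partial = read_new_lines(fp, partial), starting with partial = "".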

Currently being held up by #5095.

@blag blag added this to the 3.4.0 milestone Nov 29, 2020
@pull-request-size pull-request-size bot added the size/L PR that changes 100-499 lines. Requires some effort to review. label Nov 29, 2020
@blag blag force-pushed the remove-logshipper branch from 3b77378 to 0a2f6ae on December 12, 2020 04:39
@blag blag force-pushed the remove-logshipper branch from 285c6ba to d175507 on December 13, 2020 01:24
@arm4b arm4b requested a review from Kami December 14, 2020 12:00
@Kami (Member) commented Dec 16, 2020

Overall, I'm fine with the change, but some tests would be good :)

@pull-request-size pull-request-size bot added size/XXL PR that changes 1000+ lines. You should absolutely split your PR into several. and removed size/L PR that changes 100-499 lines. Requires some effort to review. labels Dec 22, 2020
@blag (Contributor, Author) commented Dec 22, 2020

Just running these tests gives us excellent coverage:

................................................
Name                   Stmts   Miss Branch BrPart  Cover
--------------------------------------------------------
file_watch_sensor.py     221      8     68     11    93%
----------------------------------------------------------------------
Ran 48 tests in 14.800s

OK

Almost all of the untested code was in the sensor itself.

@arm4b arm4b requested a review from Kami December 22, 2020 23:56
@blag (Contributor, Author) commented Dec 25, 2020

I tested this manually, both on macOS bare metal and in a Docker container (running on macOS).

contrib/linux/Dockerfile:

FROM ubuntu:18.04

# System packages needed to fetch and build the test dependencies
RUN apt update && apt install --yes git make python3-dev python3-pip

# Test-only Python dependencies
RUN python3 -m pip install coverage eventlet mock nose watchdog

# The pack source is mounted here at runtime
VOLUME /code

WORKDIR /code
ENV PYTHONPATH=actions:sensors
ENV NOSE_COVER_CONFIG_FILE=.coveragerc
CMD nosetests --with-coverage --cover-branches --cover-html tests/test_file_watch_sensor.py

And the coverage configuration, contrib/linux/.coveragerc:

[run]
include = sensors/file_watch_sensor.py

[report]
include = sensors/file_watch_sensor.py

Commands (on bare metal):

cd contrib/linux
docker build --tag contrib-linux-pack-test .
docker run --rm --volume $(pwd):/code --workdir /code contrib-linux-pack-test

If you want to debug anything, run:

docker run -it --rm --volume $(pwd):/code --workdir /code contrib-linux-pack-test /bin/bash

Commands (within running Docker container):

# run a single test (specified at the bottom of test_file_watch_sensor.py)
python3 tests/test_file_watch_sensor.py

# run all tests and report coverage results in HTML
nosetests --with-coverage --cover-branches --cover-html tests/test_file_watch_sensor.py

Then you can view the coverage results in your web browser.

I'm having an issue with the Travis tests where inotify events don't seem to be delivered. I'll have to debug that. But these testing instructions should suffice for anybody else who is interested in testing this out.
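
If it turns out the CI environment really doesn't deliver inotify events, one thing worth trying (purely a hypothesis, not something this PR currently does) is forcing watchdog's polling backend for the tests, since it is a drop-in replacement for the default Observer:

# Hypothetical workaround for environments where inotify events are unreliable;
# PollingObserver exposes the same schedule()/start() API as Observer.
from watchdog.observers.polling import PollingObserver as Observer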

@cognifloyd (Member) commented:

The requirements compile issues have been resolved.
The pylint error about importing pika in a tools/ script has been resolved.

Now there's one consistent failure left in the unit tests, though I don't know why this PR would cause it:

======================================================================
1) FAIL: test_process_error_handling (tests.unit.test_workflow_engine.WorkflowExecutionHandlerTest)
----------------------------------------------------------------------
   Traceback (most recent call last):
    virtualenv/lib/python3.6/site-packages/mock/mock.py line 1346 in patched
      return func(*newargs, **newkeywargs)
    st2actions/tests/unit/test_workflow_engine.py line 203 in test_process_error_handling
      self.assertEqual(t1_ex_db.status, wf_statuses.FAILED)
   AssertionError: 'succeeded' != 'failed'
   - succeeded
   + failed

@cognifloyd (Member) commented:

Well, I can't reproduce that test failure in CI. It might be related to #5358, which was just merged and touched the test that was failing. But it's not failing anymore. If it shows up again, we can look into it.

Anyway, all tests are green now.

@cognifloyd (Member) commented Oct 15, 2021

I'm currently testing this.

  • ✅ The first issue is that the README is not correct (and wasn't correct for the previous sensor either). You do NOT add a list of files to watch in the pack config. Instead, you must create a rule that uses the linux.file_watch.line trigger and define the file_path trigger parameter in that rule (see the example rule after this list).

  • Next issue: Using the original sensor, I've tried several variations on the rule, but I don't see any trigger instances or rule enforcements. So, the status quo does not work for me. Interestingly, the sensorcontainer logs do show that it adds and watches the correct file.
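
For the first point above, the kind of rule the README should document looks roughly like this (a sketch; the rule name, pack, action, and file path are placeholders, and the relevant parts are the linux.file_watch.line trigger type and its file_path parameter):

---
name: watch_syslog_lines          # placeholder rule name
pack: examples                    # placeholder pack
description: Run an action for each new line appended to the watched file
enabled: true

trigger:
  type: linux.file_watch.line
  parameters:
    file_path: /var/log/syslog    # the file the sensor should tail

criteria: {}

action:
  ref: core.local                 # placeholder action
  parameters:
    cmd: echo "{{ trigger.line }}"   # "line" is the payload field emitted per new line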

@cognifloyd (Member) commented Oct 15, 2021

  • ✅ Next issue: If a rule is updated (e.g. the file path being watched is changed), then the sensor needs to handle the update and change the watched path. (Testing the sensor in this PR.)

@cognifloyd (Member) commented:

OK. With debug logging (using the sensor from this PR), I can see that:

  • TailManager.run() logs its Running TailManager line
  • FileWatchSensor.add_trigger() gets called as expected due to a rule that uses the appropriate parametrized trigger
    • tail_manager.tail_file() gets called
      • SingleFileTail inits correctly
        • opens the file
        • runs observer.schedule() to start the file and parent dir watches
    • tail_manager.start() gets called logging its Starting TailManager line
      • the observer.start() method does not seem to be returning. I do not see the Started Observer, emitters log line.
        • observer.start() loops through its emitters, calling emitter.start(), but that is already called in observer.schedule() so that's probably not it
        • observer.start() also calls BaseThread.start() (via super(); approximate MRO: BaseObserver → EventDispatcher → watchdog.utils.BaseThread)
        • BaseThread(threading.Thread).start() calls self.on_thread_start()
          • This inits an InotifyBuffer which is itself a BaseThread.
            • inotify_buffer is getting the file change notifications correctly because its log lines are showing up
        • BaseThread(threading.Thread).start() should finally call threading.Thread.start(self) but I can't tell if this gets called or if it completes
    • so, add_trigger never returns - this is probably one reason updates don't happen.

observer is a watchdog.observers.Observer instance.

We might also need to modify:

  • TailManager.run() as it uses time.sleep ... not sure if we need to use eventlet's version

@cognifloyd cognifloyd marked this pull request as draft October 15, 2021 04:32
@cognifloyd (Member) commented:

Since this doesn't work with eventlet yet, I marked it as draft.

@cognifloyd (Member) commented:

Using eventlet.debug.spew(), it looks like watchdog's observer is blocking on trying to read events with os.read(self._inotify_fd, event_buffer_size). If the thread patching had worked, it would have switched contexts back to something else (another greenlet / thread).

When I modify the file, this forces it to continue on to the next line (because it was able to read the fd). Then it seems to get lost in the logger.

Subsequent file edits never spew anything else from watchdog. Instead there are some lines from eventlet.green.threading and a loop in eventlet.patcher and a couple lines from threading about a limbo lock... Something is not respecting eventlet. Not sure what.
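
For anyone reproducing this, the spew tracing comes from something like the following near the top of the sensor process (where exactly to call it depends on how the sensor is launched):

import eventlet

# Monkey patch first, then turn on eventlet's line-by-line tracing so you can
# see where a greenthread blocks instead of yielding back to the hub.
eventlet.monkey_patch()

import eventlet.debug
eventlet.debug.spew()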

Lockfile diff: lockfiles/st2.lock [st2]

==                      Added dependencies                      ==

  watchdog                       3.0.0

==                     Removed dependencies                     ==

  logshipper                     0.0.0
  pyinotify                      0.9.6
@cognifloyd (Member) commented:

...

  • tail_manager.start() gets called logging its Starting TailManager line

    • the observer.start() method does not seem to be returning. I do not see the Started Observer, emitters log line.

      • observer.start() loops through its emitters, calling emitter.start(), but that is already called in observer.schedule() so that's probably not it

      • observer.start() also calls BaseThread.start() (via super(); approximate MRO: BaseObserver → EventDispatcher → watchdog.utils.BaseThread)

      • BaseThread(threading.Thread).start() calls self.on_thread_start()

        • This inits an InotifyBuffer which is itself a BaseThread.

          • inotify_buffer is getting the file change notifications correctly because its log lines are showing up
      • BaseThread(threading.Thread).start() should finally call threading.Thread.start(self) but I can't tell if this gets called or if it completes

  • so, add_trigger never returns - this is probably one reason updates don't happen.

observer is a watchdog.observers.Observer instance.

Watchdog uses threading. But the threading module is not getting monkey patched. I tried adjusting the monkey patching, monkey_patch(patch_thread=True), but that made no difference.

Actually, it looks like threading is partially getting monkey patched, but adjusting that monkey patch call didn't change anything.

For reference, I used this to determine the state of monkey patching, just before the observer "thread" is supposed to start. In every scenario I tried, I got eventlet.green.threading and False, so it is monkey patched, but eventlet doesn't think it is. 🤷 So, this is going to have to wait until we can get rid of eventlet. 😭

    def start(self):
        if self.tails and not self.started:
            self.logger.debug("Starting TailManager")
            import threading
            self.logger.debug(f"threading monkey_patch: {threading.current_thread.__module__}")
            import eventlet
            self.logger.debug(f"threading monkey_patch: {eventlet.patcher.is_monkey_patched(threading)}")

            self.observer.start()

We might also need to modify:

  • TailManager.run() as it uses time.sleep ... not sure if we need to use eventlet's version

I tested using eventlet.sleep, but that didn't make a difference because the context never goes back to that loop.
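
The swap itself was just the sleep call in that loop, roughly (an illustrative stand-in, not the actual TailManager code):

import eventlet

def run_loop(logger):
    # Stand-in for TailManager.run(): eventlet.sleep() always yields to the
    # eventlet hub, whereas time.sleep() only cooperates if the time module
    # was successfully monkey patched.
    logger.debug("Running TailManager")
    while True:
        eventlet.sleep(1)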

Labels: enhancement, external dependency, packs, refactor, size/XXL
6 participants