Add E2E tests for the msentraid broker #659

denisonbarbosa · 2025-09-04T13:02:32Z

We've been needing those for a while now. Although they're not fully stable and automated yet, we should add the tests here to follow up with the other tasks and to discuss implementation details.

More details are explained in the individual commits.

UDENG-7424

codecov-commenter · 2025-09-04T13:07:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.33%. Comparing base (868dd8a) to head (1e6b111).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #659   +/-   ##
=======================================
  Coverage   78.33%   78.33%           
=======================================
  Files          22       22           
  Lines        1274     1274           
=======================================
  Hits          998      998           
  Misses        276      276


e2e-tests/common/utils.resource

e2e-tests/msentraid/common/broker.resource

e2e-tests/resources/authd-msentraid/broker.resource

e2e-tests/msentraid/common/BrowserWindow.py

3v1n0 · 2025-09-15T13:07:43Z

e2e-tests/msentraid/common/BrowserWindow.py

+    browser.send_key_taps([Gdk.KEY_Return])
+    browser.wait_for_stable_page()
+
+    sleep(5)


Should we wait here or in the caller?

I think we are sleeping in both... The sleep here is to make sure we wait for the request to fully finish before exiting.

Ok, that's acceptable... But we should never sleep this way in UI applications. We should rely on the main loop instead, so we can do something like:

    def wait_for_timeout(timeout):
        loop = GLib.MainLoop()
        def on_timeout():
            loop.quit()
            return False
        GLib.timeout_add(timeout, on_timeout)
        loop.run()

I feel like that is what the wait_for_stable_page function was supposed to be doing. Maybe we have a logic flaw somewhere?

wait_for_stable_page is only waiting if there are no page updates, we can maybe tune the timeout though.

3v1n0 · 2025-09-15T13:09:30Z

e2e-tests/msentraid/common/broker.resource

+    ${text} =    PlatformVideoInput.Read Text
+    ${user_code} =    StringUtils.FirstMatch    (https://)?microsoft.com/devicelogin\n((Login code: )?([A-Z0-9]+))    ${text}
+
+    Start Process    ${ENTRAID_COMMON}/BrowserWindow.py    ${username}@${domain}    %{E2E_PASSWORD}    ${user_code}    alias=RemoteAuth


Do we want to run it with RUNNING_OFFSCREEN or we prefer to see the UI to record it, just in case?

Sometimes it bugs out and types the password in the incorrect field (I couldn't understand why exactly), but it's something that we definitely do not want to record

Mh, I see... One thing I wanted to add is the ability to save the page content on failure, which we can do quite easily, but it was just a quick hack at the time :)

That would definitely be nice to have. I used the script pretty much as you wrote it, so we can definitely improve some stuff there (although I don't really have experience with the GTK flow)

Let's keep this open, and see if I can handle it :)

3v1n0

Great stuff, I hope it's quite stable but we have a great base now

e2e-tests/resources/authd/browser_window.py

3v1n0 · 2025-10-01T16:59:47Z

e2e-tests/msentraid/common/BrowserWindow.py

+    browser.send_key_taps([Gdk.KEY_Return])
+    browser.wait_for_stable_page()
+
+    sys.exit(0)


3v1n0 · 2025-10-01T17:01:53Z

e2e-tests/common/utils.resource

+# Run Command In GNOME Terminal
+#     [Arguments]    ${command}
+#     ${ampersand}    Create List    Shift_L    7
+#     Hid.Type String    clear
+#     Hid.Keys Combo    space
+#     Hid.Keys Combo    ${ampersand}
+#     Hid.Keys Combo    ${ampersand}
+#     Hid.Keys Combo    space
+#     Hid.Type String    ${command}
+#     Hid.Keys Combo    space
+#     Hid.Keys Combo    ${ampersand}
+#     Hid.Keys Combo    ${ampersand}
+#     Hid.Keys Combo    space
+#     Hid.Type String    clear
+#     Hid.Keys Combo    space
+#     Hid.Keys Combo    ${ampersand}
+#     Hid.Keys Combo    ${ampersand}
+#     Hid.Keys Combo    space
+#     Hid.Type String    echo cmd-finished
+#     # Sleep a bit to ensure everything gets typed properly
+#     BuiltIn.Sleep    2    # hacky
+#     Hid.Keys Combo    Return
+#     # Sleep a bit to avoid matching the text on the command itself
+#     BuiltIn.Sleep    2    # hacky
+#     Match Text    cmd-finished    600


Do we need to keep this?

oops, forgot to remove it

3v1n0 · 2025-10-01T17:02:29Z

e2e-tests/common/utils.resource

+    BuiltIn.Sleep    2
+    Match Text    cmd-finished    600


Isn't sleep already part of matching (600)? Why sleeping more?

Not really... We need the sleep to make sure we don't match the "cmd-finished" of the command itself

But didn't we clear it? I'd say we should do something like command & clear && wait && echo cmd-finished, so the extra sleep should not be needed

We do clear it, but there's a sync issue between the commands. If the VM command does not run fast enough and YARF progresses to the next instruction, it can match the text before the screen gets updated.

I see. BTW, please check with the new syntax and let's see whether we can hit such a race; if so, we can add the sleep back (even less than a second, though)

AFAIK, there isn't any other way to avoid this issue. It's the same thing that's done in the installer tests, if I'm not mistaken.

e2e-tests/common/utils.resource

e2e-tests/run_tests.sh

3v1n0 · 2025-10-01T17:32:55Z

e2e-tests/common/images/show-password.png

Is this used anywhere?

e2e-tests/common/images/circle-of-friends.png

e2e-tests/common/utils.resource

e2e-tests/common/BrowserWindow.py

e2e-tests/authd-msentraid/common/browser_login.py

3v1n0 · 2025-10-16T01:52:48Z

e2e-tests/resources/authd-msentraid/browser_login.py

+    if os.getenv("RUN_ONSCREEN") is None and "RUNNING_OFFSCREEN" not in os.environ:
+        os.execv(
+            "/usr/bin/env",
+            [
+                "/usr/bin/env",
+                "RUNNING_OFFSCREEN=1",
+                "GDK_BACKEND=x11",
+                "xvfb-run",
+                "-a",
+                sys.executable,
+            ]
+            + sys.argv,
+        )


Can this be moved to an utility function too?

I think it could, but the code spawns a subprocess of itself, so I think it's good to have an "in your face" approach with it, wdyt?

It was mostly to avoid copying this for each test case we'll have (vanilla broker)
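If the re-exec does end up shared between test cases, one possible shape is a small helper that keeps the decision separate from the actual os.execv call, so the "in your face" part stays a single, explicit call at the top of each script. This is only a sketch: offscreen_argv and reexec_offscreen_if_needed are hypothetical names, while the environment variables and the xvfb-run arguments are copied from the hunk above.

```python
import os
import sys
from typing import List, Mapping, Optional, Sequence


def offscreen_argv(
    environ: Mapping[str, str],
    executable: str,
    argv: Sequence[str],
) -> Optional[List[str]]:
    """Return the argv to re-exec under xvfb-run, or None if no re-exec is needed."""
    # Same condition as in the hunk, negated: stay in this process when the
    # caller asked for an on-screen run or when we are already offscreen.
    if environ.get("RUN_ONSCREEN") is not None or "RUNNING_OFFSCREEN" in environ:
        return None
    return [
        "/usr/bin/env",
        "RUNNING_OFFSCREEN=1",
        "GDK_BACKEND=x11",
        "xvfb-run",
        "-a",
        executable,
        *argv,
    ]


def reexec_offscreen_if_needed() -> None:
    """Replace the current process with an xvfb-run wrapped copy when required."""
    new_argv = offscreen_argv(os.environ, sys.executable, sys.argv)
    if new_argv is not None:
        # os.execv does not return on success: the process image is replaced.
        os.execv(new_argv[0], new_argv)
```

Each script would then only call reexec_offscreen_if_needed() before creating any window, which keeps the self-re-exec visible without duplicating the argument list.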

3v1n0 · 2025-10-16T01:54:13Z

e2e-tests/resources/authd-msentraid/browser_login.py

+    if len(sys.argv) < 4:
+        print("Usage: BrowserWindow.py <username> <password> <code>")
+        sys.exit(1)
+
+    username = sys.argv[1]
+    password = sys.argv[2]
+    device_code = sys.argv[3]


Not a big deal, but I feel you can have this with 4 lines of argparse

I thought about it, but it's not worth adding the dependency for something as simple as this, especially for a script that is not intended to be run by people.

FYI argparse is part of the standard library

Ok, it definitely shows that it's been a while since I've written Python 😂

e2e-tests/common/utils.resource

e2e-tests/testing.md

3v1n0 · 2025-10-16T02:09:22Z

e2e-tests/vm/cloud-init-template.yaml

+  # Disable automatic updates and remove some unnecessary packages
+  - apt-get remove -y update-manager gnome-initial-setup
+  # We don't need Firefox for the tests, so we can remove it to save some space
+  - snap remove firefox --purge


I feel you can save even more by dropping the whole gnome-* runtimes

Not necessary, IMO. The cached image is already small enough. If we need more space in the future, we could do that.

Check it out, since these are quite big eh:

    -rw------- 2 root root 517M set 27 02:27 /var/lib/snapd/snaps/gnome-42-2204_226.snap
    -rw------- 2 root root 619M set  9 10:23 /var/lib/snapd/snaps/gnome-46-2404_125.snap

3v1n0 · 2025-10-16T02:21:22Z

e2e-tests/vm/runner-cloud-cfg.yaml

+  # We don't need Firefox for the tests, so we can remove it to save some space
+  - snap remove firefox --purge
+  # Configure SSH to allow interactive authentication
+  - sed -i 's/KbdInteractiveAuthentication no/KbdInteractiveAuthentication yes/' /etc/ssh/sshd_config


Maybe we can use the exact code that is in https://github.com/ubuntu/authd/blob/main/docs/reference/cloud-init-deploy.md ?

Sure. I'll update it (didn't even know we had a guide for cloud-init deployment)

e2e-tests/common/BrowserWindow.py

3v1n0 · 2025-10-16T02:23:06Z

e2e-tests/resources/authd/utils.resource

-    Hid.Keys Combo    Control_L    Alt_L    t
-    Match Text    @ubuntu:~$    15
+Open Terminal
+    Run Command    /usr/bin/x-terminal-emulator


We can drop /usr/bin and maybe call xdg-terminal, since we are not targeting noble here anyway

Even though we are not targeting Noble (yet, at least), I think it's better to keep the tests as release-independent as possible. If we change that, we would need to have multiple versions of the resource files (and although this might eventually become the case in the future, we should avoid it for now)

As you prefer; the /usr/bin/ drop still stands :)

Allows to debug the browser login via the webview. Replaces the functionality of the RUN_ONSCREEN environment variable.

We were frequently seeing errors like the password being pasted into the username field. Let's try to make this more robust by not only waiting for the page to stop loading but also for specific text to be visible.

adombeck · 2025-10-24T15:43:23Z

e2e-tests/resources/authd/utils.resource

+
+
+Update System Time
+    Run Command In Terminal    sudo service chronyd restart && sudo chronyc waitsync


Is chrony even installed? It's not installed by default on Ubuntu AFAICT, and I don't see it being installed explicitly, so if it is installed, I wonder which package pulls it in as a dependency.

Can't we just use systemd-timesyncd?

Suggested change

-    Run Command In Terminal    sudo service chronyd restart && sudo chronyc waitsync
+    Run Command In Terminal    sudo systemctl restart systemd-timesyncd.service

It's an Ubuntu server image, so it comes with chrony rather than systemd-timesyncd and they are mutually exclusive

Ah ok, I didn't know that. That seems like the right approach then. Let's just use the systemd executable instead of the sys V init one to restart the service:

Suggested change

-    Run Command In Terminal    sudo service chronyd restart && sudo chronyc waitsync
+    Run Command In Terminal    sudo systemctl restart chronyd && sudo chronyc waitsync

3v1n0 · 2025-10-24T17:18:34Z

e2e-tests/resources/authd-msentraid/browser_login.py

+def screenshot_window(window: Gtk.Window, filename: str):
+    # Get widget allocation (size)
+    alloc = window.get_allocation()
+    width, height = alloc.width, alloc.height


I guess we can avoid these repetitions

3v1n0 · 2025-10-24T17:19:35Z

e2e-tests/resources/authd-msentraid/browser_login.py

+    # Get widget allocation (size)
+    alloc = window.get_allocation()


Comment here is unneeded, it's more than clear.

However, maybe we can assert that window.is_drawable() and window.get_mapped()

3v1n0 · 2025-10-24T17:20:35Z

e2e-tests/resources/authd-msentraid/browser_login.py

+    screenshot_window(browser, os.path.join(screenshot_dir, "05-device-login-success.png"))
+
+
+def screenshot_window(window: Gtk.Window, filename: str):


I'd call it: save_window_snapshot

3v1n0 · 2025-10-24T17:25:54Z

e2e-tests/resources/authd-msentraid/browser_login.py

+    if len(sys.argv) < 4:
+        print(f"Usage: {sys.argv[0]} <username> <password> <code> [<output-dir>]")
+        sys.exit(1)
+
+    username = sys.argv[1]
+    password = sys.argv[2]
+    device_code = sys.argv[3]
+    output_dir = sys.argv[4] if len(sys.argv) > 4 else "."
+    screenshot_dir = os.path.join(output_dir, "webview")
+    os.makedirs(screenshot_dir, exist_ok=True)
+
+    try:
+        login(username, password, device_code, screenshot_dir)
+    finally:
+        write_video(screenshot_dir, os.path.join(output_dir, "webview_recording.webm"))


Suggested change

-    if len(sys.argv) < 4:
-        print(f"Usage: {sys.argv[0]} <username> <password> <code> [<output-dir>]")
-        sys.exit(1)
-
-    username = sys.argv[1]
-    password = sys.argv[2]
-    device_code = sys.argv[3]
-    output_dir = sys.argv[4] if len(sys.argv) > 4 else "."
-    screenshot_dir = os.path.join(output_dir, "webview")
-    os.makedirs(screenshot_dir, exist_ok=True)
-
-    try:
-        login(username, password, device_code, screenshot_dir)
-    finally:
-        write_video(screenshot_dir, os.path.join(output_dir, "webview_recording.webm"))
+    parser = argparse.ArgumentParser()
+    parser.add_argument("username")
+    parser.add_argument("password")
+    parser.add_argument("device_code")
+    parser.add_argument("--output-dir", required=False, default=os.path.realpath(os.curdir))
+    args = parser.parse_args()
+    screenshot_dir = os.path.join(args.output_dir, "webview")
+    os.makedirs(screenshot_dir, exist_ok=True)
+    try:
+        login(args.username, args.password, args.device_code, screenshot_dir)
+    finally:
+        write_video(screenshot_dir, os.path.join(args.output_dir, "webview_recording.webm"))

3v1n0 · 2025-10-24T17:30:10Z

e2e-tests/resources/authd-msentraid/browser_login.py

+def write_video(screenshot_dir: str, video_path: str):
+    subprocess.check_call([
+        "ffmpeg",
+        "-y",
+        "-framerate", "1",
+        "-pattern_type", "glob",
+        "-i", f"{screenshot_dir}/*.png",
+        video_path,
+    ])


While this is taking the snapshots and could be fine for some basic debugging, it doesn't catch potential problems IMHO as it relies on us taking snapshots.

So, you can instead just call save_window_snapshot on the window's draw signal, so that you can write all the frames (or a subset of them; maybe one per second is fine), and you'll get a proper video
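Capturing on every draw could produce far more frames than the video needs, so a draw handler would probably want to throttle itself. A minimal sketch of the "one per second" idea, kept independent of GTK: FrameThrottle is a hypothetical helper, and the draw handler that would consult it is not shown.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Allow at most one capture per `min_interval` seconds."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._last_capture: Optional[float] = None

    def should_capture(self) -> bool:
        """Return True, and remember the time, if enough time has passed."""
        now = self._clock()
        if (
            self._last_capture is not None
            and now - self._last_capture < self._min_interval
        ):
            return False
        self._last_capture = now
        return True
```

The handler connected to the draw signal would call should_capture() first and only save a snapshot when it returns True.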

3v1n0 · 2025-10-24T17:33:16Z

e2e-tests/resources/authd/browser_window.py

+        def on_timeout():
+            nonlocal poll_id
+            if poll_id:
+                GLib.source_remove(poll_id)


Suggested change

-                GLib.source_remove(poll_id)
+                GLib.source_remove(poll_id)
+                poll_id = 0

3v1n0 · 2025-10-24T17:33:41Z

e2e-tests/resources/authd/browser_window.py

+        poll_id = None
+        timeout_id = None


They're int values

Suggested change

-        poll_id = None
-        timeout_id = None
+        poll_id = 0
+        timeout_id = 0

3v1n0 · 2025-10-24T17:39:13Z

e2e-tests/resources/authd/browser_window.py

+                     "document.body.innerText.indexOf(%s) !== -1)"
+                 ) % json.dumps(text)
+            self.web_view.run_javascript(js, None, on_js_finished, None)
+            return True  # keep polling until callback quits the loop


Instead of doing this, I'd return False here and restart the polling (calling poll_id = GLib.timeout_add(poll_interval_ms, poll_fn)) after the finish function has returned, so that there's no possibility of polling twice while the same async function has not returned yet.
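The control flow being asked for can be sketched without GTK. In the sketch below, schedule stands in for a one-shot GLib.timeout_add and start_check for an asynchronous call such as run_javascript; both are hypothetical callables passed in by the caller, and the overall timeout and cancellation handling are left out.

```python
from typing import Callable

# Hypothetical stand-ins for the real APIs:
#   schedule(delay_ms, callback): run callback once, after delay_ms
#   start_check(on_done): start an async check; call on_done(found) when it ends
Schedule = Callable[[int, Callable[[], None]], None]
StartCheck = Callable[[Callable[[bool], None]], None]


def poll_serially(
    schedule: Schedule,
    start_check: StartCheck,
    interval_ms: int,
    on_found: Callable[[], None],
) -> None:
    """Poll until a check succeeds, never starting a check while one is pending."""

    def poll() -> None:
        # One-shot: the timer is not re-armed here (the "return False" case).
        start_check(on_done)

    def on_done(found: bool) -> None:
        if found:
            on_found()
        else:
            # Re-arm only after the previous check has completed.
            schedule(interval_ms, poll)

    schedule(interval_ms, poll)
```

Because the next timer is only created from on_done, at most one check is ever in flight, whatever the poll interval is.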

3v1n0 · 2025-10-24T17:40:53Z

e2e-tests/resources/authd/browser_window.py

+                    GLib.source_remove(timeout_id)
+                if poll_id:
+                    GLib.source_remove(poll_id)
+                loop.quit()


Suggested change

-                loop.quit()
+        def on_js_finished(web_view, result, user_data):
+            nonlocal poll_id, timeout_id, found
+            try:
+                res = web_view.run_javascript_finish(result)
+                js_value = res.get_js_value()
+                found = bool(js_value.to_boolean())
+            except Exception as e:
+                print(f"Failed checking page JS: {e}", file=sys.stderr)
+                pass
+            if found:
+                if timeout_id:
+                    GLib.source_remove(timeout_id)
+                    timeout_id = 0
+                GLib.source_remove(poll_id)
+                poll_id = 0
+                loop.quit()

3v1n0 · 2025-10-24T17:55:38Z

e2e-tests/resources/authd-msentraid/browser_login.py

+
+    browser.web_view.load_uri("https://microsoft.com/devicelogin")
+
+    browser.wait_for_text_visible("Enter code to allow access")


I think we should force the browser to use English; otherwise all of these are hard to test locally. So we should also set locale.setlocale(locale.LC_ALL, "C")

That way, running the browser in any setup works with the website defaults, without the pages being translated
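A minimal sketch of what that could look like at the top of the script. force_c_locale is a hypothetical helper name, and exporting the environment variables as well is an assumption meant to cover child processes; whether the web view's language negotiation actually follows the process locale is not confirmed here.

```python
import locale
import os


def force_c_locale() -> str:
    """Switch this process to the "C" locale and return the active locale name."""
    # Assumption: exporting these lets child processes inherit the same setting.
    os.environ["LC_ALL"] = "C"
    os.environ["LANGUAGE"] = "C"
    # setlocale returns the name of the locale that is now in effect.
    return locale.setlocale(locale.LC_ALL, "C")
```

It would presumably have to run early, before the browser window is created.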

Avoid having to track the browser screenshots indexes manually

Ideally we should only look for visible items here, but this is fine for now. However, we can make the check a one-liner and, meanwhile, ensure that the value we're looking for is properly parsed, so dump it as JS and parse it back as such

We were triggering a warning there for this reason, so fix that

The main goal of this commit was to ensure that the poll function only runs *after* the previous request has actually been completed. This was not the case with the current code, which polled every few milliseconds without being sure that the previous JS request had actually completed.

So now, inject the JS code after a small timeout. If that fails with an error we just throw (exiting the loop), otherwise:
- if the JS returns a true value, we can quit the loop
- if the JS evaluates to false, we retry, again once the poll timeout is over

Use a cancellable to control the async function cancellation, as we need to ensure that the underlying thread is stopped when cancelling, but we also need to stop the loop and perform other cleanup actions

This is a generic feature that does not depend on the actual test, so move it to the main window implementation

Snapshots are most important when a failure happens, so let's record one during that phase

Gdk provides an optimized way to create image surfaces, so use it when possible, although that's not always the case as pygobject bindings are broken in older versions, so add a fallback in case. In any case, respect the surface scaling

Use the GTask thread pools to perform the IO events on threads rather than saving in the main thread

Draw events in GTK 3 are special, since that's where widgets actually draw into the surfaces, so we should use them with care and avoid slowing them down.

We now also have a problem with them: we're explicitly drawing the window to take snapshots, but this triggers the "draw" signal too, which creates problems when waiting for the page to stabilize.

Since we rely on the signal to check for UI changes, and we may need to use it in the future, it's better to abstract the signal connection so that:
- We connect only once to it
- Multiple handlers can be added at runtime
- Handlers are called in idle functions instead
- No handler is called when explicitly drawing the window for creating a snapshot
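The four properties described in this commit message can be sketched as a small dispatcher. This is not the code from the commit: DrawDispatcher and its method names are hypothetical, and idle_add is injected so the sketch runs without GTK (in the real window it would presumably wrap GLib.idle_add).

```python
from typing import Callable, List

Handler = Callable[[], None]


class DrawDispatcher:
    """Fan a single "draw" notification out to many handlers, deferred to idle."""

    def __init__(self, idle_add: Callable[[Handler], None]) -> None:
        self._idle_add = idle_add
        self._handlers: List[Handler] = []
        self._suppressed = False

    def add_handler(self, handler: Handler) -> None:
        """Register a handler at runtime; no new signal connection is made."""
        self._handlers.append(handler)

    def on_draw(self) -> None:
        """The only callback that would be connected to the draw signal."""
        if self._suppressed:
            return  # this draw was caused by our own snapshot
        for handler in list(self._handlers):
            # Defer the work so the draw itself is not slowed down.
            self._idle_add(handler)

    def snapshot(self, draw_window: Callable[[], None]) -> None:
        """Draw the window explicitly without notifying any handler."""
        self._suppressed = True
        try:
            draw_window()
        finally:
            self._suppressed = False
```

With this shape, the page-stability check and the recorder would both register through add_handler, and taking a snapshot would no longer look like a page update.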

Support recording the window fully, instead of on request only, so that the video we produce includes the whole session

While show it only if a specific option is passed

denisonbarbosa force-pushed the msentraid-e2e-tests branch from 6d4630c to dbac5b5 Compare September 4, 2025 13:03

denisonbarbosa force-pushed the msentraid-e2e-tests branch from dbac5b5 to ac663de Compare September 8, 2025 10:50