Skip to content

Add GTK4 TextInput support #3239

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 36 commits into
base: main
Choose a base branch
from
Open

Conversation

danyeaw
Copy link
Member

@danyeaw danyeaw commented Mar 6, 2025

Adds TextInput widget for GTK4 as part of #3069.

I'm getting a local test cleanup failure when trying to check that the weakref is None that I haven't been able to resolve.

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • I will abide by the code of conduct

image

@danyeaw
Copy link
Member Author

danyeaw commented Mar 6, 2025

It looks like the test cleanup errors are reproducible on CI. Things I have tried so far:

  1. Define a __del__ method like:
def __del__(self):
    """Break reference cycles with GTK4 controllers."""
        self.native.remove_controller(self.focus_controller)
        self.native.remove_controller(self.key_controller)
                
        self.focus_controller = None
        self.key_controller = None

I also tried other combinations of this, by forcing a destroy with self.native.destroy(), and also running del self.native and the controllers.
2. Similar to 1, I also tried forcing a disconnect of the event handlers first during the __del__:

            for handler_id in getattr(self.focus_controller, "handler_ids", []):
                self.focus_controller.disconnect(handler_id)
  1. Modify how we create and connect the controllers to avoid storing direct references to them by only defining them as key_controller and focus_controller instead of self.key_controller and self.focus_controller
  2. Modify the cleanup declaration by updating the test_textinput module to add gtk a skip or xfail with: test_cleanup = build_cleanup_test(toga.TextInput, xfail_platforms=("android", "gtk")). I need to continue to look at why this doesn't seem to have an effect on preventing the failure.

I'm beginning to think that this is more deeply rooted in GTK4's internal reference management.

@freakboy3742
Copy link
Member

I'm beginning to think that this is more deeply rooted in GTK4's internal reference management.

That's certainly possible - at which point it's a little outside our control, so skips may be our only option.

Two related things that might be worth trying:

First - duplicate the gc.collect() line in build_cleanup_test. If there's a circular reference on the Python side, it's possible that it requires 2 GC passes to clear the reference.

If that doesn't work - use gc.get_referrers() to check which objects are preventing the weak reference from being disposed. That should at least point at the culprit object - and if it turns out the culprit is internal to GTK, then we'd have a lot more confidence before using...

  1. Modify the cleanup declaration by updating the test_textinput module to add gtk a skip or xfail with: test_cleanup = build_cleanup_test(toga.TextInput, xfail_platforms=("android", "gtk")). I need to continue to look at why this doesn't seem to have an effect on preventing the failure.

The check is being performed against toga.platform.current_platform, so you need to use linux rather than gtk as the condition here.

@danyeaw
Copy link
Member Author

danyeaw commented Mar 6, 2025

First - duplicate the gc.collect() line in build_cleanup_test. If there's a circular reference on the Python side, it's possible that it requires 2 GC passes to clear the reference.

That didn't have an impact, a second gc.collect() returns 0 so it didn't find any more objects.

If that doesn't work - use gc.get_referrers() to check which objects are preventing the weak reference from being disposed. That should at least point at the culprit object - and if it turns out the culprit is internal to GTK, then we'd have a lot more confidence before using...

print(gc.get_referrers(ref()))
[<toga_gtk.widgets.textinput.TextInput object at 0x7f3d9366b080>, <toga.style.applicator.TogaApplicator object at 0x7f3d93253e80>, <Box (0x0 @ 0,0)>, {'interface': <TextInput:0x7f3d9325ca60>, '_impl': <toga_gtk.widgets.textinput.TextInput object at 0x7f3d9366b080>}, <cell at 0x7f3d93294dd0: TextInput object at 0x7f3d9325ca60>, <cell at 0x7f3d93294d50: TextInput object at 0x7f3d9325ca60>, <cell at 0x7f3d93295b50: TextInput object at 0x7f3d9325ca60>, <cell at 0x7f3d93294f10: TextInput object at 0x7f3d9325ca60>]

The check is being performed against toga.platform.current_platform, so you need to use linux rather than gtk as the condition here.

Duh, of course, thanks!

I created a test of GTK4 by itself, and unfortunately I wasn't able to reproduce the reference counting issue. https://gist.github.com/danyeaw/6cd58338753fd8fbb8360dcad0bc891c

@freakboy3742
Copy link
Member

Ok - I've done some poking around, and I've found the source of the problem.

Since we were getting the GTK widget impl on the list of referents, I figured it might be useful to look at that object's referents. Most of the referents to the GTK impl are of the form <bound method TextInput.gtk_focus_in_event... - so, it looks like it's the event handlers causing a problem.

As an experiment, I modified the cleanup test so that it would explicitly remove the focus controller and key controller from the GTK impl before performing the GC pass (i.e., ref()._impl.native.remove_controller(...)) - and if you do that, a single GC pass cleans up the widget, and the test passes.

This is exactly the same logic that you tried earlier in a __del__. However, when it's in a __del__, the logic never runs - because the GTK impl is never deleted... because the impl object still has live references.

This hasn't historically been an issue with handlers on the GTK backend - I'm guessing the extra level of indirection introduced by the controller is making the GC loop impossible to resolve.

As for how to fix this - there's a slightly analogous situation on the Winforms backend. Handlers need to be passed into Python.net, at which point the references become opaque to the Python GC, so the object can't be disposed. The solution there is to use WeakrefCallable - essentially a wrapper around the bound method that makes it a weak reference that doesn't prevent garbage collection. As a quick-and-dirty experiment, I copied that code over to TextInput, and wrapped the handlers passed to the controllers... and the cleanup test passes!

So - I'm guessing the fix here is that we need to promote WeakrefCallable from a winforms utility to a core utility (toga.handlers would be as good a place as any), and then use it whenever a handler is installed through a Controller (or any other situation that stymies the GC process).

@danyeaw
Copy link
Member Author

danyeaw commented Mar 7, 2025

Hi @freakboy3742, thanks so much for the help with the investigation! I updated the PR with that approach.

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So - the code for all this looks great... but when I looked at the tests on GTK4, it looks like they're all skipped, except for the cleanup test. I can reproduce the same thing locally - AFAICT, the only test that should be skipped (based on the probe definition) is the undo-redo test.

From the look of it, the problem is caused by the SimpleProbe base class, which is skipping all tests on GTK4. Uncommenting that skip then releases a cascade of errors...

@danyeaw
Copy link
Member Author

danyeaw commented Mar 14, 2025

We now have 80% of the tests passing which good.

The test_on_change_programmatic tests are failing. It appears that the changed signal is getting emitted twice for each widget.value = "value" event. I haven't uncovered why this is happening yet or where to put in a test skip for just this portion.

I also decided to skip the tests simulating a keypress by the user, this is going to require a new approach for GTK4 due to the changes in events and it seemed like a containable part for another contribution.

@freakboy3742
Copy link
Member

We now have 80% of the tests passing which good.

🎉

The test_on_change_programmatic tests are failing. It appears that the changed signal is getting emitted twice for each widget.value = "value" event. I haven't uncovered why this is happening yet or where to put in a test skip for just this portion.

This is something we've seen before; it's usually an indication that the programmatic change is triggering a signal, but but then that change is altering the value of the widget, which is also detected and reported as a "change in value". However, from a quick look at the code, I agree it's not obvious why GTK4 would be firing twice here. Does raw GTK4 usage fire 2 signals on a set_text()?

I also decided to skip the tests simulating a keypress by the user, this is going to require a new approach for GTK4 due to the changes in events and it seemed like a containable part for another contribution.

Sure - that makes sense. We don't need to solve every problem at once.

One issue though - CI looks like it's stalling after the tests/test_statusicons.py::test_activate_status_menu_item test skip. The next test should be tests/widgets/test_activityindicator.py::test_enable_noop... but that should also be a skip AFAICT, so it's not obvious to me why that would be happening. My best guess is that something is crashing during test teardown/setup, which can cause the test suite to stop running - unfortunately silently.

@danyeaw
Copy link
Member Author

danyeaw commented Mar 21, 2025

I created a minimum reproducible example with a PyGObject app and it is an upstream issue with GTK4. If you have text in an Entry, and then do a set_text, it was sending two events, one with no text and a second with the text entered. I also found the upstream bug which is here: https://gitlab.gnome.org/GNOME/gtk/-/issues/7077. I found a workaround to connect to the notify signal of the text parameter.

I have some additional tests to skip or fix, hopefully I can get that finished up soon.

@danyeaw danyeaw force-pushed the gtk4-textinput branch 2 times, most recently from 5289ae8 to 70ed79a Compare March 28, 2025 21:18
@danyeaw
Copy link
Member Author

danyeaw commented Mar 28, 2025

As you mentioned, tests are hanging in our CI environment (Ubuntu 24.04) but work fine for me locally (openSUSE Tumbleweed). All the problematic tests use probe.redraw() to wait for GTK to process events. I used the tmate runner to try to debug this for almost 3 hours. 🙃

On the test suite freezing, it shows the frozen test when I hit Ctrl+C, for example:

tests/widgets/test_activityindicator.py::test_enable_noop <- build/testbed/ubuntu/noble/testbed-0.0.1/usr/lib/testbed/app/tests/widgets/properties.py 

Examples of the initial tests that were hanging include: test_enable_noop, other tests fail like test_focus_noop, test_start_stop, test_fixed_square_widget_size, test_visibility.

Running with --slow doesn't help so it's not a timing issue. I also tried adding a delay of 0.1 to calling redraw().

In GTK4 on Ubuntu 24.04, the event processing appears to behave differently:

  • Either the system has events that are perpetually pending (causing an infinite loop in our wait condition)
  • Or events are pending for an unusually long time

I have added some logic to the redraw that xfails the test after a waiting period, which was working on the runner when I tmated in, but now it looks like the runner is stuck again.

I will try to also run this on Distrobox with Ubuntu 24.04 locally to see if I can reproduce the redraw errors.

@freakboy3742
Copy link
Member

Running with --slow doesn't help so it's not a timing issue. I also tried adding a delay of 0.1 to calling redraw().

One thing to dig into here is asyncio exceptions. Consider the following:

    def startup(self):
        ...
        self.name = toga.Label(text="Hello")
        asyncio.create_task(self.broken_task())
        ...

    async def broken_task():
        while True:
            print("tick")
            await asyncio.sleep(1)
            self.name.style = Pack(align_items="left")

left is an invalid value for align_items (it uses start, end, and center). If you were to apply that style in the startup method, you'll get a stack trace with a helpful error. But if you apply the style in the async task - it fails silently, and the async task stops running.

This is, unfortunately, asyncio operating as expected; if you want to print the exception, you need to add a done callback to the task, or explicitly put a "try: except:" wrapped around every async method.

I suspect this may be what is happening in your test. The testbed suite uses pytest-async so that test cases and fixtures can be async; If something is crashing during setup or teardown of the test, the async task will also be stopping without displaying the error. But because the entire testbed test suite is async... the controlling task awaiting that result will also crash, and eventually, the test suite stops. But it doesn't exit, because the event loop is still running. It presents as the test suite hanging... but it's actually a complete crash of the test suite with no errors displayed.

I've seen similar presentations in the past - complex dialog behaviors where you work out you're calling a method on a None object (or similar), and the test locks up as a result. I have no idea what could be causing the problem here, but it at least seems plausible as a cause.

@danyeaw danyeaw force-pushed the gtk4-textinput branch 4 times, most recently from ebbdc4a to 5de3cee Compare April 4, 2025 21:35
@danyeaw
Copy link
Member Author

danyeaw commented Apr 4, 2025

I was not able to reproduce the test failures using Ubuntu 24.04 locally. However, I was able to fix the freezing on CI by reworking the probe.rehint() to queue a redraw and use a frame clock to determine when it is complete. I'm still having issues with the textinput width and height not updating on CI consistently that I'll need to keep working at.

@danyeaw
Copy link
Member Author

danyeaw commented May 3, 2025

I can now reproduce the GitHub Actions failures locally, so some progress. Here are the steps:

  1. Created a distrobox Ubuntu 24.04 environment
  2. Installed the same packages as CI (mutter pkg-config python3-dev libgirepository-2.0-dev libcairo2-dev gir1.2-webkit-6.0 gir1.2-xapp-1.0 gir1.2-geoclue-2.0 gir1.2-flatpak-1.0 gir1.2-gtk-4.0 build-essential xvfb)
  3. Created a new venv using the system Python 3.12.3
  4. Ensured pip was the latest and installed Briefcase from the latest main branch
  5. Launched a xvfb and then launched mutter inside of it using the same arguments as CI
  6. Set the XDG_RUNTIME_DIR to /tmp
  7. I can reproduce the test failure if I run the whole testbed only, it passes if I filter out only that test or run only textinput tests

More work to do to figure out why changes to flex styling aren't propagating correctly when running the full testbed.

@danyeaw
Copy link
Member Author

danyeaw commented May 12, 2025

More troubleshooting:

  1. It doesn't appear to be time related, adding a long redraw delay doesn't resolve the test failure
  2. Randomizing the tests with pytest-randomly results in a new failure from tests/widgets/test_textinput.py::test_font <- tests/widgets/properties.py, which is in the same widgets/properties module.

danyeaw and others added 28 commits May 20, 2025 15:50
Co-authored-by: Muhammad Murad <[email protected]>
Co-authored-by: Muhammad Murad <[email protected]>
This commit resolves variable binding issues in the test_cleanup function
when used with pytest's parameterization. The errors appear as runtime
NameErrors.
@danyeaw
Copy link
Member Author

danyeaw commented May 20, 2025

I worked on this over the last couple of days at the PyCon 2025 sprints, and also did some troubleshooting in person with @freakboy3742 (thanks Russ!).

During the set-up of the test, a Box is created on the Window and the TextInput is added to it. Then it runs queue_draw, but the there are no pending events from GLib. We have traced through debug print statements and it seems like the right steps are being taken by Toga, and the queue_draw is being dropped or filtered by GTK. The next step is for me to try to investigate the GTK code for why this is happening under this very specific condition.

[DEBUG Applicator] set_bounds called <Box:0x7f0e7291ac70> with layout <Box (640x443 @ 0,0)>
[DEBUG CONTAINER] Container marked as dirty (no specific widget)
[DEBUG CONTAINER] Resize queued
[DEBUG CONTAINER] Container refreshed called
[DEBUG CONTAINER] Widget <toga_gtk.widgets.textinput.TextInput object at 0x7f0e72906e00> marked as dirty, dirty count: 1
[DEBUG CONTAINER] Resize queued
[DEBUG CONTAINER] Widget <toga_gtk.widgets.box.Box object at 0x7f0e72918fb0> marked as dirty, dirty count: 2
[DEBUG CONTAINER] Resize queued
[DEBUG CONTAINER] recompute() called. Has content: True, Dirty widgets: 2
[DEBUG CONTAINER] Rehinting 2 dirty widgets
[DEBUG CONTAINER] Rehinting widget <toga_gtk.widgets.textinput.TextInput object at 0x7f0e72906e00>
[DEBUG TextInput] REHINT <toga_gtk.widgets.textinput.TextInput object at 0x7f0e72906e00> 190 34
[DEBUG CONTAINER] Rehinting widget <toga_gtk.widgets.box.Box object at 0x7f0e72918fb0>
[DEBUG CONTAINER] Recomputing layout for content
[DEBUG CONTAINER] width property: returning 640
[DEBUG CONTAINER] height property: returning 443
[DEBUG CONTAINER] Min size updated: 0x0 -> 100x34
[DEBUG CONTAINER] width property: returning 640
[DEBUG CONTAINER] height property: returning 443
[DEBUG Applicator] set_bounds called <Box:0x7f0e7291ac70> with layout <Box (640x443 @ 0,0)>
[DEBUG CONTAINER] Container marked as dirty (no specific widget)
[DEBUG CONTAINER] Resize queued
[DEBUG Applicator] set_bounds called <TextInput:0x7f0e72eeddc0> with layout <Box (100x34 @ 0,0)>
[DEBUG CONTAINER] Container marked as dirty (no specific widget)
[DEBUG CONTAINER] Resize queued
[DEBUG CONTAINER] Container refreshed called
[DEBUG REDRAW] Redraw called with message: 
Constructing TextInput probe, delay: 0
[DEBUG REDRAW] Native widget available, calling queue_draw on <Gtk.Entry object at 0x7f0e7293f1d0 (GtkEntry at 0x251af140)>
[DEBUG REDRAW] Processing events to ensure UI is fully updated
[DEBUG REDRAW] GTK4: No more events pending after 0 iterations
[DEBUG REDRAW] Yielding control with asyncio.sleep(0)
[DEBUG REDRAW] Redraw method complete

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants