Skip to content

Conversation

@corranwebster
Copy link
Contributor

@corranwebster corranwebster commented Dec 31, 2025

This implements the first phase discussed in #4041 - a refactor of Table and Tree that pulls common logic into a Column object. This is a purely implementation detail at present, the Column objects are not exposed to the users in any way.

In particular, all current tests should pass unmodified.

Internally the Tree and Table have grown three new attributes:

  • _columns: the list of column objects
  • _show_headers: a bool that is true when no headings were passed
  • _data_accessors: these are the accessors used when creating an ad-hoc ListSource from lists or dicts. The behaviour of accessors on column changes is a bit unclear, but looking at the code you can have more accessors than just those used in the headers. In future it might be good to make the behaviour a bit more specific, particularly around adding/removing columns (eg. should a completely new accessor be added to the list, where should it be added, and what does that do to existing data sources? Similarly for removal.).

This is at a state where it can be reviewed as a purely internal refactor. We probably want to expose some of these changes as part of the public API, but given that there is some complexity in this PR it's probably better to make those changes in a separate PR.

Ref #4041.
Fixes #4071.

To do:

  • Refactor Table and Tree interfaces.
  • Add text, icon, etc. methods
  • Tests for Column objects.
  • Make backend implementations use column interface.
    • Dummy Nothing to do.
    • Android
    • Cocoa
    • Gtk
    • iOS Nothing to do, no Table or Tree widget.
    • Winforms
  • Remove all mention of accessors from implementations (they're still used as column ids in a few backends, I think) Add an id property and use that where a column needs a name or identifier.

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • I will abide by the code of conduct

This should be semanitically neutral: no changes to API, all tests should pass
unmodified.
@corranwebster corranwebster marked this pull request as ready for review January 1, 2026 13:51
@corranwebster
Copy link
Contributor Author

This is ready for a review.

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The broad strokes of this look really good. I've flagged a couple of things inline. The biggest issue I can see is the data_accessors inconsistency; otherwise, the issues are mostly cosmetic/borderline bike shed, or likely oversights.

if self._headings is not None:
del self._headings[index]
del self._accessors[index]
del self._columns[index]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there no need to clean up _data_accessors here?

Copy link
Contributor Author

@corranwebster corranwebster Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional - the link between headings and accessors is no longer via two synchronised lists, but via the columns.

But I'm not sure I properly understand what the expected behaviour of accessors were previously (that's what I'm getting at in the comment at the top of the PR). For ListSources of mappings everything is fine, because the Rows get their data from the keys of the mappings and as long as the column accessors match the keys, everything's good.

But for lists of sequences the mapping from accessors to rows is purely positional so you get situations like this:

data = [
    ((yak, "The Secret Life of Bees"), 2008, (green, 7.3), "Drama"),
    ((None, "Bee Movie"), 2007, (red, 6.1), "Animation, Adventure"),
    ...
]
# We don't show the last column initially
table = Table(headings=['Title', 'Year', 'Rating'], data=data) 
# table.data is now a ListSource where the row objects have 3 attributes: `title`, `year` and `rating`.
table.append_column('Genre')
# We've added a 4th column, but it's empty even though we gave the table 4 columns,
# because the ListSource was constructed with only three values.

# You can get around this by passing in 4 accessors directly:
table = Table(headings=['Title', 'Year', 'Rating'], accessors=['title', 'year', 'rating', 'genre'], data=data) 
# because then table.data has rows with all 4 accessors.
# But when we do:
table.append_column('Genre')
# it shouldn't add an additional 'genre' accessor at the end.

You can get something similar in the example app by doing: delete genre column, reset data, restore genre column:

Image

Or when you remove a column, things seem to work but then get odd when if you change the data:

table = Table(headings=['Title', 'Year', 'Rating', 'Genre'], data=data) 
# delete the rating column
table.delete_column('Year')
# all looks good, but `_accessors` is now `["title", "rating", "genre"]` so if you reload the data
table.data = data
# table.data is now a new ListSource but with only the three accessors and they no longer match the order. So you get years in the Rating column and ratings in the Genre column

With a tweak to the table example to delete years instead of Genre you can see this:
Image
(Change it to delete 'year', then press the delete button, then the 'reset data' button)

Neither of these behaviours seem right, but I'm not sure if that's a bug in the example and my expectation of how things should work For example, it might be reasonable to say that if you set the data using a list of sequences after changing the heading structure you need to make sure the order of the sequences matches the order of the columns.

So the _data_accessors is my attempt to square this circle: the idea is that it is the initial set of accessors passed in which tells the table the order of all known data columns when given sequential data, independent of how the table/view columns might change during the life-cycle of the app.

Again, this only affects data which is passed in as lists of sequences.

So I guess the question is what is the design intent?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess the question is what is the design intent?

Fundamentally, the use case was to allow the Row/Node objects to store data that isn't necessarily rendered to the user in a column - e.g., an ID number for objects that are being represented in the table. To that end, the interpretation you've taken with _data_accessors sees fairly close to the design intent - at time of construction, capture the attributes that will be retained in the Source.

As you note, the gap you've found with adding/removing columns is almost entirely because the original data was specified as tuples/lists. Accessors have two purposes - identifying the name of the attribute to retrieve from the row; and determining how to map a tuple into a list of attributes. The second use case only exists when data is provided as a tuple or list; when a datum is provided as a dictionary, the dictionary keys give you the full list of attributes to preserve.

I think I'm happy calling this an edge case that we can document - that the mapping of lists to Row is baked at time of construction, and if you want fully dynamic column addition/removal, you probably want to use a dictionary or custom Row.

The only other thought I've got would be to replace _data_accessors with the accessor list on the Source.

We make accessors an optional property on the ListSource that returns the list of accessors that are "known" to the Source. We create an empty ListSource when the Table is constructed using the accessors that come from construction.

When new data is assigned, the list of accessors is preserved from the existing ListSource. If the user provides a custom Source, that logic is ignored; and if the user tries to replace a custom Source with an ad-hoc one, we raise an error (or maybe we build an accessor list based on the current columns).

If the user tries to insert a column, we can validate if that column is on the list of known columns, and raise an error if it isn't. If the source doesn't provide an accessor list, then we don't do any validation.

That removes the need for the duplicated accessor tracking, as well as providing a way to give additional validation for the edge cases where a problem could exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only other thought I've got would be to replace _data_accessors with the accessor list on the Source.

There's a certain sense to this it really isn't the widget's business to keep track of how the data should handle adding sequence data to itself. This use of the accessors is really a data model thing. It might even make sense to call it something like "accessor_order".

But for this PR, maybe we shouldn't try to change the data model. Perhaps re-name the _data_accessors to _data_accessor_order or something to make the role clearer, document the behaviour, and then have a separate PR to re-work accessors where there may be some new features.

Copy link
Contributor Author

@corranwebster corranwebster Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've turned this into an issue: #4071

Edit: and another to track the idea of making accessors optional and public on TableSource and TreeSource objects, #4072

@corranwebster
Copy link
Contributor Author

I think this is ready for a re-review.

Regarding _show_headings: I deliberately didn't make this a change to the public API in this PR, but if you think that it is OK I can add a simple read-only property

@property
def show_headings(self):
    return self._show_headings

and use that in the implementations.

@freakboy3742
Copy link
Member

Regarding _show_headings: I deliberately didn't make this a change to the public API in this PR, but if you think that it is OK I can add a simple read-only property

I wasn't thinking about making it part of the public API; I was more concerned about inadvertently changing a flag which can't change after it is initially set. However, it is marked as a private property, so I guess the overhead of making it a private property isn't really worth it.

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed some minor cleanups to the Source test case; plus the outstanding (and ongoing) question around data_accessors.

The other question I have is what you see as the "end game" of removing accessors entirely. You've flagged accessor as a "to be removed" feature of the Column protocol, but I'm not sure I understand what final state would permit that. This possibly ties into the discussion about data_accessors.

("Heading", None, "Heading", "heading"),
],
)
def test_accessor_column(heading, accessor, heading_property, accessor_property):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to add a comment on every test giving a 1-line summary of the purpose of the test.


def test_accessor_column_failure():
with pytest.raises(
ValueError, match="Cannot create a column without either headings or accessors"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this is valid black style, we try to avoid the "indented on line below, all on one line" format. If a line is long enough that we need to worry about length, and it has multiple arguments, it's worth splitting to 1-argument-per-line,

if self._headings is not None:
del self._headings[index]
del self._accessors[index]
del self._columns[index]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess the question is what is the design intent?

Fundamentally, the use case was to allow the Row/Node objects to store data that isn't necessarily rendered to the user in a column - e.g., an ID number for objects that are being represented in the table. To that end, the interpretation you've taken with _data_accessors sees fairly close to the design intent - at time of construction, capture the attributes that will be retained in the Source.

As you note, the gap you've found with adding/removing columns is almost entirely because the original data was specified as tuples/lists. Accessors have two purposes - identifying the name of the attribute to retrieve from the row; and determining how to map a tuple into a list of attributes. The second use case only exists when data is provided as a tuple or list; when a datum is provided as a dictionary, the dictionary keys give you the full list of attributes to preserve.

I think I'm happy calling this an edge case that we can document - that the mapping of lists to Row is baked at time of construction, and if you want fully dynamic column addition/removal, you probably want to use a dictionary or custom Row.

The only other thought I've got would be to replace _data_accessors with the accessor list on the Source.

We make accessors an optional property on the ListSource that returns the list of accessors that are "known" to the Source. We create an empty ListSource when the Table is constructed using the accessors that come from construction.

When new data is assigned, the list of accessors is preserved from the existing ListSource. If the user provides a custom Source, that logic is ignored; and if the user tries to replace a custom Source with an ad-hoc one, we raise an error (or maybe we build an accessor list based on the current columns).

If the user tries to insert a column, we can validate if that column is on the list of known columns, and raise an error if it isn't. If the source doesn't provide an accessor list, then we don't do any validation.

That removes the need for the duplicated accessor tracking, as well as providing a way to give additional validation for the edge cases where a problem could exist.

@corranwebster
Copy link
Contributor Author

The other question I have is what you see as the "end game" of removing accessors entirely. You've flagged accessor as a "to be removed" feature of the Column protocol, but I'm not sure I understand what final state would permit that. This possibly ties into the discussion about data_accessors.

Currently there are a few places in the implementations where the code was using accessors directly - I converted those to use column.accessor - but they are still in the code. For example, here:
https://github.com/corranwebster/toga/blob/885f5e5d8746e405a0eae6d0803b9c54ecde12bf/cocoa/src/toga_cocoa/widgets/tree.py#L257
and here:
https://github.com/corranwebster/toga/blob/885f5e5d8746e405a0eae6d0803b9c54ecde12bf/winforms/src/toga_winforms/widgets/table.py#L182
So currently any Column subclass must have an accessor attribute. They seem to be being used as an internal id for the column. As far as I can tell, we aren't using these column ids ourselves, but they are on backends where I am unfamiliar with the requirements of the tables so I didn't want to just get rid of them.

But when thinking about Columns I can see situations where you don't have just a single accessor. For example, a "TotalColumn" that uses multiple accessors to look up a bunch of column values and add them together. Or a Column subclass that uses different accessors for icons and text. So I don't see the accessor as an intrinsic part of the Column API.

So there might need to be an id property for columns that can be used by the implementations when they need an internal name and the title won't do or isn't available.

I could implement the id that as part of this PR, where the concrete column implementation uses the accessor as its id. But if the code in the backends isn't needed at all, then the id is just noise.

So that's the main thought behind that comment. Longer term, if there isn't a 1-to-1 relationship between columns and accessors, then the accessors property on the Table and Tree classes isn't really needed, and I think longer term it would be replaced by a property that just returned the actual column objects.

@corranwebster
Copy link
Contributor Author

I think everything is addressed:

@freakboy3742
Copy link
Member

  • I've removed accessor from the Column protocol and replaced it with id, which for AccessorColumn is just the accessor value. The backend implementations now use column.id instead of column.accessor which means the way is clear for column objects that have different strategies for getting values from rows.

The idea makes sense; but is there any reason we couldn't just use id(column) or hash(column) (or str() of those where a string is required) here? The Column instance should be unique and persistent; is there any reason to add an extra id attribute when we can generate one? That makes the protocol one step easier to implement.

+1 to both of these.

+1 to the docs additions; however, the "notes" section should be more for "platform notes". I'd suggest the added paragraph should be in the body text in the section talking about how accessors are interpreted; with another note in the documentation of the accessors argument to the Table/Tree constructor.

@corranwebster
Copy link
Contributor Author

The idea makes sense; but is there any reason we couldn't just use id(column) or hash(column) (or str() of those where a string is required) here? The Column instance should be unique and persistent; is there any reason to add an extra id attribute when we can generate one? That makes the protocol one step easier to implement.

I think str(id(column)) that would work, and would be safer if the identifiers need to be distinct. I implemented it this way simply because the behaviour would be exactly the same as before this PR, for better or worse.

@freakboy3742
Copy link
Member

I implemented it this way simply because the behaviour would be exactly the same as before this PR, for better or worse.

I see where that motivation comes from, but the internal identification of native widgets is very much an implementation detail, not a public API surface. I'm find if it changes between versions.

@corranwebster
Copy link
Contributor Author

Fixed the id and documentation issues.

I've opened #4074 because the accessor docstring doesn't look like it's being rendered right for Trees and Tables.

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome. Even if we don't do any further work on Table/Tree columns, this is a significant cleanup and simplification of a bunch of duplicated logic; but it also provides a really solid launchpad for a better public API for Column representations, and it's pointed out a bunch of edge cases and bugs along the way. Thanks for all your work on this!

@freakboy3742 freakboy3742 merged commit e91af18 into beeware:main Jan 13, 2026
112 of 113 checks passed
@freakboy3742 freakboy3742 mentioned this pull request Jan 13, 2026
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Accessors can get out of sync when adding/removing columns in Table widget

3 participants