-
-
Notifications
You must be signed in to change notification settings - Fork 791
Refactor Table and Tree to use Column objects internally
#4042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This should be semanitically neutral: no changes to API, all tests should pass unmodified.
Should be no change in behaviour.
|
This is ready for a review. |
freakboy3742
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The broad strokes of this look really good. I've flagged a couple of things inline. The biggest issue I can see is the data_accessors inconsistency; otherwise, the issues are mostly cosmetic/borderline bike shed, or likely oversights.
| if self._headings is not None: | ||
| del self._headings[index] | ||
| del self._accessors[index] | ||
| del self._columns[index] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there no need to clean up _data_accessors here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intentional - the link between headings and accessors is no longer via two synchronised lists, but via the columns.
But I'm not sure I properly understand what the expected behaviour of accessors were previously (that's what I'm getting at in the comment at the top of the PR). For ListSources of mappings everything is fine, because the Rows get their data from the keys of the mappings and as long as the column accessors match the keys, everything's good.
But for lists of sequences the mapping from accessors to rows is purely positional so you get situations like this:
data = [
((yak, "The Secret Life of Bees"), 2008, (green, 7.3), "Drama"),
((None, "Bee Movie"), 2007, (red, 6.1), "Animation, Adventure"),
...
]
# We don't show the last column initially
table = Table(headings=['Title', 'Year', 'Rating'], data=data)
# table.data is now a ListSource where the row objects have 3 attributes: `title`, `year` and `rating`.
table.append_column('Genre')
# We've added a 4th column, but it's empty even though we gave the table 4 columns,
# because the ListSource was constructed with only three values.
# You can get around this by passing in 4 accessors directly:
table = Table(headings=['Title', 'Year', 'Rating'], accessors=['title', 'year', 'rating', 'genre'], data=data)
# because then table.data has rows with all 4 accessors.
# But when we do:
table.append_column('Genre')
# it shouldn't add an additional 'genre' accessor at the end.You can get something similar in the example app by doing: delete genre column, reset data, restore genre column:
Or when you remove a column, things seem to work but then get odd when if you change the data:
table = Table(headings=['Title', 'Year', 'Rating', 'Genre'], data=data)
# delete the rating column
table.delete_column('Year')
# all looks good, but `_accessors` is now `["title", "rating", "genre"]` so if you reload the data
table.data = data
# table.data is now a new ListSource but with only the three accessors and they no longer match the order. So you get years in the Rating column and ratings in the Genre columnWith a tweak to the table example to delete years instead of Genre you can see this:

(Change it to delete 'year', then press the delete button, then the 'reset data' button)
Neither of these behaviours seem right, but I'm not sure if that's a bug in the example and my expectation of how things should work For example, it might be reasonable to say that if you set the data using a list of sequences after changing the heading structure you need to make sure the order of the sequences matches the order of the columns.
So the _data_accessors is my attempt to square this circle: the idea is that it is the initial set of accessors passed in which tells the table the order of all known data columns when given sequential data, independent of how the table/view columns might change during the life-cycle of the app.
Again, this only affects data which is passed in as lists of sequences.
So I guess the question is what is the design intent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I guess the question is what is the design intent?
Fundamentally, the use case was to allow the Row/Node objects to store data that isn't necessarily rendered to the user in a column - e.g., an ID number for objects that are being represented in the table. To that end, the interpretation you've taken with _data_accessors sees fairly close to the design intent - at time of construction, capture the attributes that will be retained in the Source.
As you note, the gap you've found with adding/removing columns is almost entirely because the original data was specified as tuples/lists. Accessors have two purposes - identifying the name of the attribute to retrieve from the row; and determining how to map a tuple into a list of attributes. The second use case only exists when data is provided as a tuple or list; when a datum is provided as a dictionary, the dictionary keys give you the full list of attributes to preserve.
I think I'm happy calling this an edge case that we can document - that the mapping of lists to Row is baked at time of construction, and if you want fully dynamic column addition/removal, you probably want to use a dictionary or custom Row.
The only other thought I've got would be to replace _data_accessors with the accessor list on the Source.
We make accessors an optional property on the ListSource that returns the list of accessors that are "known" to the Source. We create an empty ListSource when the Table is constructed using the accessors that come from construction.
When new data is assigned, the list of accessors is preserved from the existing ListSource. If the user provides a custom Source, that logic is ignored; and if the user tries to replace a custom Source with an ad-hoc one, we raise an error (or maybe we build an accessor list based on the current columns).
If the user tries to insert a column, we can validate if that column is on the list of known columns, and raise an error if it isn't. If the source doesn't provide an accessor list, then we don't do any validation.
That removes the need for the duplicated accessor tracking, as well as providing a way to give additional validation for the edge cases where a problem could exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only other thought I've got would be to replace
_data_accessorswith the accessor list on the Source.
There's a certain sense to this it really isn't the widget's business to keep track of how the data should handle adding sequence data to itself. This use of the accessors is really a data model thing. It might even make sense to call it something like "accessor_order".
But for this PR, maybe we shouldn't try to change the data model. Perhaps re-name the _data_accessors to _data_accessor_order or something to make the role clearer, document the behaviour, and then have a separate PR to re-work accessors where there may be some new features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
I think this is ready for a re-review. Regarding and use that in the implementations. |
I wasn't thinking about making it part of the public API; I was more concerned about inadvertently changing a flag which can't change after it is initially set. However, it is marked as a private property, so I guess the overhead of making it a private property isn't really worth it. |
freakboy3742
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed some minor cleanups to the Source test case; plus the outstanding (and ongoing) question around data_accessors.
The other question I have is what you see as the "end game" of removing accessors entirely. You've flagged accessor as a "to be removed" feature of the Column protocol, but I'm not sure I understand what final state would permit that. This possibly ties into the discussion about data_accessors.
| ("Heading", None, "Heading", "heading"), | ||
| ], | ||
| ) | ||
| def test_accessor_column(heading, accessor, heading_property, accessor_property): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We try to add a comment on every test giving a 1-line summary of the purpose of the test.
core/tests/sources/test_columns.py
Outdated
|
|
||
| def test_accessor_column_failure(): | ||
| with pytest.raises( | ||
| ValueError, match="Cannot create a column without either headings or accessors" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this is valid black style, we try to avoid the "indented on line below, all on one line" format. If a line is long enough that we need to worry about length, and it has multiple arguments, it's worth splitting to 1-argument-per-line,
| if self._headings is not None: | ||
| del self._headings[index] | ||
| del self._accessors[index] | ||
| del self._columns[index] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I guess the question is what is the design intent?
Fundamentally, the use case was to allow the Row/Node objects to store data that isn't necessarily rendered to the user in a column - e.g., an ID number for objects that are being represented in the table. To that end, the interpretation you've taken with _data_accessors sees fairly close to the design intent - at time of construction, capture the attributes that will be retained in the Source.
As you note, the gap you've found with adding/removing columns is almost entirely because the original data was specified as tuples/lists. Accessors have two purposes - identifying the name of the attribute to retrieve from the row; and determining how to map a tuple into a list of attributes. The second use case only exists when data is provided as a tuple or list; when a datum is provided as a dictionary, the dictionary keys give you the full list of attributes to preserve.
I think I'm happy calling this an edge case that we can document - that the mapping of lists to Row is baked at time of construction, and if you want fully dynamic column addition/removal, you probably want to use a dictionary or custom Row.
The only other thought I've got would be to replace _data_accessors with the accessor list on the Source.
We make accessors an optional property on the ListSource that returns the list of accessors that are "known" to the Source. We create an empty ListSource when the Table is constructed using the accessors that come from construction.
When new data is assigned, the list of accessors is preserved from the existing ListSource. If the user provides a custom Source, that logic is ignored; and if the user tries to replace a custom Source with an ad-hoc one, we raise an error (or maybe we build an accessor list based on the current columns).
If the user tries to insert a column, we can validate if that column is on the list of known columns, and raise an error if it isn't. If the source doesn't provide an accessor list, then we don't do any validation.
That removes the need for the duplicated accessor tracking, as well as providing a way to give additional validation for the edge cases where a problem could exist.
Currently there are a few places in the implementations where the code was using accessors directly - I converted those to use But when thinking about Columns I can see situations where you don't have just a single accessor. For example, a "TotalColumn" that uses multiple accessors to look up a bunch of column values and add them together. Or a Column subclass that uses different accessors for icons and text. So I don't see the accessor as an intrinsic part of the Column API. So there might need to be an I could implement the So that's the main thought behind that comment. Longer term, if there isn't a 1-to-1 relationship between columns and accessors, then the |
|
I think everything is addressed:
|
The idea makes sense; but is there any reason we couldn't just use
+1 to both of these.
+1 to the docs additions; however, the "notes" section should be more for "platform notes". I'd suggest the added paragraph should be in the body text in the section talking about how accessors are interpreted; with another note in the documentation of the |
I think |
I see where that motivation comes from, but the internal identification of native widgets is very much an implementation detail, not a public API surface. I'm find if it changes between versions. |
|
Fixed the id and documentation issues. I've opened #4074 because the accessor docstring doesn't look like it's being rendered right for Trees and Tables. |
freakboy3742
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome. Even if we don't do any further work on Table/Tree columns, this is a significant cleanup and simplification of a bunch of duplicated logic; but it also provides a really solid launchpad for a better public API for Column representations, and it's pointed out a bunch of edge cases and bugs along the way. Thanks for all your work on this!
This implements the first phase discussed in #4041 - a refactor of
TableandTreethat pulls common logic into aColumnobject. This is a purely implementation detail at present, theColumnobjects are not exposed to the users in any way.In particular, all current tests should pass unmodified.
Internally the Tree and Table have grown three new attributes:
_columns: the list of column objects_show_headers: a bool that is true when no headings were passed_data_accessors: these are the accessors used when creating an ad-hocListSourcefrom lists or dicts. The behaviour of accessors on column changes is a bit unclear, but looking at the code you can have more accessors than just those used in the headers. In future it might be good to make the behaviour a bit more specific, particularly around adding/removing columns (eg. should a completely new accessor be added to the list, where should it be added, and what does that do to existing data sources? Similarly for removal.).This is at a state where it can be reviewed as a purely internal refactor. We probably want to expose some of these changes as part of the public API, but given that there is some complexity in this PR it's probably better to make those changes in a separate PR.
Ref #4041.
Fixes #4071.
To do:
TableandTreeinterfaces.text,icon, etc. methodsColumnobjects.DummyNothing to do.iOSNothing to do, no Table or Tree widget.Remove all mention of accessors from implementations (they're still used as column ids in a few backends, I think)Add anidproperty and use that where a column needs a name or identifier.PR Checklist: