TIMX 385 - browsable, filterable Records table #53
Conversation
Why these changes are being introduced:
One affordance of the webapp is viewing individual records from
a run, seeing a summary of A/B differences, the full A/B records,
and a side-by-side comparison. But when runs may contain lots of
records, some kind of interface is needed to identify records for
viewing (where once the timdex_record_id is known, the Record page
only requires that).
Previously, a "Record Samples" section appeared on the Run page
that linked to a standalone table of records that met some kind
of criteria (e.g. source = X, or field Y has diffs). This was
functional, but had drawbacks:
- this static HTML could not handle large numbers of records, meaning
a representative sample was used, which prevented access to all records
- the combinations of dimensions to drill down into were limited by the
static sample pages of records
How this addresses that need:
* Removes all "Record Samples" approaches
* Replaces with a single table in the Run page
* filterable by source, by whether records had specific fields modified,
and even by full-text search of the records themselves
* This single table provides a mechanism to browse and filter records
from the run in a single interface, with arguably simpler logic
under the hood to power it (see the sketch below)
Side effects of this change:
* None
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-385
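To make the "single, filterable table" idea above concrete, here is a rough, hypothetical sketch of how source, modified-field, and full-text filters might compose into a single query. The table and column names, and the use of SQL at all, are assumptions for illustration only and are not taken from this PR.

```python
# Hypothetical illustration of composing the table's filters (source, modified
# field, full-text search) into one parameterized SQL query. Table and column
# names are assumptions, not the actual implementation in this PR.
def build_records_query(source=None, modified_field=None, search_text=None):
    clauses = []
    params = []

    if source:
        clauses.append("source = ?")
        params.append(source)

    if modified_field:
        # records whose A/B diff touched a specific field
        clauses.append("modified_fields LIKE ?")
        params.append(f"%{modified_field}%")

    if search_text:
        # naive full-text match across the serialized record
        clauses.append("record_text LIKE ?")
        params.append(f"%{search_text}%")

    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = f"SELECT timdex_record_id, source, has_diff FROM records{where}"
    return sql, params
```

Each filter simply narrows the same underlying query, which is the sense in which the logic under the hood can be simpler than maintaining pre-built sample pages.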
6d28eb0 to
b9c9b72
Compare
jonavellecuerdo
left a comment
For my review, I followed the instructions outlined in the PR description, and I cannot express how awesome these changes are! I was amazed at how quickly it was able to run queries against the record dataset and how user-friendly the interface is. Excellent job. I did pose one question + potential request for change!
Thanks! FWIW, once we get into the 3-4 million record range, I think we'll see a significant delay in the responsiveness of the table. At that time, we can see if we need optimizations, or, if it's just 1-2 seconds, perhaps that is okay for this diagnostic tool.
ehanson8
left a comment
Looks great!
Purpose and background context
This PR reworks how individual records from a run are reviewed. This is accomplished by removing the "Record Samples" section in the Run page, which pointed to a standalone, static HTML page of Record links, and replacing it with a single, filterable, browsable table of all records.
The previous approach was functional, but had some drawbacks. The "Samples" page was static HTML, and did not lend itself to large numbers of records. So a "sample" of records that matched the criteria was displayed, which necessarily did not show all records matching those criteria. And this approach offered no way to view records with no diff, where it could still be helpful to look at the A/B versions, even if identical. The list goes on.
The approach taken here is a single table, capable of handling huge numbers of records. By layering on filtering and searching, it avoids having to anticipate, in advance, which combinations of records someone might want to view. It also -- by default -- shows all records, allowing for a more browse-oriented experience. The following is a screenshot of what it looks like:
The JavaScript library DataTables is used for this table, which is highly configurable. The "server-side" data preparation approach is used, so our Flask webapp does the heavy lifting of performing queries, sorting results, etc. The table then just makes lightweight requests to the Flask data endpoint each time information is needed, filters are applied, sorting changes, etc. In this way, while the records dataset may contain 4-5 million records, we'd only ever be sending 10-100 records (depending on the "Records per Page" setting) to the table itself.
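As a rough illustration of this server-side pattern (not the actual code in this PR; the route, function names, and stubbed query layer below are assumptions), a Flask endpoint serving a DataTables table might look something like this:

```python
# Hypothetical sketch of a Flask endpoint backing a server-side DataTables table.
# The route, function names, and the stubbed query layer are illustrative
# assumptions, not code from this PR.
from flask import Flask, jsonify, request

app = Flask(__name__)


def query_run_records(run_id, search="", offset=0, limit=10):
    """Stub for the real query layer against the run's records dataset.

    Returns (rows, filtered_count, total_count).
    """
    return [], 0, 0


@app.route("/runs/<run_id>/records/data")
def run_records_data(run_id):
    # With serverSide enabled, DataTables sends paging and search state on every request.
    draw = int(request.args.get("draw", 1))
    start = int(request.args.get("start", 0))
    length = int(request.args.get("length", 10))
    search = request.args.get("search[value]", "")

    rows, filtered_count, total_count = query_run_records(
        run_id, search=search, offset=start, limit=length
    )

    # Only the current page of rows is returned, so even a multi-million record
    # dataset results in small responses to the browser.
    return jsonify(
        {
            "draw": draw,
            "recordsTotal": total_count,
            "recordsFiltered": filtered_count,
            "data": rows,
        }
    )
```

The `draw`, `recordsTotal`, `recordsFiltered`, and `data` keys are the response shape DataTables expects when server-side processing is enabled, which is what keeps each request lightweight regardless of total dataset size.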
How can a reviewer manually see the effects of these changes?
1- Set production AWS credentials
2- Init job (where `4fcb617` is a fairly old commit that generates some diffs):
3- Generate a CSV of input files and perform a run:
4- View the job:
Here are some things to try with the "Records" table at the bottom of the run page. It is recommended to click the "Reset Filters" button between them (NOTE: the counts below are approximate, given that records change depending on when the run is performed):
- `libguides`: note count goes down to 372
- `researchdatabases`: note count goes back up to 1,290 because effectively all records again
- `researchdatabases` + field `publishers`: note that zero records match this combo
- try searching for a `timdex_record_id` as well, as it's part of the source record

Includes new or updated dependencies?
YES: but only in the webapp, the DataTables JavaScript and CSS files
Changes expectations for external applications?
NO
What are the relevant tickets?
* https://mitlibraries.atlassian.net/browse/TIMX-385
Developer
Code Reviewer(s)