TIMX 385 - browsable, filterable Records table #53

ghukill · 2024-11-07T19:58:51Z

Purpose and background context

This PR reworks how to review individual records from a run. This is accomplished by removing the "Record Samples" section in the Run page, which pointed to a standalone, static HTML page of Record links, and replacing it with a single, filterable, browsable table of all records.

The previous approach was functional, but had some drawbacks. The "Samples" page was static HTML, and did not lend itself to large numbers of records. So, a "sample" of Records that matched the criteria was displayed, but this necessarily was not showing all records that matched that critiera. And, this approach had no inroads for viewing records with no diff, where it could still be helpful to look at the A/B versions, even if identical. The list goes on.

The approach taken here is a single table, capable of handling huge numbers of records. By layering on filtering and searching, it avoids the issue of thinking of combinations in advance of records to view. It also -- by default -- shows all records, allowing for a more browsing experience. The following is a screenshot of what it looks like:

The Javascript library DataTables is used for this table, which is highly configurable. The approach of "server-side" data preparation is used, so our Flask webapp is doing the heavy lifting of performing queries, sorting results, etc. The actual table then just makes lightweight requests to the flask data endpoint each time information is needed, or filters applied, sorting, etc. In this way, while the records dataset may contain 4-5m records, we'd only ever be sending 10-100 records (depending on "Records per Page" setting) to the table itself.

How can a reviewer manually see the effects of these changes?

1- Set production AWS credentials

2- Init job (where 4fcb617 is an fairly old commit that generates some diffs):

pipenv run abdiff --verbose init-job -d output/jobs/records -a 4fcb617 -b 3f77c7c

3- Generate a CSV of input files and perform a run:

pipenv run abdiff --verbose timdex-sources-csv -o output/jobs/records/springshare.csv -s libguides,researchdatabases
pipenv run abdiff --verbose run-diff -d output/jobs/records -i output/jobs/records/springshare.csv

4- View the job:

pipenv run abdiff --verbose view-job -d output/jobs/records

Here are some things to try with the "Records" table at the bottom of the run page. Recommended to click "Reset Filters" button between them (NOTE: the file counts are approximate, given changes in records depending on when run):

Without any filters, note records count is 1,290
- click source libguides, note count goes down to 372
- hold ctrl (or cmd?) and click researchdatabases, note count goes back up to 1,290 because effectively all records again
- this is multi-entry selection, which works for fields as well
Click source researchdatabases + field publishers, note that zero records match this combo
Enter "Chemical Engineering" into the search box, note that two records show up
- this was fulltext searching the actual A and B records, so not obvious where it matched, but clicking into either record will reveal it's somewhere in the record body
Enter "libguides:guides-176330" in search box, one result comes back, showing how it could be used for timdex_record_id as well, as it's part of the source record

Includes new or updated dependencies?

YES: but only in the webapp, the DataTables javascript and CSS files

Changes expectations for external applications?

NO

What are the relevant tickets?

https://mitlibraries.atlassian.net/browse/TIMX-385

Developer

All new ENV is documented in README
All new ENV has been added to staging and production environments
All related Jira tickets are linked in commit message(s)
Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

The commit message is clear and follows our guidelines (not just this PR message)
There are appropriate tests covering any new functionality
The provided documentation is sufficient for understanding any new functionality introduced
Any manual tests have been performed or provided examples verified
New dependencies are appropriate or there were no changes

coveralls · 2024-11-07T21:26:14Z

Pull Request Test Coverage Report for Build 11801095979

Details

52 of 57 (91.23%) changed or added relevant lines in 3 files are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage increased (+1.4%) to 86.688%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
abdiff/webapp/app.py	5	10	50.0%

Files with Coverage Reduction	New Missed Lines	%
abdiff/webapp/app.py	1	88.73%

Totals
Change from base Build 11800813131:	1.4%
Covered Lines:	827
Relevant Lines:	954

💛 - Coveralls

Why these changes are being introduced: One affordance of the webapp is viewing individual records from a run, seeing a summary of A/B differences, the full A/B records, and a side-by-side comparison. But when runs may contain lots of records, some kind of interface is needed to identify records for viewing (where once the timdex_record_id is known, the Record page only requires that). Previously, a "Record Samples" section appeared on the Run page that linked to a standalone table of records that met some kind of criteria (e.g. source = X, or field Y has diffs). This was functional, but had drawbacks: - this static HTML could not handle large number of records, meaning a representative sample was used, which prevented access to all records - the combinations of dimensions to drill down into was limited by the static sample pages of records How this addresses that need: * Removes all "Record Samples" approaches * Replaces with a single table in the Run page * filterable by source, whether records had modified specific modified fields, even full-text search of the records themselves * This single table provides a mechanism to browse and filter records from the run, from a single interface, with arguably simpler logic under the hood to power it Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/TIMX-385

jonavellecuerdo

For my review, I followed the instructions outlined in the PR description, and I cannot express how awesome these changes are! I was amazed at how quickly it was able to run queries against the record dataset and how user-friendly the interface is. Excellent job. I did pose one question + potential request for change!

abdiff/webapp/app.py

ghukill · 2024-11-12T17:52:05Z

For my review, I followed the instructions outlined in the PR description, and I cannot express how awesome these changes are! I was amazed at how quickly it was able to run queries against the record dataset and how user-friendly the interface is. Excellent job. I did pose one question + potential request for change!

Thanks! FWIW, once we get into the 3-4m records, I think we'll see a significant delay in the responsiveness of the table. At that time, we can see if we need optimizations, or if it's just 1-2 seconds, perhaps that is okay for this diagnostic tool.

ehanson8

Looks great!

abdiff/webapp/app.py

ghukill force-pushed the TIMX-385-records-preview branch from 21e469c to 7b7c0e6 Compare November 7, 2024 21:23

ghukill force-pushed the TIMX-385-records-preview branch from 7b7c0e6 to 6d28eb0 Compare November 12, 2024 15:54

ghukill marked this pull request as ready for review November 12, 2024 15:59

ghukill requested review from ehanson8 and jonavellecuerdo November 12, 2024 16:00

ghukill force-pushed the TIMX-385-records-preview branch from 6d28eb0 to b9c9b72 Compare November 12, 2024 16:05

jonavellecuerdo reviewed Nov 12, 2024

View reviewed changes

abdiff/webapp/app.py Show resolved Hide resolved

jonavellecuerdo approved these changes Nov 12, 2024

View reviewed changes

ehanson8 approved these changes Nov 12, 2024

View reviewed changes

abdiff/webapp/app.py Show resolved Hide resolved

ghukill merged commit 8f14417 into main Nov 12, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

TIMX 385 - browsable, filterable Records table #53

TIMX 385 - browsable, filterable Records table #53

Uh oh!

ghukill commented Nov 7, 2024 •

edited

Loading

Uh oh!

coveralls commented Nov 7, 2024 •

edited

Loading

Uh oh!

jonavellecuerdo left a comment

Uh oh!

Uh oh!

ghukill commented Nov 12, 2024

Uh oh!

ehanson8 left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

TIMX 385 - browsable, filterable Records table #53

TIMX 385 - browsable, filterable Records table #53

Uh oh!

Conversation

ghukill commented Nov 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose and background context

How can a reviewer manually see the effects of these changes?

Includes new or updated dependencies?

Changes expectations for external applications?

What are the relevant tickets?

Developer

Code Reviewer(s)

Uh oh!

coveralls commented Nov 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Test Coverage Report for Build 11801095979

Details

💛 - Coveralls

Uh oh!

jonavellecuerdo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ghukill commented Nov 12, 2024

Uh oh!

ehanson8 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ghukill commented Nov 7, 2024 •

edited

Loading

coveralls commented Nov 7, 2024 •

edited

Loading