Commit 79f7a78

[annotation](doc) Data stability requirements and other updates to design document.
1 parent b7cb4b7 commit 79f7a78

1 file changed

Lines changed: 24 additions & 21 deletions

vignettes/review_design_notes.Rmd

@@ -16,7 +16,7 @@ knitr::opts_chunk$set(
1616

1717
## User Requirements
1818

19-
These requirements are based on initial brainstorming conversations. Some of them are direct requests and some of them are educated guesses from the feature developer. All of them are subject to change until confirmed by the interested parties. The list is not exhaustive:
19+
These requirements are based on initial brainstorming conversations and a few rounds of user feedback. The list is not exhaustive:
2020

2121
- User can self-select a reviewer role and, under that capacity, annotate each row of any given dataset with a value chosen from those available on a dropdown menu.
2222
- Several users can interact under _strictly non-overlapping_ roles with the application, each annotating individual rows of _possibly overlapping_ datasets.
@@ -25,10 +25,7 @@ These requirements are based on initial brainstorming conversations. Some of the
2525
- A subset of the columns (which we call `tracked` and does not overlap the `identifier` columns) is considered necessary and sufficient for review purposes.
2626
- Updates to the provided datasets are expected during the course of a study.
2727
- Changes to contents of `tracked` columns of a previously reviewed dataset row will be highlighted in the user interface and require re-confirmation.
28-
29-
#### Open questions
30-
- Do all datasets share the same decision dropdown choices?
31-
- Should we guard against or track "disappearing" rows (those whose `identifier` values vanish during a dataset update)?
28+
- All datasets share the same decision dropdown choices.
3229

3330
## API
3431
This feature can be implemented by adding an extra parameter to `mod_listings`. The names of fields and subfields are all temporary placeholders:
@@ -56,34 +53,38 @@ A possible simplification would be to make `"USUBJID"` optional on `id_vars`, si
5653

5754
**Beware**: Once the application is configured and run once, the only change permitted to the `datasets` subfield will be to *add* extra datasets. Changes to previously configured `id_vars` or `tracked_vars` sub-subfields could potentially render the collected review information inconsistent. The module should disallow the editing controls until such a situation is addressed. Review choices and roles do not suffer from that problem.
5855

59-
#### Open questions
60-
- Do we need to keep track of row numbers? They don't have an assigned column name, so this draft API would be insufficient to specify that they should/should not be tracked.
56+
## Data Stability Requirements
57+
The module only has access to the latest version of any given dataset. In order to inform users about modified and newly added records, it relies on stored summary hashes of previously seen data. Thus, it is necessary that some aspects of the representation of data are kept constant over the life of a study. Currently, these are:
58+
59+
- Values assigned to the sub-parameters `id_vars` and `tracked_vars` are set once and remain the same for the duration of the study.
60+
- Variables identified by `id_vars` and `tracked_vars` retain their types (factor, numeric, ...) and are available on each revision of each dataset.
61+
- All rows of each provided dataset are identified uniquely by the combination of `id_vars` configured at the beginning of the study.
62+
- No data rows are dropped during the study. In other words, if a combination of `id_vars` is present on revision `n` of a dataset, it will be available on revision `n+1`.
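
The last two requirements above can be checked mechanically. The following is a minimal base R sketch, assuming for illustration that two consecutive revisions are available as data frames (the module itself only sees the latest revision and would have to compare against stored hashes instead). The helper `check_revision`, its arguments, and the column names `AESEQ` and `AESEV` are hypothetical and not part of the module:

```r
# Hypothetical helper: checks two of the stability requirements listed above.
# `previous` and `current` are data.frames; `id_vars` is a character vector.
check_revision <- function(previous, current, id_vars) {
  # Build one key per row by pasting the identifier columns together.
  # Assumption: identifier values never contain the "|" separator.
  key <- function(df) {
    do.call(paste, c(unname(as.list(df[id_vars])), sep = "|"))
  }
  prev_keys <- key(previous)
  curr_keys <- key(current)
  list(
    # Every row of the current revision has a unique id_vars combination.
    ids_unique = anyDuplicated(curr_keys) == 0L,
    # Every id_vars combination of revision n is still present in n + 1.
    no_rows_dropped = all(prev_keys %in% curr_keys)
  )
}

# Illustrative data: revision 2 modifies one row and adds another.
rev1 <- data.frame(
  USUBJID = c("01", "02"), AESEQ = c(1L, 1L), AESEV = c("MILD", "MILD")
)
rev2 <- data.frame(
  USUBJID = c("01", "02", "02"), AESEQ = c(1L, 1L, 2L),
  AESEV = c("MILD", "SEVERE", "MILD")
)
res <- check_revision(rev1, rev2, id_vars = c("USUBJID", "AESEQ"))
dropped <- check_revision(rev2, rev1, id_vars = c("USUBJID", "AESEQ"))
```

Here `res` reports both requirements as satisfied, while `dropped` (the same revisions in reverse order) flags a missing row.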
6163

6264
## User Interface
63-
Basic features (sufficient for initial user feedback):
65+
Basic features:
6466

6567
- Isolated drop-down to choose reviewer role. Blank every time the application starts. Not bookmarked. Only when a non-empty role is selected can the user review data.
66-
- A listing set up for review will have *at least* two extra columns:
67-
- Latest decision
68-
- Row status: unreviewed data, reviewed data, data modified after review.
69-
Sorting/Filtering by "row status" should allow to conduct reviews of incremental changes to the underlying dataset.
68+
- A listing set up for review will have three extra columns:
69+
- Latest review decision
70+
- Latest reviewer role
71+
- Row status: unreviewed data, reviewed data, data modified after review, conflict across reviewers.
7072

71-
Future features (not requested, so not planned for this development phase):
73+
Sorting/Filtering by "row status" should allow to conduct reviews of incremental changes to the underlying dataset.
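
One way the "row status" column could be derived is sketched below in base R. This is only an illustration under stated assumptions: the helper `row_status`, its three inputs, and the precedence between statuses (unreviewed over modified over conflict) are hypothetical and not confirmed by the design:

```r
# Hypothetical helper: derives the "row status" column from stored review data.
# current_hash:  hash of the tracked columns for each row of the latest data
# reviewed_hash: hash recorded when the row was last reviewed (NA if never)
# n_decisions:   number of distinct latest decisions across reviewer roles
row_status <- function(current_hash, reviewed_hash, n_decisions) {
  status <- rep("reviewed data", length(current_hash))
  # Assumed precedence: later assignments override earlier ones.
  status[n_decisions > 1L] <- "conflict across reviewers"
  modified <- !is.na(reviewed_hash) & reviewed_hash != current_hash
  status[modified] <- "data modified after review"
  status[is.na(reviewed_hash)] <- "unreviewed data"
  status
}

# Illustrative input: four rows, one per status.
statuses <- row_status(
  current_hash  = c("a1", "b2", "c3", "d4"),
  reviewed_hash = c(NA, "b2", "zz", "d4"),
  n_decisions   = c(0L, 1L, 1L, 2L)
)
```

Filtering the listing on the resulting column to anything other than "reviewed data" would then give the incremental review set.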
7274

73-
- Hover-on decision info detail: date and reviewer role.
74-
- Warn against simultaneous conflicting editing.
75+
Future features (not requested, so not planned for this development phase):
7576
- User upload/download of review information. For manual backup purposes. Stored data consists mostly of hashes, so plaintext download should be OK. However, if necessary we could encrypt it using a symmetric key configured as an app secret and provided as an extra parameter to the module.
76-
- Load content from concurrent sessions.
77-
- Warning of conflicting decisions.
77+
- Load content from concurrent review sessions.
7878
- Acceleration options (Bulk editing, keyboard controls, etc.) outside of initial implementation.
79-
- Latest reviewer role column to sort/filter.
80-
- Bulk editing.
79+
- Free text entry for each row and reviewer.
8180

8281
#### Open questions
8382
- The module allows to tweak column visibility. Is it OK to allow review actions performed while some `tracked_vars` are not visible?
8483

85-
8684
## Server storage
85+
86+
_None of the proposals in this section are in scope for the first version of the review functionality. Only the alternative "Client storage" explained in the next section is implemented_.
87+
8788
Currently, the two available forms of storage on Connect are:
8889

8990
- Pins
@@ -108,7 +109,7 @@ The optional `review_store_path` parameter allows to point to an arbitrary folde
108109
- Will client-controlled mount points become available at some point on Connect?
109110

110111
## Client Storage
111-
An alternative approach to review data storage is to use Google's [File System Access API](https://wicg.github.io/file-system-access/) that is currently available in Chrome-derived browsers. To use it, reviewers would have to point the app to a folder shared by the team.
112+
An alternative approach to review data storage is to use Google's [File System Access API](https://wicg.github.io/file-system-access/) that is currently available in Chrome-derived browsers. To use it, reviewers have to point the app to a folder shared by the team at the beginning of each session.
112113

113114
## Data structures
114115
We will use a small collection of files for each input dataset configured for review. If we take an imaginary "ae" domain, we would store the following files:
@@ -121,7 +122,9 @@ There will use a small collection of files for each input dataset configured for
121122
- 1 complete hash of "ae" data.frame
122123
- 1 domain string ("ae")
123124
- n `id_vars` column names
125+
- **MISSING**: n `id_vars` column types
124126
- m `tracked_vars` column names
127+
- **MISSING**: m `tracked_vars` column types
125128
- 1 row count
126129
- p (1 per "ae" row) `hash_id(ae[id_vars])`
127130
- p (1 per "ae" row, *m* bytes long) `hash_tracked(ae[tracked_vars])`
