Update release stats #92

Opened on Aug 12, 2020

After each release, the stats have to be updated. Most figures can be obtained via omero fs usage and the stats.py script.

Problem 1:

studies.tsv wants:
Study | Container | Introduced | Internal ID | Sets | Wells | Experiments (wells for screens, imaging experiments for non-screens) | Targets (genes, small molecules, geographic locations, or a combination of factors (idr0019, 26, 34, 38)) | Acquisitions | 5D Images | Planes | Size (TB) | Size | # of Files | avg. size (MB) | Avg. Image Dim (XYZCT)

From stats.py you'll get
Container | ID | Set | Wells | Images | Planes | Bytes
Example:
idr0052-walther-condensinmap/experimentA | 752 | 44 of 54 | 0 | 282 | 699360 | 85.4 GB
What does "44 of 54" sets mean? And what is Bytes: should it be used for Size (TB) and Size?
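Whatever the answers turn out to be, a stats.py row first has to be turned into numbers. Below is a minimal sketch. It assumes the row is pipe-separated as shown above and that sizes use decimal units (1 GB = 10^9 bytes). Whether stats.py actually prints decimal or binary units is not confirmed, and the helper names are hypothetical. The sketch keeps both halves of "44 of 54" without guessing what they mean.

```python
import re

# Assumption: decimal (SI) units, 1 GB = 10**9 bytes. Not confirmed for stats.py.
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}


def size_to_bytes(text: str) -> int:
    """Convert a human-readable size such as '85.4 GB' to bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMGTP]?B)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return round(float(value) * UNITS[unit])


def parse_stats_row(line: str) -> dict:
    """Split one pipe-separated stats.py row into named fields (hypothetical helper)."""
    container, obj_id, sets, wells, images, planes, size = (
        cell.strip() for cell in line.split("|")
    )
    # 'Set' is either a plain count or 'N of M'; keep both parts when present.
    shown, _, total = sets.partition(" of ")
    return {
        "container": container,
        "id": int(obj_id) if obj_id else None,  # the Total row has no ID
        "sets": int(shown),
        "sets_total": int(total) if total else None,
        "wells": int(wells),
        "images": int(images),
        "planes": int(planes),
        "bytes": size_to_bytes(size),
    }


row = parse_stats_row(
    "idr0052-walther-condensinmap/experimentA | 752 | 44 of 54 | 0 | 282 | 699360 | 85.4 GB"
)
print(row["sets"], row["sets_total"], row["bytes"] / 10**12)  # sizes in TB
```

The same parser also accepts the Total row from the release summary, because the empty ID and the plain set count are both handled.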

omero fs usage gives you something like:
Total disk usage: 115773571855 bytes in 25 files
What about this size? And are the 25 files the # of Files?
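If the omero fs usage summary does turn out to be the right source, the size columns could be derived from it roughly as follows. Several things here are assumptions: the mapping onto Size (TB), # of Files and avg. size (MB), which is exactly the open question above; the use of decimal units; and the function names.

```python
import re

# Matches the summary line printed by `omero fs usage`, e.g.
# "Total disk usage: 115773571855 bytes in 25 files"
USAGE_RE = re.compile(r"Total disk usage:\s*(\d+)\s*bytes in\s*(\d+)\s*files")


def parse_fs_usage(output: str) -> tuple[int, int]:
    """Return (total_bytes, file_count) from `omero fs usage` output."""
    match = USAGE_RE.search(output)
    if match is None:
        raise ValueError("no 'Total disk usage' line found")
    return int(match.group(1)), int(match.group(2))


def size_columns(total_bytes: int, n_files: int) -> dict:
    """Candidate studies.tsv size columns, assuming decimal TB and MB."""
    return {
        "Size (TB)": round(total_bytes / 10**12, 3),
        "# of Files": n_files,
        "avg. size (MB)": round(total_bytes / n_files / 10**6, 1) if n_files else 0.0,
    }


total_bytes, n_files = parse_fs_usage("Total disk usage: 115773571855 bytes in 25 files")
print(size_columns(total_bytes, n_files))
```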

The workflow doc has an HQL query for getting the Avg. Image Dim (XYZCT), but only for projects, not for screens.
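One possible starting point for screens is an untested HQL sketch that walks Screen, Plate, Well, WellSample, Image and Pixels, averages the dimensions, and runs through the query service of an already-connected omero-py BlitzGateway. The query, the helper names and the "X x Y x Z x C x T" output format are all assumptions. This is not the query from the workflow doc.

```python
# Untested HQL sketch: average Pixels dimensions over every image in one screen.
SCREEN_DIMS_HQL = """
select avg(pix.sizeX), avg(pix.sizeY), avg(pix.sizeZ),
       avg(pix.sizeC), avg(pix.sizeT)
from Pixels pix
where pix.image.id in (
    select ws.image.id from WellSample ws, ScreenPlateLink spl
    where ws.well.plate.id = spl.child.id and spl.parent.id = :id
)
"""


def format_dims(averages) -> str:
    """Render averaged (X, Y, Z, C, T) values as 'X x Y x Z x C x T' (assumed format)."""
    return " x ".join(str(round(value)) for value in averages)


def screen_avg_dims(conn, screen_id: int) -> str:
    """Run the query via an already-connected omero.gateway.BlitzGateway."""
    # Imported lazily so format_dims can be used without omero-py installed.
    from omero.rtypes import unwrap
    from omero.sys import ParametersI

    params = ParametersI()
    params.addId(screen_id)  # binds the :id placeholder
    conn.SERVICE_OPTS.setOmeroGroup("-1")  # look across all groups
    rows = conn.getQueryService().projection(
        SCREEN_DIMS_HQL, params, conn.SERVICE_OPTS
    )
    return format_dims(unwrap(rows[0]))
```

If the query works, it could also be checked against a project by running the existing project query on the same server and comparing the two outputs.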

And how do we get Targets? Since these can be many different things, I can't think of an easy, generic script that could go through any annotation.csv and pull out the number of unique 'targets'.
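A semi-generic option is to make the target column(s) a per-study input instead of trying to detect them automatically. With one column, the script counts distinct non-empty values. With several columns, it counts distinct combinations, which covers the idr0019, 26, 34 and 38 case. The column names in the example ("Gene Symbol") are hypothetical and would need checking against each annotation.csv.

```python
import csv
import io
from typing import Iterable, TextIO


def count_unique_targets(handle: TextIO, target_columns: Iterable[str]) -> int:
    """Count distinct non-empty target values, or value combinations.

    target_columns is chosen per study; passing several columns counts
    unique combinations across them.
    """
    columns = list(target_columns)
    seen = set()
    for row in csv.DictReader(handle):
        key = tuple((row.get(col) or "").strip() for col in columns)
        if any(key):  # skip rows where every target column is empty
            seen.add(key)
    return len(seen)


# Usage with a tiny in-memory file; for a real study use
# open(path, newline="") on its annotation.csv instead.
sample = io.StringIO(
    "Plate,Well,Gene Symbol\n"
    "P1,A1,TP53\n"
    "P1,A2,TP53\n"
    "P1,A3,BRCA1\n"
    "P1,A4,\n"
)
print(count_unique_targets(sample, ["Gene Symbol"]))  # prints 2
```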

Problem 2:

releases.tsv wants:
Date | Data release | Code version | Sets | Wells | Experiments | Images | Planes | Size (TB) | Files (Million) | DB Size (GB)
From stats.py you'll get some of it:
Container | ID | Set | Wells | Images | Planes | Bytes
Total | | 13044 | 1213175 | 9150589 | 65571290 | 334.2 TB
But where do we get Files (Million) from? And how do we get DB Size (GB)?
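Both figures could be computed as follows, under two assumptions. First, Files (Million) is taken to be the sum of the per-study file counts reported by omero fs usage. Second, DB Size (GB) is taken to be the size of the PostgreSQL database backing OMERO, which pg_database_size(current_database()) reports in bytes. The DSN, the helper names and the use of decimal GB are placeholders to adapt.

```python
from typing import Iterable, Tuple


def release_totals(per_study: Iterable[Tuple[int, int]]) -> dict:
    """Sum assumed per-study (total_bytes, n_files) pairs from `omero fs usage`."""
    pairs = list(per_study)  # allow generators to be iterated twice
    total_bytes = sum(size for size, _ in pairs)
    total_files = sum(files for _, files in pairs)
    return {
        "Size (TB)": round(total_bytes / 10**12, 1),
        "Files (Million)": round(total_files / 10**6, 2),
    }


# pg_database_size returns bytes; current_database() avoids hard-coding a name.
DB_SIZE_SQL = "SELECT pg_database_size(current_database())"


def db_size_gb(dsn: str) -> float:
    """Size of the database reachable via `dsn`, in decimal GB (1 GB = 10**9 bytes)."""
    import psycopg2  # imported lazily; only needed when actually querying

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(DB_SIZE_SQL)
            (size_bytes,) = cur.fetchone()
    finally:
        conn.close()
    return round(size_bytes / 10**9, 1)
```

Note that PostgreSQL's own pg_size_pretty uses binary (1024-based) units, so its output will differ slightly from this decimal figure.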

/cc @sbesson I wasn't really sure where to open this issue: here (stats) or in idr-utils (stats.py script).
