After each release the stats have to be updated. Most figures can be obtained via omero fs usage and the stats.py script.
Problem 1:
studies.tsv wants:
Study | Container | Introduced | Internal ID | Sets | Wells | Experiments (wells for screens, imaging experiments for non-screens) | Targets (genes, small molecules, geographic locations, or combination of factors (idr0019, 26, 34, 38)) | Acquisitions | 5D Images | Planes | Size (TB) | Size | # of Files | avg. size (MB) | Avg. Image Dim (XYZCT)
From stats.py you'll get
Container | ID | Set | Wells | Images | Planes | Bytes
Example:
idr0052-walther-condensinmap/experimentA | 752 | 44 of 54 | 0 | 282 | 699360 | 85.4 GB
What does "44 of 54" in the Set column mean? And what is Bytes: is that the value to use for Size (TB) and Size?
omero fs usage gives you something like
Total disk usage: 115773571855 bytes in 25 files
How does this size relate to Bytes from stats.py? And is "25 files" the # of Files?
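To make the question concrete, here is a minimal sketch of how the size columns could be derived from the omero fs usage summary line quoted above. It assumes the line format shown there and decimal units (1 TB = 10^12 bytes). Whether these are the values studies.tsv expects is exactly what is unclear. The names `parse_fs_usage` and `size_columns` are made up for this example.

```python
import re

# Pattern for the summary line printed by `omero fs usage`, as quoted in this issue.
USAGE_RE = re.compile(r"Total disk usage:\s*(\d+)\s*bytes in\s*(\d+)\s*files")


def parse_fs_usage(line):
    """Return (total_bytes, file_count) from an `omero fs usage` summary line."""
    match = USAGE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised usage line: {line!r}")
    return int(match.group(1)), int(match.group(2))


def size_columns(total_bytes, file_count, base=1000):
    """Derive candidate values for the size-related columns.

    base=1000 gives decimal units (an assumption); pass base=1024 for binary units.
    """
    avg_bytes = total_bytes / file_count if file_count else 0.0
    return {
        "size_tb": total_bytes / base**4,
        "size_gb": total_bytes / base**3,
        "n_files": file_count,
        "avg_size_mb": avg_bytes / base**2,
    }


total, files = parse_fs_usage("Total disk usage: 115773571855 bytes in 25 files")
cols = size_columns(total, files)
print(cols)
```

If the columns are meant to use binary units instead, passing `base=1024` changes only the divisors, so the choice is easy to swap once it is confirmed.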
The workflow doc has an HQL query for getting the Avg. Image Dim (XYZCT), but only for projects, not for screens.
And how do I get Targets? Since a target can be several different things, I can't think of an easy, generic script that could go through any annotation.csv and pull out the number of unique 'targets'.
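One possible approach, as a rough sketch only: since the target column(s) differ per study, the script could take the column names as an argument and count the distinct non-empty value combinations. The function name `count_unique_targets` and the column names in the sample are invented for illustration; the real annotation.csv headers would have to be looked up per study.

```python
import csv
import io


def count_unique_targets(handle, target_columns):
    """Count distinct non-empty value combinations across target_columns.

    `handle` is an open text file (or any iterable of CSV lines).
    """
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    missing = [c for c in target_columns if c not in fieldnames]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    seen = set()
    for row in reader:
        key = tuple((row[c] or "").strip() for c in target_columns)
        if any(key):  # skip rows where every target column is empty
            seen.add(key)
    return len(seen)


# Tiny in-memory example; these column names are hypothetical.
sample = io.StringIO(
    "Well,Gene Symbol,Compound Name\n"
    "A1,CDK1,\n"
    "A2,CDK1,\n"
    "A3,PLK1,\n"
    "A4,,\n"
)
n_targets = count_unique_targets(sample, ["Gene Symbol"])
print(n_targets)
```

For the "combination of factors" studies, passing several column names counts each distinct combination once. This still leaves the per-study choice of columns as a manual step.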
Problem 2:
releases.tsv wants:
Date | Data release | Code version | Sets | Wells | Experiments | Images | Planes | Size (TB) | Files (Million) | DB Size (GB)
From stats.py you'll get some of it:
Container | ID | Set | Wells | Images | Planes | Bytes
Total | | 13044 | 1213175 | 9150589 | 65571290 | 334.2 TB
But where do I get Files (Million) from? And how do I get DB Size (GB)?
/cc @sbesson I wasn't really sure where to open this issue: here (stats) or in idr-utils (stats.py script).