Sortable, searchable online table of BioAnalyzer predictions for candidate curatable articles from PubMed, for real-world testing by curators (see Levi’s suggestion).
Zero-cost option: A static version of the curator table can be hosted on GitHub Pages (no server, no Docker). See docs/curator-table/README.md for setup. Use the Streamlit app when you need to collect curator feedback; use the static table for viewing and sharing predictions with curators at minimal cost.
Recommended – use the CLI (Streamlit is included in the main project):

    BioAnalyzer run table

Then open http://localhost:8501. Use --port to change the port:

    BioAnalyzer run table --port 8502

Alternative – run Streamlit directly from the repo root (after pip install -e . or in Docker):

    streamlit run curator_table/app.py

Or from this directory:

    pip install -r requirements.txt  # only if not using main project install
    streamlit run app.py

The app expects a CSV or Parquet file with at least:
- PMID (required)
- Title (recommended)
- The 6 status columns: Host Species Status, Body Site Status, Condition Status, Sequencing Type Status, Taxa Level Status, Sample Size Status (values: PRESENT, PARTIALLY_PRESENT, ABSENT)
Optional: Journal, Summary, Year, Publication Date, Processing Time.
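Before pointing the app at a file, it can help to check that the file has the columns and status values listed above. The following is a minimal sketch using only the Python standard library, for CSV input only. The helper name `validate_predictions` is hypothetical and not part of BioAnalyzer, and the sample rows are made up for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = ["PMID"]
STATUS_COLUMNS = [
    "Host Species Status",
    "Body Site Status",
    "Condition Status",
    "Sequencing Type Status",
    "Taxa Level Status",
    "Sample Size Status",
]
ALLOWED_STATUSES = {"PRESENT", "PARTIALLY_PRESENT", "ABSENT"}


def validate_predictions(csv_file):
    """Return a list of problems found in a predictions CSV (empty if none).

    `csv_file` is any open text file object. Hypothetical helper, not a
    BioAnalyzer API.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    problems = [
        f"missing column: {name}"
        for name in REQUIRED_COLUMNS + STATUS_COLUMNS
        if name not in header
    ]
    if problems:
        return problems  # row checks are meaningless without the columns
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not (row["PMID"] or "").strip():
            problems.append(f"line {line_no}: empty PMID")
        for name in STATUS_COLUMNS:
            value = (row[name] or "").strip()  # short rows yield None
            if value not in ALLOWED_STATUSES:
                problems.append(
                    f"line {line_no}: {name} has unexpected value {value!r}"
                )
    return problems


# Made-up sample data: one valid row, one row with two problems.
header = ",".join(["PMID", "Title"] + STATUS_COLUMNS)
good = "12345,Example title," + ",".join(["PRESENT"] * 6)
bad = ",Another title," + ",".join(["PRESENT"] * 5 + ["MAYBE"])
problems = validate_predictions(io.StringIO("\n".join([header, good, bad]) + "\n"))
for problem in problems:
    print(problem)
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to the function. Parquet input would need a third-party reader and is not covered by this sketch.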
You can use:
- Exports from the BioAnalyzer CLI/API (e.g. analysis_results.csv).
- The validation dataset format (e.g. after merging predictions + metadata into one table with PMID, Title, and the six status columns).
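Merging predictions and metadata into one table could look roughly like the sketch below, which joins two lists of row dictionaries (for example, as read by `csv.DictReader`) on the PMID column. The function name `merge_by_pmid` and the sample rows are assumptions for illustration. The actual validation dataset layout may differ.

```python
def merge_by_pmid(predictions, metadata):
    """Left-join metadata rows onto prediction rows using the PMID column.

    Both arguments are lists of dicts. Every prediction row is kept; rows
    without matching metadata simply lack the metadata fields.
    Hypothetical helper, not a BioAnalyzer API.
    """
    # Normalise PMIDs to stripped strings so 111 and "111 " match.
    meta_by_pmid = {str(row["PMID"]).strip(): row for row in metadata}
    merged = []
    for pred in predictions:
        pmid = str(pred["PMID"]).strip()
        combined = dict(meta_by_pmid.get(pmid, {}))  # start from metadata
        combined.update(pred)  # prediction values win on shared keys
        combined["PMID"] = pmid
        merged.append(combined)
    return merged


# Made-up rows for illustration only.
predictions = [
    {"PMID": "111", "Host Species Status": "PRESENT"},
    {"PMID": "222", "Host Species Status": "ABSENT"},
]
metadata = [{"PMID": 111, "Title": "Example A", "Journal": "Example Journal"}]
merged = merge_by_pmid(predictions, metadata)
```

A left join is used here so that no prediction is silently dropped when its metadata is missing. Such rows can then be spotted by their absent Title.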
- Search by PMID, title, journal, or summary.
- Sort by any column (PMID, Title, or any status).
- PubMed link per row (opens in a new tab).
- Curator feedback: record verdict (Correct / Incorrect / Uncertain) per PMID; feedback is saved to curator_feedback.csv in the current working directory and can be downloaded for later analysis.
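As a rough illustration of the feedback step, the sketch below appends one verdict per call to a CSV file. Only the file name and the three verdict values come from this README. The column layout (`PMID`, `Verdict`, `Timestamp`) and the function name `record_verdict` are assumptions, and the app's real implementation may store feedback differently.

```python
import csv
import os
from datetime import datetime, timezone

FEEDBACK_FIELDS = ["PMID", "Verdict", "Timestamp"]  # assumed column layout
VERDICTS = {"Correct", "Incorrect", "Uncertain"}


def record_verdict(pmid, verdict, path="curator_feedback.csv"):
    """Append one curator verdict to the feedback CSV.

    Writes the header row first if the file is missing or empty.
    Hypothetical helper, not the app's actual code.
    """
    if verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {sorted(VERDICTS)}")
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FEEDBACK_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(
            {
                "PMID": str(pmid),
                "Verdict": verdict,
                "Timestamp": datetime.now(timezone.utc).isoformat(),
            }
        )
```

Calling `record_verdict("12345", "Correct")` (a made-up PMID) would add one row to curator_feedback.csv in the current working directory. Appending instead of overwriting keeps earlier verdicts, so a later analysis can see when a curator changed their mind about a PMID.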
- First version: 1k–5k rows with a single CSV/Parquet file (as above).
- Larger runs: Run a big batch on SuperStudio, export results to CSV/Parquet (or a DB), then point the app at that file (or add a DB backend later).
See docs/CURATOR_TABLE_DESIGN.md for full design (scale, fields, feedback loop, APIs, and implementation plan).