Commit 21c3c3a

Update README.md
Added HF links and table of contents
1 parent c91f032 commit 21c3c3a
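
The table of contents added in this commit links to headings through the anchors GitHub generates from heading text, so a link breaks silently if the target does not match. The sketch below checks the five link targets against the README headings. It assumes the commonly documented rule (lowercase the heading, drop punctuation other than hyphens, turn spaces into hyphens); `github_anchor` is a hypothetical helper written for this check, not GitHub's actual implementation.

```python
import re


def github_anchor(heading: str) -> str:
    """Approximate the anchor GitHub derives from a Markdown heading (assumed rule)."""
    text = heading.strip().lower()
    # Keep only word characters, spaces, and hyphens.
    text = re.sub(r"[^\w\- ]", "", text)
    # Every space becomes a hyphen.
    return text.replace(" ", "-")


# Section headings of the README after this commit (case is irrelevant: anchors are lowercased).
headings = [
    "Hugging Face Dataset and Checkpoint",
    "Installing dependencies",
    "Download and parse image-caption pairs from Pubmed Articles",
    "Run Benchmarking Experiments",
    "References",
]

# Link targets used by the new Table of Contents.
toc_targets = [
    "hugging-face-dataset-and-checkpoint",
    "installing-dependencies",
    "download-and-parse-image-caption-pairs-from-pubmed-articles",
    "run-benchmarking-experiments",
    "references",
]

available = {github_anchor(h) for h in headings}
missing = [t for t in toc_targets if t not in available]
print(missing)  # an empty list means every TOC link has a matching heading
```

Under this rule every target resolves, including entry 3, whose link text is shorter than the heading it points to.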

1 file changed: 13 additions, 4 deletions

README.md

Lines changed: 13 additions & 4 deletions
@@ -8,6 +8,19 @@
 
 A toolkit to download, augment, and benchmark OpenPMC-VL, a large dataset of image-text pairs extracted from open-access scientific articles on PubMed Central.
 
+## Table of Contents
+
+1. [Hugging Face Dataset and Checkpoint](#hugging-face-dataset-and-checkpoint)
+2. [Installing Dependencies](#installing-dependencies)
+3. [Download and Parse Image-Caption Pairs](#download-and-parse-image-caption-pairs-from-pubmed-articles)
+4. [Run Benchmarking Experiments](#run-benchmarking-experiments)
+5. [References](#references)
+
+## Hugging Face Dataset and Checkpoint
+
+- **Dataset:** [Open_PMC Dataset on Hugging Face](https://huggingface.co/datasets/vector-institute/open_pmc)
+- **Checkpoint:** [Open_PMC_CLIP Model Checkpoint on Hugging Face](https://huggingface.co/vector-institute/open_pmc_clip)
+
 ## Installing dependencies
 
 We use
@@ -75,7 +88,6 @@ python
 
 **Note:** Since these submodules (`mmlearn` and `open_clip`) are only part of the main branch in a single repository, if you switch to a branch where these submodules don't exist, your Python interpreter won't be able to find these packages and you will face errors.
 
-
 ## Download and parse image-caption pairs from PubMed Articles
 The codebase used to download PubMed articles and parse image-text pairs from them is stored in `openpmcvl/foundation`.
 This codebase relies heavily on the [Build PMC-OA](https://github.com/WeixiongLin/Build-PMC-OA) codebase[[1]](#1).
@@ -97,7 +109,6 @@ To download and parse open-access articles which other licenses than what is men
 python -u src/fetch_oa.py --num-retries 5 --extraction-dir path/to/download/directory/other --license-type other --volumes 0 1 2 3 4 5 6 7 8 9 10 11
 ```
 
-
 ## Run Benchmarking Experiments
 We use `mmlearn` to run benchmarking experiments.
 Many experiments can be run with our dataset and `mmlearn`.
@@ -136,8 +147,6 @@ mmlearn_run \
 For more comprehensive examples of shell scripts that run various experiments with OpenPMC-VL, refer to `openpmcvl/experiment/scripts`.
 For more information about `mmlearn`, please refer to the package's [official codebase](https://github.com/VectorInstitute/mmlearn).
 
-
-
 ## References
 <a id="1">[1]</a> PMC-OA paper:
 ```latex
