README.md: 13 additions & 4 deletions
@@ -8,6 +8,19 @@
A toolkit to download, augment, and benchmark OpenPMC-VL, a large dataset of image-text pairs extracted from open-access scientific articles on PubMed Central.
+
## Table of Contents
+
+
1. [Hugging Face Dataset and Checkpoint](#hugging-face-dataset-and-checkpoint)
- **Dataset:** [Open_PMC Dataset on Hugging Face](https://huggingface.co/datasets/vector-institute/open_pmc)
+
- **Checkpoint:** [Open_PMC_CLIP Model Checkpoint on Hugging Face](https://huggingface.co/vector-institute/open_pmc_clip)
+
## Installing dependencies
We use
@@ -75,7 +88,6 @@ python
**Note:** The `mmlearn` and `open_clip` submodules exist only on the main branch of this repository. If you switch to a branch that does not contain them, your Python interpreter won't be able to find these packages and imports will fail.
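One way to confirm whether the current branch still exposes these packages is to ask the interpreter directly. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and it assumes the import names match the submodule names (`mmlearn` and `open_clip`).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names the current interpreter cannot locate."""
    # find_spec returns None when a top-level module is not on the import path.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names, taken from the submodule names in the note above.
    missing = missing_packages(["mmlearn", "open_clip"])
    if missing:
        print("Not importable on this branch:", ", ".join(missing))
    else:
        print("All submodule packages are importable.")
```

If the script reports missing packages, switching back to the main branch should make them importable again, assuming they were installed from the submodules.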
-
## Download and parse image-caption pairs from PubMed Articles
The codebase used to download PubMed articles and parse image-text pairs from them is stored in `openpmcvl/foundation`.
This codebase relies heavily on the [Build PMC-OA](https://github.com/WeixiongLin/Build-PMC-OA) codebase [[1]](#1).
@@ -97,7 +109,6 @@ To download and parse open-access articles which other licenses than what is men