openpmcvl/granular/README.md (+1 -2)
@@ -55,7 +55,7 @@ When a subfigure is successfully detected and separated:
 When subfigure extraction fails:
 - `id`: Generated ID that would have been used
 - `source_fig_id`: ID of the original figure
-- `PMC_ID`: PMC ID of the source article
+- `PMC_ID`: PMC ID of the source article
 - `media_name`: Original filename
This script saves extracted subfigures as .jpg files in the target directory. Metadata for each subfigure is stored in separate JSONL files, with unique IDs that link back to the original figure-caption pairs in the source JSONL files.
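As an illustration of how the IDs link subfigures back to their source figures, here is a minimal sketch. It assumes each subfigure metadata record carries the `id` and `source_fig_id` fields listed above; the helper names (`read_jsonl`, `group_by_source`) and the sample values are hypothetical and do not come from the repository.

```python
import json
from collections import defaultdict


def read_jsonl(lines):
    """Parse an iterable of JSONL lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def group_by_source(subfig_records):
    """Map each source figure ID to the IDs of the subfigures cut from it."""
    groups = defaultdict(list)
    for rec in subfig_records:
        groups[rec["source_fig_id"]].append(rec["id"])
    return dict(groups)


# Hypothetical sample records; real files would be read line by line from disk.
sample_lines = [
    '{"id": "fig_a_0", "source_fig_id": "fig_a", "PMC_ID": "PMC0000001", "media_name": "fig_a.jpg"}',
    '{"id": "fig_a_1", "source_fig_id": "fig_a", "PMC_ID": "PMC0000001", "media_name": "fig_a.jpg"}',
    "",
    '{"id": "fig_b_0", "source_fig_id": "fig_b", "PMC_ID": "PMC0000002", "media_name": "fig_b.jpg"}',
]

groups = group_by_source(read_jsonl(sample_lines))
```

With a real metadata file, `read_jsonl` could be given an open file handle, since iterating over a text file yields one line at a time.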
@@ -106,4 +106,3 @@ The non biomedical subfigures will be removed. The following fields are added to
The outputs from steps 3 and 5 contain labeled subcaptions and labeled subfigures respectively. By matching these labels (e.g. "Subfigure-A"), we can create the final subfigure-subcaption pairs. Any cases where labels are missing or captions couldn't be split will be handled in subsequent steps. Refer to notebook for more details.
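The label matching described above could look roughly like the following sketch. The `label` and `subcaption` field names, the `pair_by_label` helper, and the sample records are assumptions made for illustration; the notebook in the repository is the reference for the actual procedure.

```python
def pair_by_label(subfigures, subcaptions):
    """Pair subfigures with subcaptions that share a source figure and label.

    Returns (pairs, unmatched_ids). Subfigures with a missing label, or with
    no subcaption under the same label, are reported as unmatched so a later
    step can handle them.
    """
    # Index subcaptions by (source figure, label), e.g. ("fig_a", "Subfigure-A").
    captions = {
        (c["source_fig_id"], c["label"]): c["subcaption"] for c in subcaptions
    }
    pairs, unmatched = [], []
    for fig in subfigures:
        key = (fig["source_fig_id"], fig.get("label"))
        if key in captions:
            pairs.append(
                {"id": fig["id"], "label": fig["label"], "subcaption": captions[key]}
            )
        else:
            unmatched.append(fig["id"])
    return pairs, unmatched


# Hypothetical records standing in for the outputs of steps 5 and 3.
subfigures = [
    {"id": "fig_a_0", "source_fig_id": "fig_a", "label": "Subfigure-A"},
    {"id": "fig_a_1", "source_fig_id": "fig_a", "label": "Subfigure-B"},
    {"id": "fig_a_2", "source_fig_id": "fig_a"},  # label missing
]
subcaptions = [
    {"source_fig_id": "fig_a", "label": "Subfigure-A", "subcaption": "Axial CT scan."},
]

pairs, unmatched = pair_by_label(subfigures, subcaptions)
```

Keying on the source figure as well as the label keeps "Subfigure-A" from one figure from being paired with the "Subfigure-A" caption of another.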