Description
I have noticed that for some tasks, e.g., UID0056, the diagrams are essential.
Task:
question: Between the years of 1950 and 1990, in which year did U.S personal saving rates (measured as household saving as a percent of after-tax income) peak?
source_docs: https://fraser.stlouisfed.org/title/treasury-bulletin-407/september-1991-7066?page=30&deep=true
source_files: treasury_bulletin_1991_09.txt
To answer this question, the model needs access to the second diagram on page 30 (the PERSONAL SAVING chart).
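However, the diagram is not available in the parsed data. The .txt file omits it entirely, and in the .json file the figure elements (ids 4 and 9) have null content and description, and the page's image_uri is also null: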
treasury_bulletin_1991_09.txt (content of page 30)
16
Profile of the Economy
June, the deficit totaled $235 billion, or $192 billion excluding outlays as part of the savings and loan situation. For the first 9 months of fiscal 1991, the deficit was $177 billion, compared with about $163 billion a year earlier.
FEDERAL OUTLAYS AND RECEIPTS AS A SHARE OF GROSS NATIONAL PRODUCT
FISCAL YEARS
The Federal budget outlay share of GNP averaged approximately 19 percent during the earlier postwar years, then rose to 23 percent in the 1980s. It is projected to reach a postwar high of 25 percent in fiscal 1992, including spending to deal with the savings and loan situation. The share declines to 20.2 percent by 1996, based on budget projections. Receipts were equal to 19.1 percent of GNP in fiscal 1990, and are projected to stay at 19.1 percent in the current fiscal year and to rise to 19.4 percent by 1996.
PERSONAL SAVING
Household Saving as a Percent of After-Tax Income, Through First Half 1991
The personal saving rate rose from a post-Depression low of 2.9 percent in 1987 to 4.6 percent in both 1989 and 1990, but remained well below the 6.7-percent long-term average. Saving appeared to be rising in early 1990, averaging 4.9 percent in the first half of the year. However, in the second half it dropped to only 4.2 percent as the slowing economy and increasing inflation reduced real incomes. The rate dipped to 3.7 percent in the second quarter of 1991, allowing only a 4-percent average for the first half of the year.
treasury_bulletin_1991_09.json (page 30)
{
  "document": {
    "elements": [
      ...
      {
        "bbox": [{ "coord": [41, 44, 96, 82], "page_id": 30 }],
        "content": "16",
        "description": null,
        "id": 0,
        "type": "page_number"
      },
      {
        "bbox": [{ "coord": [488, 86, 967, 144], "page_id": 30 }],
        "content": "Profile of the Economy",
        "description": null,
        "id": 1,
        "type": "title"
      },
      {
        "bbox": [{ "coord": [53, 188, 1403, 259], "page_id": 30 }],
        "content": "June, the deficit totaled $235 billion, or $192 billion excluding outlays as part of the savings and loan situation. For the first 9 months of fiscal 1991, the deficit was $177 billion, compared with about $163 billion a year earlier.",
        "description": null,
        "id": 2,
        "type": "text"
      },
      {
        "bbox": [{ "coord": [332, 294, 1181, 390], "page_id": 30 }],
        "content": "FEDERAL OUTLAYS AND RECEIPTS AS A SHARE OF GROSS NATIONAL PRODUCT",
        "description": null,
        "id": 3,
        "type": "section_header"
      },
      {
        "bbox": [{ "coord": [182, 405, 1269, 903], "page_id": 30 }],
        "content": null,
        "description": null,
        "id": 4,
        "type": "figure"
      },
      {
        "bbox": [{ "coord": [639, 921, 864, 965], "page_id": 30 }],
        "content": "FISCAL YEARS",
        "description": null,
        "id": 5,
        "type": "text"
      },
      {
        "bbox": [{ "coord": [46, 996, 1405, 1120], "page_id": 30 }],
        "content": "The Federal budget outlay share of GNP averaged approximately 19 percent during the earlier postwar years, then rose to 23 percent in the 1980s. It is projected to reach a postwar high of 25 percent in fiscal 1992, including spending to deal with the savings and loan situation. The share declines to 20.2 percent by 1996, based on budget projections. Receipts were equal to 19.1 percent of GNP in fiscal 1990, and are projected to stay at 19.1 percent in the current fiscal year and to rise to 19.4 percent by 1996.",
        "description": null,
        "id": 6,
        "type": "text"
      },
      {
        "bbox": [{ "coord": [554, 1158, 935, 1211], "page_id": 30 }],
        "content": "PERSONAL SAVING",
        "description": null,
        "id": 7,
        "type": "text"
      },
      {
        "bbox": [{ "coord": [290, 1229, 1212, 1267], "page_id": 30 }],
        "content": "Household Saving as a Percent of After-Tax Income, Through First Half 1991",
        "description": null,
        "id": 8,
        "type": "text"
      },
      {
        "bbox": [{ "coord": [179, 1269, 1259, 1692], "page_id": 30 }],
        "content": null,
        "description": null,
        "id": 9,
        "type": "figure"
      },
      {
        "bbox": [{ "coord": [41, 1725, 1406, 1847], "page_id": 30 }],
        "content": "The personal saving rate rose from a post-Depression low of 2.9 percent in 1987 to 4.6 percent in both 1989 and 1990, but remained well below the 6.7-percent long-term average. Saving appeared to be rising in early 1990, averaging 4.9 percent in the first half of the year. However, in the second half it dropped to only 4.2 percent as the slowing economy and increasing inflation reduced real incomes. The rate dipped to 3.7 percent in the second quarter of 1991, allowing only a 4-percent average for the first half of the year.",
        "description": null,
        "id": 10,
        "type": "text"
      },
      ...
    ],
    "pages": [
      ...
      { "id": 30, "image_uri": null },
      ...
    ]
  },
  "error_status": null,
  "metadata": {}
}
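For anyone who wants to check how widespread this is, here is a minimal sketch that lists figure elements carrying no content and no description. It assumes every parsed .json file follows the layout in the excerpt above (document.elements, each with bbox, content, description, id, and type); the function name find_empty_figures is my own and not part of the benchmark code.

```python
from collections import defaultdict


def find_empty_figures(parsed):
    """Map page_id -> ids of figure elements with no content and no description.

    Assumes the layout shown in the excerpt: parsed["document"]["elements"]
    is a list of dicts with "bbox", "content", "description", "id", "type".
    """
    missing = defaultdict(list)
    for element in parsed["document"]["elements"]:
        if element.get("type") != "figure":
            continue
        if element.get("content") is None and element.get("description") is None:
            # A figure can in principle span several boxes; record each page once.
            pages = {box["page_id"] for box in element.get("bbox", [])}
            for page_id in sorted(pages):
                missing[page_id].append(element["id"])
    return dict(missing)
```

Loading a file with json.load and passing the result to this function should, for the page 30 excerpt above, report figure ids 4 and 9.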
How were the "LLM with Oracle Parsed PDF Page(s)" and "Pre-parsed Full Corpus" evaluations run? Did the model see only the parsed data, or did it also have access to the no-OCR PDF version?
I’m also wondering why the figures weren’t embedded in the parsed data as base64, or why the compiled .txt files don’t at least include a placeholder for each figure (such as [Stripped figure]), so the model would know that figures exist on those pages.
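To illustrate the placeholder idea, here is a rough sketch of how a page could be compiled to text while keeping a marker for each figure. It is not how the existing .txt files were produced (I don't know that); the function name compile_page_text and the exact placeholder string are assumptions, and the element layout is taken from the JSON excerpt above.

```python
def compile_page_text(elements, page_id, placeholder="[Stripped figure]"):
    """Join the text of one page in element order, marking figures.

    Hypothetical helper: elements are dicts shaped like the JSON excerpt
    ("bbox", "content", "description", "type").
    """
    lines = []
    for element in elements:
        on_page = any(box.get("page_id") == page_id for box in element.get("bbox", []))
        if not on_page:
            continue
        if element.get("type") == "figure":
            # Prefer real content or a description if the parser produced one.
            lines.append(element.get("content") or element.get("description") or placeholder)
        elif element.get("content"):
            lines.append(element["content"])
    return "\n".join(lines)
```

With this, the page 30 text would contain a [Stripped figure] line between the "Household Saving as a Percent of After-Tax Income" caption and the paragraph that follows it.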
I’d appreciate it if you could clarify these points. Thank you.