Description
Fatal R error when attempting to use extract_text on a PDF that includes an x-bar character.
Reproducible example
I have constructed a simple example PDF (attached as xbar.pdf) that reproduces the error. I made it in Microsoft Word, inserting the x-bar symbol.
As this crashes R, I can't use the reprex package for this, as far as I know.
library(tabulapdf)

# area = list(c(top, left, bottom, right))

# 1. Text up to, but not including, the x-bar: this works
out1 <- extract_text("xbar.pdf", area = list(c(0, 0, 200, 193)))

# 2. The whole text: R crashes with a fatal error
out2 <- extract_text("xbar.pdf")

# 3. Only the x-bar area: R crashes with a fatal error
out3 <- extract_text("xbar.pdf", area = list(c(0, 193, 200, 210)))
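Because each failing call takes down the whole R session, one way to probe different area values without losing the session is to run each call in a child R process. This is a minimal sketch under assumptions: it needs the callr package, safe_extract_text() is a hypothetical helper (not part of tabulapdf), and I have not checked how cleanly this particular crash is reported by the child process.

```r
# Sketch: run tabulapdf::extract_text() in a separate R process via callr,
# so a fatal error there does not kill the interactive session.
# safe_extract_text() is a hypothetical helper, not part of tabulapdf.
safe_extract_text <- function(file, ...) {
  tryCatch(
    callr::r(
      # Self-contained function: refers to tabulapdf only via `::`
      function(file, ...) tabulapdf::extract_text(file, ...),
      args = list(file = file, ...)
    ),
    error = function(e) {
      # Any failure (including the child process dying) ends up here
      message("extract_text() failed in the child process: ", conditionMessage(e))
      NA_character_
    }
  )
}

# Usage with the attached example file:
# safe_extract_text("xbar.pdf")
# safe_extract_text("xbar.pdf", area = list(c(0, 193, 200, 210)))
```

Returning NA_character_ rather than re-throwing keeps a loop over several candidate areas running even when some of them fail.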
Note that if I call the tabula.jar bundled with the R package directly from the command line, like this:
java -jar C:\Users\<username>\AppData\Local\R\win-library\4.4\tabulapdf\java\tabula.jar xbar.pdf
I get the following output, which is fine for my purposes (I am not particularly concerned about the x-bar being rendered as ???):
Aug 06, 2024 10:03:59 AM org.apache.fontbox.ttf.CmapSubtable processSubtype14
WARNING: Format 14 cmap table is not supported and will be ignored
The mean of x is denoted ???
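Since the command-line tabula.jar handles the file without crashing, a possible stopgap is to call it from R as an external process, so the JVM never runs inside the R session. This is a sketch under assumptions: java must be on the PATH, the jar location is inferred from the path shown above, and run_tabula_cli() is a hypothetical helper rather than a tabulapdf function.

```r
# Sketch: call the tabula.jar bundled with tabulapdf in an external `java`
# process, mirroring the command line above.
# run_tabula_cli() is a hypothetical helper, not part of tabulapdf.
run_tabula_cli <- function(pdf,
                           jar = system.file("java", "tabula.jar", package = "tabulapdf")) {
  # system.file() returns "" when the file or package is not found
  if (!nzchar(jar)) stop("tabula.jar not found: is tabulapdf installed?")
  if (!file.exists(pdf)) stop("PDF not found: ", pdf)
  # stdout = TRUE returns the output as a character vector; stderr = TRUE
  # also captures the Java logging lines (e.g. the cmap WARNING)
  system2("java",
          args = c("-jar", shQuote(jar), shQuote(pdf)),
          stdout = TRUE, stderr = TRUE)
}

# Usage: run_tabula_cli("xbar.pdf")
```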
Expected result
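No fatal error: I would expect any issues with reading or rendering the x-bar to be handled without crashing R, for example by returning placeholder characters as the command-line tabula.jar does.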
Session info
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.utf8 LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tabulapdf_1.0.5-3
loaded via a namespace (and not attached):
[1] utf8_1.2.4 R6_2.5.1 tzdb_0.4.0 magrittr_2.0.3 glue_1.7.0 tibble_3.2.1
[7] pkgconfig_2.0.3 png_0.1-8 rJava_1.0-11 lifecycle_1.0.4 readr_2.1.5 cli_3.6.2
[13] fansi_1.0.6 vctrs_0.6.5 compiler_4.4.0 rstudioapi_0.16.0 tools_4.4.0 hms_1.1.3
[19] pillar_1.9.0 rlang_1.1.3