diff --git a/DESCRIPTION b/DESCRIPTION
index de95b1a..06fe99e 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -25,5 +25,5 @@ Suggests:
     webp,
     tesseract,
     testthat
-RoxygenNote: 6.1.1
+RoxygenNote: 7.0.2
 Roxygen: list(markdown = TRUE)
diff --git a/R/render.R b/R/render.R
index b7ed10d..b7fec44 100644
--- a/R/render.R
+++ b/R/render.R
@@ -20,6 +20,8 @@
 #' # convert few pages to png
 #' file.copy(file.path(Sys.getenv("R_DOC_DIR"), "NEWS.pdf"), "news.pdf")
 #' pdf_convert("news.pdf", pages = 1:3)
+#' # specify format string for output filenames
+#' pdf_convert("news.pdf", filenames="news_page_%d.%s")
 #'
 #' # render into raw bitmap
 #' bitmap <- pdf_render_page("news.pdf")
@@ -62,7 +64,8 @@ pdf_render_page<- function(pdf, page = 1, dpi = 72, numeric = FALSE, antialias =
 #' to one of `poppler_config()$supported_image_formats`.
 #' @param pages vector with one-based page numbers to render. `NULL` means all pages.
 #' @param filenames vector of equal length to `pages` with output filenames. May also be
-#' a format string which is expanded using `pages` and `format` respectively.
+#' a format string which is expanded using `pages` and `format` respectively, i.e. `sprintf`-type
+#' string containing `%d` and `%s` (in this order).
 #' @param verbose print some progress info to stdout
 pdf_convert <- function(pdf, format = "png", pages = NULL, filenames = NULL , dpi = 72,
   antialias = TRUE, opw = "", upw = "", verbose = TRUE){
diff --git a/man/pdf_ocr.Rd b/man/pdf_ocr.Rd
index 3623e1b..3417b76 100644
--- a/man/pdf_ocr.Rd
+++ b/man/pdf_ocr.Rd
@@ -5,11 +5,23 @@
 \alias{pdf_ocr_data}
 \title{OCR text extraction}
 \usage{
-pdf_ocr_text(pdf, pages = NULL, opw = "", upw = "",
-  language = "eng", dpi = 600)
+pdf_ocr_text(
+  pdf,
+  pages = NULL,
+  opw = "",
+  upw = "",
+  language = "eng",
+  dpi = 600
+)
 
-pdf_ocr_data(pdf, pages = NULL, opw = "", upw = "",
-  language = "eng", dpi = 600)
+pdf_ocr_data(
+  pdf,
+  pages = NULL,
+  opw = "",
+  upw = "",
+  language = "eng",
+  dpi = 600
+)
 }
 \arguments{
 \item{pdf}{file path or raw vector with pdf data}
@@ -29,7 +41,9 @@ languge of the engine.}
 Perform OCR text extraction. This requires you have the \code{tesseract} package.
 }
 \seealso{
-Other pdftools: \code{\link{pdftools}}, \code{\link{qpdf}},
-  \code{\link{rendering}}
+Other pdftools:
+\code{\link{pdftools}},
+\code{\link{qpdf}},
+\code{\link{rendering}}
 }
 \concept{pdftools}
diff --git a/man/pdf_render_page.Rd b/man/pdf_render_page.Rd
index aef162d..ab26cbc 100644
--- a/man/pdf_render_page.Rd
+++ b/man/pdf_render_page.Rd
@@ -8,11 +8,27 @@
 \alias{poppler_config}
 \title{Render / Convert PDF}
 \usage{
-pdf_render_page(pdf, page = 1, dpi = 72, numeric = FALSE,
-  antialias = TRUE, opw = "", upw = "")
-
-pdf_convert(pdf, format = "png", pages = NULL, filenames = NULL,
-  dpi = 72, antialias = TRUE, opw = "", upw = "", verbose = TRUE)
+pdf_render_page(
+  pdf,
+  page = 1,
+  dpi = 72,
+  numeric = FALSE,
+  antialias = TRUE,
+  opw = "",
+  upw = ""
+)
+
+pdf_convert(
+  pdf,
+  format = "png",
+  pages = NULL,
+  filenames = NULL,
+  dpi = 72,
+  antialias = TRUE,
+  opw = "",
+  upw = "",
+  verbose = TRUE
+)
 
 poppler_config()
 }
@@ -38,7 +54,8 @@ to one of \code{poppler_config()$supported_image_formats}.}
 \item{pages}{vector with one-based page numbers to render. \code{NULL} means all pages.}
 
 \item{filenames}{vector of equal length to \code{pages} with output filenames. May also be
-a format string which is expanded using \code{pages} and \code{format} respectively.}
+a format string which is expanded using \code{pages} and \code{format} respectively, i.e. \code{sprintf}-type
+string containing \verb{\%d} and \verb{\%s} (in this order).}
 
 \item{verbose}{print some progress info to stdout}
 }
@@ -51,6 +68,8 @@ raw bitmap array for further processing in R.
 # convert few pages to png
 file.copy(file.path(Sys.getenv("R_DOC_DIR"), "NEWS.pdf"), "news.pdf")
 pdf_convert("news.pdf", pages = 1:3)
+# specify format string for output filenames
+pdf_convert("news.pdf", filenames="news_page_\%d.\%s")
 
 # render into raw bitmap
 bitmap <- pdf_render_page("news.pdf")
@@ -73,7 +92,9 @@ unlink(c('news.pdf', 'news_1.png', 'news_2.png', 'news_3.png',
   'page.jpeg', 'page.png', 'page.webp'))
 }
 \seealso{
-Other pdftools: \code{\link{pdf_ocr_text}},
-  \code{\link{pdftools}}, \code{\link{qpdf}}
+Other pdftools:
+\code{\link{pdf_ocr_text}()},
+\code{\link{pdftools}},
+\code{\link{qpdf}}
 }
 \concept{pdftools}
diff --git a/man/pdftools.Rd b/man/pdftools.Rd
index f5119ad..0ca3c94 100644
--- a/man/pdftools.Rd
+++ b/man/pdftools.Rd
@@ -63,7 +63,9 @@ fonts <- pdf_fonts(pdf_file)
 files <- pdf_attachments(pdf_file)
 }
 \seealso{
-Other pdftools: \code{\link{pdf_ocr_text}},
-  \code{\link{qpdf}}, \code{\link{rendering}}
+Other pdftools:
+\code{\link{pdf_ocr_text}()},
+\code{\link{qpdf}},
+\code{\link{rendering}}
 }
 \concept{pdftools}
diff --git a/man/qpdf.Rd b/man/qpdf.Rd
index 51033fa..b39ab51 100644
--- a/man/qpdf.Rd
+++ b/man/qpdf.Rd
@@ -11,8 +11,10 @@
 \alias{pdf_subset}
 \title{qpdf utilities}
 \seealso{
-Other pdftools: \code{\link{pdf_ocr_text}},
-  \code{\link{pdftools}}, \code{\link{rendering}}
+Other pdftools:
+\code{\link{pdf_ocr_text}()},
+\code{\link{pdftools}},
+\code{\link{rendering}}
 }
 \concept{pdftools}
 \keyword{internal}