@@ -84,10 +84,10 @@ Spark 4.0.0 is supported in the version `0.1.11` and later (need Java 17 and Sca
 Binary package is available in the Maven Central Repository.


-- **Spark 3.5.***: com.stabrise:spark-pdf-spark35_2.12:0.1.16
+- **Spark 3.5.***: com.stabrise:spark-pdf-spark35_2.12:0.1.17
 - **Spark 3.4.***: com.stabrise:spark-pdf-spark34_2.12:0.1.11 (issue with publishing fresh version)
-- **Spark 3.3.***: com.stabrise:spark-pdf-spark33_2.12:0.1.16
-- **Spark 4.0.***: com.stabrise:spark-pdf-spark40_2.13:0.1.16
+- **Spark 3.3.***: com.stabrise:spark-pdf-spark33_2.12:0.1.17
+- **Spark 4.0.***: com.stabrise:spark-pdf-spark40_2.13:0.1.17

 ## Options for the data source:

@@ -96,6 +96,7 @@ Binary package is available in the Maven Central Repository.
 - `pagePerPartition`: Number of pages per partition in Spark DataFrame. Default: "5".
 - `reader`: Supports: `pdfBox` - based on the PDFBox Java library, `gs` - based on Ghostscript (requires Ghostscript to be installed on the system)
 - `ocrConfig`: Tesseract OCR configuration. Default: "psm=3". For more information see [Tesseract OCR Params](TesseractParams.md)
+- `password`: Password for protected PDF files

 ## Output Columns in the DataFrame:

@@ -158,6 +159,7 @@ val df = spark.read.format("pdf")
     .option("pagePerPartition", "2")
     .option("reader", "pdfBox")
     .option("ocrConfig", "psm=11")
+    .option("password", "pdf_password")
     .load("path to the pdf file(s)")

 df.select("path", "document").show()
@@ -180,6 +182,7 @@ df = spark.read.format("pdf") \
     .option("pagePerPartition", "2") \
     .option("reader", "pdfBox") \
     .option("ocrConfig", "psm=11") \
+    .option("password", "pdf_password") \
     .load("path to the pdf file(s)")

 df.select("path", "document").show()
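
The package list in the first hunk pairs each Spark release line with a Maven coordinate. As a minimal sketch of that mapping after this change, the values could be looked up as below. The helper name `spark_pdf_coordinate` and the lookup table are hypothetical illustrations and are not part of spark-pdf; the versions are copied from the added (`+`) and context lines of the diff.

```python
# Hypothetical helper, not part of the spark-pdf project.
# Each entry: (artifact suffix, Scala version, package version),
# copied from the README list as it reads after this change.
_COORDINATES = {
    "3.3": ("spark33", "2.12", "0.1.17"),
    "3.4": ("spark34", "2.12", "0.1.11"),  # README notes an issue publishing a fresh version
    "3.5": ("spark35", "2.12", "0.1.17"),
    "4.0": ("spark40", "2.13", "0.1.17"),
}


def spark_pdf_coordinate(spark_version: str) -> str:
    """Return 'group:artifact:version' for a Spark version such as '3.5.1'."""
    # Only the major.minor part selects the package.
    minor = ".".join(spark_version.split(".")[:2])
    try:
        suffix, scala, version = _COORDINATES[minor]
    except KeyError:
        raise ValueError(
            f"no spark-pdf package listed for Spark {spark_version}"
        ) from None
    return f"com.stabrise:spark-pdf-{suffix}_{scala}:{version}"


if __name__ == "__main__":
    print(spark_pdf_coordinate("3.5.1"))
```

The returned string has the `group:artifact:version` form accepted by Spark's `spark.jars.packages` setting and by the `--packages` flag of `spark-submit`.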