(Bug) OCR layer not invisible, red text #733

@Artnal

Description

When I use the OCR tool, the text layer it creates is not invisible. Instead, it adds a layer of red text on top of the page (regardless of the PDF viewer used). Attached is a random PDF document that demonstrates the problem.

NB: This is with Firefox, but I also tried Brave and got the same result. Console logs for both browsers are included below.
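For context, the PDF specification defines text rendering mode 3 (set with the `Tr` operator) as "neither fill nor stroke", which is the usual way to make an OCR text layer searchable and selectable but invisible. The minimal Python sketch below builds such a content-stream fragment. It is not BentoPDF's code, the font resource name `F1` is a hypothetical placeholder, and it only handles text that fits in a simple literal string (no font encoding or CID handling).

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    # Backslashes must be escaped first so the later escapes are not doubled.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_op(text: str, x: float, y: float,
                      font: str = "F1", size: float = 12) -> str:
    """Build a content-stream fragment that places `text` at (x, y) with
    text rendering mode 3 (neither fill nor stroke), so it is searchable
    but never painted. `F1` is a hypothetical font resource name."""
    return (
        "BT\n"
        f"/{font} {size:g} Tf\n"
        "3 Tr\n"  # rendering mode 3 = invisible
        f"{x:g} {y:g} Td\n"
        f"({escape_pdf_string(text)}) Tj\n"
        "ET"
    )


print(invisible_text_op("Hello (OCR)", 72, 700))
```

By contrast, text drawn in the default mode 0 after setting a red fill colour (`1 0 0 rg`) would be painted in red, which would look like the reported symptom; that is only a guess and is not confirmed by the logs.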

Steps to Reproduce

  1. Run OCR on a PDF
  2. Download the resulting file
  3. Open it with any document viewer

Console Logs

Firefox:
Content-Security-Policy: The page’s settings blocked an inline script (script-src-elem) from being executed because it violates the following directive: “script-src 'self' 'wasm-unsafe-eval' 'unsafe-eval' blob: https://cdn.jsdelivr.net”. Consider using a hash ('sha256-nmXakBUiDCy5V6DgDXoh2VSazLpPcbJE9AEDw4sx1F0=') or a nonce. content.js:74:196
Content-Security-Policy: The page’s settings blocked an inline script (script-src-elem) from being executed because it violates the following directive: “script-src 'self' 'wasm-unsafe-eval' 'unsafe-eval' blob: https://cdn.jsdelivr.net”. Consider using a hash ('sha256-ZswfTY7H35rbv8WC7NXBoiC7WNu86vSzCDChNWwZZDM=') or a nonce. utils.js:41:10
🌐 i18next is made possible by our own product, Locize — consider powering your project with managed localization (AI, CDN, integrations): https://locize.com 💙 main-BeeXCYEk.js:27:72920
Please share our tool and share the love! main-BeeXCYEk.js:57:10373
WARNING: /input.pdf: reported number of objects (9) is not one plus the highest object number (9) main-BeeXCYEk.js:3:397
this.program: operation succeeded with warnings; resulting file may have some problems main-BeeXCYEk.js:3:397
Font loading failed, falling back to Helvetica TypeError: NetworkError when attempting to fetch resource.
    MS http://[ip]/assets/ocr-BiOVQOnV.js:47
ocr-BiOVQOnV.js:47:15192
Content-Security-Policy: The page’s settings blocked the loading of a resource (connect-src) at https://raw.githack.com/googlefonts/noto-fonts/main/hinted/ttf/NotoSans/NotoSans-Regular.ttf because it violates the following directive: “connect-src 'self' blob: https://api.github.com https://fonts.gstatic.com https://cdn.jsdelivr.net https://bentopdf-cors-proxy.bentopdf.workers.dev” ocr-BiOVQOnV.js:47:9624
Estimating resolution as 199 tesseract-core-relaxedsimd-lstm.wasm.js:18:307
Error in boxClipToRectangle: box outside rectangle tesseract-core-relaxedsimd-lstm.wasm.js:18:307
Error in pixScanForForeground: invalid box tesseract-core-relaxedsimd-lstm.wasm.js:18:307
Estimating resolution as 247 tesseract-core-relaxedsimd-lstm.wasm.js:18:307
Font loading failed, falling back to Helvetica TypeError: NetworkError when attempting to fetch resource.
    MS http://[ip]/assets/ocr-BiOVQOnV.js:47
ocr-BiOVQOnV.js:47:15192
    tC http://[ip]/assets/ocr-BiOVQOnV.js:47
Content-Security-Policy: The page’s settings blocked the loading of a resource (connect-src) at https://raw.githack.com/googlefonts/noto-fonts/main/hinted/ttf/NotoSans/NotoSans-Regular.ttf because it violates the following directive: “connect-src 'self' blob: https://api.github.com https://fonts.gstatic.com https://cdn.jsdelivr.net https://bentopdf-cors-proxy.bentopdf.workers.dev” ocr-BiOVQOnV.js:47:9624
Estimating resolution as 330

Brave:
The Cross-Origin-Opener-Policy header has been ignored, because the URL's origin was untrustworthy. It was defined either in the final response or a redirect. Please deliver the response using the HTTPS protocol. You can also use the 'localhost' origin instead. See https://www.w3.org/TR/powerful-features/#potentially-trustworthy-origin and https://html.spec.whatwg.org/#the-cross-origin-opener-policy-header.
ocr-BiOVQOnV.js:47 Connecting to 'https://raw.githack.com/googlefonts/noto-fonts/main/hinted/ttf/NotoSans/NotoSans-Regular.ttf' violates the following Content Security Policy directive: "connect-src 'self' blob: https://api.github.com https://fonts.gstatic.com https://cdn.jsdelivr.net https://bentopdf-cors-proxy.bentopdf.workers.dev". The action has been blocked.
MS @ ocr-BiOVQOnV.js:47
ocr-BiOVQOnV.js:47 Fetch API cannot load https://raw.githack.com/googlefonts/noto-fonts/main/hinted/ttf/NotoSans/NotoSans-Regular.ttf. Refused to connect because it violates the document's Content Security Policy.
MS @ ocr-BiOVQOnV.js:47
ocr-BiOVQOnV.js:47 Font loading failed, falling back to Helvetica TypeError: Failed to fetch. Refused to connect because it violates the document's Content Security Policy.
    at MS (ocr-BiOVQOnV.js:47:9624)
    at async tC (ocr-BiOVQOnV.js:47:15123)
    at async HTMLButtonElement.v (ocr-pdf-Ch8jftAE.js:2:1932)
tC @ ocr-BiOVQOnV.js:47
 Estimating resolution as 318
xg @ cdn.jsdelivr.net/npm/tesseract.js-core@v7.0.0/tesseract-core-relaxedsimd-lstm.wasm.js:18
ocr-pdf:1 The file at 'blob:http://[ip]/4a7033bb-5c19-411d-9db3-243368857148' was loaded over an insecure connection. This file should be served over HTTPS.
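The font errors in both logs come from the page's Content Security Policy: the OCR tool tries to fetch NotoSans-Regular.ttf from `https://raw.githack.com`, which is not in the `connect-src` list. The Python sketch below reproduces that check in simplified form. It is not the browser's algorithm (real CSP matching also handles wildcards, paths, and scheme upgrades), and the page origin `http://192.0.2.10` is a hypothetical stand-in for the redacted `[ip]` address.

```python
from urllib.parse import urlsplit

# The connect-src directive copied from the console logs above.
CONNECT_SRC = (
    "connect-src 'self' blob: https://api.github.com https://fonts.gstatic.com "
    "https://cdn.jsdelivr.net https://bentopdf-cors-proxy.bentopdf.workers.dev"
)
# Hypothetical stand-in for the redacted http://[ip] origin in the logs.
PAGE_ORIGIN = "http://192.0.2.10"
FONT_URL = (
    "https://raw.githack.com/googlefonts/noto-fonts/main/hinted/ttf/"
    "NotoSans/NotoSans-Regular.ttf"
)


def parse_directive(policy: str, name: str) -> list:
    """Return the source list of directive `name` from a CSP policy string."""
    for part in policy.split(";"):
        tokens = part.split()
        if tokens and tokens[0].lower() == name:
            return tokens[1:]
    return []


def allowed_by_connect_src(url: str, sources: list, page_origin: str) -> bool:
    """Simplified CSP source matching: 'self', scheme-sources such as blob:,
    and host-sources written as scheme://host (exact host match only)."""
    target = urlsplit(url)
    origin = f"{target.scheme}://{target.netloc}"
    for src in sources:
        if src == "'self'":
            # 'self' matches only the page's own origin in this sketch.
            if origin == page_origin:
                return True
        elif src.endswith(":") and "/" not in src:
            # Scheme-source: matches any URL with that scheme.
            if f"{target.scheme}:" == src:
                return True
        else:
            # Host-source: require the same scheme and the exact same host.
            allowed = urlsplit(src)
            if (allowed.scheme, allowed.netloc) == (target.scheme, target.netloc):
                return True
    return False


sources = parse_directive(CONNECT_SRC, "connect-src")
print(allowed_by_connect_src(FONT_URL, sources, PAGE_ORIGIN))  # False
```

Under this simplified model `raw.githack.com` matches none of the listed sources, which is consistent with the "Refused to connect" errors and the fallback to Helvetica. Allowing that origin in `connect-src`, or loading the font from an origin that is already listed, would presumably remove the font error; these logs alone do not show whether the fallback is what makes the text layer visible and red.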

Sample PDF or File

pdf-test.pdf
pdf-test_after OCR.pdf

Browser

Firefox

Browser Version

151.0.1

Operating System

Linux

BentoPDF Version

2.8.5

Additional Context

No response

Pre-submission Checklist

  • I have included console logs from the browser DevTools
  • I have attached a sample file or described how to reproduce the issue
  • I have searched existing issues to ensure this is not a duplicate

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working), next-release (This bug/feature is fixed/done. To be shipped in next release)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

    Issue actions