### Before you start - checklist
- I understand that React-PDF does not aim to be a fully-fledged PDF viewer and is only a tool to make one
- I have checked if this feature request is not already reported
### Description
**🐞 No reliable signal for TextLayer completion / stability**
#### Summary
react-pdf does not expose any reliable signal to determine when the TextLayer for all rendered pages is complete and stable.
This makes it extremely difficult to safely run downstream DOM-dependent logic (e.g. text-based highlighting, annotations, selection overlays) without relying on fragile heuristics.
#### What I am trying to do
I am building a PDF viewer on top of react-pdf that:
- Relies on the TextLayer DOM (`span` elements)
- Needs to run post-processing logic exactly once, after all text spans are present
- Must support:
  - Multi-page PDFs
  - Dynamic scale changes
  - Both text-based PDFs and OCR-generated artificial text layers
Example use cases:
- Auto-highlighting extracted values
- Text anchoring / annotation systems
- Selection overlays synced with text
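To make the dependency concrete, the kind of post-processing listed above can be illustrated with a small, framework-agnostic sketch. `findSpanRange` and `SpanRange` are hypothetical names, not react-pdf APIs. The sketch assumes the span texts have already been read in DOM order (for example from `textContent` on each `span` in a page's text layer) and deliberately ignores line breaks and whitespace normalization. It shows why completeness matters: if a span has not been inserted yet, the match simply fails.

```typescript
// Hypothetical helper (not part of react-pdf): find which TextLayer spans
// cover a target string, given the span texts in DOM order.
interface SpanRange {
  start: number; // index of the first span containing part of the match
  end: number; // index of the last span containing part of the match (inclusive)
}

function findSpanRange(spanTexts: string[], needle: string): SpanRange | null {
  if (needle.length === 0) return null;

  // Simplification: spans are joined with no separator, so a value split
  // across spans (e.g. "1,23" + "4.56") still matches.
  const full = spanTexts.join("");
  const at = full.indexOf(needle);
  if (at === -1) return null; // not found, e.g. because spans are still missing

  const last = at + needle.length - 1; // offset of the match's last character
  let offset = 0;
  let start = -1;
  for (let i = 0; i < spanTexts.length; i++) {
    const next = offset + (spanTexts[i] ?? "").length;
    if (start === -1 && at < next) start = i;
    if (last < next) return { start, end: i };
    offset = next;
  }
  return null; // unreachable when indexOf succeeded
}
```

With spans `["Invoice total: ", "1,23", "4.56 EUR"]`, searching for `"1,234.56"` yields spans 1 through 2; with only the first two spans present, it returns `null`.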
#### The core problem
There is no explicit signal indicating that:
“All text spans for the rendered pages have been inserted and no further DOM mutations will occur.”
Available callbacks/events today:
- `onLoadSuccess` → PDF bytes loaded (not rendering)
- `onRenderSuccess` → Page canvas rendered (not text)
- `onRenderTextLayerSuccess` → Fires per page, but:
  - Does not guarantee all spans are inserted
  - Does not indicate global completion across pages
  - Late DOM mutations (fonts, layout, async inserts) may still occur
As a result, consumers are forced to infer TextLayer readiness, typically using heuristics.
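Part of the gap can be narrowed in userland by aggregating the per-page `onRenderTextLayerSuccess` callbacks into one cross-page signal. The sketch below uses a hypothetical `TextLayerReadinessTracker` class (not part of react-pdf) plus a generation counter, so that callbacks from a render pass superseded by a scale change are ignored. It only knows what `onRenderTextLayerSuccess` reports, so it does not address the concern above that spans or late DOM mutations may still arrive after that callback.

```typescript
// Hypothetical helper (not part of react-pdf): combines per-page
// onRenderTextLayerSuccess calls into a single "all pages reported" signal.
class TextLayerReadinessTracker {
  private readonly onAllRendered: (generation: number) => void;
  private expected = new Set<number>();
  private rendered = new Set<number>();
  private generation = 0;
  private fired = false;

  constructor(onAllRendered: (generation: number) => void) {
    this.onAllRendered = onAllRendered;
  }

  // Start a new render pass (initial load, scale change, new page range).
  // Returns a generation id that each page passes back to markRendered.
  reset(pageNumbers: number[]): number {
    this.generation += 1;
    this.expected = new Set<number>(pageNumbers);
    this.rendered = new Set<number>();
    this.fired = false;
    return this.generation;
  }

  // Call from a page's onRenderTextLayerSuccess. Reports from an older
  // generation (e.g. a render that finished after a scale change) are ignored.
  markRendered(pageNumber: number, generation: number): void {
    if (generation !== this.generation || !this.expected.has(pageNumber)) return;
    this.rendered.add(pageNumber);
    if (!this.fired && this.rendered.size === this.expected.size) {
      this.fired = true; // fire at most once per generation
      this.onAllRendered(this.generation);
    }
  }
}
```

In a viewer, `reset` would be called whenever the rendered page list or the scale changes, and each `<Page>` would receive something like `onRenderTextLayerSuccess={() => tracker.markRendered(pageNumber, generation)}`, with `generation` captured from the most recent `reset` call.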
#### Current workaround (fragile)
The only viable workaround today is something like:
- Observe the container DOM using `MutationObserver`
- Watch for TextLayer span insertions
- Debounce mutations (e.g. “300ms without changes”)
- Assume the TextLayer is “ready”
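The workaround steps above can be sketched roughly as follows. `createQuietPeriodDetector` is a hypothetical helper, and the timer functions are injectable only so the logic can be exercised without real timers. It implements exactly the timing-based heuristic being criticized here, so it inherits every problem listed below.

```typescript
// Hypothetical helper (not part of react-pdf): fires onQuiet once no
// notify() call has arrived for quietMs. This is the timing heuristic
// described above, so it can still fire too early or too late.
type Schedule = (fn: () => void, ms: number) => unknown;
type Cancel = (handle: unknown) => void;

function createQuietPeriodDetector(
  quietMs: number,
  onQuiet: () => void,
  // Injectable timer functions; defaults use the global timers.
  schedule: Schedule = (fn, ms) => setTimeout(fn, ms),
  cancel: Cancel = (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
) {
  let handle: unknown;
  let pending = false;

  return {
    // Call on every observed mutation; restarts the quiet-period timer.
    notify(): void {
      if (pending) cancel(handle);
      pending = true;
      handle = schedule(() => {
        pending = false;
        onQuiet();
      }, quietMs);
    },
    // Stop waiting, e.g. when the viewer unmounts.
    dispose(): void {
      if (pending) cancel(handle);
      pending = false;
    },
  };
}
```

Wiring it to the DOM would look roughly like `const observer = new MutationObserver(() => detector.notify())` followed by `observer.observe(container, { childList: true, subtree: true })`, where `container` is the viewer's root element. Calling `detector.notify()` once right after `observe` makes the callback fire even if the text layer was already complete before observation started.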
This approach:
- Is timing-based, not state-based
- Can fire too early or too late
- Breaks under scale changes, large PDFs, or delayed font/layout recalculations
- Becomes brittle in real-world usage
This is not a bug in react-pdf — it is a missing lifecycle signal.
#### Why this matters
Many advanced features depend on knowing when text is stable:
- Auto-highlighting / value matching
- Anchored annotations
- Text-driven overlays
- OCR text synchronization
- Accurate scroll-to-text behavior
Without a reliable completion signal:
- Downstream logic becomes one-shot and irreversible
- Consumers must build fragile `MutationObserver` logic
- Bugs appear nondeterministic and hard to reproduce
#### Minimal reproduction
Repository demonstrating the issue:
👉 https://github.com/rano667/pdf-text-extraction
The example shows that:
- TextLayer spans are inserted incrementally
- There is no reliable point at which all spans can be considered “final”
- Any DOM-dependent post-processing must rely on heuristics
#### What would help (suggestions, not demands)
Any of the following would significantly improve reliability:
- A global TextLayer completion signal, for example:
  - `onTextLayerRenderComplete({ pageNumber })`
  - `onAllTextLayersRendered()`
- A documented contract clarifying:
  - When `onRenderTextLayerSuccess` fires relative to span insertion
  - Whether late DOM mutations are expected
- An internal lifecycle hook exposed to consumers indicating:
  - “All pages’ text layers have finished rendering”
Even a best-effort signal would be better than none.
#### Environment
```json
{
  "react": "^19.2.0",
  "react-pdf": "^10.2.0"
}
```
### Proposed solution
_No response_
### Alternatives
_No response_
### Additional information
_No response_