Progressive Loading

Progressive loading is a feature that allows PDF documents to be loaded page-by-page instead of loading all pages at once. This is particularly useful for large PDF files, as it significantly reduces initial load time and memory usage.

Overview

When you open a PDF document, pdfrx can operate in two modes:

Standard Loading (default): All pages are loaded immediately when the document is opened
Progressive Loading: Only the first page is loaded initially, and additional pages are loaded on-demand

Progressive loading is especially beneficial when:

Working with large PDF files (hundreds of pages)
Memory is constrained
You want faster initial document load times
Users typically don't view all pages in a session

Note: PdfViewer uses progressive loading by default, while PdfDocument requires explicit opt-in.

Understanding Page Loading States

When progressive loading is enabled, pages can be in one of two states:

Loaded Pages

Pages that are fully loaded have complete information:

Accurate width, height, and rotation values
Can be rendered properly
Text extraction works (loadText(), loadStructuredText())
Link extraction works (loadLinks())
isLoaded property returns true

Unloaded Pages (Progressive Loading Only)

Pages that haven't been loaded yet have limited functionality:

width, height, and rotation are estimated values (may be incorrect)
Rendering produces an empty page with the specified background color
Text extraction returns null
Link extraction returns an empty list
isLoaded property returns false

Enabling Progressive Loading

For PdfViewer

PdfViewer uses progressive loading by default (useProgressiveLoading: true). This means PDF documents are loaded page-by-page automatically as you scroll, providing optimal performance for large files.

All PdfViewer constructors support the useProgressiveLoading parameter:

// Uses progressive loading by default
PdfViewer.file('path/to/document.pdf')

// Explicitly enable progressive loading (same as default)
PdfViewer.asset(
  'assets/large-document.pdf',
  useProgressiveLoading: true,
)

// Disable progressive loading (load all pages at once)
PdfViewer.uri(
  Uri.parse('https://example.com/document.pdf'),
  useProgressiveLoading: false,
)

When using PdfViewer, progressive loading happens automatically in the background as you scroll through the document. You don't need to manually call loadPagesProgressively().

For PdfDocument (Engine-Level API)

When using PdfDocument directly (without PdfViewer), progressive loading is disabled by default (useProgressiveLoading: false). You need to explicitly enable it:

import 'package:pdfrx_engine/pdfrx_engine.dart';

// Open a document with progressive loading enabled
final document = await PdfDocument.openFile(
  'path/to/document.pdf',
  useProgressiveLoading: true,
);

// At this point, only the first page is loaded
print('First page loaded: ${document.pages[0].isLoaded}'); // true
print('Second page loaded: ${document.pages[1].isLoaded}'); // false

Working with Progressively Loaded Documents

Checking Page Load Status

Use the isLoaded property to check if a page is fully loaded:

final page = document.pages[5];

if (page.isLoaded) {
  // Page is fully loaded - all operations work normally
  final text = await page.loadText();
  print(text?.text);
} else {
  // Page is not loaded yet - dimensions may be estimates
  print('Page not loaded yet');
}

Waiting for a Page to Load

Use the ensureLoaded() extension method to wait for a specific page to load. This method waits indefinitely and always returns a loaded page:

final page = document.pages[10];

// Wait for the page to load (waits indefinitely, never returns null)
final loadedPage = await page.ensureLoaded();
final text = await loadedPage.loadText();
print(text?.text);

If you need to set a timeout, use waitForLoaded() instead. This method returns null if the timeout occurs:

final page = document.pages[10];

// Wait for the page to load with a timeout
final loadedPage = await page.waitForLoaded(
  timeout: Duration(seconds: 5),
);

if (loadedPage != null) {
  // Page loaded successfully
  final text = await loadedPage.loadText();
  print(text?.text);
} else {
  // Timeout occurred
  print('Page failed to load within timeout');
}

Important: The ensureLoaded() method may return a different instance of PdfPage than the original. Always use the returned instance:

// ❌ WRONG - using the old page instance
final page = document.pages[10];
await page.ensureLoaded();
final text = await page.loadText(); // May not work as expected

// ✅ CORRECT - using the returned loaded page instance
final page = document.pages[10];
final loadedPage = await page.ensureLoaded();
final text = await loadedPage.loadText(); // Works correctly

Monitoring Page Load Events

You can listen to page status changes using the events stream. The event provides the latest page instance directly via the page property:

final page = document.pages[5];

// Listen for status changes on this specific page
page.events.listen((change) {
  // The change.page property provides the newest page instance
  final updatedPage = change.page;
  print('Page ${updatedPage.pageNumber} status changed');
  print('Is loaded: ${updatedPage.isLoaded}');

  if (updatedPage.isLoaded) {
    print('Page dimensions: ${updatedPage.width} x ${updatedPage.height}');
  }
});

Getting Latest Page Instance

The latestPageStream provides the most recent page instance whenever the page status changes:

final page = document.pages[10];

page.latestPageStream.listen((latestPage) {
  print('Page updated, isLoaded: ${latestPage.isLoaded}');
  if (latestPage.isLoaded) {
    // Use the latest loaded instance
  }
});

Manually Triggering Progressive Loading

When using PdfDocument directly (not PdfViewer), you need to manually trigger progressive loading using loadPagesProgressively():

final document = await PdfDocument.openFile(
  'path/to/document.pdf',
  useProgressiveLoading: true,
);

// Load pages progressively with progress callback
await document.loadPagesProgressively(
  onPageLoadProgress: (data, loadedPageCount, totalPageCount) {
    print('Loaded $loadedPageCount of $totalPageCount pages');

    // Return true to continue loading, false to stop
    return true;
  },
  loadUnitDuration: Duration(milliseconds: 250),
);

The callback is invoked periodically (every loadUnitDuration) as pages are loaded. Return false from the callback to stop the loading process early.

Common Pitfalls and Solutions

Issue: Text Extraction Returns Null

Problem: Trying to extract text from an unloaded page returns null.

final page = document.pages[50];
final text = await page.loadText(); // Returns null if page not loaded

Solution: Always wait for the page to load first:

final page = document.pages[50];
final loadedPage = await page.ensureLoaded();
final text = await loadedPage.loadText();
print(text?.text);

Issue: Page Dimensions Are Incorrect

Problem: Using width, height, or rotation values from unloaded pages gives estimated values that may be wrong.

final page = document.pages[20];
print('Width: ${page.width}'); // May be an estimate if page is not loaded

Solution: Check isLoaded or use ensureLoaded():

final page = document.pages[20];
final loadedPage = await page.ensureLoaded();
print('Actual width: ${loadedPage.width}');
print('Actual height: ${loadedPage.height}');

Issue: Processing All Pages Fails Silently

Problem: Iterating through all pages without waiting for them to load:

// ❌ WRONG - pages may not be loaded yet
for (final page in document.pages) {
  final text = await page.loadText(); // May return null
  processText(text?.text); // Silently skips unloaded pages
}

Solution: Ensure pages are loaded before processing:

// ✅ CORRECT - ensure each page is loaded
for (final page in document.pages) {
  final loadedPage = await page.ensureLoaded();
  final text = await loadedPage.loadText();
  processText(text?.text);
}

// Alternative: Load all pages first using loadPagesProgressively()
await document.loadPagesProgressively();
for (final page in document.pages) {
  final text = await page.loadText();
  processText(text?.text);
}

Performance Considerations

When to Use Progressive Loading

Use progressive loading when:

Working with large PDFs (100+ pages)
Initial load time is critical
Users typically view only a few pages
Memory usage is a concern
Loading from network (reduces initial bandwidth)

Avoid progressive loading when:

Working with small PDFs (< 20 pages)
You need to process all pages immediately
All pages will be accessed anyway
Simplicity is preferred over optimization

Memory Management

Progressive loading reduces initial memory usage but doesn't automatically unload pages. Once a page is loaded, it stays in memory until the document is disposed. For very large documents, consider:

Loading and processing pages in batches
Disposing and reopening the document periodically if processing thousands of pages
Using PdfViewer which handles page lifecycle automatically

Example: Processing Pages with Progress Indicator

Here's a complete example showing how to process all pages in a large PDF with a progress indicator:

import 'package:pdfrx_engine/pdfrx_engine.dart';

Future<void> processPdfPages(String filePath) async {
  // Open document with progressive loading
  final document = await PdfDocument.openFile(
    filePath,
    useProgressiveLoading: true,
  );

  try {
    // Load pages progressively with progress reporting
    await document.loadPagesProgressively(
      onPageLoadProgress: (_, loadedCount, totalCount) {
        final progress = (loadedCount / totalCount * 100).toStringAsFixed(1);
        print('Loading pages: $progress% ($loadedCount/$totalCount)');
        return true; // Continue loading
      },
    );

    // Now all pages are loaded, safe to process
    for (int i = 0; i < document.pages.length; i++) {
      final page = document.pages[i];

      // Extract text from page
      final text = await page.loadText();
      print('Page ${i + 1}: ${text?.text.substring(0, 100)}...');

      // Extract links
      final links = await page.loadLinks();
      print('Page ${i + 1} has ${links.length} links');
    }
  } finally {
    await document.dispose();
  }
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Progressive Loading

Overview

Understanding Page Loading States

Loaded Pages

Unloaded Pages (Progressive Loading Only)

Enabling Progressive Loading

For PdfViewer

For PdfDocument (Engine-Level API)

Working with Progressively Loaded Documents

Checking Page Load Status

Waiting for a Page to Load

Monitoring Page Load Events

Getting Latest Page Instance

Manually Triggering Progressive Loading

Common Pitfalls and Solutions

Issue: Text Extraction Returns Null

Issue: Page Dimensions Are Incorrect

Issue: Processing All Pages Fails Silently

Performance Considerations

When to Use Progressive Loading

Memory Management

Example: Processing Pages with Progress Indicator

Related Documentation

Uh oh!

FilesExpand file tree

Progressive-Loading.md

Latest commit

History

Progressive-Loading.md

File metadata and controls

Progressive Loading

Overview

Understanding Page Loading States

Loaded Pages

Unloaded Pages (Progressive Loading Only)

Enabling Progressive Loading

For PdfViewer

For PdfDocument (Engine-Level API)

Working with Progressively Loaded Documents

Checking Page Load Status

Waiting for a Page to Load

Monitoring Page Load Events

Getting Latest Page Instance

Manually Triggering Progressive Loading

Common Pitfalls and Solutions

Issue: Text Extraction Returns Null

Issue: Page Dimensions Are Incorrect

Issue: Processing All Pages Fails Silently

Performance Considerations

When to Use Progressive Loading

Memory Management

Example: Processing Pages with Progress Indicator

Related Documentation