Table of Contents Generator - Internal Link Issue #1603

jstrumbo · 2025-10-07T20:07:26Z

Hi all,

I'm fairly new to Python. A large part of my job involves adding TOCs to PDF documents that don't already have them, so I wrote the following script in Jupyter to do that. It generates the TOC fine, but I want to add internal links. In theory, the easiest way would be to edit my toc.cell() line to include a link for each bookmark. But because of how my code is structured (I use pypdf to extract the outline data, then fpdf to create the TOC, then merge it with the original document using pypdf), I cannot create an internal link to a page that doesn't exist yet. What's the easiest fix here? I'll also take code advice if you have it.

from fpdf import FPDF
from pypdf import PdfReader, PdfWriter
from pypdf.generic import Destination
import io
import math

pdf_path = ".PDF" #INSERT PDF HERE

#Extract bookmarks
def extract_bookmarks(outlines, reader, depth=0):
    bookmarks = []
    for item in outlines:
        if isinstance(item, list):
            bookmarks.extend(extract_bookmarks(item, reader, depth + 1))
        elif isinstance(item, Destination):
            title = item.title
            page_number = reader.get_destination_page_number(item)
            bookmarks.append({
                "depth": depth,
                "title": title,
                "page": page_number
            })
    return bookmarks

reader = PdfReader(pdf_path)
raw_outlines = reader.outline
bookmarks = extract_bookmarks(raw_outlines, reader)

#Sort and analyze bookmarks
bookmarks.sort(key=lambda b: b["page"])
Entries = len(bookmarks)
TOCPGS = math.ceil(Entries/33) #Roughly calculates how many TOC pages are needed for the number of bookmarks

#Create TOC page using fpdf
toc = FPDF()
toc.add_page()
toc.set_font("Helvetica", "B", size=18) #Set title as bold and larger font
toc.set_title("Table of Contents")
toc.cell(0, 10, "Table of Contents", align="C")
toc.ln(10)
toc.set_font("Helvetica", "", 12)  # Set body font to regular

for b in bookmarks[4:]: #IMPORTANT - SLICE ASSUMES 4 BOOKMARKS BEFORE TOC - CHANGE ACCORDINGLY
    indent = "    " * b["depth"]  # 4 spaces per depth level

    title = f"{indent}{b['title']}"
    page_str = str(b["page"] + TOCPGS + 1) #pypdf page numbers are 0-based, so +1 gives the printed page number

    # Get width of the full line and available width
    max_width = toc.w - toc.l_margin - toc.r_margin
    title_width = toc.get_string_width(title)
    page_width = toc.get_string_width(page_str)
    dot_width = toc.get_string_width(".")

    # Calculate number of dots to fill the space
    dots_needed = int((max_width - title_width - page_width) / dot_width)
    dots = "." * max(0, dots_needed)

    line = f"{title}{dots} {page_str}"

    toc.cell(0, 10, line)
    toc.ln(7.5)

#Output the TOC page to Memory output
toc_buffer = io.BytesIO()
toc.output(toc_buffer)
toc_buffer.seek(0)

#Merge TOC with original PDF ---
toc_reader = PdfReader(toc_buffer) #Read the memory output
writer = PdfWriter()
for toc_page in toc_reader.pages:
    writer.add_page(toc_page) #add all the toc pages to the writer

writer.append(pdf_path) #INSERT PDF HERE
with open("MergedFinal.pdf", "wb") as out_file:
    writer.write(out_file)
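
The page arithmetic in the script can be isolated into two small helpers. This is a minimal sketch, not part of the original post: `toc_page_count` and `printed_page` are hypothetical names, and the 33-entries-per-page figure is the rough estimate taken from the script above. It relies on pypdf page indices being 0-based, which is where the `+ 1` comes from, and it assumes the TOC pages are placed in front of the original document.

```python
import math


def toc_page_count(entry_count: int, entries_per_page: int = 33) -> int:
    """Estimate how many TOC pages are needed (33 per page is the script's rough guess)."""
    return math.ceil(entry_count / entries_per_page)


def printed_page(page_index: int, toc_pages: int) -> int:
    """Convert a 0-based page index in the original PDF into the 1-based
    page number a reader sees once the TOC pages are inserted in front."""
    return page_index + toc_pages + 1


if __name__ == "__main__":
    pages = toc_page_count(40)     # 40 entries -> 2 TOC pages
    print(printed_page(0, pages))  # first original page is shown as page 3
```

Keeping the two conversions in named functions makes it easier to see which numbers are 0-based indices and which are the page numbers printed in the TOC.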

Answered by andersonhc

Oct 10, 2025


Lucas-C · 2025-10-09T16:13:35Z

Maintainer

Hi @jstrumbo

Welcome to the fpdf2 users community 🙂

Your code snippet is a bit long. We usually prefer minimal code snippets, as they help us a lot in figuring out the underlying problem:
https://stackoverflow.com/help/minimal-reproducible-example

> I'm running into an issue where I cannot create an internal link to a page that doesn't exist yet.

What exactly is your issue? Is an exception raised? If so, what is the exact error message?
Could you provide the shortest code snippet that reproduces your issue?

And have you considered using Named Destinations?
cf. https://py-pdf.github.io/fpdf2/NamedDestinations.html

0 replies

andersonhc · 2025-10-10T02:47:52Z

Maintainer

@jstrumbo
Here is an example that does what you want using named destinations. In the TOC I pinned every destination to page 1, and after merging I remapped the named destinations with pypdf:

for b in bookmarks: 
    indent = "    " * b["depth"]
    title = f"{indent}{b['title']}"
    page_str = str(b["page"])
    max_width = toc.epw
    title_width = toc.get_string_width(title)
    page_width = toc.get_string_width(page_str)
    dot_width = toc.get_string_width(".")
    dots_needed = int((max_width - title_width - page_width) / dot_width)
    dots = "." * max(0, dots_needed)
    line = f"{title}{dots} {page_str}"

    # Here I resolve the named destination to page 1
    # so FPDF2 can produce the TOC. Will change the named destinations
    # on pypdf later
    page_link = toc.add_link(page=1, name=f"dest_page_{page_str}")
    # write the cell linking to the named destination
    toc.cell(w=0, h=10, text=line, link=f"#dest_page_{page_str}")
    toc.ln(7.5)

toc_buffer = io.BytesIO()
toc.output(toc_buffer)
toc_buffer.seek(0)

#Merge TOC with original PDF ---
toc_reader = PdfReader(toc_buffer)
writer = PdfWriter()
for toc_page in toc_reader.pages:
    writer.add_page(toc_page) 

writer.append(pdf_path) 

# Now the PDF is merged we'll manipulate the named destinations
for b in bookmarks:
    page_str = str(b["page"])
    writer.add_named_destination(f"dest_page_{page_str}", page_number=int(page_str))

# HERE is not defined in this snippet; it is assumed to be a pathlib.Path for the output folder
with open(HERE / "MergedFinal.pdf", "wb") as out_file:
    writer.write(out_file)

Named destinations are a feature that has not been released yet. If you plan to use them, you'll need to install fpdf2 from the GitHub repository.
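
The remapping step comes down to a table from destination name to page index in the merged file. The following is a sketch under assumptions: `destination_map` is a hypothetical helper, and it assumes the TOC pages are prepended, so every original 0-based page index shifts by the number of TOC pages. The snippet above uses the unshifted `b["page"]`, so the offset here is an adjustment, not part of the original answer.

```python
def destination_map(bookmarks, toc_pages):
    """Return {destination name: 0-based page index in the merged PDF}."""
    mapping = {}
    for b in bookmarks:
        merged_index = b["page"] + toc_pages    # shift past the prepended TOC pages
        name = f"dest_page_{merged_index + 1}"  # name uses the 1-based printed page
        mapping[name] = merged_index
    return mapping


if __name__ == "__main__":
    sample = [
        {"depth": 0, "title": "Intro", "page": 0},
        {"depth": 1, "title": "Scope", "page": 4},
    ]
    for name, index in destination_map(sample, toc_pages=1).items():
        # With pypdf this would be: writer.add_named_destination(name, page_number=index)
        print(name, index)
```

Bookmarks that land on the same page share one destination name, so they collapse into a single entry in the mapping.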

4 replies

jstrumbo Nov 12, 2025
Author

With some minor edits, I was able to get this to work!

On another, less important note: these destinations seem to be 'fit to width'. I would prefer 'fit to height' or 'fit page'. Is this changeable?

andersonhc Nov 13, 2025
Maintainer

I tried to change the zoom mode with fpdf.set_display_mode(), but when we modify the destination page with PdfWriter.add_named_destination() in pypdf, it overwrites the zoom mode with FitH (full width).

You can see this behavior in the pypdf source code here:
https://github.com/py-pdf/pypdf/blob/85b53d8eb014d1c6363a71401cebfadd9d7300b0/pypdf/_writer.py#L1873

It might be worth opening an issue on the pypdf side to see if they’d consider adding a parameter to control the zoom level (fit type) when creating named destinations.

As a quick proof of concept, I tested injecting a custom method that creates named destinations using /Fit (full page) instead of /FitH:

#Merge TOC with original PDF ---
toc_reader = PdfReader(toc_buffer)
writer = PdfWriter()
for toc_page in toc_reader.pages:
    writer.add_page(toc_page) 

writer.append(pdf_path) 

# Custom function to add named destinations with a full-page (/Fit) view
import pypdf  # needed for the pypdf.constants / pypdf.generic references below

def add_named_destination_full_page(self, title: str, page_number: int):
    # Look up the page object by its 0-based index in the page tree
    page_ref = self.get_object(self._pages)[pypdf.constants.PagesAttributes.KIDS][page_number]
    dest = pypdf.generic.DictionaryObject()
    dest.update(
        {
            # /Fit takes no extra arguments, unlike /FitH, which takes a top coordinate
            pypdf.generic.NameObject(pypdf.constants.GoToActionArguments.D): pypdf.generic.ArrayObject(
                [page_ref, pypdf.generic.NameObject(pypdf.constants.TypFitArguments.FIT)]
            ),
            pypdf.generic.NameObject(pypdf.constants.GoToActionArguments.S): pypdf.generic.NameObject("/GoTo"),
        }
    )

    dest_ref = self._add_object(dest)
    if not isinstance(title, pypdf.generic.TextStringObject):
        title = pypdf.generic.TextStringObject(str(title))

    self.add_named_destination_array(title, dest_ref)
    return dest_ref

# Attach the function to PdfWriter so it can be called like a regular method
PdfWriter.add_named_destination_full_page = add_named_destination_full_page

# Now the PDF is merged we'll manipulate the named destinations
for b in bookmarks:
    page_str = str(b["page"])
    writer.add_named_destination_full_page(f"dest_page_{page_str}", page_number=int(page_str))

# HERE is not defined in this snippet; it is assumed to be a pathlib.Path for the output folder
with open(HERE / "MergedFinal.pdf", "wb") as out_file:
    writer.write(out_file)
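
The difference between the two zoom modes is visible in the destination array itself. As a rough illustration using plain Python lists rather than pypdf objects (the helper names are hypothetical): a `/FitH` destination carries a top coordinate, while a `/Fit` destination takes no extra arguments.

```python
def fit_h_destination(page_ref, top):
    """Fit the page width to the window, with `top` at the top edge."""
    return [page_ref, "/FitH", top]


def fit_destination(page_ref):
    """Fit the whole page in the window; no coordinates are needed."""
    return [page_ref, "/Fit"]


if __name__ == "__main__":
    print(fit_h_destination("page 5", 826))
    print(fit_destination("page 5"))
```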

jstrumbo Nov 13, 2025
Author

Wow! Thank you for that. This is quite impressive. I will look into working this into my code.

A bit of a bummer though, I finally thought I had my script running yesterday. It generated several very large (30+ page) tables of contents perfectly. I thought it was stable, but it will not work this morning. Could you try running my code on your end and see if the problem is on my end? From what I can tell nothing has changed with FPDF since then.

from fpdf import FPDF
from pypdf import PdfReader, PdfWriter
from pypdf.generic import Destination
import io
import math

pdf_path = ".PDF"  # INSERT PDF HERE
cutoff = 4 # HOW MANY BOOKMARKS BEFORE TOC ENTRY

# Extract bookmarks
def extract_bookmarks(outlines, reader, depth=0):
    bookmarks = []
    for item in outlines:
        if isinstance(item, list):
            bookmarks.extend(extract_bookmarks(item, reader, depth + 1))
            
        elif isinstance(item, Destination):
            title = item.title
            page_number = reader.get_destination_page_number(item)
            bookmarks.append({
                "depth": depth,
                "title": title,
                "page": page_number
            })
    return bookmarks

reader = PdfReader(pdf_path)
raw_outlines = reader.outline
bookmarks = extract_bookmarks(raw_outlines, reader)
cutbookmarks = bookmarks[cutoff:]

# Sort and analyze bookmarks
cutbookmarks.sort(key=lambda b: b["page"])
Entries = len(cutbookmarks)
TOCPGS = math.ceil(Entries / 33)  # Roughly calculates how many TOC pages are necessary

# Create TOC page using fpdf
toc = FPDF()
toc.add_page()
toc.set_font("Helvetica", "B", size=18)  # Set title as bold and larger font
toc.set_title("Table of Contents")
toc.cell(0, 10, "Table of Contents", align="C")
toc.ln(10)

for b in cutbookmarks:
    indent = "    " * b["depth"]
    title = f"{indent}{b['title']}"
    page_str = str(b["page"] + TOCPGS +1) # pypdf page numbers are 0-based, so +1 gives the printed page number

    # Set font style: Bold for depth 0, regular for all others
    if b["depth"] == 0:
        toc.set_font("Helvetica", "B", 12)
    else:
        toc.set_font("Helvetica", "", 12)

    # Get width of the full line and available width
    max_width = toc.w - toc.l_margin - toc.r_margin
    title_width = toc.get_string_width(title)
    page_width = toc.get_string_width(page_str)
    dot_width = toc.get_string_width(".")

    # Calculate number of dots to fill the space
    dots_needed = int((max_width - title_width - page_width) / dot_width)
    dots = "." * max(0, dots_needed)
    line = f"{title}{dots} {page_str}"
    
    #Print Entry
    page_link = toc.add_link(name=f"dest_page_{page_str}")
    toc.cell(0, 10, line, link=f"#dest_page_{page_str}")
    toc.ln(7.5)
    
# Output the TOC page to memory
toc_buffer = io.BytesIO()
toc.output(toc_buffer)
toc_buffer.seek(0)

# Merge TOC with original PDF
toc_reader = PdfReader(toc_buffer)
writer = PdfWriter()
for toc_page in toc_reader.pages:
    writer.add_page(toc_page)

writer.append(pdf_path)


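for b in cutbookmarks:
    page_index = int(b["page"] + TOCPGS)  # 0-based page index in the merged PDF
    print(page_index + 1)
    # Destination names use the 1-based printed page number; page_number is 0-based
    writer.add_named_destination(f"dest_page_{page_index + 1}", page_number=page_index)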


with open("MergedFinal.pdf", "wb") as out_file:
    writer.write(out_file)

print("PDF GENERATED")
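
The `Entries / 33` figure is described in the script as a rough estimate, and the printed page numbers depend on it. One way to avoid relying on the estimate (a sketch only, not a diagnosis of the failure described above) is to render the TOC, count the pages it actually produced, and render again if the count differs from the guess. `render_toc` is a hypothetical callable that builds the TOC for a given page-count guess and returns the number of pages produced, for example `len(PdfReader(toc_buffer).pages)`.

```python
def settle_toc_page_count(render_toc, first_guess, max_passes=5):
    """Re-render until the guessed TOC page count matches the rendered one."""
    guess = first_guess
    for _ in range(max_passes):
        actual = render_toc(guess)
        if actual == guess:
            return actual
        guess = actual  # try again with the count that was actually produced
    raise RuntimeError("TOC page count did not settle")


if __name__ == "__main__":
    # Stand-in renderer: pretend the TOC always comes out at 29 pages.
    print(settle_toc_page_count(lambda guess: 29, first_guess=31))
```

The loop normally ends after one or two passes, because the number of TOC pages only changes the page numbers printed on each line, not the number of lines.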

andersonhc Nov 15, 2025
Maintainer

Here is a class I wrote integrating your code with fpdf2's TableOfContents logic:

https://github.com/andersonhc/fpdf2-snippets/blob/main/create_table_of_contents/create_toc.py

I hope you'll find it useful and it can solve the last problems you're having.

jstrumbo · 2025-10-27T22:26:31Z

Author

Sorry for the delay. Here is my minimal reproducible example; hopefully it makes sense:

# Insert PDF to analyze

# Create extract bookmarks function to form a dictionary that has the following for each bookmark: bookmarks[(Parent, Title, Page #)] using pypdf

# Create TOC page using fpdf

#Use a for loop to create table of contents entries

for b in bookmarks:
    
    link_id = toc.add_link()
    toc.set_link(link_id, page=b["page #"])

    #Generate Entry and Repeat
    toc.cell(0, 10, text, link=link_id)
    toc.ln(7.5)

# Merge table of contents page with original PDF and save output

The error I get is this on pdf.output():

ValueError: Invalid reference to non-existing page (#corresponding to my first bookmark) present on page 1

So it sounds like fpdf is not able to create a link to a page that doesn't exist yet, since it can only see the one page it has generated.
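
The error is consistent with a check at output time that every link points at a page the document actually contains. As a toy model of that idea (plain Python, not fpdf2's actual code; `dangling_links` is a hypothetical name):

```python
def dangling_links(link_targets, page_count):
    """Return the 1-based target pages that do not exist in the document."""
    return [page for page in link_targets if not 1 <= page <= page_count]


if __name__ == "__main__":
    # A one-page TOC whose entries point at pages 5 and 12 of the future merged file.
    print(dangling_links([1, 5, 12], page_count=1))
```

Any target beyond the pages the TOC document itself contains is reported, which matches the situation where the TOC is built before it is merged with the original PDF.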

5 replies

andersonhc Oct 28, 2025
Maintainer

Did you try my suggestion above using named destinations?
In fpdf2, I make all the named destinations point to page 1. Then, after merging the files, pypdf's PdfWriter.add_named_destination() can remap the destinations to the correct pages.

jstrumbo Oct 28, 2025
Author

This looks very clever and interesting, but I cannot figure out where to find unreleased versions. Will this require a knowledge of git and github? I am trying to just use pip here because this is a tool coworkers will use.

Lucas-C Oct 29, 2025
Maintainer

> This looks very clever and interesting, but I cannot figure out where to find unreleased versions. Will this require a knowledge of git and github? I am trying to just use pip here because this is a tool coworkers will use.

No need to have much technical knowledge, we have instructions on how to install the latest, unreleased fpdf2 code there: https://py-pdf.github.io/fpdf2/#installation 🙂

Lucas-C Oct 29, 2025
Maintainer

By the way, new version 2.8.5 has just been published, including named destinations:
https://github.com/py-pdf/fpdf2/releases/tag/2.8.5
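
For a tool shared with coworkers, a small guard can confirm that the installed fpdf2 is recent enough before the script runs. This is a sketch under assumptions: `has_named_destinations` is a hypothetical helper, it takes 2.8.5 as the minimum version based on the release note above, and it only handles plain numeric version strings such as `2.8.5`.

```python
from importlib import metadata


def has_named_destinations(version: str, minimum=(2, 8, 5)) -> bool:
    """True if a plain 'X.Y.Z' version string is at least `minimum`."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts >= minimum


if __name__ == "__main__":
    try:
        installed = metadata.version("fpdf2")
    except metadata.PackageNotFoundError:
        print("fpdf2 is not installed")
    else:
        print(installed, has_named_destinations(installed))
```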

jstrumbo Oct 29, 2025
Author

> By the way, new version 2.8.5 has just been published, including named destinations: https://github.com/py-pdf/fpdf2/releases/tag/2.8.5

I was able to download it, and I am trying our friend's solution now. I will update once that is done.
