Table of Contents Generator - Internal Link Issue #1603
Hi all, I'm fairly new to Python. A large part of my job involves adding TOCs to PDF documents that don't already have them, so I wrote the following script in Jupyter to do that. It generates the TOC fine, but I want to add internal links. In theory, the easiest way to do that would be to edit my toc.cell() line to include a link for each bookmark. But because of my code structure (I use pypdf to extract the outline data, then fpdf to create the TOC, then merge it with the original doc using pypdf), I cannot create an internal link to a page that doesn't exist yet. What's the easiest fix here? I'll also take code advice if you have it.

    from fpdf import FPDF
    from pypdf import PdfReader, PdfWriter
    from pypdf.generic import Destination
    import io
    import math

    pdf_path = ".PDF"  # INSERT PDF HERE

    # Extract bookmarks
    def extract_bookmarks(outlines, reader, depth=0):
        bookmarks = []
        for item in outlines:
            if isinstance(item, list):
                # Nested lists hold the children of the preceding outline item
                bookmarks.extend(extract_bookmarks(item, reader, depth + 1))
            elif isinstance(item, Destination):
                title = item.title
                page_number = reader.get_destination_page_number(item)
                if page_number is None:  # destination does not resolve to a page
                    continue
                bookmarks.append({
                    "depth": depth,
                    "title": title,
                    "page": page_number
                })
        return bookmarks

    reader = PdfReader(pdf_path)
    raw_outlines = reader.outline
    bookmarks = extract_bookmarks(raw_outlines, reader)

    # Sort and analyze bookmarks
    bookmarks.sort(key=lambda b: b["page"])
    Entries = len(bookmarks)
    TOCPGS = math.ceil(Entries / 33)  # Roughly calculates how many TOC pages are necessary for the amount of bookmarks

    # Create TOC page using fpdf
    toc = FPDF()
    toc.add_page()
    toc.set_font("Helvetica", "B", size=18)  # Set title as bold and larger font
    toc.set_title("Table of Contents")
    toc.cell(0, 10, "Table of Contents", align="C")
    toc.ln(10)
    toc.set_font("Helvetica", "", 12)  # Set body font to regular

    for b in bookmarks[4:]:  # IMPORTANT - SLICE ASSUMES 4 BOOKMARKS BEFORE TOC - CHANGE ACCORDINGLY
        indent = "    " * b["depth"]  # 4 spaces per depth level
        title = f"{indent}{b['title']}"
        page_str = str(b["page"] + TOCPGS + 1)  # pypdf page numbers are 0-based, so this +1 is necessary
        # Get width of the full line and available width
        max_width = toc.w - toc.l_margin - toc.r_margin
        title_width = toc.get_string_width(title)
        page_width = toc.get_string_width(page_str)
        dot_width = toc.get_string_width(".")
        # Calculate number of dots to fill the space
        dots_needed = int((max_width - title_width - page_width) / dot_width)
        dots = "." * max(0, dots_needed)
        line = f"{title}{dots} {page_str}"
        toc.cell(0, 10, line)
        toc.ln(7.5)

    # Output the TOC page to an in-memory buffer
    toc_buffer = io.BytesIO()
    toc.output(toc_buffer)
    toc_buffer.seek(0)

    # Merge TOC with original PDF
    toc_reader = PdfReader(toc_buffer)  # Read the in-memory output
    writer = PdfWriter()
    for toc_page in toc_reader.pages:
        writer.add_page(toc_page)  # Add all the TOC pages to the writer
    writer.append(pdf_path)  # INSERT PDF HERE
    with open("MergedFinal.pdf", "wb") as out_file:
        writer.write(out_file)
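The page arithmetic in the script above can be checked on its own. This is a minimal sketch, assuming the 33-entries-per-page estimate from the script and 0-based page indices as returned by pypdf; the function names `toc_page_count` and `printed_page_number` are hypothetical helpers, not part of fpdf or pypdf.

```python
import math


def toc_page_count(entry_count: int, entries_per_page: int = 33) -> int:
    """Estimate how many TOC pages are needed (33 per page is the script's rough figure)."""
    return math.ceil(entry_count / entries_per_page)


def printed_page_number(page_index: int, toc_pages: int) -> int:
    """Convert a 0-based index in the original PDF to a 1-based page number in the merged PDF."""
    return page_index + toc_pages + 1


toc_pages = toc_page_count(40)                    # 40 bookmarks need 2 TOC pages
first_page = printed_page_number(0, toc_pages)    # first page of the original document
print(toc_pages, first_page)
```

With two TOC pages in front, the first page of the original document is printed as page 3 in the merged file.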
Replies: 3 comments 9 replies
Hi @jstrumbo, welcome! Your code snippet is a bit long. We usually prefer minimal code snippets, as they help us a lot in figuring out the underlying problem.
What exactly is your issue? Is an exception raised? If so, what is the exact error message? And have you considered using Named Destinations?
@jstrumbo

    for b in bookmarks:
        indent = "    " * b["depth"]
        title = f"{indent}{b['title']}"
        page_str = str(b["page"])
        max_width = toc.epw
        title_width = toc.get_string_width(title)
        page_width = toc.get_string_width(page_str)
        dot_width = toc.get_string_width(".")
        dots_needed = int((max_width - title_width - page_width) / dot_width)
        dots = "." * max(0, dots_needed)
        line = f"{title}{dots} {page_str}"
        # Here I resolve the named destination to page 1
        # so fpdf2 can produce the TOC. The named destinations
        # will be changed with pypdf later
        page_link = toc.add_link(page=1, name=f"dest_page_{page_str}")
        # Write the cell linking to the named destination
        toc.cell(w=0, h=10, text=line, link=f"#dest_page_{page_str}")
        toc.ln(7.5)

    toc_buffer = io.BytesIO()
    toc.output(toc_buffer)
    toc_buffer.seek(0)

    # Merge TOC with original PDF
    toc_reader = PdfReader(toc_buffer)
    toc_page_count = len(toc_reader.pages)
    writer = PdfWriter()
    for toc_page in toc_reader.pages:
        writer.add_page(toc_page)
    writer.append(pdf_path)

    # Now that the PDF is merged, we'll manipulate the named destinations
    for b in bookmarks:
        page_str = str(b["page"])
        # b["page"] is a 0-based index into the original PDF; the TOC pages now precede it
        writer.add_named_destination(f"dest_page_{page_str}", page_number=b["page"] + toc_page_count)

    # HERE is assumed to be a pathlib.Path defined earlier in the script
    with open(HERE / "MergedFinal.pdf", "wb") as out_file:
        writer.write(out_file)

Named destinations are a feature that has not been released yet. If you plan on using them, you'll need to install fpdf2 from the GitHub repository.
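After the merge, the TOC pages sit in front of the original pages, so a bookmark's 0-based index shifts by the number of TOC pages, and several bookmarks can land on the same page. A rough sketch of building one destination per page, assuming bookmark dicts shaped like the ones in this thread; `merged_destinations` is a hypothetical helper, not a pypdf or fpdf2 function.

```python
def merged_destinations(bookmarks, toc_page_count):
    """Map each destination name to its 0-based page index in the merged PDF."""
    destinations = {}
    for b in bookmarks:
        name = f"dest_page_{b['page']}"
        # Several bookmarks may share a page; keep a single destination per name.
        destinations.setdefault(name, b["page"] + toc_page_count)
    return destinations


bookmarks = [
    {"depth": 0, "title": "Intro", "page": 0},
    {"depth": 1, "title": "Scope", "page": 0},
    {"depth": 0, "title": "Methods", "page": 4},
]
result = merged_destinations(bookmarks, toc_page_count=2)
print(result)
# {'dest_page_0': 2, 'dest_page_4': 6}
```

Each name and index pair could then be passed to writer.add_named_destination(), as in the loop above.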
Sorry for the delay. Here is my minimal reproducible example, hopefully it makes sense:

The error I get on pdf.output() is this:

ValueError: Invalid reference to non-existing page (#corresponding to my first bookmark) present on page 1

So it sounds like fpdf cannot create a link to a page that doesn't exist yet, since it can only see the one page it has generated.
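The situation can be modeled in a few lines. This is a simplified illustration of the check, not fpdf's actual implementation: the TOC document only has its own pages, while the link targets are page numbers in the merged PDF. The name `unreachable_links` is hypothetical.

```python
def unreachable_links(link_targets, page_count):
    """Return the 1-based target pages that do not exist in a document with page_count pages."""
    return [page for page in link_targets if not 1 <= page <= page_count]


# The TOC document has a single page, but the links point into the merged PDF.
missing = unreachable_links([3, 7, 12], page_count=1)
print(missing)   # [3, 7, 12]

# A link to page 1 is the only target a one-page document can satisfy.
ok = unreachable_links([1], page_count=1)
print(ok)        # []
```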
@jstrumbo
Here is an example I managed to put together that does what you want using named destinations. On the TOC I pinned all destinations to page 1, and after merging I manipulated the named destinations with pypdf: