This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit 181aa80

committed Jul 11, 2024
Add legacy post
1 parent 8c77bea commit 181aa80

1 file changed

+80 -0 lines changed

@@ -0,0 +1,80 @@
1+
+++
2+
title = 'XHTML layout converter for ceTe’s Dynamic PDF library'
3+
date = 2010-01-11T22:48:00-05:00
4+
categories = 'Tools'
5+
tags = ['XML']
6+
+++
7+
8+
At work, we are using [ceTe’s PDF libraries for .NET](http://www.cete.com/). They have a
9+
pretty easy to use, versatile set of tools for building PDFs. However, after building a
10+
few of them, it became obvious that assembling all but the most anemic PDF
11+
layouts was painful. The Tweak-Compile-Generate-View-Repeat method takes quite some
12+
time. Too much time.
13+
14+
The Tweak-Refresh-Repeat model for HTML+CSS seemed to me to be a **_much_** better
15+
method of building a layout, but the Dynamic PDF API doesn’t support anything like that.
16+
It speaks a little bit of pseudo-HTML3 with `<font>`, `<b>`, `<u>`, `<i>`, and `<p>`
17+
tags. But that is about the extent of it. So, I decided to build a converter that takes
18+
full advantage of our existing tools for XHTML generation and converts those XHTML+CSS
19+
documents into a PDF. It turned out to be surprisingly easy. My converter is less than
20+
600 lines of nicely commented VB, with a few little utility functions for things like
21+
parsing CSS. I was even able to account for the “cascading” part of CSS, where I look to
22+
page-level styles, then tag-level styles, followed by classes, tag IDs, and finally the
23+
style attribute.
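
To make that cascade order concrete, here is a minimal sketch in Python rather than the VB from the actual converter, which isn't shown here. It assumes the page-level styles live on a `body` rule and that selectors are limited to a tag name, `.class`, or `#id`; the function names `parse_declarations` and `resolve_style` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_declarations(text):
    """Parse 'name: value; name: value' into a dict (no comments, no !important)."""
    style = {}
    for part in text.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            style[name.strip().lower()] = value.strip()
    return style


def resolve_style(element, rules):
    """Merge style layers from least to most specific; later layers overwrite earlier ones."""
    layers = [
        rules.get("body", {}),       # page-level styles (assumed to live on `body`)
        rules.get(element.tag, {}),  # tag-level styles
    ]
    for cls in element.get("class", "").split():  # class styles
        layers.append(rules.get("." + cls, {}))
    if element.get("id"):                         # ID styles
        layers.append(rules.get("#" + element.get("id"), {}))
    # The inline style attribute is applied last, so it wins.
    layers.append(parse_declarations(element.get("style", "")))

    resolved = {}
    for layer in layers:
        resolved.update(layer)
    return resolved


rules = {
    "body": parse_declarations("font-size: 10pt; color: black"),
    "div": parse_declarations("color: gray"),
    ".total": parse_declarations("font-weight: bold; color: blue"),
    "#grand-total": parse_declarations("color: red"),
}
element = ET.fromstring(
    '<div id="grand-total" class="total" style="font-size: 12pt">42.00</div>'
)
resolved = resolve_style(element, rules)
print(resolved)
```

Here the inline `font-size` overrides the page-level one, the ID rule's `color` beats the class, tag, and page colors, and `font-weight` comes through from the class untouched.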
24+
25+
Currently what I have will support tables (including colspans and rowspans), images,
26+
rotated items (using the IE-only `{ writing-mode:tb-rl; filter: flipv; }` syntax),
27+
nested `<div>`s, various positioning techniques, and text alignment and styling. I’m using
28+
NVelocity to dynamically generate the XHTML (though T4 or really anything would do), so
29+
all the dynamic parts of the PDF are done on the front end rather than intermingled with
30+
the proprietary ceTe API calls. Since XHTML is both renderable HTML and parsable XML, I
31+
didn’t need to put in any effort there. The LINQ to XML functionality in .NET made that
32+
part a snap.
33+
34+
One of the decisions I had to make was which specific subset of XHTML I would support.
35+
Obviously, a PDF is much different than a web page in the way you lay out the elements.
36+
One example would be that a PDF isn’t designed to have a flow layout. In other words,
37+
when you resize the PDF, everything stays put rather than wrapping around the way a
38+
properly designed web page would. So, my XHTML assumes you will be absolutely
39+
positioning everything, and the converter throws errors if you don’t. You simply have to
40+
specify the X and Y coordinates, as well as the height and width.
41+
Actually, that's not entirely true - as a convenience I do auto-calculate height/width
42+
in certain circumstances but mostly you ought to specify. Height and Width are defined
43+
in point units, or “Pt”s, which is most commonly thought of as a unit for fonts, but in
44+
fact is equivalent to the units ceTe uses in its box layout API. This means that your
45+
HTML output will line up size-wise with the PDF that gets generated. Again, I wasn’t
46+
looking to take just any old HTML and make a PDF, but if you tailor your XHTML output to
47+
the PDF layout concept, it’s not too constraining.
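
As an illustration of the "everything is absolutely positioned, in points" rule, here is a minimal Python sketch (again, not the converter's VB). It assumes the X and Y coordinates arrive as the CSS `left` and `top` properties, and it skips the auto-calculated height/width convenience by requiring all four values; `parse_points` and `layout_box` are hypothetical names.

```python
import re

# Accepts values such as "36pt" or "14.5pt"; any other unit is rejected.
_POINTS = re.compile(r"^\s*(-?\d+(?:\.\d+)?)pt\s*$", re.IGNORECASE)


def parse_points(value):
    """Convert a CSS length in points to a float, or raise ValueError."""
    match = _POINTS.match(value)
    if match is None:
        raise ValueError(f"expected a length in points, got {value!r}")
    return float(match.group(1))


def layout_box(style):
    """Return (x, y, width, height) in points for an absolutely positioned element."""
    if style.get("position") != "absolute":
        raise ValueError("element must use position: absolute")
    required = ("left", "top", "width", "height")
    missing = [name for name in required if name not in style]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    return tuple(parse_points(style[name]) for name in required)


box = layout_box({
    "position": "absolute",
    "left": "36pt",
    "top": "72pt",
    "width": "540pt",
    "height": "14.5pt",
})
print(box)  # (36.0, 72.0, 540.0, 14.5)
```

Since a point is 1/72 of an inch, a US Letter page is 612 by 792 points, so the box above sits half an inch from the left edge and one inch from the top.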
48+
49+
There are quite a few things I don’t support. I didn’t do anything too fancy like CSS2
50+
selectors. Another consideration was that the XHTML had to be renderable exactly like the
51+
PDF version for this to make any sense, but since you can do lots more in XHTML than I
52+
actually support, you could technically go off the deep end. In addition, I chose to
53+
support only IE as my primary renderer. Firefox will _work_, and looks fairly close,
54+
but my purpose wasn’t really to make a standards-compliant HTML output. The idea is to
55+
have a RAD tool for PDF generation, and IE-only was okay for me to meet that goal. Also,
56+
there’s no support for any JavaScript (and really no need for it in a PDF anyway). CSS
57+
that isn’t understood is just ignored, as well as tags or attributes that aren’t
58+
understood. The key part is that everything has to have an x, y, height, and width
59+
specified in the proper units.
60+
61+
There are some things I still want to add as well. I’d like to add support for the
62+
`!important` CSS
63+
directive. Also, currently I don’t support changing fonts. ceTe’s tool seems to require
64+
quite a bit of config to set up a new font. You have to specify the
65+
`.ttf` file for the bold
66+
version, and the italic version, and the bold/italic version, and then there’s the
67+
underlining iterations – you get the idea. It’s on my TODO list to figure that one out.
68+
I also don’t have support for multiple pages. Currently I’m only set up to do a 1-to-1
69+
of XHTML doc to PDF page. And finally, I need to figure out how to do relative
70+
positioning of items (offset this element from that one by X,Y) as well as doing dynamic
71+
sizing of items to support an undetermined amount of text. It might also be nice to
72+
add an analyzer that lets you know whether your XHTML input has unsupported tags/CSS
73+
for debugging purposes.
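
A rough idea of what such an analyzer could look like, sketched in Python. The supported tag and property lists below are placeholders I made up for the example, not the converter's real lists, and the sketch only inspects inline `style` attributes, not `<style>` blocks.

```python
import xml.etree.ElementTree as ET

# Placeholder allow-lists for illustration only.
SUPPORTED_TAGS = {"html", "head", "title", "style", "body", "div", "p", "b", "i", "u",
                  "font", "img", "table", "tr", "td"}
SUPPORTED_PROPERTIES = {"position", "left", "top", "width", "height", "text-align",
                        "font-size", "font-weight", "color"}


def local_name(tag):
    """Strip an XML namespace such as '{http://www.w3.org/1999/xhtml}div' down to 'div'."""
    return tag.rsplit("}", 1)[-1]


def find_unsupported(xhtml):
    """Return a warning for each unsupported tag or inline CSS property, in document order."""
    warnings = []
    for element in ET.fromstring(xhtml).iter():
        tag = local_name(element.tag)
        if tag not in SUPPORTED_TAGS:
            warnings.append(f"unsupported tag <{tag}>")
        for declaration in element.get("style", "").split(";"):
            name = declaration.split(":", 1)[0].strip().lower()
            if name and name not in SUPPORTED_PROPERTIES:
                warnings.append(f"unsupported CSS property '{name}' on <{tag}>")
    return warnings


sample = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <div style="position: absolute; left: 36pt; top: 72pt; width: 200pt; height: 20pt; float: left">
    <marquee>Invoice</marquee>
  </div>
</body></html>"""

warnings = find_unsupported(sample)
for warning in warnings:
    print(warning)
```

For the sample document this reports the `float` property on the `<div>` and the `<marquee>` tag, which is the kind of feedback that would otherwise only show up as something silently missing from the PDF.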
74+
75+
All in all, this was a _really_ fun side project. There was a lot of up-front work, but
76+
it will pay off in no time. Now we can crank out new PDFs in hours instead of days, and
77+
not have to be in the guts of ceTe’s API. I’d like to highlight some of the more
78+
interesting bits of code, but since I did this for work, I’m not sure what I can/should
79+
showcase. Perhaps I’ll rewrite it in C# – though I’m not inclined to buy the Dynamic
80+
PDF tools for my personal use.
