| 1 | ++++ |
| 2 | +title = 'XHTML layout converter for ceTe’s Dynamic PDF library' |
| 3 | +date = 2010-01-11T22:48:00-05:00 |
| 4 | +categories = 'Tools' |
| 5 | +tags = ['XML'] |
| 6 | ++++ |
| 7 | + |
| 8 | +At work, we are using [ceTe’s PDF libraries for .NET](http://www.cete.com/). They have a |
| 9 | +pretty easy-to-use, versatile set of tools for building PDFs. However, after building a
| 10 | +few of them, it became obvious that assembling all but the most anemic PDF layouts
| 11 | +was painful. The Tweak-Compile-Generate-View-Repeat method takes quite some
| 12 | +time. Too much time. |
| 13 | + |
| 14 | +The Tweak-Refresh-Repeat model for HTML+CSS seemed to me to be a **_much_** better |
| 15 | +method of building a layout, but the Dynamic PDF API doesn’t support anything like that. |
| 16 | +It speaks a little bit of pseudo-HTML3 with `<font>`, `<b>`, `<u>`, `<i>`, and `<p>` |
| 17 | +tags. But that is about the extent of it. So, I decided to build a converter that takes |
| 18 | +full advantage of our existing tools for XHTML generation and converts those XHTML+CSS |
| 19 | +documents into a PDF. It turned out to be surprisingly easy. My converter is less than |
| 20 | +600 lines of nicely commented VB, with a few little utility functions for things like |
| 21 | +parsing CSS. I was even able to account for the “cascading” part of CSS, where I look to |
| 22 | +page-level styles, then tag-level styles, followed by classes, tag IDs, and finally the
| 23 | +style attribute. |
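
To make that cascade order concrete, here is a minimal sketch in Python. It is not the VB converter, just an illustration under some assumptions: the function names (`parse_declarations`, `resolve_style`) are hypothetical, and the stylesheet is assumed to be already split into a dictionary keyed by simple selectors (`div`, `.total`, `#grand-total`). Each layer is merged in the order listed above, so a later layer overrides an earlier one.

```python
def parse_declarations(css_text):
    """Parse 'name: value; name: value' into a dict, skipping malformed pieces."""
    props = {}
    for piece in css_text.split(";"):
        name, sep, value = piece.partition(":")
        if sep and name.strip() and value.strip():
            props[name.strip().lower()] = value.strip()
    return props


def resolve_style(page_style, rules, tag, classes=(), element_id=None, inline=""):
    """Merge styles from least to most specific: page, tag, classes, ID, inline."""
    layers = [page_style, rules.get(tag, "")]
    layers += [rules.get("." + name, "") for name in classes]
    if element_id:
        layers.append(rules.get("#" + element_id, ""))
    layers.append(inline)

    resolved = {}
    for layer in layers:
        # Later layers overwrite properties set by earlier ones.
        resolved.update(parse_declarations(layer))
    return resolved


# Hypothetical stylesheet, already split by simple selector.
rules = {
    "div": "font-size: 10pt; color: black",
    ".total": "font-weight: bold; color: navy",
    "#grand-total": "color: red",
}
style = resolve_style(
    "font-size: 12pt", rules, "div",
    classes=["total"], element_id="grand-total", inline="left: 36pt",
)
print(style)
```

In this example the tag rule overrides the page-level font size, the ID rule wins the color, and the inline `left` is added last. Splitting on `;` and `:` is a simplification that would trip over values containing those characters.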
| 24 | + |
| 25 | +Currently what I have will support tables (including colspans and rowspans), images, |
| 26 | +rotated items (using the IE-only `{ writing-mode:tb-rl; filter: flipv; }` syntax),
| 27 | +nested `<div>`s, various positioning techniques, and text alignment and styling. I’m using
| 28 | +NVelocity to dynamically generate the XHTML (though T4 or really anything would do), so |
| 29 | +all the dynamic parts of the PDF are done on the front end rather than intermingled with |
| 30 | +the proprietary ceTe API calls. Since XHTML is both renderable HTML and parsable XML, I |
| 31 | +didn’t need to write my own parser. The LINQ to XML functionality in .NET made that
| 32 | +part a snap. |
| 33 | + |
| 34 | +One of the decisions I had to make was which specific subset of XHTML I would support.
| 35 | +Obviously, a PDF is much different from a web page in the way you lay out the elements.
| 36 | +One example would be that a PDF isn’t designed to have a flow layout. In other words, |
| 37 | +when you resize the PDF, everything stays put rather than wrapping around the way a |
| 38 | +properly designed web page would. So, my XHTML assumes you will be absolutely |
| 39 | +positioning everything, and the converter throws errors if you don’t. You simply have to |
| 40 | +specify the X and Y coordinates, as well as the height and width.
| 41 | +Actually, that’s not entirely true: as a convenience I do auto-calculate height/width
| 42 | +in certain circumstances, but mostly you ought to specify them. Height and width are defined
| 43 | +in points (“pt”), which are most commonly thought of as a unit for fonts, but in
| 44 | +fact are equivalent to the units ceTe uses in its box layout API. This means that your
| 45 | +HTML output will line up size-wise with the PDF that gets generated. Again, I wasn’t |
| 46 | +looking to take just any old HTML and make a PDF, but if you tailor your XHTML output to |
| 47 | +the PDF layout concept, it’s not too constraining. |
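
As a rough illustration of the "absolutely positioned, sized in points" rule, the following Python sketch turns an inline style into a box and raises an error when something is missing. It rests on assumptions: the names `LayoutError` and `box_from_style` are hypothetical, X and Y are assumed to come from the CSS `left` and `top` properties, and the auto-calculated height/width cases mentioned above are not modeled.

```python
import re

# A number followed by the "pt" unit, e.g. "36pt" or "12.5pt".
PT_VALUE = re.compile(r"^(-?\d+(?:\.\d+)?)pt$", re.IGNORECASE)
REQUIRED = ("left", "top", "width", "height")


class LayoutError(ValueError):
    """Raised when an element cannot be placed at a fixed spot on the page."""


def parse_style(style_text):
    """Parse an inline style attribute into a dict of lower-cased property names."""
    props = {}
    for piece in style_text.split(";"):
        name, sep, value = piece.partition(":")
        if sep:
            props[name.strip().lower()] = value.strip()
    return props


def box_from_style(style_text):
    """Return (x, y, width, height) in points, or raise LayoutError."""
    props = parse_style(style_text)
    if props.get("position", "").lower() != "absolute":
        raise LayoutError("element must use position: absolute")
    box = []
    for name in REQUIRED:
        match = PT_VALUE.match(props.get(name, ""))
        if match is None:
            raise LayoutError(f"'{name}' must be given in pt, got {props.get(name)!r}")
        box.append(float(match.group(1)))
    return tuple(box)


box = box_from_style(
    "position: absolute; left: 36pt; top: 72pt; width: 200pt; height: 14pt"
)
print(box)
```

Because the same point values drive both the browser preview and the PDF box, no unit conversion is needed in this sketch.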
| 48 | + |
| 49 | +There are quite a few things I don’t support. I didn’t do anything too fancy like CSS2 |
| 50 | +selectors. Another consideration was that the XHTML had to be renderable exactly like the
| 51 | +PDF version for this to make any sense, but since you can do lots more in XHTML than I |
| 52 | +actually support, you could technically go off the deep end. In addition, I chose to |
| 53 | +support only IE as my primary renderer. Firefox will _work_, and looks fairly close,
| 54 | +but my purpose wasn’t really to make a standards-compliant HTML output. The idea is to
| 55 | +have a RAD tool for PDF generation, and IE-only was okay for me to meet that goal. Also, |
| 56 | +there’s no support for any JavaScript (and really no need for it in a PDF anyway). CSS |
| 57 | +that isn’t understood is just ignored, as are tags or attributes that aren’t
| 58 | +understood. The key requirement is that everything has an x, y, height, and width
| 59 | +specified in the proper units. |
| 60 | + |
| 61 | +There are some things I still want to add as well. I’d like to add support for the |
| 62 | +`!important` CSS
| 63 | +directive. Also, currently I don’t support changing fonts. ceTe’s tool seems to require
| 64 | +quite a bit of config to set up a new font. You have to specify the
| 65 | +`.ttf` file for the bold
| 66 | +version, and the italic version, and the bold/italic version, and then there’s the |
| 67 | +underlining iterations – you get the idea. It’s on my TODO list to figure that one out. |
| 68 | +I also don’t have support for multiple pages. Currently I’m only set up to do a 1-to-1 |
| 69 | +of XHTML doc to PDF page. And finally, I need to figure out how to do relative |
| 70 | +positioning of items (offset this element from that one by X,Y) as well as doing dynamic |
| 71 | +sizing of items to support an undetermined amount of text. It might also be nice to |
| 72 | +add an analyzer that lets you know whether your XHTML input has unsupported tags/CSS
| 73 | +for debugging purposes. |
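
Such an analyzer could be fairly small. The Python sketch below is only one possible shape for it, not something that exists in the converter: the `SUPPORTED_TAGS` and `SUPPORTED_CSS` sets are hypothetical placeholders, `analyze` is an invented name, and only inline `style` attributes are checked. It relies on the XHTML being well-formed XML, so the standard library's `xml.etree.ElementTree` can walk it.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelists; a real list would mirror what the converter handles.
SUPPORTED_TAGS = {"html", "head", "title", "style", "body", "div", "span", "p",
                  "img", "table", "tr", "td", "th", "b", "i", "u"}
SUPPORTED_CSS = {"position", "left", "top", "width", "height", "text-align",
                 "font-size", "font-weight", "color", "writing-mode", "filter"}


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1].lower()


def analyze(xhtml_text):
    """Return sorted warnings for unsupported tags and inline CSS properties."""
    warnings = set()
    for element in ET.fromstring(xhtml_text).iter():
        tag = local_name(element.tag)
        if tag not in SUPPORTED_TAGS:
            warnings.add(f"unsupported tag <{tag}>")
        for piece in element.get("style", "").split(";"):
            name, sep, _ = piece.partition(":")
            name = name.strip().lower()
            if sep and name and name not in SUPPORTED_CSS:
                warnings.add(f"unsupported CSS property '{name}' on <{tag}>")
    return sorted(warnings)


sample = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div style="position: absolute; left: 36pt; float: left">
  <marquee>Total</marquee>
</div>
</body></html>"""

for warning in analyze(sample):
    print(warning)
```

Returning warnings instead of raising keeps the "ignore what isn't understood" behavior intact while still surfacing the problem during debugging.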
| 74 | + |
| 75 | +All in all, this was a _really_ fun side project. There was a lot of up-front work, but
| 76 | +it will pay off in no time. Now we can crank out new PDFs in hours instead of days, and |
| 77 | +not have to be in the guts of ceTe’s API. I’d like to highlight some of the more |
| 78 | +interesting bits of code, but since I did this for work, I’m not sure what I can/should |
| 79 | +showcase. Perhaps I’ll rewrite it in C# – though I’m not inclined to buy the Dynamic |
| 80 | +PDF tools for my personal use. |